Monday, September 23, 2019

A Character Display

The C640 showing off its character display.  It says
*** C640 COMPUTER SYSTEM ***
2MB RAM SYSTEM  38911 BASIC BYTES FREE
That took a lot longer than I wanted it to.

Last time, we had VIC displaying the contents of RAM as a bitmap, and a dummy CPU component copying ROM into RAM.  It sort of worked, but I wasn't happy with the SRAM interface, which was writing corrupted data to memory.

A little tweaking of the memory timing - what happens on which clock phases - fixed the memory problems.  I still wasn't comfortable with it: why did it stop working with some signals delayed by 6ns, when everything was running at half the speed it should have been able to handle?  But never mind.  It worked, and I was eager to move on to the character display.

This required getting a number of things to work.  First up, VIC doesn't have enough memory bandwidth to read all the information it needs on every line (well, it does in the 320x200 mode I'm using here, but it wouldn't at higher resolutions).  The Commodore 64 would stop the CPU every 8 lines so that VIC could fetch character pointers from screen memory at $0400 into an internal buffer.  It could then use those pointers to create addresses for the bitmap data, which is read on every line.
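To make the two kinds of fetch concrete, here is a minimal Python sketch of how a C64-style VIC forms its addresses.  The screen base of $0400 comes from the text; the character bitmap base of $1000 (where a stock C64's VIC sees the character ROM), the 40-column rows and the 8-bytes-per-character layout are C64 conventions assumed here, and the C640's actual values may differ.  The function names are hypothetical.

```python
# Sketch of C64-style VIC text-mode address generation.
# SCREEN_BASE is from the text; CHAR_BASE and the 40x25 / 8-byte layout are
# C64 conventions assumed here, and may not match the C640.

SCREEN_BASE = 0x0400   # screen memory holding the character pointers
CHAR_BASE = 0x1000     # where VIC sees the character ROM on a stock C64
COLUMNS = 40           # characters per text row
BYTES_PER_CHAR = 8     # one bitmap byte per raster line of a character cell

def pointer_address(text_row: int, column: int) -> int:
    """Address of the character pointer fetched once per text row (every 8 lines)."""
    return SCREEN_BASE + text_row * COLUMNS + column

def bitmap_address(pointer: int, line_in_row: int) -> int:
    """Address of the bitmap byte read on every raster line.

    line_in_row (0..7) selects which line of the character cell is drawn.
    """
    return CHAR_BASE + pointer * BYTES_PER_CHAR + line_in_row

# Row 1, column 0 holds the pointer at $0428; if it contains 1 (the screen
# code for 'A'), line 3 of that character comes from $100B.
print(hex(pointer_address(1, 0)), hex(bitmap_address(1, 3)))
```

The pointer fetch only has to happen once per text row, but the bitmap address changes on every raster line, which is why the pointers need to be kept in a buffer rather than re-read.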

So the C640 needs DMA.  It needs to be able to pause the CPU, let VIC use the CPU's half of the cycle to access memory, store the fetched character pointers in an internal buffer, and create addresses from them to fetch bitmap data.  That makes for a fairly long pipeline, and the sequence must start early enough that the bitmap data is ready for display before the border ends.
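For a rough feel of how much time the CPU gives up, the following sketch counts the "bad lines" (the lines on which VIC pauses the CPU to fetch pointers) in one frame.  It borrows C64 PAL values as assumptions: 312 raster lines, bad lines possible only between lines $30 and $F7, a bad line whenever the low three bits of the raster line equal the vertical scroll value (3 at power-on), and 40 stolen cycles per bad line.  It ignores the display-enable check and the extra cycles the real chip needs to bring the CPU to a halt; the C640's own numbers aren't given here.

```python
# Hypothetical model of when VIC steals the CPU's half of the cycle.
# All constants are C64 (PAL) values used as assumptions, not C640 facts.

TOTAL_LINES = 312                   # raster lines per PAL frame
FIRST_LINE, LAST_LINE = 0x30, 0xF7  # range in which bad lines can occur
COLUMNS = 40                        # character pointers fetched per bad line
DEFAULT_YSCROLL = 3                 # C64 power-on vertical scroll value

def is_bad_line(raster: int, yscroll: int = DEFAULT_YSCROLL) -> bool:
    """True if VIC must pause the CPU on this line to fetch character pointers."""
    return FIRST_LINE <= raster <= LAST_LINE and (raster & 7) == yscroll

def bad_lines(yscroll: int = DEFAULT_YSCROLL) -> list:
    """All raster lines in one frame on which pointers are fetched."""
    return [r for r in range(TOTAL_LINES) if is_bad_line(r, yscroll)]

def cpu_cycles_stolen(yscroll: int = DEFAULT_YSCROLL) -> int:
    """Cycles per frame in which VIC uses the CPU's half of the bus."""
    return len(bad_lines(yscroll)) * COLUMNS

print(hex(bad_lines()[0]), len(bad_lines()), cpu_cycles_stolen())
# 0x33 25 1000
```

One bad line per text row gives exactly the 25 rows of a 40x25 screen, and the pointers fetched on each one have to last for the following 8 raster lines.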

I didn't think to take any screenshots of the early attempts.  They weren't pretty, and it took weeks of not-at-all-intensive debugging to fix all the problems.

The first problem was that SRAM writes immediately broke.  Thinking that it was obviously a timing problem, and that the extra logic I'd added had pushed something past its limit, I dug through Xilinx's documentation and discovered the trce tool, which generates a timing report after place and route.  That's the only stage at which a timing report can be expected to be accurate, as a significant part of the total delay is the time it takes to get a signal from one part of the FPGA to another.

The report revealed a large number of timing violations, mostly in the clock enable signals.

mclk, system clock phase, and some of the C640's clock enables.
This is from a later (working) version of the design, so there are only 16 clock phases.
Since I didn't want the extra complexity of dealing with multiple clocks, I'm using a single clock (160MHz at this point, and called 'mclk') with enable signals to tell various parts of the design when they should pay attention to it.  Most enables are only active once in every 5MHz system cycle.  To make them active at the right times, I have a "system clock phase" counter, which says how far through the 5MHz cycle (32 mclk pulses at this clock rate) this particular 160MHz clock pulse is.  So that the clock phase is stable around the rising edge of mclk (the edge that everything else is latched on), this counter is incremented on the falling edge of mclk.
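A minimal behavioural sketch of that scheme, written in Python rather than HDL: a phase counter that advances on each falling edge of mclk and wraps once per 5MHz cycle, and an enable that is true on exactly one phase, sampled at each rising edge.  `enable_pulses` and the choice of phase 7 are illustrative assumptions; the post doesn't show the real design's enables.

```python
# Behavioural (not HDL) sketch of single-clock operation with clock enables.
# 160MHz mclk, 5MHz system cycle; the active phase chosen below is arbitrary.

MCLK_HZ = 160_000_000
SYSTEM_HZ = 5_000_000
PHASES = MCLK_HZ // SYSTEM_HZ  # 32 mclk pulses per system cycle

def enable_pulses(n_mclk: int, active_phase: int) -> list:
    """Return the mclk cycle indices whose rising edge sees the enable asserted.

    The phase counter changes on each falling edge, so the value sampled at a
    rising edge was produced half an mclk period earlier and is stable.
    """
    phase = 0
    fired = []
    for cycle in range(n_mclk):
        # Rising edge: registers using this enable act only on their phase.
        if phase == active_phase:
            fired.append(cycle)
        # Falling edge: advance the counter, wrapping once per system cycle.
        phase = (phase + 1) % PHASES
    return fired

print(enable_pulses(2 * PHASES, active_phase=7))  # [7, 39]
```

Everything still runs from one clock; the enable simply decides which of the 32 rising edges in each system cycle a given register responds to.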

That means there's only 3.125ns (half of mclk's 6.25ns period) to increment the counter, combine it with whatever other logic is required for the clock enable in question, and get the result to the clock enable input of the register.  The Spartan 6 is fast, but it's not that fast.  Many clock enables were arriving too late.
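The arithmetic behind that budget, as a small sketch: the counter changes on a falling edge and the enable must be valid by the next rising edge, so the logic gets half an mclk period.  Working in integer picoseconds keeps the numbers exact.  `timing_budget` is a hypothetical helper.

```python
# Timing budget for logic that starts at a falling edge of mclk and must
# settle before the next rising edge. Integer picoseconds keep it exact.

SYSTEM_HZ = 5_000_000

def timing_budget(mclk_hz: int) -> dict:
    period_ps = 10**12 // mclk_hz
    return {
        "phases_per_system_cycle": mclk_hz // SYSTEM_HZ,
        "period_ps": period_ps,
        # Falling edge to next rising edge: the counter, the enable logic,
        # the routing delay and the register's setup time must all fit here.
        "half_period_ps": period_ps // 2,
    }

for mhz in (160, 80):
    print(mhz, timing_budget(mhz * 1_000_000))
```

Halving mclk to 80MHz halves the number of phases per system cycle to 16 but doubles the falling-to-rising window to 6.25ns.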

So it's back to an 80MHz clock, and this time the memory controller uses both edges to generate control signals for the SRAM.  Memory access is now rock solid, and the timing report has no constraint violations.
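One way to see why using both edges helps: at 80MHz there are only 16 rising edges per 200ns system cycle, but counting the falling edges too gives 32 edges, 6.25ns apart, which is the same granularity a single-edge controller would have at 160MHz.  The sketch below lists those edges and computes the width of a control pulse between two of them.  The slot numbers in the example are hypothetical, since the post doesn't give the actual SRAM signal timing.

```python
# Edge slots available to a memory controller that acts on both edges of an
# 80MHz mclk within one 5MHz (200ns) system cycle. Times in integer picoseconds.

MCLK_HZ = 80_000_000
SYSTEM_HZ = 5_000_000
PHASES = MCLK_HZ // SYSTEM_HZ               # 16 mclk cycles per system cycle
HALF_PERIOD_PS = 10**12 // (2 * MCLK_HZ)    # 6250 ps between successive edges

def edge_slots() -> list:
    """(time_ps, edge) for every mclk edge in one system cycle."""
    slots = []
    for phase in range(PHASES):
        rise = 2 * phase * HALF_PERIOD_PS
        slots.append((rise, "rise"))
        slots.append((rise + HALF_PERIOD_PS, "fall"))
    return slots

def pulse_width_ps(assert_slot: int, release_slot: int) -> int:
    """Width of a control pulse asserted at one edge slot and released at a later one."""
    if not 0 <= assert_slot < release_slot <= 2 * PHASES:
        raise ValueError("slots must be ordered and within one system cycle")
    return (release_slot - assert_slot) * HALF_PERIOD_PS

# Hypothetical write strobe: asserted at slot 4, released at slot 11.
print(len(edge_slots()), pulse_width_ps(4, 11))  # 32 43750
```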

There then followed far too much fiddling around trying to get the right sequence of actions to make DMA work.  For weeks I had an almost correct display, but there was always something wrong.  The first character on each row would be duplicated, the last character would appear at the start, the first character would show bitmap data from a different character on its first line, and so on.  My poor software brain was at its limit trying to deal with a system where everything happens simultaneously, but everything must happen in exactly the right sequence.
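Symptoms like these are typical of a pipeline stage that is one cycle out of step with its neighbour.  The toy model below (an illustration, not a reconstruction of the C640's actual bugs) pushes a row of character pointers through a delay line whose real latency may differ from the latency the display logic expects.  When the display stage starts one cycle too early, it first shows whatever was already in the pipeline registers: the last character left over from the previous raster line, or the first character again if the registers happened to hold it.  `render_row` and its parameters are hypothetical.

```python
from collections import deque
from typing import List, Optional

def render_row(pointers: List[str], actual_latency: int,
               expected_latency: int = 1, stale: Optional[str] = None) -> List[str]:
    """Characters the display shows for one row of fetched pointers.

    pointers         -- values the fetch stage produces, one per cycle
    actual_latency   -- cycles the data really takes to reach the display stage
    expected_latency -- cycles the display stage waits before it starts drawing
    stale            -- what the pipeline registers hold before the row starts
                        (defaults to the last pointer, left over from drawing
                        the previous raster line of the same text row)
    """
    if stale is None:
        stale = pointers[-1]
    pipe = deque([stale] * actual_latency)  # registers between the two stages
    shown = []
    for cycle in range(len(pointers) + expected_latency):
        # Fetch stage pushes a new pointer (or a leftover value once the row is done).
        pipe.append(pointers[cycle] if cycle < len(pointers) else stale)
        value = pipe.popleft()
        # Display stage only starts drawing after its expected latency.
        if cycle >= expected_latency:
            shown.append(value)
    return shown

print("".join(render_row(list("HELLO"), 1)))             # HELLO
print("".join(render_row(list("HELLO"), 2)))             # OHELL
print("".join(render_row(list("HELLO"), 2, stale="H")))  # HHELL
```

A single cycle of mismatch is enough to rotate or duplicate a character at the start of every row, which is why getting the order of the pipeline stages exactly right matters so much.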

But, as you can see, I finally got there.  There is now a working character display, and I feel that I'm starting to get the hang of this FPGA thing.

Next, it's time to start on the CPU.  I can see significant failure ahead.