Tuesday, August 14, 2018

SAD Graphics

Last time we were left with a problem: too much graphics data.  The original Commodore 64 has 8 bits of bitmap data from system RAM each cycle.  The C640 doubles this by using the colour RAM interface, and doubles it again by having 16 bits per byte.  That gives us 4 bits per pixel in standard mode, and 8 bits in multicolour.

It would be nice to use the extra data to increase resolution, but I haven't been able to think of a good way of doing that.  We want to keep old behaviour when the new bits are 0, and try to avoid adding new modes.

My solution is "Store and Display" graphics.  This is inspired by the Amiga's HAM mode, and the Apple IIGS's fill mode.

In standard mode, we have four bits per pixel: ABCD.  The low bit (D) is stored in the low 8 bits of system RAM, C is in the high 8 bits of system RAM, B in the low 8 bits of colour RAM, and A in the high 8 bits of colour RAM.
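As a rough illustration of that layout, here is a small Python sketch that reassembles the eight ABCD values from one 16-bit system RAM fetch and one 16-bit colour RAM fetch. The function name is made up for this example, and it assumes that bit 7 of each 8-bit half is the leftmost pixel, as on the original Commodore 64.

```python
def unpack_pixels(system_word, colour_word):
    """Return the eight 4-bit ABCD pixel values for one pair of 16-bit fetches."""
    planes = (
        system_word & 0xFF,          # D: low 8 bits of system RAM
        (system_word >> 8) & 0xFF,   # C: high 8 bits of system RAM
        colour_word & 0xFF,          # B: low 8 bits of colour RAM
        (colour_word >> 8) & 0xFF,   # A: high 8 bits of colour RAM
    )
    pixels = []
    for i in range(8):
        shift = 7 - i                # assumption: bit 7 is the leftmost pixel
        value = 0
        for weight, plane in enumerate(planes):
            value |= ((plane >> shift) & 1) << weight
        pixels.append(value)
    return pixels
```

Setting only bit 7 of the high half of colour RAM, for example, makes the leftmost pixel 1000 and leaves the other seven at 0000.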

CD selects the colour for this pixel, as described before.  A and B control store and display.

  • 00CD: store CD in register C0, display CD
  • 01CD: store CD in register C1, display CD
  • 10CD: display register C0
  • 11CD: display register C1
(I might change the last two to xor the contents of the colour register with CD, depending on how useful it is in practice.  I'm also tempted to add some new registers to set the initial values of C0 and C1 on each line)

If we clear the screen to 1000 with a column of 0000 down the left hand side, that gives us a screen full of the background colour 00 (register $d021).  If we set a single pixel to 0001, then it will display colour 01, and also store 01 in register C0.  Then on the rest of that line, every pixel will be displaying the colour from C0, which is now 01.  By changing a single byte, we have drawn a horizontal line.

If we didn't want to draw to the end of the line, we could write 0000 to a pixel to set C0 back to colour 00.

This gives us the ability to draw single-colour filled polygons by drawing only the pixels along the left and right edges.  But so far that's only using C0, not C1.
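The store-and-display rule is easy to model in software. The following Python sketch decodes one scan line of ABCD values into displayed colours; the function name is hypothetical, and the initial contents of C0 and C1 are passed in as parameters because the post leaves that question open.

```python
def decode_line(pixels, c0=0, c1=0):
    """Decode a list of 4-bit ABCD values into the 2-bit colours displayed."""
    regs = [c0, c1]                   # assumed start-of-line register values
    shown = []
    for p in pixels:
        display_only = (p >> 3) & 1   # A: 0 = store and display, 1 = display register
        which = (p >> 2) & 1          # B: selects C0 or C1
        colour = p & 0b11             # CD
        if not display_only:
            regs[which] = colour      # store CD; it is also what gets displayed
        shown.append(regs[which])
    return shown


line = [0b0000] + [0b1000] * 7        # cleared line with a 0000 in the left column
line[3] = 0b0001                      # one write starts a horizontal span
print(decode_line(line))              # [0, 0, 0, 1, 1, 1, 1, 1]
line[6] = 0b0000                      # a second write ends it
print(decode_line(line))              # [0, 0, 0, 1, 1, 1, 0, 0]
```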

Now clear the screen to a checkerboard pattern of 1000 and 1100, ensuring that C0 and C1 are both set to 00 at the start of each line.  This gives us a full screen of the background colour.  If we set two adjacent pixels to 00xx and 01yy, then C0 and C1 will be set to xx and yy respectively, and the rest of the line will alternate between those two colours.  This lets us draw dithered filled polygons, as in the teaser picture from the last post:




Multicolour mode has 8 bits per pixel: ABCDEFGH.  These are stored as follows:

  • GH: system RAM low
  • EF: system RAM high
  • CD: colour RAM low
  • AB: colour RAM high
If A is 0, then BCD is used to select one of eight C registers (C0 to C7), and EFGH is both stored and displayed.  If A is 1, then the contents of the selected register will be displayed instead.  This is the same as standard resolution, but with many more colours to play with.

It probably won't be useful to store split-pixel 'colours' in the C registers, but if there is a use for it, it will work.


Currently, the simulator ignores the low bits when the top bit is 1 (when it is displaying the contents of a C register rather than storing a colour into it).  That feels a bit wasteful.  If those bits were xored with the contents of the displayed register, that would allow overlaying images onto a filled polygon without interfering with the rest of it.  This might be useful.  But it might end up too complicated to actually use.  That decision will have to wait until I've written some more test software.  And that will have to wait, as I've put the simulator aside for now, and moved development onto much more interesting things...

Sunday, July 22, 2018

A Better VIC

The 6567 "VIC II" chip in the Commodore 64 has two memory interfaces.  One has 14 address bits and 8 data bits, and is connected to system RAM.  It does a fetch of bitmap data in the first half of every cycle, and a fetch of character pointers (or colours in bitmap mode) in the second half of the cycle during DMA (usually every 8 lines).

The other interface has 10 address bits, shared with the first interface, and 4 data bits.  It is connected to the 1Kx4 colour RAM.  Both interfaces fetch data at the same times, but only the DMA data from the colour RAM is used.

If the 6567 had a few extra pins, and RAM hadn't been so expensive at the time, colour RAM could have been extended to 16Kx8.  It would need a new register similar to $D018 to control the high bits of its address.  Then we can map the bitmap and DMA data to different parts of colour RAM, and use both.

For compatibility we will need another mode bit in one of the control registers, that will force bitmap reads of colour RAM to 0.  As always, we want everything to behave the same as a standard Commodore 64 if the extra bits are 0.

This gives us twice as many bits per pixel, and an extra 4 bit colour per character cell.

Standard resolution now has two bits per pixel.  The low bit comes from system RAM, the high bit from colour RAM.  Colours could be allocated as follows:

  • 00: Low 4 bits of $D021
  • 01: Low 4 bits of colour RAM DMA data
  • 10: High 4 bits of $D021
  • 11: High 4 bits of colour RAM DMA data
This is not the final allocation - there's a big change coming soon!

In multicolour mode, we now have four bits per double-width pixel.  The low two come from system RAM, the high two from colour RAM.  We could assign each of the 16 values a different colour source (two from colour RAM, and the rest from global colour registers like $D021), but in a system with only 16 colours in total that doesn't feel like a good fit.

Instead, we assign the first 10 values to various colours, and the remaining 6 to pairs of colours.  Each pixel can then choose a different colour for its left and right halves:
  • 1010 Left = $D021  Right = colour low
  • 1011 Left = $D021  Right = colour high
  • 1100 Left = colour low  Right = $D021
  • 1101 Left = colour high  Right = $D021
  • 1110 Left = colour low  Right = colour high
  • 1111 Left = colour high  Right = colour low
That gives us the best of both modes.  We have a choice of 10 colours for double-width pixels, with 3 colours for high resolution detail, all in the same graphics mode.

Similar things can be done with bitmap graphics and sprites.

But wait...

In the C640, every byte contains 16 bits.  That doubles the amount of data available again.  For the DMA data, that's easy to use.  We can extend character pointers to allow more than 256 characters, and add flags to mirror characters horizontally or vertically.  And we can have 5 bit values in colour RAM, giving a 32 colour palette, and have three available in each character cell.

But what can we do with the extra bitmap data?  Here's a clue

Saturday, July 7, 2018

Some 65020 code

Since the simulator can now handle C64 bitmap graphics, I thought it would be good to write some graphics primitives in proper 65020 code, using the new CPU features.


My little test program starts by entering graphics mode and clearing the screen
sbt #5, $d011
lda #$18
sta $d018
lda #$15
bra.l clear
That first instruction is actually one of the old ones with a new addressing mode.  The 6502 has instructions to set and clear flags within the status register.  The 65020 uses four of the extension bits to select different destination registers (for the original opcodes, which are now reinterpreted as acc mode), or an index register for an indexed addressing mode.  Three others are combined with the bit number implied in the original opcode's flag to select any of the 32 bits in the word.  SBT #5 gets assembled as a variant of SEI.

The other new feature here is the bra.l instruction.  This is BEQ, with the P bit set to select a different condition (in this case, it becomes "always").  The .l modifier tells it to push the current value of PC to the stack before branching.  This turns it into a JSR-like instruction, with relative addressing.  Branch instructions also have four bits to select a base register, if you want to branch relative to something other than PC.  We'll see a use for that later.

The clear routine is straightforward.  The only thing worth noting is that wider registers make it a lot less wordy than the original 6502
clear
ldx.w #1000
clear1
sta $0400,x
dex.w
bpl clear1
ldx.w #8000
lda #0
clear2
sta $2000,x
dex.w
bpl clear2
rts
One of the annoyances of Commodore 64 bitmap graphics is the way that the bitmap is arranged in memory.  Most hardware does this linearly, with the first N bytes representing pixels in the first line, from left to right, the next N bytes being the second line, and so on.  The Commodore 64 does things differently, as a result of re-using some of the hardware that handles the character screen.  The first byte represents the first 8 pixels in the first line (line 0).  But then the second byte is the first 8 pixels of line 1.  This continues, with byte 7 being the start of line 7.  Then byte 8 jumps back up to the second set of 8 pixels on line 0.  This continues until byte 319, which is the last 8 pixels of line 7.  The next 320 bytes repeat the same pattern 8 lines lower.

An essential function of any graphics code is going to be a routine to map pixel coordinates to the address of a byte, and a mask for the right bit within that byte.  Here's the 65020 way:
getPixelAddr
phx.l x1
; a0 = y & 7
mov.w a0, x1
and a0, #7
; x1 = y/8
lsr.w #3, x1
; addr = 320*(y/8) = 256*(y/8) + 64*(y/8), computed as 5*(y/8) shifted left by 6
mov.w a1, x1
asl.w #2, a1
add.w a1, x1
asl.w #3, a1
asl.w #3, a1
; (equivalently, addr = 40*(y&~7) = 32*(y&~7) + 8*(y&~7))
; addr += low bits of y
add.w a1, a0
; a0 = x & 7 (index into the bitmasks table)
mov.w a0, x0
and a0, #7
; pre-subtract low bits of x (so adding later only adds the high bits)
sub.w a1, a0
lda a0, bitmasks, a0
; addr += high bits of x
add.w a1, x0
plx.l x1
rts
bitmasks
.byte 128, 64, 32, 16, 8, 4, 2, 1
getPixelAddr takes the x coordinate in X0 and the y coordinate in X1.  It returns the offset within the bitmap in A1, and the bit mask for the pixel in A0.
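For comparison, the same calculation can be written out in a few lines of Python. This is only a reference sketch of the arithmetic, not a translation of the register-level code; the names are chosen for this example.

```python
BITMASKS = [128, 64, 32, 16, 8, 4, 2, 1]


def get_pixel_addr(x, y):
    """Return (offset within the bitmap, bit mask) for pixel (x, y)."""
    offset = 320 * (y >> 3)      # each row of character cells is 320 bytes
    offset += y & 7              # line within the cell
    offset += x & ~7             # 8 bytes per cell, so the column offset is x with its low 3 bits cleared
    mask = BITMASKS[x & 7]       # bit 7 is the leftmost pixel of the byte
    return offset, mask
```

The last pixel on the screen, (319, 199), lands at offset 7999 with mask 1, which is the final byte of the 8000-byte bitmap.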

There are a few things worth noting here.  The 65020 has 12 main registers (A0-A3, X0-X3, Y0-Y3).  The A and X registers are almost completely interchangeable.  The Y registers can be used as the destination of ALU instructions, but not the source.  It took a few false starts, but I've eventually settled on a convention for register use that I think might suffice for the future.  X registers are parameters, and their values are not changed by the routine.  A registers are results, and their values can be changed.  Y registers are for pointers, and I'm not sure whether they should be saved by routines or not.  They probably should be.

So getPixelAddr starts by saving the value of the one X register that it modifies, and ends by restoring it.  The rest is the usual sort of bit-fiddling that you expect to see in code like this.  I originally wrote it to calculate the mask first, then multiply y/8 by 320, then add all the components together.  A bit of re-arranging followed to make it use fewer registers, and also to work around the awkwardness of the two-operand instructions.  Being able to have a destination register different from the sources is a very useful feature if you're writing code by hand, but the 65020 just doesn't have enough opcode bits to do it.

The 2 bit constant in acc mode instructions like ASL is occasionally useful.  But you can see here I needed a shift by 6 bits, which doesn't quite fit and has to be done in two batches of 3.

getPixelAddr is a lot shorter and faster than the equivalent in 8 bit 6502 code.  But it's not the sort of thing you want to call more often than necessary.  For drawLine we'd like to call it once at the start of the line, and then use simpler modifications of the bit mask and address to move from the current pixel to one of its neighbours.  drawLine uses four helper routines to move right, left, down, and up:
drawLine_incX
rrb a0
bcc drawLine_incX_exit
add.w a1, #8
drawLine_incX_exit
rts
 
drawLine_decX
rlb a0
bcc drawLine_decX_exit
sub.w a1, #8
drawLine_decX_exit
rts
 
drawLine_incY
inx.w a1
mov a2, a1
and a2, #7
bne drawLine_incY_skip
add.w a1, #312
drawLine_incY_skip
rts
 
drawLine_decY
mov a2, a1
and a2, #7
bne drawLine_decY_skip
sub.w a1, #312
drawLine_decY_skip
dex.w a1
rts
To move right or left, I use the RRB and RLB instructions.  These are similar to the old ROR and ROL instructions, but instead of rotating a 9 bit value including the carry flag, they rotate within the 8, 16, or 32 bits selected by the instruction width (here it's 8 bit).  RRB A0 shifts the low 8 bits of A0 one bit to the right.  The old right-most bit gets shifted into bit 7, and also copied to the C flag.  Most of the time we can move right or left by just doing this rotate.  If C is clear after it, nothing more needs to be done.  If C is set, that means we've stepped outside this byte and need to move on to the next.  Because we're using RRB and RLB instead of ROR and ROL, the bit that was shifted out is already shifted in to the other end, and all we need to do is add or subtract 8 from the offset.  That's done with the ADD instruction, which is an add without carry in.  It's the old ADC instruction with the P bit of the extension set.
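The four helpers can be modelled in Python to check the stepping logic. This sketch works on an (offset, mask) pair like the one getPixelAddr returns; the function names are invented for the example, and the carry flag is represented by an ordinary variable.

```python
def inc_x(addr, mask):
    carry = mask & 1                      # bit rotated out on the right
    mask = (mask >> 1) | (carry << 7)     # 8-bit rotate right, as RRB does
    if carry:
        addr += 8                         # stepped into the next cell
    return addr, mask


def dec_x(addr, mask):
    carry = (mask >> 7) & 1               # bit rotated out on the left
    mask = ((mask << 1) & 0xFF) | carry   # 8-bit rotate left, as RLB does
    if carry:
        addr -= 8
    return addr, mask


def inc_y(addr, mask):
    addr += 1
    if (addr & 7) == 0:                   # ran off the bottom of a cell
        addr += 312                       # 320 bytes per cell row, minus the 8 already counted
    return addr, mask


def dec_y(addr, mask):
    if (addr & 7) == 0:                   # about to run off the top of a cell
        addr -= 312
    addr -= 1
    return addr, mask
```

Starting from offset 0 with mask 128, eight calls to inc_x arrive at offset 8 with mask 128 again, and eight calls to inc_y arrive at offset 320.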

Now for drawLine itself.  First, a bit of set-up
drawLine
phx.l x0
phx.l x1
phx.l x2
phx.l x3
ldy.l y2, #drawLine_incX
sbx.w x2, x0 ; x2 = dx
bpl drawLine_noswap
ldy.l y2, #drawLine_decX
lda a3, #0
sub.w a3, x2
mov.w x2, a3
drawLine_noswap
bra.l getPixelAddr
ldy.l y3, #drawLine_incY
; get displacement of endpoint
sbx.w x3, x1 ; x3 = dy
bpl drawLine_ypositive
; y negative
ldy.l y3, #drawLine_decY
lda a3, #0
sub.w a3, x3
mov.w x3, a3
drawLine_ypositive
mov.w y0, x2 ; loop count = dx x major
cpx.w x2, x3
bgt drawLine_xmajor
mov.w y0, x3 ; loop count = dy y major
; swap x2, x3
mov.w a2, x2
mov.w x2, x3
mov.w x3, a2
; swap y2, y3
mov.w a2, y2
mov.w y2, y3
mov.w y3, a2
drawLine_xmajor
mov.w a3, x2 ; a3 = error
sub.w a3, x3
adx.w x2, x2 ; x2 = 2dx (dy)
adx.w x3, x3 ; x3 = 2dy (dx)

drawLine takes (x1, y1) in X0 and X1, (x2, y2) in X2 and X3.

Save the X registers that are changed, then load Y2 with the address of the appropriate horizontal movement helper: if x2 >= x1 we increment x, otherwise we decrement.  Then we load Y3 with the helper routine for vertical movement.  Increment y if y2 >= y1, otherwise decrement.

As a side-effect of those tests, we've also calculated dx = abs(x2-x1) and dy = abs(y2-y1) and put them in X2 and X3.  If dy > dx, we need to swap dx and dy, and the pointers to the helper routines in Y2 and Y3.

Finally, we initialise A3 to the difference between dx and dy, and double X2 and X3.  Now we're ready to draw
drawLine_loop
mov a2, a0
ora a2, $2000, a1
sta a2, $2000, a1
bra.l 0,y2
sub.w a3, x3
bpl drawLine_skip
bra.l 0,y3
add.w a3, x2
drawLine_skip
dey.w y0
bne drawLine_loop
plx.l x3
plx.l x2
plx.l x1
plx.l x0
rts
Set the current pixel, call the helper routine that Y2 points to (which will step right or left for x-major lines, down or up for y-major).  Then subtract X3 from A3 (this will be double either dx or dy).  If A3 goes negative, we need to step on the other axis and add X2.  Do this in a loop, and the whole line is drawn.  All that's left is to restore the registers we saved at the start and return.
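The control flow of drawLine can be summarised in a short Python sketch. It tracks plain (x, y) coordinates rather than an address and bit mask, and it returns the list of pixels instead of writing to a bitmap, so it is a model of the algorithm rather than of the 65020 code. One deliberate difference: a zero-length line plots nothing here, whereas the assembly loop always runs at least once.

```python
def draw_line(x1, y1, x2, y2):
    """Return the pixels plotted for a line from (x1, y1) towards (x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    step_x = (1, 0) if dx >= 0 else (-1, 0)    # the role of Y2 before any swap
    step_y = (0, 1) if dy >= 0 else (0, -1)    # the role of Y3 before any swap
    dx, dy = abs(dx), abs(dy)

    if dx > dy:                                # x-major
        major, minor = dx, dy
        step_major, step_minor = step_x, step_y
    else:                                      # y-major: swap deltas and helpers
        major, minor = dy, dx
        step_major, step_minor = step_y, step_x

    error = major - minor
    x, y = x1, y1
    plotted = []
    for _ in range(major):                     # loop count is the major-axis delta
        plotted.append((x, y))
        x, y = x + step_major[0], y + step_major[1]
        error -= 2 * minor
        if error < 0:                          # time to step on the other axis
            x, y = x + step_minor[0], y + step_minor[1]
            error += 2 * major
    return plotted


print(draw_line(0, 0, 4, 2))   # [(0, 0), (1, 1), (2, 1), (3, 2)]
```

As in the original, the end point itself is not plotted, which is convenient when drawing connected line segments.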

BRA.L 0, Y2 uses a different base register to call a routine whose address is stored in Y2.  Normally, a branch instruction will add an offset to PC and jump to that address.  Here, the offset is 0, and the base is stored in Y2.  This is a very useful feature.

Branches have another extension bit, which I haven't used yet.  This enables indirection.  Instead of register+offset pointing to the branch destination directly, indirection makes it point to a memory location that contains the address of the destination.  This allows tables of function pointers, which could be used to implement virtual functions in a language like C++.  If Y0 points to an object, we might say
ldy y1, (0,y0)
bra.il 10, y1
Y0 points to the start of the object.  Its first word (two 16-bit bytes) is a pointer to the virtual table.  LDY Y1, (0, Y0) loads this pointer into Y1.  The next instruction adds 10 to this pointer, loads the address of the routine we want to call, and calls it.

Throughout drawLine, I'm using the MOV instruction to copy values from one register to another.  This isn't a real instruction, but gets translated by the assembler into the appropriate transfer instruction.  A move from an X register to an A register, for example, will become TXA.  The existing transfer instructions cover a subset of moves between A, X, Y, and S.  Two extension bits for each of the source and destination give us A0-A3, X0-X3, Y0-Y3, and P, Z, SP, or PC.  Another extension bit for each of source and destination changes the group.  A becomes Y, Y becomes A, X becomes S, and S becomes X.  This gives a complete set of moves from any register to any other register, but the encoding is complicated enough that it's best to leave it to the assembler.
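The claim that this covers every register pair can be checked with a small Python sketch. It uses the register numbering 0-15 (A0-A3, X0-X3, Y0-Y3, then P, Z, SP, PC) and models the group swap as toggling bit 3 of the register number. Which bit of each three-bit field acts as the group bit is an assumption made for this example, as are the function and table names.

```python
# Default (source, destination) register numbers for the six 6502 transfers.
# Numbering: A0-A3 = 0-3, X0-X3 = 4-7, Y0-Y3 = 8-11, P, Z, SP, PC = 12-15.
TRANSFERS = {
    "TAX": (0, 4), "TXA": (4, 0),
    "TAY": (0, 8), "TYA": (8, 0),
    "TXS": (4, 14), "TSX": (14, 4),
}


def select(default, field):
    """Apply a 3-bit selection field to a default register number."""
    reg = default ^ (field & 0b011)   # two bits pick a register within the group
    if field & 0b100:                 # assumed: the third bit swaps the group
        reg ^= 8                      # A <-> Y, X <-> S
    return reg


def reachable_moves():
    return {
        (select(src, sf), select(dst, df))
        for src, dst in TRANSFERS.values()
        for sf in range(8)
        for df in range(8)
    }


print(len(reachable_moves()))   # 256, i.e. all 16 x 16 combinations
```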

I think the same applies to the ALU instructions.  I currently have different instructions for ADD, ADX, and ADY, which do the same operation (add) on A, X, or Y registers.  Since I'm always explicitly specifying the register, it would be much nicer to just say ADD for all of them, and let the assembler choose the right opcode.

The drawLine routine has also highlighted some missing instructions.  NEG (negate) and ABS (absolute value) would be very useful.  LEA (load effective address, which loads the address specified by an addressing mode into a register, so it can be used later) would also be useful.  I also need to add variants of the shift and rotate instructions that allow a variable shift amount.  And drawLine would benefit from a SWP instruction to swap the contents of two registers (or possibly a register and a memory location).  I'm not sure about that one - it would require adding another write port to the register file, just for one instruction.  It's probably not worth it.

The image at the top was created by a simple test program.  Most of it isn't worth describing, but the pattern in the bottom left brought up one surprise
; Drawing lines in each octant
ldy y0, #0
octantLoop
ldx.w x0, octantLines,y0
cpx.w x0, #$ffff
beq octant_done
ldx.w x1, octantLines+1,y0
ldx.w x2, octantLines+2,y0
ldx.w x3, octantLines+3,y0
phy y0
bra.l drawLine
ply y0
iny #4, y0
bra octantLoop
octant_done
This tests the drawLine routine, getting it to draw a line in each octant.  The surprise wasn't in the code, but in the table of coordinates
octantLines
.byte 10, 180, 40, 190
.byte 40, 190, 50, 160
.byte 50, 160, 20, 150
.byte 20, 150, 10, 180
.byte 20, 190, 50, 180
.byte 50, 180, 40, 150
.byte 40, 150, 10, 160
.byte 10, 160, 20, 190
.byte $ffff
These are all 16 bit values (even though most of them are less than 256), but they're being assembled with .byte.  I had originally used .word, but of course that won't work.  For compatibility with existing source code, .word has to assemble values to two bytes.  Since bytes now contain 16 bits, it splits a 16 bit value into two 8 bit parts, and puts one part in the low 8 bits of each of two bytes.  That's not what you want when you're writing new 65020 code.  The .w modifier on instructions tells the CPU to use all 16 bits from the byte in question.  So .byte has to deal with 16 bit values.  .long will handle 32 bit values, putting them in two adjacent bytes.  .word will very rarely be used in new code.
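A Python sketch of what the three directives emit may make the difference clearer. Each list element stands for one 16-bit byte of output. The function names are invented, and the low-part-first ordering for .word and .long is an assumption based on the 6502's little-endian convention.

```python
def emit_byte(value):
    """.byte: one 16-bit byte holding the whole value."""
    return [value & 0xFFFF]


def emit_word(value):
    """.word: two bytes, each carrying 8 bits in its low half (6502 compatible)."""
    return [value & 0xFF, (value >> 8) & 0xFF]


def emit_long(value):
    """.long: a 32-bit value in two adjacent 16-bit bytes (low half first, assumed)."""
    return [value & 0xFFFF, (value >> 16) & 0xFFFF]


print([hex(b) for b in emit_word(0x1234)])   # ['0x34', '0x12']
print([hex(b) for b in emit_byte(0x1234)])   # ['0x1234']
```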


Wednesday, June 27, 2018

Getting Commodore 64 BASIC to run

The plan was to use the source for the Commodore 64's ROM (https://github.com/mist64/cbmsrc) as a test for the assembler.  If I could get a 100% matching binary, then I could be reasonably confident that the assembler was working correctly, for 8-bit code at least.  I could then start optimising parts of it to get a feel for how well the extended instruction set works, which bits are useful, and what is missing.

This plan immediately ran into a problem, which in hindsight should have been obvious.

With the 65020, a byte contains 16 bits.  Every address used by the Commodore 64 ROM is (at most) 16 bit.  So instructions that need two byte operands in the original get assembled to only one byte.  It doesn't take long for the binary to get out of sync.

However, this approach was still good enough to find and fix a number of simple assembler bugs.  Feeding the result to the simulator gave this:

[Image: Commodore 64 start-up screen, "64K RAM SYSTEM  51199 BASIC BYTES FREE"]
Great!  But isn't it supposed to be 38911 BASIC BYTES FREE?  It turns out that BASIC starts with a memory test.  It checks every location until it finds something that isn't RAM, and assumes it can use all of it.  My simulator didn't distinguish ROM from RAM, so it kept going until it hit I/O space at $D000.  Write-protecting the ROMs at $A000-$BFFF and $E000-$FFFF fixed this.
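Both numbers fall out of simple arithmetic, assuming the start-up message reports the distance from the start of BASIC program text at $0801 to the first location that fails the memory test. A quick Python check (the names are only for this example):

```python
BASIC_START = 0x0801   # start of BASIC program text on the Commodore 64


def bytes_free(first_non_ram):
    """Bytes between the start of BASIC text and the end of usable RAM."""
    return first_non_ram - BASIC_START


print(bytes_free(0xA000))   # 38911: the memory test stops at the BASIC ROM
print(bytes_free(0xD000))   # 51199: the memory test runs on until I/O space
```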

Now we've got a fully working BASIC.  Except we don't.  It's not much use if you can't type programs in and run them.  So I added some I/O handler code to convert Windows keyboard scan codes into the Commodore 64 keyboard matrix, and return the appropriate values when $DC00 and $DC01 were accessed.

Now we can run a real test

[Image: Commodore 64 start-up screen with the program 10 PRINT"HELLO WORLD", followed by ?SYNTAX ERROR]
Entering anything would give a syntax error.  That's not how it's supposed to go.

The nice thing about trying things out in a software simulator rather than jumping straight to hardware is that you can create useful debugging tools.  The simulator already had an instruction trace feature, printing out every instruction that is executed, along with the contents of some of the registers (PC, SP, P, A0, X0, Y0).  This quickly revealed the problem.

BASIC uses a small routine called CHRGET, which is copied into zero-page memory.  Here's the important part:
INITAT  INC CHRGET+7
        BNE CHDGOT
        INC CHRGET+8
CHDGOT  LDA 60000
It uses self-modifying code to increment the two bytes of the address (60000 is just the initial value in the source code, used to force absolute addressing mode.  It gets set to different values later).  But on the 65020, 60000 is a one-byte quantity.  That LDA gets assembled with zero page addressing.

Another easy fix, if a little hacky:
CHDGOT  .BYTE $AD, $00, $00
Now I can type in programs.  But it's annoying to type them in every time.  I need to be able to save programs and load them back in.  Again, this is the advantage of a software simulator.  I don't have to emulate Commodore's serial bus and floppy drive, or the tape drive, or anything like that.  I can simply replace the LOAD and SAVE kernal routines with STA LOADTRIGGER and STA SAVETRIGGER, writing to unused I/O locations.  The I/O handler traps these, reads register values from the CPU, and loads or saves chunks of memory to or from a regular file on my PC.

Now, some more tests.  PRINT 1+1 says 2.  PRINT 2*2 says 4.  PRINT 1/10 says 5.95173333E-09

What's going on there?  The floating point divide routine builds the result bit-by-bit.  It starts with A set to 1, and shifts in bits of the result one at a time.  When the 1 bit gets shifted out, the partial result is written to a temporary buffer.  Here's the code that does this:
        LDX #253-ADDPRC
        LDA #1
DIVIDE  ...           ; do the compare
SAVQUO  PHP
        ROL A
        BCC QSHFT
        INX
        STA RESLO,X
        BEQ LD100
RESLO is the last byte of the buffer.  X is initialised to -4, so the first byte is written to the start of the buffer.  It is incremented each time a byte is written, and when it reaches 0 the loop ends.  This relies on RESLO being in page 0, and zero-page indexed addressing wrapping if RESLO+X is greater than 255.  That's the main incompatibility between the 6502 and 65020.  On the 65020, indexing never wraps.  The solution again is a small change to the source:
        LDX.L #$FFFFFFFD-ADDPRC
        ...
        INX.L 
65020 indexing always uses all 32 bits of the register, so we must load X0 with a 32 bit version of -4, and do a 32 bit increment.
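The difference is easy to see with the numbers written out. In this Python sketch the address used for RESLO is only an illustrative value, and the two functions are simplified models of the effective-address calculation rather than anything taken from the simulator.

```python
RESLO = 0x29   # illustrative zero-page address for the last byte of the buffer


def zp_indexed_6502(base, x):
    """6502 zero-page indexed addressing: the sum wraps within page 0."""
    return (base + x) & 0xFF


def indexed_65020(base, x):
    """65020 indexed addressing: all 32 bits of the index are used, no page wrap."""
    return (base + x) & 0xFFFFFFFF


# After the first INX, X holds -3.
print(hex(zp_indexed_6502(RESLO, 0xFD)))        # 0x26: wraps back into the buffer
print(hex(indexed_65020(RESLO, 0xFD)))          # 0x126: 8-bit -3 is just 253, wrong location
print(hex(indexed_65020(RESLO, 0xFFFFFFFD)))    # 0x26: 32-bit -3 gives the intended address
```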

The simulator isn't meant to be a full Commodore 64 emulator, but I couldn't resist adding support for bitmap mode.  Typing in the example from the Programmer's Reference Guide gives me this
[Image: A "high resolution" sine curve in black, on a cyan background]
That's a good place to stop for now.  I have a collection of Commodore's public domain software as a set of .d64 files.  I plan to extract the individual programs and use them as further tests.  I want to play ARTILLERY again!  After that, I'll finally get back to the plan, and start optimising parts of the ROM using the 65020's extended features.  And then, finally, the FPGA.

Sunday, June 17, 2018

The 65020

The original idea for the 65020 came about 30 years ago, as a reaction to the 65816's mode bits.  The 65816 is a better 6502, but it's not a very satisfying processor.  I wanted to do better. 

The key idea in the 65020 is the extension of bytes to 16 bits.  That immediately opens up a lot of possibilities.  Instructions have 8 more bits to specify data width, extra registers, and other operations.  16 bit addresses become 32 bits.

The intention is that if the top 8 bits of every byte are all zero, then it will behave exactly like a 6502.  So far I've managed to do that, with a few minor exceptions: the stack doesn't wrap, and zero-page indexing doesn't wrap either.

Instruction set

Many of the gaps in the 6502's opcode map have been filled with new instructions, and new addressing modes for old instructions:



New opcodes are marked in gray.

  • ACX, ACY: Add with carry, with an X or Y register as destination
  • SCX, SCY: Subtract with carry, with an X or Y register as destination
  • ANX, ANY: Logical and, with an X or Y register as destination
  • EOX, EOY: Logical exclusive-or, with an X or Y register as destination
  • ORX, ORY: Logical or, with an X or Y register as destination
  • MUL, DIV, MOD: Multiply, divide, and mod operations
  • SQR: Square root (if floating point is ever implemented)
  • PHX, PHY: Push an X or Y register to the stack
  • PLX, PLY: Pull an X or Y register from the stack
  • SEV: Set overflow
  • CBT: Clear bit
  • SBT: Set bit


Registers

There are 16 registers:
  • 0-3: A0, A1, A2, A3
  • 4-7: X0, X1, X2, X3
  • 8-11: Y0, Y1, Y2, Y3
  • 12-15: P, Z, SP, PC
P is the processor status register.  If used as an index, a constant zero is used instead.  This explains the abs,0 and zp,0 addressing modes in the opcode table above.  They are now indexed modes, but use P as the default index register.  Selection bits in the opcode extension allow Y0-Y3, Z, SP, or PC to be used instead.  This provides stack- and PC-relative addressing modes for most instructions.

All registers are 32 bits wide.  In most cases, writing to the low 8 or 16 bits will clear the rest of the register.  The exception is writing to the low 8 bits of SP.  For compatibility, this will set the high 24 bits to $000001.

Extension

Most of the new features come through the opcode extension.  There are a few formats for these, depending on the instruction and addressing mode
  • PRRXXXDD
    • Used by most instructions.
    • P selects the alternate operation
    • RR selects the destination register, or a small constant (1-4)
    • XXX selects the index register or source register
    • DD is the operation width. 00 for 8 bit, 01 for 16 bit, 10 for 32 bit, 11 for floating point (where that makes sense)
  • PAAARRRR
    • Used by bit-set and -clear instructions
    • AAA is combined with the base bit from the instruction to form the bit number
    • RRRR is the destination register
  • SSSRRRDD
    • Used by register-move instructions
    • SSS selects the source register
    • RRR selects the destination register
    • DD is the width
  • WNNNVVVV
    • Used by the BRK instruction
    • W enables waiting for an external interrupt
    • NNN selects the external interrupt to wait for
    • VVVV selects the interrupt vector: $0000fffe - 2*VVVV
  • CDILRRRR
    • Used by branch instructions
    • C selects a different set of conditions (BGE, BLT, BLE, BGT, BLS, BHI, BNV, BRA).  BNV is "never", BRA is "always"
    • I enables indirection.  If it is 0, the target address is base register + offset.  If it is 1, then base register + offset points to a memory location containing the target address
    • L enables subroutine calls.  If it is 1, the current PC will be pushed before the branch is taken
    • RRRR is the base register
The register selection fields in opcode extensions don't encode the register number directly.  Instead, each instruction has a default register (to give standard 6502 behavior when the extension is zero).  The register selection field is xored into the lower bits of this register number.
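As a quick illustration of the xor scheme for the three-bit index field, here is a Python sketch using the register numbering from the list above. The function name is made up, and it models only the plain xor described here, not the more involved encoding of the register-move instructions.

```python
REGISTERS = ["A0", "A1", "A2", "A3", "X0", "X1", "X2", "X3",
             "Y0", "Y1", "Y2", "Y3", "P", "Z", "SP", "PC"]


def select_register(default, field):
    """Xor a selection field into the default register's number."""
    return REGISTERS[REGISTERS.index(default) ^ field]


# Index registers reachable from the abs,0 and zp,0 modes, whose default is P:
print([select_register("P", f) for f in range(8)])
# ['P', 'Z', 'SP', 'PC', 'Y0', 'Y1', 'Y2', 'Y3']
```

A field of zero always returns the default register, which is what keeps plain 6502 code working unchanged.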

The 6502's flag-clearing and -setting instructions have been generalised to clear or set any bit of any register.  These general instructions are called CBT and SBT.  So CLC is CBT 0, CLI is CBT 2, and so on.  The AAA bits from the extension are shifted up two places, with the lowest bit copied to fill the new ones.  The resulting value is xored into the bit selected by the instruction.  This gives access to all 32 bits.
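Here is a Python sketch of the bit-number calculation, as I read the description above: AAA is shifted left by two, its lowest bit is copied into the two vacated positions, and the result is xored with the flag bit implied by the original opcode. The function name is invented for the example. The base bits are the usual 6502 status register positions: C is bit 0, I is bit 2, D is bit 3, and V is bit 6.

```python
def bit_number(base_bit, aaa):
    """Combine the opcode's implied flag bit with the 3-bit AAA field."""
    fill = 0b11 if aaa & 1 else 0b00          # lowest bit of AAA copied into the new bits
    return base_bit ^ ((aaa << 2) | fill)


print(bit_number(2, 0b001))   # 5: SEI with AAA = 001 becomes "set bit 5"

# Every bit of a 32-bit word is reachable from one of the four flag opcodes.
covered = {bit_number(base, aaa) for base in (0, 2, 3, 6) for aaa in range(8)}
print(covered == set(range(32)))   # True
```

Under this reading, bit 5 is reached from the I-flag opcodes with AAA = 001.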

Similarly, the register-transfer instructions TAX, TYA, and so on, have been generalised to copy any register to any other.  By xoring the instruction's register number with the bits in the extension, all combinations of source and destination register are covered.

Increment, decrement, and the shift/roll instructions have a small constant encoded instead of a destination register.  This makes it possible to add or subtract numbers up to 4 in a single one-byte (16 bit) instruction.  For the acc mode, these instructions use the index register selection to specify the register.

Alternative operations

Many instructions use the 'P' bit in the extension to select a different operation
  • ADC, ACX, ACY -> ADD, ADX, ADY Add without carry
  • SBC, SCX, SCY -> SUB, SBX, SBY Subtract without carry
  • CMP, CPX, CPY -> CPC, CCX, CCY Compare with carry
  • AND, ANX, ANY -> BIC, BCX, BCY Bit clear (dest <- dest and ~source)
  • MUL, DIV, MOD -> MLS, DVS, MDS Signed multiply, divide, mod
  • SQR -> RSQ Reciprocal square root
  • ASL -> ESL Shift left, filling with the right-most bit
  • LSR -> ASR Shift right, filling with the left-most bit
  • ROL , ROR -> RLB, RRB Rotate without carry
  • CBT -> TBT Test bit.  The Z flag is set if the tested bit is zero, cleared if it is one
  • SBT -> XBT Toggle bit

The C640 Project

The C640 is a fantasy successor to the Commodore 64.  It is imagined as a computer that Commodore could have made in the mid-1980s, but didn't.

My design constraints are

  • It must 'feel' like a Commodore 64 (only better)
  • It should be moderately compatible
  • It should be feasible to have been manufactured in mid-1980s technology
  • Working on it must make me happy
The first three constraints will be broken any time they conflict with the fourth.  You can expect an end-result that doesn't feel very Commodore 64ish to you, won't run your favourite games, and does a few things that couldn't have been done at the time.  This is just for fun, and I'm not going to be a purist about anything.

So far, I have a reasonably worked-out design for the processor (which is called the 65020), a sketch of some improvements to VIC, and a few vague ideas of improvements to SID.  The 65020 has an assembler and simulator, and is capable of running the Commodore 64 ROMs with a few modifications.

So let's start...