Wednesday, June 27, 2018

Getting Commodore 64 BASIC to run

The plan was to use the source for the Commodore 64's ROM (https://github.com/mist64/cbmsrc) as a test for the assembler.  If I could get a 100% matching binary, then I could be reasonably confident that the assembler was working correctly, for 8-bit code at least.  I could then start optimising parts of it to get a feel for how well the extended instruction set works, which bits are useful, and what is missing.

This plan immediately ran into a problem, which in hindsight should have been obvious.

With the 65020, a byte contains 16 bits.  Every address used by the Commodore 64 ROM is at most 16 bits wide.  So an instruction that needs a two-byte operand in the original gets assembled with a one-byte operand, and it doesn't take long for the binary to get out of sync.

However, this approach was still good enough to find and fix a number of simple assembler bugs.  Feeding the result to the simulator gave this:

Commodore 64 start-up screen, "64K RAM SYSTEM  51199 BASIC BYTES FREE"
Great!  But isn't it supposed to be 38911 BASIC BYTES FREE?  It turns out that BASIC starts with a memory test.  It checks every location until it finds something that isn't RAM, and assumes it can use all of it.  My simulator didn't distinguish ROM from RAM, so it kept going until it hit I/O space at $D000.  Write-protecting the ROMs at $A000-$BFFF and $E000-$FFFF fixed this.
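
The two numbers can be checked with a little arithmetic.  The sketch below is not the simulator's code and not the ROM's actual test routine.  It assumes the count starts at $0801, the usual start of BASIC program text, and the function names are made up for the illustration.

```python
BASIC_START = 0x0801  # assumed start of BASIC program text on a stock C64


def bytes_free(is_ram):
    """Scan upward from BASIC_START until the first location that is not RAM."""
    addr = BASIC_START
    while is_ram(addr):
        addr += 1
    return addr - BASIC_START


def no_write_protect(addr):
    # Without ROM protection, only the I/O area at $D000 fails the test.
    return addr < 0xD000


def roms_protected(addr):
    # With the BASIC ROM at $A000-$BFFF write-protected, the scan stops there.
    return addr < 0xA000


print(bytes_free(no_write_protect))  # 51199
print(bytes_free(roms_protected))    # 38911
```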

Now we've got a fully working BASIC.  Except we don't.  It's not much use if you can't type programs in and run them.  So I added some I/O handler code to convert Windows keyboard scan codes into the Commodore 64 keyboard matrix, and return the appropriate values when $DC00 and $DC01 were accessed.
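
As a rough illustration of the $DC00/$DC01 side of this, here is a minimal sketch.  It assumes the usual arrangement where the ROM selects keyboard columns by writing 0 bits to $DC00 and reads pressed keys as 0 bits from $DC01.  The `pressed` set of (column, row) pairs is a hypothetical stand-in for whatever the scan-code translation produces.

```python
def read_dc01(column_select, pressed):
    """Return the row byte for the columns currently selected via $DC00.

    column_select: last value written to $DC00 (a 0 bit selects a column).
    pressed: set of (column, row) matrix positions that are held down.
    """
    rows = 0xFF  # all 1s means no key pressed
    for col, row in pressed:
        if not (column_select >> col) & 1:  # this key's column is selected
            rows &= ~(1 << row) & 0xFF      # pull the matching row bit low
    return rows


# One key held at column 0, row 1 (position chosen only for illustration).
held = {(0, 1)}
print(hex(read_dc01(0xFE, held)))  # column 0 selected: 0xfd
print(hex(read_dc01(0xFD, held)))  # column 1 selected: 0xff
```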

Now we can run a real test:

Commodore 64 start-up screen with the program 10 PRINT"HELLO WORLD", followed by ?SYNTAX ERROR
Entering anything would give a syntax error.  That's not how it's supposed to go.

The nice thing about trying things out in a software simulator rather than jumping straight to hardware is that you can create useful debugging tools.  The simulator already had an instruction trace feature, printing out every instruction that is executed, along with the contents of some of the registers (PC, SP, P, A0, X0, Y0).  This quickly revealed the problem.

BASIC uses a small routine called CHRGET, which is copied into zero-page memory.  Here's the important part:
INITAT  INC CHRGET+7
        BNE CHDGOT
        INC CHRGET+8
CHDGOT  LDA 60000
It uses self-modifying code to increment the two bytes of the address.  (The 60000 is just the initial value in the source code, used to force absolute addressing mode; it gets set to different values later.)  But on the 65020, 60000 fits in a single 16-bit byte, so that LDA gets assembled with zero page addressing.
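
The mode selection can be pictured like this.  It is only a sketch of the general rule (use the short form when the operand fits in one byte), not the assembler's real logic, and `operand_mode` is a made-up name.

```python
def operand_mode(value, bits_per_byte):
    """Pick the shortest addressing mode that can hold the operand."""
    limit = 1 << bits_per_byte  # number of values one byte can represent
    return "zero page" if value < limit else "absolute"


print(operand_mode(60000, 8))   # 6502, 8-bit bytes: absolute
print(operand_mode(60000, 16))  # 65020, 16-bit bytes: zero page
```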

Another easy fix, if a little hacky:
CHDGOT  .BYTE $AD, $00, $00
Now I can type in programs.  But it's annoying to type them in every time.  I need to be able to save programs and load them back in.  Again, this is the advantage of a software simulator.  I don't have to emulate Commodore's serial bus and floppy drive, or the tape drive, or anything like that.  I can simply replace the LOAD and SAVE kernal routines with STA LOADTRIGGER and STA SAVETRIGGER, writing to unused I/O locations.  The I/O handler traps these, reads register values from the CPU, and loads or saves chunks of memory to or from a regular file on my PC.

Now, some more tests.  PRINT 1+1 says 2.  PRINT 2*2 says 4.  But PRINT 1/10 says 5.95173333E-09.

What's going on there?  The floating point divide routine builds the result bit-by-bit.  It starts with A set to 1, and shifts in bits of the result one at a time.  When the 1 bit gets shifted out, the partial result is written to a temporary buffer.  Here's the code that does this:
        LDX #253-ADDPRC
        LDA #1
DIVIDE  ... ; do the compare
SAVQUO  PHP
        ROL A
        BCC QSHFT
        INX
        STA RESLO,X
        BEQ LD100
RESLO is the last byte of the buffer.  X is initialised to -4, so the first byte is written to the start of the buffer.  It is incremented each time a byte is written, and when it reaches 0 the loop ends.  This relies on RESLO being in page 0, and zero-page indexed addressing wrapping if RESLO+X is greater than 255.  That's the main incompatibility between the 6502 and 65020.  On the 65020, indexing never wraps.  The solution again is a small change to the source:
        LDX.L #$FFFFFFFD-ADDPRC
        ...
        INX.L 
65020 indexing always uses all 32 bits of the register, so we must load X0 with a 32 bit version of -4, and do a 32 bit increment.
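
The difference is easy to see by computing the effective address both ways.  This is a sketch of the addressing rules as described above, not the simulator's code.  The value used for RESLO is illustrative only.

```python
RESLO = 0x29  # illustrative zero-page address for the last buffer byte


def ea_6502(base, x):
    """6502 zero-page indexed: the sum wraps within page 0."""
    return (base + (x & 0xFF)) & 0xFF


def ea_65020(base, x):
    """65020 indexed: all 32 bits of the register are used, no page wrap."""
    return (base + (x & 0xFFFFFFFF)) & 0xFFFFFFFF


# X after the first INX is -3.
print(hex(ea_6502(RESLO, 0xFD)))         # 0x26: three bytes before RESLO
print(hex(ea_65020(RESLO, 0xFD)))        # 0x126: 8-bit -3 lands in the wrong place
print(hex(ea_65020(RESLO, 0xFFFFFFFD)))  # 0x26: 32-bit -3 works again
```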

The simulator isn't meant to be a full Commodore 64 emulator, but I couldn't resist adding support for bitmap mode.  Typing in the example from the Programmer's Reference Guide gives me this:
A "high resolution" sine curve in black, on a cyan background
That's a good place to stop for now.  I have a collection of Commodore's public domain software as a set of .d64 files.  I plan to extract the individual programs and use them as further tests.  I want to play ARTILLERY again!  After that, I'll finally get back to the plan, and start optimising parts of the ROM using the 65020's extended features.  And then, finally, the FPGA.

Sunday, June 17, 2018

The 65020

The original idea for the 65020 came about 30 years ago, as a reaction to the 65816's mode bits.  The 65816 is a better 6502, but it's not a very satisfying processor.  I wanted to do better. 

The key idea in the 65020 is the extension of bytes to 16 bits.  That immediately opens up a lot of possibilities.  Instructions have 8 more bits to specify data width, extra registers, and other operations.  16 bit addresses become 32 bits.

The intention is that if the top 8 bits of every byte are all zero, then it will behave exactly like a 6502.  So far I've managed to do that, with a few minor exceptions: the stack doesn't wrap, and zero-page indexing doesn't wrap either.

Instruction set

Many of the gaps in the 6502's opcode map have been filled with new instructions, and new addressing modes for old instructions:



New opcodes are marked in gray.

  • ACX, ACY: Add with carry, with an X or Y register as destination
  • SCX, SCY: Subtract with carry, with an X or Y register as destination
  • ANX, ANY: Logical and, with an X or Y register as destination
  • EOX, EOY: Logical exclusive-or, with an X or Y register as destination
  • ORX, ORY: Logical or, with an X or Y register as destination
  • MUL, DIV, MOD: Multiply, divide, and mod operations
  • SQR: Square root (if floating point is ever implemented)
  • PHX, PHY: Push an X or Y register to the stack
  • PLX, PLY: Pull an X or Y register from the stack
  • SEV: Set overflow
  • CBT: Clear bit
  • SBT: Set bit


Registers

There are 16 registers:
  • 0-3: A0, A1, A2, A3
  • 4-7: X0, X1, X2, X3
  • 8-11: Y0, Y1, Y2, Y3
  • 12-15: P, Z, SP, PC
P is the processor status register.  If used as an index, a constant zero is used instead.  This explains the abs,0 and zp,0 addressing modes in the opcode table above.  They are now indexed modes, but use P as the default index register.  Selection bits in the opcode extension allow another register (Y0-Y3, Z, SP, or PC) to be used instead.  This provides stack- and PC-relative addressing modes for most instructions.

All registers are 32 bits wide.  In most cases, writing to the low 8 or 16 bits will clear the rest of the register.  The exception is writing to the low 8 bits of SP.  For compatibility, this will set the high 24 bits to $000001.
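
A minimal sketch of that write rule, assuming the register numbering listed above (SP is register 14).  The helper name and the list-of-integers register file are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF
SP = 14  # register number of SP in the list above


def write_reg(regs, index, value, width_bits):
    """Write the low 8, 16, or 32 bits of a register, clearing the rest."""
    value &= (1 << width_bits) - 1
    if index == SP and width_bits == 8:
        value |= 0x00000100  # high 24 bits become $000001 (6502 stack page)
    regs[index] = value & MASK32


regs = [MASK32] * 16
write_reg(regs, 0, 0x12, 8)    # A0 becomes $00000012
write_reg(regs, SP, 0xFF, 8)   # SP becomes $000001FF
print(hex(regs[0]), hex(regs[SP]))
```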

Extension

Most of the new features come through the opcode extension.  There are a few formats for these, depending on the instruction and addressing mode:
  • PRRXXXDD
    • Used by most instructions.
    • P selects the alternate operation
    • RR selects the destination register, or a small constant (1-4)
    • XXX selects the index register or source register
    • DD is the operation width. 00 for 8 bit, 01 for 16 bit, 10 for 32 bit, 11 for floating point (where that makes sense)
  • PAAARRRR
    • Used by bit-set and -clear instructions
    • AAA is combined with the base bit from the instruction to form the bit number
    • RRRR is the destination register
  • SSSRRRDD
    • Used by register-move instructions
    • SSS selects the source register
    • RRR selects the destination register
    • DD is the width
  • WNNNVVVV
    • Used by the BRK instruction
    • W enables waiting for an external interrupt
    • NNN selects the external interrupt to wait for
    • VVVV selects the interrupt vector: $0000fffe - 2*VVVV
  • CDILRRRR
    • Used by branch instructions
    • C selects a different set of conditions (BGE, BLT, BLE, BGT, BLS, BHI, BNV, BRA).  BNV is "never", BRA is "always"
    • I enables indirection.  If it is 0, the target address is base register + offset.  If it is 1, then base register + offset points to a memory location containing the target address
    • L enables subroutine calls.  If it is 1, the current PC will be pushed before the branch is taken
    • RRRR is the base register
The register selection fields in opcode extensions don't encode the register number directly.  Instead, each instruction has a default register (to give standard 6502 behavior when the extension is zero).  The register selection field is xored into the lower bits of this register number.
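
To make the field layout and the xor trick concrete, here is a small decoding sketch.  It assumes the fields are written most-significant bit first (so P is bit 7 of the extension and DD is bits 1-0), which the text does not state outright.  The function names are hypothetical.

```python
REGISTER_NAMES = ["A0", "A1", "A2", "A3", "X0", "X1", "X2", "X3",
                  "Y0", "Y1", "Y2", "Y3", "P", "Z", "SP", "PC"]


def decode_standard(ext):
    """Split a PRRXXXDD extension into its fields (P assumed to be bit 7)."""
    return {
        "p": (ext >> 7) & 1,
        "rr": (ext >> 5) & 3,
        "xxx": (ext >> 2) & 7,
        "dd": ext & 3,
    }


def select_register(default_reg, field):
    """Xor the selection field into the instruction's default register number."""
    return REGISTER_NAMES[default_reg ^ field]


def brk_vector(vvvv):
    """Interrupt vector address for a BRK extension: $0000FFFE - 2*VVVV."""
    return 0x0000FFFE - 2 * (vvvv & 0xF)


print(select_register(12, 0))  # extension zero keeps the default: P
print(select_register(12, 2))  # P xor 2 selects SP
print(hex(brk_vector(0)))      # 0xfffe, the 6502 IRQ/BRK vector
```

With P (register 12) as the default index, a 3-bit field reaches registers 8 to 15, which matches the Y0-Y3, P, Z, SP, PC group.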

The 6502's flag-clearing and -setting instructions have been generalised to clear or set any bit of any register.  These general instructions are called CBT and SBT.  So CLC is CBT 0, CLI is CBT 2, and so on.  The AAA bits from the extension are shifted up two places, with the lowest bit copied to fill the new ones.  The resulting value is xored into the bit selected by the instruction.  This gives access to all 32 bits.
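
One way to read the bit-number rule is sketched below.  It assumes "copied to fill the new ones" means both new low bits take the value of AAA's lowest bit, and that the base bits are the standard 6502 flag positions (C=0, I=2, D=3, V=6).  Under that reading the four bases together do reach every bit from 0 to 31.

```python
def bit_number(base_bit, aaa):
    """Combine an instruction's base bit with the 3-bit AAA extension field."""
    low = aaa & 1
    spread = (aaa << 2) | (low << 1) | low  # shift up two, fill with the low bit
    return base_bit ^ spread


FLAG_BASES = (0, 2, 3, 6)  # C, I, D, V flag positions in the 6502 status register

reachable = {bit_number(base, aaa) for base in FLAG_BASES for aaa in range(8)}
print(sorted(reachable) == list(range(32)))  # True
```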

Similarly, the register-transfer instructions TAX, TYA, and so on, have been generalised to copy any register to any other.  By xoring the instruction's register number with the bits in the extension, all combinations of source and destination register are covered.

Increment, decrement, and the shift/roll instructions have a small constant encoded instead of a destination register.  This makes it possible to add or subtract numbers up to 4 in a single one-byte (16 bit) instruction.  For the acc mode, these instructions use the index register selection to specify the register.

Alternative operations

Many instructions use the 'P' bit in the extension to select a different operation:
  • ADC, ACX, ACY -> ADD, ADX, ADY Add without carry
  • SBC, SCX, SCY -> SUB, SBX, SBY Subtract without carry
  • CMP, CPX, CPY -> CPC, CCX, CCY Compare with carry
  • AND, ANX, ANY -> BIC, BCX, BCY Bit clear (dest <- dest and ~source)
  • MUL, DIV, MOD -> MLS, DVS, MDS Signed multiply, divide, mod
  • SQR -> RSQ Reciprocal square root
  • ASL -> ESL Shift left, filling with the right-most bit
  • LSR -> ASR Shift right, filling with the left-most bit
  • ROL, ROR -> RLB, RRB Rotate without carry
  • CBT -> TBT Test bit.  The Z flag is set if the tested bit is zero, cleared if it is one
  • SBT -> XBT Toggle bit
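
A few of these are easier to picture with values.  The sketch below models BIC, ESL, and ASR on plain integers at a given width, following the one-line descriptions in the list.  It ignores the flags entirely, and the function names simply reuse the mnemonics.

```python
def bic(dest, source, width=8):
    """Bit clear: dest <- dest and ~source."""
    mask = (1 << width) - 1
    return dest & ~source & mask


def esl(value, width=8):
    """Shift left, filling the vacated bit with the old right-most bit."""
    mask = (1 << width) - 1
    return ((value << 1) | (value & 1)) & mask


def asr(value, width=8):
    """Shift right, filling the vacated bit with the old left-most bit."""
    top = value & (1 << (width - 1))
    return (value >> 1) | top


print(bin(bic(0b1111, 0b0101)))  # 0b1010
print(bin(esl(0b00000001)))      # 0b11
print(bin(asr(0b10000010)))      # 0b11000001
```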

The C640 Project

The C640 is a fantasy successor to the Commodore 64.  It is imagined as a computer that Commodore could have made in the mid-1980s, but didn't.

My design constraints are

  • It must 'feel' like a Commodore 64 (only better)
  • It should be moderately compatible
  • It should be feasible to have been manufactured in mid-1980s technology
  • Working on it must make me happy
The first three constraints will be broken any time they conflict with the fourth.  You can expect an end result that doesn't feel very Commodore 64ish to you, won't run your favourite games, and does a few things that couldn't have been done at the time.  This is just for fun, and I'm not going to be a purist about anything.

So far, I have a reasonably worked-out design for the processor (which is called the 65020), a sketch of some improvements to VIC, and a few vague ideas of improvements to SID.  The 65020 has an assembler and simulator, and is capable of running the Commodore 64 ROMs with a few modifications.

So let's start...