Joggler Coreboot Diary

Goals for the day

 * Find some way of debugging coreboot on the Joggler, i.e. a serial port
 * Compile a coreboot image, run it and see if I can get some feedback

It turns out that the US15W has absolutely no built-in UART. Bad news, because there's no way to see what's going on. Because of this, decided to switch to developing on the Crown Beach, which has port 80/81 counters. First runs not looking good. Can see only a handful of bytes being fetched from the FWH on the scope. No assembler code, no matter how early, successfully updates the port 80 counter. The coreboot image is completely DOA (for some reason).

Goals for the day

 * Find out why Crown Beach isn't executing a single instruction from ROM

3 PM arrives and I'm very frustrated. For some reason the Crown Beach AMIBIOS just seems to "work" in terms of executing the reset vector. Starting to suspect there may be an RCW or something. Quick test: Frankenstein the coreboot reset vector into the Intel-supplied AMIBIOS. Bingo! Numbers seen on the port 80 counter! The first coreboot code is running.

Found that the problem was the CMC (Chipset Microcode) missing. It's in the Crown Beach BIOS, starting at 0xD0000 and ending at 0xDFFFF. Wrote a quick and dirty tool to copy the CMC into the coreboot image during the compile process.
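A minimal C sketch of what such a copy step could look like. It assumes both the vendor BIOS image and the coreboot image have been read into memory, and that the CMC must land at the same file offset (0xD0000 to 0xDFFFF, inclusive) in both; `copy_cmc` is a hypothetical name, not the actual tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CMC window as observed in the vendor BIOS image (inclusive bounds). */
#define CMC_START 0xD0000u
#define CMC_END   0xDFFFFu

/*
 * Hypothetical helper: copy the CMC window from the vendor BIOS image into
 * the coreboot image. Assumes the CMC belongs at the same file offset in
 * both images. Returns 0 on success, -1 if either buffer is too small.
 */
static int copy_cmc(const uint8_t *vendor, size_t vendor_len,
                    uint8_t *cb, size_t cb_len)
{
    const size_t len = (size_t)CMC_END - CMC_START + 1;

    if (vendor_len <= CMC_END || cb_len <= CMC_END)
        return -1;
    memcpy(cb + CMC_START, vendor + CMC_START, len);
    return 0;
}
```

The destination offset inside the coreboot image and the image sizes are the assumptions to check against the real ROM layout.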

Port 80 codes show that coreboot is crashing in the DRAM init, surprise surprise. Annoyed to find that SPD-based DRAM setup isn't supported on this chipset. Wrote up a manual DRAM config; DRAM now up and running.

Coreboot now running all the way through to SeaBIOS. Woohoo! That was easy (?). Now let's put a VGA BIOS in there and see what happens. VGA BIOS crashes and burns. Bummer. Looks like it's not going to be quite so easy.

Goals for the day

 * Try and move from the Crown Beach to the Joggler

Mr FedEx arrives bright and early with Ajay's NET20DC EHCI debugger. Can now get some debug output on the Joggler. Need to find out which is USB port 1 on the Joggler, because only this port can be used for EHCI debug. Port 1 turns out to be TP138. Wired up a USB port to TP138. Changed the CMC tool to get the CMC and soft straps from the Joggler BIOS image (CMC is also at 0xD0000), plugged in the NET20DC and got some console output on the Joggler.

Bad news. The coreboot image crashes far earlier than on the Crown Beach. The first problem turns out to be crappy code which shadows the CMC just below the 1GB mark, not a whole lot of use on the Joggler with only 512MB of RAM. Hacked that, and away we go.

Having some strange problems with the EHCI debug system. Seems to cause a crash very early on, just before the transition from romstage to ramstage. Back to the Crown Beach for now, because I need the port 80 counters to debug this. USB port 1 on the Crown Beach is completely blown for some reason. Argh!!! Eventually found a shorted tantalum cap on the port. Replaced it, and on the way again.

Goals for the day

 * Get to the bottom of EHCI debug crash
 * Switch back to Joggler and confirm that the RAM is working properly

The EHCI debug crash turns out to be caused by a small piece of data needed by the debug system being stored in the CPU cache, which is wiped when the cache is invalidated just before ramstage. Don't have any ideas for an ideal solution to this issue right now, so hacked the code to make the problem go away. No doubt this will come back to bite.

Time to look at this RAM. RAM from 0x100000 upwards seems to be passing a basic test (below that is not physical RAM). The coreboot image is now crashing during PCI resource allocation. Noticed that the TSEG and IGD allocations are nil, and the stolen memory base is below the 1GB mark. More work needed. Started reading about this in the Intel datasheets. I need IGD, but do I need TSEG?

Major discovery: many of the numerous sections missing from the US15W datasheet are contained in the 945M datasheet.

Still feeling a bit confused though. Some registers documented and present in the 945M are present in the US15W, but some aren't. Quite a guessing game to determine just what the heck registers are implemented in the US15W.

Goals for the day

 * Debug PCI resource allocation crash
 * Try and get VGA BIOS up and running

PCI resource allocation crash fixed. Turned out to be a limitation of coreboot: the resource allocator cannot notify drivers of BAR changes, specifically the shifting of the EHCI BAR. Temporary fix with some code added to the PCI resource allocator. Won't be needed in the final build.

Now have a nice new crash after updating the CPU microcode. Oh joy.

Goals for the day

 * Debug latest (unknown) crash
 * Try and get back to the point where VGA BIOS execution begins

Back to the Crown Beach again this morning, because this latest crash looks like a doozy. The EHCI debugger is the likely culprit again. The crash turns out indeed to be the EHCI debugger being killed off again (is this thing ever going to stop slapping me in the face?).

Looks like enabling an MTRR over the I/O range has killed it. Suspect the EHCI I/O range is now being cached when it shouldn't be.

Goals for the day

 * Continue debugging MTRR crash

7 hours later, discovered that all assumptions from the previous day were wrong. The current code assumes that the US15W has a 'TOLUD' register. Pretty sure this isn't implemented in the US15W. The code basing the MTRR setup on this was going spectacularly wrong, as the register always reads 0x0. For now, the code has been modified to not base the top of DRAM on this. Need a nicer way to do this going forward.
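For reference, this is roughly how a variable-range MTRR pair is derived from a base and a size, and why a top-of-DRAM value of 0 is poison. A sketch only: `make_var_mtrr` is a hypothetical helper, the range must be a power of two aligned to its size, and `phys_bits` (the CPU's physical address width, reported by CPUID leaf 0x80000008) is passed in rather than read from the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_UC        0u
#define MTRR_TYPE_WB        6u
#define MTRR_PHYSMASK_VALID (1ull << 11)   /* V bit in IA32_MTRR_PHYSMASKn */

struct var_mtrr {
    uint64_t physbase;  /* value for IA32_MTRR_PHYSBASEn */
    uint64_t physmask;  /* value for IA32_MTRR_PHYSMASKn */
};

/*
 * Hypothetical helper: describe [base, base + size) with one variable MTRR.
 * Rejects size == 0 (what a TOLUD register that always reads 0x0 would
 * produce), sizes below the 4 KiB MTRR granularity, sizes that are not a
 * power of two, and misaligned bases. Returns 0 on success, -1 otherwise.
 */
static int make_var_mtrr(uint64_t base, uint64_t size, unsigned type,
                         unsigned phys_bits, struct var_mtrr *out)
{
    uint64_t addr_mask;

    if (size < 0x1000 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
        return -1;
    if (phys_bits < 32 || phys_bits > 52)
        return -1;

    /* Address bits that exist on this CPU, excluding the low 12 bits. */
    addr_mask = ((1ull << phys_bits) - 1) & ~0xFFFull;
    out->physbase = (base & addr_mask) | (type & 0xFF);
    out->physmask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    return 0;
}
```

With this shape, a zero size coming from the TOLUD read fails loudly instead of silently producing a bogus mask.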

Now, back to the VGA BIOS crash *moan*.

Hello hello, VGA BIOS isn't crashing anymore... instead it's executing, generating an IRQ (?) then doing absolutely nothing. That looks a lot better than yesterday!

Goals for the day

 * Debug VGA BIOS

Not a lot of progress today, lots of distractions; sent one of my dev boards to silicon heaven and lost hours repairing it. Hoping that the IRQ generated is IRQ15. It seems it is quite common for BIOSes to have to handle IRQ15 upon starting the VGA BIOS. As for what the IRQ15 handler is supposed to do? No idea. Fortunately the GMA500 is now fairly well documented by the Linux gma500_gfx driver, so I'm going to have to look at the IRQ syndrome registers and find out what this thing wants from me.

Goals for the day

 * Untangle US15W IRQ routing system
 * Figure out what the IRQ generated the previous day is, and pray it is from the GMCH

Today was packed with job interviews, so not much achieved. The good news is that the occasional crash seen when executing the VGA BIOS is in fact my dodgy (not to mention non-existent) IRQ handler. Not 100% sure what is needed here, but can currently see that some code for IRQ routing in the US15W is likely missing. Also suspect that the i8259 hookup is dodgy too, as the source currently reads 0xFF. Could this IRQ routing system possibly be more complicated?! Lots of datasheets to read before I can get to the bottom of this one.

Goals for the day

 * Same as yesterday

Started looking at RCBA; suspect this is not set at all. RCBA is (somehow?) at 0xFED1C000 already? Cool. What about D02IP: 0x1? That looks wrong; set it to 0x4 to match the current ACPI, but still no joy. Time to start looking from the CPU end. According to the assembler code for the IRQ handler, IRQ 0xFF+ should not be possible. Something strange going on here... Can see that the IRQ is 0x1A (no idea what that is), from a real mode port 81 write. Looks like the stack pointer is lost. *sigh* Not a good way to end the day.

Goals for the day

 * Figure out how ASM interrupt handler works
 * Figure out what happened to stack during ISR

Unfortunately the ISR is mostly 16-bit and my understanding of 16-bit x86 is minimal. Had my nose in the 8086 programming manual most of the day. Although there's some crazy shit going on in here, at present it appears that the issue is caused by the IEGD pointing all of the segment registers at funny places, then the coreboot ISR assuming they were exactly as they would be if no other external code had run (wtf?). No idea how to get around this one; that's a job for another day.

Goals for the day

 * Examine situation with segment registers in 16-bit ISR

VGA BIOS problem now pretty well understood. Coreboot sets the segment registers to 0x0000, making a stack in the 0x00000000 segment for real mode, and sets up GDT entry 0x18 (also pointing at 0x00000000) when switching to protected mode in the ISR, allowing the stacks to be shared. It seems that the IEGD is a "segment register fiddling" option ROM, which Coreboot has absolutely no support for, i.e. the segment registers are 0xED81 when entering the ISR, when Coreboot is expecting 0x0000.
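The arithmetic behind the mismatch, as a small C sketch (function names are hypothetical): a real-mode address is segment * 16 + offset, while a flat descriptor with base 0, like the GDT entry 0x18 described above, treats ESP as a linear address directly. The two views only agree when SS is 0x0000.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset (A20 assumed enabled). */
static uint32_t rm_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/*
 * Reusing SP as ESP under a base-0 protected-mode stack segment only hits
 * the same memory if SS was 0 in real mode. Returns nonzero if the naive
 * "ESP = SP" assumption points at the right place.
 */
static int sp_reusable_as_flat_esp(uint16_t ss, uint16_t sp)
{
    return rm_linear(ss, sp) == (uint32_t)sp;
}

/* The ESP value that would actually address SS:SP through a base-0 segment. */
static uint32_t flat_esp_for(uint16_t ss, uint16_t sp)
{
    return rm_linear(ss, sp);
}
```

With SS = 0xED81, the real-mode stack actually sits 0xED810 bytes higher than an ISR assuming SS = 0 would look for it, which fits the lost stack pointer seen earlier.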

This is definitely a bit of a tricky one. Currently experimenting with some workarounds, with varying success. The IRQ 0x1A observed previously turns out to be a PCIBIOS request, which is good, as I'm expecting plenty of those. The current ISR hack can successfully execute 1 PCIBIOS request, then crashes and burns, almost certainly due to stack corruption.
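For anyone else decoding these: PCI BIOS calls arrive through INT 1Ah with AH = 0xB1 and the function number in AL; for the configuration-space functions, BH holds the bus, BL the device/function (device in bits 7:3, function in bits 2:0) and DI the register offset. A small decoding sketch with hypothetical helper names, which also builds the matching configuration mechanism #1 address for port 0xCF8:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Name of a PCI BIOS function requested via INT 1Ah (AH = 0xB1, AL = function). */
static const char *pcibios_fn_name(uint16_t ax)
{
    if ((ax >> 8) != 0xB1)
        return "not a PCI BIOS call";
    switch (ax & 0xFF) {
    case 0x01: return "installation check";
    case 0x02: return "find PCI device";
    case 0x03: return "find PCI class code";
    case 0x08: return "read config byte";
    case 0x09: return "read config word";
    case 0x0A: return "read config dword";
    case 0x0B: return "write config byte";
    case 0x0C: return "write config word";
    case 0x0D: return "write config dword";
    default:   return "other PCI BIOS function";
    }
}

/*
 * Configuration mechanism #1 address (written to port 0xCF8) for a config
 * access request: BH = bus -> bits 23:16, BL = device/function -> bits 15:8,
 * DI = register, dword-aligned -> bits 7:2, bit 31 = enable.
 */
static uint32_t pcibios_cf8_addr(uint16_t bx, uint16_t di)
{
    return 0x80000000u | ((uint32_t)bx << 8) | (di & 0xFCu);
}
```

For example, BX = 0x0010 is bus 0, device 2, function 0, which is where the integrated graphics usually sits on Intel chipsets.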

Day 13
Decided to abandon the execution of the VGA BIOS in coreboot for now; it might be interesting for some obscure ELF boot project but is generally not much use. Instead, focus on execution in SeaBIOS. Looks a lot better: DCLK is on at 33MHz and sync signals are seen. No picture data though.

Getting a bit frustrated with having no real UART for debugging; this is going to make debugging SeaBIOS tough. Started working on a rig to attach a UART to the LPC bus. Fortunately the VIA LPC-01G board provides a ready-to-go set of UARTs with an LPC interface; bought one today.

Known signals on J15 so far:

 * 16 - GND
 * 15 - ?
 * 14 - ?
 * 13 - MCU - C2CK
 * 12 - MCU - C2D
 * 11 - FRAME
 * 10 - ?
 * 9 - GND
 * 8 - CLK
 * 7 - SERIRQ
 * 6 - RESET
 * 5 - AD3
 * 4 - AD2
 * 3 - AD1
 * 2 - AD0
 * 1 - +3v3

Goals for the day

 * Wire up VIA LPC-01G board
 * Find out why there's no video output

LPC-01G arrived yesterday. Hooking it up and getting it going couldn't have been simpler. Literally just pasted the BIOS setup code from the datasheet into romstage.c and it worked!

Why not just use a USB-to-serial adapter? Well, those require mountains of complex software to operate, and are almost impossible to make work in a BIOS environment, especially when the system is only 'half' working. These guys just work, with literally a handful of lines of code.
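The "handful of lines" on the UART side is the classic 16550 initialisation. A sketch, assuming the usual COM1 base of 0x3F8 and a 1.8432 MHz UART clock (divisor = 115200 / baud). The LPC-01G's own Super I/O configuration sequence from its datasheet is not reproduced here, and `fake_outb` is a hypothetical stand-in that records port writes so the sketch can run off-target; on hardware it would be a real `outb`.

```c
#include <assert.h>
#include <stdint.h>

/* 16550 register offsets from the UART base. */
#define UART_DLL 0   /* divisor latch low (DLAB = 1) */
#define UART_IER 1   /* interrupt enable (DLAB = 0) */
#define UART_DLM 1   /* divisor latch high (DLAB = 1) */
#define UART_FCR 2   /* FIFO control */
#define UART_LCR 3   /* line control */
#define UART_MCR 4   /* modem control */

#define UART_BAUD_BASE 115200u   /* 1.8432 MHz / 16 */

/* Off-target stand-in for outb(): records each write instead of doing I/O. */
struct io_write { uint16_t port; uint8_t val; };
static struct io_write io_log[16];
static unsigned io_count;

static void fake_outb(uint8_t val, uint16_t port)
{
    if (io_count < sizeof io_log / sizeof io_log[0]) {
        io_log[io_count].port = port;
        io_log[io_count].val = val;
    }
    io_count++;
}

/* Program the UART for 8N1 at `baud`. Returns the divisor, or 0 if unusable. */
static uint16_t uart16550_init(uint16_t base, uint32_t baud)
{
    uint32_t div;

    if (baud == 0 || baud > UART_BAUD_BASE)
        return 0;
    div = UART_BAUD_BASE / baud;

    fake_outb(0x00, base + UART_IER);              /* polled mode: no interrupts */
    fake_outb(0x80, base + UART_LCR);              /* DLAB = 1 to reach the divisor */
    fake_outb(div & 0xFF, base + UART_DLL);
    fake_outb((div >> 8) & 0xFF, base + UART_DLM);
    fake_outb(0x03, base + UART_LCR);              /* 8 data bits, no parity, 1 stop; DLAB = 0 */
    fake_outb(0x07, base + UART_FCR);              /* enable and clear FIFOs */
    fake_outb(0x03, base + UART_MCR);              /* assert DTR and RTS */
    return (uint16_t)div;
}
```

After this, sending a byte is a matter of polling the line status register (base + 5) for "transmitter holding register empty" and writing the byte to base + 0.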

Well hello there SeaBIOS. You seem to be working a little better than expected.

TODAY'S GEM OF WISDOM: When buying test sockets, don't buy knock offs, unless spare time is in excess. How to tell? The price.

Goals for the day

 * The usual: Get the VGA BIOS executing

Managed to get a few (not many) hours of coreboot hacking in amongst Olympic events. Having PC/ISA UARTs is a real bonus: it has allowed detailed debugging of SeaBIOS. Can immediately see that there are some IRQ15s going off, except they're software generated, not hardware. Interesting.

There is one particularly interesting one there: 0x5F/0x31, which according to an old Chips & Technologies VGA BIOS manual means "VGA POST Complete". OK, so why nothing on the screen? LVDS TX 0, 1 and 3 remain flat-lined. Assuming for now that the old Poulsbo splash screen which used to appear on IEGD builds circa 2008 has been removed. Fair enough, it was a tad 1980s.

OK, if the above is true, could the flat-lined LVDS TX be explained by a complete lack of anything in the frame buffer? Quite possibly. It seems that execution of the IEGD VGA BIOS has in fact completed, and the system then crashes in SeaBIOS when trying to jump from an area in the legacy 1MB to the top of RAM, where SeaBIOS lives, ironically right inside the IGD shared frame buffer memory. Whoops. I think I can see what's going on here...
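The overlap is easy to see with some illustrative numbers (assumed, not measured Joggler values): if graphics stolen memory is carved off the top of physical RAM, anything placed at the raw top of RAM without subtracting it lands inside the frame buffer. A C sketch with hypothetical helper names, assuming 512MB of RAM, an 8MB stolen region and no TSEG:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Top of RAM usable by firmware/OS once stolen regions are carved off the top. */
static uint64_t usable_top(uint64_t tom, uint64_t igd_stolen, uint64_t tseg)
{
    if (igd_stolen + tseg > tom)
        return 0;               /* nonsensical configuration */
    return tom - igd_stolen - tseg;
}

/* Do the half-open ranges [a, a + alen) and [b, b + blen) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}
```

A 128KB region placed just below the raw 512MB mark overlaps the stolen area starting at 0x1F800000, while the same region placed just below the usable top does not.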

Day 17
Other than wiring up a POST code counter, did almost nothing today. At least I now have a complete debugging setup, so no more going back to the Crown Beach when the going gets tough.

F8 = Coreboot POST complete, yet my screen is blank. Darn you Joggler, Darn you ;-)

On another note, the VGA BIOS works OK when the Joggler's screen is connected to the Crown Beach. This is good: at least I know it's possible, so now I just have to figure out what the last missing piece is. It's also executing OK on the Joggler, with no SeaBIOS crashes, except the screen remains stubbornly blank. So close now, can almost taste success!

Testing the VGA BIOS on the Crown Beach, at least it works on something!

Day 18
Today is a glorious day. After days of reading datasheets, hacking code and fiddling with registers, the VBIOS is finally alive. The problem in the end turned out to be some errata workarounds kicking around in the code which don't apply to the Joggler.

Now, it's finally time to start working on the (comparatively) easy stuff.

Goals for the day

 * Devise a tidy mechanism for controlling the backlight brightness
 * Figure out why SeaBIOS crashes when any USB device is connected

It turns out that the IEGD can indeed control the backlight brightness, when it is configured properly. Direct MMIO writes to the PWM register aren't needed. Good!

Unfortunately the USB crash is an absolute doozy. Spent the entire day debugging it. Haven't figured out the cause yet, but do know that it's a nasty memory corruption problem (there is always one!).

Day 20/21/22
Progress has been abysmal over the last 3 days. Been stuck on a USB DMA crash which is turning out to be one of the most difficult problems I've encountered in my career to date. All that has really been achieved so far is demonstration of how bizarre and complex the problem is. Have started a thread on the SeaBIOS mailing list but not a lot of useful response yet.

Thread is here: http://www.seabios.org/pipermail/seabios/2012-August/004318.html

On a lighter note, to cheer myself up, here's a Joggler running MS-DOS 6.22. That's definitely not something that's been seen in public before ;-)

In theory this thing is probably almost working well enough to go ahead and boot Win 7 now, except that could be a little difficult with no working keyboard or mouse. Pretty sure there aren't too many pre-made i8042 LPC PS/2 stubs on the market, so perhaps I'd better get back on to this USB problem before I start doing silly stuff like that.

Goals for the day
 * Fix the USB crash already
 * Get booting some operating systems

The USB crash is fixed. Will probably never know the exact problem, but in vague terms, it was SeaBIOS writing to a register that doesn't exist (according to the datasheet) in the US15W. There must be something there though, because a write to that location caused some very interesting problems indeed.

Details here: http://www.seabios.org/pipermail/seabios/2012-August/004327.html

The rest of the operation looks pretty good. It's happy to boot from a USB CD-ROM or hard disk. PATA boot works well too. USB keyboard emulation is also working under DOS... but does Win 7 boot? Nope. The installer crashes. Hardly surprising. Can see some funny PnPBIOS requests which likely went wrong; let's hope this doesn't burn another 3 days, because I'm running out of time off to do this port!

Pretty surprised how well this thing is working so far. It booted a Gparted LiveCD I had kicking around. It gets about half way through booting Ubuntu live too. As for any version of Windows: Not a chance.

VGA console in GRUB and Linux is working well, with no issues in text or graphical modes, not even any stretching or geometry problems. That's a first for me.

One very interesting observation is that memtest86 fails spectacularly. This is good, because if I'm lucky, it's dying for the same reason that Windows is, and memtest86 is going to be one hang of a lot easier to debug than the Windows Kernel.

Goals for the day

 * Investigate the multitude of problems booting various operating systems

Windows status
Windows XP: fails executing NTDETECT.COM. Currently it seems that XP can't boot without an i8042, which the Joggler doesn't have. For f***'s sake. I might be wrong about this, and I had better be wrong about this.

Okay, so this isn't totally unworkable. Patching the XP boot files is already commonplace with the current "XOJ" boot process, but this dashes my fantasies of running an unmodified Windows XP.

Windows 7:

Doesn't seem to mind the missing i8042. Currently crashes because the ACPI tables are incomplete. I know this; just haven't taken the time to learn about this stuff yet, and how to fix them.

Misc
memtest86: Didn't really get to the bottom of the memtest86 crash, but didn't look at it much either. It does seem to work properly if forced to use the e820 memory map (it seems to default to the LinuxBIOS table, which I'm not sure is correct).
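For context, an e820 map is just a list of (base, length, type) entries, with type 1 meaning usable RAM; the BIOS returns them as packed 20-byte records via INT 15h, AX=E820h. A sketch of the two numbers a memory tester cares most about, total usable RAM and the highest usable address, over an illustrative map (assumed values, not a dump from the Joggler); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

/* Unpacked in-memory form; the BIOS hands these out as packed 20-byte records. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Sum of all usable (type 1) ranges. */
static uint64_t e820_usable_bytes(const struct e820_entry *map, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].type == E820_RAM)
            total += map[i].size;
    return total;
}

/* One past the highest usable byte. */
static uint64_t e820_usable_top(const struct e820_entry *map, size_t n)
{
    uint64_t top = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].type == E820_RAM && map[i].addr + map[i].size > top)
            top = map[i].addr + map[i].size;
    return top;
}
```

If the coreboot-provided table and the e820 map disagree about these numbers (for example, about whether stolen graphics memory is excluded), a memory tester trusting the wrong one will happily scribble over reserved memory.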

Also seems to have an intermittent crash, which seems not to occur if a decent heatsink is attached to the Poulsbo.

Linux:

Might be working OK; not sure, haven't spent a lot of time on it. All installer CDs I've tried so far die trying to start the X server. Priority for Linux is basically zero though, as Linux already works well on the Joggler; not expecting to spend much more time on this unless it aids debugging.

Maybe someone else with some spare time could look at this. Probably about time to start asking for help from the rest of the community.

Goals for the day

 * Fix EHCI (USB 2.0) problem
 * Try to get some alpha builds out for other community members
 * Investigate Windows XP boot crash a little further

The Windows XP boot crash is 99% confirmed to be the lack of an i8042. NTDETECT.COM works OK with the same binary on the Crown Beach, which has an i8042. Probably easy to patch around. Had a look at the NTDETECT.COM provided by Eric Huang. Unfortunately it's a completely unmodified copy of NTDETECT.CHK, which isn't going to help here. There must be some fancy stuff going on in NTLDR. Too bad, because I can't use the XOJ NTLDR. Probably a day's work at most to make a new hacked NTDETECT, but this is low priority right now.

EHCI Problem status: Not looking good. Almost worst case scenario: memory corruption caused by DMA read from the EHCI controller. Very similar problem to my original UHCI crash except the same tricks don't seem to want to fix this one. EHCI is significantly more complex too, which is slowing progress here.

Goals for the day
 * Fix EHCI (USB 2.0) problem
 * Try to get some alpha builds out for other community members

EHCI (USB 2.0) problem finally fixed. SeaBIOS needs to be patched to support 64-bit EHCI when running on the US15W, which is a little silly as it can only ever be a 32-bit platform. There is already a proposed patch on the mailing list. Applied this exactly as described and it works. Have also been fiddling with the EHCI PCI configuration space, so need to double-check that nothing I did in there was also critical.
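One plausible mechanism, consistent with this fix: an EHCI controller advertises 64-bit addressing in bit 0 of HCCPARAMS, and when that bit is set it uses the 64-bit data structure layouts from the EHCI specification (Appendix B), in which each qTD carries five extra "extended buffer pointer" dwords holding the upper 32 bits of each buffer page. If software lays out 32-byte descriptors back to back, the controller fetches 52 bytes per descriptor and runs into whatever follows. A sketch of the two layouts (struct and function names are illustrative, not SeaBIOS's own; a driver using the 64-bit layout must also program the CTRLDSSEGMENT register, normally to 0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HCCPARAMS_64BIT_ADDR (1u << 0)   /* 64-bit Addressing Capability */

/* qTD as laid out when the controller uses 32-bit data structures. */
struct ehci_qtd32 {
    uint32_t next;
    uint32_t alt_next;
    uint32_t token;
    uint32_t buf[5];      /* buffer page pointers, bits 31:0 */
};

/* qTD as laid out when the controller uses 64-bit data structures. */
struct ehci_qtd64 {
    uint32_t next;
    uint32_t alt_next;
    uint32_t token;
    uint32_t buf[5];      /* buffer page pointers, bits 31:0 */
    uint32_t buf_hi[5];   /* extended buffer pointers, bits 63:32 */
};

static bool ehci_uses_64bit_structs(uint32_t hccparams)
{
    return (hccparams & HCCPARAMS_64BIT_ADDR) != 0;
}

/* Bytes the controller will read for each qTD, given its HCCPARAMS value. */
static size_t ehci_qtd_fetch_size(uint32_t hccparams)
{
    return ehci_uses_64bit_structs(hccparams) ? sizeof(struct ehci_qtd64)
                                              : sizeof(struct ehci_qtd32);
}
```

The point of checking HCCPARAMS rather than the platform's address width is that the controller's behaviour follows the capability bit, even on a chip that can only ever be a 32-bit platform.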

Upon enabling EHCI, there now seems to be some kind of race condition occurring, which I was able to work around with a small delay after initialising USB - Better get to the bottom of that properly.

Alpha build made. Just noticed that El Torito floppy emulation has been broken since I fixed the EHCI problem. Gah. Does this have some dependency on running in UHCI mode? That's a question for another day.

Also noticed that Linux is complaining that EHCI interrupts aren't going off. Hardly surprising - Haven't even touched that stuff yet.

The End
Developments will be discussed on this thread from now on: http://www.jogglerwiki.com/forum/viewtopic.php?f=2&t=686

In terms of what remains to be completed, highest priority first:

 * ACPI
 * A bit more work on core logic initialisation
 * Need to write a mechanism for updating the FWH from SeaBIOS
 * Also need some code which checks and fixes the ROM in the built in NIC (I think)
 * Likely lots of little bugs
 * eMMC boot