Hacker News — vinext + Cloudflare Workers

▲80386 microcode disassembled (reenigne.org)

277 points by nand2mario 3 days ago | 54 comments

liendolucas 3 days ago [-]

> ...they mentioned that it would be interesting to get high resolution images of the 80386 die and try to extract the microcode from it.

Can someone explain how the microcode can be reconstructed from a high-resolution image of the die? I'm really curious, what's the process? Is the output some sort of Verilog? Does the process involve recognizing each and every transistor and modeling a circuit from that? I'm fascinated that something like this is possible at all...

GloriousCow 3 days ago [-]

I worked a bit on the extraction process so I can chime in here a bit. The first part is to just mark the x,y locations of where all the bits are, generally by the intersection of the rows and columns of the microcode array.

Then you have to classify them as 0's or 1's. Each is visually distinct, a 1 being encoded by the presence of a transistor and a gap in the polysilicon. We didn't have to guess which is which: by the nature of Intel microcode we could assume 0's were much more frequent, so a transistor meant a 1.
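As a rough illustration of the polarity argument above (not the team's actual tooling), the rarer of the two visual classes can be mapped to 1. The class names `"transistor"` and `"blank"` are hypothetical labels invented for this sketch.

```python
from collections import Counter

def assign_polarity(labels):
    """Map two visual classes to bits, treating the rarer class as 1.

    `labels` is a flat list of class names. The names used below
    ("transistor", "blank") are hypothetical, chosen for illustration.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two visual classes")
    (common, _), (rare, _) = counts.most_common(2)
    return {common: 0, rare: 1}

# Toy data: one cell in five holds a transistor.
cells = ["blank", "blank", "transistor", "blank", "blank"]
mapping = assign_polarity(cells)
bits = [mapping[c] for c in cells]
```

This only encodes the "0's are much more frequent" assumption from the comment; it says nothing about how the cells get classified in the first place.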

There are some automatic tools designed to perform this work via color thresholding, but they didn't work very well here because some of the mosaic was blurry, and a lot of dust had crept in which created false 1 bits.

Instead, we trained a convolutional neural network to classify the extracted bit regions into 0's and 1's. This was overlaid back onto the original mosaic as white or black squares at 50% opacity.

Then we spent several long, tedious days just checking the results for errors. Finally we had the raw 2d array of bits - the next step is to extract the microcode words from the bit array.

GloriousCow 3 days ago [-]

Intel had given us some clues - they had written somewhere that the 386 had 2560 microcode words. The microcode array has 37 banks - each bank resolves one bit from the 37 bits that comprise a microcode word. But which way to decode them? From top down? Bottom up? Were they interleaved in weird ways?

Documentation from the NEC vs Intel lawsuit ended up documenting the microcode word format for both the 8088 and NEC V20 CPUs, but unfortunately, we were on our own for the 386. But we could take educated guesses - working off the 8088 field format, what additional microcode fields would a 386 add? What fields would expand and how many bits would they need?

We used a lot of python scripts to decode the microcode array into 37-pixel wide, very long bitmaps, in different permutations, to see if any vertical patterns emerged that would hint to us the boundaries of microcode word fields. And some did emerge!
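A minimal sketch of what such a script might look like, assuming each of the 37 banks is available as a list of 2560 bits and that word `i` takes bit `i` from every bank. That indexing rule, the function names, and the synthetic data are all assumptions for illustration; the real interleaving was exactly what had to be discovered.

```python
from itertools import product

WORD_BITS = 37     # banks, one per bit of a microcode word (from the thread)
WORD_COUNT = 2560  # microcode words (from the thread)

def words_from_banks(banks, reverse_banks=False, reverse_words=False):
    """Build rows of bits, one row per candidate microcode word."""
    ordered = banks[::-1] if reverse_banks else banks
    rows = [[bank[i] for bank in ordered] for i in range(len(banks[0]))]
    return rows[::-1] if reverse_words else rows

def to_pbm(rows):
    """Serialize rows as a plain-text PBM (P1) bitmap, where 1 is black."""
    header = f"P1\n{len(rows[0])} {len(rows)}\n"
    return header + "\n".join("".join(map(str, r)) for r in rows) + "\n"

def column_density(rows):
    """Fraction of 1 bits per column; sharp changes can hint at field edges."""
    return [sum(col) / len(rows) for col in zip(*rows)]

# Synthetic stand-in for the real ROM: bank b has a 1 wherever (i + b) % 7 == 0.
banks = [[1 if (i + b) % 7 == 0 else 0 for i in range(WORD_COUNT)]
         for b in range(WORD_BITS)]

# Try every combination of bank order and word order.
candidates = {}
for rb, rw in product([False, True], repeat=2):
    rows = words_from_banks(banks, rb, rw)
    candidates[(rb, rw)] = to_pbm(rows)
```

Each value in `candidates` can be written to a `.pbm` file and opened in an image viewer to look for vertical patterns; `column_density` gives a numeric view of the same thing.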

GloriousCow 3 days ago [-]

We had also decoded the 386's match-decoder PLA, so we knew roughly where different opcodes were located in the microcode itself, which was very helpful. Some opcodes have very specific operands, so would have unique field references. Some forms only operate on EAX/AX, for example, so if you find those instructions you have a hint of how the AX register is encoded as an operand.

Other instructions like PUSHA and POPA are implemented as loops that iterate by incrementing the fields corresponding to registers - and we know in what order they operate.

Bit by bit, relation by relation, you can puzzle out the format of the microcode. Of course, this is glossing over the enormous added complexity of protected-mode operations. This was a herculean effort by reenigne, and I don't think it is hyperbole to call it one of the more impressive human achievements I have witnessed in my lifetime.

GloriousCow 3 days ago [-]

The actual output of microcode disassembly is just a text file - a line of code for each microcode word, in essentially a new dialect - a type of static assembly language. reenigne had to invent names for a lot of things, which will now become the official names of these things, unless Intel ever decides to speak up and make corrections.

That language can then be translated into Verilog, and has been.

ddtaylor 3 days ago [-]

Here's a video of some guys delayering the chips for the Nintendo 64 lockout mechanism. It's pretty in-depth and it goes over a lot of different ways they do this.

https://youtu.be/HwEdqAb2l50?si=VFLed64PZvpCHfy1

liendolucas 3 days ago [-]

Thanks for sharing, will definitely watch it!

dboreham 3 days ago [-]

The microcode is in a ROM. It's a regular structure where a 1 looks different to a 0.

jdblair 3 days ago [-]

Yes, literally this. No Verilog decode, just looking for signals in the image of a 1 vs. a 0. For example, a 1 may be the existence of a transistor at a particular intersection of wiring.

drob518 3 days ago [-]

Right. And the best way to think about microcode is as code for a wacky, custom VLIW processor that implements the programmer-level x86 (in this case) instruction set. Various fields in the microcode send signals to different parts of the processor to activate them, routing values along internal busses and between registers, functional units and memory to cause the processor to execute the x86 instructions.
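To make the "fields send signals to different parts of the processor" idea concrete, here is a sketch that slices a 37-bit word into named fields. The field names and widths are entirely hypothetical and were chosen only so they sum to 37 bits; they are not the real 80386 microcode format.

```python
# HYPOTHETICAL layout: (name, width) pairs, most significant field first.
# These names and widths are invented for illustration and sum to 37 bits;
# they are not the real 80386 field format.
LAYOUT = [("src", 6), ("dst", 6), ("alu_op", 7), ("mem_op", 6), ("next", 12)]

def decode_word(word, layout=LAYOUT):
    """Split an integer microcode word into a dict of field values."""
    total = sum(width for _, width in layout)
    if not 0 <= word < (1 << total):
        raise ValueError(f"word does not fit in {total} bits")
    fields = {}
    shift = total
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

def encode_word(fields, layout=LAYOUT):
    """Pack a dict of field values back into one integer word."""
    word = 0
    for name, width in layout:
        word = (word << width) | (fields[name] & ((1 << width) - 1))
    return word

example = {"src": 3, "dst": 5, "alu_op": 0x41, "mem_op": 0, "next": 0x9B6}
packed = encode_word(example)
unpacked = decode_word(packed)
```

In a real microcoded design each field would drive a different piece of hardware in the same cycle, which is what gives microcode its VLIW flavor.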

liendolucas 3 days ago [-]

So what you actually need is a program that navigates through the huge image of the die and detects whether the structure it is looking at is a 1 or a 0? At the fundamental level, this is a cross between machine learning and image processing?

electroly 3 days ago [-]

I helped out on this image-to-bits transcription, doing manual verification of the automated work. I did the whole thing by hand: I sliced the ROM images into strips that excluded parts of the image that don't encode bits, used my tablet and stylus to manually place a black dot on every 1 bit, then wrote a trivial program that detected the presence or absence of the black dot in each cell. From my perspective, the ROM is organized like a series of "ladders" where the 1 bits are missing legs of the ladder, and I was placing dots on the missing legs. I compared my results with the ML output and manually re-checked each bit where we disagreed.
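A minimal sketch of the kind of "trivial program" described above, assuming a grayscale image stored as a list of pixel rows (0 is black, 255 is white) and a perfectly regular grid of square cells. The function names, the darkness threshold, and the toy image are assumptions; this is not electroly's actual code.

```python
def cell_has_dot(image, top, left, size, threshold=0.25):
    """True if at least `threshold` of the cell's pixels are dark."""
    dark = sum(1 for y in range(top, top + size)
               for x in range(left, left + size)
               if image[y][x] < 128)
    return dark / (size * size) >= threshold

def read_bits(image, rows, cols, size):
    """Read one bit per grid cell: 1 where a dot was placed, else 0."""
    return [[1 if cell_has_dot(image, r * size, c * size, size) else 0
             for c in range(cols)]
            for r in range(rows)]

def disagreements(manual, ml):
    """List (row, col) positions where the two transcriptions differ."""
    return [(r, c)
            for r, row in enumerate(manual)
            for c, bit in enumerate(row)
            if bit != ml[r][c]]

# Toy strip: 2 rows x 3 columns of 4x4-pixel cells, all white to start.
SIZE = 4
image = [[255] * (3 * SIZE) for _ in range(2 * SIZE)]

def paint_dot(r, c):
    """Place a 2x2 black dot in the middle of cell (r, c)."""
    for y in range(r * SIZE + 1, r * SIZE + 3):
        for x in range(c * SIZE + 1, c * SIZE + 3):
            image[y][x] = 0

paint_dot(0, 1)
paint_dot(1, 2)

manual_bits = read_bits(image, rows=2, cols=3, size=SIZE)
ml_bits = [[0, 1, 0], [0, 0, 0]]  # pretend ML output that missed one bit
to_recheck = disagreements(manual_bits, ml_bits)
```

Only the positions in `to_recheck` need a second look by a human, which is the point of transcribing the same bits two independent ways.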

http://brianluft.com/images/2026/05/386_microcode_bits.jpg -- my fully annotated result. I was working from a higher-quality PNG; this is highly compressed because it's a big image.

ForOldHack 3 hours ago [-]

Thank you so much for your work. Thank you!

I wanted to give HN a perspective on working on this stuff: Working on these micrographs is like looking for a penny on 4 football fields: I tried to see how long it would take me to search the physical area for any coins, and it took 4 1/2 hours and I did not find a penny, but I found two dimes.

This is maddening work, and again, thanks.

bri3d 3 days ago [-]

Yes, exactly. Historically you would make some simple image processing software that will align the grid and then look for properties at each specific bit position. Usually die shots are highly imperfect (the delayering usually leaves some artifacts or damage) so frequently merging multiple scans is important as well. Travis Goodspeed has a neat tool for this workflow at https://github.com/travisgoodspeed/maskromtool and the blog mentions John McMaster’s bitract: https://github.com/SiliconAnalysis/bitract although I think most people working on these projects usually just one-off it as the mentioned Discord users in the blog post eventually did.
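One simple way to merge multiple scans, sketched here as a per-bit majority vote, assuming each scan has already been aligned and transcribed to a 2D list of bits. This is an illustration only and is not how maskromtool or bitract work internally.

```python
def merge_scans(scans):
    """Majority-vote each bit across scans.

    Returns (merged, disputed), where `disputed` lists the (row, col)
    positions on which the scans did not all agree. On a tie the merged
    bit defaults to 0, so ties should always be checked by hand.
    """
    merged, disputed = [], []
    for r, rows in enumerate(zip(*scans)):
        out_row = []
        for c, votes in enumerate(zip(*rows)):
            ones = sum(votes)
            zeros = len(votes) - ones
            if ones and zeros:
                disputed.append((r, c))
            out_row.append(1 if ones > zeros else 0)
        merged.append(out_row)
    return merged, disputed

scan_a = [[0, 1, 0], [1, 0, 0]]
scan_b = [[0, 1, 1], [1, 0, 0]]  # a speck of dust read as a 1 at (0, 2)
scan_c = [[0, 1, 0], [0, 0, 0]]  # delayering damage hid the bit at (1, 0)

merged, disputed = merge_scans([scan_a, scan_b, scan_c])
```

Using an odd number of scans avoids ties; the `disputed` list is still worth reviewing, since a defect that repeats across scans can win the vote.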

More modern devices are of course more difficult due to layers, feature size, and less visually obvious ROM bit designs.

Anyway, the impressive part of this project was really understanding the undocumented microcode assembly language through inference and trace following; the 1s and 0s look like they were the easy part!

photochemsyn 3 days ago [-]

The full workflow seems to look something like this, with the added complications relative to the 8086 microcode being that the 80386 microcode acts as an orchestration layer on top of hardwired engines, programmable logic arrays, and fault/protection redirection. The 8086 microcode does all that algorithmically, reusing the same hardware instead of having dedicated transistors.

1. Extract the ROM bits.
2. Determine physical-to-logical bit ordering.
3. Identify microinstruction boundaries.
4. Infer field boundaries.
5. Associate fields with hardware destinations (check with die tracing).
6. Decode instruction-dispatch programmable logic arrays.
7. Associate x86 instructions with microcode entry points.
8. Infer repeated idioms: moves, ALU ops, termination, calls, tests, redirects.
9. Decode accelerator protocols.
10. Validate against known architectural behavior.

ForOldHack 3 hours ago [-]

Keep in mind that this was Intel's flagship processor from October 1985 until April of 1989, and they had tried to eliminate all the second sourcing. It wasn't until 1991 that the Am386 was released, and out came all the lawyers.

They were using the 6th and 7th bytes of the GDT/LDT, which were reserved, and since this affected protected-mode and virtual-mode addressing, it was likely stored in the microcode. This affected Xenix, and pissed off Intel enough that they fixed their version of Xenix and no one else's. SCO did a rewrite and charged $500 for the privilege of running a multi-user OS.

Add to #8, the new addressing modes, the new protected modes, which affected ALU OPS, Moves, calls, redirects, and indirects.

#7, the Microcode entry points are linked directly to the instruction decode logic, and of course not limited to the great LOADALL instruction, and the new multi-stage instruction pipeline, and prefetch.

It took AMD years to black-box the 386, and then:

"1987–1992: The arbitration proceeding, originally expected to take only six weeks, dragged on for nearly five years."

https://en.wikipedia.org/wiki/List_of_discontinued_x86_instr....

"The 80386 microcode was successfully extracted and publicly disassembled by a team of hardware historians and demoscene researchers (including reenigne, Ken Shirriff, and others). They extracted the 94,720-bit microcode ROM from 80386 die shots by combining image processing, neural networks, and human-aided automation. AI tools played a crucial role in cleaning up the die images, detecting cell patterns, and binarizing the data before humans parsed the 37-bit microinstruction formats. You can read about the full process on the Reenigne Blog.

By contrast, the 8086 microcode was extracted through purely human-driven analysis of die photos. The 8086's 21-bit microcode is simpler and was fully reverse-engineered and disassembled in 2020. You can explore the decoded 8086 microinstructions interactively using the nand2mario 8086 Microcode Browser.

8086 Microcode Browser - Small Things Retro - nand2mario (GitHub), Dec 4, 2025: Since releasing 486Tang, I've been working on recreating the 8086 with a design that stays as faithful as possible to the original...

80386 Microcode Disassembled - Reenigne blog (www.reenigne.org), May 23, 2026: Well, they may have taken that as a bit of a challenge - they threw various bits of image processing, neural networks, and human-a...

80386 microcode disassembled « Reenigne blog - daily.dev (daily.dev), May 23, 2026: A detailed account of disassembling the Intel 80386 microcode ROM, a 94720-bit blob ...

i386 - Wikipedia (Wikipedia), Microcode reverse engineering: In May 2026, the Intel 80386 microcode was reverse engineered and publicly disassembled by a group i...

8086 microcode disassembled - Reenigne blog (www.reenigne.org), Sep 3, 2020: Recently I realised that, as part of his 8086 reverse-engineering series, Ken Shirriff had posted online a high resolution photogr...

The 386 microcode has been fully reverse engineered - Reddit (r/thisweekinretro), May 24, 2026: In a group effort, a bunch of demoscene legends like reenigne have reverse engineered the microcode for the 80386, opening the pat..."

Levitating 3 days ago [-]

Just look at the images[1].

> The photo above shows part of the microcode ROM. Under a microscope, the contents of the microcode ROM are visible, and the bits can be read out, based on the presence or absence of transistors in each position.

[1]: https://www.righto.com/2020/06/a-look-at-die-of-8086-process...

trollbridge 3 days ago [-]

I checked reenigne's blog a few days ago. "Hmm, nothing posted since 2020. Oh well."

It's especially fun seeing his blog going back 33 years.

kgwxd 3 days ago [-]

Maybe the hit counter increment was the inspiration for the post.

whent 3 days ago [-]

Where's the hit counter? Mind pointing me to it. Can't find it anywhere at TFA.

ChrisClark 3 days ago [-]

He's making a joke. As in, "the site is so old, it probably still has a hit counter."

p1esk 3 days ago [-]

Here’s a great book explaining microprogramming from ground up: https://www.amazon.com/Computation-Structures-Optical-Electr...

Easy to find a free pdf

themafia 3 days ago [-]

Wow. Virtual86 modes, the floating point unit, and memory paging really created an explosion of complexity within the microcode.

There's sort of a wild west nostalgia that came with the 8086 and 8088 chips and a sense of approachable individual adventure that came along with it. Staring into the 386 is like staring into the cold and dispassionate industrial machine future that Fritz Lang was trying to portray in Metropolis.

Still fun to look at though. Great post.

dang 3 days ago [-]

Related ongoing thread:

z386: An Open-Source 80386 Built Around Original Microcode - https://news.ycombinator.com/item?id=48248014 - May 2026 (22 comments)

danborn26 3 days ago [-]

This is an incredible piece of reverse engineering. Seeing the actual microcode implementation helps demystify how these older processors handled complex operations.

bmenrigh 3 days ago [-]

The black box analysis needed to decode this is incredibly hard but also incredibly fun and rewarding to pull off. Very impressive work.

userbinator 3 days ago [-]

I agree with the first comment there, that it's important to know which revision of the 386 this came from, since the 386 did receive many small changes over its 22-year production run.

rep_lodsb 3 days ago [-]

Well, one indication is the value loaded into EDX on reset:

    9B5 BIST1  -> TMPD    0x0303         PASS2
    9B6 SIGMA  -> EDX
    9B7 BIST2  -> TMPE    TMPD           XOR
    9B8 SIGMA             0x3ddc0c2c     XOR
    9B9 SIGMA  -> EAX     BOOTUP_JUMP    JFPUOK

0x303 = family 3, model 0, stepping id 3.
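The decoding above can be written out as a few shifts and masks. This sketch assumes the CPUID-style nibble layout implied by the comment (stepping in bits 0-3, model in bits 4-7, family in bits 8-11); the function name is made up for illustration.

```python
def decode_signature(value):
    """Split a reset signature into nibbles, assuming a CPUID-style layout:
    bits 0-3 stepping, bits 4-7 model, bits 8-11 family."""
    return {
        "family": (value >> 8) & 0xF,
        "model": (value >> 4) & 0xF,
        "stepping": value & 0xF,
    }

sig = decode_signature(0x0303)
```

For the constant `0x0303` from the listing, this yields family 3, model 0, stepping 3, matching the comment.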

userbinator 3 days ago [-]

That's either a B0 or B1 according to https://www.pcjs.org/documents/manuals/intel/80386/ , or an A3 according to https://www.geoffchappell.com/studies/windows/km/cpu/precpui... , all of which are very buggy.

mettamage 3 days ago [-]

For me, this is peak Hacker News. I am happy I took the hard courses at uni to understand a post like this. I’m also happy that HN was there to stimulate this thinking at the time (2015). Even if I now don’t really do anything with my humble knowledge of low-level programming, every time it feels consciousness-enriching. And it’s an awesome feeling.

For people that don’t have access to a uni, I recommend nand2tetris.org

morphle 3 days ago [-]

Just building your own microprocessor from gates is an easier way to learn about designing microcode and understanding how processors work(ed). But it can't hurt to study a few simple old designs like RISC or Transputer. The 80386 is on the other side of that spectrum, needlessly complicated because they wanted to be backwards compatible with an old bad design.

There certainly is no need to go to university to learn chip design. Watching a few Alan Kay talks [3] or browsing Bitsavers computer designs [4] are good starting points.

We made an easier way (than FPGA) to simulate and convert your gate level design into transistors on a chip (for less than $200 in 2026). We call it Morphle Logic [1].

Eventually you grow into making the largest fastest and cheapest supercomputer wafer scale integration [2].

[1] https://github.com/fiberhood/MorphleLogic/blob/main/README_M...

[2] https://www.youtube.com/watch?v=vbqKClBwFwI

[3] https://www.youtube.com/watch?v=f1605Zmwek8

[4] http://bitsavers.informatik.uni-stuttgart.de/pdf/xerox/alto/...

joleyj 3 days ago [-]

> needlessly complicated because they wanted to be backwards compatible with an old bad design.

It's not really needless complication if there is a reason for the complication. Obviously in this case the need to be backward compatible with an old design made the implementation more complicated than if they didn't need to do that. There were very, very strong business reasons why backward compatibility was a design requirement.

fortran77 3 days ago [-]

And was it a bad design? It was very successful and enabled a lot of progress.

drivers99 3 days ago [-]

I did nand2tetris a couple times, but it emphasizes simplicity in every level of abstraction. That in itself is an amazing lesson and has been an inspiration, but that also means it skips things like microcode. In college (in the 1990s) I took an EE class as part of my CS degree that went through how an 8086-like[0] CPU is made, a lot like nand2tetris but without necessarily making each part an assignment. It did cover how microcode worked, where there was an internal program counter that stepped through a table of control words whose bits directly orchestrated each controllable piece of the CPU. We each got an instruction to implement on a simulator that the teacher had made previously. (I got DEC, decrement.)

In a way I guess the instructions in nand2tetris are the microcode. The bits of the instructions directly control the hardware with the first bit choosing 2 instruction types, so there’s only 1 step of code per instruction, unlike with microcode where an instruction can have any number of microcode steps.

In Ben Eater’s series of videos building an 8-bit CPU on breadboards he has ROMs that are indexed by the opcode (4 bits of the instruction) + a step counter to determine the control word. The ROM stands in for what could be done with sufficiently complicated logic gates. I like it as a next step on the hardware side as you get hands on experience with electronics and having to troubleshoot it.
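A toy simulation of that ROM-indexed-by-opcode-plus-step idea follows. The control-signal names, the four opcodes, and the 8-step counter are assumptions chosen for this sketch, loosely in the spirit of breadboard CPUs; they are not a faithful copy of any particular design.

```python
# Control signals as bit flags (names are assumptions for this sketch).
HLT, MI, RO, II, IO, AI, AO, EO, BI, OI, CE, CO = (1 << n for n in range(12))

STEPS = 8                            # 3-bit step counter
FETCH = [CO | MI, RO | II | CE]      # identical for every opcode
MICROCODE = {
    0x1: [IO | MI, RO | AI],           # LDA addr: A <- RAM[addr]
    0x2: [IO | MI, RO | BI, EO | AI],  # ADD addr: A <- A + RAM[addr]
    0xE: [AO | OI],                    # OUT: emit A
    0xF: [HLT],                        # HLT: stop
}

def build_rom():
    """Fill a ROM addressed by (opcode << 3) | step with control words."""
    rom = [0] * (16 * STEPS)
    for opcode in range(16):
        for step, word in enumerate(FETCH + MICROCODE.get(opcode, [])):
            rom[(opcode << 3) | step] = word
    return rom

def run(program, rom, max_cycles=1000):
    """Clock the machine; each cycle looks up one control word in the ROM."""
    ram = list(program) + [0] * (16 - len(program))
    pc = mar = ir = a = b = step = 0
    outputs = []
    for _ in range(max_cycles):
        ctrl = rom[((ir >> 4) << 3) | step]
        if ctrl & HLT:
            break
        bus = 0                       # at most one source drives the bus
        if ctrl & CO: bus = pc
        if ctrl & RO: bus = ram[mar]
        if ctrl & IO: bus = ir & 0x0F
        if ctrl & AO: bus = a
        if ctrl & EO: bus = (a + b) & 0xFF
        if ctrl & MI: mar = bus & 0x0F   # then destinations latch the bus
        if ctrl & II: ir = bus
        if ctrl & AI: a = bus
        if ctrl & BI: b = bus
        if ctrl & OI: outputs.append(bus)
        if ctrl & CE: pc = (pc + 1) & 0x0F
        step = (step + 1) % STEPS
    return outputs

# LDA 14; ADD 15; OUT; HLT, with data in the last two of 16 bytes of RAM.
program = [0x1E, 0x2F, 0xE0, 0xF0] + [0] * 10 + [28, 14]
result = run(program, build_rom())
```

Each instruction spends a variable number of useful steps in the ROM (two for fetch, then one to three more), which is the contrast with nand2tetris's one-step-per-instruction design drawn in the previous paragraph.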

It’s disappointing how it only has 16 bytes of RAM so you can’t really build higher levels of abstraction like you can with nand2tetris. But at that point you could (I should) either redo it with a better design (and put it on PCBs) or move on to the 6502 project, and then since that puts together a timer, CPU, ROM, RAM, I/O, UART, etc. mentally group those together and move on to microcontrollers that already have them together.

Anyone interested in reading about how a CPU could be made out of logic gates could also read Code by Charles Petzold (moves slower, recently updated) and/or Pattern on the Stone by Danny Hillis (moves faster).

Edit: I just checked Code (2nd edition) and that uses a 4 bit cycle counter and hard logic gates to determine what to do each cycle. But then it uses an array of diodes for part of the logic. Would that be considered microcode?

[0] there were classes that covered more advanced (pipelined) CPUs in another CS class but not at quite a low level where you felt like you could make one yourself

anthk 3 days ago [-]

You might like this, a CPU made of TTL chips running Minix 2.

https://www.homebrewcpu.com/

I might upload Tristam Island (Z-Machine v3 game, like Zork and infocom games they already have the interpreter) among the feelies in ASCII format. Yes, dfrotz runs snappier than the vi clone they have. And more stable than their ed implementation.

chadgpt3 2 days ago [-]

Microcode in modern CPUs isn't anything like classic microcode - except for certain instructions like CPUID and WRMSR, where the CPU really does interpret a microcode subroutine.

It's become a generic term for patching parts of a CPU at startup time instead. Most of microcode is things like code for the Management Engine, chicken bits to disable features, routines for CPUID/etc, and yes, a small number of patch registers that can intercept execution of ordinary instructions and run microcode instead, but not enough to override all instructions accessing ah-dh.

deskamess 3 days ago [-]

Do you know if nand2tetris covers/uses microcode?

drivers99 3 days ago [-]

It doesn’t. I posted a reply to the same comment before I saw your question. Even the books I mentioned didn’t really get into it. I tried a search for some that did and ran across Constructing a Microprogrammed Computer by O.J. Mengali which looks interesting. It says it has you implement the microcode for 4 different architectures. I’m going to check it out.

mettamage 3 days ago [-]

Ah that's a shame. I had a computer systems course at uni where we were playing around with the microcode from the MIC-1 created by Tanenbaum. I sort of figured that Nand2Tetris just had that in it.

danborn26 3 days ago [-]

This is an incredible deep dive into the 386 architecture. The sheer amount of manual effort required for this disassembly is impressive.

Levitating 3 days ago [-]

I wonder if an OpenFlexure[1] would be able to get such images

[1]: https://openflexure.org/projects/microscope/

kiddico 3 days ago [-]

I'm absolutely going to make one of those

danborn26 2 days ago [-]

The amount of effort required to reverse engineer this microcode is impressive. Great deep dive into the 386 architecture.

ChrisArchitect 3 days ago [-]

z386: An Open-Source 80386 Built Around Original Microcode

https://news.ycombinator.com/item?id=48248014

Dwedit 3 days ago [-]

Meanwhile the original ARM didn't use any microcode at all.

phire 3 days ago [-]

I wouldn’t say it didn’t have any microcode. It actually had a small PLA for sequencing the multi-cycle instructions. [0]

I don’t think anyone would actually label it as microcode (not when the entire point of RISC was to avoid microcode); they would call it a sequencer or finite state machine. But really it’s the same thing. It’s certainly much simpler than the full microcode of any contemporary CISC, and the bulk of instructions execute in a single cycle without using it.

If you want a design with zero microcode, you really need to look at MIPS, or the original Berkeley RISC. Those ISAs go out of their way to avoid multicycle instructions. Not entirely successfully, but they don't use PLAs [1] to implement any state machines for the few remaining instructions like multiply and divide.

[0] http://daveshacks.blogspot.com/2016/01/inside-armv1-instruct...

[1] At least on the few MIPS designs I've looked at. And I'm not sure if they deliberately avoided PLAs for doctrine reasons, or it was just more efficient to do so.

themafia 3 days ago [-]

Yet their purity brought them no commercial benefit.

chadgpt3 2 days ago [-]

It turned out the die area saved by eliminating the complicated sequencer and microcode ROM enabled them to add another 16 datapath bits and make the first 32 bit microprocessor.

Dwedit 3 days ago [-]

ARM got all the commercial benefit once they switched from making chips to providing full designs ready to integrate into other chips.

themafia 3 days ago [-]

> to providing full designs ready to integrate

Yes, once the market came into existence, ARM was well situated to take advantage of it.

> all the commercial benefit

"All" is a tricky term to use here. They got some. An appreciable amount even. Their business model leaves quite a bit on the floor compared to desktop chips.

yukIttEft 3 days ago [-]

If you put this into an emulator, would it boot linux?

GloriousCow 3 days ago [-]

nand2mario has made a Verilog implementation from it. It currently runs DOOM, but some of the more fiddly protected-mode bits prevent it from running full operating systems (besides DOS). I'm sure the bugs will get ironed out eventually.

cobbzilla 3 days ago [-]

beautiful work! any plans for the 80387 coprocessor?

compliancedoc 3 days ago [-]

Great!
