Understanding the Go Compiler: From SSA to Machine Code

📚 Understanding the Go Compiler (7 of 7)
  1. The Scanner
  2. The Parser
  3. The Type Checker
  4. The Unified IR Format
  5. The IR
  6. The SSA Phase
  7. From SSA to Machine Code ← You are here

In the previous post, we explored how the compiler transforms IR into SSA—a representation where every variable is assigned exactly once. We saw how the compiler builds SSA using Values and Blocks, then runs 30+ optimization passes. We watched the lowering pass convert generic operations into architecture-specific instructions like AMD64ADDQ and ARM64ADD.

Now we’re at the final stretch. The compiler has optimized SSA with architecture-specific operations. All that’s left is to turn those operations into actual machine code bytes.

But here’s the thing: we don’t go directly from SSA to machine code bytes. There’s an intermediate step—a representation that sits between SSA and raw bytes. It’s called obj.Prog, and it’s essentially an abstract assembly language.

Let me show you the complete journey from SSA to the bytes that end up in your executable.

The Three-Phase Journey

The final code generation happens in three phases:

digraph CodeGen {
    rankdir=LR;
    node [shape=box, style=rounded];

    SSA [label="Optimized SSA\n(architecture-specific)"];
    Prog [label="obj.Prog\n(assembly instructions)"];
    Bytes [label="Machine Code\n(raw bytes)"];
    Object [label="Object File\n(.o file)"];

    SSA -> Prog [label="genssa()"];
    Prog -> Bytes [label="span6()"];
    Bytes -> Object [label="WriteObjFile()"];
}

Phase 1 (genssa): Convert SSA operations to obj.Prog structures—an assembly-like representation

Phase 2 (assembler): Encode obj.Prog instructions into actual machine code bytes

Phase 3 (object writer): Package those bytes into an object file for the linker

Let’s start with that mysterious intermediate representation.

What is obj.Prog?

The obj.Prog structure represents a single assembly instruction. Let me show you what this structure looks like—it’s defined in src/cmd/internal/obj/link.go:304:

type Prog struct {
    Ctxt     *Link         // Linker context
    Link     *Prog         // Next instruction (linked list)
    From     Addr          // Source operand
    To       Addr          // Destination operand
    RestArgs []AddrPos     // Additional operands (for 3+ operand instructions)
    Pc       int64         // Program counter
    As       As            // Opcode (AMOVQ, AADDQ, etc.)
    Reg      int16         // Second source register
    // ... more fields
}

Think of this as portable assembly. The structure is architecture-independent—the same Prog type represents x86 instructions, ARM instructions, RISC-V instructions, everything. What changes is the opcode (the As field) and how the operands are interpreted.

Here’s a simple example. The SSA operation:

v1 = AMD64ADDQ <int> v2 v3 : AX

Becomes this obj.Prog:

&obj.Prog{
    As: x86.AADDQ,        // ADDQ instruction
    From: obj.Addr{       // Source operand
        Type: obj.TYPE_REG,
        Reg:  x86.REG_BX,
    },
    To: obj.Addr{         // Destination operand
        Type: obj.TYPE_REG,
        Reg:  x86.REG_AX,
    },
}

This represents the assembly instruction ADDQ BX, AX (add BX to AX).

But notice—this isn’t machine code yet. It’s still symbolic. The registers are named (BX, AX), not encoded as binary values. The opcode is named (AADDQ), not the actual opcode byte. This is an abstract representation that the assembler can work with.

Now let’s see how the compiler generates these structures from SSA.

Phase 1: SSA to obj.Prog (genssa)

The genssa function (src/cmd/compile/internal/ssagen/ssa.go:6603) takes those SSA Blocks and Values we built earlier and converts them into obj.Prog instructions.

Here’s the thing: SSA organizes code into Blocks with arrows showing how they connect (like “if true, go to Block A, otherwise go to Block B”). But your CPU doesn’t understand blocks—it just executes instructions one after another in a straight line. So genssa flattens everything. It walks through each Block, turns each Value into an instruction, then adds jump instructions at the end to get to the next Block.

One interesting thing is that every architecture has its own way of generating Progs from SSA Values. For example, you can see how AMD64 does it in src/cmd/compile/internal/amd64/ssa.go:202—it’s a massive switch statement handling every AMD64-specific SSA operation. ARM has its own version, RISC-V has another, and so on.

Now, if you browse through that code, you’ll notice that most SSA values translate directly to a single Prog instruction—one SSA operation becomes one assembly instruction. That’s the common case, and it keeps things simple.

But not always. Some SSA operations are more complex and need multiple assembly instructions to implement. For example, look at ssa.OpAMD64AVGQU (src/cmd/compile/internal/amd64/ssa.go:466)—it computes the average of two unsigned integers. That single SSA operation generates two assembly instructions: an AADDQ (add) followed by an ARCRQ (rotate right through carry, effectively a shift to divide by 2). The compiler breaks down that higher-level “average” operation into the primitive instructions the CPU actually has.

Let me walk you through a specific example to see how this works. Say we have this SSA Value after all the optimization passes:

v4 = AMD64ADDQ <int> v2 v3 : AX

This represents a 64-bit addition where:

  • v2 is in register AX
  • v3 is in register BX
  • The result v4 should go in register AX

Here’s roughly how the AMD64 code generator handles this case (simplified to show the key idea—check the actual implementation at the link above):

case ssa.OpAMD64ADDQ:
    r := v.Reg()           // Destination register (AX)
    r1 := v.Args[0].Reg()  // First operand register (AX)
    r2 := v.Args[1].Reg()  // Second operand register (BX)
    // The real code also checks that r1 == r, inserting a copy if
    // needed, because x86 ADDQ overwrites its destination operand.

    p := s.Prog(v.Op.Asm())  // Create ADDQ instruction
    p.From.Type = obj.TYPE_REG
    p.From.Reg = r2          // Source: BX
    p.To.Type = obj.TYPE_REG
    p.To.Reg = r             // Destination: AX

The result is an obj.Prog representing ADDQ BX, AX. On x86, ADDQ is a two-address instruction: ADDQ src, dst means dst = dst + src. So this adds BX to AX and stores the result in AX. Notice that v2 and v4 share the same register (AX)—that’s because on x86, one source operand must be the same as the destination.

But generating instructions for individual values is only half the story. We also need to handle how blocks connect to each other.

Generating Control Flow

Remember that genssa walks through Blocks sequentially. First it generates instructions for all the Values in the Block, then it generates the control flow instruction that ends the Block.

This is where things get interesting—the compiler needs to decide how to jump to the next block. The exact instruction depends on what kind of Block we’re finishing:

Return blocks are the simplest—just emit a RET instruction and you’re done. The function returns to its caller.

Conditional blocks are where it gets fun. Say you have a “less than” block—one that checks if x < 10 and branches accordingly. The compiler generates a jump instruction like JLT (jump if less than). If the condition is true, jump to the “true” block. Otherwise, fall through to the “false” block.

Here’s the clever part: at this stage, we don’t know where Block B will actually be in the final machine code. Think of it like writing “go to the kitchen” in directions—you’re using a name, not GPS coordinates. The compiler records “jump to Block B” using the block’s name. Later, once the assembler knows exactly where each block ends up (like “Block B starts at position 150”), it comes back and fills in the real location.

You can see how AMD64 handles this in ssaGenBlock (src/cmd/compile/internal/amd64/ssa.go:1462)—it’s a switch statement that handles each block type (return, conditional branch, unconditional jump, etc.) and emits the right control flow instruction.

After genssa finishes, we have a complete sequence of obj.Prog instructions—still symbolic, still abstract, but ready to be turned into actual machine code bytes.

Phase 2: obj.Prog to Machine Code (Assembler)

The assembler’s job is to encode obj.Prog instructions into actual machine code bytes. This happens in the span6 function (src/cmd/internal/obj/x86/asm6.go:2057).

Here’s where it gets tricky: jump instructions come in different sizes. On x86, if you’re jumping to something nearby, you can use a short jump (2 bytes). But if you’re jumping far away, you need a long jump (6 bytes). The assembler has to figure out which size to use.

This creates a chicken-and-egg problem. To know if a jump should be short or long, you need to know how far away the target is. To know how far away the target is, you need to know the sizes of all the instructions in between. But some of those instructions might be jumps too, whose sizes depend on their targets!

You can’t calculate distances until you know sizes, but you can’t pick sizes until you know distances.

The solution? Try, check, and retry if needed.

Watch what the assembler does: it makes an initial pass trying to encode every branch as short. It walks through all the instructions, assigns byte offsets, and checks if any branches are too far for their encoding. If a branch doesn’t fit, it marks it for expansion and tries again. The next pass uses a longer encoding for that branch, shifts everything after it, and checks again.

This typically converges in just 1-2 iterations—most functions don’t have deeply nested or far-jumping branches.

With all the offsets figured out, the assembler can now turn symbolic instructions into actual bytes.

Encoding Instructions

Once the assembler knows where everything goes, it encodes each instruction into bytes. These bytes are written to a buffer that will eventually become the machine code section of the object file.

Let me show you what happens with our ADDQ BX, AX instruction—this is where the magic really happens.

The assembler looks up ADDQ in the opcode table (src/cmd/internal/obj/x86/asm6.go:921) and finds that register-to-register addition uses opcode 0x01. Then it encodes the operands—which registers are involved, what addressing mode to use—into additional bytes following x86’s encoding rules. For 64-bit operations, it also adds a REX prefix byte.

You can see how this encoding happens in the doasm function (src/cmd/internal/obj/x86/asm6.go:4249), which handles all the x86 encoding details.

The final result for ADDQ BX, AX is three bytes written to the buffer:

[0x48, 0x01, 0xD8]
 REX.W  ADDQ  ModR/M

Let me break down what each byte does. That first byte (0x48) is called a REX prefix—it’s x86’s way of saying “this is a 64-bit operation.” The second byte (0x01) is the actual ADDQ opcode. The third byte (0xD8) is called the ModR/M byte—it encodes which registers to use (BX as source, AX as destination) and what addressing mode we’re in.

x86 encoding has tons of variations for different situations—memory access, array indexing, different addressing modes, you name it. But the assembler handles all of this automatically by following the architecture’s encoding rules.

But what about addresses we can’t know yet?

Relocations

Here’s one more detail you should know about: what if an instruction references a global variable or calls a function in another file? The assembler doesn’t know where those symbols will end up—that’s the linker’s job.

So instead of encoding a real address, the assembler emits placeholder bytes (usually zeros) and creates a relocation entry. The relocation says: “When you link this code, patch these bytes with the address of symbol X.” The linker will read these relocations later and fill in the actual addresses once it’s combined all the object files together.

This happens for function calls too. When you call a function from another package, the assembler generates a CALL instruction with a placeholder address and a relocation pointing to that function’s symbol. The linker resolves it when building the final executable.

Now we have machine code bytes! But they’re still in memory. We need to write them to disk.

Phase 3: Writing Object Files

Now we’ve got machine code bytes sitting in a buffer. The object file writer’s job is to take all those bytes and package them into a .o file that the linker can read.

Before we can write those bytes to disk, let me show you what structure they go into.

The Object File Format

Go uses a custom object file format with several sections:

┌──────────────────────────────────┐
│ Header                           │
│  - Magic: "\x00go120ld"          │
│  - Fingerprint                   │
│  - Offsets to blocks             │
├──────────────────────────────────┤
│ String Table                     │
│  - All symbol names              │
│  - File names                    │
├──────────────────────────────────┤
│ Symbol Definitions               │
│  - Name, type, size, alignment   │
├──────────────────────────────────┤
│ Relocations                      │
│  - Offset, type, symbol, addend  │
├──────────────────────────────────┤
│ Data Block ← MACHINE CODE HERE   │
│  - [48 01 D8 C3] for add func    │
└──────────────────────────────────┘

Here’s what you’ll find inside. The header kicks things off with a magic number—that’s how tools know this is a Go object file—plus a fingerprint (basically a hash of the package’s exported API) and offsets that point to where each section lives.

Next comes the string table, which is a clever space-saving trick. Every text string—function names, variable names, file paths—gets stored here once. Instead of repeating “main.add” fifty times throughout the file, we store it once and other sections just reference it by index.

The symbol definitions are where you’ll find every function and variable in this file. Each entry has a name (pointing back to that string table), a type, a size, and whatever other metadata the linker needs.

Then there’s the relocations section—remember those placeholder bytes we talked about? This section contains all those relocation entries. Each one says “at byte offset X, patch in the address of symbol Y.”

And finally, the data block—this is where those bytes the assembler generated end up. Each function’s machine code is written sequentially, along with any constant data.

The WriteObjFile function (src/cmd/internal/obj/objfile.go:32) takes care of generating all these sections based on the data we already have—the machine code bytes from the assembler, the symbols we’ve defined, the relocations we’ve collected. It just packages everything up into this structured format that the linker knows how to read.

But there’s one more piece to the puzzle—how does this object file get stored for the build cache?

The Archive Format

Remember in the Unified IR article when we explored archive files? Each compiled package creates an archive with two files inside: __.PKGDEF and _go_.o. Back then we focused on __.PKGDEF—the Unified IR representation with all the export data. But we saw that second file, _go_.o, and didn’t really dig into it.

Well, that’s what we’ve been generating throughout this entire article.

Once the assembler finishes its job and we have the complete object file, the compiler needs to cache that work for future builds. This is where the archive file comes in. Creating the archive isn’t part of the machine code generation itself—it’s what happens afterward to store the results.

So what goes into each file? The __.PKGDEF side has the Unified IR representation—type information, function signatures, constants, and those inlinable function bodies we talked about. Other packages read this when they import your code. The compiler writes it using dumpCompilerObj() (src/cmd/compile/internal/gc/obj.go:113-116).

The _go_.o side is what we’ve been building in this article—the actual machine code bytes from the assembler, the symbol table, and all those relocations for the linker. This is the result of our three-phase journey: SSA to obj.Prog, obj.Prog to bytes, bytes packaged into an object file. The compiler writes this using dumpLinkerObj() (src/cmd/compile/internal/gc/obj.go:133-149), which calls the WriteObjFile() function we discussed earlier.

When you run go build, the compiler creates these archives and stashes them in the build cache. Later, when type-checking imports, the compiler reads the __.PKGDEF side. When linking the final executable, the linker reads the _go_.o side. Two files, two different jobs, one convenient package.

Now that we understand how it all works, let’s see it in action.

Try It Yourself

Want to see this whole process in action? Let’s generate some actual machine code and peek at those bytes!

First, create a file add.go:

package main

func add(x, y int) int {
    return x + y
}

Now you can use go tool compile with the -S flag:

go tool compile -S add.go

This shows the assembly code generated for each function. Look for the ADDQ BX, AX and RET instructions—those are the obj.Prog instructions we talked about. At the bottom of the function listing, you’ll see the actual machine code bytes:

0x0000 00000 (/path/to/add.go:4)    ADDQ    BX, AX
0x0003 00003 (/path/to/add.go:4)    RET
0x0000 48 01 d8 c3                  H...

Those 48 01 d8 c3 bytes are the same ones we walked through encoding earlier in the article!

For a clearer view, use objdump on the object file:

go tool compile -o add.o add.go
go tool objdump add.o

This gives you a more readable output with the machine code bytes right next to each instruction:

TEXT main.add(SB) /path/to/add.go
  add.go:4        0x6a0            4801d8            ADDQ BX, AX
  add.go:4        0x6a3            c3                RET

Here you can clearly see the 48 01 d8 bytes for the ADDQ instruction and c3 for the RET.

Now let’s recap what we’ve learned about this final stage of compilation.

Summary

We’ve just watched optimized SSA transform into actual machine code bytes!

Three phases make it happen. Phase 1: SSA operations become obj.Prog structures—portable assembly with symbolic names. Phase 2: The assembler solves a chicken-and-egg problem (need sizes to calculate offsets, need offsets to pick sizes) by trying and retrying until everything fits. Then it encodes instructions to bytes—ADDQ BX, AX becomes 48 01 d8. Phase 3: Everything gets packaged into an object file with all the necessary metadata and relocations.

After these three phases, the object file gets bundled into an archive alongside __.PKGDEF and stored in the build cache.

The elegant part? Layered abstractions all the way down. SSA for analysis, obj.Prog for portable code generation, raw bytes for the CPU. Each layer serves its purpose.

But wait—we have machine code bytes sitting in object files. How do they become an executable program you can actually run? That’s the linker’s job, and it’s the final piece of the puzzle. In the next post, we’ll explore how the linker stitches all these object files together, resolves those relocations we talked about, and produces your final executable.