In the previous post, we explored the IR—the compiler’s working format where devirtualization, inlining, and escape analysis happen. At the IR level, the compiler optimizes your code at a high level, making smart decisions about which functions to inline and whether values should live on the heap or the stack.
But the IR still looks a lot like your source code. It has variables that can be assigned multiple times, complex control flow with loops and conditionals, and operations that map closely to Go syntax.
Now we’re entering a different world. The compiler takes that optimized IR and transforms it into Static Single Assignment form—SSA. This is where your code gets restructured into a format that’s perfect for aggressive optimization. The SSA phase is where some of the compiler’s most sophisticated transformations happen.
Let me show you what makes SSA special and why the compiler needs it.
What is SSA?
Static Single Assignment (SSA) is an intermediate representation used by most modern compilers—GCC, LLVM, V8, HotSpot, and of course, the Go compiler. It’s not specific to Go; it’s a fundamental technique in compiler design that’s been around since the late 1980s.
The core idea: every variable is assigned exactly once. Once you write to a variable, that’s it—no reassignments, no modifications.
This constraint might sound limiting, but it’s incredibly powerful for optimization. Let me show you why with a simple example:
x := 5
x = x + 10
x = x * 2
y := x
The variable x gets assigned three times. When the compiler reads y := x, it has to ask: “Which x? The first one? The second? The third?” It has to trace back through all the assignments to figure out what value x actually holds at that point.
Now here’s the same logic in SSA form:
x1 := 5
x2 := x1 + 10
x3 := x2 * 2
y := x3
Every assignment creates a new variable. Now when the compiler reads y := x3, there’s no ambiguity—x3 is defined exactly once, and we know exactly what value it holds. The compiler doesn’t have to trace anything. Just look at the definition.
This makes optimization analysis much simpler. Want to know if a variable is used? Just check if anything references it. Want to know where a value comes from? Look at its single definition. Want to eliminate dead code? If a variable is never used, delete its definition—done.
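To see why this is so convenient, here is a toy sketch in Go (my own illustration, not the compiler’s data structures): with single assignment, dead-code elimination reduces to reachability from the values whose results are actually used.

```go
package main

import "fmt"

// value is a toy SSA value: defined exactly once, referenced by others.
type value struct {
	id   string
	args []*value // the inputs this value's single definition reads
}

// liveSet marks everything reachable from the roots (values whose
// results are actually used, e.g. returned). Anything unmarked is dead
// and its definition can simply be deleted.
func liveSet(roots []*value) map[string]bool {
	live := map[string]bool{}
	var mark func(v *value)
	mark = func(v *value) {
		if live[v.id] {
			return
		}
		live[v.id] = true
		for _, a := range v.args {
			mark(a)
		}
	}
	for _, r := range roots {
		mark(r)
	}
	return live
}

func main() {
	x1 := &value{id: "x1"}
	x2 := &value{id: "x2", args: []*value{x1}}
	x3 := &value{id: "x3", args: []*value{x2}}
	tmp := &value{id: "tmp"} // assigned but never referenced
	_ = tmp

	live := liveSet([]*value{x3}) // only x3's result is used
	fmt.Println(live["x1"], live["tmp"]) // x1 is live, tmp is dead
}
```

Because each value has a single definition, “is it used?” never requires tracing assignment histories: one reachability walk answers it for the whole function.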
But SSA introduces a challenge with branching code.
The PHI Node Problem
What happens when you have an if/else statement where each branch assigns a different value to the same variable? Look at this:
var x int
if condition {
    x = 10 // One possibility
} else {
    x = 20 // Another possibility
}
// Both branches come back together here
y := x // But which value does x have?
Either branch could have executed. After the if/else, execution continues, but x could be 10 or 20 depending on which path was taken. In SSA form, we need to represent this explicitly. That’s what PHI nodes do:
var x1 int
var x2 int
if condition {
    x1 = 10 // True branch
} else {
    x2 = 20 // False branch
}
// Both branches come back together
x3 = φ(x1, x2) // PHI: "x3 is x1 if we took the true branch, x2 if we took the false branch"
y := x3
The φ (PHI) function creates a new variable x3 that represents “whichever value was assigned.” It says: “x3’s value depends on which branch we took. If the condition was true, x3 gets the value from x1. If the condition was false, x3 gets the value from x2.”
PHI nodes are how SSA handles branching code. They’re the glue that holds everything together when different execution paths come back together.
Now let’s see how the Go compiler actually generates SSA from the IR. But first, we need to understand the building blocks that SSA is made of.
The Building Blocks: Values and Blocks
Before we dive into how the compiler generates SSA, we need to understand what SSA actually looks like in the Go compiler. It’s built from two fundamental structures: Values and Blocks.
Values
Think of a Value as a single computation or operation. When you write a + b in your code, the compiler represents that addition as a Value in SSA.
Each Value gets a unique identifier—you’ll see these written as v1, v2, v3, and so on. That ID is what makes the “single assignment” work. The Value v3 is defined exactly once, and every time you see v3 anywhere in the function, you know it refers to the same computation.
Each Value has an operation (like Add64 or Load), a type, and references to its input Values. You can see the full structure in the compiler source (src/cmd/compile/internal/ssa/value.go:20).
When you see SSA like this:
v3 = Add64 <int> v1 v2
You’re looking at a Value with ID v3, operation Add64, type int, and two arguments: Values v1 and v2. It’s saying “compute the 64-bit addition of v1 and v2, and call the result v3.”
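As a rough mental model, here is a heavily simplified sketch of such a structure in Go; the real ssa.Value in value.go has many more fields (position information, auxiliary data, and so on):

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a stripped-down sketch of the compiler's ssa.Value:
// one operation, one result type, defined exactly once.
type Value struct {
	ID   int      // unique within the function: v1, v2, ...
	Op   string   // e.g. "Add64", "Const64", "Load"
	Type string   // result type, e.g. "int"
	Args []*Value // the input Values this computation reads
}

// String renders a Value roughly the way SSA dumps do.
func (v *Value) String() string {
	var args []string
	for _, a := range v.Args {
		args = append(args, fmt.Sprintf("v%d", a.ID))
	}
	return fmt.Sprintf("v%d = %s <%s> %s", v.ID, v.Op, v.Type, strings.Join(args, " "))
}

func main() {
	v1 := &Value{ID: 1, Op: "Const64", Type: "int"}
	v2 := &Value{ID: 2, Op: "Const64", Type: "int"}
	v3 := &Value{ID: 3, Op: "Add64", Type: "int", Args: []*Value{v1, v2}}
	fmt.Println(v3) // v3 = Add64 <int> v1 v2
}
```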
Blocks
Now, Values don’t just float around randomly—they’re organized into Blocks. A Block is a sequence of Values that execute straight through, one after another, with no branching in the middle. Once you enter a Block, you execute every Value in it from top to bottom until you reach the end.
Think of your code’s if statements, loops, and function calls. These create different execution paths, right? Blocks are how SSA represents those paths. Each possible execution path through your function becomes a sequence of Blocks connected together.
Like Values, Blocks get unique IDs: b1, b2, b3, and so on. Each Block contains a list of Values to execute, a kind that determines what happens at the end (fall through, branch, return), and edges to successor and predecessor Blocks. You can see the full structure in the compiler source (src/cmd/compile/internal/ssa/block.go:13).
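In the same toy spirit, a Block can be sketched as a list of Values plus control-flow edges; again, this is a drastic simplification of the real ssa.Block:

```go
package main

import "fmt"

// BlockKind says what happens at the end of a Block.
type BlockKind string

const (
	BlockPlain BlockKind = "Plain" // one successor, fall through
	BlockIf    BlockKind = "If"    // two successors: then, else
	BlockRet   BlockKind = "Ret"   // no successors, return
)

// Block is a toy sketch of the compiler's ssa.Block.
type Block struct {
	ID     int       // unique within the function: b1, b2, ...
	Kind   BlockKind // determines how control leaves the block
	Values []string  // the straight-line Values, simplified to strings here
	Succs  []*Block  // where control can go next
}

func main() {
	// The shape of an if/else: b1 branches to b3 or b4,
	// and both fall through to the merge block b2.
	b2 := &Block{ID: 2, Kind: BlockRet}
	b3 := &Block{ID: 3, Kind: BlockPlain, Succs: []*Block{b2}}
	b4 := &Block{ID: 4, Kind: BlockPlain, Succs: []*Block{b2}}
	b1 := &Block{ID: 1, Kind: BlockIf, Succs: []*Block{b3, b4}}

	for _, s := range b1.Succs {
		fmt.Printf("b%d -> b%d\n", b1.ID, s.ID)
	}
}
```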
Let me show you a concrete example. Take this simple if/else:
if x > 10 {
    y = 20
} else {
    y = 30
}
use(y)
The compiler breaks this into multiple Blocks:
b1: // Entry block
    v5 = Arg <int> {x} // Function argument x
    v7 = Const64 <int> [10]
    v8 = Less64 <bool> v7 v5 // Check if 10 < x (i.e., x > 10)
    v9 = Const64 <int> [20]
    v10 = Const64 <int> [30]
    If v8 → b3, b4 // Branch: b3 for true, b4 for false
b3: // True branch (when x > 10)
    Plain → b2 // Continue to merge point
b4: // False branch (when x <= 10)
    Plain → b2 // Continue to merge point
b2: // Both paths come back together
    v11 = Phi <int> v9 v10 // y is either v9 or v10 depending on path
    v13 = Call {use} v11 // Call to use()
    Plain → b5 // Continue to final block
Block b1 evaluates the condition (x > 10) and creates several Values including the comparison. Then it branches—if the condition is true, jump to b3. If false, jump to b4.
Blocks b3 and b4 are the two branches. Notice they don’t create Values for 20 or 30—those constants (v9 and v10) were already created in b1. The branches just fall through to b2.
Block b2 is where both paths come back together. It starts with a PHI Value (v11) that represents “whichever value was assigned to y”—either v9 (20) from the true branch or v10 (30) from the false branch. Then it calls use(y) and continues.
Now that we understand what SSA looks like, let’s see how the compiler actually builds these Values and Blocks from the IR.
IR to SSA Generation
The transformation from IR to SSA happens in the buildssa function (src/cmd/compile/internal/ssagen/ssa.go:312)—this is Phase 1 of SSA processing.
The buildssa function works per function. It takes a single IR function (*ir.Func) as input and produces a single SSA function (*ssa.Func) as output. The compiler calls buildssa once for each function in your program, converting them one at a time from IR to SSA.
Here’s what needs to happen for each function:
- Create the initial Blocks for the function’s control flow graph
- Convert IR nodes into SSA Values organized into those Blocks
- Insert PHI nodes (special Values) at control flow merge points
- Resolve all variable references to their defining Values
Let’s walk through each step.
Step 1: Setting Up the SSA Function
The first step is simple: buildssa (src/cmd/compile/internal/ssagen/ssa.go:312) creates an empty SSA function structure to hold all the Blocks and Values we’re about to generate.
Think of it like setting up a blank canvas. The compiler creates the SSA function object, then creates the very first Block—the entry Block. This is where execution begins when your function is called. Every function needs one.
With the structure in place, the compiler is ready to start converting your IR code into SSA.
Step 2: Converting IR to SSA
Now the compiler walks through your function’s IR code and converts it into SSA—generating Values and Blocks as it goes. Different kinds of statements become different SSA structures.
As the compiler builds SSA, it keeps track of variables. For each variable, it remembers which Value currently defines it. So when your code uses variable x, the compiler knows which Value x refers to.
But there’s a problem: what happens when x could come from different paths? At control flow merge points, x might have different Values depending on which branch was taken. The compiler handles this by creating a FwdRef (forward reference)—a placeholder that says “we’ll figure this out later.” These placeholders get resolved to PHI nodes in Step 3.
Let’s see how this works with some examples.
Assignments
Let’s start simple. When you write:
y := 5
x := y + 10
The compiler breaks this down into individual operations. The constant 5 becomes one Value. The constant 10 becomes another Value. Then the addition y + 10 becomes a third Value that uses the first two as inputs.
Here’s the resulting SSA:
v6 = Const64 <int> [5] // y := 5
v7 = Const64 <int> [10] // constant 10
v8 = Add64 <int> v6 v7 // x := y + 10
The compiler keeps track that variable x is now defined by Value v8. Whenever you use x later in the function, the compiler knows exactly which Value it refers to—no ambiguity.
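That bookkeeping can be sketched as a map from variable names to defining Value IDs. This toy builder is my own illustration of the idea, not the compiler’s actual code:

```go
package main

import "fmt"

// builder is toy conversion state: for each variable, which Value
// currently defines it. This mirrors (in spirit) the variable
// tracking the compiler maintains while generating SSA.
type builder struct {
	nextID int
	defs   map[string]string // variable name -> defining Value ID
}

// newValue emits a fresh Value and returns its ID.
func (b *builder) newValue(desc string) string {
	b.nextID++
	id := fmt.Sprintf("v%d", b.nextID)
	fmt.Printf("%s = %s\n", id, desc)
	return id
}

// assign records that name is now defined by a fresh Value.
func (b *builder) assign(name, desc string) {
	b.defs[name] = b.newValue(desc)
}

// use looks up which Value a variable currently refers to.
func (b *builder) use(name string) string {
	return b.defs[name]
}

func main() {
	b := &builder{defs: map[string]string{}}
	b.assign("y", "Const64 [5]")                             // y := 5
	c := b.newValue("Const64 [10]")                          // constant 10
	b.assign("x", fmt.Sprintf("Add64 %s %s", b.use("y"), c)) // x := y + 10
	fmt.Println("x is defined by", b.use("x"))
}
```

Each assignment simply overwrites the map entry with a new Value ID, so later uses of the variable automatically pick up its latest definition.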
Control Flow
Now let’s look at something more interesting—branching. Remember the if/else example we saw when introducing Blocks? Let’s see how it’s actually generated during IR to SSA conversion:
if x > 10 {
    y = 20
} else {
    y = 30
}
use(y)
Here’s the resulting SSA (before PHI insertion):
b1: // Entry block
    v5 = Arg <int> {x} // Function argument x
    v7 = Const64 <int> [10]
    v8 = Less64 <bool> v7 v5 // Check if 10 < x (i.e., x > 10)
    v9 = Const64 <int> [20]
    v10 = Const64 <int> [30]
    If v8 → b3, b4 // Branch: b3 for true, b4 for false
b3: // True branch (when x > 10)
    Plain → b2 // Continue to merge point
b4: // False branch (when x <= 10)
    Plain → b2 // Continue to merge point
b2: // Both paths come back together
    v11 = FwdRef <int> {{[] y}} // Placeholder: "we need y, figure it out later"
    v13 = Call {use} v11 // Call to use()
    Plain → b5 // Continue to final block
Block b1 evaluates the condition and creates the constants for both branches (v9 for 20, v10 for 30), then branches based on the comparison. Block b3 handles the true case (when x > 10), Block b4 handles the false case (when x <= 10). Both branches then continue to Block b2 where the paths merge.
Notice that y gets assigned different Values depending on which path we took—v9 (20) if true, v10 (30) if false. When the compiler processes Block b2 (the merge point), it doesn’t yet know which Value y has, so it creates a FwdRef placeholder (v11).
This brings us to the final step: resolving these FwdRef placeholders.
Step 3: PHI Node Insertion
Remember those FwdRef placeholders we created at merge points? Now it’s time to figure out what they should actually be. The compiler uses the insertPhis function (src/cmd/compile/internal/ssagen/phi.go:42) to go through each FwdRef and resolve it.
Here’s how it works: for each FwdRef, the compiler looks at all the paths that could reach that point. It checks the variable tracking map from Step 2 to see which Value defines the variable in each incoming Block.
Sometimes all the paths have the same Value. Easy—the FwdRef just becomes a copy of that Value.
But when different paths have different Values, the compiler creates a PHI node. The PHI node says “this variable could be Value A or Value B, depending on which path we took.” It lists all the possible Values from the incoming paths.
Let me show you what this looks like with our Control Flow example from Step 2. After the initial conversion, Block b2 (the merge point) had this FwdRef:
b2: // Both paths come back together
    v11 = FwdRef <int> {{[] y}} // Placeholder: "we need y, figure it out later"
    v13 = Call {use} v11 // Call to use()
The insertPhis function looks at this FwdRef and checks the variable tracking map to see where y could come from. Block b2 has two predecessors: b3 and b4. Looking at the tracking information:
- In Block b3, variable y is defined by v9 (20)
- In Block b4, variable y is defined by v10 (30)
Different Values! So the FwdRef gets replaced with a PHI node that merges both:
b2: // Both paths come back together
    v11 = Phi <int> v9 v10 // Resolved: y is v9 or v10 depending on path
    v13 = Call {use} v11 // Call to use()
Perfect. Now use(y) has a clear Value to work with: v11, which represents “whichever value y got” from the branches.
And that’s it! With all the FwdRefs resolved and PHI nodes inserted where needed, we have valid SSA. The compiler can now move on to optimization.
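The resolution rule (every path agrees: emit a copy; paths disagree: emit a PHI) can be sketched as a toy function. The real insertPhis is considerably more involved, but the core decision looks like this:

```go
package main

import "fmt"

// resolveFwdRef decides what a FwdRef placeholder becomes, given the
// Value that defines the variable at the end of each predecessor block.
// This models the choice insertPhis makes; it is not the real code.
func resolveFwdRef(incoming []string) string {
	first := incoming[0]
	allSame := true
	for _, v := range incoming[1:] {
		if v != first {
			allSame = false
			break
		}
	}
	if allSame {
		// Every path agrees: no PHI needed, just reuse that Value.
		return "Copy " + first
	}
	// Paths disagree: merge them with a PHI node.
	out := "Phi"
	for _, v := range incoming {
		out += " " + v
	}
	return out
}

func main() {
	// Our if/else example: y is v9 in b3 and v10 in b4.
	fmt.Println(resolveFwdRef([]string{"v9", "v10"})) // Phi v9 v10
	// If both branches had assigned the same Value:
	fmt.Println(resolveFwdRef([]string{"v9", "v9"})) // Copy v9
}
```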
The SSA Optimization Pipeline
Once we have valid SSA, the real magic happens. The compiler runs your code through a series of transformations (src/cmd/compile/internal/ssa/compile.go:30)—over 30 different optimization passes that each improve the code in some way.
The goal is to transform the SSA until every Value maps to an actual machine instruction, and the Blocks are in the right order for code generation. Let’s see one example optimization to understand how this works.
Eliminating Duplicate Computations
One of the first things the compiler does is look for duplicate work. If you compute the same thing twice, why keep both? This is called Common Subexpression Elimination (CSE).
Say you have this SSA:
v1 = Const64 <int> [5]
v2 = Const64 <int> [10]
v3 = Add64 v1 v2 // 5 + 10
v4 = Add64 v1 v2 // Same computation!
v5 = Mul64 v3 v4
CSE notices that v4 is computing the exact same thing as v3—adding v1 and v2. So it keeps the first one and reuses it:
v1 = Const64 <int> [5]
v2 = Const64 <int> [10]
v3 = Add64 v1 v2
v5 = Mul64 v3 v3 // Just use v3 twice
Now v4 is gone entirely. Less work, same result.
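The core idea (key each Value by its operation and arguments, and reuse the first match) can be sketched like this; the compiler’s actual cse pass is more sophisticated than this toy version:

```go
package main

import "fmt"

// val is a minimal stand-in for an SSA Value.
type val struct {
	id, op string
	args   []string
}

// cse returns, for each value, the canonical value it should be replaced
// by: the first value seen with the same (op, args) key. A real pass
// would also re-canonicalize arguments and iterate until nothing changes.
func cse(vals []val) map[string]string {
	seen := map[string]string{}    // (op, args) key -> first value's id
	replace := map[string]string{} // value id -> canonical id
	for _, v := range vals {
		key := v.op + fmt.Sprint(v.args)
		if canon, ok := seen[key]; ok {
			replace[v.id] = canon // duplicate: reuse the earlier value
		} else {
			seen[key] = v.id
			replace[v.id] = v.id
		}
	}
	return replace
}

func main() {
	vals := []val{
		{"v1", "Const64 [5]", nil},
		{"v2", "Const64 [10]", nil},
		{"v3", "Add64", []string{"v1", "v2"}},
		{"v4", "Add64", []string{"v1", "v2"}}, // same computation as v3
	}
	r := cse(vals)
	fmt.Println("v4 ->", r["v4"]) // v4 -> v3
}
```

Note how SSA makes this safe: because v1 and v2 are each defined exactly once, identical (op, args) pairs are guaranteed to compute identical results.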
CSE is just one of many optimizations that run on the generic SSA. There are passes that eliminate dead code, simplify control flow, remove redundant checks, and much more. But eventually, all this generic SSA needs to become actual machine code.
Lowering to Machine Operations
Here’s a critical transformation in the pipeline: this is where your program becomes architecture-specific.
Up until this point—through parsing, type checking, IR generation, SSA conversion, and all the optimizations we’ve seen—everything has been completely architecture-agnostic. The same SSA code could compile for Intel, ARM, RISC-V, or any other platform Go supports.
The lower pass is where the compiler commits to a specific architecture and transforms generic SSA operations into architecture-specific machine instructions. It’s just another transformation pass in the SSA pipeline, but it’s a crucial one. After lowering, many more passes run—more optimizations, register allocation, instruction scheduling, and others—all working on the now architecture-specific SSA.
Let’s see what one of these lowering transformations looks like. A generic addition:
v1 = Add64 <int> v2 v3
Becomes an AMD64-specific instruction:
v1 = AMD64ADDQ <int> v2 v3
Now each Value maps directly to an actual machine instruction that your CPU knows how to execute. AMD64ADDQ corresponds to ADDQ, the 64-bit add instruction on Intel/AMD processors.
If you were compiling for ARM instead, the same generic Add64 would become a completely different ARM instruction. This is the power of SSA—the compiler can keep your code in a generic form for as long as possible, applying optimizations that work on any architecture, then specialize it at the very end.
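The shape of that rewrite (a per-architecture table from generic ops to machine ops) can be sketched as follows. The op names here are illustrative, and the real mappings are generated from the rewrite rules, but the structure is the point:

```go
package main

import "fmt"

// lowerTables maps a generic SSA op to its machine op, per architecture.
// These entries are illustrative; the compiler generates the real rules
// from the DSL files under src/cmd/compile/internal/ssa/_gen/.
var lowerTables = map[string]map[string]string{
	"amd64": {"Add64": "AMD64ADDQ", "Mul64": "AMD64MULQ"},
	"arm64": {"Add64": "ARM64ADD", "Mul64": "ARM64MUL"},
}

// lower rewrites one generic op for the target architecture,
// reporting whether a rule applied.
func lower(arch, op string) (string, bool) {
	m, ok := lowerTables[arch]
	if !ok {
		return "", false
	}
	machOp, ok := m[op]
	return machOp, ok
}

func main() {
	op, _ := lower("amd64", "Add64")
	fmt.Println(op) // AMD64ADDQ
	op, _ = lower("arm64", "Add64")
	fmt.Println(op) // ARM64ADD
}
```

The same generic op fans out to different machine ops per target, which is exactly why everything before this pass can stay architecture-agnostic.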
Putting It All Together
The compiler runs over 30 optimization passes throughout the SSA pipeline. Some run before lowering—like CSE, dead code elimination, and control flow simplification—working on generic, architecture-independent SSA. Then comes the lowering pass, converting to architecture-specific operations. After that, more passes run—register allocation, instruction scheduling, and architecture-specific optimizations—working on the now-specialized SSA.
Each pass makes the code a little simpler, a little more efficient. And here’s the key: these passes create opportunities for each other. When CSE eliminates a duplicate computation, it might make some Values unused. Other passes can then remove that dead code, which simplifies the control flow, which opens up new optimization opportunities. That’s why the compiler runs multiple passes—each round of optimization makes the next round more effective.
By the time all these passes complete, your SSA has been transformed into tight, efficient, architecture-specific code ready for assembly generation.
Try It Yourself
Want to see SSA for your own code? The Go compiler can dump SSA at various stages with the GOSSAFUNC environment variable:
GOSSAFUNC=max go build max.go
This creates an HTML file showing the SSA at every pass. You can watch your function transform step by step, seeing exactly what each optimization does.
Try it with different functions—loops, nested conditions, method calls. Watch how PHI Values appear at Block merge points, how CSE eliminates duplicate Values, how bounds checks disappear.
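For example, a max.go along these lines works well as a first experiment. I’ve written it with a local variable rather than early returns so the merge point produces an easy-to-spot PHI in the dump:

```go
// max.go: a small function whose if/else produces a visible
// PHI node at the merge point in the GOSSAFUNC dump.
package main

import "fmt"

func max(a, b int) int {
	m := b // one possible value of m
	if a > b {
		m = a // the other possible value
	}
	return m // the SSA for this read is a PHI of the two
}

func main() {
	fmt.Println(max(3, 7)) // 7
}
```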
If you want to work with SSA programmatically in your own tools, check out the golang.org/x/tools/go/ssa package—it provides a userspace SSA representation you can use directly in your Go programs for analysis and tooling.
Now that we’ve explored the SSA phase from building blocks to optimization pipeline, let’s recap what we’ve learned.
Summary
The SSA phase transforms your code into a form that’s perfect for optimization. Here’s how it works:
The compiler converts IR into SSA—Blocks (execution paths) containing Values (operations with unique IDs)—in three steps: setting up the function structure, converting statements into Values and Blocks (using FwdRef placeholders at merge points), and resolving those placeholders into PHI nodes.
Then it runs 30+ optimization passes. Some work on generic SSA before lowering (like CSE). Lowering converts everything to architecture-specific instructions. Then more passes optimize the specialized code (register allocation, instruction scheduling). Each pass creates opportunities for the next one.
By the end, your code has been transformed into tight, efficient, architecture-specific SSA ready for assembly generation.
If you want to dive deeper, explore the actual SSA code in src/cmd/compile/internal/ssa/. The rewrite rules (used by the lower and optimization passes) are particularly interesting—they’re written in a domain-specific language in src/cmd/compile/internal/ssa/_gen/ and automatically compiled into Go code.
In the next post, I’ll cover assembly generation and machine code encoding—how the compiler takes this optimized SSA and turns it into the actual bytes that run on your CPU.
