In the previous posts, we’ve explored how the Go compiler processes your code: the scanner breaks it into tokens, the parser builds an Abstract Syntax Tree, the type checker validates everything, and the Unified IR format serializes the type-checked AST into a compact binary representation.
Now we’re at a critical transformation point. The compiler takes that Unified IR—whether it was just serialized from your code or loaded from a cached archive file—and deserializes it directly into IR nodes. This is where your source code truly becomes the compiler’s working format.
The IR isn’t just another representation of your code—it’s a representation optimized for what comes next: analysis and transformation. The compiler needs to answer questions that are hard to answer from the AST:
- Can this allocation stay on the stack, or must it go to the heap?
- Can this interface call be replaced with a direct call?
- Can this function be inlined?
- What variables are never used?
The AST is great at representing what you wrote—files, declarations, statements mirroring your source structure. But the compiler needs something different: code organized by compilation units, with explicit operation types, embedded type information, and all implicit operations made visible.
That’s the Intermediate Representation, or IR.
The Unified IR format we covered in the previous post is the serialized form—the compact binary encoding that goes into archive files. What we’re covering now is the in-memory form—the actual data structures the compiler manipulates during optimization and code generation. The deserialization process in src/cmd/compile/internal/noder/reader.go transforms the binary format directly into these IR nodes, giving the compiler the working format it needs.
Let’s see what these IR nodes actually look like and how they differ from what came before.
What is the IR?
The IR organizes code by package, not by file. Files are an artifact of how you organized your source—the compiler cares about packages as compilation units. Here’s what the package structure looks like (from src/cmd/compile/internal/ir/package.go:9-42):
```go
type Package struct {
    Imports       []*types.Pkg // Imported packages
    Inits         []*Func      // Package init functions
    Funcs         []*Func      // Top-level functions
    Externs       []*Name      // Package-level declarations
    AsmHdrDecls   []*Name      // Assembly declarations
    CgoPragmas    [][]string   // Cgo directives
    Embeds        []*Name      // Variables with //go:embed
    PluginExports []*Name      // Exported plugin symbols
}
```
This structure tells us what the compiler actually cares about at the package level: what other packages does this package depend on (Imports)? What initialization functions need to run before anything else (Inits)? What are all the functions (Funcs) and package-level declarations (Externs)? Everything else—the actual implementation code—lives inside these top-level structures. Function bodies, for example, are stored inside the Func nodes.
Inside those function bodies, the IR represents every operation as a node. Each node has an operation code (or “op”) that identifies what kind of operation it represents. There are about 150 different operation types:
- `OADD` - addition operation
- `OIF` - if statement
- `OCALL` - function call
- `OCONVIFACE` - conversion to interface
- `OLITERAL` - literal value
- And many more…
These operation codes are the building blocks of the IR. Every expression, every statement, every operation in your Go code gets represented as a node with a specific op code. The nodes form a tree structure, just like the AST, but with key differences that make them optimized for the compiler’s needs.
Node Structure: How Nodes Are Built
Every IR node starts with the same foundation—a miniNode (src/cmd/compile/internal/ir/mini.go:16-87):
```go
type miniNode struct {
    pos  src.XPos // source position
    op   Op       // operation type (OADD, OIF, OCALL, etc.)
    bits bitset8  // flags (typecheck status, walked)
    esc  uint16   // escape analysis result
}
```
Four essential pieces: where this came from (pos), what operation it is (op), some status flags (bits), and the escape analysis result (esc).
From here, nodes split into two categories. Expressions produce values, so they add type information:
```go
type miniExpr struct {
    miniNode          // embedded base
    typ   *types.Type // type information
    init  Nodes       // initialization statements
    flags bitset8     // expression-specific flags
}
```
Statements perform actions without producing values, so they skip the type field:
```go
type miniStmt struct {
    miniNode   // embedded base
    init Nodes // initialization statements
}
```
Then specific operations build on these. Binary operations like x + y? That’s a BinaryExpr:
```go
type BinaryExpr struct {
    miniExpr // embedded expression fields
    X Node   // left operand
    Y Node   // right operand
}
```
If statements? That’s IfStmt:
```go
type IfStmt struct {
    miniStmt   // embedded statement fields
    Cond Node  // condition expression
    Body Nodes // statements when true
    Else Nodes // statements when false
}
```
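This is ordinary Go struct embedding, and you can mimic the pattern outside the compiler. Here's a minimal, runnable sketch—made-up types standing in for the real `ir` package, not the compiler's actual definitions:

```go
package main

import "fmt"

// Op identifies what kind of operation a node represents.
type Op uint8

const (
    OADD Op = iota // addition
    OIF            // if statement
)

// miniNode holds the fields every node shares.
type miniNode struct {
    op Op
}

func (n *miniNode) Op() Op { return n.op }

// miniExpr adds what only expressions need: a type.
type miniExpr struct {
    miniNode
    typ string // stand-in for *types.Type
}

// BinaryExpr adds only what's specific to it: the operands.
type BinaryExpr struct {
    miniExpr
    X, Y string // stand-ins for the operand Nodes
}

func main() {
    // The node for x + y: op and type live in the embedded bases.
    n := BinaryExpr{
        miniExpr: miniExpr{miniNode: miniNode{op: OADD}, typ: "int"},
        X:        "x",
        Y:        "y",
    }
    fmt.Println(n.Op() == OADD, n.typ) // true int
}
```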
This layered design keeps it efficient—common fields live at the base, specific nodes only add what they need. Now let’s see what the compiler actually does with these nodes.
The Optimization Pipeline
Once the IR is built, the compiler runs several optimizations at the IR level—before converting to SSA. The key optimizations are:
- Devirtualization: Converting indirect calls (interface methods) into direct calls when the concrete type is known
- Inlining: Replacing function calls with the function’s body
- Escape Analysis: Determining whether values can stay on the stack or must escape to the heap
- Dead Locals Elimination: Removing assignments to unused local variables
The most interesting aspect is that devirtualization and inlining run interleaved—they loop together until no more optimizations are possible. Devirtualization enables inlining, and inlining can expose more devirtualization opportunities.
Let’s start with devirtualization.
Devirtualization: Making the Indirect Direct
Devirtualization is the process of converting indirect calls (interface method calls) into direct calls when the concrete type is known.
Why does this matter? Three reasons:
- Inlining: Direct calls can be inlined; indirect calls cannot
- Better optimizations: Direct calls have known side effects
- Reduced overhead: No virtual dispatch or interface lookup
Let’s see this in action.
Static Devirtualization
Consider this code:
```go
type Processor interface {
    Process(x int) int
}

type SimpleProcessor struct{}

func (s SimpleProcessor) Process(x int) int {
    return x * 2
}

func compute(p Processor, x int) int {
    return p.Process(x) // Interface call
}
```
The call p.Process(x) goes through the interface dispatch mechanism—your program looks up the method in the interface’s method table at runtime. This is slower and cannot be inlined.
But the compiler can often prove what the concrete type is. When you call compute like this:
```go
func main() {
    proc := SimpleProcessor{}
    result := compute(proc, 42) // Concrete type visible!
}
```
The compiler traces back through the conversions and sees: “Ah, p is actually a SimpleProcessor!” It then devirtualizes the call:
```go
func compute(p Processor, x int) int {
    // Devirtualized: type assertion inserted (no runtime check needed)
    return p.(SimpleProcessor).Process(x) // Direct call!
}
```
This transformation happens in src/cmd/compile/internal/devirtualize/devirtualize.go. The algorithm is straightforward:
- Find the interface call (`OCALLINTER` operation)
- Trace back to where the interface value was set
- Extract the concrete type from that assignment
- Replace the interface call with a direct method call
The OCALLINTER (interface call) becomes OCALLMETH (method call)—a direct call that can now be inlined.
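To watch this happen on real code, compile a small program with `-gcflags='-m'`. The file below is a sketch you can build yourself; on recent Go versions the compiler prints messages along the lines of `inlining call to compute` and `devirtualizing p.Process to SimpleProcessor` (exact wording and positions vary by version):

```go
// devirt.go — try: go build -gcflags='-m' devirt.go
package main

type Processor interface {
    Process(x int) int
}

type SimpleProcessor struct{}

func (s SimpleProcessor) Process(x int) int { return x * 2 }

func compute(p Processor, x int) int {
    return p.Process(x) // interface call until optimization proves the type
}

func main() {
    proc := SimpleProcessor{}
    println(compute(proc, 42))
}
```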
But the compiler has another devirtualization trick up its sleeve.
Profile-Guided Devirtualization
Static devirtualization only works when the compiler can prove the concrete type. But what if the type varies at runtime?
That’s where Profile-Guided Optimization (PGO) comes in. With PGO, the compiler uses runtime profile data to identify hot call sites and inserts conditional devirtualization:
```go
// Original:
func process(i Processor) {
    i.Process() // Hot call site, profile shows 95% SimpleProcessor
}

// After PGO devirtualization:
func process(i Processor) {
    if concrete, ok := i.(SimpleProcessor); ok {
        concrete.Process() // Direct call - fast path!
    } else {
        i.Process() // Fallback - slow path
    }
}
```
The fast path is a direct call that can be inlined. The slow path handles the other 5% of cases. The result? Massive speedups on hot paths.
This happens in src/cmd/compile/internal/devirtualize/pgo.go. The algorithm:
- Query the profile for this call site
- Find the hottest callee (highest execution count)
- Generate a type assertion and conditional branch
- Inline the fast path if possible
The key insight: PGO devirtualization is only applied when the fast path can be inlined. Otherwise, the overhead of the conditional outweighs the benefit.
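If you want to try PGO yourself: since Go 1.21 the go tool applies a profile automatically when one is present. Roughly (file paths here are examples):

```
# Collect a CPU profile from a representative run (e.g. net/http/pprof or
# `go test -cpuprofile`), then save it as default.pgo in the main package:
go build -pgo=auto        # the default; picks up default.pgo if present
go build -pgo=./cpu.pprof # or point at a specific profile file
```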
We’ve mentioned inlining several times now. Let’s see how it actually works.
Function Inlining: Eliminating Calls
Function inlining replaces a function call with the function’s body. This is a very powerful compiler optimization:
```go
// Before:
func add(x, y int) int {
    return x + y
}

func compute() int {
    return add(5, 10)
}

// After inlining:
func compute() int {
    return 5 + 10 // Function body copied, call eliminated
}
```
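You can watch the compiler make exactly this decision. Building the snippet above with `-gcflags='-m'` prints something like the following—positions depend on your file, and wording varies slightly across Go versions:

```
$ go build -gcflags='-m' main.go
./main.go:4:6: can inline add
./main.go:8:6: can inline compute
./main.go:9:13: inlining call to add
```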
Eliminating the call is just the beginning. Inlining exposes the function body to the caller’s context, enabling:
- Constant propagation: The compiler sees `5 + 10` and folds it to `15`
- Unused variable removal: Parameters and locals that aren't needed become obvious
- More devirtualization: Concrete types become visible
- Stack allocation: Variables that would escape can now stay on the stack
But inlining has a cost: it increases code size. Inline too aggressively, and your binary bloats. The compiler needs a strategy to decide what to inline.
The Cost Model
Go uses a node-based budget system. Each function gets a “budget” of nodes it can contain and still be inlinable:
| Budget Type | Value | When Applied |
|---|---|---|
| Default | 80 nodes | Most functions |
| Closure called once | 800 nodes | Single-use closures (10×!) |
| PGO hot function | 2000 nodes | Profile-identified hot functions (25×!) |
The compiler walks the function’s IR tree and counts nodes. Most operations cost 1 node. Some are more expensive:
- Function calls: 57 nodes (expensive!)
- Interface calls: 57 nodes
- Calls to known inlinable functions: Use that function’s actual cost
- Panic calls: Nearly free (1 node)
Here’s the clever part: if a function calls another inlinable function, the compiler uses the callee’s cost instead of the generic call cost. This rewards building programs from small, composable functions.
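To make the accounting concrete, here's a toy version of the walk. This is not the compiler's `hairyVisitor`—just the same idea over a made-up node type: charge every node, charge calls heavily, and give up if the budget goes negative:

```go
package main

import "fmt"

// node is a stand-in for an IR node; the real visitor walks ir.Node trees.
type node struct {
    op       string
    children []*node
}

// costOf mirrors the idea above: most nodes cost 1, calls cost 57.
func costOf(op string) int {
    switch op {
    case "CALL", "CALLINTER":
        return 57
    default:
        return 1
    }
}

// withinBudget charges each node against the budget; going negative
// means the function is "too hairy" to inline.
func withinBudget(n *node, budget *int) bool {
    *budget -= costOf(n.op)
    if *budget < 0 {
        return false
    }
    for _, c := range n.children {
        if !withinBudget(c, budget) {
            return false
        }
    }
    return true
}

func main() {
    // return x + y  →  RETURN(ADD(NAME, NAME)): four nodes, cost 4.
    body := &node{op: "RETURN", children: []*node{
        {op: "ADD", children: []*node{{op: "NAME"}, {op: "NAME"}}},
    }}
    budget := 80
    fmt.Println(withinBudget(body, &budget), "remaining:", budget) // true remaining: 76
}
```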
So how does inlining actually happen? It’s a two-phase process.
The Inlining Process
First, the compiler figures out what’s inlinable:
Phase 1: Analysis (CanInline)
The compiler determines if each function is inlinable:
```go
func CanInline(fn *ir.Func, profile *pgoir.Profile) {
    // Check hard constraints (go:noinline pragma, etc.)
    if reason := InlineImpossible(fn); reason != "" {
        return
    }

    // Calculate cost by walking the IR tree
    budget := inlineBudget(fn, profile)
    visitor := hairyVisitor{
        budget: budget,
    }
    visitor.visitList(fn.Body)

    // Store result: cost = starting budget minus what's left
    if visitor.budget >= 0 {
        fn.Inl = &ir.Inline{
            Cost: budget - visitor.budget,
        }
    }
}
```
The “hairiness visitor” walks the IR tree, decrementing the budget for each node. If the budget hits zero, the function is “too hairy” (too complex) to inline.
Once the compiler knows what can be inlined, it’s time to actually do it.
Phase 2: Transformation (TryInlineCall)
At call sites, the compiler decides whether to inline:
```go
func mkinlcall(call *ir.CallExpr, fn *ir.Func) ir.Node {
    // Copy the function body, substituting fresh variables
    body := ir.DeepCopy(fn.Inl.Body, inlvars)

    // Wrap it in an InlinedCallExpr that replaces the call
    res := ir.NewInlinedCallExpr(...)
    res.Body = body
    return res
}
```
The function body is copied into the caller, with parameters replaced by arguments. The call to add(5, 10) gets replaced with an InlinedCallExpr node that contains the function’s body—with x and y bound to 5 and 10. The call disappears, and the function’s statements become part of the caller.
But the budget system is just the beginning. The compiler has gotten much smarter about when to inline.
Advanced Heuristics
The compiler uses sophisticated heuristics for smarter inlining decisions (implemented in src/cmd/compile/internal/inline/inlheur/scoring.go). Beyond just the function’s size, it looks at the context of each call site and adjusts the score up or down based on what optimizations inlining might unlock.
Some adjustments discourage inlining by increasing the score. If a call is on a panic path—code that unconditionally leads to panic or exit—the compiler adds 40 points to discourage inlining. Why? Panic paths rarely execute, so inlining them wastes code space. Similarly, calls in init() functions get penalized because init runs once at startup; inlining provides minimal benefit.
Other adjustments encourage inlining by decreasing the score. Calls inside loops get a small boost (-5) because they execute repeatedly—inlining saves call overhead many times over.
The most interesting adjustments involve parameters and return values. The compiler analyzes what you’re passing to the function and what it returns, looking for optimization opportunities. Passing a concrete type that gets converted to an interface for a method call? That’s -30 points—inlining enables devirtualization. Passing a constant that feeds into an if statement? The compiler knows inlining will enable branch elimination.
There are about 14 different adjustments organized into three categories: context-based (where the call is), parameter-based (what you’re passing), and return-value-based (what the caller does with the result). The compiler doesn’t just look at function size—it predicts the optimization opportunities that inlining will unlock.
There’s also a safety mechanism for big functions. When a function grows beyond 5000 nodes, the compiler considers it “big” and becomes much more conservative about inlining into it. In these large functions, the compiler will only inline functions that cost 20 nodes or less. This prevents already-large functions from exploding in size and keeps compilation times reasonable.
Now here’s where devirtualization and inlining really work their magic together.
The Interleaved Strategy
Devirtualization and inlining run together in a loop until no more optimizations are possible:
```dot
digraph InterleaveStrategy {
    rankdir=LR;
    node [shape=box, style=rounded];

    Devirt [label="Devirtualization"];
    Inline [label="Inlining"];
    Check  [label="More changes?", shape=diamond];
    Done   [label="Escape Analysis"];

    Devirt -> Inline;
    Inline -> Check;
    Check -> Devirt [label="Yes"];
    Check -> Done [label="No"];
}
```
The loop is simple: devirtualization converts indirect calls to direct ones, inlining exposes those function bodies and reveals more concrete types, then the compiler checks if any changes were made. If yes, it runs another round—the newly exposed types might enable more devirtualization. If no changes happened, we’ve reached a fixed point and move on to escape analysis.
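Schematically, the driver looks like the sketch below—stubbed passes standing in for the real ones (the actual loop in interleaved.go works per function and per call site):

```go
package main

import "fmt"

func main() {
    opportunities := 3 // pretend three optimizations are waiting to be found

    // pass stands in for one run of devirtualization or inlining.
    pass := func(name string) bool {
        if opportunities > 0 {
            opportunities--
            fmt.Println(name, "changed the IR")
            return true
        }
        return false
    }

    for round := 1; ; round++ {
        changed := pass("devirtualize")
        if pass("inline") {
            changed = true
        }
        if !changed {
            fmt.Println("fixed point after round", round)
            break // hand off to escape analysis
        }
    }
}
```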
Let me show you why this matters with a concrete example:
```go
type Processor interface {
    Process(int) int
}

type ConcreteProcessor struct{}

func (c ConcreteProcessor) Process(x int) int {
    return x * 2
}

func helper(p Processor, x int) int {
    return p.Process(x) // Interface call
}

func compute(p Processor, x int) int {
    return helper(p, x) // Call to helper
}

func main() {
    proc := ConcreteProcessor{}
    result := compute(proc, 42)
    _ = result // keep the example compilable
}
```
Iteration 1:
Initial state in main:
result := compute(proc, 42)
Process compute(proc, 42):
- Devirtualize: Not applicable (direct call)
- Inline: ✓ Inline `compute` → exposes `helper(p, x)`
After Iteration 1:
result := helper(proc, 42) // New call discovered!
Iteration 2:
Process helper(proc, 42):
- Devirtualize: Not applicable (direct call)
- Inline: ✓ Inline `helper` → exposes `p.Process(x)`
After Iteration 2:
var p Processor = proc // OCONVIFACE
result := p.Process(42) // OCALLINTER - New interface call!
Iteration 3:
Process p.Process(42):
- Devirtualize: ✓ `devirtualize.StaticCall` sees `OCONVIFACE` → changes `OCALLINTER` to `OCALLMETH`
- Inline: If `Process` is small enough, inline it too!
After Iteration 3:
result := ConcreteProcessor.Process(proc, 42) // Direct call
// OR if Process inlines:
result := 42 * 2 // Inlined body of Process
Iteration 4:
- No changes found
- Loop exits (fixed point reached)
Without interleaving, we wouldn’t be able to inline the devirtualized method call. With interleaving, we eliminated three function calls by repeatedly inlining and devirtualizing until no more opportunities remained.
This interleaved strategy is implemented in src/cmd/compile/internal/inline/interleaved/interleaved.go. It’s the heart of Go’s IR optimizer.
With calls optimized and function bodies exposed, the compiler now has a crucial decision to make about every variable: where should it live?
Escape Analysis: Heap or Stack?
After devirtualization and inlining, the compiler runs escape analysis. This determines whether variables can be stack-allocated or must be heap-allocated.
The stack is a region of memory that grows and shrinks with function calls. Each function gets a “stack frame” containing its local variables. When the function returns, that frame is popped off—instantly freeing all its memory. Stack allocation is just bumping a pointer; it’s incredibly fast and requires no cleanup.
The heap is a region of memory for data that needs to live beyond a single function call. Allocating on the heap is slower—it involves finding free space and requires the GC to clean it up later. But the heap is necessary when data needs to outlive the function that created it.
Let’s see this difference in action:
```go
func foo() *int {
    x := 42   // Escapes: address returned
    return &x // Must be on heap!
}

func bar() int {
    x := 42  // Doesn't escape
    return x // Can stay on stack
}
```
Look at bar first. The variable x is created, used, and then its value (42) is returned. Once bar finishes, x is gone—nobody needs it anymore. The stack frame disappears, and that’s fine. Stack allocation is perfect here: fast and automatically cleaned up.
Now foo. Here we’re returning &x—a pointer to x. The caller will use this pointer to access x after foo returns. But here’s the problem: if x lived on the stack, that stack frame would be destroyed when foo returns. The pointer would point to memory that’s been reclaimed, potentially overwritten by the next function call. That’s a dangling pointer—classic undefined behavior.
So the compiler moves x to the heap. Heap memory persists after the function returns. The pointer stays valid, and the garbage collector will clean up x later when nobody’s using it anymore. The variable escapes the function’s scope—it needs to outlive the function call.
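Escape analysis reports this decision under `-gcflags='-m'`. For the foo/bar pair above you'd see something like this (positions depend on your file; other output trimmed):

```
$ go build -gcflags='-m' escape.go
./escape.go:4:2: moved to heap: x
```

Only the `x` in foo gets the message—the `x` in bar stays on the stack, and the compiler says nothing about it.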
But detecting these cases in real code isn’t as straightforward as our simple examples. The compiler needs a systematic way to track where every value flows.
How It Works
The compiler needs to answer one question for every variable: can this stay on the stack, or must it go to the heap?
Here’s a simple example:
```go
type Data struct {
    value int
}

func process() *Data {
    x := Data{value: 10}
    y := &x
    z := y
    return z
}
```
The compiler traces through this code, following where x goes:
Step 1: Look at what gets returned. We’re returning z. That’s a pointer, and it’s leaving the function—whoever called process() will use it. So whatever z points to must survive after process() returns.
Step 2: Where does z come from? It’s assigned from y. So whatever y points to must survive too.
Step 3: Where does y come from? It’s the address of x (&x). So y points to x, which means x must survive—it has to outlive the function.
Conclusion: x must go to the heap. If we put x on the stack, it would disappear when process() returns, and the pointer we returned would point to garbage memory.
Here’s the key insight: the compiler works backward. It starts from the obvious escape points (returns, global variables, values passed to unknown functions) and traces backward through assignments to find what else must escape. The compiler keeps tracing backward until it’s checked everything. At the end, every variable is marked: stack or heap. These marks stay attached to the variables through the rest of compilation—when the compiler generates the final machine code, it’ll use these marks to decide whether to allocate each variable on the stack or call into the heap allocator.
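Here's the same backward walk as a toy program—just a map of "assigned from" edges for the process example above, nothing like the compiler's real data-flow graph:

```go
package main

import "fmt"

func main() {
    // For process(): record where each value was assigned from.
    assignedFrom := map[string]string{
        "z": "y",  // z := y
        "y": "&x", // y := &x
    }

    // Start at the escape point (the returned value) and walk backward.
    escapes := map[string]bool{}
    for v := "z"; v != ""; v = assignedFrom[v] {
        escapes[v] = true
    }

    fmt.Println("x must be heap-allocated:", escapes["&x"]) // true
}
```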
After all this inlining and optimization, there’s usually some garbage left behind—variables that were copied in but never actually used. Time for one final cleanup pass.
Dead Locals Elimination: Cleaning Up
The final IR-level optimization is dead locals elimination. After all that inlining, there are often variables lying around that were copied into the caller’s context but never actually used. Dead locals elimination finds and removes them.
The compiler walks through your code with a simple strategy: assume everything is dead until proven otherwise. When it sees an assignment like x := 10, it records that assignment as “potentially dead.” When it sees that variable used somewhere—y := x + 5—it marks the variable as “definitely live.” At the end of the scan, any variables still marked “potentially dead” get their assignments removed.
Here’s the important safety check: the compiler only removes assignments where the right-hand side has no side effects. Assignments like x := 42 or x := otherVariable can be safely removed if x is never used—they’re just numbers or reads. But x := expensiveFunc() stays even if x is unused, because calling that function might do important work: writing to a file, sending a network request, modifying global state.
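As a toy model of that scan—over a hand-rolled list of "assignments" rather than real IR—the two rules look like this:

```go
package main

import "fmt"

// assignment is a toy stand-in for an IR assignment statement.
type assignment struct {
    dst        string   // variable being assigned ("" for non-assignments)
    reads      []string // variables read on the right-hand side
    sideEffect bool     // does the right-hand side do work (e.g. a call)?
}

func main() {
    code := []assignment{
        {dst: "x"},                       // x := 42
        {dst: "y", reads: []string{"x"}}, // y := x + 5
        {dst: "z", sideEffect: true},     // z := expensiveFunc()
        {reads: []string{"x"}},           // return x
    }

    // Pass 1: assume everything is dead; a read proves a variable live.
    live := map[string]bool{}
    for _, a := range code {
        for _, r := range a.reads {
            live[r] = true
        }
    }

    // Pass 2: remove dead assignments — but only side-effect-free ones.
    for _, a := range code {
        if a.dst != "" && !live[a.dst] && !a.sideEffect {
            fmt.Println("removing dead assignment to", a.dst) // y
        }
    }
}
```

Here `y` is removed, while `z` survives because `expensiveFunc()` must still run.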
This pass is the cleanup crew—it removes the garbage left behind after inlining and devirtualization merge functions together. Variables that served a purpose in the original function are often redundant in the merged context, and dead locals elimination sweeps them away.
That covers the major IR-level optimizations. These passes work together—each one sets up opportunities for the next—transforming your high-level Go code into something much more efficient before it even reaches the SSA phase.
Summary
The IR is where some of the compiler’s magic happens. After your code is parsed and type-checked, the IR becomes the working format for optimization.
We’ve covered the major IR-level optimizations: devirtualization converts interface calls to direct calls, inlining copies function bodies to eliminate calls and expose more optimization opportunities, and these two run together in a loop until no more changes are possible. Escape analysis marks variables for stack or heap allocation. Dead locals elimination removes unused variable assignments (when safe).
These optimizations form a cascade—each one sets up opportunities for the next. The result is code that’s dramatically more efficient than what you wrote, while you get to write high-level, idiomatic Go.
Want to see these optimizations in action on your own code? The -gcflags='-m' flag tells the Go compiler to print optimization decisions about devirtualization, inlining, and escape analysis. The flag has multiple levels that control the amount of detail:
- `-m`: Shows devirtualization, inlining decisions, and escape analysis
- `-m=2`: More detailed information (including why things aren't inlined)
- `-m=3`: Even more details
- `-m=4`: Maximum verbosity
Try go build -gcflags='-m' your_file.go to see the basic optimization decisions, or use -m=2 to get more detail including why certain functions weren’t inlined. The compiler will print out each decision it makes.
In the next post in this series, I’ll cover the SSA phase—where the IR gets converted to Static Single Assignment form and goes through even more sophisticated optimizations.
