In the previous posts, we’ve explored how the Go compiler processes your code: the scanner breaks it into tokens, the parser builds an Abstract Syntax Tree, the type checker validates everything, and the Unified IR format serializes the type-checked AST into a compact binary representation.
Now we’re at a critical transformation point. The compiler takes that Unified IR—whether it was just serialized from your code or loaded from a cached archive file—and deserializes it directly into IR nodes. This is where your source code truly becomes the compiler’s working format.
The IR isn’t just another representation of your code—it’s a representation optimized for what comes next: analysis and transformation. The compiler needs to answer questions that are hard to answer from the AST:
- Can this allocation stay on the stack, or must it go to the heap?
- Can this interface call be replaced with a direct call?
- Can this function be inlined?
- What variables are never used?
The AST is great at representing what you wrote—files, declarations, statements mirroring your source structure. But the compiler needs something different: code organized by compilation units, with explicit operation types, embedded type information, and all implicit operations made visible.
That’s the Intermediate Representation, or IR.
The Unified IR format we covered in the previous post is the serialized form—the compact binary encoding that goes into archive files. What we’re covering now is the in-memory form—the actual data structures the compiler manipulates during optimization and code generation. The deserialization process in src/cmd/compile/internal/noder/reader.go transforms the binary format directly into these IR nodes, giving the compiler the working format it needs.
Let’s see what these IR nodes actually look like and how they differ from what came before.
What is the IR?
The IR organizes code by package, not by file. Files are an artifact of how you organized your source—the compiler cares about packages as compilation units. Here’s what the package structure looks like (from src/cmd/compile/internal/ir/package.go:9-42):
```go
type Package struct {
    Imports       []*types.Pkg // Imported packages
    Inits         []*Func      // Package init functions
    Funcs         []*Func      // Top-level functions
    Externs       []*Name      // Package-level declarations
    AsmHdrDecls   []*Name      // Assembly declarations
    CgoPragmas    [][]string   // Cgo directives
    Embeds        []*Name      // Variables with //go:embed
    PluginExports []*Name      // Exported plugin symbols
}
```
This structure tells us what the compiler actually cares about at the package level: what other packages does this package depend on (Imports)? What initialization functions need to run before anything else (Inits)? What are all the functions (Funcs) and package-level declarations (Externs)? Everything else—the actual implementation code—lives inside these top-level structures. Function bodies, for example, are stored inside the Func nodes.
Inside those function bodies, the IR represents every operation as a node. Each node has an operation code (or “op”) that identifies what kind of operation it represents. There are about 150 different operation types:
- `OADD` - addition operation
- `OIF` - if statement
- `OCALL` - function call
- `OCONVIFACE` - conversion to interface
- `OLITERAL` - literal value
- And many more…
These operation codes are the building blocks of the IR. Every expression, every statement, every operation in your Go code gets represented as a node with a specific op code. The nodes form a tree structure, just like the AST, but with key differences that make them optimized for the compiler’s needs.
Node Structure: How Nodes Are Built
Every IR node starts with the same foundation—a miniNode (src/cmd/compile/internal/ir/mini.go:16-87):
```go
type miniNode struct {
    pos  src.XPos // source position
    op   Op       // operation type (OADD, OIF, OCALL, etc.)
    bits bitset8  // flags (typecheck status, walked)
    esc  uint16   // escape analysis result
}
```
Four essential pieces: where this came from (pos), what operation it is (op), some status flags (bits), and the escape analysis result (esc).
From here, nodes split into two categories. Expressions produce values, so they add type information:
```go
type miniExpr struct {
    miniNode          // embedded base
    typ   *types.Type // type information
    init  Nodes       // initialization statements
    flags bitset8     // expression-specific flags
}
```
Statements perform actions without producing values, so they skip the type field:
```go
type miniStmt struct {
    miniNode   // embedded base
    init Nodes // initialization statements
}
```
Then specific operations build on these. Binary operations like x + y? That’s a BinaryExpr:
```go
type BinaryExpr struct {
    miniExpr // embedded expression fields
    X Node   // left operand
    Y Node   // right operand
}
```
If statements? That’s IfStmt:
```go
type IfStmt struct {
    miniStmt   // embedded statement fields
    Cond Node  // condition expression
    Body Nodes // statements when true
    Else Nodes // statements when false
}
```
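This is ordinary Go struct embedding, and you can mimic the pattern outside the compiler. Here's a minimal, runnable sketch—made-up types standing in for the real `ir` package, not the compiler's actual definitions:

```go
package main

import "fmt"

// Op identifies what kind of operation a node represents.
type Op uint8

const (
    OADD Op = iota // addition
    OIF            // if statement
)

// miniNode holds the fields every node shares.
type miniNode struct {
    op Op
}

func (n *miniNode) Op() Op { return n.op }

// miniExpr adds what only expressions need: a type.
type miniExpr struct {
    miniNode
    typ string // stand-in for *types.Type
}

// BinaryExpr adds only what's specific to it: the operands.
type BinaryExpr struct {
    miniExpr
    X, Y string // stand-ins for the operand Nodes
}

func main() {
    // The node for x + y: op and type live in the embedded bases.
    n := BinaryExpr{
        miniExpr: miniExpr{miniNode: miniNode{op: OADD}, typ: "int"},
        X:        "x",
        Y:        "y",
    }
    fmt.Println(n.Op() == OADD, n.typ) // true int
}
```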
This layered design keeps it efficient—common fields live at the base, specific nodes only add what they need. Now let’s see what the compiler actually does with these nodes.
The Optimization Pipeline
Once the IR is built, the compiler runs several optimizations at the IR level—before converting to SSA. The key optimizations are:
- Devirtualization: Converting indirect calls (interface methods) into direct calls when the concrete type is known
- Inlining: Replacing function calls with the function’s body
- Escape Analysis: Determining whether values can stay on the stack or must escape to the heap
- Dead Locals Elimination: Removing assignments to unused local variables
The most interesting aspect is that devirtualization and inlining run interleaved—they loop together until no more optimizations are possible. Devirtualization enables inlining, and inlining can expose more devirtualization opportunities.
Let’s start with devirtualization.
Devirtualization: Making the Indirect Direct
Devirtualization is the process of converting indirect calls (interface method calls) into direct calls when the concrete type is known.
Why does this matter? Three reasons:
- Inlining: Direct calls can be inlined; indirect calls cannot
- Better optimizations: Direct calls have known side effects
- Reduced overhead: No virtual dispatch or interface lookup
Let’s see this in action.
Static Devirtualization
Consider this code:
```go
type Processor interface {
    Process(x int) int
}

type SimpleProcessor struct{}

func (s SimpleProcessor) Process(x int) int {
    return x * 2
}

func compute(p Processor, x int) int {
    return p.Process(x) // Interface call
}
```
The call p.Process(x) goes through the interface dispatch mechanism—your program looks up the method in the interface’s method table at runtime. This is slower and cannot be inlined.
But the compiler can often prove what the concrete type is. When you call compute like this:
```go
func main() {
    proc := SimpleProcessor{}
    result := compute(proc, 42) // Concrete type visible!
}
```
The compiler traces back through the conversions and sees: “Ah, p is actually a SimpleProcessor!” It then devirtualizes the call:
```go
func compute(p Processor, x int) int {
    // Devirtualized: type assertion inserted (no runtime check needed)
    return p.(SimpleProcessor).Process(x) // Direct call!
}
```
This transformation happens in src/cmd/compile/internal/devirtualize/devirtualize.go. The algorithm is straightforward:
- Find the interface call (`OCALLINTER` operation)
- Trace back to where the interface value was set
- Extract the concrete type from that assignment
- Replace the interface call with a direct method call
The OCALLINTER (interface call) becomes OCALLMETH (method call)—a direct call that can now be inlined.
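To watch this happen on real code, compile a small program with `-gcflags='-m'`. The file below is a sketch you can build yourself; on recent Go versions the compiler prints messages along the lines of `inlining call to compute` and `devirtualizing p.Process to SimpleProcessor` (exact wording and positions vary by version):

```go
// devirt.go — try: go build -gcflags='-m' devirt.go
package main

type Processor interface {
    Process(x int) int
}

type SimpleProcessor struct{}

func (s SimpleProcessor) Process(x int) int { return x * 2 }

func compute(p Processor, x int) int {
    return p.Process(x) // interface call until optimization proves the type
}

func main() {
    proc := SimpleProcessor{}
    println(compute(proc, 42))
}
```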
But the compiler has another devirtualization trick up its sleeve.
Profile-Guided Devirtualization
Static devirtualization only works when the compiler can prove the concrete type. But what if the type varies at runtime?
That’s where Profile-Guided Optimization (PGO) comes in. With PGO, the compiler uses runtime profile data to identify hot call sites and inserts conditional devirtualization:
```go
// Original:
func process(i Processor) {
    i.Process() // Hot call site, profile shows 95% SimpleProcessor
}

// After PGO devirtualization:
func process(i Processor) {
    if concrete, ok := i.(SimpleProcessor); ok {
        concrete.Process() // Direct call - fast path!
    } else {
        i.Process() // Fallback - slow path
    }
}
```
The fast path is a direct call that can be inlined. The slow path handles the other 5% of cases. The result? Massive speedups on hot paths.
This happens in src/cmd/compile/internal/devirtualize/pgo.go. The algorithm:
- Query the profile for this call site
- Find the hottest callee (highest execution count)
- Generate a type assertion and conditional branch
- Inline the fast path if possible
The key insight: PGO devirtualization is only applied when the fast path can be inlined. Otherwise, the overhead of the conditional outweighs the benefit.
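If you want to try PGO yourself: since Go 1.21 the go tool applies a profile automatically when one is present. Roughly (file paths here are examples):

```
# Collect a CPU profile from a representative run (e.g. net/http/pprof or
# `go test -cpuprofile`), then save it as default.pgo in the main package:
go build -pgo=auto        # the default; picks up default.pgo if present
go build -pgo=./cpu.pprof # or point at a specific profile file
```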
We’ve mentioned inlining several times now. Let’s see how it actually works.
Function Inlining: Eliminating Calls
Function inlining replaces a function call with the function’s body. This is a very powerful compiler optimization:
```go
// Before:
func add(x, y int) int {
    return x + y
}

func compute() int {
    return add(5, 10)
}

// After inlining:
func compute() int {
    return 5 + 10 // Function body copied, call eliminated
}
```
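You can watch the compiler make exactly this decision. Building the snippet above with `-gcflags='-m'` prints something like the following—positions depend on your file, and wording varies slightly across Go versions:

```
$ go build -gcflags='-m' main.go
./main.go:4:6: can inline add
./main.go:8:6: can inline compute
./main.go:9:13: inlining call to add
```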
Eliminating the call is just the beginning. Inlining exposes the function body to the caller’s context, enabling:
- Constant propagation: The compiler sees `5 + 10` and folds it to `15`
- Unused variable removal: Parameters and locals that aren't needed become obvious
- More devirtualization: Concrete types become visible
- Stack allocation: Variables that would escape can now stay on the stack
But inlining has a cost: it increases code size. Inline too aggressively, and your binary bloats. The compiler needs a strategy to decide what to inline.
The Cost Model
Go uses a node-based budget system. Each function gets a “budget” of nodes it can contain and still be inlinable:
| Budget Type | Value | When Applied |
|---|---|---|
| Default | 80 nodes | Most functions |
| Closure called once | 800 nodes | Single-use closures (10×!) |
| PGO hot function | 2000 nodes | Profile-identified hot functions (25×!) |
The compiler walks the function’s IR tree and counts nodes. Most operations cost 1 node. Some are more expensive:
- Function calls: 57 nodes (expensive!)
- Interface calls: 57 nodes
- Calls to known inlinable functions: Use that function’s actual cost
- Panic calls: Nearly free (1 node)
Here’s the clever part: if a function calls another inlinable function, the compiler uses the callee’s cost instead of the generic call cost. This rewards building programs from small, composable functions.
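To make the accounting concrete, here's a toy version of the walk. This is not the compiler's `hairyVisitor`—just the same idea over a made-up node type: charge every node, charge calls heavily, and give up if the budget goes negative:

```go
package main

import "fmt"

// node is a stand-in for an IR node; the real visitor walks ir.Node trees.
type node struct {
    op       string
    children []*node
}

// costOf mirrors the idea above: most nodes cost 1, calls cost 57.
func costOf(op string) int {
    switch op {
    case "CALL", "CALLINTER":
        return 57
    default:
        return 1
    }
}

// withinBudget charges each node against the budget; going negative
// means the function is "too hairy" to inline.
func withinBudget(n *node, budget *int) bool {
    *budget -= costOf(n.op)
    if *budget < 0 {
        return false
    }
    for _, c := range n.children {
        if !withinBudget(c, budget) {
            return false
        }
    }
    return true
}

func main() {
    // return x + y  →  RETURN(ADD(NAME, NAME)): four nodes, cost 4.
    body := &node{op: "RETURN", children: []*node{
        {op: "ADD", children: []*node{{op: "NAME"}, {op: "NAME"}}},
    }}
    budget := 80
    fmt.Println(withinBudget(body, &budget), "remaining:", budget) // true remaining: 76
}
```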
So how does inlining actually happen? It’s a two-phase process.
The Inlining Process
First, the compiler figures out what’s inlinable:
Phase 1: Analysis (CanInline)
The compiler determines if each function is inlinable:
```go
func CanInline(fn *ir.Func, profile *pgoir.Profile) {
    // Check hard constraints (go:noinline pragma, etc.)
    if reason := InlineImpossible(fn); reason != "" {
        return
    }

    // Calculate cost by walking the IR tree
    budget := inlineBudget(fn, profile)
    visitor := hairyVisitor{
        budget: budget,
    }
    visitor.visitList(fn.Body)

    // Store result: cost = starting budget minus what's left
    if visitor.budget >= 0 {
        fn.Inl = &ir.Inline{
            Cost: budget - visitor.budget,
        }
    }
}
```
The “hairiness visitor” walks the IR tree, decrementing the budget for each node. If the budget hits zero, the function is “too hairy” (too complex) to inline.
Once the compiler knows what can be inlined, it’s time to actually do it.
Phase 2: Transformation (TryInlineCall)
At call sites, the compiler decides whether to inline:
```go
func mkinlcall(call *ir.CallExpr, fn *ir.Func) ir.Node {
    // Copy the function body, substituting fresh variables
    body := ir.DeepCopy(fn.Inl.Body, inlvars)

    // Wrap it in an InlinedCallExpr that replaces the call
    res := ir.NewInlinedCallExpr(...)
    res.Body = body
    return res
}
```
The function body is copied into the caller, with parameters replaced by arguments. The call to add(5, 10) gets replaced with an InlinedCallExpr node that contains the function’s body—with x and y bound to 5 and 10. The call disappears, and the function’s statements become part of the caller.
But the budget system is just the beginning. The compiler has gotten much smarter about when to inline.
Advanced Heuristics
The compiler uses sophisticated heuristics for smarter inlining decisions (implemented in src/cmd/compile/internal/inline/inlheur/scoring.go). Beyond just the function’s size, it looks at the context of each call site and adjusts the score up or down based on what optimizations inlining might unlock.
Some adjustments discourage inlining by increasing the score. If a call is on a panic path—code that unconditionally leads to panic or exit—the compiler adds 40 points to discourage inlining. Why? Panic paths rarely execute, so inlining them wastes code space. Similarly, calls in init() functions get penalized because init runs once at startup; inlining provides minimal benefit.
Other adjustments encourage inlining by decreasing the score. Calls inside loops get a small boost (-5) because they execute repeatedly—inlining saves call overhead many times over.
The most interesting adjustments involve parameters and return values. The compiler analyzes what you’re passing to the function and what it returns, looking for optimization opportunities. Passing a concrete type that gets converted to an interface for a method call? That’s -30 points—inlining enables devirtualization. Passing a constant that feeds into an if statement? The compiler knows inlining will enable branch elimination.
There are about 14 different adjustments organized into three categories: context-based (where the call is), parameter-based (what you’re passing), and return-value-based (what the caller does with the result). The compiler doesn’t just look at function size—it predicts the optimization opportunities that inlining will unlock.
There’s also a safety mechanism for big functions. When a function grows beyond 5000 nodes, the compiler considers it “big” and becomes much more conservative about inlining into it. In these large functions, the compiler will only inline functions that cost 20 nodes or less. This prevents already-large functions from exploding in size and keeps compilation times reasonable.
Now here’s where devirtualization and inlining really work their magic together.
The Interleaved Strategy
Devirtualization and inlining run together in a loop until no more optimizations are possible:
```dot
digraph InterleaveStrategy {
    rankdir=LR;
    node [shape=box, style=rounded];

    Devirt [label="Devirtualization"];
    Inline [label="Inlining"];
    Check  [label="More changes?", shape=diamond];
    Done   [label="Escape Analysis"];

    Devirt -> Inline;
    Inline -> Check;
    Check -> Devirt [label="Yes"];
    Check -> Done [label="No"];
}
```
The loop is simple: devirtualization converts indirect calls to direct ones, inlining exposes those function bodies and reveals more concrete types, then the compiler checks if any changes were made. If yes, it runs another round—the newly exposed types might enable more devirtualization. If no changes happened, we’ve reached a fixed point and move on to escape analysis.
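Schematically, the driver looks like the sketch below—stubbed passes standing in for the real ones (the actual loop in interleaved.go works per function and per call site):

```go
package main

import "fmt"

func main() {
    opportunities := 3 // pretend three optimizations are waiting to be found

    // pass stands in for one run of devirtualization or inlining.
    pass := func(name string) bool {
        if opportunities > 0 {
            opportunities--
            fmt.Println(name, "changed the IR")
            return true
        }
        return false
    }

    for round := 1; ; round++ {
        changed := pass("devirtualize")
        if pass("inline") {
            changed = true
        }
        if !changed {
            fmt.Println("fixed point after round", round)
            break // hand off to escape analysis
        }
    }
}
```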
Let me show you why this matters with a concrete example:
```go
type Processor interface {
    Process(int) int
}

type ConcreteProcessor struct{}

func (c ConcreteProcessor) Process(x int) int {
    return x * 2
}

func helper(p Processor, x int) int {
    return p.Process(x) // Interface call
}

func compute(p Processor, x int) int {
    return helper(p, x) // Call to helper
}

func main() {
    proc := ConcreteProcessor{}
    result := compute(proc, 42)
    _ = result // keep the example compilable
}
```
Iteration 1:
Initial state in main:
result := compute(proc, 42)
Process compute(proc, 42):
- Devirtualize: Not applicable (direct call)
- Inline: ✓ Inline `compute` → exposes `helper(p, x)`
After Iteration 1:
result := helper(proc, 42) // New call discovered!
Iteration 2:
Process helper(proc, 42):
- Devirtualize: Not applicable (direct call)
- Inline: ✓ Inline `helper` → exposes `p.Process(x)`
After Iteration 2:
var p Processor = proc // OCONVIFACE
result := p.Process(42) // OCALLINTER - New interface call!
Iteration 3:
Process p.Process(42):
- Devirtualize: ✓ `devirtualize.StaticCall` sees `OCONVIFACE` → changes `OCALLINTER` to `OCALLMETH`
- Inline: If `Process` is small enough, inline it too!
After Iteration 3:
result := ConcreteProcessor.Process(proc, 42) // Direct call
// OR if Process inlines:
result := 42 * 2 // Inlined body of Process
Iteration 4:
- No changes found
- Loop exits (fixed point reached)
Without interleaving, we wouldn’t be able to inline the devirtualized method call. With interleaving, we eliminated three function calls by repeatedly inlining and devirtualizing until no more opportunities remained.
This interleaved strategy is implemented in src/cmd/compile/internal/inline/interleaved/interleaved.go. It’s the heart of Go’s IR optimizer.
With calls optimized and function bodies exposed, the compiler now has a crucial decision to make about every variable: where should it live?
Escape Analysis: Heap or Stack?
After devirtualization and inlining, the compiler runs escape analysis. This determines whether variables can be stack-allocated or must be heap-allocated.
The stack is a region of memory that grows and shrinks with function calls. Each function gets a “stack frame” containing its local variables. When the function returns, that frame is popped off—instantly freeing all its memory. Stack allocation is just bumping a pointer; it’s incredibly fast and requires no cleanup.
The heap is a region of memory for data that needs to live beyond a single function call. Allocating on the heap is slower—it involves finding free space and requires the GC to clean it up later. But the heap is necessary when data needs to outlive the function that created it.
Let’s see this difference in action:
```go
func foo() *int {
    x := 42   // Escapes: address returned
    return &x // Must be on heap!
}

func bar() int {
    x := 42  // Doesn't escape
    return x // Can stay on stack
}
```
Look at bar first. The variable x is created, used, and then its value (42) is returned. Once bar finishes, x is gone—nobody needs it anymore. The stack frame disappears, and that’s fine. Stack allocation is perfect here: fast and automatically cleaned up.
Now foo. Here we’re returning &x—a pointer to x. The caller will use this pointer to access x after foo returns. But here’s the problem: if x lived on the stack, that stack frame would be destroyed when foo returns. The pointer would point to memory that’s been reclaimed, potentially overwritten by the next function call. That’s a dangling pointer—classic undefined behavior.
So the compiler moves x to the heap. Heap memory persists after the function returns. The pointer stays valid, and the garbage collector will clean up x later when nobody’s using it anymore. The variable escapes the function’s scope—it needs to outlive the function call.
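Escape analysis reports this decision under `-gcflags='-m'`. For the foo/bar pair above you'd see something like this (positions depend on your file; other output trimmed):

```
$ go build -gcflags='-m' escape.go
./escape.go:4:2: moved to heap: x
```

Only the `x` in foo gets the message—the `x` in bar stays on the stack, and the compiler says nothing about it.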
But detecting these cases in real code isn’t as straightforward as our simple examples. The compiler needs a systematic way to track where every value flows.
How It Works
The compiler needs to answer one question for every variable: can this stay on the stack, or must it go to the heap?
Here’s a simple example:
```go
type Data struct {
    value int
}

func process() *Data {
    x := Data{value: 10}
    y := &x
    z := y
    return z
}
```
The compiler traces through this code, following where x goes:
Step 1: Look at what gets returned. We’re returning z. That’s a pointer, and it’s leaving the function—whoever called process() will use it. So whatever z points to must survive after process() returns.
Step 2: Where does z come from? It’s assigned from y. So whatever y points to must survive too.
Step 3: Where does y come from? It’s the address of x (&x). So y points to x, which means x must survive—it has to outlive the function.
Conclusion: x must go to the heap. If we put x on the stack, it would disappear when process() returns, and the pointer we returned would point to garbage memory.
Here’s the key insight: the compiler works backward. It starts from the obvious escape points (returns, global variables, values passed to unknown functions) and traces backward through assignments to find what else must escape. The compiler keeps tracing backward until it’s checked everything. At the end, every variable is marked: stack or heap. These marks stay attached to the variables through the rest of compilation—when the compiler generates the final machine code, it’ll use these marks to decide whether to allocate each variable on the stack or call into the heap allocator.
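Here's the same backward walk as a toy program—just a map of "assigned from" edges for the process example above, nothing like the compiler's real data-flow graph:

```go
package main

import "fmt"

func main() {
    // For process(): record where each value was assigned from.
    assignedFrom := map[string]string{
        "z": "y",  // z := y
        "y": "&x", // y := &x
    }

    // Start at the escape point (the returned value) and walk backward.
    escapes := map[string]bool{}
    for v := "z"; v != ""; v = assignedFrom[v] {
        escapes[v] = true
    }

    fmt.Println("x must be heap-allocated:", escapes["&x"]) // true
}
```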
After all this inlining and optimization, there’s usually some garbage left behind—variables that were copied in but never actually used. Time for one final cleanup pass.
Dead Locals Elimination: Cleaning Up
The final IR-level optimization is dead locals elimination. After all that inlining, there are often variables lying around that were copied into the caller’s context but never actually used. Dead locals elimination finds and removes them.
The compiler walks through your code with a simple strategy: assume everything is dead until proven otherwise. When it sees an assignment like x := 10, it records that assignment as “potentially dead.” When it sees that variable used somewhere—y := x + 5—it marks the variable as “definitely live.” At the end of the scan, any variables still marked “potentially dead” get their assignments removed.
Here’s the important safety check: the compiler only removes assignments where the right-hand side has no side effects. Assignments like x := 42 or x := otherVariable can be safely removed if x is never used—they’re just numbers or reads. But x := expensiveFunc() stays even if x is unused, because calling that function might do important work: writing to a file, sending a network request, modifying global state.
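As a toy model of that scan—over a hand-rolled list of "assignments" rather than real IR—the two rules look like this:

```go
package main

import "fmt"

// assignment is a toy stand-in for an IR assignment statement.
type assignment struct {
    dst        string   // variable being assigned ("" for non-assignments)
    reads      []string // variables read on the right-hand side
    sideEffect bool     // does the right-hand side do work (e.g. a call)?
}

func main() {
    code := []assignment{
        {dst: "x"},                       // x := 42
        {dst: "y", reads: []string{"x"}}, // y := x + 5
        {dst: "z", sideEffect: true},     // z := expensiveFunc()
        {reads: []string{"x"}},           // return x
    }

    // Pass 1: assume everything is dead; a read proves a variable live.
    live := map[string]bool{}
    for _, a := range code {
        for _, r := range a.reads {
            live[r] = true
        }
    }

    // Pass 2: remove dead assignments — but only side-effect-free ones.
    for _, a := range code {
        if a.dst != "" && !live[a.dst] && !a.sideEffect {
            fmt.Println("removing dead assignment to", a.dst) // y
        }
    }
}
```

Here `y` is removed, while `z` survives because `expensiveFunc()` must still run.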
This pass is the cleanup crew—it removes the garbage left behind after inlining and devirtualization merge functions together. Variables that served a purpose in the original function are often redundant in the merged context, and dead locals elimination sweeps them away.
That covers the major IR-level optimizations. These passes work together—each one sets up opportunities for the next—transforming your high-level Go code into something much more efficient before it even reaches the SSA phase.
Summary
The IR is where some of the compiler’s magic happens. After your code is parsed and type-checked, the IR becomes the working format for optimization.
We’ve covered the major IR-level optimizations: devirtualization converts interface calls to direct calls, inlining copies function bodies to eliminate calls and expose more optimization opportunities, and these two run together in a loop until no more changes are possible. Escape analysis marks variables for stack or heap allocation. Dead locals elimination removes unused variable assignments (when safe).
These optimizations form a cascade—each one sets up opportunities for the next. The result is code that’s dramatically more efficient than what you wrote, while you get to write high-level, idiomatic Go.
Want to see these optimizations in action on your own code? The -gcflags='-m' flag tells the Go compiler to print optimization decisions about devirtualization, inlining, and escape analysis. The flag has multiple levels that control the amount of detail:
- `-m`: Shows devirtualization, inlining decisions, and escape analysis
- `-m=2`: More detailed information (including why things aren't inlined)
- `-m=3`: Even more details
- `-m=4`: Maximum verbosity
Try go build -gcflags='-m' your_file.go to see the basic optimization decisions, or use -m=2 to get more detail including why certain functions weren’t inlined. The compiler will print out each decision it makes.
In the next post in this series, I’ll cover the SSA phase—where the IR gets converted to Static Single Assignment form and goes through even more sophisticated optimizations.
