The Unified IR Format | Internals for Interns

In the previous post, we explored how the Go compiler’s type checker analyzes your code. We saw how it resolves identifiers, checks type compatibility, and ensures your program is semantically correct.

Now that we have a fully type-checked AST, the next logical step would be to generate the compiler’s Intermediate Representation (IR)—the form it uses for optimization and code generation. But here’s something interesting: the Go compiler doesn’t immediately transform the AST into IR. Instead, it takes what might seem like a detour—it serializes the type-checked AST into a binary format, then deserializes it back into IR nodes.

Wait, what? That sounds inefficient, right? Why not just convert AST nodes directly to IR nodes?

This is where the Unified IR format comes in. And despite the seemingly roundabout approach, it’s actually a brilliant design that solves several problems at once. To understand why, we first need to look at how Go organizes compiled packages.

The Compilation Cache and Archive Files

When you build a Go program, the go command compiles packages in dependency order. If package B imports package A, package A is compiled first.

At the end of compiling each package, the compiler generates an archive file, which ends up in the build cache. Each archive contains two important entries:

__.PKGDEF - The Unified IR representation (what we’re covering in this article): type information, function signatures, constants, generic type parameters, and function bodies for inlinable functions.

_go_.o - The compiled machine code and debug information for the linker (we’ll cover this in a future post).

When compiling a package that imports fmt, the compiler opens the archive from the build cache and reads __.PKGDEF to get all the type information it needs. This is how the type checker knows about types from other packages—that information was already compiled and serialized when fmt was built. It’s also how the compiler can inline functions from imported packages—the function bodies of inlinable functions are right there in the __.PKGDEF file.

The clever part is that the compiler uses the exact same format for both scenarios. When compiling your local package, it serializes the type-checked AST into this Unified IR format, then immediately deserializes it to build the IR. When reading imported packages, it deserializes the Unified IR that was written to __.PKGDEF when those packages were compiled. Same format, same deserialization code, whether the package was just compiled or came from the cache. You can see this orchestrated in src/cmd/compile/internal/noder/unified.go, which coordinates the serialize-deserialize pipeline.
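To make the "one decoder, two paths" idea concrete, here is a minimal sketch in Go. It is not the Unified IR encoding: the length-prefixed layout and the names encode and decode are assumptions made for illustration. What it shows is that the same decode function serves both the local round trip and the read of previously written bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// putUvarint appends v to buf as an unsigned varint.
func putUvarint(buf *bytes.Buffer, v uint64) {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	buf.Write(tmp[:n])
}

// encode writes a count followed by length-prefixed strings. It is a toy
// stand-in for the real writer, not the Unified IR encoding.
func encode(decls []string) []byte {
	var buf bytes.Buffer
	putUvarint(&buf, uint64(len(decls)))
	for _, d := range decls {
		putUvarint(&buf, uint64(len(d)))
		buf.WriteString(d)
	}
	return buf.Bytes()
}

// decode is the single reader used for both paths in main.
func decode(data []byte) ([]string, error) {
	r := bytes.NewReader(data)
	count, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	var decls []string
	for i := uint64(0); i < count; i++ {
		n, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if n > uint64(r.Len()) {
			return nil, errors.New("string length exceeds remaining input")
		}
		b := make([]byte, n)
		if _, err := io.ReadFull(r, b); err != nil {
			return nil, err
		}
		decls = append(decls, string(b))
	}
	return decls, nil
}

func main() {
	// Local package: serialize, then immediately deserialize.
	local, err := decode(encode([]string{"main"}))
	fmt.Println(local, err)

	// Imported package: bytes written earlier, read back by the same decoder.
	cached := encode([]string{"Println", "Printf"})
	imported, err := decode(cached)
	fmt.Println(imported, err)
}
```

The point of the design is that any corner case fixed in decode is fixed for both paths at once.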

Now that we understand where the Unified IR format fits in the compilation process, let’s dive into the details of the format itself.

What Gets Serialized?

At its core, serialization is a transformation from one representation to another. You start with an AST—a tree structure in memory that represents your code. The compiler takes that entire tree, with all its type information and structure, and converts it into a compact binary format that can be written to disk and read back later.

Now, there are two different contexts where this serialization happens, and they serialize different amounts of data.

When compiling your local package, the compiler serializes everything from the fully-typed AST: all type information (every expression knows its type), generic type parameters and constraints, import dependencies, function signatures, constants and their values, and crucially—all function bodies. Nothing is lost. This complete serialization is immediately deserialized to build the IR that the compiler uses for the rest of the compilation process.

But when writing the __.PKGDEF file for the archive, the compiler is more selective. The exported version contains all type information for exported symbols, function signatures, and constants and their values, but only the function bodies that can be inlined. This selective export is deliberate: it gives downstream packages everything they need for type checking and cross-package optimization, while keeping archive files compact. Large functions that wouldn’t be inlined anyway don’t take up space in the export data. The serialization logic lives in src/cmd/compile/internal/noder/writer.go, which encodes the AST into the binary format.
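The selective export described above can be sketched as a filter. Everything here is hypothetical: the funcDecl type, the Cost field, and the budget value are made up for illustration and do not reflect how the compiler actually scores inlining candidates.

```go
package main

import "fmt"

// funcDecl is a hypothetical, simplified function record.
type funcDecl struct {
	Name string
	Sig  string
	Body string // empty means "no body available"
	Cost int    // made-up inlining cost
}

// exportView returns what a downstream package would see: every signature,
// but bodies only for functions whose cost fits within budget.
func exportView(fns []funcDecl, budget int) []funcDecl {
	out := make([]funcDecl, 0, len(fns))
	for _, f := range fns {
		if f.Cost > budget {
			f.Body = "" // f is a copy, so the caller's slice is untouched
		}
		out = append(out, f)
	}
	return out
}

func main() {
	local := []funcDecl{
		{Name: "Small", Sig: "func() int", Body: "return 1", Cost: 5},
		{Name: "Large", Sig: "func()", Body: "/* long body */", Cost: 500},
	}
	for _, f := range exportView(local, 80) { // 80 is an arbitrary budget
		fmt.Printf("%s %s hasBody=%v\n", f.Name, f.Sig, f.Body != "")
	}
}
```

The local view keeps both bodies, while the exported view keeps both signatures but only the small body.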

This is how imported packages work—when you compile a package that imports fmt, the compiler reads fmt’s __.PKGDEF file and recreates the typed representation it needs, including those carefully selected inlinable function bodies. The deserialization code in src/cmd/compile/internal/noder/reader.go handles decoding the binary format back into IR nodes.

Now that we know what gets serialized, let’s see how this data is organized.

The Binary Format Structure

The format organizes all this data into 10 specialized sections, each handling a different aspect of the program. Think of it like a well-organized filing cabinet—everything has its place, and you can find what you need quickly. The binary format implementation lives in the src/internal/pkgbits/ package, which provides the encoding and decoding primitives.

At the top, there’s a Header with version information, flags, and indices that tell you where each section starts and ends. At the bottom, there’s a Fingerprint: an 8-byte value taken from a SHA-256 hash (a full SHA-256 digest is 32 bytes, so only part of it is kept). It is used for build caching: if it hasn’t changed, packages importing this one can skip recompilation.
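A full SHA-256 digest is 32 bytes, so an 8-byte fingerprint has to be a shortened form of it. The sketch below assumes the simplest reading, keeping the first 8 bytes of the digest. The function name fingerprint and the sample payloads are illustrative, and the compiler's exact procedure may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fingerprint returns the first 8 bytes of the SHA-256 digest of payload.
// Truncating to the leading bytes is an assumption made for this sketch.
func fingerprint(payload []byte) [8]byte {
	sum := sha256.Sum256(payload) // sum is a [32]byte
	var fp [8]byte
	copy(fp[:], sum[:8])
	return fp
}

func main() {
	a := fingerprint([]byte("export data, version 1"))
	b := fingerprint([]byte("export data, version 2"))
	fmt.Printf("%x\n%x\nunchanged: %v\n", a, b, a == b)
}
```

Because fixed-size arrays are comparable in Go, checking whether a package's export data changed reduces to a == b.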

In between, the Payload contains 10 sections that work together to represent your program.

SectionString handles deduplication—every string in your program (package names, identifiers, import paths) gets stored here once and referenced everywhere else by index. If “main” appears 500 times in your AST, it’s stored once and those 500 references just use index 1.

SectionPkg reveals the full dependency graph. Even for our simple hello world, this section lists all 59 packages the compiler needs—not just fmt that we import, but errors that fmt imports, internal/reflectlite that errors imports, and so on down the chain.
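How a single import fans out into a longer package list can be sketched as a graph walk. The import graph below is a deliberately truncated toy containing only the edges named in the text, and the function name transitiveDeps is an assumption for illustration.

```go
package main

import "fmt"

// transitiveDeps returns root plus every package reachable from it,
// in depth-first discovery order.
func transitiveDeps(root string, imports map[string][]string) []string {
	seen := make(map[string]bool)
	var order []string
	var visit func(pkg string)
	visit = func(pkg string) {
		if seen[pkg] {
			return
		}
		seen[pkg] = true
		order = append(order, pkg)
		for _, dep := range imports[pkg] {
			visit(dep)
		}
	}
	visit(root)
	return order
}

func main() {
	// Toy graph: only the edges mentioned in the text, not the real 59 packages.
	imports := map[string][]string{
		"main":   {"fmt"},
		"fmt":    {"errors"},
		"errors": {"internal/reflectlite"},
	}
	fmt.Println(transitiveDeps("main", imports))
	// prints "[main fmt errors internal/reflectlite]"
}
```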

SectionBody is where the actual code lives. This stores function bodies as encoded syntax trees, capturing the structure of your program’s logic. As we mentioned earlier, for local packages being compiled, all function bodies are included. But when exporting to an archive file, only inlinable function bodies make it in, which keeps archive files compact while still enabling cross-package optimization.

SectionMeta is the entry point—where the compiler starts when reading a package. It contains two roots: the public root lists all exported symbols (the package’s API that other packages can use), and the private root contains internal implementation details like initialization tasks and function bodies. Think of it as the directory at the front that points to everything else in the format.

The remaining sections handle the type system and metadata. SectionType, SectionName, and SectionObj work together to store type definitions, qualified names (like fmt.Println), and object declarations (functions, variables, constants). SectionPosBase maps things back to source file locations for error messages. SectionObjExt stores optimization metadata like inlining costs and escape analysis results. And SectionObjDict contains the instantiation information needed for generics.
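As a compact summary, the ten sections can be written down as an enumeration, with a reference to any element being a (section, index) pair. This is a sketch: the section names come from the text, the numeric order follows the statistics listing in the tool output later in this post, and the ref type is a hypothetical illustration rather than the pkgbits definition.

```go
package main

import "fmt"

// section identifies one of the ten payload sections.
type section int

const (
	sectionString section = iota
	sectionMeta
	sectionPosBase
	sectionPkg
	sectionName
	sectionType
	sectionObj
	sectionObjExt
	sectionObjDict
	sectionBody
	numSections // count of sections, not a section itself
)

var sectionNames = [numSections]string{
	"SectionString", "SectionMeta", "SectionPosBase", "SectionPkg",
	"SectionName", "SectionType", "SectionObj", "SectionObjExt",
	"SectionObjDict", "SectionBody",
}

// String returns the section's name as used in the text.
func (s section) String() string {
	if s < 0 || s >= numSections {
		return "SectionUnknown"
	}
	return sectionNames[s]
}

// ref is a hypothetical cross-reference: an element index within a section.
type ref struct {
	Section section
	Index   int
}

func main() {
	r := ref{Section: sectionString, Index: 1}
	fmt.Printf("%d sections; %v[%d]\n", int(numSections), r.Section, r.Index)
	// prints "10 sections; SectionString[1]"
}
```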

If you want a deeper dive into the format specification, check out src/cmd/compile/internal/noder/doc.go, which documents the complete format in detail.

Now that we’ve seen what gets serialized and how it’s organized, let’s see the format in action.

Try It Yourself

Want to see the Unified IR format in action? Let’s compile our hello world program and explore the archive file.

First, create the hello world program:

// main.go
package main

import "fmt"

func main() {
    fmt.Println("Hello world")
}

To compile this into an archive file, we need to provide an import configuration that tells the compiler where to find the fmt package. Generate it with:

go list -export -json fmt | jq -r '"packagefile fmt=\(.Export)"' > importcfg

This creates an importcfg file that maps the fmt package to its archive file in the build cache.
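If you are curious how such a file could be consumed, here is a minimal sketch of a reader for the packagefile lines. The function name parsePackageFiles and the sample path are hypothetical. The sketch skips blank lines, comment lines starting with #, and any other directive, which is a simplification of what the real tools accept.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePackageFiles extracts "packagefile path=file" entries from importcfg
// text and returns them as an import path -> archive file map.
func parsePackageFiles(cfg string) (map[string]string, error) {
	const prefix = "packagefile "
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(cfg))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		if !strings.HasPrefix(line, prefix) {
			continue // some other directive; ignored in this sketch
		}
		pkg, file, ok := strings.Cut(strings.TrimPrefix(line, prefix), "=")
		if !ok {
			return nil, fmt.Errorf("malformed packagefile line: %q", line)
		}
		out[pkg] = file
	}
	return out, sc.Err()
}

func main() {
	// The path below is a placeholder, not a real cache location.
	cfg := "# generated for the example\npackagefile fmt=/tmp/example/fmt.a\n"
	files, err := parsePackageFiles(cfg)
	fmt.Println(files["fmt"], err)
}
```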

Now compile the program to an archive file:

go tool compile -p main -importcfg importcfg -o main.a main.go

This creates main.a—an archive file containing both the Unified IR representation and the compiled object code for your program.

Let’s peek inside the archive:

ar t main.a

You’ll see:

__.PKGDEF
_go_.o

The __.PKGDEF file contains the Unified IR representation of your main package. To decode and explore it, we can use unified-ir-reader—a tool I created specifically to help visualize and understand the concepts we’re exploring in this article. Install it with:

go install github.com/jespino/unified-ir-reader@latest

Now decode the Unified IR directly from the archive file:

unified-ir-reader --limit 5 main.a

You’ll see output like this:

╔═══════════════════════════════════════════════════════════════╗
║                   Unified IR Binary Format                    ║
╚═══════════════════════════════════════════════════════════════╝

=== Format Metadata ===
Sync Markers: false
Total Elements: 159
Fingerprint: 4b413f76f05d40fa

=== Section Statistics ===
  SectionString   :   98 elements
  SectionMeta     :    2 elements
  SectionPosBase  :    0 elements
  SectionPkg      :   59 elements
  SectionName     :    0 elements
  SectionType     :    0 elements
  SectionObj      :    0 elements
  SectionObjExt   :    0 elements
  SectionObjDict  :    0 elements
  SectionBody     :    0 elements

=== SectionString (Deduplicated Strings) ===
Total strings: 98
(showing first 5)

  [  0] ""
  [  1] "main"
  [  2] "fmt"
  [  3] "errors"
  [  4] "unsafe"
  ... and 93 more

=== SectionPosBase (Source File Locations) ===
  (none)

=== SectionPkg (Package References) ===
  [0] <unlinkable> (name: main)
  [1] fmt (name: fmt)
  [2] errors (name: errors)
  [3] (error reading package: unexpected decoding error: EOF)
  [4] internal/reflectlite (name: reflectlite)
  ... and 54 more

=== SectionType (Type Definitions) ===
Total types: 0

=== SectionObj (Object Declarations) ===
  (none)

=== SectionMeta - Private Root (Function Bodies & Internal Data) ===
Has .inittask: true
Function bodies: 0

The output reveals how your hello world program looks in the Unified IR format. The section statistics show that almost all of the 159 elements live in SectionString (98) and SectionPkg (59); the remaining 2 are the SectionMeta roots.

SectionString shows the first 5 deduplicated strings: the empty string at index 0, followed by "main", "fmt", "errors", and "unsafe". Each appears once and gets referenced everywhere by index.

SectionPkg reveals something surprising: even a simple “hello world” references 59 packages! This is the full transitive dependency graph—fmt depends on errors, which depends on internal/reflectlite, and so on down the chain.

Notice that SectionName, SectionType, and SectionObj show 0 elements. This is because the main package doesn’t export anything, so the public root is essentially empty. The private root holds only the .inittask entry for initialization; no function bodies were written, which is why SectionBody is empty too.

But what about a package that actually exports things? Let’s explore one of those.

Exploring Packages from the Cache

You can also use unified-ir-reader to explore already-compiled packages from your build cache. Remember the importcfg file we created? It contains the path to the fmt package’s archive:

cat importcfg

You’ll see something like packagefile fmt=/path/to/cache/go-build/14/14898fc9cf93ae046520d51da5f69ccf5f09b5d5d3faaf48d33ebf2088e4ded2-d, showing where fmt lives in your build cache. You can pass that path directly to unified-ir-reader:

unified-ir-reader --limit 5 /path/to/cache/go-build/14/14898fc9cf93ae046520d51da5f69ccf5f09b5d5d3faaf48d33ebf2088e4ded2-d

This reveals the fmt package’s Unified IR representation—all its exported types, functions, and the inlinable function bodies it makes available to importing packages. You’ll see far more content than our simple main package: dozens of exported functions like Println, Printf, Sprintf, along with types like Stringer and Formatter. This is what the compiler reads when you import fmt—everything it needs to type-check your code and inline small functions across package boundaries.

Now that we’ve explored the format both conceptually and hands-on, let’s talk about why the Go team built it this way.

Why Unified IR?

Here’s the thing: before unified IR, the Go compiler had four separate code paths that all did similar jobs—copying and transforming IR trees. There was one for converting the AST to IR (called “noding”), another for handling generics (stenciling), a third for inlining functions, and a fourth for importing and exporting package data. Each one had its own implementation, and each one had to handle all the same tricky corner cases in Go’s type system.

The unified IR approach collapsed all four of these into a single code path. Now there’s one way to serialize an IR tree and one way to deserialize it, and that same code handles everything—local compilation, package export, import, inlining, and generics. This dramatically reduces the surface area for bugs since there’s only one implementation to get right, eliminates all that redundant code, and simplifies the entire compilation pipeline. Instead of tracking how four different processes interact, you understand one core serialize-deserialize mechanism. The compiler becomes easier to reason about and maintain.

The real win shows up in language features. Inlining became significantly more powerful because it uses the same machinery as package import/export. Functions that were too complex for the old inliner—things like function literals and type switches—work naturally now because the format already handles them. Similarly, generics support is more complete since the unified approach handles type parameter substitutions and instantiations without needing separate stenciling logic.

This is why the serialize-then-deserialize approach that seemed inefficient at first is actually quite elegant. Yes, local packages get serialized and immediately deserialized, which adds a bit of overhead. But that overhead buys you a dramatically simpler compiler that naturally supports more complex language features. Sometimes the best solution isn’t the most direct one.

Now that we understand both the format and why it exists, let’s recap what we’ve covered.

Wrap-Up

We’ve explored the Unified IR format—the binary serialization format that sits between type checking and IR generation in the Go compiler. We saw how it lives in __.PKGDEF files inside archive files in the build cache, and how the same format handles both local packages (serialize everything, immediately deserialize) and imported packages (deserialize what was written during their compilation).

We walked through the format’s structure: 10 sections organizing everything from deduplicated strings to function bodies, with selective export that keeps archive files compact by only including inlinable functions. We got hands-on experience compiling a hello world program and exploring its serialized representation with the unified-ir-reader tool.

We also explored why the Go team chose this approach—how unifying four separate code paths into one serialize-deserialize mechanism reduces bugs, eliminates redundant code, simplifies the compilation pipeline, and enables more powerful inlining and generics support. The seemingly roundabout serialize-then-deserialize approach turns out to be an elegant solution that makes the compiler simpler and more capable.

In the next post, we’ll explore the IR itself and the optimization passes that transform it before code generation.