From TCP to Ready: PostgreSQL Connections

In the previous article, we explored the complete journey a SQL query takes through PostgreSQL—from parsing to execution. But before any of that can happen, your application needs to establish a connection with the database.

This might seem like a simple handshake, but there’s actually a sophisticated process happening behind the scenes—involving process management, authentication, and a binary protocol for efficient communication.

Understanding how PostgreSQL handles connections helps explain why connection pooling matters, how to troubleshoot connectivity issues, and why PostgreSQL’s architecture differs from thread-based databases. Let’s trace what happens when your application connects to PostgreSQL.

The Postmaster: PostgreSQL’s Doorkeeper

When you start PostgreSQL, the first process that launches is the postmaster. Think of it as the database’s receptionist—it listens for incoming connections and coordinates how they’re handled.

The postmaster doesn’t actually process your queries. Instead, when a client connects, the postmaster creates a brand new backend process dedicated to that specific client. Each connection gets its own separate process with its own private memory for query processing.

However, backends don’t work in complete isolation. They all share access to common resources through shared memory—most notably the shared buffer cache where PostgreSQL keeps database pages in memory. This shared memory allows backends to efficiently share data without duplicating it in each process.

This process-per-connection architecture provides strong isolation. If one backend crashes due to a problematic query, it doesn’t affect other connections. The postmaster remains lightweight and stable because it only manages connections—it never accesses shared memory or processes complex queries.

The tradeoff is that PostgreSQL connections are more expensive to create than in thread-based databases. Creating a new operating system process takes more resources than spawning a thread, which is why connection pooling becomes crucial for high-traffic applications.

Now let’s trace through exactly what happens when a connection is established.

The Connection Dance: From TCP to Ready

Establishing a PostgreSQL connection involves several distinct steps. Let’s walk through them in order.

Step 1: TCP Connection

Your application initiates a standard TCP connection to PostgreSQL, typically on port 5432. The postmaster accepts this connection. At this point, no PostgreSQL-specific communication has happened yet—this is just basic network plumbing.

With the TCP connection established, the next step addresses security.

Step 2: SSL Negotiation (Optional)

If SSL is configured, the client and server negotiate encryption. This happens early because it affects all subsequent communication. Once SSL is established, everything that follows flows through the encrypted channel, protecting passwords and query data from interception.

Now the client can formally request a database session.

Step 3: Startup Packet

The client sends a startup packet containing:

  • The PostgreSQL protocol version it speaks (currently 3.0)
  • The target database name
  • The username
  • Optional connection parameters (timezone, character encoding, etc.)

This packet is the client’s formal request for a database session.
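To make the layout concrete, here is a minimal Python sketch that assembles a startup packet by hand. The function name and parameter choices are illustrative, not part of any client library; real drivers do this internally.

```python
import struct

def build_startup_packet(user: str, database: str) -> bytes:
    """Build a protocol 3.0 startup packet: length, version, key/value pairs."""
    version = 3 << 16  # protocol 3.0, encoded as major << 16 | minor
    params = {"user": user, "database": database}
    body = b""
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # a lone NUL terminates the parameter list
    # The length field counts itself (4 bytes) plus the version (4) plus the body
    return struct.pack("!II", 8 + len(body), version) + body

packet = build_startup_packet("alice", "appdb")
```

Note that the startup packet carries no message-type byte: it is the very first thing the client sends, before the protocol version has been agreed on, so the server identifies it by the version number instead.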

With the session request received, the postmaster creates a dedicated process to handle this connection.

Step 4: Backend Process Creation

The postmaster validates the startup packet—checking that the requested database exists and the system can handle another connection. If everything looks good, it creates a new backend process using the operating system’s fork mechanism.

After forking, the postmaster hands off the connection to this new backend and returns to listening for more connections. From this point forward, your client communicates directly with its dedicated backend process.

But before the connection can be used, there’s one more critical step: verifying the client’s identity.

Step 5: Authentication

Now comes the security checkpoint. The backend process takes over and begins authentication based on the rules in pg_hba.conf (PostgreSQL’s host-based authentication configuration file).

These rules determine what authentication method to use based on where the connection is coming from, which database is being accessed, and which user is connecting.

PostgreSQL supports several authentication methods including trust (no credentials), password-based methods like SCRAM-SHA-256, SSL certificate authentication, and integration with external systems like LDAP or Kerberos. The authentication method can vary dramatically based on context—you might use trust authentication for local connections but require certificate authentication for remote access.
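A hypothetical pg_hba.conf fragment illustrates how these dimensions combine (the database names, users, and network ranges here are invented for the example):

```
# TYPE   DATABASE  USER   ADDRESS       METHOD
local    all       all                  trust
host     appdb     alice  10.0.0.0/8    scram-sha-256
hostssl  all       all    0.0.0.0/0     cert
```

PostgreSQL evaluates these rules top to bottom and uses the first line that matches the connection's type, database, user, and source address.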

Once authentication passes, we’ve reached the final step.

Step 6: Ready for Queries

Once authentication succeeds, the backend enters the ready state. The connection is fully established, and client and server can now communicate using PostgreSQL’s wire protocol to exchange queries and results.

So what exactly is this wire protocol, and how does it work? Let’s take a look.

The Wire Protocol: How Client and Server Talk

PostgreSQL’s wire protocol defines the language that clients and servers use to exchange queries and results. It’s a binary protocol designed for efficiency and reliability.

At its core, the protocol uses a simple but effective message format.

Message Structure

Every message follows the same basic format:

  1. A message type identifier (1 byte, typically an ASCII character)
  2. A length field (4 bytes) counting the length field itself plus the content
  3. The message content (variable length)

The one exception is the startup packet, which omits the type byte because it arrives before the protocol version has been negotiated.

This structure allows both sides to parse messages efficiently and handle everything from tiny commands to massive result sets.
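As a sketch of how a client might split incoming bytes into messages, here is a small Python helper built on that framing; the function name is made up for illustration.

```python
import struct

def parse_message(buf: bytes):
    """Split one wire-protocol message off the front of a byte buffer.

    Layout: 1-byte type, 4-byte length (counting itself but not the
    type byte), then the content. Returns (type, content, remainder).
    """
    msg_type = chr(buf[0])
    (length,) = struct.unpack("!I", buf[1:5])
    content = buf[5 : 1 + length]           # length covers itself + content
    return msg_type, content, buf[1 + length :]

# A hand-built CommandComplete ('C') message carrying the tag "SELECT 1":
tag = b"SELECT 1\x00"
raw = b"C" + struct.pack("!I", 4 + len(tag)) + tag
msg_type, content, rest = parse_message(raw)
```

Because every message declares its own length up front, the receiver always knows exactly how many bytes to read before the next message begins.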

Now let’s look at how these messages work together to execute queries.

Two Ways to Execute Queries

PostgreSQL provides two distinct protocols for executing queries, each suited to different scenarios.

Simple Query Protocol

The straightforward approach: send SQL, get results.

Let’s trace through what happens when we look up a user. The client sends a single Query message containing the complete SQL statement:

Client → Server: Query ('Q') + "SELECT name, email FROM users WHERE id = 42"

PostgreSQL receives this, parses the SQL, plans the query, executes it, and sends back a sequence of messages:

Server → Client: RowDescription ('T') + column metadata
                  - Column 1: "name" (text type)
                  - Column 2: "email" (text type)

First comes RowDescription, which tells the client what columns to expect in the results—their names, data types, and other metadata. This lets the client prepare to receive and format the data appropriately.

Server → Client: DataRow ('D') + "John Doe", "john@example.com"

Next, DataRow messages contain the actual result data. For queries returning multiple rows, PostgreSQL sends one DataRow message per row.

Server → Client: CommandComplete ('C') + "SELECT 1"

CommandComplete signals that the query finished successfully and includes a tag indicating what happened (in this case, “SELECT 1” means one row was returned).

Server → Client: ReadyForQuery ('Z') + transaction status

Finally, ReadyForQuery tells the client that PostgreSQL is ready to accept the next command. It also includes the current transaction status (idle, in a transaction block, or in a failed transaction).

This protocol is perfect for ad-hoc queries where you execute each statement once. The downside? PostgreSQL must parse and plan the query from scratch every time, even if you’re running the same query structure repeatedly with different values.
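The Query message that starts this whole exchange is simple to construct: the type byte 'Q', a length field, and the NUL-terminated SQL text. A Python sketch (helper name illustrative):

```python
import struct

def build_query_message(sql: str) -> bytes:
    """Build a simple-protocol Query ('Q') message: type, length, SQL + NUL."""
    body = sql.encode() + b"\x00"
    return b"Q" + struct.pack("!I", 4 + len(body)) + body

msg = build_query_message("SELECT name, email FROM users WHERE id = 42")
```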

For repeated queries, PostgreSQL offers a more sophisticated approach.

Extended Query Protocol

This protocol separates preparation from execution.

Instead of sending complete SQL each time, the extended protocol lets you create a prepared statement (a query template with placeholders) and then execute it multiple times with different parameters.

Let’s trace through the same user lookup, but this time using the extended protocol. The process is more involved, but the benefits become clear for repeated queries.

First, the client creates a query template with a placeholder:

Client → Server: Parse ('P') + statement name "get_user" +
                  "SELECT name, email FROM users WHERE id = $1" +
                  parameter types [INTEGER]

The Parse message tells PostgreSQL to create a prepared statement called “get_user”. The SQL contains a placeholder ($1) where the actual user ID will go, and we specify that this placeholder expects an INTEGER. PostgreSQL parses the SQL and creates a query plan that can be reused.

Server → Client: ParseComplete ('1')

ParseComplete confirms that the prepared statement is ready to use.

Next, the client binds specific parameter values to the template:

Client → Server: Bind ('B') + portal name "user_portal" +
                  statement "get_user" +
                  parameter values [42]

The Bind message creates what’s called a “portal”—a specific instance of the prepared statement with actual parameter values filled in. We’re binding the value 42 to the $1 placeholder in our “get_user” statement, creating a portal called “user_portal”.

Server → Client: BindComplete ('2')

BindComplete confirms the portal is ready to execute.

Now we describe and execute the portal:

Client → Server: Describe ('D') + portal "user_portal"
Client → Server: Execute ('E') + portal "user_portal"

The Describe message asks PostgreSQL for the portal's result metadata (Execute alone returns only data rows), and Execute then runs the portal. The server responds much as in the simple protocol:

Server → Client: RowDescription ('T') + column metadata
                  - Column 1: "name" (text type)
                  - Column 2: "email" (text type)

Server → Client: DataRow ('D') + "John Doe", "john@example.com"

Server → Client: CommandComplete ('C') + "SELECT 1"

Finally, we synchronize:

Client → Server: Sync ('S')

Server → Client: ReadyForQuery ('Z') + transaction status

The Sync message tells PostgreSQL we’re done with this extended query sequence. If anything went wrong during the sequence, PostgreSQL cleans up and gets ready for the next command. ReadyForQuery confirms everything is ready.

The beauty of this approach becomes apparent with repeated queries. To look up user ID 99, we skip the Parse step entirely—just create a new portal binding the existing “get_user” template with the new parameter value, then execute it. PostgreSQL reuses the already-parsed and planned template, making subsequent executions much faster.

The extended protocol also provides better security against SQL injection because parameters are sent separately from the SQL structure, using typed values rather than string substitution.
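The Parse and Bind messages can be sketched at the byte level too. This Python example follows the wire layout described above; the helper names are invented, and it shows only text-format parameters (the protocol also supports a binary format).

```python
import struct

INT4_OID = 23  # OID of PostgreSQL's INTEGER type

def build_parse(statement: str, sql: str, param_oids: list) -> bytes:
    """Parse ('P'): statement name, SQL with $n placeholders, parameter type OIDs."""
    body = statement.encode() + b"\x00" + sql.encode() + b"\x00"
    body += struct.pack("!H", len(param_oids))
    for oid in param_oids:
        body += struct.pack("!I", oid)
    return b"P" + struct.pack("!I", 4 + len(body)) + body

def build_bind(portal: str, statement: str, values: list) -> bytes:
    """Bind ('B'): attach text-format parameter values to a prepared statement."""
    body = portal.encode() + b"\x00" + statement.encode() + b"\x00"
    body += struct.pack("!H", 0)             # 0 format codes = all params in text
    body += struct.pack("!H", len(values))
    for value in values:
        body += struct.pack("!I", len(value)) + value
    body += struct.pack("!H", 0)             # 0 = all result columns in text
    return b"B" + struct.pack("!I", 4 + len(body)) + body

parse_msg = build_parse("get_user",
                        "SELECT name, email FROM users WHERE id = $1",
                        [INT4_OID])
bind_msg = build_bind("user_portal", "get_user", [b"42"])
```

Notice that the value 42 travels as a separate, length-prefixed parameter rather than being spliced into the SQL string, which is exactly why this path resists SQL injection.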

We’ve seen how connections are established and how they communicate, but what happens when they end?

How Connections End

PostgreSQL connections can terminate in several ways:

Normal Disconnection - The client sends a Terminate message, the backend completes pending operations, releases resources, and exits gracefully.

Idle Timeouts - PostgreSQL can automatically close connections that remain inactive too long. The idle_in_transaction_session_timeout parameter is particularly important—it prevents connections from holding database locks indefinitely.
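For example, a session (or postgresql.conf) can cap how long a connection may sit idle inside an open transaction; the 60-second value here is just illustrative:

```
SET idle_in_transaction_session_timeout = '60s';
```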

Administrative Termination - Database administrators can forcibly close connections using commands like pg_terminate_backend(). This is essential for stopping runaway queries or forcing disconnection of problematic applications.
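As a sketch of what that looks like in practice, an administrator might pair pg_terminate_backend() with the pg_stat_activity view (the 5-minute threshold is an arbitrary example):

```
-- Terminate backends stuck idle in a transaction for more than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';
```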

Process Crashes - When a backend crashes, it can’t perform normal cleanup. The postmaster detects the crash and initiates recovery procedures to ensure database consistency. The process-per-connection model isolates problems within individual backends, preventing one crash from affecting other connections.

We’ve covered a lot of ground—let’s recap the key points.

Summary

PostgreSQL’s connection architecture centers around the postmaster process, which creates dedicated backend processes for each client connection. Connections establish through six steps: TCP connection, SSL negotiation, startup packet, backend creation, authentication, and ready state.

Once established, connections communicate using PostgreSQL’s wire protocol—supporting both simple queries (ad-hoc SQL) and extended queries (prepared statements with parameters). The extended protocol enables query plan reuse and better security against SQL injection.

Now that we understand how connections work, the next article will explore what happens once your SQL query reaches PostgreSQL: the parser. We’ll see how PostgreSQL transforms SQL text into structured parse trees that represent the query’s meaning and structure.