Introduction
The announcement of Pydantic AI V2 sent ripples through the Python community, not just because of its new features for agent creation, but because of a seemingly radical dependency shift hidden in the beta release notes. While the framework maintained compatibility with standard HTTP clients, the architectural blueprint of V2—specifically its deep integration with the Model Context Protocol (MCP)—functionally demanded a total overhaul of how the underlying HTTP client operates.
For years, the Python ecosystem was defined by a beautiful yet simplistic relationship: for HTTP, you used Requests; for data, you used Pydantic V1. Pydantic AI V2 breaks this mold with extreme prejudice. To understand why the requests library—a beloved “batteries-included” stalwart—had to be sidelined in favor of HTTPX, one must look beyond simple “async support” and examine the mechanical physics of modern AI agents.
This article argues that Pydantic AI V2 did not merely “choose” HTTPX; the framework’s architecture forced a total overhaul of the HTTP client paradigm. It required a shift from a blocking, one-off request library to an asynchronous, connection-pooled, protocol-aware transport layer.
The Death of the “One-Off” Request in Agentic Systems
The fundamental flaw with the legacy requests library in the context of Pydantic AI is its architectural assumption: that HTTP requests are discrete, isolated events. You open a connection, ask a question, get an answer, and close the connection. This works perfectly for static APIs. However, an AI Agent is not a static API; it is a stateful, iterative loop.
Pydantic AI V2 is built around the concept of “Agentic Workflows.” An Agent does not make one call; it makes dozens. It calls a model, gets a tool request, calls a tool, returns the result to the model, and gets a final answer. In many production scenarios, this loop involves continuous communication with an MCP server.
The requests library is synchronous, and its top-level helpers such as requests.get() treat each of these steps as an independent transaction. This creates a massive mechanical inefficiency: without a shared Session, every tool call requires a new TCP handshake, TLS negotiation, and header transmission. In the world of AI latency, where milliseconds count toward user retention, this “one-off” model is a death sentence.
Furthermore, Pydantic AI V2’s shift toward streaming responses required a move away from the default requests model of loading the full response body. When an LLM streams tokens, the HTTP client cannot wait for the full body; it must consume chunked data as it arrives. HTTPX’s design, which exposes responses as async iterable streams (response.aiter_bytes()), fits naturally into Pydantic AI V2’s token-generation pipeline. requests can stream with stream=True and iter_content(), but each read blocks the calling thread, so non-blocking stream consumption would have required complex and brittle threading hacks.
Protocol Upgrades: HTTP/2 and the MCP Mandate
Pydantic AI’s documentation describes support for the Model Context Protocol (MCP) over Streamable HTTP and SSE transports. This is where the technical rubber meets the road regarding requests.
The requests library, despite its dominance, has a notorious blind spot: HTTP/2. While the world has moved to HTTP/2 for multiplexing and server push, requests remains fundamentally an HTTP/1.1 library.
Pydantic AI’s MCP transport layer benefits immensely from HTTP/2 multiplexing. In a complex agent setup, a single Pydantic AI client might need to maintain a connection to an LLM provider (like OpenAI) while simultaneously communicating with a local MCP server for database queries and a third-party MCP server for calendar lookups.
With HTTP/1.1 (the domain of requests), managing these connections is a mess of threading and connection limits. With HTTP/2 (which HTTPX supports as an opt-in feature), a single connection to a given server can carry multiple concurrent requests and responses without HTTP-level head-of-line blocking. For an asynchronous AI engine, this is non-negotiable. The shift to HTTPX allowed the Pydantic AI team to abstract away connection management entirely, allowing the developer to focus on the agent logic rather than the socket logistics.
The Validation Tightrope: Type Safety at the Edge
Pydantic’s core identity is validation. Pydantic V2 rewrote the validation engine in Rust, offering performance gains of 5x to 50x. However, validation cannot happen until the data arrives. If the data arrives slowly or in a fragmented way, the validation engine sits idle.
This created a pressure point between the HTTP layer and the Pydantic core. requests returns a Response object whose body is read with blocking I/O before .json() can decode it. HTTPX, combined with Pydantic, allows for a much tighter integration: with AsyncClient the body is awaited without blocking the event loop, and the raw bytes can be handed straight to Pydantic’s validator.
Moreover, the modern AI ecosystem demands “type safety at the edge.” Libraries like dify-oapi2 explicitly list their tech stack as “Pydantic 2.x + HTTPX”. The reason is simple: requests ships without inline type annotations and relies on separately maintained stubs, whereas HTTPX is fully type-annotated, so type checkers and IDEs can analyze the HTTP calls themselves. In a V2 world where agents write code and call APIs autonomously, the type checker must understand the shape of the network boundary; the “duck typing” approach of requests was too loose for deterministic agent execution.
Retry Logic, Timeouts, and Enterprise Survival
Production AI is brittle. APIs fail, rate limits hit, networks lag. Pydantic AI V2 introduced advanced retry mechanisms built on the tenacity library, specifically designed to work with HTTP transports. While you can theoretically wrap requests in a retry loop, doing so efficiently with proper backoff and connection management is notoriously difficult because requests blocks the thread during the wait.
Pydantic AI V2 demanded a non-blocking retry transport. The AsyncTenacityTransport that Pydantic AI provides wraps an HTTPX transport, so the event loop can handle other tasks while a retry waits out its delay.
Consider a scenario: An agent calls a rate-limited API. The server returns a 429 error with a Retry-After: 30 header.
- In Requests: The thread sleeps for 30 seconds. In a web server (e.g., FastAPI/Uvicorn), that entire worker is locked, unable to serve other requests. This scales horribly.
- In HTTPX (Pydantic AI V2): The async call awaits the retry. The event loop frees the thread to handle other incoming requests or other agent tasks. The 30-second wait is virtually free in terms of system resources.
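The difference between the two cases can be seen with nothing but asyncio. In this simplified sketch the 429 response is simulated and the delay is shortened to 0.2 seconds; a real client would read the delay from the Retry-After header. While the "agent" coroutine waits, the event loop keeps serving other work.

```python
import asyncio


async def rate_limited_call(log: list[str], retry_after: float) -> None:
    # Simulated 429: the wait is awaited, so control returns to the event loop.
    log.append("agent: got 429, waiting")
    await asyncio.sleep(retry_after)
    log.append("agent: retry succeeded")


async def other_work(log: list[str]) -> None:
    # Stands in for other requests or agent tasks on the same event loop.
    for i in range(3):
        await asyncio.sleep(0.01)
        log.append(f"server: handled request {i}")


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(rate_limited_call(log, retry_after=0.2), other_work(log))
    return log


log = asyncio.run(main())
print("\n".join(log))
```

Replacing asyncio.sleep with time.sleep in rate_limited_call would stall the whole loop, which is the behavior described in the first bullet.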
Furthermore, Pydantic AI V2 pushes responsibility for connection lifecycle to the user. The documentation explicitly shows developers passing a pre-configured httpx.AsyncClient to MCP servers to handle mTLS, custom CAs, or specific proxy configurations. This “dependency injection” style of client management is awkward with the module-level requests.get() style, and even requests.Session offers no async option, whereas it is a first-class feature of HTTPX.
The Breaking Point: Pydantic V2 Validation vs. HTTP Construction
There is a specific, mechanical breaking point found in large migrations (such as the InvokeAI upgrade to Pydantic V2) that forced the HTTP client change. Pydantic V2 moved its validation core to Rust (pydantic-core). This creates a strict separation between Python objects and validated data.
In the requests ecosystem, developers often abuse the library by passing Pydantic models directly to the params or data arguments, relying on requests to convert them to primitives. However, with Pydantic V2’s strict Rust boundaries, this becomes a serialization nightmare. The Python model must be explicitly dumped to JSON or a dict before the HTTP client touches it.
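HTTPX aligns much better with Pydantic V2’s “explicit is better than implicit” mantra. It does not try to magically “guess” how to serialize your Pydantic model. Instead, it expects you to pass plain data, for example json=model.model_dump(mode="json") or content=model.model_dump_json() (model_dump() is the V2 method, replacing .dict()). Using mode="json" matters because a plain model_dump() can still contain Python objects such as datetime that a JSON encoder will reject. This explicit contract prevents the silent data corruption that could occur in V1 when requests coerced complex objects into strings.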
The Pydantic AI V2 maintainers recognized that maintaining backwards compatibility with the requests way of doing things—magic serialization and blocking IO—would require building a compatibility shim that negated the performance gains of the Rust validator. It was easier to drop requests entirely and standardize on HTTPX.
The “Context” Manager Mandate
Pydantic AI V2 introduces a strict usage pattern involving context managers for MCP servers:
    async with agent:
        # Run agent logic here
        # Connections are opened and held
        ...

This pattern assumes that the underlying HTTP client is a long-lived, reusable pool of connections. requests.Session allows for connection pooling, but it is synchronous, so sharing it across asyncio tasks blocks the event loop, and its thread safety is not guaranteed without extra locking. HTTPX’s AsyncClient is explicitly designed to be shared and reused in an async context manager.
If Pydantic AI V2 tried to use requests.Session in an async context, it would block the event loop during network IO. If it tried to use requests without a session, it would open and tear down a TCP connection for every call, which under the high throughput of an AI agent making thousands of sequential tool calls can exhaust sockets or file descriptors (the dreaded “too many open files” error).
HTTPX provides the “glue” that allows Pydantic AI V2 to treat the network like a database connection pool: open once, use many times, close gracefully. This resource management is essential for long-running agent deployments, which V2 targets specifically over V1’s simpler request-response model.
Conclusion: The Irreversible Shift
The overhaul of the HTTP client in Pydantic AI V2 was not a trend-chasing whim. It was a structural necessity driven by the physics of asynchronous processing, the demands of HTTP/2 multiplexing for MCP, and the strict validation boundaries of Pydantic V2’s Rust core.
While requests remains a fantastic tool for data scientists running ETL scripts or simple API fetches, it failed the “Agent Test.” An AI Agent is a living loop, not a batch job. It requires a client that can stream, multiplex, retry without blocking, and manage connections intelligently.
By demanding a total overhaul—dropping the legacy synchrony of requests and fully embracing the async-first, type-safe architecture of HTTPX—Pydantic AI V2 has defined the baseline for Python AI development for the next five years. Reverting to the old ways would mean reverting to slower agents, more brittle connections, and validation errors that crash the runtime.
The message from the Pydantic team is clear: The age of the polite, synchronous, one-off HTTP request for AI is over. The age of the persistent, streaming, async-first transport layer has begun.




