Python has a rich ecosystem of HTTP client libraries. When it comes to web scraping and HTTP clients, three libraries stand out as popular options:
- Requests – The mature, feature-rich, sync HTTP library.
- Aiohttp – An async HTTP client/server for fast, non-blocking requests.
- Httpx – A next-gen HTTP client with both sync and async APIs and HTTP/2 support.
So how do you choose between them and what are the key differences you need to know? In this comprehensive guide, we'll compare Requests, Aiohttp, and Httpx to highlight the strengths of each and help you decide which is best for your Python web scraping needs.
Overview
In the diverse landscape of Python HTTP clients, several options such as Requests, Aiohttp, and Httpx have distinguished themselves. If you're wondering which is right for your web scraping projects, consider the following:
- Requests is the seasoned choice, perfect for simple scripts due to its reliability, user-friendliness, and expansive ecosystem.
- Aiohttp stands out with its asynchronous capabilities, especially suited for crafting async web applications.
- Httpx is swiftly gaining traction as it encapsulates the best features of both Requests and Aiohttp, offering cutting-edge performance and functionalities.
While the trusted Requests has served many well, Httpx's support for asyncio, HTTP/2, and seamless integrations positions it as a premier HTTP client for modern, efficient web scraping. Let's dig into the comparison in more detail.
High Level Comparison
Before diving into details, let's start with a high-level overview of how Requests, Aiohttp and Httpx differ:
Requests
- Released in 2012, Requests is the oldest and most mature option. It has the richest ecosystem of supporting libraries and integrations.
- Requests is synchronous and blocking – it does not support asyncio.
- Simple, easy-to-use interface that hides the complexity of Python's standard urllib machinery. Great docs and tutorials available.
- Lacks support for HTTP/2 and some modern features.
Aiohttp
- Asynchronous HTTP client released in 2014 built on asyncio. Provides non-blocking I/O for better performance.
- Can also act as an HTTP server, making it great for building asynchronous web apps and scrapers.
- Supports HTTP/1.1 but not HTTP/2.
- Steeper learning curve than Requests due to asynchronous usage.
Httpx
- Released in 2019, Httpx is the new, modern HTTP client for Python. It offers both synchronous and asynchronous APIs and optional HTTP/2 support.
- Unifies the interfaces of Requests and Aiohttp into one fast, feature-rich library.
- Fewer third-party integrations compared to Requests but quickly gaining popularity.
Sync vs Async – Performance
One of the biggest differences between these libraries is synchronous versus asynchronous requests.
Requests uses synchronous, blocking I/O. Each request must completely finish before another can be sent. This can impact performance when you need to send many requests.
Aiohttp and Httpx use asynchronous I/O via the asyncio module. This allows them to perform non-blocking requests concurrently rather than sequentially. The async advantage is especially noticeable when sending many requests or when responses are slow to arrive.
Let's compare them with a basic benchmark:
# Example: fetch 100 URLs sequentially with Requests
import time

import requests

urls = [...]  # list of 100 URLs

start = time.time()
for url in urls:
    response = requests.get(url)
end = time.time()

print(f"Total time: {end - start}")
Requests: ~28 seconds
Now the async version:
import asyncio
import time

import httpx

async def get_url(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

async def main():
    # Same list of 100 URLs as above
    coroutines = [get_url(url) for url in urls]
    return await asyncio.gather(*coroutines)

start = time.time()
results = asyncio.run(main())
end = time.time()

print(f"Total time: {end - start}")
Httpx: ~3 seconds
Using async allows Httpx to issue all requests concurrently. This provides significant performance benefits when fetching multiple URLs. Aiohttp shows similar async performance gains over Requests.
HTTP/2 Benefits
In addition to async I/O, Httpx also supports HTTP/2. This modern protocol provides further performance improvements:
- Multiplexing – Multiple requests can be sent over one TCP connection, removing the lag of establishing new connections.
- Server Push – The server can push additional resources to clients without waiting for new requests.
- Header Compression – Reduces the volume of header data sent with each request and response.
According to benchmarks, these HTTP/2 features can provide 2-3x speed improvements in Httpx when fetching multiple resources from the same domain compared to HTTP/1.1:
# Fetch 100 SVG images from the same domain
Requests time: 55 seconds
Httpx time: 18 seconds
Aiohttp and Requests currently lack HTTP/2 support. This gives Httpx a big performance advantage when scraping modern websites utilizing HTTP/2.
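Note that HTTP/2 in Httpx is opt-in: you install the optional h2 extra and enable it on the client. A minimal sketch (example.com is just a placeholder target):

# Requires: pip install 'httpx[http2]'
import asyncio

import httpx

async def main():
    # http2=True lets the client negotiate HTTP/2, falling back to
    # HTTP/1.1 for servers that don't support it
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get("https://example.com")
        print(response.http_version)  # "HTTP/2" or "HTTP/1.1"

asyncio.run(main())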
Async Applications with Aiohttp
In addition to an async HTTP client, Aiohttp also provides an HTTP server. This allows you to build asynchronous Python web apps and APIs using asyncio. The server handles incoming requests while the client handles outgoing ones. Why does this matter for web scraping?
Scrapers often persist data to databases or pass it between other services. By using Aiohttp's client and server together, you can build an asynchronous scraper application that avoids unnecessary network overhead and improves performance.
For example:
import asyncio
from aiohttp import ClientSession, web

async def handle_request(request):
    data = await scrape_page(request.url)       # your scraping coroutine
    await save_to_database(data)                 # your persistence coroutine
    return web.Response(text=f"Scraped {request.url}")

app = web.Application()
app.add_routes([web.get('/', handle_request)])

# Run client and server together
async def main():
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "localhost", 8080).start()

    async with ClientSession() as client:
        await client.get("http://localhost:8080/")

    await runner.cleanup()

asyncio.run(main())
This allows scraping requests to be handled asynchronously by the application without extra network hops.
Feature Comparison
Beyond high-level differences, Requests, Aiohttp, and Httpx offer similar feature sets but with varying APIs and implementations. Let's dive deeper into how they compare across common usage:
Sending Requests
All three provide simple, standard ways to make HTTP requests:
# Requests
requests.get("https://www.example.com")

# Aiohttp
async with aiohttp.ClientSession() as session:
    await session.get("https://www.example.com")

# Httpx
async with httpx.AsyncClient() as client:
    await client.get("https://www.example.com")
Httpx mirrors both Requests' familiar interface and Aiohttp's async context-manager approach.
Sessions & Connection Pooling
Reusing session connections and pools provides performance benefits. All three libraries support this:
# Requests
session = requests.Session()
session.get("https://example.com")

# Aiohttp
async with aiohttp.ClientSession() as session:
    await session.get("https://example.com")

# Httpx
client = httpx.AsyncClient()
await client.get("https://example.com")
Sessions handle connection persistence, while pools manage a reusable set of connections.
Httpx matches Requests' API while still providing asyncio support.
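Httpx also ships a synchronous API, so you can keep Requests-style blocking code and switch to the async client only where it pays off. A quick sketch:

import httpx

# One-off request, nearly identical to requests.get()
response = httpx.get("https://example.com")
print(response.status_code)

# A pooled sync client, analogous to requests.Session()
with httpx.Client() as client:
    response = client.get("https://example.com")
    print(response.status_code)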
Timeouts, Retries & Errors
Robust request handling is important for scrapers. All three libraries support configurable timeouts, retries, and error handling:
# Requests
response = requests.get("https://example.com", timeout=3.05)
response.raise_for_status()

# Aiohttp
try:
    timeout = aiohttp.ClientTimeout(total=3.05)
    async with session.get("https://example.com", timeout=timeout) as response:
        response.raise_for_status()
except aiohttp.ClientResponseError:
    print("Request failed")

# Httpx – connection retries are configured on the transport
transport = httpx.AsyncHTTPTransport(retries=3)
client = httpx.AsyncClient(timeout=3.05, transport=transport)
try:
    response = await client.get("https://example.com")
    response.raise_for_status()
except httpx.ConnectTimeout:
    print("Request timed out")
The async nature of Aiohttp and Httpx requires special async syntax for features like timeouts.
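For completeness, Requests has no retry argument on individual calls, but retries can be configured on a session via urllib3's Retry helper; the counts and status codes below are just example values:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry failed requests up to 3 times with exponential backoff
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)

response = session.get("https://example.com", timeout=3.05)
response.raise_for_status()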
Proxies
Proxied requests are important for careful web scraping. All three libraries allow proxying requests:
# Requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get("https://example.com", proxies=proxies)

# Aiohttp – the proxy is passed per request
async with aiohttp.ClientSession() as session:
    await session.get('https://example.com', proxy='http://10.10.1.10:3128')

# Httpx
client = httpx.AsyncClient(proxies={'all://': 'http://myproxy.com'})
await client.get("https://example.com")
Httpx matches Requests' proxy syntax while supporting async usage.
Cookies, Headers & Redirects
All three clients handle HTTP features like cookies, headers, and redirects:
# Handle cookies
session.cookies.set("sessionid", "1234abc")

# Custom headers
headers = {"User-Agent": "MyScraper 1.0"}
response = session.get("https://example.com", headers=headers)

# Handle redirects
response = session.get("https://example.com", allow_redirects=True)
Httpx and Aiohttp provide async-friendly interfaces such as Aiohttp's ClientResponse for accessing headers and cookies. Overall capabilities are similar.
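For comparison, here is roughly how the same features look with the async Httpx client; note that Httpx names the redirect option follow_redirects rather than allow_redirects:

import asyncio

import httpx

async def main():
    headers = {"User-Agent": "MyScraper 1.0"}
    cookies = {"sessionid": "1234abc"}

    async with httpx.AsyncClient(headers=headers, cookies=cookies,
                                 follow_redirects=True) as client:
        response = await client.get("https://example.com")
        print(response.headers.get("content-type"))
        print(response.cookies)

asyncio.run(main())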
Streaming & Downloads
For large responses, streaming & chunked downloads are supported:
# Stream a response in chunks
with requests.get("https://example.com/bigfile", stream=True) as response:
    for chunk in response.iter_content(1024):
        print(chunk)

# Save a response directly to a file
response = requests.get("https://example.com/bigfile")
with open("bigfile.zip", "wb") as f:
    f.write(response.content)
Aiohttp and Httpx provide asynchronous iterators and context managers for streaming.
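As a rough sketch, Httpx streams through client.stream() with async byte iterators, while Aiohttp exposes chunked iteration over the response body (the URL is just a placeholder):

import asyncio

import aiohttp
import httpx

async def stream_httpx(url):
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", url) as response:
            async for chunk in response.aiter_bytes(chunk_size=1024):
                print(len(chunk))

async def stream_aiohttp(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async for chunk in response.content.iter_chunked(1024):
                print(len(chunk))

asyncio.run(stream_httpx("https://example.com/bigfile"))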
Ecosystem & Utils
Given its maturity, Requests has the biggest ecosystem of supporting libraries and integrations:
- Utility packages like requests-html for parsing HTML responses.
- Packages like requests-cache for caching responses.
- Integration with data science tools like Pandas and NumPy.
Aiohttp and Httpx have smaller ecosystems since they are newer. But both have robust util packages:
- aiohttp-socks – SOCKS proxy support for Aiohttp.
- httpx-oauth – OAuth 1.0 and 2.0 support for Httpx.
Over time, expect the Httpx and Aiohttp ecosystems to grow and match Requests' capabilities.
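As one example of what that ecosystem buys you, requests-cache can transparently cache responses in a couple of lines (assuming the package is installed; the cache name is arbitrary):

# Requires: pip install requests-cache
import requests
import requests_cache

# Patches requests so that responses are stored in a local SQLite database
requests_cache.install_cache("scraper_cache")

requests.get("https://example.com")  # hits the network and stores the response
requests.get("https://example.com")  # served from the cache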
Use Cases & Recommendations
Given their differences and tradeoffs, when should you choose Requests, Aiohttp or Httpx?
Requests
- Simplicity and synchronous usage are preferred.
- Already have existing code using Requests.
- Need compatibility with a library or tool only supporting Requests.
- Require a Requests ecosystem feature such as caching or an existing integration.
Aiohttp
- Require asynchronous performance for many requests or slow websites.
- Building asynchronous web applications along with scraping capability.
- Plan to integrate with other async tools like asyncio queues.
Httpx
- Require modern performance features like HTTP/2 and asyncio support.
- Building a new application without existing legacy dependencies.
- Prefer a single, integrated solution combining Requests and Aiohttp pros.
If I had to choose just one for a robust web scraper, I would go with Httpx since it combines the best of both worlds with HTTP/2 and asyncio capability. However, many scrapers leverage multiple libraries – using Requests for simplicity and Aiohttp when asynchronous performance matters.
Example Code Snippets
To demonstrate usage, here are some examples of how these clients can accomplish common web scraping tasks:
Fetch a page and extract the title:
# Requests
import requests

resp = requests.get("https://example.com")
print(resp.text.split("<title>")[1].split("</title>")[0])

# Aiohttp
import asyncio
import aiohttp

async def get_title(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            data = await resp.text()
            print(data.split("<title>")[1].split("</title>")[0])

asyncio.run(get_title("https://example.com"))

# Httpx – await calls must run inside a coroutine
import asyncio
import httpx

async def get_title(url):
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        print(resp.text.split("<title>")[1].split("</title>")[0])

asyncio.run(get_title("https://example.com"))
Speed up multiple requests with async:
# Aiohttp
import asyncio
import aiohttp

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def main():
    async with aiohttp.ClientSession() as session:
        urls = [
            "https://example.com/1",
            "https://example.com/2",
            # etc
        ]
        coroutines = [fetch(url, session) for url in urls]
        results = await asyncio.gather(*coroutines)

asyncio.run(main())

# Httpx – the same list of URLs, fetched concurrently
import asyncio
import httpx

async def main():
    urls = ["https://example.com/1", "https://example.com/2"]
    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        results = await asyncio.gather(*tasks)

asyncio.run(main())
Make proxied requests:
# Requests
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get("https://example.com", proxies=proxies)

# Aiohttp – pass the proxy per request
import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        await session.get('https://example.com', proxy='http://10.10.1.10:3128')

asyncio.run(main())

# Httpx
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(proxies={'all://': 'http://10.10.1.10:3128'}) as client:
        await client.get('https://example.com')

asyncio.run(main())