How to Configure Python Requests to Use a Proxy?

Using proxies with Python's Requests module can be very useful for web scraping and accessing web data. Proxies allow you to mask your real IP address, bypass geographic restrictions, rotate IP addresses to avoid getting blocked, and more.

In this comprehensive guide, I'll explain how to configure the Python Requests module to use proxies. After reading, you'll know how to easily use proxies of all types in your Python scripts to scrape and access web data anonymously.

What are Proxies and Why Use Them?

A proxy acts as an intermediary between your computer and the wider internet. When you use a proxy, instead of connecting directly to a website, your traffic first goes through the proxy server which then connects to the site.

This means the website you are accessing will not see your true IP address – it will instead see the IP of the proxy server. There are several benefits to using proxies:

  • Hide your real IP address – Your IP can be used to identify and track you. Proxies allow you to mask your real IP.
  • Bypass geographic restrictions – Many sites restrict content based on IP address location. Proxies often let you appear to connect from different countries and access geo-restricted content.
  • Avoid getting blocked – When doing web scraping, using the same IP over and over can get you blocked. Proxies enable rotating different IP addresses to minimize the risk of blocks.
  • Improve performance – When scraping data, proxies can help distribute the load over many IPs to speed up requests.
  • Enhanced privacy – By masking your real IP and location, proxies provide greater privacy.

For web scraping and accessing web data, proxies are extremely useful to avoid blocks, bypass geographic restrictions, and scrape data faster.

Types of Proxies

There are a few different types of proxies that can be used:

  • HTTP proxies – These forward HTTP and HTTPS traffic. HTTP proxies are the most common and work for most sites.
  • SOCKS proxies – SOCKS5 proxies are more flexible and can handle almost any TCP connection, including HTTP, HTTPS, and FTP. SOCKS proxies also tend to be more anonymous.
  • Residential proxies – These are proxies running on real devices in residential IP ranges. They mimic real users closely so are less likely to get blocked. But they are more expensive.
  • Datacenter proxies – Proxies hosted in data centers on servers. They are fast and reliable, but being datacenter IPs they are easier to detect.
  • Shared proxies – Low cost proxies shared by many users. They are unreliable and often get blocked.
  • Private/Dedicated proxies – More expensive, but dedicated to you alone, so they are very reliable. Useful for large scale scraping.

The main types you'll likely use are HTTP, SOCKS5 and residential proxies, depending on your use case.
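One practical note on SOCKS proxies: Requests only supports them once the PySocks extra is installed (pip install requests[socks]). After that, a SOCKS5 proxy is configured the same way as an HTTP proxy, just with a socks5:// URL. A minimal sketch with a placeholder IP:

import requests

# requires: pip install requests[socks]
# socks5h:// would also resolve DNS through the proxy;
# plain socks5:// resolves DNS locally
proxies = {
  "http": "socks5://123.45.67.89:9090",
  "https": "socks5://123.45.67.89:9090",
}

response = requests.get("http://example.com", proxies=proxies)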

Setting a Global Proxy in Python Requests

The easiest way to use a proxy with the Python Requests module is to define one proxies dictionary and pass it to every request you make. This just requires passing the proxy URL(s) to the proxies parameter:

import requests

proxy = "http://52.211.6.77:3080" 

proxies = {
  "http": proxy,
  "https": proxy
}

response = requests.get("http://example.com", proxies=proxies)

Here we define a proxy URL and set it for both http and https traffic. Every request made with this proxies dictionary will route through the proxy. The proxy URL format is:

protocol://IP:PORT

For example:

  • http://123.45.67.89:8080 – HTTP proxy
  • socks5://123.45.67.89:9090 – SOCKS5 proxy

We can also set an HTTPS proxy separately if needed:

proxies = {
  "http": "http://34.138.82.11:3500",
  "https": "https://161.18.82.13:9090", 
}

This lets you use separate proxies for HTTP and HTTPS requests. Note that the dictionary keys refer to the scheme of the URL being requested, while the scheme inside the proxy URL tells Requests how to connect to the proxy itself; most HTTP proxies accept plain http:// even when forwarding HTTPS traffic.
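If you truly want one proxy applied globally, you can also attach the dictionary to a requests.Session, so you don't have to repeat the proxies argument on every call. A minimal sketch, using the same placeholder proxy IP:

import requests

session = requests.Session()

# every request made through this session now uses the proxy
session.proxies.update({
  "http": "http://52.211.6.77:3080",
  "https": "http://52.211.6.77:3080",
})

response = session.get("http://example.com")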

Setting a Proxy Per Request

Instead of a global proxy, you can configure proxies on a per request basis. This involves specifying the proxies parameter when making each request:

import requests

proxy = "http://52.211.6.77:3080"

response = requests.get("http://example.com", proxies={"http": proxy}) 
response2 = requests.get("http://example2.com", proxies={"http": proxy})

Here we pass the proxies into each request rather than defining them globally. This allows using different proxies per request, which is useful when rotating proxies to avoid blocks.

Using a Proxy for a Specific Domain

Another option is to use a proxy for a specific domain only. This can be useful if you want all requests to go direct except for some problematic sites that require a proxy. To do this, use scheme://domain as the key in the proxies dictionary, and pass the dictionary to each request:

import requests

proxies = {
  "http://problematic-site.com": "http://52.211.6.77:3080",
}

# this will use the proxy
response = requests.get("http://problematic-site.com/data", proxies=proxies)

# this will connect directly, since no proxy matches this host
response = requests.get("http://not-problematic-site.com/data", proxies=proxies)

Now only requests to problematic-site.com will use the defined proxy. All other requests will connect directly.
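You can also mix a default proxy with per-host overrides in one dictionary. Requests prefers the more specific scheme://host key over the bare scheme key, so a sketch like this (placeholder IPs) sends most traffic through one proxy and problematic-site.com through another:

import requests

proxies = {
  # defaults for all other http/https requests
  "http": "http://34.138.82.11:3500",
  "https": "http://34.138.82.11:3500",
  # override for one specific host
  "http://problematic-site.com": "http://52.211.6.77:3080",
}

response = requests.get("http://problematic-site.com/data", proxies=proxies)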

Using Authentication with Proxies

Some proxies require authentication, in which case the username/password can be embedded in the proxy URL:

http://username:password@IP:PORT

For example:

proxy = "http://scraper:password123@123.45.67.8:80"

This will authenticate using the username scraper and the password password123. If the username or password contains special characters such as @ or :, they must be URL encoded before being embedded in the proxy URL:

from urllib.parse import quote

username = "user@example.com"
password = "p@ssword@123"

encoded_username = quote(username)
encoded_password = quote(password)

proxy = f"http://{encoded_username}:{encoded_password}@123.45.67.8:80"

This URL encodes the username and password before embedding them into the proxy URL.

Rotating Proxies

When scraping large amounts of data, it's common to rotate proxies, using a different proxy IP for each request. This prevents the scraping from originating only from a single IP address, lowering the chance of getting blocked. Here is an example of how to rotate proxies in Python:

import requests
from random import choice

proxy_list = [
  "http://52.211.6.77:3080",
  "http://56.33.11.123:2231",
  "http://121.34.22.11:5800",
]

for _ in range(100):
  # select a random proxy for this request
  proxy = choice(proxy_list)
  proxies = {"http": proxy, "https": proxy}

  response = requests.get("http://example.com", proxies=proxies)

  # do something with response...

This chooses a random proxy from the list for each request, rotating the IP address you appear from. There are also services like BrightData and Smartproxy that provide API access to thousands of proxies to make rotation easy.
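Random choice can pick the same proxy several times in a row. If you want an even, round-robin rotation instead, itertools.cycle from the standard library is a simple alternative (same placeholder proxy list):

import requests
from itertools import cycle

proxy_list = [
  "http://52.211.6.77:3080",
  "http://56.33.11.123:2231",
  "http://121.34.22.11:5800",
]

proxy_pool = cycle(proxy_list)

for _ in range(100):
  # take the next proxy in round-robin order
  proxy = next(proxy_pool)
  proxies = {"http": proxy, "https": proxy}

  response = requests.get("http://example.com", proxies=proxies)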

Setting Proxies via Environment Variables

An alternative to setting proxies in code is to use the standard proxy environment variables:

HTTP_PROXY
HTTPS_PROXY
ALL_PROXY

For example:

export HTTP_PROXY="http://52.211.6.77:80"
export HTTPS_PROXY="https://52.211.6.77:80"

Any requests made will automatically pick up these proxy settings. This can be useful for quick testing, but environment variables don't allow the flexibility of rotating proxies, using different proxies per domain, etc.
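Requests also honors NO_PROXY for hosts that should bypass the proxy. If you ever need the opposite, ignoring the environment entirely to guarantee a direct connection, you can disable environment lookup on a session. A minimal sketch:

import requests

session = requests.Session()

# ignore HTTP_PROXY / HTTPS_PROXY / ALL_PROXY (and .netrc) for this session
session.trust_env = False

# connects directly, regardless of environment variables
response = session.get("http://example.com")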

Common Proxy Issues

There are some common issues that can occur when using proxies:

  • Access denied – If you receive access denied errors, it often means the proxy requires authentication, so you need to include username/password in the proxy URL.
  • Connection timeouts – Timeouts when connecting to proxies usually means the proxy is unreliable and needs replacing.
  • SSL errors – SSL certificate errors can mean the proxy is intercepting or mishandling HTTPS traffic. For SOCKS proxies, also make sure the PySocks extra is installed (pip install requests[socks]).
  • Too many redirects – Endless redirects usually means the proxy doesn't properly support HTTP redirects.
  • Getting blocked – If you're scraping intensively with a small pool of proxies, the proxies themselves may get blocked. Use larger proxy pools and rotate IPs frequently.
  • Slow proxies – Some proxy services overload proxies, making them slow. Test response times and switch to more reputable providers if needed.
  • Unstable proxies – Similarly, low grade proxy services often have reliability issues. Again, switching to more reliable providers like BrightData usually solves these problems.

Debugging and troubleshooting proxies takes some initial trial and error – but once up and running, proxies provide invaluable anonymity for web scraping.
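A defensive pattern that handles several of these issues at once is to set a timeout on every request and retry through a different proxy when one fails. A minimal sketch, assuming the placeholder proxy list from earlier (fetch is just an illustrative helper, not part of Requests):

import requests
from random import choice

proxy_list = [
  "http://52.211.6.77:3080",
  "http://56.33.11.123:2231",
]

def fetch(url, retries=3):
  # try up to "retries" different proxies before giving up
  for _ in range(retries):
    proxy = choice(proxy_list)
    proxies = {"http": proxy, "https": proxy}
    try:
      return requests.get(url, proxies=proxies, timeout=10)
    except (requests.exceptions.ProxyError,
            requests.exceptions.ConnectTimeout,
            requests.exceptions.ReadTimeout):
      continue  # this proxy failed, rotate to another one
  raise RuntimeError(f"all proxy attempts failed for {url}")

response = fetch("http://example.com")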

Conclusion

Properly using proxies allows you to scrape and access web data at scale without getting blocked. It prevents your real IP from being exposed, bypasses geographic restrictions, and speeds up requests. I recommend starting out with reputable proxy services like BrightData, Smartproxy, Soax, or Proxy-Seller which make it easy to continuously rotate thousands of reliable, fully anonymous proxies.

The proxy landscape changes over time as sites actively try to detect and block proxies, so you need to continually test and evolve your approach. But with robust proxy handling in place, you can scrape almost any site successfully.

I hope this guide has provided a good overview of how to properly leverage proxies within your Python code. Proxies are an essential tool for any serious web scraper or data analyst!

John Rooney

I'm John Watson Rooney, a self-taught Python developer and content creator with a focus on web scraping, APIs, and automation. I love sharing my knowledge and expertise through my YouTube channel, which caters to all levels of developers, from beginners looking to get started in web scraping to experienced programmers seeking to advance their skills with modern techniques. I have worked in the e-commerce sector for many years, gaining extensive real-world experience in data handling, API integrations, and project management. I am passionate about teaching others and simplifying complex concepts to make them more accessible to a wider audience. In addition to my YouTube channel, I also maintain a personal website where I share my coding projects and other related content.
