Using proxies with Python's Requests module can be very useful for web scraping and accessing web data. Proxies allow you to mask your real IP address, bypass geographic restrictions, rotate IP addresses to avoid getting blocked, and more.
In this comprehensive guide, I'll explain how to configure the Python Requests module to use proxies. After reading, you'll know how to easily use proxies of all types in your Python scripts to scrape and access web data anonymously.
What are Proxies and Why Use Them?
A proxy acts as an intermediary between your computer and the wider internet. When you use a proxy, instead of connecting directly to a website, your traffic first goes through the proxy server which then connects to the site.
This means the website you are accessing will not see your true IP address – it will instead see the IP of the proxy server. There are several benefits to using proxies:
- Hide your real IP address – Your IP can be used to identify and track you. Proxies allow you to mask your real IP.
- Bypass geographic restrictions – Many sites restrict content based on IP address location. Proxies often let you appear to connect from different countries and access geo-restricted content.
- Avoid getting blocked – When doing web scraping, using the same IP over and over can get you blocked. Proxies enable rotating different IP addresses to minimize the risk of blocks.
- Improve performance – When scraping data, proxies can help distribute the load over many IPs to speed up requests.
- Enhanced privacy – By masking your real IP and location, proxies provide greater privacy.
For web scraping and accessing web data, proxies are extremely useful to avoid blocks, bypass geographic restrictions, and scrape data faster.
Types of Proxies
There are a few different types of proxies that can be used:
- HTTP proxies – These forward HTTP and HTTPS traffic. HTTP proxies are the most common and work for most sites.
- SOCKS proxies – SOCKS5 proxies are more flexible and can handle almost any TCP connection including HTTP, HTTPS, FTP etc. SOCKS proxies also tend to be more anonymous.
- Residential proxies – These are proxies running on real devices in residential IP ranges. They mimic real users closely so are less likely to get blocked. But they are more expensive.
- Datacenter proxies – Proxies hosted in data centers on servers. They are fast and reliable, but being datacenter IPs they are easier to detect.
- Shared proxies – Low cost proxies shared by many users. They are unreliable and often get blocked.
- Private/Dedicated proxies – More expensive but dedicated to you only so are very reliable. Useful for large scale scraping.
The main types you'll likely use are HTTP, SOCKS5 and residential proxies, depending on your use case.
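For example, Requests can send traffic through a SOCKS5 proxy once the optional SOCKS dependency is installed (`pip install requests[socks]`). Here is a minimal sketch with a placeholder proxy address:

```python
import requests

# requires the SOCKS extra: pip install requests[socks]
proxy = "socks5://123.45.67.89:9090"

proxies = {
    "http": proxy,
    "https": proxy,
}

response = requests.get("http://example.com", proxies=proxies)
print(response.status_code)
```

If you also want DNS lookups to happen on the proxy rather than locally, the `socks5h://` scheme can be used instead of `socks5://`.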
Setting a Global Proxy in Python Requests
The easiest way to use a proxy with the Python Requests module is to set a global proxy that will be used for all requests. This just requires passing a proxy URL to the `proxies` parameter when making requests:
```python
import requests

proxy = "http://52.211.6.77:3080"

proxies = {
    "http": proxy,
    "https": proxy,
}

response = requests.get("http://example.com", proxies=proxies)
```
Here we define a proxy URL and set it as both the `http` and `https` proxy. All requests made with this dictionary will now route through the proxy. The proxy URL format is:
protocol://IP:PORT
For example:
- `http://123.45.67.89:8080` – HTTP proxy
- `socks5://123.45.67.89:9090` – SOCKS5 proxy
We can also set an HTTPS proxy separately if needed:
```python
proxies = {
    "http": "http://34.138.82.11:3500",
    "https": "https://161.18.82.13:9090",
}
```
This lets you use separate proxies for HTTP and HTTPS requests.
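If you want a single proxy applied to every request without repeating the `proxies` argument, another option is to attach it to a `requests.Session`. A short sketch using the same placeholder proxy as above:

```python
import requests

session = requests.Session()

# every request made through this session will be routed via the proxy
session.proxies.update({
    "http": "http://52.211.6.77:3080",
    "https": "http://52.211.6.77:3080",
})

response = session.get("http://example.com")
```

A session also reuses the underlying connection, which tends to be faster when making many requests to the same host.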
Setting a Proxy Per Request
Instead of a global proxy, you can configure proxies on a per-request basis. This involves specifying the `proxies` parameter when making each request:
```python
import requests

proxy = "http://52.211.6.77:3080"

response = requests.get("http://example.com", proxies={"http": proxy})
response2 = requests.get("http://example2.com", proxies={"http": proxy})
```
Here we pass the proxies into each request rather than defining them globally. This allows using different proxies per request, which is useful when rotating proxies to avoid blocks.
Using a Proxy for a Specific Domain
Another option is to use a proxy for specific domains only. This can be useful if you want most requests to connect directly and only route traffic for certain problematic sites through a proxy. To do this, specify the domain as a key in the proxies dictionary and pass it with each request:
```python
import requests

proxies = {
    "http://problematic-site.com": "http://52.211.6.77:3080",
}

# this will use the proxy
response = requests.get("http://problematic-site.com/data", proxies=proxies)

# this will connect directly without a proxy
response = requests.get("http://not-problematic-site.com/data", proxies=proxies)
```
Now only requests to `problematic-site.com` will use the defined proxy. All other requests will connect directly, even though the same `proxies` dictionary is passed to every call.
Using Authentication with Proxies
Some proxies require authentication, in which case the username/password can be embedded in the proxy URL:
http://username:password@IP:PORT
For example:
proxy = "http://scraper:p@[email protected]:80"
This will authenticate using the username `scraper` and password `p@ssw0rd`. If the username or password contains special characters like `@`, they will need to be URL encoded:
```python
from urllib.parse import quote

# example credentials that contain special characters like "@"
username = "user@example.com"
password = "p@ssword@123"

encoded_username = quote(username)
encoded_password = quote(password)

proxy = f"http://{encoded_username}:{encoded_password}@123.45.67.8:80"
```
This URL encodes the username and password before embedding them into the proxy URL.
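Once encoded, the proxy URL is used exactly like any other. A brief continuation of the sketch above (the IP and credentials are placeholders):

```python
import requests

proxies = {
    "http": proxy,
    "https": proxy,
}

# the credentials embedded in the proxy URL are sent to the proxy automatically
response = requests.get("http://example.com", proxies=proxies)
print(response.status_code)
```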
Rotating Proxies
When scraping large amounts of data, it's common to rotate proxies, using a different proxy IP for each request. This prevents the scraping from originating only from a single IP address, lowering the chance of getting blocked. Here is an example of how to rotate proxies in Python:
```python
import requests
from random import choice

proxy_list = [
    "http://52.211.6.77:3080",
    "http://56.33.11.123:2231",
    "http://121.34.22.11:5800",
]

for _ in range(100):
    # select a random proxy for this request
    proxy = choice(proxy_list)
    proxies = {"http": proxy}

    response = requests.get("http://example.com", proxies=proxies)
    # do something with response...
```
This chooses a random proxy from the list for each request, rotating the IP address that requests originate from. There are also services like BrightData and Smartproxy that provide API access to thousands of proxies to make rotation easy.
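If you would rather cycle through the pool in order instead of choosing at random, a small variation using itertools.cycle (same placeholder proxy list) looks like this:

```python
import requests
from itertools import cycle

proxy_list = [
    "http://52.211.6.77:3080",
    "http://56.33.11.123:2231",
    "http://121.34.22.11:5800",
]

# cycle() yields the proxies in order and repeats the list indefinitely
proxy_pool = cycle(proxy_list)

urls = [f"http://example.com/page/{i}" for i in range(10)]

for url in urls:
    proxy = next(proxy_pool)
    response = requests.get(url, proxies={"http": proxy, "https": proxy})
    # do something with response...
```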
Setting Proxies via Environment Variables
An alternative to setting proxies in code is to use the standard proxy environment variables:
```
HTTP_PROXY
HTTPS_PROXY
ALL_PROXY
```
For example:
```bash
export HTTP_PROXY="http://52.211.6.77:80"
export HTTPS_PROXY="https://52.211.6.77:80"
```
Any requests made will automatically pick up these proxy settings. This can be useful for quick testing, but environment variables don't allow the flexibility of rotating proxies, using different proxies per domain, etc.
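Requests honours these variables because `trust_env` is enabled by default. If a particular script should ignore system-wide proxy settings, it can be switched off on a session, as in this short sketch:

```python
import requests

# a plain call picks up HTTP_PROXY / HTTPS_PROXY from the environment automatically
response = requests.get("http://example.com")

# to ignore environment proxy settings entirely, disable trust_env on a session
session = requests.Session()
session.trust_env = False
response = session.get("http://example.com")
```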
Common Proxy Issues
There are some common issues that can occur when using proxies:
- Access denied – If you receive access denied errors, it often means the proxy requires authentication, so you need to include username/password in the proxy URL.
- Connection timeouts – Timeouts when connecting to proxies usually mean the proxy is unreliable and needs replacing (see the error-handling sketch after this list).
- SSL errors – SSL certificate errors can mean the proxy intercepts HTTPS traffic with its own certificate or doesn't properly support SSL. For SOCKS proxies, also make sure the SOCKS dependency for Requests is installed (`pip install requests[socks]`).
- Too many redirects – Endless redirects usually means the proxy doesn't properly support HTTP redirects.
- Getting blocked – If you're scraping intensively with a small pool of proxies, the proxies themselves may get blocked. Use larger proxy pools and rotate IPs frequently.
- Slow proxies – Some proxy services overload proxies, making them slow. Test response times and switch to more reputable providers if needed.
- Unstable proxies – Similarly, low grade proxy services often have reliability issues. Again, switching to more reliable providers like BrightData usually solves these problems.
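When a proxy is flaky, it usually pays to set a timeout and catch proxy-related exceptions rather than let the script crash. Below is a minimal sketch (placeholder proxy URLs) that falls back to the next proxy in the pool when one fails:

```python
import requests
from requests.exceptions import ConnectTimeout, ProxyError, ReadTimeout

proxy_list = [
    "http://52.211.6.77:3080",
    "http://56.33.11.123:2231",
    "http://121.34.22.11:5800",
]

def fetch_with_fallback(url):
    """Try each proxy in turn until one returns a response."""
    for proxy in proxy_list:
        try:
            return requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,  # fail fast on slow or dead proxies
            )
        except (ProxyError, ConnectTimeout, ReadTimeout):
            continue  # this proxy failed, try the next one
    raise RuntimeError("all proxies in the pool failed")

response = fetch_with_fallback("http://example.com")
```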
Debugging and troubleshooting proxies takes some initial trial and error – but once up and running, proxies provide invaluable anonymity for web scraping.
Conclusion
Properly using proxies allows you to scrape and access web data at scale without getting blocked. It prevents your real IP from being exposed, bypasses geographic restrictions, and speeds up requests. I recommend starting out with reputable proxy services like BrightData, Smartproxy, Soax, or Proxy-Seller which make it easy to continuously rotate thousands of reliable, fully anonymous proxies.
The proxies landscape does change over time as sites actively try to block proxies. So you need to continually test and evolve your approaches. But by implementing robust proxy handling you can scrape almost any site successfully.
I hope this guide has provided a good overview of how to properly leverage proxies within your Python code. Proxies are an essential tool for any serious web scraper or data analyst!