cURL is a versatile command-line tool that allows you to transfer data to and from a server. With cURL, you can make requests, download files, authenticate with servers, and much more. While cURL originated as a Linux/Unix tool, it can also be easily used in Python scripts. In this comprehensive guide, we'll cover how to make HTTP requests with cURL using Python.
Overview of cURL
cURL stands for “Client URL.” It is a command-line tool that is installed by default on most Linux/Unix systems and macOS. cURL allows you to communicate with various servers using major internet protocols like HTTP, HTTPS, FTP, and more. Some key features of cURL include:
- Making HTTP GET and POST requests
- Downloading and uploading files via FTP, SFTP, and SCP
- Built-in support for proxies, cookies, authentication, and SSL
- Following redirects and downloading page contents
- Submitting forms and POSTing JSON data
- Setting custom headers and user agent strings
cURL is highly flexible – you can customize requests in many ways using various command-line options and arguments. While cURL originated as a command-line tool, it can also be used in code via libraries that wrap the underlying libcurl library. This allows you to harness the power of cURL in your Python applications.
Using cURL in Python with pycurl
The most popular way to use cURL in Python code is via the pycurl library. pycurl provides a Python interface to the libcurl C library. It has nearly all the same capabilities and features as the command-line tool.
To use pycurl, first install it via pip:
pip install pycurl
The main class in pycurl is Curl. Here's how to make a simple GET request:
import pycurl

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.perform()
This initializes a Curl object, sets the URL option, and performs the request. However, in most cases you'll also want to pass data and handle the response. Let's look at a more complete example:
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Write response data to buffer
curl.setopt(curl.WRITEDATA, buffer)

# Perform request
curl.perform()

# Get response code (must be read before closing the handle)
response_code = curl.getinfo(curl.RESPONSE_CODE)

# End curl session
curl.close()

# Get response data
response_data = buffer.getvalue()

# Print response
print(response_code, response_data.decode('utf-8'))
Here we create a BytesIO buffer to store the response. We pass this buffer to setopt() to save the response data. After calling perform(), we can get the response HTTP code and data – note that getinfo() must be called before close(). This is the basic pattern for making requests with pycurl and handling responses. Now let's go over some other common options and use cases.
Setting Options in pycurl
pycurl has tons of options that can be set with setopt() to control how requests are made. Some common options include:
# Set request URL
curl.setopt(curl.URL, 'https://www.example.com')

# Follow redirects
curl.setopt(curl.FOLLOWLOCATION, True)

# Set referer header
curl.setopt(curl.REFERER, 'https://www.google.com')

# Set user agent
curl.setopt(curl.USERAGENT, 'Mozilla/5.0')

# Include response headers in output
curl.setopt(curl.HEADER, True)

# Enable verbose mode
curl.setopt(curl.VERBOSE, True)

# Maximum redirects to follow
curl.setopt(curl.MAXREDIRS, 5)

# Timeout in seconds
curl.setopt(curl.TIMEOUT, 10)

# Continue/resume a previous download from a byte offset
curl.setopt(curl.RESUME_FROM, 10)
Refer to the full list of options to see all available configurations.
Making POST Requests
By default, pycurl makes GET requests. To make POST requests, you need to set some additional options:
import pycurl
from io import BytesIO
from urllib.parse import urlencode

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Form data to send (POSTFIELDS expects a string, so URL-encode the dict)
data = {'key1': 'value1', 'key2': 'value2'}
postfields = urlencode(data)

# POST request
curl.setopt(curl.POST, True)

# Set request data
curl.setopt(curl.POSTFIELDS, postfields)

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
We set the POST option to True, then pass the URL-encoded form data to POSTFIELDS. You can also send JSON data by encoding it to a string first:
import json

json_data = {'key1': 'value1'}

# Encode JSON to string
data = json.dumps(json_data)
curl.setopt(curl.POSTFIELDS, data)
Setting Custom Headers
To set custom headers, use the HTTPHEADER option:
headers = [
    'Content-Type: application/json',
    'Authorization: Bearer 1234567'
]
curl.setopt(curl.HTTPHEADER, headers)
The headers must be passed as a list of strings.
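Putting these together, here's a minimal sketch of POSTing JSON with a matching Content-Type header (the endpoint URL and token are placeholders):

import json
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com/api')  # hypothetical endpoint

# Setting POSTFIELDS implicitly makes this a POST request
payload = json.dumps({'key1': 'value1'})
curl.setopt(curl.POSTFIELDS, payload)

curl.setopt(curl.HTTPHEADER, [
    'Content-Type: application/json',
    'Authorization: Bearer 1234567'  # placeholder token
])

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()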
Handling Response Headers
By default, response headers are not included in the response data. To get headers, set the HEADER option to True (this example reuses the buffer from the earlier GET request):
# Include headers in the response data
curl.setopt(curl.HEADER, True)
curl.perform()

# HEADER_SIZE is the size of the headers in bytes,
# so it can be used to split the headers from the body
header_size = curl.getinfo(curl.HEADER_SIZE)
raw = buffer.getvalue()
headers = raw[:header_size]
print(headers.decode('utf-8'))
The header data is prepended to the regular response data. Alternatively, you can have headers written to a separate file object with HEADERFUNCTION:
header_buffer = BytesIO()
curl.setopt(curl.HEADERFUNCTION, header_buffer.write)
curl.perform()

print(header_buffer.getvalue().decode('utf-8'))
Handling Cookies
pycurl does not store cookies unless you enable libcurl's cookie engine. Once enabled, cookies from responses are stored and sent with subsequent requests on the same handle. Use COOKIEFILE to enable the engine and read cookies, and COOKIEJAR to write them out so you can inspect them:
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
curl.setopt(curl.COOKIEJAR, 'cookies.txt')
This will read cookies from and write cookies to the cookies.txt file.
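As a sketch, reusing the same handle across requests lets the stored cookies carry a session (the URLs are placeholders):

import pycurl
from io import BytesIO

curl = pycurl.Curl()

# Enable the cookie engine and persist cookies to disk
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
curl.setopt(curl.COOKIEJAR, 'cookies.txt')

# First request: the server may set session cookies
curl.setopt(curl.URL, 'https://www.example.com/login')
curl.setopt(curl.WRITEDATA, BytesIO())
curl.perform()

# Second request on the same handle sends the stored cookies back
buffer = BytesIO()
curl.setopt(curl.URL, 'https://www.example.com/profile')
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()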
Setting a Proxy
To make requests through a proxy server, use the PROXY and PROXYTYPE options:
curl.setopt(curl.PROXY, 'http://192.168.0.100:8888')
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
Replace the proxy URL with your actual proxy address. You can also authenticate with the proxy using PROXYUSERPWD:
proxy_user_pass = 'user123:password456'
curl.setopt(curl.PROXYUSERPWD, proxy_user_pass)
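Put together, a complete proxied request looks something like this (the proxy address and credentials are placeholders):

import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Hypothetical proxy address and credentials
curl.setopt(curl.PROXY, 'http://192.168.0.100:8888')
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
curl.setopt(curl.PROXYUSERPWD, 'user123:password456')

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()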
Authenticating with Servers
To authenticate requests, use the USERNAME and PASSWORD options:
curl.setopt(curl.USERNAME, 'myusername')
curl.setopt(curl.PASSWORD, 'mypassword')
This will use basic HTTP authentication. Note this sends the credentials in plain text – use HTTPS for secure authentication. For more advanced OAuth flows, you may need to handle authentication manually by generating and sending access tokens.
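For instance, once you've obtained an access token out of band, you can attach it as a header – a minimal sketch with a placeholder token:

# Send a previously obtained OAuth access token as a Bearer header
curl.setopt(curl.HTTPHEADER, ['Authorization: Bearer <access_token>'])
curl.perform()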
Downloading Files
A common use case for cURL is downloading files. As shown in the first example, you can download to a buffer:
buffer = BytesIO()
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()

# Save buffer to file
with open('download.zip', 'wb') as f:
    f.write(buffer.getvalue())
Alternatively, you can have cURL write directly to a file:
with open('download.zip', 'wb') as f:
    curl.setopt(curl.WRITEDATA, f)
    curl.perform()
This avoids saving to an intermediate buffer. To resume a partial download, use the RESUME_FROM option:
curl.setopt(curl.RESUME_FROM, 10)
This resumes the transfer starting at byte offset 10.
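In practice, you'd typically resume from wherever a previous download stopped. A sketch, assuming a partial download.zip is already on disk and the URL is a placeholder:

import os
import pycurl

# Resume from however many bytes were already downloaded
existing = os.path.getsize('download.zip') if os.path.exists('download.zip') else 0

with open('download.zip', 'ab') as f:  # append so resumed bytes land after existing data
    curl = pycurl.Curl()
    curl.setopt(curl.URL, 'https://www.example.com/download.zip')
    curl.setopt(curl.RESUME_FROM, existing)
    curl.setopt(curl.WRITEDATA, f)
    curl.perform()
    curl.close()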
Uploading Files
To upload files, read the file contents into memory and pass them to POSTFIELDS:
with open('upload.zip', 'rb') as f:
    file_data = f.read()

curl.setopt(curl.POSTFIELDS, file_data)
If uploading large files, it's better to stream the contents rather than reading to memory all at once:
import os
import pycurl

file_size = os.path.getsize('upload.zip')

with open('upload.zip', 'rb') as f:
    curl = pycurl.Curl()
    curl.setopt(curl.URL, 'https://www.example.com/upload')
    # READFUNCTION is called with the number of bytes to read and must
    # return a bytes object; f.read matches that signature directly
    curl.setopt(curl.READFUNCTION, f.read)
    curl.setopt(curl.INFILESIZE_LARGE, file_size)
    curl.setopt(curl.UPLOAD, 1)
    curl.perform()
    curl.close()
This streams chunks of the file to the server as needed.
Using SOCKS Proxies
In addition to standard HTTP proxies, pycurl also supports SOCKS proxies:
import pycurl

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Enable SOCKS5 proxy
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

# Set proxy hostname/IP and port
curl.setopt(curl.PROXY, 'socks5host:1080')

# Perform request
curl.perform()
SOCKS proxies operate at the TCP level, routing connections by destination IP and port rather than by URL, which lets you reach servers that don't have a public domain name. Providers such as Bright Data, Smartproxy, Proxy-Seller, and Soax offer SOCKS proxies.
Concurrent Requests with CurlMulti
The pycurl.Curl class performs one transfer at a time, blocking until it completes. To run several requests concurrently, pycurl includes the CurlMulti class, a wrapper around libcurl's multi interface:

import pycurl
from io import BytesIO

urls = ['https://www.example.com', 'https://www.example.org']
multi = pycurl.CurlMulti()
handles = []

# Create one Curl handle per URL and register it with the multi handle
for url in urls:
    buffer = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(curl.URL, url)
    curl.setopt(curl.WRITEDATA, buffer)
    multi.add_handle(curl)
    handles.append((curl, buffer))

# Drive all transfers until every handle has finished
num_active = 1
while num_active:
    ret, num_active = multi.perform()
    if ret != pycurl.E_CALL_MULTI_PERFORM:
        multi.select(1.0)

# Read the individual responses
for curl, buffer in handles:
    print(curl.getinfo(curl.RESPONSE_CODE), len(buffer.getvalue()))
    multi.remove_handle(curl)
    curl.close()

This multiplexes many transfers on a single thread – a non-blocking model similar in spirit to Python's other event-driven I/O libraries.
Debugging with Verbose Output
During development, it can be useful to enable verbose mode:
curl.setopt(curl.VERBOSE, True)
This will print detailed information about each request, including:
- HTTP headers sent and received
- Curl info such as redirects followed
- Connection timeouts and retries
- SSL/TLS connection info
Verbose output is great for debugging issues with requests and responses.
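To capture that output programmatically rather than having libcurl print to stderr, you can pair VERBOSE with a DEBUGFUNCTION callback. A minimal sketch:

import pycurl

def debug_callback(debug_type, debug_msg):
    # debug_type identifies the kind of data, e.g. pycurl.INFOTYPE_HEADER_OUT;
    # debug_msg is a bytes object
    print(debug_type, debug_msg.decode('utf-8', errors='replace').rstrip())

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.setopt(curl.VERBOSE, True)
curl.setopt(curl.DEBUGFUNCTION, debug_callback)
curl.perform()
curl.close()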
Comparison to Requests
The requests library provides a simpler interface compared to pycurl. Here are some key differences:
- requests has a nicer API, while pycurl exposes low-level options
- pycurl supports more advanced features like custom auth and proxies
- Neither supports async/await natively; pycurl offers CurlMulti for concurrent transfers
- requests has some implicit behavior, while pycurl is more explicit
In general, requests is better for basic HTTP calls in scripts. pycurl gives you lower-level control for advanced use cases.
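To make the trade-off concrete, here is the same GET request in both libraries – a sketch assuming requests is installed:

# With requests: one call, the response object handles decoding
import requests
r = requests.get('https://www.example.com')
print(r.status_code, len(r.text))

# With pycurl: explicit buffer management and option setting
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
print(curl.getinfo(curl.RESPONSE_CODE), len(buffer.getvalue()))
curl.close()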
Pros and Cons of pycurl
Some pros of using pycurl:
- Mature and well-supported library
- Exposes numerous low-level configuration options
- Easily replicates cURL command-line behavior
- Supports advanced features like custom auth and proxies
- Can offer better performance than high-level libraries
Some cons:
- More complex API than requests and httpx
- Requires more low-level handling of data streams
- No native async/await support – concurrency goes through the lower-level CurlMulti interface
- Less familiar to most Python developers than requests
Conclusion
The pycurl library brings cURL's capabilities into Python, letting you replicate cURL commands for HTTP requests, file transfers, downloads, and uploads in just a few lines of code.
Although it presents a lower-level API than the requests module, pycurl shines where detailed control and full access to cURL's features are necessary. That makes it a versatile tool for Python scripts that interact with web APIs, perform web scraping tasks, and handle many other web-related functions.