cURL is a versatile command-line tool that allows you to transfer data to and from a server. With cURL, you can make requests, download files, authenticate with servers, and much more. While cURL originated as a Linux/Unix tool, it can also be easily used in Python scripts. In this comprehensive guide, we'll cover how to make HTTP requests with cURL using Python.
Overview of cURL
cURL stands for “Client URL.” It is a command-line tool that is installed by default on most Linux/Unix systems and macOS. cURL allows you to communicate with various servers using major internet protocols like HTTP, HTTPS, FTP, and more. Some key features of cURL include:
- Making HTTP GET and POST requests
- Downloading and uploading files via FTP, SFTP, and SCP
- Built-in support for proxies, cookies, authentication, and SSL
- Following redirects and downloading page contents
- Submitting forms and POSTing JSON data
- Setting custom headers and user agent strings
cURL is highly flexible – you can customize requests in many ways using various command-line options and arguments. While cURL originated as a command-line tool, it can also be used in code via libraries that wrap the underlying libcurl library. This allows you to harness the power of cURL in your Python applications.
Using cURL in Python with pycurl
The most popular way to use cURL in Python code is via the pycurl library. pycurl provides a Python interface to the libcurl C library. It has nearly all the same capabilities and features as the command-line tool.
To use pycurl, first install it via pip:
pip install pycurl
The main class in pycurl is Curl. Here's how to make a simple GET request:
import pycurl

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.perform()
This initializes a Curl object, sets the URL option, and performs the request. However, in most cases you'll also want to pass data and handle the response. Let's look at a more complete example:
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Write response data to buffer
curl.setopt(curl.WRITEDATA, buffer)

# Perform request
curl.perform()

# Get response code (must be read before closing the handle)
response_code = curl.getinfo(curl.RESPONSE_CODE)

# End curl session
curl.close()

# Get response data
response_data = buffer.getvalue()

# Print response
print(response_code, response_data.decode('utf-8'))
Here we create a BytesIO buffer to store the response. We pass this buffer to setopt() to save the response data. After calling perform(), we can get the response HTTP code and data – note that getinfo() must be called before close(). This is the basic pattern for making requests with pycurl and handling responses. Now let's go over some other common options and use cases.
Setting Options in pycurl
pycurl has tons of options that can be set with setopt() to control how requests are made. Some common options include:
# Set request URL
curl.setopt(curl.URL, 'https://www.example.com')

# Follow redirects
curl.setopt(curl.FOLLOWLOCATION, True)

# Set referer header
curl.setopt(curl.REFERER, 'https://www.google.com')

# Set user agent
curl.setopt(curl.USERAGENT, 'Mozilla/5.0')

# Include response headers in output
curl.setopt(curl.HEADER, True)

# Enable verbose mode
curl.setopt(curl.VERBOSE, True)

# Maximum redirects to follow
curl.setopt(curl.MAXREDIRS, 5)

# Timeout in seconds
curl.setopt(curl.TIMEOUT, 10)

# Continue/resume a previous download from a byte offset
curl.setopt(curl.RESUME_FROM, 10)
Refer to the full list of options to see all available configurations.
Making POST Requests
By default, pycurl makes GET requests. To make POST requests, you need to set some additional options:
import pycurl
from io import BytesIO
from urllib.parse import urlencode

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Form data to send (POSTFIELDS expects a string, so URL-encode the dict)
data = {'key1': 'value1', 'key2': 'value2'}
postfields = urlencode(data)

# POST request
curl.setopt(curl.POST, True)

# Set request data
curl.setopt(curl.POSTFIELDS, postfields)

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
We set the POST option to True, then pass the URL-encoded form data to POSTFIELDS. You can also send JSON data by encoding it to a string first:
import json

json_data = {'key1': 'value1'}

# Encode JSON to string
data = json.dumps(json_data)
curl.setopt(curl.POSTFIELDS, data)
Setting Custom Headers
To set custom headers, use the HTTPHEADER option:
headers = [
    'Content-Type: application/json',
    'Authorization: Bearer 1234567'
]
curl.setopt(curl.HTTPHEADER, headers)
The headers must be passed as a list of strings.
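Putting these together, here's a minimal sketch of POSTing JSON with a matching Content-Type header (the endpoint URL and token are placeholders):

import json
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com/api')  # hypothetical endpoint

# Setting POSTFIELDS implicitly makes this a POST request
payload = json.dumps({'key1': 'value1'})
curl.setopt(curl.POSTFIELDS, payload)

curl.setopt(curl.HTTPHEADER, [
    'Content-Type: application/json',
    'Authorization: Bearer 1234567'  # placeholder token
])

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()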
Handling Response Headers
By default, response headers are not included in the response data. To get headers, set the HEADER option to True (this example reuses the buffer from the earlier GET request):
# Include headers in the response data
curl.setopt(curl.HEADER, True)
curl.perform()

# HEADER_SIZE is the size of the headers in bytes,
# so it can be used to split the headers from the body
header_size = curl.getinfo(curl.HEADER_SIZE)
raw = buffer.getvalue()
headers = raw[:header_size]
print(headers.decode('utf-8'))
The header data is prepended to the regular response data. Alternatively, you can have headers written to a separate file object with HEADERFUNCTION:
header_buffer = BytesIO()
curl.setopt(curl.HEADERFUNCTION, header_buffer.write)
curl.perform()

print(header_buffer.getvalue().decode('utf-8'))
Handling Cookies
pycurl does not store cookies unless you enable libcurl's cookie engine. Once enabled, cookies from responses are stored and sent with subsequent requests on the same handle. Use COOKIEFILE to enable the engine and read cookies, and COOKIEJAR to write them out so you can inspect them:
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
curl.setopt(curl.COOKIEJAR, 'cookies.txt')
This will read cookies from and write cookies to the cookies.txt file.
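As a sketch, reusing the same handle across requests lets the stored cookies carry a session (the URLs are placeholders):

import pycurl
from io import BytesIO

curl = pycurl.Curl()

# Enable the cookie engine and persist cookies to disk
curl.setopt(curl.COOKIEFILE, 'cookies.txt')
curl.setopt(curl.COOKIEJAR, 'cookies.txt')

# First request: the server may set session cookies
curl.setopt(curl.URL, 'https://www.example.com/login')
curl.setopt(curl.WRITEDATA, BytesIO())
curl.perform()

# Second request on the same handle sends the stored cookies back
buffer = BytesIO()
curl.setopt(curl.URL, 'https://www.example.com/profile')
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()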
Setting a Proxy
To make requests through a proxy server, use the PROXY and PROXYTYPE options:
curl.setopt(curl.PROXY, 'http://192.168.0.100:8888')
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
Replace the proxy URL with your actual proxy address. You can also authenticate with the proxy using PROXYUSERPWD:
proxy_user_pass = 'user123:password456'
curl.setopt(curl.PROXYUSERPWD, proxy_user_pass)
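Put together, a complete proxied request looks something like this (the proxy address and credentials are placeholders):

import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Hypothetical proxy address and credentials
curl.setopt(curl.PROXY, 'http://192.168.0.100:8888')
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
curl.setopt(curl.PROXYUSERPWD, 'user123:password456')

curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
curl.close()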
Authenticating with Servers
To authenticate requests, use the USERNAME and PASSWORD options:
curl.setopt(curl.USERNAME, 'myusername')
curl.setopt(curl.PASSWORD, 'mypassword')
This will use basic HTTP authentication. Note this sends the credentials in plain text – use HTTPS for secure authentication. For more advanced OAuth flows, you may need to handle authentication manually by generating and sending access tokens.
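For instance, once you've obtained an access token out of band, you can attach it as a header – a minimal sketch with a placeholder token:

# Send a previously obtained OAuth access token as a Bearer header
curl.setopt(curl.HTTPHEADER, ['Authorization: Bearer <access_token>'])
curl.perform()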
Downloading Files
A common use case for cURL is downloading files. As shown in the first example, you can download to a buffer:
buffer = BytesIO()
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()

# Save buffer to file
with open('download.zip', 'wb') as f:
    f.write(buffer.getvalue())
Alternatively, you can have cURL write directly to a file:
with open('download.zip', 'wb') as f:
    curl.setopt(curl.WRITEDATA, f)
    curl.perform()
This avoids saving to an intermediate buffer. To resume a partial download, use the RESUME_FROM option:
curl.setopt(curl.RESUME_FROM, 10)
This resumes the transfer starting at byte offset 10.
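In practice, you'd typically resume from wherever a previous download stopped. A sketch, assuming a partial download.zip is already on disk and the URL is a placeholder:

import os
import pycurl

# Resume from however many bytes were already downloaded
existing = os.path.getsize('download.zip') if os.path.exists('download.zip') else 0

with open('download.zip', 'ab') as f:  # append so resumed bytes land after existing data
    curl = pycurl.Curl()
    curl.setopt(curl.URL, 'https://www.example.com/download.zip')
    curl.setopt(curl.RESUME_FROM, existing)
    curl.setopt(curl.WRITEDATA, f)
    curl.perform()
    curl.close()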
Uploading Files
To upload files, read the file contents into memory and pass them to POSTFIELDS:
with open('upload.zip', 'rb') as f:
    file_data = f.read()

curl.setopt(curl.POSTFIELDS, file_data)
If uploading large files, it's better to stream the contents rather than reading to memory all at once:
import os
import pycurl

file_size = os.path.getsize('upload.zip')

with open('upload.zip', 'rb') as f:
    curl = pycurl.Curl()
    curl.setopt(curl.URL, 'https://www.example.com/upload')
    # READFUNCTION is called with the number of bytes to read and must
    # return a bytes object; f.read matches that signature directly
    curl.setopt(curl.READFUNCTION, f.read)
    curl.setopt(curl.INFILESIZE_LARGE, file_size)
    curl.setopt(curl.UPLOAD, 1)
    curl.perform()
    curl.close()
This streams chunks of the file to the server as needed.
Using SOCKS Proxies
In addition to standard HTTP proxies, pycurl also supports SOCKS proxies:
import pycurl

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')

# Enable SOCKS5 proxy
curl.setopt(curl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

# Set proxy hostname/IP and port
curl.setopt(curl.PROXY, 'socks5host:1080')

# Perform request
curl.perform()
SOCKS proxies operate at the TCP level, routing connections by destination IP and port rather than by URL, which lets you reach servers that don't have a public domain name. Providers such as Bright Data, Smartproxy, Proxy-Seller, and Soax offer SOCKS proxies.
Concurrent Requests with CurlMulti
The pycurl.Curl class performs one transfer at a time, blocking until it completes. To run several requests concurrently, pycurl includes the CurlMulti class, a wrapper around libcurl's multi interface:

import pycurl
from io import BytesIO

urls = ['https://www.example.com', 'https://www.example.org']
multi = pycurl.CurlMulti()
handles = []

# Create one Curl handle per URL and register it with the multi handle
for url in urls:
    buffer = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(curl.URL, url)
    curl.setopt(curl.WRITEDATA, buffer)
    multi.add_handle(curl)
    handles.append((curl, buffer))

# Drive all transfers until every handle has finished
num_active = 1
while num_active:
    ret, num_active = multi.perform()
    if ret != pycurl.E_CALL_MULTI_PERFORM:
        multi.select(1.0)

# Read the individual responses
for curl, buffer in handles:
    print(curl.getinfo(curl.RESPONSE_CODE), len(buffer.getvalue()))
    multi.remove_handle(curl)
    curl.close()

This multiplexes many transfers on a single thread – a non-blocking model similar in spirit to Python's other event-driven I/O libraries.
Debugging with Verbose Output
During development, it can be useful to enable verbose mode:
curl.setopt(curl.VERBOSE, True)
This will print detailed information about each request, including:
- HTTP headers sent and received
- Curl info such as redirects followed
- Connection timeouts and retries
- SSL/TLS connection info
Verbose output is great for debugging issues with requests and responses.
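To capture that output programmatically rather than having libcurl print to stderr, you can pair VERBOSE with a DEBUGFUNCTION callback. A minimal sketch:

import pycurl

def debug_callback(debug_type, debug_msg):
    # debug_type identifies the kind of data, e.g. pycurl.INFOTYPE_HEADER_OUT;
    # debug_msg is a bytes object
    print(debug_type, debug_msg.decode('utf-8', errors='replace').rstrip())

curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.setopt(curl.VERBOSE, True)
curl.setopt(curl.DEBUGFUNCTION, debug_callback)
curl.perform()
curl.close()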
Comparison to Requests
The requests library provides a simpler interface compared to pycurl. Here are some key differences:
- requests has a nicer API, while pycurl exposes low-level options
- pycurl supports more advanced features like custom auth and proxies
- Neither supports async/await natively; pycurl offers CurlMulti for concurrent transfers
- requests has some implicit behavior, while pycurl is more explicit
In general, requests is better for basic HTTP calls in scripts. pycurl gives you lower-level control for advanced use cases.
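To make the trade-off concrete, here is the same GET request in both libraries – a sketch assuming requests is installed:

# With requests: one call, the response object handles decoding
import requests
r = requests.get('https://www.example.com')
print(r.status_code, len(r.text))

# With pycurl: explicit buffer management and option setting
import pycurl
from io import BytesIO

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, 'https://www.example.com')
curl.setopt(curl.WRITEDATA, buffer)
curl.perform()
print(curl.getinfo(curl.RESPONSE_CODE), len(buffer.getvalue()))
curl.close()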
Pros and Cons of pycurl
Some pros of using pycurl:
- Mature and well-supported library
- Exposes numerous low-level configuration options
- Easily replicates cURL command-line behavior
- Supports advanced features like custom auth and proxies
- Can offer better performance than high-level libraries
Some cons:
- More complex API than requests and httpx
- Requires more low-level handling of data streams
- No native async/await support – concurrency goes through the lower-level CurlMulti interface
- Less familiar to most Python developers than requests
Conclusion
The pycurl library brings cURL's capabilities into Python, letting you replicate cURL commands for HTTP requests, file transfers, downloads, and uploads in just a few lines of code.
Although it presents a lower-level API than the requests module, pycurl shines where detailed control and full access to cURL's features are necessary. That makes it a versatile tool for Python scripts that interact with web APIs, perform web scraping tasks, and handle many other web-related functions.