Cookies are a critical component of maintaining stateful sessions for web scraping and automation using Selenium. By saving cookies from the browser, we can resume logged-in sessions across multiple runs without having to re-authenticate every time. In this guide, we'll cover the techniques for persisting cookies in Selenium using Python.
When Should You Save and Load Cookies?
Here are four common use cases where saving and reusing browser cookies is useful:
- Resuming Scraping After Restarts: If your scraper crashes or the browser closes unexpectedly, saving cookies allows you to resume where you left off. The preserved cookies store session data like logins and navigation state. Without cookies, you'd have to rerun any login steps and renavigate manually to your last position after a restart. This slows down data collection massively over many scraping runs.
- Persisting Logins Over Time: Performing a fresh login before each scrape is inefficient. Saved cookies allow you to reuse logins indefinitely or until expiration. This is especially helpful for sites that limit login attempts or have complex multi-step authentication. Cookies reduce login friction drastically.
- Maintaining User Context: Beyond logins, cookies often contain rich user profile data, preferences, regions, and other session contexts. Saving cookies keeps this context intact across scraping runs for consistency. Without it, your scraper loses the personalized touch.
- Coordinating Browsers: In a distributed scraping environment, cookie sharing allows coordinating multiple browser instances. Scrapers can save their session cookies to a shared storage for others to load, pooling their collective contexts.
Step-by-Step: Saving Cookies with Selenium in Python
Now that we understand why cookie persistence matters, let's walk through a complete example demonstrating how to save and load cookies using Selenium in Python. We'll use Selenium's webdriver module along with Python's built-in json utilities.
1. Setup Driver and Login
First we initialize ChromeDriver and log in to our target site:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.example.com/login")
driver.find_element(By.ID, "username").send_keys("myuser")
driver.find_element(By.ID, "password").send_keys("mypass")
driver.find_element(By.ID, "submit").click()
```
This performs the initial authentication so we have an active session.
2. Extract Cookies with get_cookies()
Once logged in, we can use Selenium's get_cookies() method to extract the current browser cookies:
```python
cookies = driver.get_cookies()
```
3. Serialize Cookies to JSON
Next we'll serialize the cookies to a JSON string that can be easily saved to a file:

```python
import json

json_cookies = json.dumps(cookies)
```
The JSON string contains all the same cookie details.
4. Save JSON to File
We can then write the JSON cookie data to a file using standard Python file handling:
```python
with open('cookies.json', 'w') as f:
    f.write(json_cookies)
```
The cookies are now persisted to disk ready for loading again later.
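The saving steps above (extract, serialize, write) fit naturally into a pair of small helpers. This is a minimal sketch; the names `save_cookies`/`load_cookies` and the default `cookies.json` path are our own conventions, not Selenium APIs:

```python
import json
from pathlib import Path


def save_cookies(cookies, path="cookies.json"):
    """Serialize a list of Selenium cookie dicts to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2))


def load_cookies(path="cookies.json"):
    """Read a list of cookie dicts back from a JSON file."""
    return json.loads(Path(path).read_text())
```

With the driver from step 1, `save_cookies(driver.get_cookies())` persists the current session in one call.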
Loading Saved Cookies into Selenium
To reuse our saved cookies, we need to:
- Read the JSON cookie data back into memory
- Iterate through each cookie
- Add each cookie back to Selenium using add_cookie()
Here's what that looks like in code:
```python
import json
from pathlib import Path

# The browser must be on the cookie's domain before cookies can be added
driver.get("http://www.example.com")

# Read the cookie file
cookies = json.loads(Path('cookies.json').read_text())

# Add each cookie back to the browser
for cookie in cookies:
    driver.add_cookie(cookie)

# Now the browser has the same cookies as before!
```
Once the cookies are loaded, refresh or re-request the page and the browser session will match the one that was saved – no need to log in again.
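Loading can still fail on individual cookies: some drivers reject a float `expiry` or an unexpected `sameSite` value. As a hedge, you can normalize each cookie before calling `add_cookie()`. This sanitizer is a sketch of our own, not a Selenium API:

```python
# Values the WebDriver spec recognizes for the sameSite attribute
ALLOWED_SAMESITE = {"Strict", "Lax", "None"}


def sanitize_cookie(cookie):
    """Return a copy of a cookie dict adjusted for driver.add_cookie().

    Hypothetical helper: coerces float expiry timestamps to int and
    drops sameSite values the driver may reject.
    """
    clean = dict(cookie)
    if "expiry" in clean:
        clean["expiry"] = int(clean["expiry"])
    if clean.get("sameSite") not in ALLOWED_SAMESITE:
        clean.pop("sameSite", None)
    return clean
```

In the loading loop above you would then write `driver.add_cookie(sanitize_cookie(cookie))`.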
Cookie Persistence Best Practices
While the core process of saving and loading cookies is straightforward, there are some best practices to consider:
- Periodically Re-Save Updated Cookies: Don't just save cookies once – re-extract and overwrite your cookie file on a regular schedule so you capture refreshed session values and expiration timestamps. Without refreshing, your saved cookies may eventually expire and stop working.
- Isolate Domain-Specific Cookies: Cookies from one domain won't work directly on another. Use different cookie files for different sites.
- Robustly Handle Loading Issues: Use try/except blocks when adding cookies to catch loading errors gracefully. Stale cookies may fail to load.
- Encrypt Your Cookie Files: Store cookie files securely, even encrypting them if they contain sensitive session data. Don't leave them exposed!
- Occasionally Re-Login: Periodically re-authenticate fully to refresh session IDs and prevent account lockouts. Don't rely on cookies alone endlessly.
- Delete Cookies You No Longer Need: Be sure to delete cookie files once a scrape is finished. Don't leave orphaned user session data littering your drives!
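Several of these practices can be combined in code. Selenium cookie dicts carry an optional `expiry` Unix timestamp, so one hedge against stale cookies is to filter expired entries before loading them. A minimal sketch (the helper name is our own):

```python
import time


def filter_fresh_cookies(cookies, now=None):
    """Drop cookies whose 'expiry' timestamp has already passed.

    Cookies without an 'expiry' key are session cookies and are kept.
    """
    now = time.time() if now is None else now
    return [c for c in cookies if c.get("expiry", float("inf")) > now]
```

Pairing this with a try/except around each `add_cookie()` call covers most loading failures gracefully.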
Real-World Selenium Cookie Examples
Let's look at some real-world examples demonstrating how cookie persistence is used with Selenium scrapers:
```python
# Example 1 - Resuming a scrape after a restart
cookies = load_cookies_from_file()

try:
    # Visit the site first so cookies can be attached to its domain
    driver.get(BASE_URL)

    # Add saved cookies to the new driver
    for cookie in cookies:
        driver.add_cookie(cookie)

    # Resume scraping where we left off
    driver.get(LAST_URL)
    driver.find_element(By.XPATH, "//div[contains(@id,'results')]")
    # ... remaining scrape logic
except Exception:
    # Invalid cookies, need to re-authenticate
    login(driver)
    save_cookies(driver)
```
Here cookies allow the script to resume mid-scrape after crashing, without redoing the initial login steps. If the cookies are expired, it fails gracefully to re-login.
```python
# Example 2 - Persist a login across multiple browser instances
cookies = load_shared_cookies()

# Visit the site once in each browser so cookies can be set for its domain
for driver in [driver_1, driver_2, driver_3]:
    driver.get("http://www.mysite.com")
    for cookie in cookies:
        driver.add_cookie(cookie)

# All three now share the same login session!
driver_1.get("http://www.mysite.com")
driver_2.get("http://www.mysite.com/dashboard")
driver_3.get("http://www.mysite.com/reports")
```
This shows loading a common set of cookies into multiple browser instances to coordinate scraping across threads.
Potential Cookie Pitfalls and Troubleshooting
Cookies are powerful but occasionally temperamental. Some potential pitfalls and troubleshooting tips:
- Expired cookies – Re-login and save cookies again if you get unauthorized errors
- Domain mismatches – Double check cookie domains if sessions aren't persisting
- Clearing cookies – Some sites actively clear session cookies, defeating persistence
- Overwriting – New cookies may displace old ones unexpectedly
- Blocking – Some sites block cookie saving via Selenium
- Client-side encryption – May prevent cookie access if keys are unavailable
- 500 errors on load – Try adding cookies one by one to isolate bad cookies
- Dashboard logouts – Some sites log out other sessions when you log in again
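For the domain-mismatch case above, it can help to check a saved cookie's `domain` attribute against the host you're scraping before blaming Selenium. A rough sketch of RFC 6265-style domain matching (the helper is hypothetical):

```python
def cookie_matches_host(cookie, host):
    """Check whether a cookie's domain attribute covers a given host.

    A leading dot (e.g. '.example.com') means the cookie also applies
    to subdomains; otherwise the host must match exactly.
    """
    domain = cookie.get("domain", "")
    if domain.startswith("."):
        return host == domain[1:] or host.endswith(domain)
    return host == domain
```

Running each saved cookie through a check like this quickly reveals which entries will never apply to the site you're loading them into.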
Cookie persistence takes trial and error. Refer to browser debugging tools to inspect specific cookie values when issues arise.
Cookies are the glue that holds together civilized browsing across the wild web. For Selenium scraping, cookie persistence moves you from one-off scripts to robust, production-grade frameworks. By honing your skills at saving and reloading cookies, you can build scrapers resilient to crashes, timeouts, and session expirations. Your automation workflows will become more seamless and humanlike.