How to Save and Load Cookies in Selenium?

Cookies are a critical component of maintaining stateful sessions for web scraping and automation using Selenium. By saving cookies from the browser, we can resume logged-in sessions across multiple runs without having to re-authenticate every time. In this guide, we'll cover the techniques for persisting cookies in Selenium using Python.

Why Save and Load Cookies?

Here are four common use cases where saving and reusing browser cookies is useful:

  1. Resuming Scraping After Restarts: If your scraper crashes or the browser closes unexpectedly, saving cookies allows you to resume where you left off. The preserved cookies store session data like logins and navigation state. Without cookies, you'd have to rerun any login steps and renavigate manually to your last position after a restart. This slows down data collection massively over many scraping runs.
  2. Persisting Logins Over Time: Performing a fresh login before each scrape is inefficient. Saved cookies allow you to reuse logins indefinitely or until expiration. This is especially helpful for sites that limit login attempts or have complex multi-step authentication. Cookies reduce login friction drastically.
  3. Maintaining User Context: Beyond logins, cookies often contain rich user profile data, preferences, regions, and other session context. Saving cookies keeps this context intact across scraping runs for consistency; without it, your scraper loses that personalization between runs.
  4. Coordinating Browsers: In a distributed scraping environment, cookie sharing allows coordinating multiple browser instances. Scrapers can save their session cookies to a shared storage for others to load, pooling their collective contexts.

Step-by-Step: Saving Cookies with Selenium in Python

Now that we understand why cookie persistence matters, let's walk through a complete example demonstrating how to save and load cookies using Selenium in Python. We'll use Selenium's webdriver module along with Python's built-in json utilities.

1. Setup Driver and Login

First we initialize a Chrome browser and log in to our target site:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.example.com/login")

# Selenium 4 removed find_element_by_id() – use find_element(By.ID, ...)
driver.find_element(By.ID, "username").send_keys("myuser")
driver.find_element(By.ID, "password").send_keys("mypass")
driver.find_element(By.ID, "submit").click()

This performs the initial authentication so we have an active session.

2. Extract Cookies with get_cookies()

Once logged in, we can use Selenium's get_cookies() method to extract the current browser cookies:

cookies = driver.get_cookies()
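Each entry in the returned list is a plain Python dict. The exact keys vary by browser and site, but a typical entry looks roughly like this (every name and value below is invented for illustration):

```python
# Illustrative cookie dict, similar in shape to what get_cookies() returns.
# All names and values here are made up for the example.
sample_cookie = {
    "name": "sessionid",         # hypothetical session cookie
    "value": "abc123",
    "domain": "www.example.com",
    "path": "/",
    "secure": True,
    "httpOnly": True,
    "expiry": 1700000000,        # Unix timestamp; absent for session cookies
}
```

Because these are plain dicts, they serialize cleanly to JSON, which is what we do next.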

3. Serialize Cookies to JSON

Next we'll serialize the cookies to a JSON string that can be easily saved to file:

import json

json_cookies = json.dumps(cookies)

The JSON string contains all the same cookie details.

4. Save JSON to File

We can then write the JSON cookie data to a file using standard Python file handling:

with open('cookies.json', 'w') as f:
    f.write(json_cookies)

The cookies are now persisted to disk ready for loading again later.
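The save and load steps can be wrapped into a small pair of helpers. This is a minimal sketch; the names save_cookies and load_cookies are my own, not part of Selenium:

```python
import json
from pathlib import Path


def save_cookies(cookies, path="cookies.json"):
    """Write a list of Selenium cookie dicts to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2))


def load_cookies(path="cookies.json"):
    """Read a list of cookie dicts back from a JSON file."""
    return json.loads(Path(path).read_text())


# With a live driver, usage would be:
#   save_cookies(driver.get_cookies())
#   ... later, after navigating to the site ...
#   for cookie in load_cookies():
#       driver.add_cookie(cookie)
```

Keeping serialization in one place makes it easy to later add expiry checks or encryption without touching the scraping code.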

Loading Saved Cookies into Selenium

To reuse our saved cookies, we need to:

  • Navigate to the target site first – Selenium only accepts cookies for the currently loaded domain
  • Read the JSON cookie data back into memory
  • Iterate through each cookie and add it back with add_cookie()
  • Refresh the page so the restored session takes effect

Here's what that looks like in code:

import json
from pathlib import Path

from selenium import webdriver

driver = webdriver.Chrome()

# Visit the site first – add_cookie() raises InvalidCookieDomainException
# if the cookie's domain doesn't match the current page
driver.get("http://www.example.com")

# Read cookie file
cookies = json.loads(Path('cookies.json').read_text())

# Add each cookie
for cookie in cookies:
    driver.add_cookie(cookie)

# Refresh so the page picks up the restored session
driver.refresh()

Once the cookies are loaded and the page refreshed, the browser session is restored to where it was saved – no need to log in again.
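Browsers can be picky about which keys they accept back. A common trick is to sanitize each dict before calling add_cookie(), keeping only widely accepted keys and coercing expiry to an integer. The sanitize_cookie helper below is my own sketch, not a Selenium API:

```python
# Keys that browsers reliably accept back via add_cookie();
# anything else (e.g. nonstandard "sameSite" values) gets dropped.
ALLOWED_KEYS = {"name", "value", "path", "domain", "secure", "httpOnly", "expiry"}


def sanitize_cookie(cookie):
    """Strip keys that add_cookie() may reject and coerce expiry to int."""
    clean = {k: v for k, v in cookie.items() if k in ALLOWED_KEYS}
    if "expiry" in clean:
        clean["expiry"] = int(clean["expiry"])  # drivers reject float expiry
    return clean
```

Run each loaded cookie through this before add_cookie() and most "invalid cookie" errors disappear.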

Cookie Persistence Best Practices

While the core process of saving and loading cookies is straightforward, there are some best practices to consider:

  • Periodically Resave Updated Cookies: Don't just save cookies once – re-save and overwrite your cookie file on a regular schedule. This keeps the expiration timestamps current; without refreshing, your saved cookies will eventually expire and stop working.
  • Isolate Domain-Specific Cookies: Cookies from one domain won't work directly on another. Use different cookie files for different sites.
  • Robustly Handle Loading Issues: Use try/except blocks when adding cookies to catch loading errors gracefully. Stale cookies may fail to load.
  • Encrypt Your Cookie Files: Store cookie files securely, even encrypting them if they contain sensitive session data. Don't leave them exposed!
  • Occasionally Re-Login: Periodically re-authenticate fully to refresh session IDs and prevent account lockouts. Don't rely on cookies alone endlessly.
  • Delete No Longer Needed Cookies: Be sure to delete cookie files after scraping. Don't leave orphaned user session data littering your drives!
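For the resave and robust-handling practices above, it also helps to drop cookies that have already expired before loading them. A minimal sketch (the fresh_cookies name is my own):

```python
import time


def fresh_cookies(cookies, now=None):
    """Return only cookies that have not yet expired.

    Cookies without an "expiry" key are session cookies and are kept.
    """
    now = time.time() if now is None else now
    return [c for c in cookies if c.get("expiry", now + 1) > now]
```

Filtering like this avoids feeding the browser stale cookies that would either fail to load or silently produce logged-out pages.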

Real-World Selenium Cookie Examples

Let's look at some real-world examples demonstrating how cookie persistence is used with Selenium scrapers:

# Example 1 - Resuming scrape after restart

from selenium.webdriver.common.by import By

cookies = load_cookies_from_file()

try:
    # Visit the site first, then add the saved cookies
    driver.get(LAST_URL)
    for cookie in cookies:
        driver.add_cookie(cookie)

    # Reload so the restored session takes effect
    driver.get(LAST_URL)
    driver.find_element(By.XPATH, "//div[contains(@id,'results')]")
    # ... remaining scrape logic

except Exception:
    # Invalid cookies, need to reauth
    login(driver)
    save_cookies(driver)
Here cookies allow the script to resume mid-scrape after a crash, without redoing the initial login steps. If the cookies have expired, it falls back gracefully to a fresh login.

# Example 2 - Persist login across multiple scripts

cookies = load_shared_cookies()

driver_1.get("http://www.mysite.com")
driver_2.get("http://www.mysite.com/dashboard")
driver_3.get("http://www.mysite.com/reports")

for driver in [driver_1, driver_2, driver_3]:
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()  # reload so the shared session takes effect

# All three now share the same login session!

This shows loading a common set of cookies into multiple browser instances to coordinate scraping across threads.

Potential Cookie Pitfalls and Troubleshooting

Cookies are powerful but occasionally temperamental. Some potential pitfalls and troubleshooting tips:

  • Expired cookies – Re-login and save cookies again if you get unauthorized errors
  • Domain mismatches – Double check cookie domains if sessions aren't persisting
  • Clearing cookies – Some sites actively clear session cookies, defeating persistence
  • Overwriting – New cookies may displace old ones unexpectedly
  • Blocking – Some sites block cookie saving via Selenium
  • Client-side encryption – May prevent cookie access if keys are unavailable
  • 500 errors on load – Try adding cookies one by one to isolate bad cookies
  • Dashboard logouts – Some sites logout other sessions if you log in again

Cookie persistence takes trial and error. Refer to browser debugging tools to inspect specific cookie values when issues arise.
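The "add cookies one by one" tip above can be sketched as a helper that records which cookies fail instead of aborting the whole load (the function name is my own):

```python
def add_cookies_one_by_one(driver, cookies):
    """Try each cookie individually; return (name, error) pairs for failures."""
    failures = []
    for cookie in cookies:
        try:
            driver.add_cookie(cookie)
        except Exception as exc:  # e.g. InvalidCookieDomainException
            failures.append((cookie.get("name"), str(exc)))
    return failures
```

Any returned pairs point you straight at the problem cookies to inspect in the browser's dev tools.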

Closing Thoughts

Cookies are the glue that holds together civilized browsing across the wild web. For Selenium scraping, cookie persistence moves you from one-off scripts to robust, production-grade frameworks. By honing your skills of saving and reloading cookies, you can build scrapers resilient to crashes, timeouts, and session expirations. Your automation workflows will become more seamless and humanlike.

Yet, don't rely on cookies as a sole strategy – periodically re-authenticate fully to mimic real user behavior. Use cookies as an optimization, not as a crutch. Whatever your scraping goals, cookie mastery should be part of your Selenium repertoire. So revisit those stale HTTP fundamentals and learn anew the humble cookie!

John Rooney


I'm John Watson Rooney, a self-taught Python developer and content creator with a focus on web scraping, APIs, and automation. I love sharing my knowledge and expertise through my YouTube channel, which caters to all levels of developers, from beginners looking to get started in web scraping to experienced programmers seeking to advance their skills with modern techniques. I have worked in the e-commerce sector for many years, gaining extensive real-world experience in data handling, API integrations, and project management. I am passionate about teaching others and simplifying complex concepts to make them more accessible to a wider audience. In addition to my YouTube channel, I also maintain a personal website where I share my coding projects and other related content.
