How to Wait for Page to Load In Selenium?

In the world of web scraping and automation, one of the most common frustrations is dealing with pages that load too slowly. As soon as you attempt to interact with the page, ominous errors pop up: “Element not interactable”, “Element stale – DOM changed”.

How do we overcome these pesky timing issues? The answer is robust page load waits in Selenium. Waiting for the right moment allows us to synchronize our scripts with the actual contents of the page. In this comprehensive guide, we’ll explore best practices for graceful page load handling.

Why Page Load Waits Are Absolutely Necessary

Let’s briefly understand why pages can be so slow to begin with. Back in the 90s when the web was new, pages were pretty straightforward. The server returned basic HTML, and the browser rendered it instantly. Fast forward to today – websites are incredibly dynamic. The average page now makes 115 requests for additional resources like JavaScript, CSS, images, and API data.

Instead of rendering everything upfront, pages rely on asynchronous JavaScript to dynamically fetch and display content. Technologies like AJAX allow data to be loaded in the background without blocking the initial page render.

The result is pages can appear loaded but are still actively being assembled after the fact. Content gets streamed in gradually. UI elements are slotted into place by client-side JavaScript.

Average webpage load time increased by 21% from 2018 to 2020 based on HTTP Archive data. Pages are getting heavier.

As scrapers, we need to wait for the dust to settle before taking action. Interacting too early leads to all sorts of errors:

Stale element reference – The DOM node has changed since it was located. Any action on the element will fail.
Element not interactable – The element has not yet appeared in the DOM. Selenium will be unable to find and act on it.
Empty responses – Attempting to extract data before AJAX content loads will return nothing.

To avoid these issues, we need mechanisms to pause execution until the page has fully loaded. Selenium provides a few options.

Limits of Implicit Waits

Selenium has a feature called implicit waits that pauses execution globally for a certain timeout. For example:

driver.implicitly_wait(10) # seconds

This will wait up to 10 seconds when finding/interacting with elements. Implicit waits seem like an easy fix, but they have limitations:

Only work for find element and interaction commands
Fixed duration regardless of page state – wait will still occur even if page loads faster
Can mask underlying issues vs. explicit waits

For more robust synchronization, explicit waits are recommended.

Enter Explicit Waits

The Selenium WebDriverWait class gives fine-grained control over page wait conditions. You specify:

Timeout – Maximum wait time
Polling interval – Frequency of checking condition
Condition – What constitutes page readiness

For example:

from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10, poll_frequency=1, ignored_exceptions=[ElementNotVisibleException])

This polls the current state of the page every 1 second, up to 10 seconds, checking if the condition we specified is met. Once true, the wait ends and execution continues. If the timeout is reached, an exception is thrown. The real power comes from the flexibility in defining conditions.

Expected Conditions for Page Load Events

The expected_conditions module provides a wide array of conditions we can use to detect page readiness:

Element checks – Wait for element to be present, visible, clickable etc.
Text checks – Wait for title, URL or text match
Navigation checks – Wait for new page load to complete
AJAX checks – Wait for async requests to finish

Let’s explore some common conditions with examples.

Presence and Visibility Checks

Two of the most widely used conditions are:

presence_of_element_located – Waits until the target element appears in the DOM
visibility_of_element_located – Waits until the element is not only present but visible on the page

For example, waiting for a search box:

from selenium.webdriver.support import expected_conditions as EC

search_box = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "searchInput"))
)

This pauses execution until the element is ready for interaction. Similarly, we can wait on:

presence_of_all_elements_located – Wait for multiple elements
invisibility_of_element_located – Wait for element to disappear

The expected conditions API supports nearly all locator strategies like ID, XPath, CSS selector etc.

As per 2020 data, 96% of webpages use JavaScript. Dynamically waiting for elements is crucial.

Interactability Checks

Visibility alone doesn’t guarantee an element can be acted upon. We also need it to be in an interactable state. The element_to_be_clickable condition covers this, waiting for the element to:

Be visible
Be enabled
Be in the DOM

For example:

button = WebDriverWait(driver, 15).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".cta-button"))
)
button.click()

This ensures the button appears and becomes ready for clicking within 15 seconds. Similarly, we can wait for an element to become:

selected – Wait for element to be selectable
invisible – Wait for element to disappear
attribute_to_include – Wait for attribute to contain text

Text-Based Conditions

Useful for determining when a new page has loaded:

title_contains – Wait for page title to contain text
title_is – Wait for exact match with page title

For example:

WebDriverWait(driver, 10).until(
   EC.title_contains("Checkout") # wait for checkout page
)

This halts execution until observing the expected page title. Other text-based conditions include:

text_to_be_present_in_element – Wait for text in an element
element_to_contain – Wait for target element to contain text

These provide flexibility in syncing up with textual changes during navigation.

AJAX-Specific Conditions

Modern sites use AJAX to retrieve content dynamically. Some useful waits include:

Presence of updated element

results = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "ajaxResults"))
)

This halts execution until the AJAX request finishes and the results element appears.

Staleness of old element

button = driver.find_element(By.ID, "ajaxButton")
button.click()

WebDriverWait(driver, 5).until(
   EC.staleness_of(button)
)

Here we wait for the button to go stale after being re-rendered. Staleness indicates the AJAX request completed.

Custom Expected Conditions

For advanced cases, you can define custom conditions by implementing the __call__ method:

Here we wait for the button to go stale after being re-rendered. Staleness indicates the AJAX request completed.

Custom Expected Conditions
For advanced cases, you can define custom conditions by implementing the __call__ method:

This allows waiting on any arbitrary condition required for your unique use case.

There are over 30 expected condition methods available in the Selenium Python package.

Setting Smart Wait Timeouts

A key factor in effective waits is setting an appropriate timeout. Here are some best practices:

Start with a shorter timeout like 5 seconds – Long waits slow down test execution time.
Set timeout based on metrics – Use page load metrics to calculate optimal timeout dynamically.
Tune timeouts per page – Setting different timeouts for slow vs. fast loading pages.
Ramp up timeouts gradually – Incrementally increase timeout duration if failures occur.
Use longer timeouts sparingly – Avoid arbitrarily long 30+ second waits unless absolutely required.

Getting timeouts right is an art – set them too short and pages won't load, but excessively long timeouts hamper velocity.

Handling Timeouts Gracefully

In some cases, even our best timeout may not be sufficient for heavily burdened pages. It's important to handle these scenarios gracefully. Here are some options when the timeout expires before the page is ready:

Swallow the timeout exception – Allow test execution to continue
Log the error – Record the failure for analysis
Retry the wait – Recursively wait with exponential backoff
Fail the test – Depending on the situation

Robust timeout handling prevents cascading failures when pages struggle to load.

Pros and Cons of Explicit Waits

Explicit waits provide powerful synchronization, but also some drawbacks:

Pros

Fine-grained control over what constitutes page loaded
Flexible conditions for visibility, text, AJAX etc.
Dynamic timeout tuning per page/action
Encourages proper page readiness detection

Cons

More complex than implicit waits
Easy to overuse – should be used judiciously
Still depend on timeouts being set reasonably
Advanced conditions require custom logic

When used properly, explicit waits handle the majority of page load timing issues.

Key Takeaways and Best Practices

Adding proper waits for page loads is crucial for creating robust, stable Selenium scripts. Leveraging the flexible expected conditions in WebDriverWait will help detect when a page is fully interactive. By following web best practices and avoiding common wait pitfalls, you can confidently scrape and automate even the most complex, dynamic sites.