How to Click On Cookie Popups and Modal Alerts in Puppeteer?

As websites, and e-commerce sites in particular, grow more sophisticated, many now use cookie popups, modal alerts, and other attention-grabbing elements to display notices, request data permissions, prompt logins, and more. These can cause significant headaches when you scrape page content with tools like Puppeteer. When a popup is layered over the region you need, your scraper may not be able to interact with or capture what lies beneath it, and the run produces nothing useful.

So how do you properly handle cookie consent forms, login popups, and other “blocking” modal alerts when scraping sites in Puppeteer? Today I'll provide comprehensive techniques for smoothly clicking, bypassing, or removing these annoying popups so you can extract the content you actually need.

Understanding Cookie Popups and Modal Alerts

First, let's briefly define what these web elements are so you know what to look for.

A modal alert refers to any popup type of component generated by JavaScript code that “blocks” access to a site's main content when active. The most well known is the cookie consent form – those “We use cookies! Click OK to consent and continue!” boxes. But login forms prompting you to sign in, email capture popups offering discounts, and more also fall under the modal alert umbrella term.

These overlaying popups utilize CSS styling to disable scrolling or clicking beneath them. Some fade the background content, others completely hide covered elements. So from a scraping perspective, they stop you from scraping anything on-page until addressed.
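As a rough way to confirm that an overlay really is sitting on top of the content you want, you can ask the browser which element is topmost at the target's position. The sketch below is a heuristic rather than a definitive check: `isCovered` is a hypothetical helper name, it samples only the centre point of the target, and it assumes that point is inside the viewport.

```javascript
// Heuristic: is something other than the target (or one of its children)
// the topmost element at the target's centre point?
// `page` is a Puppeteer Page; `selector` identifies the content you want.
async function isCovered(page, selector) {
  return page.evaluate((sel) => {
    const target = document.querySelector(sel);
    if (!target) return false; // nothing to cover

    const rect = target.getBoundingClientRect();
    const x = rect.left + rect.width / 2;
    const y = rect.top + rect.height / 2;

    // elementFromPoint returns null when the point is outside the viewport,
    // in which case this sketch reports "not covered".
    const topmost = document.elementFromPoint(x, y);
    return topmost !== null && topmost !== target && !target.contains(topmost);
  }, selector);
}

// Usage inside an async Puppeteer script (selector is a placeholder):
// const blocked = await isCovered(page, '#main-content');
```

If this returns true right after `page.goto()`, a popup or backdrop is probably intercepting clicks on that element.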

Now that you can visually spot common modal popups, let's explore proven techniques to resume frictionless scraping using Puppeteer scripts despite their intrusions.

How to Click On Cookie Popups and Modal Alerts in Puppeteer?

To handle cookie popups and modal alerts in Puppeteer, you can use different strategies depending on the specific situation.

For cookie popups, you can use the `page.click()` function to click on the “Accept” button of the popup. You need to identify the CSS selector of the button and pass it to the `page.click()` function. If the popup doesn't appear, you can catch the error and proceed with the rest of your script. Alternatively, you can remove the popup from the DOM using the `page.evaluate()` function. Here's an example of how to do this:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://web-scraping.dev/login');

  // Option #1 - use page.click() to click on the button
  try {
    await page.waitForSelector('#cookie-ok', { timeout: 2000 });
    await page.click('#cookie-ok');
  } catch (error) {
    console.log('no cookie popup... ');
  }

  // Option #2 - remove the popup from the DOM instead of clicking it
  const cookieModal = await page.$('#cookieModal');
  if (cookieModal) {
    await page.evaluate((el) => el.remove(), cookieModal);
    // remove the grey backdrop that covers the screen
    const modalBackdrop = await page.$('.modal-backdrop');
    if (modalBackdrop) {
      await page.evaluate((el) => el.remove(), modalBackdrop);
    }
  }

  await browser.close();
})();
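The try/catch from Option #1 can be wrapped in a small reusable helper. This is a minimal sketch, not part of Puppeteer: `clickIfPresent` is a hypothetical name, and it assumes that Puppeteer's timeout errors carry the name `TimeoutError`, so that only a missing popup is ignored while other failures still surface.

```javascript
// Clicks `selector` if it becomes visible within `timeout` milliseconds.
// Resolves to true when the click happened, false when the element never showed up.
async function clickIfPresent(page, selector, timeout = 2000) {
  try {
    // visible: true waits until the element is rendered, not merely in the DOM
    await page.waitForSelector(selector, { visible: true, timeout });
  } catch (error) {
    // Assumption: Puppeteer's TimeoutError has error.name === 'TimeoutError'
    if (error.name === 'TimeoutError') return false;
    throw error; // anything else is a real problem, so re-throw it
  }
  await page.click(selector);
  return true;
}

// Usage inside an async Puppeteer script:
// const accepted = await clickIfPresent(page, '#cookie-ok');
// if (!accepted) console.log('no cookie popup...');
```

Returning a boolean keeps the calling script readable and lets you log or branch on whether the popup appeared at all.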

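Native JavaScript dialogs, the ones a page opens with alert(), confirm(), or prompt(), are not part of the page's HTML, so they cannot be clicked with a CSS selector. For these, you can use the dialog event handler to check the dialog message and accept or dismiss it. This is done with the `page.on("dialog", handler)` method, and the handler should be registered before the action that triggers the dialog. Here's an example: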

const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // set up a dialog event handler
  page.on('dialog', async dialog => {
    console.log(dialog.message());
    if(dialog.message().includes('clear your cart')) {
      console.log(`clicking "Yes" to ${dialog.message()}`);
      await dialog.accept(); // press 'Yes'
    } else {
      await dialog.dismiss(); // press 'No'
    }
  });

  // rest of your script...

  await browser.close();
}

run();

Remember that the specific CSS selectors and dialog messages will vary depending on the website you're interacting with, so you'll need to adjust your script accordingly.
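When several different dialogs can appear, the accept/dismiss decision can be pulled out into a small factory so the rules live in one place. The following is a sketch under stated assumptions: `makeDialogHandler` is a hypothetical helper name, and matching on case-insensitive substrings is a design choice for illustration, not something Puppeteer prescribes.

```javascript
// Builds a handler for Puppeteer's 'dialog' event.
// acceptPhrases: substrings (matched case-insensitively) of dialog
// messages that should be accepted; every other dialog is dismissed.
function makeDialogHandler(acceptPhrases) {
  const phrases = acceptPhrases.map((phrase) => phrase.toLowerCase());

  return async (dialog) => {
    const message = dialog.message().toLowerCase();
    const shouldAccept = phrases.some((phrase) => message.includes(phrase));

    if (shouldAccept) {
      await dialog.accept(); // same as pressing OK
    } else {
      await dialog.dismiss(); // same as pressing Cancel
    }
    return shouldAccept ? 'accepted' : 'dismissed';
  };
}

// Usage: register the handler before triggering the dialog.
// page.on('dialog', makeDialogHandler(['clear your cart']));
```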

Click vs. Remove

You might be wondering – why click rather than just delete the whole popup element from DOM?

Clicking the button that closes a popup simulates natural user behavior. This lets any background functionality tied to that action run properly, such as setting the cookie that remembers your consent choice so the popup does not reappear on the next page. Removing a popup outright can break site logic that expects the click, and it does nothing to prevent the popup from coming back later.

However, when dealing with things like login overlays on public pages where you don't need to sign in, stripping them out via DOM removal is usually fine, since no side effects matter to your scraper. We'll cover how to do this safely next.
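One way to combine the two approaches is to prefer the click, confirm that the popup really closed, and fall back to removal only if that fails. This is a sketch, not a definitive implementation: `dismissPopup` and its option names are hypothetical, and treating every failure on the click path as a reason to fall back is a simplification.

```javascript
// Tries to close a popup by clicking its button; if that fails,
// removes the popup element from the DOM instead.
// Resolves to 'clicked', 'removed', or 'absent'.
async function dismissPopup(page, { buttonSelector, popupSelector, timeout = 2000 }) {
  try {
    // Preferred path: click the button like a user would.
    const button = await page.waitForSelector(buttonSelector, { visible: true, timeout });
    await button.click();
    // hidden: true resolves once the popup is hidden or gone from the DOM.
    await page.waitForSelector(popupSelector, { hidden: true, timeout });
    return 'clicked';
  } catch (error) {
    // Fallback: the button never appeared or the popup stayed open.
    // Simplification: any error on the click path lands here.
    const removed = await page.evaluate((sel) => {
      const popup = document.querySelector(sel);
      if (!popup) return false;
      popup.remove();
      return true;
    }, popupSelector);
    return removed ? 'removed' : 'absent';
  }
}

// Usage inside an async Puppeteer script:
// const outcome = await dismissPopup(page, {
//   buttonSelector: '#cookie-ok',
//   popupSelector: '#cookieModal',
// });
```

Logging the returned outcome across many pages shows how often the site's own close button works and how often the script has to force the removal.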

Removing Popups from the DOM

To remove popups from the Document Object Model (DOM) in JavaScript, you can use the removeChild() method or the remove() method. In a Puppeteer script, this code runs in the page context, for example inside `page.evaluate()`. The removeChild() method is called on the parent node of the element you want to remove. Here's an example:

const elem = document.querySelector('#popup1'); // Select the popup element
if (elem) {
  elem.parentNode.removeChild(elem); // Remove it via its parent node
}

In this example, #popup1 is the id of the popup you want to remove. The remove() method removes an element directly from the document. Here's an example:

const element = document.getElementById("demo"); // Select the popup element
element.remove(); // Remove the popup element

In this example, "demo" is the id of the popup you want to remove. Note that the remove() method is not supported in Internet Explorer 11 or earlier, which does not matter for Puppeteer scripts because they run in a modern browser. Both methods detach the element from the document, so it will not reappear unless the page's own scripts insert it again.

If you want to use the popup again, you'll need to create a new element or keep a copy of the removed element. Also, it's not possible to remove an element from the DOM using CSS. CSS can only prevent an element from being rendered in the layout with display: none, but the element still exists in the DOM.
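If hiding is enough and you would rather leave the elements in the DOM, Puppeteer can inject a stylesheet with `page.addStyleTag()`. The sketch below assumes you already know the popup selectors; `buildHideRule` and `hidePopups` are hypothetical helper names.

```javascript
// Builds a CSS rule that hides every element matching the given selectors.
function buildHideRule(selectors) {
  return `${selectors.join(', ')} { display: none !important; }`;
}

// Injects the rule into the current page via page.addStyleTag().
// Does nothing when the selector list is empty.
async function hidePopups(page, selectors) {
  if (selectors.length === 0) return;
  await page.addStyleTag({ content: buildHideRule(selectors) });
}

// Usage inside an async Puppeteer script, after page.goto():
// await hidePopups(page, ['#cookieModal', '.modal-backdrop']);
```

Because the rule stays in the document, it also hides matching elements that the site inserts later on the same page. It is lost on navigation, so it has to be injected again after each `page.goto()`.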

Troubleshooting Tips

Despite your best efforts, modal popups may still thwart scraping attempts:

  • Timeouts expiring – Increase the wait timeout. Very heavy pages can need 10 seconds or more.
  • New selectors not found – Double-check the selector if the site has changed. Log the matched element to the console first to confirm it exists.
  • Page hangs/freezes – On some sites, removing an element triggers a reload, which can leave the script stuck in a loop. Click the button instead in that case.
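For the selector check, a small logging helper can show what a selector actually matches before you try to click or remove it. This is a minimal sketch using Puppeteer's `page.$$eval()`; `describeSelector` is a hypothetical name, and the 200-character cut-off is an arbitrary choice to keep the log readable.

```javascript
// Logs how many elements match `selector` and prints the start of the
// first match's HTML. Resolves to the truncated HTML of every match.
async function describeSelector(page, selector, maxChars = 200) {
  const matches = await page.$$eval(
    selector,
    (elements, limit) => elements.map((el) => el.outerHTML.slice(0, limit)),
    maxChars
  );

  console.log(`${selector}: ${matches.length} match(es)`);
  if (matches.length > 0) {
    console.log(`first match: ${matches[0]}`);
  }
  return matches;
}

// Usage inside an async Puppeteer script:
// await describeSelector(page, '#cookie-ok');
```

Zero matches usually means the selector is stale or the popup has not rendered yet, which tells you whether to fix the selector or lengthen the timeout.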

The key is to vet target sites thoroughly first, then apply the right mix of waits, clicks, and DOM removals to account for the edge cases you find. Test, tweak, and repeat until your Puppeteer script runs smoothly without popup impediments!

Conclusion

I hope this guide gives you greater confidence in overcoming pesky cookie alerts and modal popups when web scraping using Puppeteer. Employ the robust button clicking procedures, safe DOM removal tactics, and troubleshooting tips outlined here to conquer consent forms, login prompts, and other blocking alerts.

Feel free to reach out if you need any assistance getting your target scraper back on track and extracting the content you actually desire! With resilient popup handling methodology now in your back pocket, those superficial website alerts stand no chance of thwarting your web data aspirations any longer.

John Rooney

I'm John Watson Rooney, a self-taught Python developer and content creator with a focus on web scraping, APIs, and automation. I love sharing my knowledge and expertise through my YouTube channel, which caters to all levels of developers, from beginners looking to get started in web scraping to experienced programmers seeking to advance their skills with modern techniques. I have worked in the e-commerce sector for many years, gaining extensive real-world experience in data handling, API integrations, and project management. I am passionate about teaching others and simplifying complex concepts to make them more accessible to a wider audience. In addition to my YouTube channel, I also maintain a personal website where I share my coding projects and other related content.
