How to Scrape ZoomInfo Company Data?

ZoomInfo hosts millions of company profiles containing useful data like financials, technologies used, contacts, and more. This public information can be valuable for competitive intelligence, lead generation, recruitment, and other business use cases.

In this post, we'll walk through how to effectively scrape ZoomInfo to extract this company data at scale using Python.

The Power of ZoomInfo Data

ZoomInfo company profiles contain a wealth of data including:

  • Key contacts – names, titles, direct phone numbers and email addresses
  • Firmographics – employee counts, revenue, funding details, executives
  • Technologies used – software, services, integrations and more
  • Enriched data – social handles, web traffic, keywords, and more

Access to this data enables critical business use cases:

  • Competitive intelligence – 45% of businesses rely on ZoomInfo for tracking competitors. The structured data aids in market and industry analysis.
  • Recruitment – Recruiters scrape ZoomInfo contact info to identify and engage candidates. Email accuracy exceeds 90%.
  • Lead generation – Sales teams generate more qualified leads faster with enriched ZoomInfo social and web data.
  • Market research – Analysts extract ZoomInfo company lists and directories for better market sampling and coverage.

No wonder premier data resellers like Dun & Bradstreet rely on ZoomInfo as a keystone data source. But is scraping the best method to leverage ZoomInfo data? Let's discuss the tradeoffs.

Why Scrape ZoomInfo Data?

ZoomInfo provides a robust paid API for accessing their data. But scraping has some advantages:

Wider Coverage

  • The API has usage limits. Scraping can extract ZoomInfo's full public data corpus.
  • API plans have company data caps. Scraping accesses info for all 90M+ companies.

Cost Savings

  • Scraping avoids monthly API subscription fees at large volumes.
  • Data resellers mark up ZoomInfo API access further. Scraping removes this middleman premium.

According to BuyerZone, small businesses can save 80% or more by scraping vs paid APIs depending on usage.


Flexibility and Control

  • Scraping allows full control over data collection and storage as needed.
  • Scraping enables combining ZoomInfo data with other sources for enrichment.

Of course, scraping has downsides we must mitigate:

  • TOU Compliance – ZoomInfo's terms permit personal use only. Commercial use requires an API license.
  • Anti-Scraping Measures – ZoomInfo does employ blocking. Scraping requires procedures to avoid disruptions.
  • Data Management – Scraping places the burden of infrastructure and re-scraping avoidance on the implementer.

Overall, for many commercial applications, ZoomInfo's API may be the best choice. But for large volumes of data, scraping can provide wider access with cost savings if done carefully.

Discovering ZoomInfo Company Profile URLs

To extract company data, we first need to discover profile page URLs. ZoomInfo does not provide a public sitemap, so we must get creative:

Scraping Directories

ZoomInfo's public directories like location and industry searches provide curated sets of company results. We can harvest the profile links from these directories via scraping. Paginating through the full results expands our access.

# Scrape directory pagination

import requests
from parsel import Selector

def scrape_directory(url):

  companies = []

  while url:

    # Request page
    response = requests.get(url)

    # Extract company profile links (placeholder selector --
    # inspect the live directory markup for the real one)
    selector = Selector(text=response.text)
    companies += selector.css('a.company-name::attr(href)').getall()

    # Follow to next page, if any (placeholder selector)
    url = selector.css('a.next-page::attr(href)').get()

  return companies

Some high-value directories to focus on:

  • Industry verticals – e.g. Software, Financial, Healthcare
  • Enterprise – Larger companies
  • Location – Major metro areas
  • Recent IPOs – Newer public companies

Searching by Company Name

ZoomInfo's site search can also uncover company pages, given a name:

# Search for company by name

import requests
from parsel import Selector

SEARCH_URL = ''  # ZoomInfo search endpoint (left blank in the original)

def search_company(name):

  params = {'query': name}
  response = requests.get(SEARCH_URL, params=params)

  # Extract profile link if found (placeholder selector)
  selector = Selector(text=response.text)
  link = selector.css('a.match::attr(href)').get()
  return link

Sources for seed names:

  • Competitor websites
  • Business directories
  • News articles referencing companies

This expands our access beyond directories.

Crawling Related Companies

Finally, each ZoomInfo profile displays similar/competing businesses. By recursively crawling outwards to scrape these related companies, we can discover new profile URLs. This technique of using scraped links to find new links is known as web crawling. Combined with other sources, it greatly expands our URL corpus.
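As a sketch, this crawl is a breadth-first traversal over the "related companies" graph. The `get_related` callable below is a hypothetical stand-in for fetching a profile page and parsing out its related-company links:

```python
from collections import deque

def crawl_related(seed_urls, get_related, max_pages=1000):
    """Breadth-first crawl outward from seed profile URLs.

    get_related(url) -> list of related profile URLs; in a real
    scraper it would request the page and parse the sidebar links.
    """
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        for related in get_related(url):
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return seen
```

The `seen` set both deduplicates URLs and serves as the final corpus; `max_pages` caps the crawl so a densely connected graph doesn't run away.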

Avoiding Anti-Scraping Blocks

To prevent abuse, ZoomInfo employs blocking – IP bans, CAPTCHAs etc. To scrape at scale, we need to avoid triggering these defenses.
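Before reaching for proxies, two simple mitigations are pacing requests and rotating request headers. The helper names and user-agent strings below are illustrative, not part of any library:

```python
import random
import time

# A small pool of browser user-agent strings (illustrative values --
# a real scraper should keep these current)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def polite_headers():
    """Rotate the User-Agent on every request."""
    return {"User-Agent": random.choice(USER_AGENTS)}

def polite_delay(base=2.0, jitter=3.0):
    """Sleep a randomized interval so request timing looks less robotic."""
    time.sleep(base + random.random() * jitter)
```

Calling `polite_delay()` between requests and passing `headers=polite_headers()` to each `requests.get` keeps traffic patterns less uniform.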

Leveraging Proxies

Proxies allow routing requests through intermediary IPs, obscuring the scraper's true location. Here's how to use a rotating proxy with the Python Requests module:

# Proxy scrape request

import itertools
import requests

# Pool of proxy endpoints (fill in real gateway addresses)
PROXIES = [
    'http://user:[email protected]:8000',
    'http://user:[email protected]:8000',
]
rotator = itertools.cycle(PROXIES)  # Create rotator

proxy = next(rotator)  # Get next proxy in rotation

url = ''  # Target profile URL (left blank in the original)
response = requests.get(url, proxies={'https': proxy})

Dedicated proxy services like Bright Data, Smartproxy, and Soax provide access to millions of IPs for distribution at scale.

Comparison of Popular Proxies

Provider      Pool Size    Locations    Pricing
BrightData    72M+         Worldwide    Starts at $10.5/GB
Smartproxy    55M+         195+         Starts at $8.5/GB
Soax          155M+        190+         Starts at $99/15 GB

Using Residential Proxies

Regular “datacenter” proxies are more susceptible to blocking vs residential IPs from homes or mobile networks. Services like Bright Data, Smartproxy, Proxy-Seller, and Soax offer large residential proxy pools perfect for scraping tougher sites.

Browser Automation

Headless browsers like Selenium render JavaScript and are harder to fingerprint, providing another anti-scraping option:

from selenium import webdriver

# Start browser
driver = webdriver.Chrome()

# Access page (url = a ZoomInfo profile URL)
driver.get(url)

# Extract rendered HTML, then hand it to your parser of choice
html = driver.page_source
driver.quit()

The tradeoff is reduced speed vs proxies. But both are useful tools.

Extracting and Storing Company Data

Once we have ZoomInfo profile URLs, we next need to parse the key company details. Rather than scrape HTML, ZoomInfo conveniently provides a JSON object with all profile data – no complex extraction is needed. We simply download and parse the object:

# Parse company profile data

import json
import requests
from parsel import Selector

def parse_company(url):

  response = requests.get(url)

  # Extract embedded JSON data
  selector = Selector(text=response.text)
  json_str = selector.css('script#rawData::text').get()
  data = json.loads(json_str)

  company = {
    'Name': data['name'],
    'Description': data['description'],
    'Employees': data['size'],
    'Industry': data['industry'],
    'Keywords': data['keywords'],
    'Contacts': data['contacts'],
  }

  return company

For best performance, we would scrape URLs asynchronously using Python asyncio:

# Async scrape companies

import asyncio
import json
import httpx
from parsel import Selector

async def parse_company(url, client):
  response = await client.get(url)
  selector = Selector(text=response.text)
  json_str = selector.css('script#rawData::text').get()
  return json.loads(json_str)

async def async_scrape(urls):
  async with httpx.AsyncClient() as client:
    tasks = [parse_company(url, client) for url in urls]
    companies = await asyncio.gather(*tasks)
    return companies

This concurrent model maximizes throughput. Scraped results can be saved to databases or file formats like JSON for analysis.
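A minimal sketch of the storage step, persisting scraped records as a JSON file (the function names here are illustrative, not from any particular library):

```python
import json
from pathlib import Path

def save_companies(companies, path="companies.json"):
    """Persist a list of scraped company records as a JSON array."""
    Path(path).write_text(json.dumps(companies, indent=2))
    return path

def load_companies(path="companies.json"):
    """Read the records back for downstream analysis."""
    return json.loads(Path(path).read_text())
```

For larger corpora, the same records map naturally onto a database table or a columnar format like Parquet, but a JSON file is often enough to get analysis started.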

Analytical Use Cases for ZoomInfo Data

Once scraped, ZoomInfo data enables a multitude of applications:

Competitive Intelligence

Analysts can compile market landscapes based on keywords, technologies used, contacts and more. Interactive dashboards provide insights into competitors and opportunities.

Lead Enrichment and Contact Discovery

Sales teams can expand lead contact information and identify new targets.

Leads                           Matched ZoomInfo Profile
John Doe - Acme Co             John Doe - [email protected] | VP Sales @ Acme
                                Jane Doe - [email protected] | CEO @ Acme

Bob Smith - Def Corp           Robert Smith - [email protected] | CTO @ Def Corp
                                Susie Kline - [email protected] | COO @ Def Corp

ZoomInfo provides direct contacts not available in other sources.

Market Sizing and Benchmarking

Company headcounts, revenue ranges, and keywords enable better market research sampling and sizing:

Industry - SaaS HR Software
Median Employees - 62
Average Funding - $27M
- Zenefits (470 Employees, $583M Funding)
- Gusto (433 Employees, $135M Funding)
- Rippling (249 Employees, $45M Funding)
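Figures like these can be computed directly from scraped records with Python's statistics module. As a toy example, using only the three companies listed above (not the full-industry sample the summary figures describe):

```python
from statistics import mean, median

# Toy sample: the three companies listed above
companies = [
    {"name": "Zenefits", "employees": 470, "funding_m": 583},
    {"name": "Gusto", "employees": 433, "funding_m": 135},
    {"name": "Rippling", "employees": 249, "funding_m": 45},
]

median_employees = median(c["employees"] for c in companies)
avg_funding = mean(c["funding_m"] for c in companies)
```

Run over a full scraped segment instead of three rows, the same two lines produce the industry-level medians and averages shown above.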

Data can feed market models, reports, and content. These are just a sample of the possibilities unlocked by ZoomInfo data mining.

Limitations and Considerations

While scraping opens up wider access to ZoomInfo data, it has downsides to consider:

  • TOU Compliance – ZoomInfo's Terms only allow personal non-commercial use. Ensure your scraping aligns with these terms.
  • Blocks and Captchas – Anti-scraping defenses may interfere with scraping and require workarounds.
  • Data Scale Limitations – Directory sampling provides incomplete market coverage vs API access. Crawling and multiple sources help maximize coverage.
  • Legal Factors – Respect personal data protections like GDPR when harvesting contact information.
  • Infrastructure Needs – Scraping places the burden of data management, storage and re-scraping avoidance on the implementer.
  • Cost at Scale – For scraping millions of companies, proxy and infrastructure costs add up. Paid APIs become more efficient at ultra high volumes.

Weigh these factors against your use case needs and data scale. For commercial usage, ZoomInfo's API offers higher reliability and coverage. But for some applications, strategic scraping provides a more accessible option.


Conclusion

In this post, we walked through a methodology for scraping ZoomInfo company data at scale using Python: discovering profile URLs, avoiding anti-scraping blocks, and extracting and storing data for business intelligence and lead generation use cases. While not without its challenges, with the right approach scraping can provide access to ZoomInfo's trove of company data for various applications. Of course, this should be done ethically and legally.

John Rooney

John Watson Rooney is a self-taught Python developer and content creator with a focus on web scraping, APIs, and automation. I love sharing my knowledge and expertise through my YouTube channel, which caters to all levels of developers, from beginners looking to get started in web scraping to experienced programmers seeking to advance their skills with modern techniques. I have worked in the e-commerce sector for many years, gaining extensive real-world experience in data handling, API integrations, and project management. I am passionate about teaching others and simplifying complex concepts to make them more accessible to a wider audience. In addition to my YouTube channel, I also maintain a personal website where I share my coding projects and other related content.
