ZoomInfo hosts millions of company profiles containing useful data like financials, technologies used, contacts, and more. This public information can be valuable for competitive intelligence, lead generation, recruitment, and other business use cases.
In this post, we'll walk through how to effectively scrape ZoomInfo to extract this company data at scale using Python.
The Power of ZoomInfo Data
ZoomInfo company profiles contain a wealth of data including:
- Key contacts – names, titles, direct phone numbers and email addresses
- Firmographics – employee counts, revenue, funding details, executives
- Technologies used – software, services, integrations and more
- Enriched data – social handles, web traffic, keywords, and more
Access to this data enables critical business use cases:
- Competitive intelligence – 45% of businesses rely on ZoomInfo for tracking competitors. The structured data aids in market and industry analysis.
- Recruitment – Recruiters scrape ZoomInfo contact info to identify and engage candidates. Email accuracy exceeds 90%.
- Lead generation – Sales teams generate more qualified leads faster with enriched ZoomInfo social and web data.
- Market research – Analysts extract ZoomInfo company lists and directories for better market sampling and coverage.
No wonder premier data resellers like Dun & Bradstreet rely on ZoomInfo as a keystone data source. But is scraping the best method to leverage ZoomInfo data? Let's discuss the tradeoffs.
Why Scrape ZoomInfo Data?
ZoomInfo provides a robust paid API for accessing their data. But scraping has some advantages:
Wider Coverage
- The API has usage limits. Scraping can extract ZoomInfo's full public data corpus.
- API plans have company data caps. Scraping accesses info for all 90M+ companies.
Cost Savings
- Scraping avoids monthly API subscription fees, especially for large volumes.
- Data resellers mark up ZoomInfo API access further. Scraping removes this middleman premium.
According to BuyerZone, small businesses can save 80% or more by scraping vs paid APIs depending on usage.
Customization
- Scraping allows full control over data collection and storage as needed.
- Scraping enables combining ZoomInfo data with other sources for enrichment.
Of course, scraping has downsides we must mitigate:
- TOU Compliance – ZoomInfo's terms permit personal use only. Commercial use requires an API license.
- Anti-Scraping Measures – ZoomInfo does employ blocking. Scraping requires procedures to avoid disruptions.
- Data Management – Scraping places the burden of infrastructure and re-scraping avoidance on the implementer.
Overall, for many commercial applications, ZoomInfo's API may be the best choice. But for large volumes of data, scraping can provide wider access with cost savings if done carefully.
Discovering ZoomInfo Company Profile URLs
To extract company data, we first need to discover profile page URLs. ZoomInfo does not provide a public sitemap, so we must get creative:
Scraping Directories
ZoomInfo's public directories, such as location and industry searches, provide curated sets of company results. We can harvest the profile links from these directories via scraping. Paginating through the full results expands our access.
```python
# Scrape directory pagination
import requests
from parsel import Selector

def scrape_directory(url):
    companies = []
    while url:
        # Request page
        response = requests.get(url)
        selector = Selector(text=response.text)
        # Extract company profile links from this page
        companies += selector.css('a.company::attr(href)').getall()
        # Follow to next page (None ends the loop)
        url = selector.css('a.next::attr(href)').get()
    return companies
```
Some high-value directories to focus on:
- Industry verticals – e.g. Software, Financial, Healthcare
- Enterprise – Larger companies
- Location – Major metro areas
- Recent IPOs – Newer public companies
Searching by Company Name
ZoomInfo's site search can also uncover company pages, given a name:
```python
# Search for a company profile by name
import requests
from parsel import Selector

def search_company(name):
    params = {'query': name}
    response = requests.get('https://www.zoominfo.com/search', params=params)
    # Extract profile link if found
    selector = Selector(text=response.text)
    link = selector.css('a.match::attr(href)').get()
    return link
```
Sources for seed names:
- Competitor websites
- Business directories
- News articles referencing companies
This expands our access beyond directories.
Crawling Related Companies
Finally, each ZoomInfo profile displays similar/competing businesses. By recursively crawling outwards to scrape these related companies, we can discover new profile URLs. This technique of using scraped links to find new links is known as web crawling. Combined with other sources, it greatly expands our URL corpus.
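The recursive crawl described above amounts to a breadth-first traversal over related-company links. Here is a minimal sketch, using a stubbed link graph in place of live page fetches (the `/c/...` URLs and the `get_related` callback are hypothetical; in practice `get_related` would fetch a profile page and extract its "similar companies" links):

```python
from collections import deque

def crawl_related(seed_urls, get_related, max_pages=1000):
    """Breadth-first crawl: follow related-company links outward from seeds."""
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        for link in get_related(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Demo with a stubbed link graph standing in for live requests
graph = {
    '/c/acme': ['/c/globex', '/c/initech'],
    '/c/globex': ['/c/acme', '/c/umbrella'],
}
found = crawl_related(['/c/acme'], lambda u: graph.get(u, []))
```

The `seen` set both deduplicates URLs and acts as the final corpus, while `max_pages` bounds the crawl so it cannot run away on a densely linked site.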
Avoiding Anti-Scraping Blocks
To prevent abuse, ZoomInfo employs blocking – IP bans, CAPTCHAs etc. To scrape at scale, we need to avoid triggering these defenses.
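Before reaching for proxies, basic request hygiene already helps: sending browser-like headers and pacing requests with randomized delays. A minimal sketch (the User-Agent strings, delay range, and `throttled_fetch` helper are illustrative, not ZoomInfo-specific):

```python
import random
import time

# Illustrative browser User-Agent strings; rotate one per request
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
]

def build_headers():
    """Assemble a browser-like header set with a randomly chosen User-Agent."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept-Language': 'en-US,en;q=0.9',
    }

def throttled_fetch(fetch, urls, min_delay=2.0, max_delay=6.0):
    """Fetch each URL with a randomized pause between requests."""
    results = []
    for url in urls:
        results.append(fetch(url, headers=build_headers()))
        # Random delays mimic human pacing better than a fixed interval
        time.sleep(random.uniform(min_delay, max_delay))
    return results
```

The `fetch` parameter would typically be `requests.get`; passing it in keeps the pacing logic independent of the HTTP client.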
Leveraging Proxies
Proxies allow routing requests through intermediary IPs, obscuring the scraper's true location. Here's how to use a rotating proxy with the Python Requests module:
```python
# Route a scrape request through a rotating proxy
import requests
from proxy_rotator import ProxyRotator  # placeholder for your proxy-rotation helper

rotator = ProxyRotator()     # Create rotator
proxy = rotator.get_proxy()  # Get next proxy address
response = requests.get('https://www.zoominfo.com',
                        proxies={'http': proxy, 'https': proxy})
```
Dedicated proxy services like Bright Data, Smartproxy, and Soax provide access to millions of IPs for distribution at scale.
Comparison of Popular Proxies
| Provider | IPs | Countries | Pricing |
|---|---|---|---|
| Bright Data | 72M+ | Worldwide | Starts at $10.50/GB |
| Smartproxy | 55M+ | 195+ | Starts at $8.50/GB |
| Soax | 155M+ | 190+ | Starts at $99/15 GB |
Using Residential Proxies
Regular "datacenter" proxies are more susceptible to blocking than residential IPs sourced from homes or mobile networks. Services like Bright Data, Smartproxy, Proxy-Seller, and Soax offer large residential proxy pools well suited to scraping tougher sites.
Browser Automation
Headless browsers like Selenium render JavaScript and are harder to fingerprint, providing another anti-scraping option:
```python
from selenium import webdriver

# Start browser
driver = webdriver.Chrome()

# Access page (JavaScript is rendered before we read the HTML)
driver.get('https://www.zoominfo.com/company')

# Extract rendered HTML for parsing
html = driver.page_source
data = parse(html)  # parse() is your own extraction routine

driver.quit()
```
The tradeoff is reduced speed vs proxies. But both are useful tools.
Extracting and Storing Company Data
Once we have ZoomInfo profile URLs, we next need to parse the key company details. Rather than scrape HTML, ZoomInfo conveniently provides a JSON object with all profile data – no complex extraction is needed. We simply download and parse the object:
```python
# Parse company profile data from the embedded JSON object
import json

import requests
from parsel import Selector

def parse_company(url):
    response = requests.get(url)
    # Extract the JSON payload from the page's script tag
    selector = Selector(text=response.text)
    json_str = selector.css('script#rawData::text').get()
    data = json.loads(json_str)
    company = {
        'Name': data['name'],
        'Description': data['description'],
        'Employees': data['size'],
        'Industry': data['industry'],
        'Keywords': data['keywords'],
        'Contacts': data['contacts'],
    }
    return company
```
For best performance, we would scrape URLs asynchronously using Python asyncio:
```python
# Async scrape companies concurrently
import asyncio

import httpx

async def async_scrape(urls):
    async with httpx.AsyncClient() as client:
        # parse_company here is an async variant that accepts the shared client
        tasks = [parse_company(url, client) for url in urls]
        companies = await asyncio.gather(*tasks)
    return companies
```
This concurrent model maximizes throughput. Scraped results can be saved to databases or file formats like JSON for analysis.
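For file-based storage, JSON Lines is a convenient format because batches of results can be appended incrementally as they complete. A minimal sketch (the record fields and `save_companies` helper are illustrative):

```python
import json

def save_companies(companies, path):
    """Append company records to a JSON Lines file, one object per line."""
    with open(path, 'a', encoding='utf-8') as f:
        for company in companies:
            f.write(json.dumps(company, ensure_ascii=False) + '\n')
```

Because each line is an independent JSON object, the file can be streamed record by record later without loading the full dataset into memory.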
Analytical Use Cases for ZoomInfo Data
Once scraped, ZoomInfo data enables a multitude of applications:
Competitive Intelligence
Analysts can compile market landscapes based on keywords, technologies used, contacts and more. Interactive dashboards provide insights into competitors and opportunities.
Lead Enrichment and Contact Discovery
Sales teams can expand lead contact information and identify new targets.
```
Leads                  Matched ZoomInfo Profile
-----------------------------------------------------------------------------
John Doe - Acme Co     John Doe - [email protected] | VP Sales @ Acme
                       Jane Doe - [email protected] | CEO @ Acme
Bob Smith - Def Corp   Robert Smith - [email protected] | CTO @ Def Corp
                       Susie Kline - [email protected] | COO @ Def Corp
```
ZoomInfo provides direct contacts not available in other sources.
Market Sizing and Benchmarking
Company headcounts, revenue ranges, and keywords enable better market research sampling and sizing:
```
Industry - SaaS HR Software
-----------------------------------------------------------------------------
Median Employees - 62
Average Funding  - $27M
Leaders:
 - Zenefits (470 Employees, $583M Funding)
 - Gusto (433 Employees, $135M Funding)
 - Rippling (249 Employees, $45M Funding)
```
Data can feed market models, reports, and content. These are just a sample of the possibilities unlocked by ZoomInfo data mining.
Limitations and Considerations
While scraping opens up wider access to ZoomInfo data, it has downsides to consider:
- TOU Compliance – Zoominfo's Terms only allow personal non-commercial use. Ensure your scraping aligns with these terms.
- Blocks and Captchas – Anti-scraping defenses may interfere with scraping and require workarounds.
- Data Scale Limitations – Directory sampling provides incomplete market coverage vs API access. Crawling and multiple sources help maximize coverage.
- Legal Factors – Respect personal data protections like GDPR when harvesting contact information.
- Infrastructure Needs – Scraping places the burden of data management, storage and re-scraping avoidance on the implementer.
- Cost at Scale – For scraping millions of companies, proxy and infrastructure costs add up. Paid APIs become more efficient at ultra high volumes.
Weigh these factors against your use case needs and data scale. For commercial usage, ZoomInfo's API offers higher reliability and coverage. But for some applications, strategic scraping provides a more accessible option.
Conclusion
In this post, we walked through a methodology for scraping ZoomInfo company data at scale using Python, from discovering profile URLs to extracting and storing data for business intelligence and lead generation use cases. While not without its challenges, with the right approach, scraping can provide access to ZoomInfo's trove of company data for various applications. Of course, this should be done ethically and legally.