Real estate data is incredibly valuable for understanding housing markets, conducting analytics, and making data-driven decisions. Zillow has one of the largest real estate datasets, with over 110 million U.S. homes. In this guide, I'll show you how to extract Zillow real estate data with Python web scraping.
Specifically, we'll cover:
- The wealth of real estate data on Zillow
- Finding property listings with Zillow's search
- Scraping full listing details from property pages
- Bypassing blocks and captchas with proxies
- Scraping best practices and ethics
Let's dive in!
What Data is Available on Zillow?
Zillow contains a wealth of real estate data that's useful for analytics and research. Here are some of the key data points available:
- Property details – address, bedrooms, bathrooms, square footage, etc.
- Pricing info – sale price, rent price, price history
- Location data – latitude, longitude, county, school district
- Photos and virtual tours
- Agent and owner contact info
This data is available across 110M+ property pages that can be discovered via Zillow's search feature. Next, let's take a look at how we can find properties.
Finding Listings with Zillow Search
Zillow provides a search feature to look up real estate listings by address, city, neighborhood, school district, ZIP code, and more. Under the hood, these search queries map to geographic bounding boxes that define an area on the map. Zillow then returns all listings that fall within this area.
For example, searching for “Seattle, WA” translates to a bounding box encompassing the Seattle region:
{ "west": -122.531866, "east": -121.996501, "south": 47.25259, "north": 47.745746 }
We can scrape these geographic coordinates from the Zillow search page HTML:
```python
import re
import json
import httpx

search_url = "https://www.zillow.com/seattle-wa_rb/"
response = httpx.get(search_url)
coords = re.search(r'"mapBounds":({.+?}),', response.text).group(1)
coords = json.loads(coords)
print(coords)
```
This prints out the bounding box coordinates for Seattle, WA:
{ "west": -122.531866, "east": -121.996501, "south": 47.25259, "north": 47.745746 }
With these coordinates, we can directly call Zillow's search API to retrieve listing results:
```python
search_api = "https://www.zillow.com/search/GetSearchPageState.htm"
params = {
    # "coords" is the mapBounds dict extracted in the previous step
    "searchQueryState": json.dumps({"mapBounds": coords, "regionSelection": []}),
    "wants": json.dumps({"cat1": ["listResults"]}),
    "requestId": 1,  # arbitrary request ID
}
response = httpx.get(search_api, params=params)
data = response.json()
# prints the number of listings found
print(len(data["cat1"]["searchResults"]["listResults"]))
```
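Once the response comes back, each entry under `listResults` can be flattened into a simple row. The field names below (`zpid`, `address`, `price`, `beds`, `baths`) are assumptions about the item shape and may vary in practice:

```python
def parse_results(data: dict) -> list[dict]:
    """Flatten the search API response into one dict per listing."""
    listings = data["cat1"]["searchResults"]["listResults"]
    return [
        {
            "zpid": item.get("zpid"),
            "address": item.get("address"),
            "price": item.get("price"),
            "beds": item.get("beds"),
            "baths": item.get("baths"),
        }
        for item in listings
    ]

# usage, with a stubbed response for illustration:
sample = {"cat1": {"searchResults": {"listResults": [{
    "zpid": "48791411",
    "address": "1519 36th Ave, Seattle, WA 98122",
    "price": "$1,150,000",
    "beds": 4,
    "baths": 2.5,
}]}}}
rows = parse_results(sample)
print(rows[0]["zpid"])  # 48791411
```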
This API call will return all the property listings within the given geographic bounds. By extracting these coordinates from any Zillow search page, we can easily look up listings for a given location. Now let's look at scraping the full details from each listing.
Extracting Listing Details by Scraping Property Pages
While the search API provides a listing overview, to get all the details we need to scrape the individual property page. For example:
https://www.zillow.com/homedetails/1519-36th-Ave-Seattle-WA-98122/48791411_zpid/
The page data is loaded into JavaScript variables that we can extract:
```python
import re
import json
import httpx

url = "https://www.zillow.com/homedetails/1519-36th-Ave-Seattle-WA-98122/48791411_zpid/"
response = httpx.get(url)
raw = re.search(
    r'window\.__WEBPACK_DEFAULT_DATA__ = JSON\.parse\("(.+)"\);', response.text
).group(1)
# the capture is a JSON-encoded string (the argument to JSON.parse),
# so decode it twice: once to unescape the string, once to parse the object
page_data = json.loads(json.loads(f'"{raw}"'))
details = page_data["updatedProps"]["pageProps"]["detailData"]
print(details["price"])      # $1,150,000
print(details["area"])       # 2810 sqft
print(details["bathrooms"])  # 2.5
```
This gives us access to the full listing details, including:
- Sale price, rent price, days on market
- Full address, neighborhood, county, school district
- Home details like square footage, beds, baths
- Full photo gallery
- Agent contact information
- And more!
For example, here is a sample of some listing details scraped:
| Detail | Value |
|---|---|
| Price | $1,150,000 |
| Address | 1519 36th Ave, Seattle, WA 98122 |
| Sqft | 2,810 |
| Bedrooms | 4 |
| Bathrooms | 2.5 |
| Days on Market | 16 |
| Agent Name | Jane Smith |
| Agent Phone | 206-555-1234 |
As you can see, an immense amount of useful data is available on each listing page, spanning pricing, location, home details, agent info, and more. Now let's discuss how to run this at scale without getting blocked.
Avoiding Blocks with Proxies
If you scrape too many pages or too rapidly, Zillow may block your IP or serve captchas to prove you are human. This will halt or severely slow down your scraper. To avoid this, we can use proxies, which mask the origin of our requests. Proxies make it appear as if your scraper is running from different locations and IPs.
This prevents your scraper from triggering any usage thresholds and appearing like an automated bot. Proxies are essential for scalable, resilient scraping without blocks. Here is how to use proxies with the Python requests library:
```python
import requests

proxies = {
    "http": "http://user:[email protected]:3128",
    "https": "http://user:[email protected]:3128",
}
response = requests.get(url, proxies=proxies)
```
Some key points on proxies:
- Use authentication – authenticated proxies with username/password are more stable and less likely to be blocked.
- Rotate frequently – each request should use a different proxy IP to appear human.
- Use many providers – combining proxies from multiple sources improves uptime.
- Use residential proxies – residential IPs are less likely to be blocked than datacenter IPs.
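The "rotate frequently" point can be sketched as picking a random proxy from a pool on every request. The pool endpoints below are placeholders, not real proxies:

```python
import random
import requests

# Placeholder proxy endpoints; substitute your provider's credentials and hosts.
PROXY_POOL = [
    "http://user:[email protected]:3128",
    "http://user:[email protected]:3128",
    "http://user:[email protected]:3128",
]

def pick_proxy(pool: list[str]) -> dict:
    """Choose a random proxy and return the mapping requests expects."""
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}

def fetch(url: str) -> requests.Response:
    # A fresh proxy per request spreads traffic across many IPs.
    return requests.get(url, proxies=pick_proxy(PROXY_POOL), timeout=30)
```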
Providers like BrightData and Smartproxy offer millions of residential IPs well suited to scraping Zillow. Their proxies fully support authentication and Python integration. With proxies, you can scrape Zillow at scale without worrying about captchas or blocks. Next, let's discuss some best practices.
Scraping Best Practices and Ethics
When scraping any website, it's important to follow ethical practices and respect the terms of service. Here are some guidelines for Zillow:
- Scrape respectfully: Use reasonable scrape rates and delays to minimize load.
- Attribute properly: If using Zillow's data, provide attribution and citations.
- Don't redistribute: Do not mass redistribute Zillow's raw data outside your organization.
- Use legally: Avoid scraping data like agent contact info for spam/marketing.
- Delete on request: Remove any specific listings from your database if requested.
- Comply with ToS: Understand and comply with Zillow's ToS. Cease scraping if requested.
- Protect data: Store scraped data securely to prevent unauthorized access.
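The "scrape respectfully" guideline can be as simple as a randomized delay between requests; a minimal sketch:

```python
import random
import time

def throttle(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Sleep for a random interval between requests; returns the delay used."""
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    return delay

# call throttle() between successive page fetches, e.g.:
# for url in urls:
#     response = httpx.get(url)
#     throttle()
```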
By following ethical practices, we can build scalable scrapers while respecting platforms like Zillow.
Scraping Zillow Data: Next Steps
With these techniques, you can build a Zillow web scraper in Python to extract large real estate datasets. The data is immensely useful for analytics and visualization. Some next steps to consider:
- Building a consolidated real estate database from this data
- Analyzing trends across different cities, neighborhoods, zip codes
- Creating compelling visualizations and dashboards
- Building a Zillow price tracking tool
- Expanding the scraper to capture school and crime data
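As a starting point for the price-tracking idea, here is a sketch that compares newly scraped prices against previously stored ones; the in-memory dicts stand in for a real database:

```python
def detect_price_changes(stored: dict, scraped: dict) -> dict:
    """Return {zpid: (old_price, new_price)} for listings whose price moved."""
    changes = {}
    for zpid, new_price in scraped.items():
        old_price = stored.get(zpid)
        if old_price is not None and old_price != new_price:
            changes[zpid] = (old_price, new_price)
    return changes

stored = {"48791411": 1_150_000}                        # last known prices
scraped = {"48791411": 1_125_000, "99999999": 750_000}  # latest scrape
print(detect_price_changes(stored, scraped))  # {'48791411': (1150000, 1125000)}
```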