Real estate data is incredibly valuable for understanding housing markets, conducting analytics, and making data-driven decisions. Zillow has one of the largest real estate datasets, with over 110 million U.S. homes. In this guide, I'll show you how to extract Zillow real estate data with Python web scraping.
Specifically, we'll cover:
- The wealth of real estate data on Zillow
- Finding property listings with Zillow's search
- Scraping full listing details from property pages
- Bypassing blocks and captchas with proxies
- Scraping best practices and ethics
Let's dive in!
What Data is Available on Zillow?
Zillow contains a wealth of real estate data that's useful for analytics and research. Here are some of the key data points available:
- Property details – address, bedrooms, bathrooms, square footage, etc.
- Pricing info – sale price, rent price, price history
- Location data – latitude, longitude, county, school district
- Photos and virtual tours
- Agent and owner contact info
This data is available across 110M+ property pages that can be discovered via Zillow's search feature. Next, let's take a look at how we can find properties.
Finding Listings with Zillow Search
Zillow provides a search feature to look up real estate listings by address, city, neighborhood, school district, ZIP code, and more. Under the hood, these search queries map to geographic bounding boxes that define an area on the map. Zillow then returns all listings that fall within this area.
For example, searching for “Seattle, WA” translates to a bounding box encompassing the Seattle region:
{ "west": -122.531866, "east": -121.996501, "south": 47.25259, "north": 47.745746 }
We can scrape these geographic coordinates from the Zillow search page HTML:
```python
import re
import json
import httpx

search_url = "https://www.zillow.com/seattle-wa_rb/"
response = httpx.get(search_url)
coords = re.search(r'"mapBounds":({.+?}),', response.text).group(1)
coords = json.loads(coords)
print(coords)
```
This prints out the bounding box coordinates for Seattle, WA:
{ "west": -122.531866, "east": -121.996501, "south": 47.25259, "north": 47.745746 }
With these coordinates, we can directly call Zillow's search API to retrieve listing results:
```python
search_api = "https://www.zillow.com/search/GetSearchPageState.htm"
params = {
    # "coords" is the mapBounds dict extracted in the previous step
    "searchQueryState": json.dumps({"mapBounds": coords, "regionSelection": []}),
    "wants": json.dumps({"cat1": ["listResults"]}),
    "requestId": 1,  # arbitrary request ID
}
response = httpx.get(search_api, params=params)
data = response.json()
# prints the number of listings found
print(len(data["cat1"]["searchResults"]["listResults"]))
```
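Once the response comes back, each entry under `listResults` can be flattened into a simple row. The field names below (`zpid`, `address`, `price`, `beds`, `baths`) are assumptions about the item shape and may vary in practice:

```python
def parse_results(data: dict) -> list[dict]:
    """Flatten the search API response into one dict per listing."""
    listings = data["cat1"]["searchResults"]["listResults"]
    return [
        {
            "zpid": item.get("zpid"),
            "address": item.get("address"),
            "price": item.get("price"),
            "beds": item.get("beds"),
            "baths": item.get("baths"),
        }
        for item in listings
    ]

# usage, with a stubbed response for illustration:
sample = {"cat1": {"searchResults": {"listResults": [{
    "zpid": "48791411",
    "address": "1519 36th Ave, Seattle, WA 98122",
    "price": "$1,150,000",
    "beds": 4,
    "baths": 2.5,
}]}}}
rows = parse_results(sample)
print(rows[0]["zpid"])  # 48791411
```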
This API call will return all the property listings within the given geographic bounds. By extracting these coordinates from any Zillow search page, we can easily look up listings for a given location. Now let's look at scraping the full details from each listing.
Extracting Listing Details by Scraping Property Pages
While the search API provides a listing overview, to get all the details we need to scrape the individual property page. For example:
https://www.zillow.com/homedetails/1519-36th-Ave-Seattle-WA-98122/48791411_zpid/
The page data is loaded into JavaScript variables that we can extract:
```python
import re
import json
import httpx

url = "https://www.zillow.com/homedetails/1519-36th-Ave-Seattle-WA-98122/48791411_zpid/"
response = httpx.get(url)
raw = re.search(
    r'window\.__WEBPACK_DEFAULT_DATA__ = JSON\.parse\("(.+)"\);', response.text
).group(1)
# the capture is a JSON-encoded string (the argument to JSON.parse),
# so decode it twice: once to unescape the string, once to parse the object
page_data = json.loads(json.loads(f'"{raw}"'))
details = page_data["updatedProps"]["pageProps"]["detailData"]
print(details["price"])      # $1,150,000
print(details["area"])       # 2810 sqft
print(details["bathrooms"])  # 2.5
```
This gives us access to the full listing details, including:
- Sale price, rent price, days on market
- Full address, neighborhood, county, school district
- Home details like square footage, beds, baths
- Full photo gallery
- Agent contact information
- And more!
For example, here is a sample of some listing details scraped:
| Detail | Value |
|---|---|
| Price | $1,150,000 |
| Address | 1519 36th Ave, Seattle, WA 98122 |
| Sqft | 2,810 |
| Bedrooms | 4 |
| Bathrooms | 2.5 |
| Days on Market | 16 |
| Agent Name | Jane Smith |
| Agent Phone | 206-555-1234 |
As you can see, an immense amount of useful data is available on each listing page, spanning pricing, location, home details, agent info, and more. Now let's discuss how to run this at scale without getting blocked.
Avoiding Blocks with Proxies
If you scrape too many pages or too rapidly, Zillow may block your IP or serve captchas to prove you are human. This will halt or severely slow down your scraper. To avoid this, we can use proxies, which mask the origin of our requests. Proxies make it appear as if your scraper is running from different locations and IPs.
This prevents your scraper from triggering any usage thresholds and appearing like an automated bot. Proxies are essential for scalable, resilient scraping without blocks. Here is how to use proxies with the Python requests library:
```python
import requests

proxies = {
    "http": "http://user:[email protected]:3128",
    "https": "http://user:[email protected]:3128",
}
response = requests.get(url, proxies=proxies)
```
Some key points on proxies:
- Use authentication – authenticated proxies with username/password are more stable and less likely to be blocked.
- Rotate frequently – each request should use a different proxy IP to appear human.
- Use many providers – combining proxies from multiple sources improves uptime.
- Use residential proxies – residential IPs are less likely to be blocked than datacenter IPs.
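The "rotate frequently" point can be sketched as picking a random proxy from a pool on every request. The pool endpoints below are placeholders, not real proxies:

```python
import random
import requests

# Placeholder proxy endpoints; substitute your provider's credentials and hosts.
PROXY_POOL = [
    "http://user:[email protected]:3128",
    "http://user:[email protected]:3128",
    "http://user:[email protected]:3128",
]

def pick_proxy(pool: list[str]) -> dict:
    """Choose a random proxy and return the mapping requests expects."""
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}

def fetch(url: str) -> requests.Response:
    # A fresh proxy per request spreads traffic across many IPs.
    return requests.get(url, proxies=pick_proxy(PROXY_POOL), timeout=30)
```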
Providers like BrightData and Smartproxy offer millions of residential IPs well suited to scraping Zillow. Their proxies fully support authentication and Python integration. With proxies, you can scrape Zillow at scale without worrying about captchas or blocks. Next, let's discuss some best practices.
Scraping Best Practices and Ethics
When scraping any website, it's important to follow ethical practices and respect the terms of service. Here are some guidelines for Zillow:
- Scrape respectfully: Use reasonable scrape rates and delays to minimize load.
- Attribute properly: If using Zillow's data, provide attribution and citations.
- Don't redistribute: Do not mass redistribute Zillow's raw data outside your organization.
- Use legally: Avoid scraping data like agent contact info for spam/marketing.
- Delete on request: Remove any specific listings from your database if requested.
- Comply with ToS: Understand and comply with Zillow's ToS. Cease scraping if requested.
- Protect data: Store scraped data securely to prevent unauthorized access.
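The "scrape respectfully" guideline can be as simple as a randomized delay between requests; a minimal sketch:

```python
import random
import time

def throttle(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Sleep for a random interval between requests; returns the delay used."""
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    return delay

# call throttle() between successive page fetches, e.g.:
# for url in urls:
#     response = httpx.get(url)
#     throttle()
```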
By following ethical practices, we can build scalable scrapers while respecting platforms like Zillow.
Scraping Zillow Data: Next Steps
With these techniques, you can build a Zillow web scraper in Python to extract large real estate datasets. The data is immensely useful for analytics and visualization. Some next steps to consider:
- Building a consolidated real estate database from this data
- Analyzing trends across different cities, neighborhoods, zip codes
- Creating compelling visualizations and dashboards
- Building a Zillow price tracking tool
- Expanding the scraper to capture school and crime data
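As a starting point for the price-tracking idea, here is a sketch that compares newly scraped prices against previously stored ones; the in-memory dicts stand in for a real database:

```python
def detect_price_changes(stored: dict, scraped: dict) -> dict:
    """Return {zpid: (old_price, new_price)} for listings whose price moved."""
    changes = {}
    for zpid, new_price in scraped.items():
        old_price = stored.get(zpid)
        if old_price is not None and old_price != new_price:
            changes[zpid] = (old_price, new_price)
    return changes

stored = {"48791411": 1_150_000}                        # last known prices
scraped = {"48791411": 1_125_000, "99999999": 750_000}  # latest scrape
print(detect_price_changes(stored, scraped))  # {'48791411': (1150000, 1125000)}
```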