JSON has rapidly become the universal data language of the web. Whether you're working with web APIs, scraping sites, analyzing data files, or wrangling JSON logs, having a way to easily extract and query JSON data is essential. This is where JSONPath comes in.
JSONPath provides a simple but powerful syntax for lookups, extracts, and filters on JSON documents. With JSON now used almost everywhere, JSONPath is an invaluable tool for any Python developer. In this comprehensive guide, we'll dive into how to use JSONPath for querying JSON data in Python.
Querying JSON Data
Querying and extracting values from JSON documents is easy with Python's json module. But things get messy quickly when working with nested structures and large, complex JSON documents: loops, index chains, and lots of conditions are needed to zero in on the data you want.
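To see what that manual traversal looks like, here's a quick sketch using only the standard library's json module (the sample document mirrors the book-store data used throughout this article):

```python
import json

raw = """
{"store": {"books": [
  {"title": "Sayings of the Century", "price": 8.95},
  {"title": "Sword of Honour", "price": 12.99}
]}}
"""
data = json.loads(raw)

# Manual traversal: index into each level and loop over the array,
# guarding against missing keys along the way
titles = []
for book in data.get("store", {}).get("books", []):
    if "title" in book:
        titles.append(book["title"])

print(titles)  # ['Sayings of the Century', 'Sword of Honour']
```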
This is where JSONPath comes in – it provides a simple domain-specific language for declaratively specifying paths in JSON objects.
Modeled after XPath for querying XML, JSONPath expressions describe the location and filters needed to extract values.
Here's a quick example. Given this JSON data:
```json
{
  "store": {
    "books": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ]
  }
}
```
We can use JSONPath to directly grab the titles with:
```python
from jsonpath_ng import parse

matches = parse('$.store.books[*].title').find(data)
titles = [match.value for match in matches]
# ['Sayings of the Century', 'Sword of Honour']
```
No need for loops, indexes, or conditions – just a simple path to the data we want.
JSONPath Syntax Basics
The JSONPath syntax consists of a handful of operators: dot selectors, array indexing, filters, and other handy shortcuts:

- `$` – the root object
- `@` – the current node (used inside filters)
- `.` or `[]` – child selector
- `..` – recursive descent
- `*` – wildcard matching any key or index
- `[start:end:step]` – array slice
- `[?(condition)]` – filter expression
Here are some examples of JSONPath expressions and what they select:
```text
# All books
$.store.books[*]

# First book
$.store.books[0]

# Last two books
$.store.books[-2:]

# Filter books under 10
$.store.books[?(@.price < 10)]

# Recursively find all prices
$..price
```
As you can see, JSONPath provides an extensive set of operators for extracting and filtering JSON data. Let's look at how we can use it for parsing JSON in Python.
JSONPath Libraries for Python
There are a few different JSONPath implementations for Python. The two most popular are:
- jsonpath-ng – Actively maintained, with a simple and Pythonic API.
- jsonpath-rw – The older library that jsonpath-ng builds on; no longer actively developed.
I generally recommend jsonpath-ng as I find its syntax cleaner and easier to use from Python code:
```python
from jsonpath_ng import parse

data = {...}  # your decoded JSON document
parse('$.store.books[0]').find(data)
```
Note that while the package installs as jsonpath-ng, you import it as jsonpath_ng, which exposes the parse() function. You can install it easily with pip:
$ pip install jsonpath-ng
Now let's see it in action!
Querying JSON Data with JSONPath
Let's walk through some examples to see how JSONPath makes querying JSON a breeze in Python. We'll use the following sample data: a store with books in reference and fiction categories, plus a bicycle:
```python
data = {
    "store": {
        "books": [
            {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
            },
            {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
            },
            {
                "category": "fiction",
                "author": "J. R. R. Tolkien",
                "title": "The Lord of the Rings",
                "isbn": "0-395-19395-8",
                "price": 22.99
            }
        ],
        "bicycle": {
            "color": "red",
            "price": 19.95
        }
    }
}
```
Let's extract some data:
Get all book titles:
```python
from jsonpath_ng import parse

titles = [m.value for m in parse('$.store.books[*].title').find(data)]
print(titles)
# ['Sayings of the Century', 'Sword of Honour', 'The Lord of the Rings']
```
Get the first book:
```python
first_book = parse('$.store.books[0]').find(data)
print(first_book[0].value)
# {'category': 'reference', 'author': 'Nigel Rees', ...}
```
Get all fiction book titles:
```python
from jsonpath_ng.ext import parse  # filter expressions need the ext parser

fiction_titles = [m.value for m in
                  parse('$.store.books[?(@.category=="fiction")].title').find(data)]
print(fiction_titles)
# ['Sword of Honour', 'The Lord of the Rings']
```
Get the bicycle price:
```python
price = parse('$.store.bicycle.price').find(data)[0].value
print(price)  # 19.95
```
As you can see, JSONPath allows us to directly specify the key paths and filters we need to extract values from JSON objects.
Using JSONPath for Web Scraping
One of the most common uses for JSONPath is extracting data when web scraping. Many modern websites render content dynamically via JavaScript, fetching raw JSON from backend APIs. A Python scraper can retrieve that JSON directly, but then needs a way to parse it.
Let's walk through an example using JSONPath and Python to scrape a site. We'll extract book data from books.toscrape.com – note the real site serves plain HTML, so for this walkthrough we'll pretend its catalogue is exposed through JSON endpoints and treat the URLs and response shapes below as illustrative.
First we'll make requests to grab the JSON data. I like using the httpx library:
```python
import httpx

# Illustrative only: .json() assumes these endpoints return JSON,
# whereas the real site serves HTML
index_data = httpx.get('https://books.toscrape.com/').json()
book_data = httpx.get('https://books.toscrape.com/catalogue/the-grand-design_405/index.html').json()
```
This returns JSON objects with lots of extra properties and markup. We want to extract fields like title, price, stock, etc. This is where JSONPath comes in handy!
```python
from jsonpath_ng import parse

# Get index page book titles
titles = [m.value for m in parse('$..product_pod[*].title').find(index_data)]

# Get stock for a book
stock = parse('$.stock').find(book_data)[0].value

# Get book price (float value)
price = parse('$.price').find(book_data)[0].value
price = float(price[1:])  # Remove £ character
```
So with just a few simple JSONPath expressions, we were able to extract the key data we want from the raw JSON.
Going Beyond Basic Paths
So far we've looked at simple path-based queries. JSONPath also supports filtering and slicing, and with a little surrounding Python you can sort, sum, and transform the results. Let's look at some examples:
Filter books over 15 dollars (filter expressions need the `jsonpath_ng.ext` parser):

```python
from jsonpath_ng.ext import parse

over_15 = [m.value for m in
           parse('$.store.books[?(@.price > 15)]').find(data)]
```

Sort books by price (jsonpath-ng returns matches in document order, so sorting is easiest in plain Python):

```python
books = [m.value for m in parse('$.store.books[*]').find(data)]
cheap_first = sorted(books, key=lambda b: b['price'])
```

Sum all book prices:

```python
total_price = sum(m.value for m in parse('$.store.books[*].price').find(data))
```

Extract ISBNs:

```python
isbns = [m.value for m in parse('$..isbn').find(data)]
```

We can even apply custom predicates to the extracted values:

```python
def under_10(book):
    return book['price'] < 10

cheap_books = [b for b in books if under_10(b)]
```
So JSONPath handles the querying, while a little surrounding Python covers sorting and aggregation – together they go well beyond simple data extraction.
JSONPath Implementations
We've covered the jsonpath-ng library which I recommend for Python use. But there are also implementations for many other languages:
| Language   | Library     |
|------------|-------------|
| Python     | jsonpath-ng |
| JavaScript | jsonpath    |
| PHP        | jsonpath    |
| Go         | gjsonpath   |
| Java       | JsonPath    |
| C#         | Json.NET    |
JSONPath is also integrated directly into many databases like MongoDB, CouchDB, and Elasticsearch. So learning JSONPath is applicable across many programming languages and systems. It's well worth becoming familiar with.
JSONPath Alternatives
The main alternative to JSONPath is JMESPath. JMESPath provides a very similar JSON query language but takes more inspiration from JavaScript rather than XPath.
Some key differences between JMESPath and JSONPath:

- JMESPath drops the explicit root $ selector (its dotted paths otherwise look similar)
- JMESPath ships built-in functions like contains() and starts_with()
- JMESPath has more operators for transforming and reshaping output, such as projections and pipes
In practice they achieve largely the same goal. JSONPath takes inspiration from XPath while JMESPath has more JavaScript similarity. I tend to use JSONPath out of habit, but for more complex queries and transformations JMESPath can be preferable.
Tips for Production JSONPath Usage
Here are some tips for using JSONPath effectively based on my experience as a web scraping expert:
- Use an HTTP library that returns JSON decoded data – Avoid extra JSON decoding steps.
- Extract IDs/keys first, then make lookups to gather related data – Often faster than mega queries.
- Cache common query parsers – Reuse jsonpath_ng objects when querying repeatedly.
- Profile queries on large docs – JSONPath can get slow on huge JSON. Use indexing if needed.
- Pair with a proxy API to avoid blocks – Rotate IPs when making JSON scraping requests.
- Implement in Scrapy spiders – Integrates nicely with Scrapy's item pipelines.
- Transform results for cleaner data – Leverage sorting, filtering, mappings, etc.
- Combine with other tools – Use JSONPath for extraction, then pass data to Pandas, jq, etc for more complex wrangling.
Limitations to Be Aware Of
For all its usefulness, JSONPath does have some limitations:
- No joins between different structures/docs
- Minimal built-in operations and transforms
- Can get slow on huge documents
- No complex expressions or calculations
Chaining multiple JSONPath queries can get messy. So for more complex analysis I typically recommend extracting data with JSONPath and then passing it to Pandas or even a real database. JSONPath serves the singular purpose of simplifying queries and extraction – not as a full data analysis toolkit.
JSONPath – Essential for Python Devs Working With JSON
Hopefully, this article provided a great intro to using JSONPath for querying JSON documents in Python. The simple but flexible syntax makes extracting nested values and performing filters a breeze. For any Python developer working with JSON data – whether scraping sites, consuming APIs, analyzing logs, etc – JSONPath is an invaluable tool to add to your toolkit.