Wednesday, March 11, 2026

Accessing Data Commons with the New Python API Client


# Introduction

Data is the foundation of every data scientist’s work. Without useful and valid data sources, we cannot do our jobs. Moreover, low-quality or irrelevant data makes our work go to waste. Therefore, access to reliable datasets is an important starting point for data professionals.

Data Commons is an open-source initiative by Google that aims to organize the world’s data and make it accessible to anyone. Anyone can query its publicly available data for free. What sets Data Commons apart from other public dataset projects is that it has already done the schema work, so the data is ready to use much sooner.

Given how useful Data Commons is in our work, being able to access it becomes crucial for many data-related tasks. Fortunately, Data Commons provides a new Python API client for accessing these datasets.

# Accessing the Data Commons using Python

Data Commons works by organizing data into a searchable knowledge graph that unifies information from various sources. At its core, it uses a schema.org-based model to standardize how data is represented.

Using this schema, Data Commons can combine data from different sources into a single graph in which nodes represent entities (such as cities, locations, and people), events, and statistical variables. Edges represent the relationships between these nodes. Each node is unique and can be identified by a DCID (Data Commons ID), and many nodes contain observations – measurements associated with a variable, unit, and period.
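
To make the graph model described above more concrete, here is a minimal sketch in plain Python. It is not how Data Commons is implemented; the class names (`Node`, `Observation`, `Graph`), the `example/region` DCID, and the observation value are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """A measurement of one statistical variable for one entity."""
    variable_dcid: str
    date: str      # the period the measurement covers, e.g. a year
    value: float   # placeholder values only in this sketch
    unit: str = ""


@dataclass
class Node:
    """A graph node, uniquely identified by its DCID."""
    dcid: str
    node_type: str
    observations: list[Observation] = field(default_factory=list)


@dataclass
class Graph:
    """Nodes keyed by DCID plus (source, relation, target) edges."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        # DCIDs are unique, so refuse to overwrite an existing node.
        if node.dcid in self.nodes:
            raise ValueError(f"duplicate DCID: {node.dcid}")
        self.nodes[node.dcid] = node

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Both endpoints must already exist in the graph.
        for dcid in (source, target):
            if dcid not in self.nodes:
                raise KeyError(f"unknown DCID: {dcid}")
        self.edges.append((source, relation, target))

    def neighbors(self, dcid: str, relation: str) -> list[str]:
        # Follow outgoing edges of one relation type from a node.
        return [t for s, r, t in self.edges if s == dcid and r == relation]


graph = Graph()
graph.add_node(Node("country/IDN", "Country"))
graph.add_node(Node("example/region", "Region"))  # hypothetical DCID
graph.add_edge("country/IDN", "containedInPlace", "example/region")

# Attach one observation with a placeholder value (not real data).
graph.nodes["country/IDN"].observations.append(
    Observation("worldBank/GFDD_AI_25", date="2020", value=1.0)
)

print(graph.neighbors("country/IDN", "containedInPlace"))
```

The point of the sketch is the shape of the data: every lookup starts from a DCID, relationships are edges between DCIDs, and observations hang off an entity node together with the variable and period they describe.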

Thanks to the Python API, we can easily access the knowledge graph to obtain the data we need. Let’s see how to do this.

First, we need a free API key to access Data Commons. Create a free account and copy your API key to a safe location. You can also use the trial API key, but its access is more limited.

Then install the Data Commons Python library. We will use the V2 API client, as it is the latest version. Run the following command to install the Data Commons client with optional support for pandas DataFrames.

pip install "datacommons-client[Pandas]"

After installing the library, we are ready to download data using the Data Commons Python client.
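
Before going further, it can help to confirm that the package is visible in the active environment. The snippet below is a small sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("datacommons-client")
if found:
    print(f"datacommons-client {found} is installed")
else:
    print("datacommons-client is not installed in this environment")
```

If the second message appears, the `pip install` command probably ran against a different interpreter or virtual environment than the one running the script.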

To create a client that will access data from the cloud, run the code below.

from datacommons_client.client import DataCommonsClient

client = DataCommonsClient(api_key="YOUR-API-KEY")
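
Since the API key should be kept somewhere safe, one common option is to read it from an environment variable rather than writing it into the script. The following is a sketch under that assumption: the variable name `DC_API_KEY` and the helpers `load_api_key` and `make_client` are hypothetical, and only the `DataCommonsClient(api_key=...)` call comes from the code above.

```python
import os


def load_api_key(env_var: str = "DC_API_KEY") -> str:
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Data Commons API key."
        )
    return key


def make_client(env_var: str = "DC_API_KEY"):
    """Build a client without hardcoding the key in the source file."""
    # Imported here so load_api_key works even if the package is missing.
    from datacommons_client.client import DataCommonsClient

    return DataCommonsClient(api_key=load_api_key(env_var))
```

With the variable set in your shell, `client = make_client()` can replace the hardcoded version, and the key stays out of version control.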

One of the most important concepts in Data Commons is the entity, which refers to a persistent, physical thing in the real world, such as a city or country. It is an important part of data retrieval because most datasets require an entity to be specified. You can visit the Data Commons Place page to see all available entities.

For most users, the data we want to retrieve is more specific: the statistical variables stored in Data Commons. To select the data we want to download, we need to know the DCID of each statistical variable, which can be found using the Statistical Variable Explorer.

You can filter variables and select a dataset in the explorer. For example, select the World Bank dataset for “ATMs per 100,000 adults.” You can then get the DCID from the information shown in the explorer.

If you click on the DCID, you can see all the information related to the node, including how it connects to other information.

In addition to the statistical variable’s DCID, we also need the DCID of the entity for the geographic location we are interested in. We can look at the Data Commons Place page mentioned above, or we can use the code below to list the candidate DCIDs for a specific place name.

# Look up DCIDs by place name (returns multiple candidates)
resp = client.resolve.fetch_dcids_by_name(names="Indonesia").to_dict()
dcid_list = [c["dcid"] for c in resp["entities"][0]["candidates"]]
print(dcid_list)

With output similar to the following:

['country/IDN', 'geoId/...' , '...']

Using the above code, we retrieve the DCID candidates available for a specific place name. For example, among the candidates for “Indonesia”, we can choose country/IDN as the country’s DCID.
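
Choosing among candidates by eye is fine for one place, but it can also be done in code. The sketch below assumes the response dictionary has the `entities` / `candidates` / `dcid` layout used in the lookup above; the helper names and the sample dictionary (including the `example/notACountry` entry) are illustrative, not real API output.

```python
def candidate_dcids(resp: dict) -> list[str]:
    """Flatten every candidate DCID out of a resolve-style response dict."""
    return [
        candidate["dcid"]
        for entity in resp.get("entities", [])
        for candidate in entity.get("candidates", [])
    ]


def pick_by_prefix(resp: dict, prefix: str = "country/"):
    """Return the first candidate whose DCID starts with prefix, else None."""
    for dcid in candidate_dcids(resp):
        if dcid.startswith(prefix):
            return dcid
    return None


# Hand-written stand-in for the dictionary returned by .to_dict().
sample = {
    "entities": [
        {
            "candidates": [
                {"dcid": "example/notACountry"},  # hypothetical candidate
                {"dcid": "country/IDN"},
            ]
        }
    ]
}

print(pick_by_prefix(sample))
```

Filtering on the `country/` prefix is a simple heuristic that matches the DCID chosen here; other place types use other prefixes, so adjust it to the kind of entity you need.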

With all the required information in hand, we only need to run the following code:

variable = ["worldBank/GFDD_AI_25"]
entity = ["country/IDN"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

The result is returned as a pandas DataFrame.

The code above returns all available observations for the selected variables and entities over the entire time frame. You will also notice that we use lists instead of individual strings.

This is because we can pass multiple variables and entities at the same time to obtain a combined set of data. For example, the code below retrieves two different statistical variables for two entities at once.

variable = ["worldBank/GFDD_AI_25", "worldBank/SP_DYN_LE60_FE_IN"]
entity = ["country/IDN", "country/USA"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

The resulting DataFrame combines the variables and entities you specified. With this method, you can get the data you need without running a separate query for each combination.
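
A combined result like this is in long format, so a common next step is to reshape it for comparison. The sketch below assumes the DataFrame has `date`, `entity`, `variable`, and `value` columns; treat those column names as an assumption and check `df.columns` on your own result. The sample rows use placeholder values, not real observations, and `to_wide` is a hypothetical helper.

```python
import pandas as pd


def to_wide(df: pd.DataFrame, variable_dcid: str) -> pd.DataFrame:
    """Pivot one variable into a date-by-entity table."""
    subset = df[df["variable"] == variable_dcid]
    return subset.pivot(index="date", columns="entity", values="value").sort_index()


atm = "worldBank/GFDD_AI_25"
other = "worldBank/SP_DYN_LE60_FE_IN"

# Stand-in for the combined DataFrame; values are placeholders.
sample = pd.DataFrame(
    {
        "date": ["2020", "2020", "2021", "2021", "2020", "2020"],
        "entity": ["country/IDN", "country/USA"] * 3,
        "variable": [atm, atm, atm, atm, other, other],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

wide = to_wide(sample, atm)
print(wide)
```

Note that `pivot` raises a `ValueError` if the same date and entity appear more than once for a variable, which could happen if several sources report the same observation. In that case, filter to a single source first or use `pivot_table` with an explicit aggregation.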

That’s everything you need to know about accessing Data Commons with the new Python API client. Use this library whenever you need reliable public data for your work.

# Summary

Data Commons is an open-source project by Google that aims to democratize access to data. It differs from many public data projects because its datasets are built around a knowledge graph framework, which makes the data easier to standardize.

In this article, we discussed how to access datasets in the graph using Python, using statistical variables and entities to retrieve observations.

I hope this helped!

Cornellius Yudha Wijaya is an assistant data analytics manager and data writer. While working full time at Allianz Indonesia, he loves sharing Python and data tips through social media and written media. Cornellius writes on a variety of topics related to artificial intelligence and machine learning.
