Linking your data

By Andrew Fergusson

June 7, 2022



Andrew

csvcubed goes beyond improving discoverability through description. It also helps users start their journey towards linked data.

Linked data takes common dimensions, such as time periods, geographies, and observation status columns, and assigns them shared definitions. This makes comparisons between data sets explicit for both humans and machines.

In the example data set, Sweden at Eurovision no missing, the year column can be linked to a web-wide shared definition of year using a template. Templates are pre-configured definitions, designed by csvcubed, that are easy to reference.

Using a template requires the "columns" property in your configuration JSON file.

You can provide a column definition for each column in your CSV.

However, any column not explicitly configured in your configuration JSON file will be interpreted using the defaults you saw in previous examples. As a reminder, this is called configuration by convention and is covered in the guide section of csvcubed's documentation.

Since we're providing additional details on the year column, we create a new year property within the "columns" property and give it the key-value pair "from_template": "year". Building the CSVW from this configuration file will now create a linked dimension based on the calendar year, with shared IDs representing each year, which can be compared with the years in other data sets.
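As a rough illustration of the configuration described here, the sketch below builds the "columns" property in Python and prints it as JSON. It assumes the column is titled "Year" in the CSV; the helper names make_config and write_config are hypothetical, and every other column is left out so that it falls back to the defaults.

```python
import json
from pathlib import Path


def make_config(column_title: str = "Year") -> dict:
    """Build a minimal configuration that links one column to the year template.

    The column title "Year" is an assumption; use the exact column header
    from your own CSV.
    """
    return {
        "columns": {
            # Only this column is configured explicitly; all other columns
            # are left to configuration by convention.
            column_title: {"from_template": "year"},
        }
    }


def write_config(path: str, config: dict) -> None:
    """Write the configuration to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(config, indent=4), encoding="utf-8")


config = make_config()
print(json.dumps(config, indent=4))
```

Calling write_config with a file name of your choice saves the same structure to disk, ready to be passed to the build command.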

The build command becomes:

csvcubed build Sweden at Eurovision no missing dot CSV dash dash config linked data dot json
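Written out, the spoken command might look like the sketch below, which assembles the argument list in Python and prints it. The two file names are assumptions, since the transcript does not show how they are spelled on disk; build_command is a hypothetical helper.

```python
import shlex
from typing import List


def build_command(csv_path: str, config_path: str) -> List[str]:
    """Assemble the csvcubed build invocation as an argument list."""
    return ["csvcubed", "build", csv_path, "--config", config_path]


# Assumed file names: adjust both to match the files in your working folder.
cmd = build_command("sweden_at_eurovision_no_missing.csv", "linked_data.json")

# Print the command as it would be typed in a terminal.
print(shlex.join(cmd))
```

If csvcubed is installed, the same list can be handed to subprocess.run(cmd, check=True) to run the build from a script.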

The impact of the year template becomes clear when you look in the out folder for the year dot CSV file. In the last column of this file, called "Original Concept URI", you will see a series of URLs pointing to reference.data.gov.uk/id/year with the year number at the end. This is the shared definition side of linked data.
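To show why shared identifiers matter, here is a small sketch that builds year URIs in the pattern described and finds the years two data sets have in common. The http scheme, the function names, and the sample years are all assumptions for illustration; they are not taken from the example data set.

```python
from typing import Iterable, List

# Assumed scheme; the pattern itself is the one described in the transcript.
YEAR_BASE = "http://reference.data.gov.uk/id/year/"


def year_uri(year: int) -> str:
    """Return the shared identifier for a calendar year."""
    return f"{YEAR_BASE}{year}"


def year_from_uri(uri: str) -> int:
    """Recover the year number from the end of a shared identifier."""
    if not uri.startswith(YEAR_BASE):
        raise ValueError(f"not a year URI: {uri}")
    return int(uri[len(YEAR_BASE):])


def shared_years(uris_a: Iterable[str], uris_b: Iterable[str]) -> List[int]:
    """Years present in both data sets, matched on identical URIs."""
    return sorted(year_from_uri(u) for u in set(uris_a) & set(uris_b))


# Placeholder years, not real values from any data set.
dataset_a = [year_uri(y) for y in (2019, 2021, 2022)]
dataset_b = [year_uri(y) for y in (2021, 2022, 2023)]
print(shared_years(dataset_a, dataset_b))  # [2021, 2022]
```

Because both data sets use the same URI for the same year, the match is a plain equality check, with no need to guess whether two differently formatted labels mean the same period.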
