Design your CSV
By Andrew Fergusson
June 7, 2022
Andrew

A variant of tidy data is the starting point for csvcubed. We support a long form of tidy data where each observation is on its own row, which we call "standard form" in our project.

I'll demonstrate this using one of my favorite data sets: Sweden's participation in Eurovision. Eurovision is the world's largest and most-watched live music event, produced by the European Broadcasting Union. It started in 1956 for many reasons, one of which was to improve adoption of broadcasting standards in Europe. Sweden first joined Eurovision in 1958.

This dataset captures three kinds of observations per year: the number of points received in the finale, the rank in the finale, and the number of artists representing Sweden onstage during the contest.

As you can see, this is a fairly straightforward dataset with standard dimensions or factors of "year", "entrant", "language", and "song".

Each value in the column called "values" is paired with two further columns, called "measure" and "unit", which provide the measure and unit information for that value.
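To make the shape concrete, here is a minimal Python sketch that turns a wide record (one column per measure) into long-form rows with one observation each. The column names follow the ones mentioned above. The wide column names, the measure labels, the helper functions and every data value are placeholders invented for illustration. They are not taken from the real dataset or from csvcubed itself.

```python
import csv
import io

# Hypothetical wide-format record: one row per year, one column per measure.
# Entrant, song and all numbers are placeholders, not real contest results.
wide_rows = [
    {"year": "1958", "entrant": "Example Artist", "language": "Swedish",
     "song": "Example Song", "points": 10, "rank": 4, "people_on_stage": 1},
]

# Wide column name -> (measure label, unit label). Labels are assumptions.
MEASURES = {
    "points": ("points", "unitless"),
    "rank": ("rank", "unitless"),
    "people_on_stage": ("people on stage", "number"),
}

DIMENSIONS = ["year", "entrant", "language", "song"]
FIELDNAMES = DIMENSIONS + ["values", "measure", "unit"]


def to_standard_form(rows):
    """Return one long-form row per observation."""
    out = []
    for row in rows:
        for column, (measure, unit) in MEASURES.items():
            long_row = {d: row[d] for d in DIMENSIONS}
            long_row.update(
                {"values": row[column], "measure": measure, "unit": unit}
            )
            out.append(long_row)
    return out


def to_csv_text(rows):
    """Serialise long-form rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


standard_rows = to_standard_form(wide_rows)
print(to_csv_text(standard_rows))
```

Each wide row becomes three long rows here, one per measure, and the dimension values are repeated on every row so that each observation can be read on its own.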

It is good practice to reuse units that have already been defined. In this case, the number of people on the stage uses the unit "number", whereas points and rank are "unitless".
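One sanity check you might run before moving on is that every measure is always paired with the same unit. The sketch below assumes that is a rule you want to enforce. It is not described here as a csvcubed requirement, and the rows and function names are hypothetical.

```python
from collections import defaultdict

# Placeholder long-form rows; only the columns needed for the check are shown.
rows = [
    {"values": "10", "measure": "points", "unit": "unitless"},
    {"values": "4", "measure": "rank", "unit": "unitless"},
    {"values": "1", "measure": "people on stage", "unit": "number"},
]


def units_by_measure(rows):
    """Collect the set of units seen for each measure."""
    units = defaultdict(set)
    for row in rows:
        units[row["measure"]].add(row["unit"])
    return dict(units)


def inconsistent_measures(rows):
    """Return measures that appear with more than one unit."""
    return sorted(
        measure
        for measure, units in units_by_measure(rows).items()
        if len(units) > 1
    )


problems = inconsistent_measures(rows)
print(problems or "every measure uses a single unit")
```

If a measure ever shows up with two different units, for example through a typo, it will appear in the returned list.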

Next we'll cover building a CSVW.