This tutorial will show you how to install the R package for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.

Setup

For this tutorial, we will need the Data Package R package (datapackage.r).

The devtools package is required to install datapackage.r from GitHub.

# Install the devtools package if it is not already installed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

Then install datapackage.r, either the released version from CRAN or the development version from GitHub.

# install the released version from CRAN
install.packages("datapackage.r")
# or install the development version from GitHub
devtools::install_github("frictionlessdata/datapackage.r")

Load

You can start using the package by loading datapackage.r.

library(datapackage.r)

Start by creating an empty Data Package with Package.load(). You can then add useful metadata by setting keys in its descriptor list. Below, we add the required name key and a human-readable title key. For the supported keys, consult the full Data Package specification. Note that we will create the required resources key further below.

# create an empty Data Package
dataPackage <- Package.load()
dataPackage$descriptor[['name']] <- 'period-table'
dataPackage$descriptor[['title']] <- 'Periodic Table'
# commit the descriptor changes to the Package object
dataPackage$commit()
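Because the descriptor is an ordinary named R list, its keys behave like any other list element. The base-R sketch below builds the same metadata on a plain list and checks the package name, assuming the Data Package spec's naming rule (lower-case alphanumerics plus ".", "_" or "-"). `is_valid_package_name` is a hypothetical helper, not part of datapackage.r.

```r
# Hypothetical helper (not part of datapackage.r): checks a package name
# against the spec's rule of lower-case alphanumerics plus ".", "_" and "-"
is_valid_package_name <- function(name) {
  is.character(name) && length(name) == 1 && grepl("^[a-z0-9._-]+$", name)
}

# A descriptor is a named list, so metadata keys can be set with [[<-
descriptor <- list()
descriptor[["name"]] <- "period-table"
descriptor[["title"]] <- "Periodic Table"

is_valid_package_name(descriptor[["name"]])   # TRUE
is_valid_package_name("Periodic Table")       # FALSE: upper case and a space
```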

## [1] TRUE

Infer a CSV Schema

We will use the periodic-table data from the following remote path: https://raw.githubusercontent.com/frictionlessdata/datapackage-r/master/vignettes/exampledata/data.csv

atomic number	symbol	name	atomic mass	metal or nonmetal?
1	H	Hydrogen	1.00794	nonmetal
2	He	Helium	4.002602	noble gas
3	Li	Lithium	6.941	alkali metal
4	Be	Beryllium	9.012182	alkaline earth metal
5	B	Boron	10.811	metalloid
6	C	Carbon	12.0107	nonmetal
7	N	Nitrogen	14.0067	nonmetal
8	O	Oxygen	15.9994	nonmetal
9	F	Fluorine	18.9984032	halogen
10	Ne	Neon	20.1797	noble gas
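When this file is read into R with read.csv(), headers containing spaces are rewritten into syntactic names (atomic number becomes atomic.number) unless check.names = FALSE is set. A base-R sketch using the first three rows inlined as text, so no network access is needed:

```r
# Header and first rows of the example data, inlined to avoid a download
csv_text <- "atomic number,symbol,name,atomic mass,metal or nonmetal?
1,H,Hydrogen,1.00794,nonmetal
2,He,Helium,4.002602,noble gas
3,Li,Lithium,6.941,alkali metal"

# Default read.csv() converts headers to syntactic names via make.names()
mangled <- read.csv(text = csv_text)
names(mangled)

# check.names = FALSE keeps the header exactly as written in the file
periodic <- read.csv(text = csv_text, check.names = FALSE)
names(periodic)
```

Keeping the original header matters here because the inferred schema uses the file's own column names as field names.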

We can infer our CSV's schema with the infer function from the Table Schema package (tableschema.r). We pass the remote link directly to infer, which returns the inferred schema. For example, if the processor detects only integers in a given column, it assigns integer as that column's type.

filepath <- 'https://raw.githubusercontent.com/frictionlessdata/datapackage-r/master/vignettes/exampledata/data.csv'

schema <- tableschema.r::infer(filepath)
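To make the idea behind inference concrete, here is a deliberately simplified, hypothetical sketch in base R: a column whose non-empty values are all whole numbers becomes integer, one that parses as numeric becomes number, and anything else is string. This is not tableschema.r's algorithm, which supports more types (dates, booleans and so on) and its own rules; infer_field_type and infer_schema_sketch are illustrative names only.

```r
# Hypothetical, simplified type inference; not the tableschema.r implementation
infer_field_type <- function(values) {
  values <- values[!is.na(values) & values != ""]  # treat "" as missing
  if (length(values) == 0) return("string")
  if (all(grepl("^[+-]?[0-9]+$", values))) return("integer")
  if (!anyNA(suppressWarnings(as.numeric(values)))) return("number")
  "string"
}

# Build a Table Schema-like list: one field per column
infer_schema_sketch <- function(df) {
  fields <- lapply(names(df), function(col) {
    list(name = col, type = infer_field_type(as.character(df[[col]])), format = "default")
  })
  list(fields = fields, missingValues = list(""))
}

csv_text <- "atomic number,symbol,name,atomic mass,metal or nonmetal?
1,H,Hydrogen,1.00794,nonmetal
2,He,Helium,4.002602,noble gas
3,Li,Lithium,6.941,alkali metal"
# Read every column as character so the sketch does its own typing
df <- read.csv(text = csv_text, check.names = FALSE, colClasses = "character")

sketch <- infer_schema_sketch(df)
vapply(sketch$fields, function(f) f$type, character(1))
```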

Once we have a schema, we are ready to add a resources key to the Data Package that points to the resource path and its newly inferred schema. Below we define the resources in three ways: from JSON text, as an R list object assigned with the usual assignment operator, and directly with the addResource method of the Package class.

# define resources from JSON text; "filepath" and "schema" are placeholders
resources <- helpers.from.json.to.list(
  '[{
    "name": "data",
    "path": "filepath",
    "schema": "schema"
  }]'
)
# replace the placeholders with the real schema and path
resources[[1]]$schema <- schema
resources[[1]]$path <- filepath

# or define resources directly as a list object
resources <- list(list(
  name = "data",
  path = filepath,
  schema = schema
))
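Both forms yield the same structure: an unnamed list whose elements are resource descriptors. A minimal base-R sketch that checks each element against the basic Data Resource requirements, assuming those are a name plus either a path or inline data. check_resource_sketch is hypothetical, not a datapackage.r function; the package performs its own validation.

```r
# Hypothetical checker (not part of datapackage.r); [[ ]] is used instead of $
# so that list names are matched exactly rather than partially
check_resource_sketch <- function(resource) {
  problems <- character(0)
  if (is.null(resource[["name"]])) problems <- c(problems, "missing 'name'")
  if (is.null(resource[["path"]]) && is.null(resource[["data"]])) {
    problems <- c(problems, "missing 'path' or 'data'")
  }
  problems
}

# Same shape as above; the empty schema is a placeholder for the inferred one
resources <- list(list(
  name = "data",
  path = "https://raw.githubusercontent.com/frictionlessdata/datapackage-r/master/vignettes/exampledata/data.csv",
  schema = list(fields = list())
))

# `resources` is a list of descriptors, so check each element
lapply(resources, check_resource_sketch)
```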

Now add the resources to the Data Package descriptor and commit the change:

dataPackage$descriptor[['resources']] <- resources
dataPackage$commit()

## [1] TRUE

Alternatively, you can add the resource directly with the addResource method of the Package class, instead of assigning the resources key as above:

# addResource() takes a single resource descriptor
resource <- list(
  name = "data",
  path = filepath,
  schema = schema
)

dataPackage$addResource(resource)
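Conceptually, adding a resource appends one more descriptor to the package's resources list. A base-R sketch of that idea on a plain list; append_resource_sketch is hypothetical, and rejecting duplicate resource names is a choice made by this sketch, not documented behavior of addResource().

```r
# Hypothetical sketch: append one resource descriptor to a plain-list descriptor
append_resource_sketch <- function(descriptor, resource) {
  # names of resources already present (character(0) when there are none)
  existing <- vapply(descriptor[["resources"]], function(r) r[["name"]], character(1))
  if (resource[["name"]] %in% existing) {
    stop("a resource named '", resource[["name"]], "' already exists")
  }
  descriptor[["resources"]] <- c(descriptor[["resources"]], list(resource))
  descriptor
}

descriptor <- list(name = "period-table", title = "Periodic Table")
descriptor <- append_resource_sketch(descriptor, list(name = "data", path = "data.csv"))
length(descriptor[["resources"]])   # 1
```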

Now we are ready to write our Data Package descriptor to disk. The argument to save() is the target location, here exampledata, relative to the current working directory.

dataPackage$save('exampledata')
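To see what a descriptor file contains, the sketch below writes a hand-built descriptor list to datapackage.json with jsonlite (already used later in this tutorial) and reads it back. This is not how Package$save() is implemented; the relative path "data.csv" is a hypothetical local file used only for illustration.

```r
library(jsonlite)

# Plain-list descriptor mirroring the metadata set earlier (schema omitted for brevity)
descriptor <- list(
  name = "period-table",
  title = "Periodic Table",
  resources = list(list(name = "data", path = "data.csv"))
)

# Write into a temporary directory so the working directory is untouched;
# auto_unbox = TRUE writes length-1 values as JSON scalars, not arrays
out_file <- file.path(tempdir(), "datapackage.json")
write_json(descriptor, out_file, auto_unbox = TRUE, pretty = TRUE)

# Read it back to confirm the round trip
back <- read_json(out_file)
back$name
```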

The resulting descriptor is printed below. Note that atomic number has been correctly inferred as an integer and atomic mass as a number (float), while every other column is a string.

jsonlite::prettify(helpers.from.list.to.json(dataPackage$descriptor))

## {
##     "profile": "data-package",
##     "name": "period-table",
##     "title": "Periodic Table",
##     "resources": [
##         {
##             "name": "data",
##             "path": "https://raw.githubusercontent.com/frictionlessdata/datapackage-r/master/vignettes/exampledata/data.csv",
##             "schema": {
##                 "fields": [
##                     {
##                         "name": "atomic number",
##                         "type": "integer",
##                         "format": "default"
##                     },
##                     {
##                         "name": "symbol",
##                         "type": "string",
##                         "format": "default"
##                     },
##                     {
##                         "name": "name",
##                         "type": "string",
##                         "format": "default"
##                     },
##                     {
##                         "name": "atomic mass",
##                         "type": "number",
##                         "format": "default"
##                     },
##                     {
##                         "name": "metal or nonmetal?",
##                         "type": "string",
##                         "format": "default"
##                     }
##                 ],
##                 "missingValues": [
##                     ""
##                 ]
##             },
##             "profile": "data-resource",
##             "encoding": "utf-8"
##         }
##     ]
## }
##
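To check the inferred types programmatically rather than by reading the JSON, the fields can be reduced to a named vector. The sketch uses a hand-built list mirroring the fields printed above; on the real object, the same list should be reachable as dataPackage$descriptor$resources[[1]]$schema$fields.

```r
# Hand-built list mirroring the "fields" array printed above
fields <- list(
  list(name = "atomic number", type = "integer"),
  list(name = "symbol", type = "string"),
  list(name = "name", type = "string"),
  list(name = "atomic mass", type = "number"),
  list(name = "metal or nonmetal?", type = "string")
)

# Named character vector: field name -> inferred type
field_types <- vapply(fields, function(f) f[["type"]], character(1))
names(field_types) <- vapply(fields, function(f) f[["name"]], character(1))
field_types
```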

Publishing

Now that you have created your Data Package, you might want to publish your data online so that you can share it with others.

Creating Data Packages in R

Kleanthis Koupidis

2020-03-12
