Access PFU Database Products via `pins`
Matthew Kuperus Heun
2024-02-07
access_via_pins.Rmd
Introduction
This vignette demonstrates how to use the `pins` package to access products in the PFU Database.
Preparation
Take the following steps to prepare for accessing PFU Database products.
(1) Obtain Access to IEA EWEB Data
The PFU database contains Extended World Energy Balance (EWEB) data. To use the PFU Database, you or your institution must already have purchased the EWEB data, obtained a license to use the EWEB data, or subscribed to a data service with access to the EWEB data. At present, you must have access to all countries and all years of EWEB data to access the PFU Database.
(2) Install Dropbox
At present, the PFU Database is distributed via a shared Dropbox folder, so users of the database must install Dropbox. As of May 2023, the data in the Dropbox folder occupy 12 GB, and the database will grow in the future. You may need to upgrade your Dropbox plan if syncing that much data would exceed your current storage limit.
(3) Receive and accept an invitation
A member of the PFU Database team will provide read-only access to the Dropbox folder in which the PFU Database products are stored. Accept the invitation and let Dropbox sync the PFU Database folder to your computer.
The folder containing PFU Database products is called `PipelineReleases`.
(4) Become familiar with the `pins` package
The `pins` package is the mechanism by which versions of the PFU Database and its products are stored, maintained, and distributed. Please review the `pins` package documentation before accessing PFU Database data.
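A quick way to get a feel for `pins` is a write-then-read round trip on a throwaway board. The sketch below assumes only that `pins` is installed; it uses `pins::board_temp()` as a stand-in for the shared `PipelineReleases` folder, and the pin name `"toy_psut"` and its data are hypothetical, not PFU Database content.

```r
library(pins)

# A temporary, versioned board that stands in for the shared PipelineReleases
# folder. It lives in a temporary directory and is discarded with the session.
toy_board <- pins::board_temp(versioned = TRUE)

# Hypothetical toy data; not taken from the PFU Database.
toy_data <- data.frame(Country = c("USA", "COL"), Year = c(2000L, 2000L))

# Store the data frame as an .rds pin, then read it back by name.
pins::pin_write(toy_board, toy_data, name = "toy_psut", type = "rds")
toy_read <- pins::pin_read(toy_board, "toy_psut")

# List the pins now present on the board.
pins::pin_list(toy_board)
```

Reading the real database uses the same `pin_read()` call; only the board (a folder board pointing at `PipelineReleases`) differs.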
(5) Become familiar with the PFU Database and products
Strictly speaking, the PFU Database consists only of primary, final, and useful energy and exergy data for all countries and all years available in the IEA EWEB data. The data are arranged in Physical Supply-Use Table (PSUT) format, as described in the paper by Heun et al. entitled “A physical supply-use table framework for energy analysis on the energy conversion chain.” The data are stored in the Dropbox folder as `.rds` files in `matsindf` format.
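Roughly speaking, `matsindf` ("matrices in data frames") format is a data frame in which some columns hold an entire matrix in each row. The base-R sketch below builds a two-row toy example with a list column of matrices and round-trips it through an `.rds` file. The column names (`Country`, `Year`, `U`), the row and column labels, and the values are all hypothetical and far smaller than anything in the database.

```r
# Two small, hypothetical use (U) matrices with named rows and columns.
labels <- list(c("Coal", "Electricity"), c("Industry", "Residential"))
u_usa <- matrix(c(1, 2, 3, 4), nrow = 2, dimnames = labels)
u_col <- matrix(c(5, 6, 7, 8), nrow = 2, dimnames = labels)

# One row per country-year; the U column is a list whose elements are matrices.
toy_psut <- data.frame(Country = c("USA", "COL"), Year = c(2000L, 2000L))
toy_psut$U <- list(u_usa, u_col)

# Save to and restore from an .rds file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_psut, rds_path)
restored <- readRDS(rds_path)

# Each cell of the U column is a complete, labelled matrix.
restored$U[[2]]["Electricity", "Residential"]
```

Because whole matrices sit in cells, row-wise operations (for example, filtering by country) keep each country-year's matrices together.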
In addition to the database itself, several data products are available, most of which are aggregations, subsets, or other calculations derived from the database. For example, primary, final, and useful energy and exergy aggregations are available for each country and each year (data product C). Primary-to-final, final-to-useful, and primary-to-useful efficiencies are also available for each country and each year (data product D).
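The three efficiencies in data product D are linked. Assuming each efficiency is defined as a ratio of the aggregates in data product C, and writing $X_p$, $X_f$, and $X_u$ for the primary, final, and useful energy (or exergy) of one country in one year (symbols introduced here for illustration, not taken from the database documentation), the primary-to-useful efficiency is the product of the other two:

$$
\eta_{pf} = \frac{X_f}{X_p}, \qquad
\eta_{fu} = \frac{X_u}{X_f}, \qquad
\eta_{pu} = \frac{X_u}{X_p} = \frac{X_f}{X_p}\,\frac{X_u}{X_f} = \eta_{pf}\,\eta_{fu}
$$

For example, $\eta_{pf} = 0.8$ and $\eta_{fu} = 0.5$ give $\eta_{pu} = 0.4$.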
The database, its versions, and its data products are documented in the file named `versions and products.xlsx` at the top level of the `PipelineReleases` folder. Look through that file to determine which PFU Database products you need for your research. In particular, note the “Pin name” and the “Pin version” of the data products you want.
Note: there may be other versions of pins in the Dropbox folder. Ignore all versions of pins except those identified in the `versions and products.xlsx` file.
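Before reading a large pin, it can help to confirm that the pin version listed in `versions and products.xlsx` is actually present in your synced folder. The sketch below defines a hypothetical helper, `assert_pin_version()`, built on `pins::pin_versions()`, and demonstrates it on a temporary board holding two versions of a toy pin. Against the shared folder, you would pass your folder board, the “Pin name”, and the “Pin version” from the spreadsheet instead.

```r
library(pins)

# Hypothetical helper (not part of pins): stop with an informative message if
# `version` of pin `name` is absent from `board`; otherwise return it invisibly.
assert_pin_version <- function(board, name, version) {
  available <- pins::pin_versions(board, name)$version
  if (!version %in% available) {
    stop("Version ", version, " of pin '", name, "' not found. Available: ",
         paste(available, collapse = ", "))
  }
  invisible(version)
}

# Demonstration on a temporary, versioned board with two versions of a toy pin.
demo_board <- pins::board_temp(versioned = TRUE)
pins::pin_write(demo_board, data.frame(x = 1), name = "toy", type = "rds")
pins::pin_write(demo_board, data.frame(x = 2), name = "toy", type = "rds")

# One row per stored version; the version column holds the version strings.
toy_versions <- pins::pin_versions(demo_board, "toy")
assert_pin_version(demo_board, "toy", toy_versions$version[1])
```

Failing early with a list of the available versions is cheaper than discovering a missing version after a long load.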
Accessing the PFU Database and other data products
To load the PFU Database itself into your R session, supply the pin name and pin version to `pins` functions. The following code loads v1 of the database, using the name of the pin (“psut”) and the version string for version 1 of the database (“20221109T152414Z-7d7ad”).
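```r
library(pins)
# A versioned folder board pointing at the synced Dropbox folder.
# Adjust the path if your Dropbox folder is not at ~/Dropbox.
pfu_pinboard <- pins::board_folder(path = "~/Dropbox/PipelineReleases", versioned = TRUE)
# Read version 1 of the database (pin name "psut").
psut_data_frame <- pins::pin_read(board = pfu_pinboard, name = "psut",
                                  version = "20221109T152414Z-7d7ad")
```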
Because the database contains a large amount of data, it may take a minute or two to load.
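If you load pins often, a small wrapper can bundle the board creation, a check that the folder exists, and a mandatory version string. The function `read_pfu_pin()` below is a hypothetical convenience, not part of `pins` or of the PFU Database tooling, and its default path simply repeats the one used above.

```r
library(pins)

# Hypothetical wrapper: read a specific version of a pin from the PFU folder.
# `version` has no default, so every call records exactly what was loaded.
read_pfu_pin <- function(name, version, path = "~/Dropbox/PipelineReleases") {
  if (!dir.exists(path)) {
    stop("Pin folder not found at ", path,
         ". Check the path and that Dropbox has finished syncing.")
  }
  board <- pins::board_folder(path = path, versioned = TRUE)
  pins::pin_read(board = board, name = name, version = version)
}

# Example call (requires the synced PipelineReleases folder):
# psut_data_frame <- read_pfu_pin("psut", "20221109T152414Z-7d7ad")
```

The existence check is there because, as far as we know, `board_folder()` creates a missing folder rather than failing, so a mistyped path would otherwise surface later as a less obvious "pin not found" error.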
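We recommend always specifying a version string, because the latest version of a pin changes whenever a new version is released, which makes results hard to reproduce. If you nevertheless want the latest version of the same pin, omit the version string.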
```r
# Omitting `version` reads the most recent version of the pin.
psut_data_frame_latest <- pins::pin_read(board = pfu_pinboard, name = "psut")
```
The procedure is the same for loading any data product. For example, data product A is a subset of the database that contains only “USA”. Product A can be useful for testing, because it is much smaller in memory than the full database and loads much faster. Product A has its own pin name and pin version; the following code loads product A for version 1 of the database.
```r
# Product A: pin name "psut_usa", with its own version string.
psut_data_frame_usa <- pins::pin_read(board = pfu_pinboard, name = "psut_usa",
                                      version = "20230217T182459Z-287a0")
```
To keep only a single country's data from the full database, filter on the `Country` column. For example, the following code keeps only Colombia (“COL”).
```r
# Requires the dplyr package; the native pipe |> requires R >= 4.1.
col_only <- psut_data_frame |>
  dplyr::filter(Country == "COL")
```
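The same filter extends to several countries with `%in%` and can be combined with other conditions. The sketch below runs on a small hypothetical stand-in for `psut_data_frame`; the real data frame has many more columns, and the `Year` column here is an assumption made for illustration.

```r
# Hypothetical stand-in for psut_data_frame (two columns only).
toy_db <- data.frame(
  Country = c("COL", "COL", "USA", "GHA"),
  Year    = c(1971L, 2000L, 2000L, 2000L)
)

# Keep several countries at once, restricted to years from 2000 onward.
selected <- toy_db |>
  dplyr::filter(Country %in% c("COL", "GHA"), Year >= 2000)
```

Comma-separated conditions in `dplyr::filter()` are combined with a logical AND, so only rows meeting both conditions remain.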