Upsert a data frame with optional recoding of foreign keys
pl_upsert.Rd
Upserts
(inserts or updates,
depending on whether the private keys in .df
already exist in db_table_name
)
.df
into db_table_name
at conn
.
Usage
pl_upsert(
.df,
conn,
db_table_name = NULL,
additional_hash_group_cols = NULL,
usual_hash_group_cols = PFUPipelineTools::usual_hash_group_cols,
keep_single_unique_cols = TRUE,
in_place = FALSE,
encode_fks = TRUE,
compress = FALSE,
round_double_columns = FALSE,
digits = 15,
index_map =
magrittr::set_names(list(fk_parent_tables[[IEATools::row_col_types$industry]],
fk_parent_tables[[IEATools::row_col_types$product]],
fk_parent_tables[[IEATools::row_col_types$other]]),
c(IEATools::row_col_types$industry, IEATools::row_col_types$product,
IEATools::row_col_types$other)),
retain_zero_structure = FALSE,
schema = schema_from_conn(conn),
fk_parent_tables = get_all_fk_tables(conn = conn, schema = schema, collect = TRUE),
.db_table_name = PFUPipelineTools::hashed_table_colnames$db_table_name,
.pk_col = PFUPipelineTools::dm_pk_colnames$pk_col,
.algo = "md5"
)
Arguments
- .df
The data frame to be upserted.
- conn
A connection to the CL-PFU database.
- db_table_name
A string identifying the destination for
.df
inconn
, i.e. the name of a remote database table. Default isNULL
, meaning that the value for this argument will be taken from the.db_table_name
column of.df
.- additional_hash_group_cols
A vector or list of additional columns by which
.df
will be grouped before hashing and, therefore, appear in the output. Default isNULL
. Passed topl_hash()
.- usual_hash_group_cols
A vector of columns by which
.df
will be grouped before hashing and, therefore, appear in the output. Default isPFUPipelineTools::additional_hash_group_cols
but can be set toNULL
to disable. Passed topl_hash()
.- keep_single_unique_cols
A boolean that tells whether to keep columns with a single unique value in the output. Default is
TRUE
. Passed topl_hash()
.- in_place
A boolean that tells whether to modify the database at
conn
. Default isFALSE
, which is helpful if you want to chain several requests.- encode_fks
A boolean that tells whether to code foreign keys in
.df
before upserting toconn
. Default isTRUE
.- compress
A boolean that tells whether to compress
db_table_name
in the database after uploading. Default isFALSE
.- round_double_columns
A boolean that tells whether to round double-precision columns in
.df
. Default isFALSE
.- digits
An integer that tells the number of significant digits.
digits
has an effect only whenround_double_columns
isTRUE
. Default is15
, which should eliminate any numerical precision errors forcompress_rows()
.- index_map
A list of 2 or more data frames that represent the mappings from inboard row and column indices in the database to outboard row and column names in the memory of the local computer. See documentation for
encode_matsindf()
andmatsbyname::to_triplet()
. Default is alist
that contains theindustry
,product
, andother
members offk_parent_tables
.- retain_zero_structure
A boolean that tells whether to retain the structure of zero matrices. See details.
- schema
The data model (
dm
object) for the database inconn
. Default isdm_from_con(conn, learn_keys = TRUE)
. See details.- fk_parent_tables
A named list of all parent tables for the foreign keys in
db_table_name
. See details.- .db_table_name
The name of the table name column in
.df
. Default isPFUPipelineTools::hashed_table_colnames$db_table_name
.- .pk_col
The name of the primary key column in a primary key table. See
PFUPipelineTools::dm_pk_colnames
.- .algo
The hashing algorithm. Default is "md5".
Details
This function decodes foreign keys (fks), when possible,
assuming that all fks are integers.
If non-integers (typically, character strings)
are provided in fk columns of .df
,
the non-integers will be recoded to their appropriate integer key values.
This function knows about CL-PFU database tables that contain
matrix information.
In particular, if .df
contains matrices,
they are expanded into row-col-val format
before uploading.
The output of this function is a special data frame that contains the following columns:
All single-valued columns columns in
.df
, columns given inadditional_hash_group_cols
(defaultNULL
), and columns given inusual_hash_group_cols
(defaultPFUPipelineTools::usual_hash_group_cols
).Hash: A column with a hash of all non-foreign-key columns.
schema
is a data model (dm
object) for the database in conn
.
Its default value (schema_from_conn(conn)
)
extracts the data model for the database at conn
automatically.
However, if the caller already has the data model,
supplying it in the schema
argument will save time.
fk_parent_tables
is a named list of tables,
one of which (the one named db_table_name
)
contains the foreign keys for db_table_name
.
fk_parent_tables
is treated as a store from which foreign key tables
are retrieved by name when needed.
The default value (which calls get_all_fk_tables()
with collect = TRUE
because decoding of foreign keys
is done outboard of the database)
retrieves all possible foreign key parent tables from conn
,
potentially a time-consuming process.
For speed, pre-compute all foreign key parent tables once
(via get_all_fk_tables(collect = TRUE)
)
and pass the list to the fk_parent_tables
argument
of this function.
The user in conn
must have write access to the database.
By default, pl_upsert()
will delete all zero entries
in matrices before upserting.
But for some countries and years,
that could result in missing matrices, such as U_EIOU.
Set retain_zero_structure = TRUE
to preserve all entries in a zero matrix.
See also
pl_download()
for the reverse operation.
pl_upload_schema_and_simple_tables()
for a way to establish the database schema.