Filter a table from the database using natural expressions
pl_filter_collect.Rd
Often when collecting data from the database,
filtering is desired.
But filtering based on foreign keys
(fks, as stored in the database)
is effectively impossible, because of foreign key encoding.
This function filters based on
fk values (typically strings),
not fk keys (typically integers),
thereby simplifying the filtering process,
with optional downloading thereafter.
By default (collect = FALSE
),
a tbl
is returned
(and data are not downloaded from the database).
Use dplyr::collect()
to execute the resulting SQL query
and obtain an in-memory data frame.
Or, set collect = TRUE
to execute the SQL and
return an in-memory data frame.
Usage
pl_filter_collect(
db_table_name,
...,
version_string = NULL,
collect = FALSE,
create_matsindf = collect,
conn,
schema = schema_from_conn(conn = conn),
fk_parent_tables = get_all_fk_tables(conn = conn, schema = schema),
index_map_name = "Index",
index_map = fk_parent_tables[[index_map_name]],
rctype_table_name = "matnameRCType",
rctypes = decode_fks(db_table_name = rctype_table_name, collect = TRUE, conn = conn,
schema = schema, fk_parent_tables = fk_parent_tables),
matrix_class = c("Matrix", "matrix"),
matname = PFUPipelineTools::mat_meta_cols$matname,
matval = PFUPipelineTools::mat_meta_cols$matval,
rowtype_colname = PFUPipelineTools::mat_meta_cols$rowtype,
coltype_colname = PFUPipelineTools::mat_meta_cols$coltype,
valid_from_version_colname = PFUPipelineTools::dataset_info$valid_from_version_colname,
valid_to_version_colname = PFUPipelineTools::dataset_info$valid_to_version_colname
)
Arguments
- db_table_name
The string name of the database table to be filtered.
- ...
Filter conditions on the data frame, such as
Country == "USA"
orYear %in% 1960:2020
. These conditions reduce data volume, because they are applied prior to downloading fromconn
. If no rows match these conditions, a data frame with no rows is returned.- version_string
A string of length
1
or more that indicates the desired version(s).NULL
, the default, means to download all versions available indb_table_name
.c()
(an empty string) returns a zero-row table. Ifversion_string
is invalid, an error will be emitted.- collect
A boolean that tells whether to download the result. Default is
FALSE
. See details.- create_matsindf
A boolean that tells whether to create a matsindf data frame from the collected data frame. Default is the value of
collect
, such that settingcollect = TRUE
also impliescreate_matsindf
.- conn
The database connection.
- schema
The data model (
dm
object) for the database inconn
. Default isschema_from_conn(conn = conn)
. See details.- fk_parent_tables
A named list of all parent tables for the foreign keys in
db_table_name
. Default isget_all_fk_tables(conn = conn, schema = schema)
. See details.- index_map_name
The name of the table that serves as the index for row and column names. Default is "Index".
- index_map
The index map for the matrices in the database at
conn
. Default isfk_parent_tables[[index_table_name]]
.- rctype_table_name
The name of the table that contains row and column types. Default is "matnameRCType".
- rctypes
The table of row and column types for the database at
conn
. Default isfk_parent_tables[[rctype_table_name]]
.- matrix_class
One of "Matrix" (the default) for sparse matrices or "matrix" (the base matrix representation in
R
) for non-sparse matrices.- matname
The name of the matrix name column. Default is
PFUPipelineTools::mat_meta_cols$matname
.- matval
The name of the matrix value column. Default is
PFUPipelineTools::mat_meta_cols$matval
.- rowtype_colname, coltype_colname
The names for row and column type columns in data frames. Defaults are
PFUPipelineTools::mat_meta_cols$rowtype
andPFUPipelineTools::mat_meta_cols$coltype
, respectively.- valid_from_version_colname, valid_to_version_colname
Names for columns containing version information. Defaults are
PFUPipelineTools::dataset_info$valid_from_version_colname
andPFUPipelineTools::dataset_info$valid_to_version_colname
, respectively.
Details
schema
is a data model (dm
object) for the CL-PFU database.
It can be obtained from calling schema_from_conn()
.
If minimal interaction with the database is desired,
be sure to override the default value for schema
by supplying a pre-computed dm
object.
fk_parent_tables
is a named list of tables,
some of which are fk parent tables containing
the mapping between fk values (usually strings)
and fk keys (usually integers)
for db_table_name
.
fk_parent_tables
is treated as a store from which foreign key tables
are retrieved by name when needed.
An appropriate value for fk_parent_tables
can be obtained
from get_all_fk_tables()
.
If minimal interaction with the database is desired,
be sure to override the default value for fk_parent_tables
by supplying a pre-computed named list of
foreign key tables.