Title: | Deal with Country Data in an Easy Way |
---|---|
Description: | Wrangle country data more effectively and quickly. This package contains functions to easily identify and convert country names, download country information, merge country data from different sources, and make quick world maps. |
Authors: | Francesco Saverio Bellelli [aut, cre, cph] (https://fbellelli.com/) |
Maintainer: | Francesco Saverio Bellelli <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2025-02-14 04:57:08 UTC |
Source: | https://github.com/fbellelli/countries |
When at least 3 country names or years are found in the column names, the function will automatically transform the table from a wide to a long format by pivoting the country/year columns.
This is equivalent to applying tidyr::pivot_longer() or data.table::melt() on the columns with years or countries as names.
The function can also detect years when they are preceded by a prefix.
auto_melt( x, names_to = "pivoted_colnames", values_to = "pivoted_data", verbose = TRUE, pivoting_info = FALSE )
x |
A data.frame object to check and pivot country or year columns. |
names_to |
String indicating how the column holding the name of the pivoted columns should be called in the output table. Default is |
values_to |
String indicating how the column containing the values of the pivoted columns should be called in the output table. Default is |
verbose |
Logical value. If set to |
pivoting_info |
Logical value indicating whether to return the list of names of the columns that have been pivoted. Default is |
A table transformed into a "long" format by pivoting country or year columns. If year columns are found, a numeric column called "year_pivoted_colnames" is added, isolating the years extracted from the table headers.
auto_merge, find_countrycol, find_timecol
# example data
example <- data.frame(Date = c("01.01.2019", "01.02.2019", "01.03.2019"),
                      Japan = 1:3, Norway = 2:4, Germany = 3:5, US = 4:6)
example2 <- data.frame(Sector = c("Agriculture", "Mining", "Forestry"),
                       X2000 = 1:3, X2001 = 2:4, X2002 = 3:5, X2003 = 4:6)

# examples pivoting countries and years from column names
auto_melt(example)
auto_melt(example2)
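To make the pivoting idea concrete, the following is a minimal base R sketch of what detecting prefixed year columns and reshaping them to long format could look like. It is not the package's implementation: the helper name pivot_year_cols, its regular expression and its min_cols argument are hypothetical, and country-name detection is left out entirely. Only the output column names are borrowed from the auto_melt() documentation above.

```r
# Hypothetical helper (not part of the countries package): pivot columns whose
# names look like years, optionally preceded by a non-digit prefix such as "X".
pivot_year_cols <- function(x, min_cols = 3) {
  # a year is assumed to be a four-digit number between 1000 and 2099
  is_year <- grepl("^[^0-9]*(1[0-9]{3}|20[0-9]{2})$", names(x))
  # mimic the documented threshold: do nothing with fewer than 3 year columns
  if (sum(is_year) < min_cols) return(x)
  id_cols <- names(x)[!is_year]
  year_cols <- names(x)[is_year]
  # stack one block of rows per year column
  long <- do.call(rbind, lapply(year_cols, function(col) {
    out <- x[, id_cols, drop = FALSE]
    out$pivoted_colnames <- col
    out$pivoted_data <- x[[col]]
    # strip the prefix to isolate the numeric year
    out$year_pivoted_colnames <- as.numeric(sub("^[^0-9]*", "", col))
    out
  }))
  rownames(long) <- NULL
  long
}

example2 <- data.frame(Sector = c("Agriculture", "Mining", "Forestry"),
                       X2000 = 1:3, X2001 = 2:4, X2002 = 3:5, X2003 = 4:6)
long <- pivot_year_cols(example2)
head(long)
```

The sketch keeps every non-year column as an identifier, which is a simplifying assumption; auto_melt() may treat the remaining columns differently.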
The aim of this function is to simplify country data merging for quick analyses. Compared to a normal merge function, auto_merge():
Is able to perform the merging of multiple data tables at once.
Supports automatic detection of columns to merge.
Is able to handle different country naming conventions and date formats. For example, it will be able to recognise that "Italy" and "ITA" refer to the same country and will merge the two entries across tables.
Detects if data is in a wide format with country names or years in the column names and will automatically pivot the data.
auto_merge( ..., by = NULL, country_to = "ISO3", inner_join = FALSE, merging_info = FALSE, verbose = TRUE, auto_melt = TRUE )
... |
Data to be merged. Inputs need to be data frames or coercible to data frame. Tables can also be provided into a single list e.g. |
by |
A list or a vector indicating the columns to be used for merging the data. If not provided, the function will try to automatically detect columns to be merged. For more information, refer to the details section. |
country_to |
Nomenclature to which country names should be converted in the output. Default is |
inner_join |
Logical value indicating whether to perform an inner join. The default is |
merging_info |
Logical value. If |
verbose |
Logical value indicating whether to print status messages on the console. Default is |
auto_melt |
Logical value indicating whether to automatically pivot country names or years present in the column names. Default is |
Automatic detection of columns to merge.
The automatic detection process starts by first identifying the key of each table, i.e. a set of variables identifying the entries in the table. This process is optimised for common formats of country data.
The function will then try to match key columns across tables based on their values.
Columns containing country names and time information are identified and are processed to take into account different nomenclatures and time formats.
This automatic process works for the most common dataset structures, but it is not foolproof. Therefore, we always advise checking the columns that are being merged by setting verbose = TRUE and reading the printout.
Moreover, users should be aware that this automatic detection process can increase the overall merging time considerably. This can be especially long for tables containing many columns or when a large number of tables is being merged.
Formatting of the by argument
If an argument is provided to by, it needs to be either 1) a list of column names, or 2) a vector of regular expressions. The format requirements are the following:
In case a list is passed, each element of the list must be a vector of length equal to the number of tables being merged (i.e., if 3 tables are being merged, every vector in the list needs to be of length 3). The vectors should contain the names of the columns to be merged in each table; NA can be inserted for tables that do not contain the variable, and names should be given in the same order as the tables that are being merged (i.e. the first column name should be present in the first table being merged). The name of the merged columns can be modified by assigning a name to the elements of the list. For example, list("countries"=c("Nation",NA,"COUNTRY"), "sector"=c("Industry","industry",NA)) requests merging the columns tab1$Nation and tab3$COUNTRY, and the columns tab1$Industry and tab2$industry. These two merged columns will be named "countries" and "sector" in the output, as requested by the user.
In case a vector is passed, each element is interpreted as a regular expression to be used for matching the columns to be merged. For example, the same merging order provided in the list example could be written as c("countries"="Nation|COUNTRY", "sector"="[Ii]ndustry"). This will merge the first column in each table whose name matches the pattern described by the regular expression and will name the two resulting columns "countries" and "sector" respectively.
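As an illustration of how the list form of by can be read, the base R sketch below renames the listed columns in each table and then chains full outer joins with merge(). It is only a sketch under stated assumptions: the tables already use ISO3 codes, so the country-name and date harmonisation that auto_merge() performs is skipped, and the variable names tables, renamed and merged are hypothetical.

```r
# Toy tables that already share a common country nomenclature (assumption)
tab1 <- data.frame(Industry = c(1, 1, 2, 2),
                   Nation = c("ITA", "FRA", "ITA", "FRA"),
                   tot = 1:4)
tab2 <- data.frame(industry = 1:4, rate = c(0.1, 0.2, 0.3, 0.4))
tab3 <- data.frame(COUNTRY = c("USA", "FRA", "IND"),
                   national_avg = c(10, 20, 30))
tables <- list(tab1, tab2, tab3)

# One vector per merged column, one entry per table, NA where absent
by <- list(countries = c("Nation", NA, "COUNTRY"),
           sector = c("Industry", "industry", NA))

# Step 1: rename the listed columns to the requested output names
renamed <- lapply(seq_along(tables), function(i) {
  tab <- tables[[i]]
  for (new_name in names(by)) {
    old_name <- by[[new_name]][i]
    if (!is.na(old_name)) names(tab)[names(tab) == old_name] <- new_name
  }
  tab
})

# Step 2: chain full outer joins on whatever key columns two tables share
merged <- Reduce(function(a, b) {
  merge(a, b, by = intersect(names(a), names(b)), all = TRUE)
}, renamed)
merged
```

Using all = TRUE keeps unmatched rows from every table, which corresponds to a join that is not an inner join; whether this matches the default behaviour of auto_merge() in every detail is not verified here.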
If merging_info = FALSE, a single merged table is returned. If merging_info = TRUE, a list object is returned, containing the merged table (merged_table), a table summarising which columns have been merged (info_merged_columns), a table summarising the conversion of country names (info_country_names), a table summarising the conversion of time columns to a common format (info_time_formats), a list of all the columns that have been pivoted when wide tables with country or years in column names were detected (pivoted_columns), and a list recapitulating the inputs passed to the function (call).
# sample data
tab1 <- data.frame(Industry = c(1, 1, 2, 2), Nation = c("ITA", "FRA", "ITA", "FRA"), tot = runif(4))
tab2 <- data.frame(industry = 1:4, rate = runif(4))
tab3 <- data.frame(COUNTRY = c("United States", "France", "India"), national_avg = runif(3))

# examples of merging orders
auto_merge(tab1, tab2, tab3)
auto_merge(list(tab1, tab2, tab3))
auto_merge(tab1, tab2, tab3, by = c("countries"="Nation|COUNTRY", "sector"="[Ii]ndustry"))
auto_merge(tab1, tab2, tab3, country_to = "UN_fr")
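The regular-expression form of by used in the example above can be thought of as resolving, for each table, to the first column whose name matches the pattern. The base R sketch below shows that resolution step only; the helper first_match and the object names are hypothetical and the package may resolve patterns differently in edge cases.

```r
# Column names of the three sample tables
tab_names <- list(c("Industry", "Nation", "tot"),
                  c("industry", "rate"),
                  c("COUNTRY", "national_avg"))

# Named vector of regular expressions, as in the by argument
patterns <- c(countries = "Nation|COUNTRY", sector = "[Ii]ndustry")

# Hypothetical helper: first column name matching the pattern, or NA
first_match <- function(pattern, col_names) {
  hits <- grep(pattern, col_names, value = TRUE)
  if (length(hits) == 0) NA_character_ else hits[1]
}

# For each pattern, look up the matching column in every table
resolved <- lapply(patterns, function(p) {
  vapply(tab_names, first_match, character(1), pattern = p)
})
resolved
```

The result has the same shape as the list form of by: one named element per merged column, with one entry per table and NA where no column matches.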
Check if the connection to REST Countries API is working. The function checks if the user has an internet connection and if any answer is returned from the Countries REST API.
check_countries_api(warnings = TRUE, timeout = 4)
warnings |
Logical value indicating whether to output a warning when there is no connection. Default is |
timeout |
Numeric value giving the timeout in seconds for attempting connection to the API. Default is |
Returns a logical value: TRUE if there is a connection, FALSE if there is no connection.
check_countries_api()
The function looks for country names or year information in the column names. This function is designed for simple panel country data, in which countries' time series are arranged side by side on columns or stacked on rows. The function will only return year/country column names if at least 3 country/year column names are detected.
check_wide_format(x, adjacency = TRUE)
x |
A dataframe |
adjacency |
Logical value indicating whether column names containing country or year information need to be adjacent to each other. Default is |
Returns a data.frame identifying the column names that contain country or year information.
find_keycol, find_countrycol, find_timecol
example <- data.frame(Year=2000:2010, China=0:10, US=10:20, Vietnam=30:40)
check_wide_format(x=example)
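The two conditions described for this check, at least 3 matching column names and (optionally) adjacency, can be sketched in base R as follows. This is an illustration only: the lookup vector known_countries is a tiny hypothetical stand-in for the package's country recognition, and the variable names are not taken from the package.

```r
# Tiny illustrative lookup (assumption); the package recognises far more names
known_countries <- c("China", "US", "Vietnam", "Japan", "Norway", "Germany")

col_names <- c("Year", "China", "US", "Vietnam")

# positions of the column names that look like countries
hits <- which(col_names %in% known_countries)

# condition 1: at least 3 country-like column names
enough <- length(hits) >= 3
# condition 2: the matching columns sit next to each other
adjacent <- all(diff(hits) == 1)

wide_cols <- if (enough && adjacent) col_names[hits] else character(0)
wide_cols
```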
This function is an interface for the REST Countries API. It allows users to request and download information about countries, such as currency, capital city, language spoken, flag, neighbouring countries, and much more. NOTE: Internet access is needed to download information from the API. At times the API may be unstable or slow to respond.
country_info( countries = NULL, fields = NULL, fuzzy_match = TRUE, match_info = FALSE, collapse = TRUE )
countries |
A vector of countries for which we wish to download information. The function also supports fuzzy matching capabilities to facilitate querying. Information is only returned for the 249 countries in the ISO standard |
fields |
Character vector indicating the fields to query. A description of the accepted fields can be found here. Alternatively, a list of accepted field names can be obtained with the function |
fuzzy_match |
Logical value indicating whether to allow fuzzy matching of country names. Default is |
match_info |
Logical value indicating whether to return information on country names matched to each input in |
collapse |
Logical value indicating whether to collapse multiple columns relating to the same field together. Default is |
Returns the requested information about the countries in a table. The rows of the table correspond to entries in countries, columns correspond to requested fields.
list_fields, check_countries_api
# Run examples only if a connection to the API is available:
if (check_countries_api(warnings = FALSE)){

  # The example below queries information on the capital city of Brazil:
  info <- country_info(countries = "Brazil", fields = "capital")

  # Data for multiple countries can be requested
  info <- country_info(countries = c("Brazil", "USA", "FR"), fields = "capital")

  # Data can be returned for all countries by leaving - countries - empty
  info <- country_info(fields = "capital")

  # All available fields can be requested by leaving fields empty
  info <- country_info(countries = c("Brazil", "USA", "FR"))

  # All information for all countries can be downloaded by leaving both arguments empty
  info <- country_info()

}
This function recognises and converts country names to different nomenclatures and languages using a fuzzy matching algorithm.
country_name() can identify countries even when they are provided in mixed formats or in different languages. It is robust to small misspellings and recognises many alternative country names and old nomenclatures.
country_name( x, to = "ISO3", fuzzy_match = TRUE, verbose = FALSE, simplify = TRUE, poor_matches = FALSE, na_fill = FALSE, custom_table = NULL )
x |
A vector of country names |
to |
A string containing the desired naming conventions to which |
fuzzy_match |
Logical value indicating whether fuzzy matching of country names should be allowed ( |
verbose |
Logical value indicating whether the function should print to the console a full report. Default is |
simplify |
Logical value. If set to |
poor_matches |
Logical value. If set to |
na_fill |
Logical value. If set to |
custom_table |
Custom conversion table to be used. This needs to be a |
Returns a vector of converted country names. If multiple nomenclatures are passed to the argument to, the vectors are arranged in a data frame. If simplify=FALSE, the function will return a list object.
is_country, match_table, find_countrycol
# Convert country names to a single nomenclature (e.g. 3-letter ISO code)
country_name(x=c("UK","Estados Unidos","Zaire","C#te d^ivoire"), to= "ISO3")

# When multiple arguments are provided to the - to - argument, a data frame is returned:
country_name(x=c("UK","Estados Unidos","Zaire","C#te d^ivoire"), to= c("UN_en","UN_fr","ISO3"))

# This function can also be used to translate country names (e.g. translating all to Chinese)
country_name(x=c("UK","Estados Unidos","Zaire","C#te d^ivoire"), to= "name_zh")
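To give a feel for what fuzzy matching against a reference table involves, here is a small base R sketch that uses Levenshtein edit distance via utils::adist(). The actual matching algorithm used by country_name() is not described in this manual, so this is an assumption-laden illustration rather than the package's method. The four-row reference table, the helper fuzzy_iso3 and the max_dist threshold are all hypothetical.

```r
# Tiny illustrative reference table (assumption); the package ships a much
# larger one in country_reference_list
reference <- data.frame(
  name = c("United Kingdom", "Estados Unidos", "Zaire", "Cote d'Ivoire"),
  ISO3 = c("GBR", "USA", "COD", "CIV"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ISO3 code of the closest reference name,
# or NA when the closest name is more than max_dist edits away
fuzzy_iso3 <- function(x, max_dist = 2) {
  # matrix of edit distances: one row per input, one column per reference name
  d <- utils::adist(tolower(x), tolower(reference$name))
  best <- apply(d, 1, which.min)
  out <- reference$ISO3[best]
  out[d[cbind(seq_along(x), best)] > max_dist] <- NA
  out
}

fuzzy_iso3(c("Estado Unidos", "Zaire", "bungalow"))
```

A distance cap is one simple way to avoid forcing a match on strings that are not country names; the poor_matches argument documented above suggests the package exposes a related choice, but the details are not reproduced here.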
A table containing country names in different naming conventions
country_reference_list
A data frame with columns corresponding to different country naming conventions.
Reference name for the geographic unit. The names in this column contain only ASCII characters. This nomenclature is available for all countries.
3-letter country codes as defined in ISO standard 3166-1 alpha-3. This nomenclature is available for the territories in the standard (currently 249 territories).
2-letter country codes as defined in ISO standard 3166-1 alpha-2. This nomenclature is available for the territories in the standard (currently 249 territories).
Numeric country codes as defined in ISO standard 3166-1 numeric. This country code is the same as the UN's country number (M49 standard). This nomenclature is available for the territories in the ISO standard (currently 249 countries).
Official UN name in Arabic. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official UN name in Chinese. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official UN name in English. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official UN name in French. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official UN name in Spanish. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official UN name in Russian. This nomenclature is only available for countries in the M49 standard (currently 249 territories).
Official WTO name in English. This nomenclature is only available for WTO members and observers (currently 189 entities).
Official WTO name in French. This nomenclature is only available for WTO members and observers (currently 189 entities).
Official WTO name in Spanish. This nomenclature is only available for WTO members and observers (currently 189 entities).
Translation of ISO country names in Arabic. (currently 249 territories)
Translation of ISO country names in Bulgarian. (currently 249 territories)
Translation of ISO country names in Czech. (currently 249 territories)
Translation of ISO country names in Danish. (currently 249 territories)
Translation of ISO country names in German. (currently 249 territories)
Translation of ISO country names in Greek. (currently 249 territories)
Translation of ISO country names in English. (currently 249 territories)
Translation of ISO country names in Spanish. (currently 249 territories)
Translation of ISO country names in Estonian. (currently 249 territories)
Translation of ISO country names in Basque. (currently 249 territories)
Translation of ISO country names in Finnish. (currently 249 territories)
Translation of ISO country names in French. (currently 249 territories)
Translation of ISO country names in Hungarian. (currently 249 territories)
Translation of ISO country names in Italian. (currently 249 territories)
Translation of ISO country names in Japanese. (currently 249 territories)
Translation of ISO country names in Korean. (currently 249 territories)
Translation of ISO country names in Lithuanian. (currently 249 territories)
Translation of ISO country names in Dutch. (currently 249 territories)
Translation of ISO country names in Norwegian. (currently 249 territories)
Translation of ISO country names in Polish. (currently 249 territories)
Translation of ISO country names in Portuguese. (currently 249 territories)
Translation of ISO country names in Romanian. (currently 249 territories)
Translation of ISO country names in Russian. (currently 249 territories)
Translation of ISO country names in Slovak. (currently 249 territories)
Translation of ISO country names in Swedish. (currently 249 territories)
Translation of ISO country names in Thai. (currently 249 territories)
Translation of ISO country names in Ukrainian. (currently 249 territories)
Translation of ISO country names in simplified Chinese. (currently 249 territories)
Translation of ISO country names in traditional Chinese. (currently 249 territories)
GTAP country and region codes.
Other variants of the country name included to improve the matching process (20 columns).
A table containing country names in different naming conventions
country_reference_list_long
A data frame with three columns providing information on country naming conventions. This table is a long-format version of "country_reference_list".
Numeric value that uniquely identifies entity. This corresponds to the row number in table "country_reference_list".
Country naming convention (e.g. UN English, ISO 3-digit code, etc.).
Country names
This function takes a data frame as argument and returns the column name (or index) of all columns containing country names.
It can be used to automate the search of country columns in data frames.
For the purpose of this function, a country is any of the 249 territories designated in the ISO standard 3166.
On large datasets a random sample is used for evaluating the columns.
find_countrycol( x, return_index = FALSE, allow_NA = TRUE, min_share = 0.8, sample_size = 1000 )
x |
A data frame object |
return_index |
A logical value indicating whether the function should return the index of country columns instead of the column names. Default is |
allow_NA |
Logical value indicating whether columns containing |
min_share |
A value between |
sample_size |
Either |
Returns a vector of column names (return_index=FALSE) or column indices (return_index=TRUE) of columns containing country names.
is_country, country_name, find_keycol, find_timecol
find_countrycol(x=data.frame(a=c("Brésil","Tonga","FRA"), b=c(1,2,3)))
This function takes a data frame as argument and returns the column names (or indices) of a set of columns that uniquely identify the table entries (i.e. table key). It can be used to automate the search of table keys. Since the function was designed for country data, it will first search for columns containing country names and dates/years. These columns will be given priority in the search for keys. Next, the function prioritises left-most columns in the table. For time efficiency, the function does not test all possible combinations of columns; it only tests the most likely combinations. The function will look for the most common country data formats (e.g. cross-sectional, time-series, panel data, dyadic, etc.) and searches for up to 2 additional key columns beyond country and time columns.
find_keycol( x, return_index = FALSE, search_only = NA, sample_size = 1000, allow_NA = FALSE )
x |
A data frame object |
return_index |
A logical value indicating whether the function should return the index of country columns instead of the column names. Default is |
search_only |
This parameter can be used to restrict the search of table keys to a subset of columns. The default is |
sample_size |
Either |
allow_NA |
Logical value indicating whether to allow key columns to have |
Returns a vector of column names (or indices) that uniquely identify the entries in the table. If no key is found, the function will return NULL. The output is a named vector indicating whether the identified key columns contain country names ("country"), year and dates ("time"), or other type of information ("other").
find_timecol, find_countrycol, is_keycol
example <- data.frame(nation=rep(c("FRA","ALB","JOR"),3),
                      year=c(rep(2000,3),rep(2005,3),rep(2010,3)),
                      var=runif(9))
find_keycol(x=example)
This function takes a data frame as argument and returns the column names (or indices) of all columns containing dates and the most likely column containing year information, if any. It can be used to automate the search of date and year columns in data frames.
find_timecol(x, return_index = FALSE, allow_NA = TRUE, sample_size = 1000)
x |
A data frame object |
return_index |
A logical value indicating whether the function should return the index of time columns instead of the column names. Default is |
allow_NA |
Logical value indicating whether to allow time columns to contain |
sample_size |
Either |
Returns a vector of names (return_index=FALSE) or indices (return_index=TRUE) of columns containing date or year information. Only the most likely year column is returned.
find_timecol(x=data.frame(a=1970:2020, year=1970:2020, b=rep("2020-01-01",51),c=sample(1:1000,51)))
This function checks whether the string is a country name. It supports different languages and naming conventions.
The function returns TRUE if the string relates to one of the 249 countries currently in the ISO standard 3166.
Alternatively, the argument check_for allows narrowing down the test to a subset of countries.
Fuzzy matching can be used to allow a small margin of error in the string.
is_country(x, check_for = NULL, fuzzy_match = FALSE)
x |
A character vector to be tested (also supports UN/ISO country codes) |
check_for |
A vector of country names to narrow down testing. The function will return |
fuzzy_match |
A logical value indicating whether to tolerate small discrepancies in the country name matching. The default and fastest option is |
Returns a logical vector indicating whether the string is a country name
match_table, country_name, find_countrycol
# Detect strings that are country names
is_country(x=c("ITA","Estados Unidos","Estado Unidos","bungalow","dog",542), fuzzy_match=FALSE)
is_country(x=c("ITA","Estados Unidos","Estado Unidos","bungalow","dog",542), fuzzy_match=TRUE)

# Checking for a specific subset of countries
is_country(x=c("Ceylon","LKA","Indonesia","Inde"), check_for=c("India","Sri Lanka"))
This function checks if a value is a date by attempting to convert it to a date format. The user can specify which date formats should be tested with the argument formats.
is_date( x, formats = c("%Y-%m-%d", "%y-%m-%d", "%m-%d-%Y", "%m-%d-%y", "%d-%m-%Y", "%d-%m-%y", "%Y/%m/%d", "%y/%m/%d", "%m/%d/%Y", "%m/%d/%y", "%d/%m/%Y", "%d/%m/%y", "%Y.%m.%d", "%y.%m.%d", "%m.%d.%Y", "%m.%d.%y", "%d.%m.%Y", "%d.%m.%y", "%d %b %Y", "%d %B %Y", "%b %d %Y", "%B %d %Y", "%b %d, %Y", "%B %d, %Y", "%d%b%Y", "%d%B%Y", "%Y%B%d", "%Y%b%d", "%b %Y", "%B %Y", "%b %y", "%B %y", "%m-%Y", "%Y-%m", "%m/%Y", "%Y/%m") )
x |
A vector of values to be tested |
formats |
Date formats to be checked for (expressed in R date notation). |
Returns a logical vector indicating whether the values can be converted to any of the date formats provided. Notice that unless specified, the default allowed formats do not include simple year numbers (e.g. 2022 or 1993) because number vectors could wrongly be identified as dates. Also, notice that testing NA values will return FALSE.
find_timecol, find_keycol, is_country
is_date(c("2020-01-01","test",2020,"March 2030"))
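The approach described above, attempting a conversion with each candidate format and reporting whether any succeeds, can be sketched in base R with as.Date(). This is an illustrative sketch and not the package's code: the helper could_be_date and its three default formats are hypothetical, and only formats that include a day are used because base R's as.Date() returns NA for month-and-year strings.

```r
# Hypothetical helper: TRUE when a value parses under at least one format
could_be_date <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")) {
  x <- as.character(x)
  vapply(x, function(value) {
    # missing values are reported as FALSE, as documented for is_date()
    if (is.na(value)) return(FALSE)
    # try every format on the same value; failed conversions yield NA
    parsed <- as.Date(rep(value, length(formats)), format = formats)
    any(!is.na(parsed))
  }, logical(1), USE.NAMES = FALSE)
}

could_be_date(c("2020-01-01", "test", "2020", "01.02.2019", NA))
```

A bare year such as "2020" fails every format in the sketch, which mirrors the note above about plain year numbers not being treated as dates by default.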
This function takes a data frame and a vector of column names as argument and returns a logical value indicating whether the indicated columns uniquely identify entries in the data frame.
If the output is TRUE, the indicated columns could be the keys of the table.
is_keycol(x, cols, allow_NA = FALSE, verbose = TRUE)
x |
A data frame object |
cols |
A vector of column names or indices to be tested. |
allow_NA |
Logical value indicating whether to allow key columns to have |
verbose |
Logical value indicating whether messages should be printed on the console. Default is |
Returns a logical value. If TRUE, the columns indicated in cols uniquely identify the entries in x.
find_keycol, find_countrycol, find_timecol
is_keycol(data.frame(a=1:10,b=sample(c("a","b","c"),10, replace=TRUE)), cols="a")
is_keycol(data.frame(a=1:10,b=sample(c("a","b","c"),10, replace=TRUE)), cols="b")
is_keycol(
  data.frame(a=c(1:5,1:5),
             b=sample(c("a","b","c"),10, replace=TRUE),
             c=c(rep("a",5),rep("b",5))),
  cols=c("a","c"))
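The uniqueness test behind a table key can be expressed compactly in base R. The sketch below is a minimal illustration, not the package's implementation: the helper is_key is hypothetical, and rejecting keys that contain missing values is an assumption made here to echo the allow_NA = FALSE default.

```r
# Hypothetical helper: do the given columns uniquely identify every row?
is_key <- function(x, cols) {
  key <- x[, cols, drop = FALSE]
  # assumption: a key may not contain NA; anyDuplicated() returns 0 when
  # no combination of values is repeated
  !anyNA(key) && anyDuplicated(key) == 0
}

example <- data.frame(nation = rep(c("FRA", "ALB", "JOR"), 3),
                      year = rep(c(2000, 2005, 2010), each = 3),
                      var = 1:9)

is_key(example, "nation")            # each country appears three times
is_key(example, c("nation", "year")) # each country-year pair appears once
```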
This function returns a vector of country names in different nomenclatures.
list_countries(nomenclature = "name_en")
nomenclature |
String indicating the nomenclature from which the list of countries should be taken. Not all countries are present in all nomenclatures, for example Taiwan is not recognised by the UN, so it will not be returned with |
A vector of country names in the desired nomenclature.
random_countries, country_name
list_countries("ISO3")
list_countries("UN_en")
list_countries()
This function queries the REST Countries API and returns a list of all possible fields that can be used in the function country_info.
NOTE: Internet access is needed to download information from the API.
list_fields()
A vector of accepted fields for the function country_info()
# Run example only if a connection to the API is available
if (check_countries_api(warnings = FALSE)){
  list_fields()
}
This function returns a conversion table for country names to the desired naming conventions and languages. The use of fuzzy matching allows more flexibility in recognising and identifying country names.
match_table( x, to = c("simple", "ISO3"), fuzzy_match = TRUE, verbose = FALSE, matching_info = FALSE, simplify = TRUE, na_fill = FALSE, poor_matches = TRUE, custom_table = NULL )
x |
A vector of country names |
to |
A vector containing one or more desired naming conventions to which |
fuzzy_match |
Logical value indicating whether fuzzy matching of country names should be allowed ( |
verbose |
Logical value indicating whether the function should print to the console a report on the matching process. Default is |
matching_info |
Logical value. If set to true the output match table will include additional information on the matching of |
simplify |
Logical value. If set to |
na_fill |
Logical value. If set to |
poor_matches |
Logical value. If set to |
custom_table |
Custom conversion table to be used. This needs to be a data.frame object. Default is |
Returns a conversion table from country names to the desired naming conventions. If simplify=FALSE, it returns a list object.
match_table(x=c("UK","Estados Unidos","Zaire","C#te d^ivoire"), to= c("UN_en","ISO3"))
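To give a feel for what fuzzy matching contributes, here is a small base R sketch that picks the closest reference name by edit distance using `utils::adist()`. The `reference` table and the misspelled inputs are made up for illustration, and the nearest-by-edit-distance rule is an assumption: it is not necessarily the algorithm `match_table()` uses internally.

```r
# Hypothetical reference table (illustration only, not the package's data)
reference <- data.frame(
  name = c("United Kingdom", "United States", "Germany", "Norway"),
  ISO3 = c("GBR", "USA", "DEU", "NOR"),
  stringsAsFactors = FALSE
)

# Inputs with typos and inconsistent capitalisation
x <- c("Untied Kingdom", "Germny", "norway")

# Matrix of edit distances: one row per input, one column per reference name
d <- utils::adist(tolower(x), tolower(reference$name))

# For each input, take the reference entry with the smallest distance
best <- apply(d, 1, which.min)

conversion <- data.frame(
  input  = x,
  simple = reference$name[best],
  ISO3   = reference$ISO3[best],
  stringsAsFactors = FALSE
)
```

A production matcher would also need a distance threshold so that strings unrelated to any country are not forced onto the nearest name.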
This function returns the mode of vectors. That is to say, for any given vector of values, it returns the value that appears most frequently.
The function works with strings, numerical and mixed inputs. NA values are treated as distinct values.
Mode(x, na.rm = FALSE, first_only = FALSE)
x |
A vector |
na.rm |
Logical value indicating whether |
first_only |
Logical value indicating whether only the first mode should be returned if |
Returns the mode of the vector x
countries::Mode(c("a","a",2,3))
countries::Mode(c(1,1,2,3,NA,2))
countries::Mode(c(NA,NA,NA,1,1,2))
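The behaviour described above (most frequent value, NA counted as a value of its own, optional removal of NA, optional first mode only) can be sketched in base R as follows. `mode_sketch()` is a hypothetical helper written for illustration; it is not the package's source, and the order in which tied modes are returned is an assumption.

```r
# Hypothetical helper (not the package's implementation of Mode())
mode_sketch <- function(x, na.rm = FALSE, first_only = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(x)
  ux <- unique(x)                       # distinct values; NA is kept as one of them
  counts <- tabulate(match(x, ux), nbins = length(ux))  # match() pairs NA with NA
  modes <- ux[counts == max(counts)]    # all values tied for the highest count
  if (first_only) modes[1] else modes
}

m1 <- mode_sketch(c("a", "a", 2, 3))                       # "a"
m2 <- mode_sketch(c(1, 1, 2, 3, NA, 2))                    # 1 and 2 are tied
m3 <- mode_sketch(c(NA, NA, NA, 1, 1, 2))                  # NA is the most frequent
m4 <- mode_sketch(c(NA, NA, NA, 1, 1, 2), na.rm = TRUE)    # 1 once NA is dropped
```

Note that `c("a", "a", 2, 3)` is coerced to a character vector by `c()`, which is why mixed inputs work without special handling.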
This function provides access to the discrete colour palettes used in this package's 11 themes.
palettes_countries(n, theme = 1, reverse = FALSE)
n |
Number of desired colours |
theme |
A numeric value or name identifying the theme's colours. Can be a number between 1 and 11, or one of the theme's names: |
reverse |
Logical value indicating whether to reverse the order of the palette's colours. Default is FALSE. |
Returns n colours from the requested theme
palettes_countries(5, theme = 1)
quick_map() plots country choropleth maps with one line of code.
The only inputs required are a data.frame object and the name of the column to plot.
The function uses country_name()'s capabilities to automatically match country names to one of the territories in the ISO standard 3166-1. This allows fuzzy matching of country names in multiple languages and nomenclatures.
For some map examples, see this article.
quick_map( data, plot_col, theme = 1, zoom = "Default", verbose = FALSE, save_to = NULL, width_plot = 30, name_legend = NULL, reverse_palette = FALSE, col_breaks = NULL, col_border = "black", col_na = "grey97", width_border = 0.1 )
data |
Table (data.frame) containing the data to plot. Each row in the table should correspond to a country. One of the columns should contain country names. |
plot_col |
Name of the column to plot. |
theme |
A numeric value or name identifying one of the predefined visual themes for the map. Can be a number between 1 and 11, or one of the predefined theme's names: |
zoom |
This argument defines the zoom applied to the map. It can be either a string identifying one of the predefined zoom boxes ( |
verbose |
Logical value indicating whether to print messages to the console. Default is |
save_to |
Path to the file where the plot is to be saved. This needs to be in an existing directory. The default is |
width_plot |
Width (in cm) when plot is saved to a file. The ratio between height and width is fixed. This argument is only relevant if |
name_legend |
String giving the name to be used for the plotted variable in the legend of the map. If nothing is provided, the default is to use the name in |
reverse_palette |
Logical value indicating whether to reverse the order of the colours in the palette. Default is |
col_breaks |
Only relevant for numeric data. This argument allows the user to provide manual breaks for the colour scale. Needs to be a numeric vector ( |
col_border |
Colour of border line separating countries and landmasses. Default is |
col_na |
Colour for countries with missing data (NAs). Default is |
width_border |
Numeric value giving the width of the border lines between countries. Default is '0.1'. |
Good to know
quick_map() only allows plotting of territories in the ISO standard 3166-1. It does not support plotting of other regions.
The output of the function is a ggplot object. This means that users can then customise the look of the output by applying any of ggplot's methods.
Disclaimer
Territories' borders and shapes are intended for illustrative purposes only. They might be outdated and do not imply the expression of any opinion on the part of the package developers.
ggplot object
# creating some sample data to plot
example_data <- data.frame(country = random_countries(100), population = runif(100))

# make a map
quick_map(example_data, "population")

# The function provides several predefined themes
quick_map(example_data, "population", theme = 3)
quick_map(example_data, "population", theme = "Reds")

# provide breaks for the colour scale
quick_map(example_data, "population", col_breaks = c(0, 1e5, 1e6, 1e7, 1e8, 1e9))
This function returns a vector of n (pseudo)random country names drawn from the chosen nomenclature.
random_countries(n, replace = FALSE, nomenclature = "name_en", seed = NULL)
n |
Number of desired (pseudo)random country names. |
replace |
Logical value indicating whether sampling should be with replacement. |
nomenclature |
Nomenclature from which the list of countries should be taken. Not all countries are present in all nomenclatures; for example, Taiwan is not recognised by the UN, so it will not be returned with |
seed |
Single numerical value to be used as seed. |
A vector of n (pseudo)random country names.
random_countries(10)
random_countries(n = 500, replace = TRUE)
random_countries(n = 5, nomenclature = "ISO3", seed = 5)
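The roles of `seed` and `replace` can be illustrated with base R's `set.seed()` and `sample()`. This is only a sketch of the sampling idea: the `pool` vector is a made-up stand-in for a country list, and nothing here is taken from the package's source.

```r
# Hypothetical pool of names standing in for a full country list
pool <- c("Japan", "Norway", "Germany", "Brazil", "Kenya", "Canada")

# Fixing the seed makes a draw reproducible
set.seed(5)
draw1 <- sample(pool, 3)
set.seed(5)
draw2 <- sample(pool, 3)
same_draw <- identical(draw1, draw2)  # same seed, same sample

# Asking for more names than the pool holds only works with replacement
with_repl <- sample(pool, 10, replace = TRUE)
```

Sampling without replacement never repeats a name, which is why a request larger than the list (such as `n = 500` above) has to set `replace = TRUE`.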
These functions return the position (index) of all the minimum, maximum, and mode values of the vector x. which_min() and which_max() only support numeric and logical vectors.
which_min() and which_max() behave like which.min() and which.max(), except that ALL minima/maxima are returned instead of only the first one.
which_min(x, first_only = FALSE)
which_max(x, first_only = FALSE)
which_mode(x, first_only = FALSE)
x |
A vector (numeric or logical for which_min() and which_max()) |
first_only |
Logical value indicating whether only the first value should be returned (i.e. if |
Returns the position of the minimum, maximum and mode values of a vector x
which_mode(c("a","a",2,3))
which_min(c(1,1,2,3,NA,2))
which_max(c(NA,NA,NA,1,1,2))
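The difference from base R's `which.min()` and `which.max()` can be shown with a short sketch: the base functions stop at the first extreme value, whereas comparing against `min()` or `max()` inside `which()` returns every position. Ignoring NA values here is an assumption made for the sketch, not a statement about how the package handles them.

```r
x <- c(3, 1, 2, 1, NA, 3)

# Base R returns only the first position of the minimum
first_min <- which.min(x)                      # 2

# All positions holding the minimum / maximum (NA ignored)
all_min <- which(x == min(x, na.rm = TRUE))    # 2 4
all_max <- which(x == max(x, na.rm = TRUE))    # 1 6
```

`which()` silently drops the NA produced by comparing the missing element, so only genuine matches are reported.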
A table containing points to draw a world map. The data comes from the package maps ("world"). An additional column is added with ISO 3-letter country codes.
world
A data frame with six columns providing information to plot world maps.
Longitude
Latitude
Numeric value used to identify polygons
Order in which lines should be traced
Name of the polygon's geographic region
3-letter ISO country code for the region