amerifluxr is a programmatic interface to the AmeriFlux. This vignette demonstrates examples to import data and metadata downloaded from AmeriFlux, and to parse and clean data for further use. A companion vignette for site selection is available as well.
AmeriFlux data and metadata can be downloaded using amf_download_base() and amf_download_bif(). Users will need to create a personal AmeriFlux account here before download.
The following downloads AmeriFlux flux/met data (aka BASE data product) from a single site: US-CRT.
## When running, replace user_id and user_email with a real AmeriFlux account
floc2 <- amf_download_base(
user_id = "my_user",
user_email = "my_email@mail.com",
site_id = "US-CRT",
data_product = "BASE-BADM",
data_policy = "CCBY4.0",
agree_policy = TRUE,
intended_use = "other",
intended_use_text = "amerifluxr package demonstration",
verbose = TRUE,
out_dir = tempdir()
)
The downloaded file is a zipped file saved in tempdir() (e.g., AMF_{SITE_ID}_BASE-BADM_{VERSION}.zip), which contains a BASE data file (e.g., AMF_{SITE_ID}_BASE_{RESOLUTION}_{VERSION}.csv, RESOLUTION = HH (half-hourly) or HR (hourly)) and a metadata file (aka BADM data product, e.g., AMF_{SITE_ID}_BIF_{VERSION}.xlsx). The amf_download_base() also returns the file path to the downloaded file, which can be used later to read the file into R.
The following downloads a single file containing all AmeriFlux sites’ metadata (i.e., BADM data product) for sites under the CC-BY-4.0 data use policy.
## When running, replace user_id and user_email with a real AmeriFlux account
floc1 <- amf_download_bif(
user_id = "my_user",
user_email = "my_email@mail.com",
data_policy = "CCBY4.0",
agree_policy = TRUE,
intended_use = "other",
intended_use_text = "amerifluxr package demonstration",
out_dir = tempdir(),
verbose = TRUE,
site_w_data = TRUE
)
The downloaded file is a Excel file saved to tempdir() (e.g., AMF_{SITES}_BIF_{POLICY}_{VERSION}.xlsx, SITES = AA-Net (all registered sites) or AA-Flx (all sites with flux/met data available); POLICY = CCBY4 (shared under AmeriFlux CC-BY-4.0 data use policy) or LEGACY (shared under AmeriFlux Legacy data use policy)). Similarly, the amf_download_bif() also returns the file path to the downloaded file, which can be used later to read the file into R.
For this vignette, we will use following example data files [files are truncated to limit package size, for demonstration purposes only].
# An example of BASE zipped files downloaded for US-CRT site
floc2 <- system.file("extdata", "AMF_US-CRT_BASE-BADM_2-5.zip", package = "amerifluxr")
# An example of unzipped BASE files from the above zipped file
floc3 <- system.file("extdata", "AMF_US-CRT_BASE_HH_2-5.csv", package = "amerifluxr")
# An example of all sites' BADM data
floc1 <- system.file("extdata", "AMF_AA-Flx_BIF_CCBY4_20201218.xlsx", package = "amerifluxr")
The amd_read_base() imports a BASE file, either from a zipped file or an unzipped comma-separated file (.csv). The parse_timestamp parameter can be used if additional time-keeping columns (e.g., year, month, day, hour) are desired.
# read the BASE from a zip file, without additional parsed time-keeping columns
base1 <- amf_read_base(
file = floc2,
unzip = TRUE,
parse_timestamp = FALSE
)
pander::pandoc.table(base1[c(1:3),])
TIMESTAMP_START | TIMESTAMP_END | CO2 | H2O | FC | NEE_PI | CH4 | FCH4 | H |
---|---|---|---|---|---|---|---|---|
2.011e+11 | 2.011e+11 | NA | NA | NA | NA | NA | NA | NA |
2.011e+11 | 2.011e+11 | NA | NA | NA | NA | NA | NA | NA |
2.011e+11 | 2.011e+11 | NA | NA | NA | NA | NA | NA | NA |
LE | G_1_1_1 | G_2_1_1 | WD | WS | USTAR | ZL | MO_LENGTH | W_SIGMA | V_SIGMA |
---|---|---|---|---|---|---|---|---|---|
NA | 27.45 | 37.32 | NA | NA | NA | NA | NA | NA | NA |
NA | 26.92 | 34.11 | NA | NA | NA | NA | NA | NA | NA |
NA | 33.71 | 41.3 | NA | NA | NA | NA | NA | NA | NA |
U_SIGMA | T_SONIC | T_SONIC_SIGMA | PA | RH | TA | TS_1_1_1 | TS_2_1_1 |
---|---|---|---|---|---|---|---|
NA | NA | NA | NA | 92.34 | 11.18 | 3.468 | 3.131 |
NA | NA | NA | NA | 87.63 | 11.69 | 3.606 | 3.155 |
NA | NA | NA | NA | 83.31 | 12.37 | 3.727 | 3.277 |
WTD | SWC | NETRAD | PPFD_IN | SW_IN | SW_OUT | LW_IN | LW_OUT | P |
---|---|---|---|---|---|---|---|---|
-0.9133 | 45.13 | 7.064 | 0 | 0 | 0 | 368.5 | 360.6 | 0 |
-0.8939 | 45.05 | 2.375 | 0 | 0 | 0 | 365.8 | 361.4 | 0.254 |
-0.8885 | 45 | 2.923 | 0 | 0 | 2.812 | 369.7 | 365.2 | 0 |
# read the BASE from a csv file, with additional parsed time-keeping columns
base2 <- amf_read_base(
file = floc3,
unzip = FALSE,
parse_timestamp = TRUE
)
pander::pandoc.table(base2[c(1:3), c(1:10)])
YEAR | MONTH | DAY | DOY | HOUR | MINUTE | TIMESTAMP |
---|---|---|---|---|---|---|
2011 | 1 | 1 | 1 | 0 | 15 | 2011-01-01 00:15:00 |
2011 | 1 | 1 | 1 | 0 | 45 | 2011-01-01 00:45:00 |
2011 | 1 | 1 | 1 | 1 | 15 | 2011-01-01 01:15:00 |
TIMESTAMP_START | TIMESTAMP_END | CO2 |
---|---|---|
2.011e+11 | 2.011e+11 | NA |
2.011e+11 | 2.011e+11 | NA |
2.011e+11 | 2.011e+11 | NA |
The details of the BASE data product’s format and variable definitions can be found on AmeriFlux website. In short, the BASE data product contains flux, meteorological, and soil observations that are reported at regular intervals of time, generally half-hourly or hourly, for a certain time period. TIMESTAMP_START and TIMESTAMP_END columns (i.e., YYYYMMDDHHMM 12 digits) denote the starting and ending time of each reporting interval (i.e., row).
All other variables use the format of {base name}_{qualifier}, e.g., FC_1, CO2_1_1_1. Base names indicate fundamental quantities that are either measured or calculated / derived. Qualifiers are suffixes appended to variable base names that provide additional information (e.g., gap-filling, position) about the variable. In some cases, qualifiers are omitted if only one variable is provided for a site.
The amf_variable() retrieves the latest list of base names and default units. For sites that have relatively fewer variables and less complicated qualifiers, the users could easily interpret variables and qualifiers. The amf_variable() also returns the expected maximal and minimal values based on physically plausible ranges or network reported values.
# get a list of latest base names and units.
FP_ls <- amf_variables()
pander::pandoc.table(FP_ls[c(11:20), ])
Name | Description | Units | Min | |
---|---|---|---|---|
8 | DBH | Diameter of tree measured at breast height (1.3m) with continuous dendrometers | cm | 0 |
9 | LEAF_WET | Leaf wetness, range 0-100 | % | 0 |
10 | SAP_DT | Difference of probes temperature for sapflow measurements | deg C | -10 |
11 | SAP_FLOW | Sap flow | mmolH2O m-2 s-1 | NA |
12 | T_BOLE | Bole temperature | deg C | -50 |
13 | T_CANOPY | Temperature of the canopy and/or surface underneath the sensor | deg C | -50 |
14 | FETCH_70 | Distance at which cross-wind integrated footprint cumulative probability is 70% | m | 0 |
15 | FETCH_80 | Distance at which cross-wind integrated footprint cumulative probability is 80% | m | 0 |
16 | FETCH_90 | Distance at which cross-wind integrated footprint cumulative probability is 90% | m | 0 |
17 | FETCH_FILTER | Footprint quality flag (i.e., 0, 1): 0 and 1 indicate data measured when wind coming from direction that should be discarded and kept, respectively | nondimensional | 0 |
Max | |
---|---|
8 | 500 |
9 | 100 |
10 | 10 |
11 | NA |
12 | 70 |
13 | 70 |
14 | 10000 |
15 | 12000 |
16 | 15000 |
17 | 1 |
Alternatively, the amf_parse_basename() can programmatically parse the the variable names into base names and qualifiers. This function can be helpful for sites with many variables and relatively complicated qualifiers, as a prerequisite for handling data from many sites. The function returns a data frame with information about each variable’s base name, qualifier, and whether a variable is gap-filled, layer-aggregated, or replicate aggregated.
# parse the variable name
basename_decode <- amf_parse_basename(var_name = colnames(base1))
pander::pandoc.table(basename_decode[c(1, 2, 3, 4, 6, 11, 12),])
variable_name | basename | qualifier_gf | qualifier_pi | |
---|---|---|---|---|
1 | TIMESTAMP_START | TIMESTAMP_START | NA | NA |
2 | TIMESTAMP_END | TIMESTAMP_END | NA | NA |
3 | CO2 | CO2 | NA | NA |
4 | H2O | H2O | NA | NA |
6 | NEE_PI | NEE | NA | _PI |
11 | G_1_1_1 | G | NA | NA |
12 | G_2_1_1 | G | NA | NA |
qualifier_pos | qualifier_ag | layer_index | H_index | V_index | |
---|---|---|---|---|---|
1 | NA | NA | NA | NA | NA |
2 | NA | NA | NA | NA | NA |
3 | NA | NA | NA | NA | NA |
4 | NA | NA | NA | NA | NA |
6 | NA | NA | NA | NA | NA |
11 | _1_1_1 | NA | NA | 1 | 1 |
12 | _2_1_1 | NA | NA | 2 | 1 |
R_index | is_correct_basename | is_pi_provide | is_gapfill | is_fetch | |
---|---|---|---|---|---|
1 | NA | TRUE | FALSE | FALSE | FALSE |
2 | NA | TRUE | FALSE | FALSE | FALSE |
3 | NA | TRUE | FALSE | FALSE | FALSE |
4 | NA | TRUE | FALSE | FALSE | FALSE |
6 | NA | TRUE | TRUE | FALSE | FALSE |
11 | 1 | TRUE | FALSE | FALSE | FALSE |
12 | 1 | TRUE | FALSE | FALSE | FALSE |
is_layer_aggregated | is_layer_SD | is_layer_number | |
---|---|---|---|
1 | FALSE | FALSE | FALSE |
2 | FALSE | FALSE | FALSE |
3 | FALSE | FALSE | FALSE |
4 | FALSE | FALSE | FALSE |
6 | FALSE | FALSE | FALSE |
11 | FALSE | FALSE | FALSE |
12 | FALSE | FALSE | FALSE |
is_replicate_aggregated | is_replicate_SD | is_replicate_number | |
---|---|---|---|
1 | FALSE | FALSE | FALSE |
2 | FALSE | FALSE | FALSE |
3 | FALSE | FALSE | FALSE |
4 | FALSE | FALSE | FALSE |
6 | FALSE | FALSE | FALSE |
11 | FALSE | FALSE | FALSE |
12 | FALSE | FALSE | FALSE |
is_quadruplet | |
---|---|
1 | FALSE |
2 | FALSE |
3 | FALSE |
4 | FALSE |
6 | FALSE |
11 | TRUE |
12 | TRUE |
While BASE data products are quality-checked before release, the data may not be filtered for all outliers. The amf_filter_base() can be use to filter the data based on the expected physically ranges (i.e., obtained through amf_variables()). By default, a ±5% buffer is applied to account for possible edge values near the lower and upper bounds, which are commonly observed for certain variables like radiation, relative humidity, and snow depth.
# filter data, using default physical range +/- 5% buffer
base_f <- amf_filter_base(data_in = base1)
Measurement height information contains height/depth and instrument model information of the BASE data products. The info can be downloaded directly using the amf_var_info() function. The function returns a data frame for all available sites, and can be subset using the “Site_ID” column. The “Height” column refers to the distance from the ground surface in meters. Positive values are heights, and negative values are depths. See the web page for explanation.
# obtain the latest measurement height information
var_info <- amf_var_info()
# subset the variable by target Site ID
var_info <- var_info[var_info$Site_ID == "US-CRT", ]
pander::pandoc.table(var_info[c(1:10), ])
Site_ID | Variable | Start_Date | Height | Instrument_Model | |
---|---|---|---|---|---|
5150 | US-CRT | CH4 | NA | 2 | GA_OP-LI-COR LI-7700 |
5151 | US-CRT | CO2 | NA | 2 | GA_OP-LI-COR LI-7500 |
5152 | US-CRT | FC | NA | 2 | GA_OP-LI-COR LI-7500 |
5153 | US-CRT | FCH4 | NA | 2 | GA_OP-LI-COR LI-7700 |
5154 | US-CRT | G_1_1_1 | NA | -0.1 | SOIL_H-Plate |
5155 | US-CRT | G_2_1_1 | NA | -0.1 | SOIL_H-Plate |
5156 | US-CRT | H | NA | 2 | SA-Campbell CSAT-3 |
5157 | US-CRT | H2O | NA | 2 | GA_OP-LI-COR LI-7500 |
5158 | US-CRT | LE | NA | 2 | GA_OP-LI-COR LI-7500 |
5159 | US-CRT | LW_IN | NA | 2 | RAD-Pyrrad-SW+LW |
Instrument_Model2 | Comment | BASE_Version | |
---|---|---|---|
5150 | NA | NA | 5-5 |
5151 | NA | NA | 5-5 |
5152 | SA-Campbell CSAT-3 | NA | 5-5 |
5153 | SA-Campbell CSAT-3 | NA | 5-5 |
5154 | NA | NA | 5-5 |
5155 | NA | NA | 5-5 |
5156 | NA | NA | 5-5 |
5157 | NA | NA | 5-5 |
5158 | SA-Campbell CSAT-3 | NA | 5-5 |
5159 | NA | NA | 5-5 |
Biological, Ancillary, Disturbance, and Metadata (BADM) are non-continuous information that describe and complement continuous flux and meteorological data (e.g., BASE data product). BADM include general site description, metadata about the sensors and their setup, maintenance and disturbance events, and biological and ecological data that characterize a site’s ecosystem. See link for details.
The amf_read_bif() can be used to import the BADM data file. The function returns a data frame for all available sites, and can subset using the “SITE_ID” column.
# read the BADM BIF file, using an example data file
bif <- amf_read_bif(file = floc1)
# subset by target Site ID
bif <- bif[bif$SITE_ID == "US-CRT", ]
pander::pandoc.table(bif[c(1:15), ])
SITE_ID | GROUP_ID | VARIABLE_GROUP | VARIABLE |
---|---|---|---|
US-CRT | 12764 | GRP_ACKNOWLEDGEMENT | ACKNOWLEDGEMENT |
US-CRT | 12764 | GRP_ACKNOWLEDGEMENT | ACKNOWLEDGEMENT_COMMENT |
US-CRT | 12765 | GRP_CLIM_AVG | MAT |
US-CRT | 12765 | GRP_CLIM_AVG | MAP |
US-CRT | 12765 | GRP_CLIM_AVG | CLIMATE_KOEPPEN |
US-CRT | 27000537 | GRP_COUNTRY | COUNTRY |
US-CRT | 15683 | GRP_DOI | DOI |
US-CRT | 15683 | GRP_DOI | DOI_CITATION |
US-CRT | 15683 | GRP_DOI | DOI_DATAPRODUCT |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_DATAPRODUCT |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_NAME |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_ROLE |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_ORDINAL |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_EMAIL |
US-CRT | 88075 | GRP_DOI_CONTRIBUTOR | DOI_CONTRIBUTOR_INSTITUTION |
DATAVALUE |
---|
Supported by NOAA (NA10OAR4170224) & NSF (NSF1034791) |
Acknowledgement Walter Berger for fully support |
10.1 |
849 |
Dfa |
USA |
10.17190/AMF/1246156 |
Jiquan Chen, Housen Chu (2011-2013) AmeriFlux US-CRT Curtice Walter-Berger cropland, Dataset. https://doi.org/10.17190/AMF/1246156 |
AmeriFlux |
AmeriFlux |
Jiquan Chen |
Author |
1 |
jqchen@msu.edu |
University of Toledo / Michigan State University |
# get a list of all BADM variable groups and variables
unique(bif$VARIABLE_GROUP)
[1] “GRP_ACKNOWLEDGEMENT” “GRP_CLIM_AVG” “GRP_COUNTRY”
[4] “GRP_DOI” “GRP_DOI_CONTRIBUTOR” “GRP_DOI_ORGANIZATION” [7]
“GRP_DOM_DIST_MGMT” “GRP_FLUX_MEASUREMENTS” “GRP_HEADER”
[10] “GRP_HEIGHTC” “GRP_IGBP” “GRP_LAND_OWNERSHIP”
[13] “GRP_LOCATION” “GRP_NETWORK” “GRP_REFERENCE_PAPER”
[16] “GRP_SHIPPING_ADDRESS” “GRP_SITE_CHAR” “GRP_SITE_DESC”
[19] “GRP_SITE_FUNDING” “GRP_STATE” “GRP_TEAM_MEMBER”
[22] “GRP_TOWER_POWER” “GRP_TOWER_TYPE” “GRP_URL”
[25] “GRP_URL_AMERIFLUX” “GRP_UTC_OFFSET”
[1] 59
As shown above, BADM data contain information from a variety of variable groups (i.e., GRP_{BADM_GROUPS}). Browse the definitions of all available variable groups here.
To get the BADM data for a certain variable group, use amf_extract_badm() function. The function also renders the data format (i.e., display all variables by columns) for human readability.
# extract the FLUX_MEASUREMENTS group
bif_flux <- amf_extract_badm(bif_data = bif, select_group = "GRP_FLUX_MEASUREMENTS")
pander::pandoc.table(bif_flux)
GROUP_ID | SITE_ID | FLUX_MEASUREMENTS_METHOD | FLUX_MEASUREMENTS_VARIABLE |
---|---|---|---|
12767 | US-CRT | Eddy Covariance | CO2 |
12782 | US-CRT | Eddy Covariance | H2O |
12786 | US-CRT | Eddy Covariance | H |
12787 | US-CRT | Eddy Covariance | CH4 |
FLUX_MEASUREMENTS_DATE_START | FLUX_MEASUREMENTS_DATE_END |
---|---|
20110101 | 20131231 |
20110101 | 20131231 |
20110101 | 20131231 |
20110511 | 20120520 |
FLUX_MEASUREMENTS_OPERATIONS |
---|
Continuous operation |
Continuous operation |
Continuous operation |
Continuous operation |
# extract the HEIGHTC (canopy height) group
bif_hc <- amf_extract_badm(bif_data = bif, select_group = "GRP_HEIGHTC")
pander::pandoc.table(bif_hc)
GROUP_ID | SITE_ID | HEIGHTC | HEIGHTC_STATISTIC | HEIGHTC_DATE |
---|---|---|---|---|
88202 | US-CRT | 0.12 | Mean | 20121121 |
88203 | US-CRT | 1.125 | Mean | 20120928 |
88204 | US-CRT | 0.375 | Mean | 20120627 |
88205 | US-CRT | 0 | Mean | 20121002 |
88206 | US-CRT | 0.52 | Mean | 20130515 |
88207 | US-CRT | 0.15 | Mean | 20110705 |
88208 | US-CRT | 0 | Mean | 20121001 |
88209 | US-CRT | 0 | Mean | 20111024 |
88210 | US-CRT | 1.14 | Mean | 20130703 |
88211 | US-CRT | 0.35 | Mean | 20130716 |
88212 | US-CRT | 0.45 | Mean | 20110726 |
88213 | US-CRT | 0.975 | Mean | 20110819 |
88214 | US-CRT | 0 | Mean | 20120520 |
88215 | US-CRT | 0.975 | Mean | 20120820 |
88216 | US-CRT | 0.18 | Mean | 20121214 |
88217 | US-CRT | 0.18 | Mean | 20130104 |
88218 | US-CRT | 0.675 | Mean | 20110810 |
88219 | US-CRT | 0.72 | Mean | 20130605 |
88220 | US-CRT | 1.125 | Mean | 20120912 |
88221 | US-CRT | 0.3 | Mean | 20130422 |
88222 | US-CRT | 0.075 | Mean | 20110617 |
88223 | US-CRT | 0.18 | Mean | 20130212 |
88224 | US-CRT | 0.18 | Mean | 20130319 |
88225 | US-CRT | 0.225 | Mean | 20120612 |
88226 | US-CRT | 1 | Mean | 20130620 |
88227 | US-CRT | 1.125 | Mean | 20111007 |
88228 | US-CRT | 0.3 | Mean | 20130816 |
88229 | US-CRT | 0.35 | Mean | 20130725 |
88230 | US-CRT | 0.525 | Mean | 20120720 |
88231 | US-CRT | 0.18 | Mean | 20130124 |
88232 | US-CRT | 0 | Mean | 20110611 |
88233 | US-CRT | 0.06 | Mean | 20121022 |
Note: amf_extract_badm() returns all columns in characters. Certain groups of BADM variables contain columns of time stamps (i.e., ISO format) and data values, and need to be converted before further use.
# convert HEIGHTC_DATE to POSIXlt
bif_hc$TIMESTAMP <- strptime(bif_hc$HEIGHTC_DATE, format = "%Y%m%d", tz = "GMT")
# convert HEIGHTC column to numeric
bif_hc$HEIGHTC <- as.numeric(bif_hc$HEIGHTC)
# plot time series of canopy height
plot(bif_hc$TIMESTAMP, bif_hc$HEIGHTC, xlab = "TIMESTAMP", ylab = "canopy height (m)")
Last, the contacts of the site members and data DOI can be obtained from the BADM data. The AmeriFlux data policy requires proper attribution (e.g., data DOI).
In some case, for example, using data shared under Legacy Data Policy for publication, data users are required to contact data contributors directly, so that they have the opportunity to contribute substantively and become a co-author.
# get a list of contacts
bif_contact <- amf_extract_badm(bif_data = bif, select_group = "GRP_TEAM_MEMBER")
pander::pandoc.table(bif_contact)
GROUP_ID | SITE_ID | TEAM_MEMBER_NAME | TEAM_MEMBER_ROLE |
---|---|---|---|
12777 | US-CRT | Housen Chu | FluxContact |
12784 | US-CRT | Jiquan Chen | PI |
TEAM_MEMBER_EMAIL | TEAM_MEMBER_INSTITUTION | TEAM_MEMBER_ADDRESS |
---|---|---|
chu.housen@gmail.com | University of Toledo / University of California, Berkeley | NA |
jqchen@msu.edu | University of Toledo / Michigan State University | 202 Manly Miles Bldg. 1405 South Harrison Road Michigan State University, East Lansing, MI 48823 |
# get data DOI
bif_doi <- amf_extract_badm(bif_data = bif, select_group = "GRP_DOI")
pander::pandoc.table(bif_doi)
GROUP_ID | SITE_ID | DOI |
---|---|---|
15683 | US-CRT | 10.17190/AMF/1246156 |
DOI_CITATION | DOI_DATAPRODUCT |
---|---|
Jiquan Chen, Housen Chu (2011-2013) AmeriFlux US-CRT Curtice Walter-Berger cropland, Dataset. https://doi.org/10.17190/AMF/1246156 | AmeriFlux |