amerifluxr is a programmatic interface to AmeriFlux. This vignette
demonstrates how to query a list of target sites based on the sites’
general information and the availability of metadata and data. A companion
vignette on Data import is also available.
Get a site list with general info
AmeriFlux data are organized by individual sites. Typically, data
query begins with site search and selection. A full list of AmeriFlux
sites with general info can be obtained using the amf_site_info()
function.
Convert the site list to a data.table for easier manipulation. Also
see link for variable definitions.
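Table continues below
| SITE_ID | SITE_NAME | COUNTRY | STATE | IGBP |
|---|---|---|---|---|
| AR-CCa | Carlos Casares agriculture | Argentina | Buenos Aires | CRO |
| AR-CCg | Carlos Casares grassland | Argentina | Buenos Aires | GRA |
| AR-Cel | CELPA Mar Chiquita BA | Argentina | Buenos Aires | WET |

Table continues below
| LOCATION_LAT | LOCATION_LONG | LOCATION_ELEV | CLIMATE_KOEPPEN | MAT | MAP |
|---|---|---|---|---|---|
| -35.62 | -61.32 | 83 | Cfa | 16.1 | 1060 |
| -35.92 | -61.19 | 84 | Cfa | 16.1 | 1060 |
| -37.7 | -57.42 | 1 | Cfb | 14 | 926 |

| DATA_POLICY | DATA_START | DATA_END |
|---|---|---|
| LEGACY | 2012 | 2020 |
| LEGACY | 2018 | 2020 |
| LEGACY | NA | NA |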
The site list provides a quick summary of all registered sites and
sites with available data.
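As a rough, offline illustration of the conversion step, the sketch below builds a three-row stand-in for the amf_site_info() output and converts it to a data.table. The rows are copied from the table above; reading -35.62, -35.92, and -37.7 as LOCATION_LAT values is an assumption based on the column order. In a live session, `sites` would come from amf_site_info() rather than from a hand-built data.frame.

```r
library(data.table)

# Stand-in for the data.frame returned by amf_site_info();
# values are copied from the site table in this vignette.
sites <- data.frame(
  SITE_ID = c("AR-CCa", "AR-CCg", "AR-Cel"),
  SITE_NAME = c("Carlos Casares agriculture",
                "Carlos Casares grassland",
                "CELPA Mar Chiquita BA"),
  IGBP = c("CRO", "GRA", "WET"),
  LOCATION_LAT = c(-35.62, -35.92, -37.7),  # assumed column mapping
  DATA_POLICY = "LEGACY",
  DATA_START = c(2012, 2018, NA),
  DATA_END = c(2020, 2020, NA)
)

# Convert to a data.table so the i/j/by syntax used in this vignette applies
sites_dt <- as.data.table(sites)

# Registered sites vs. sites with available data (non-missing DATA_START)
n_registered <- sites_dt[, .N]
n_with_data <- sites_dt[!is.na(DATA_START), .N]
print(c(registered = n_registered, with_data = n_with_data))
```

A site such as AR-Cel, which is registered but has no DATA_START, counts toward the first number only.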
It’s often important to understand the data use policy under which
the data are shared. In 2021, the AmeriFlux community moved to the
AmeriFlux CC-BY-4.0 License. Most site PIs now share their sites’ data
under the CC-BY-4.0 license. Data for some sites are shared under the
historical AmeriFlux data-sharing policy, now called the AmeriFlux
Legacy Data Policy.
Check link for
data use policy and attribution guidelines.
# total number of registered sites
pander::pandoc.table(sites_dt[, .N])
# total number of sites with available data
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N])
# get number of sites with available data, grouped by data use policy
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = .(DATA_POLICY)])
Further group sites based on IGBP.
# get a summary table of sites grouped by IGBP
pander::pandoc.table(sites_dt[, .N, by = "IGBP"])
| IGBP | N |
|---|---|
| CRO | 129 |
| GRA | 85 |
| WET | 114 |
| DNF | 1 |
| EBF | 12 |
| WSA | 12 |
| MF | 16 |
| ENF | 101 |
| DBF | 59 |
| OSH | 40 |
| WAT | 10 |
| CSH | 13 |
| URB | 15 |
| BSV | 7 |
| CVM | 11 |
| SAV | 8 |
| SNO | 1 |
# get a summary table of sites with available data, & grouped by IGBP
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = "IGBP"])
| IGBP | N |
|---|---|
| CRO | 85 |
| GRA | 67 |
| WET | 61 |
| DNF | 1 |
| WSA | 8 |
| EBF | 8 |
| ENF | 94 |
| DBF | 55 |
| MF | 12 |
| OSH | 32 |
| CSH | 11 |
| BSV | 5 |
| WAT | 8 |
| CVM | 8 |
| URB | 7 |
| SAV | 6 |
| SNO | 1 |
# get a summary table of sites with available data,
# & grouped by data use policy & IGBP
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = .(IGBP, DATA_POLICY)][order(IGBP)])
| IGBP | DATA_POLICY | N |
|---|---|---|
| BSV | CCBY4.0 | 3 |
| BSV | LEGACY | 2 |
| CRO | LEGACY | 15 |
| CRO | CCBY4.0 | 70 |
| CSH | LEGACY | 5 |
| CSH | CCBY4.0 | 6 |
| CVM | CCBY4.0 | 8 |
| DBF | CCBY4.0 | 49 |
| DBF | LEGACY | 6 |
| DNF | CCBY4.0 | 1 |
| EBF | CCBY4.0 | 6 |
| EBF | LEGACY | 2 |
| ENF | CCBY4.0 | 78 |
| ENF | LEGACY | 16 |
| GRA | LEGACY | 14 |
| GRA | CCBY4.0 | 53 |
| MF | CCBY4.0 | 8 |
| MF | LEGACY | 4 |
| OSH | CCBY4.0 | 25 |
| OSH | LEGACY | 7 |
| SAV | CCBY4.0 | 6 |
| SNO | CCBY4.0 | 1 |
| URB | CCBY4.0 | 5 |
| URB | LEGACY | 2 |
| WAT | CCBY4.0 | 8 |
| WET | CCBY4.0 | 44 |
| WET | LEGACY | 17 |
| WSA | CCBY4.0 | 6 |
| WSA | LEGACY | 2 |
Once the selection criteria are decided, users can query a target site
list based on them, e.g., IGBP, data availability, data policy, geolocation.
# get a list of cropland and grassland sites with available data,
# shared under the CC-BY-4.0 data policy,
# located between 30 and 50 degrees N in latitude;
# return a site list with site ID, name, and data starting/ending year
crop_ls <- sites_dt[IGBP %in% c("CRO", "GRA") &
                      !is.na(DATA_START) &
                      LOCATION_LAT > 30 &
                      LOCATION_LAT < 50 &
                      DATA_POLICY == "CCBY4.0",
                    .(SITE_ID, SITE_NAME, DATA_START, DATA_END)]
pander::pandoc.table(crop_ls[c(1:10), ])
| SITE_ID | SITE_NAME | DATA_START | DATA_END |
|---|---|---|---|
| CA-ER1 | Elora Research Station | 2015 | 2021 |
| US-A32 | ARM-SGP Medford hay pasture | 2015 | 2017 |
| US-A74 | ARM SGP milo field | 2015 | 2017 |
| US-AR1 | ARM USDA UNL OSU Woodward Switchgrass 1 | 2009 | 2012 |
| US-AR2 | ARM USDA UNL OSU Woodward Switchgrass 2 | 2009 | 2012 |
| US-ARb | ARM Southern Great Plains burn site- Lamont | 2005 | 2006 |
| US-ARc | ARM Southern Great Plains control site- Lamont | 2005 | 2006 |
| US-ARM | ARM Southern Great Plains site- Lamont | 2003 | 2023 |
| US-Bi1 | Bouldin Island Alfalfa | 2016 | 2023 |
| US-Bi2 | Bouldin Island corn | 2017 | 2023 |
In some cases, users may want to know if certain types of metadata
are available for the selected sites. The amf_list_metadata() function
provides a quick summary of metadata availability before actually
downloading the data and metadata.
By default, amf_list_metadata() returns a full site list with the
available entries (i.e., counts) for all BADM groups. Check AmeriFlux
webpage for definitions of all BADM groups.
Table continues below
AR-CCa | 1 | 1 | 1 | 1
AR-CCg | 1 | 1 | 1 | 1
AR-Cel | 1 | 1 | 1 | 0

1 | 2 | 1
1 | 2 | 2
0 | 0 | 1
The site_set parameter of amf_list_metadata() can be used to
subset the sites of interest.
metadata_aval_sub <- as.data.table(amf_list_metadata(site_set = crop_ls$SITE_ID))
# down-select cropland & grassland sites by the BADM group of interest,
# e.g., canopy height (GRP_HEIGHTC)
crop_ls2 <- metadata_aval_sub[GRP_HEIGHTC > 0, .(SITE_ID, GRP_HEIGHTC)][order(-GRP_HEIGHTC)]
pander::pandoc.table(crop_ls2[c(1:10), ])
| SITE_ID | GRP_HEIGHTC |
|---|---|
| US-Ne2 | 196 |
| US-Tw3 | 162 |
| US-Twt | 133 |
| US-Ne3 | 128 |
| US-Ne1 | 119 |
| US-Bi1 | 112 |
| US-Var | 105 |
| US-Snd | 70 |
| US-Bi2 | 54 |
| US-ARM | 45 |
Get data availability
Users can use amf_list_data() to query the availability of specific
variables in the data (i.e., flux/met data, the so-called BASE data
product). amf_list_data() provides a quick summary of variable
availability (per site/year) before downloading the data.
By default, amf_list_data() returns a full site list of variable
availability (data percentages per year) for all variables. The site_set
parameter of amf_list_data() can be used to subset the sites of
interest.
Table continues below
| SITE_ID | VARIABLE | BASENAME | GAP_FILLED | Y1990 | Y1991 | Y1992 | Y1993 |
|---|---|---|---|---|---|---|---|
| US-AR1 | CO2 | CO2 | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | FC | FC | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | G | G | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | H | H | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | H2O | H2O | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | LE | LE | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | LW_IN | LW_IN | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | LW_OUT | LW_OUT | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | NETRAD | NETRAD | FALSE | 0 | 0 | 0 | 0 |
| US-AR1 | P | P | FALSE | 0 | 0 | 0 | 0 |

Table continues below
| Y1994 | Y1995 | Y1996 | Y1997 | Y1998 | Y1999 | Y2000 | Y2001 | Y2002 | Y2003 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Table continues below
| Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0.5905 | 0.9866 | 0.9941 | 0.6654 |
| 0 | 0 | 0 | 0 | 0 | 0.6082 | 0.976 | 0.9886 | 0.6621 |
| 0 | 0 | 0 | 0 | 0 | 0.6421 | 0.9965 | 0.9999 | 0.9961 |
| 0 | 0 | 0 | 0 | 0 | 0.6123 | 0.9867 | 0.9938 | 0.6666 |
| 0 | 0 | 0 | 0 | 0 | 0.6092 | 0.971 | 0.9792 | 0.6633 |
| 0 | 0 | 0 | 0 | 0 | 0.6101 | 0.9816 | 0.9936 | 0.6647 |
| 0 | 0 | 0 | 0 | 0 | 0.6416 | 0.9965 | 0.9999 | 0.9961 |
| 0 | 0 | 0 | 0 | 0 | 0.6416 | 0.9965 | 0.9999 | 0.9961 |
| 0 | 0 | 0 | 0 | 0 | 0.5447 | 0.9964 | 0.9996 | 0.996 |
| 0 | 0 | 0 | 0 | 0 | 0.6422 | 0.9965 | 0.9999 | 0.9961 |

| Y2013 | Y2014 | Y2015 | Y2016 | Y2017 | Y2018 | Y2019 | Y2020 | Y2021 | Y2022 | Y2023 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
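The one-column-per-year layout shown above can be awkward to filter by year. As a minimal sketch, the wide table can be reshaped into one row per site, variable, and year with data.table's melt(). The toy table below is a stand-in for amf_list_data() output, not a call to it. Its values are copied from the US-AR1 rows above, assuming the four non-zero columns correspond to 2009-2012, the years for which US-AR1 reports data. The names `data_aval_toy`, `YEAR`, and `AVAILABILITY` are illustrative only.

```r
library(data.table)

# Stand-in for two rows of the variable availability table
# (US-AR1, assumed years 2009-2012 only).
data_aval_toy <- data.table(
  SITE_ID = "US-AR1",
  VARIABLE = c("CO2", "FC"),
  BASENAME = c("CO2", "FC"),
  GAP_FILLED = FALSE,
  Y2009 = c(0.5905, 0.6082),
  Y2010 = c(0.9866, 0.9760),
  Y2011 = c(0.9941, 0.9886),
  Y2012 = c(0.6654, 0.6621)
)

# Reshape the per-year columns into long form: one row per variable/year
data_long <- melt(
  data_aval_toy,
  id.vars = c("SITE_ID", "VARIABLE", "BASENAME", "GAP_FILLED"),
  variable.name = "YEAR",
  value.name = "AVAILABILITY"
)

# Turn the column label "Y2009" into the integer 2009
data_long[, YEAR := as.integer(sub("^Y", "", as.character(YEAR)))]

# Variable-years with more than 90% availability
high_aval <- data_long[AVAILABILITY > 0.9, .(SITE_ID, VARIABLE, YEAR)]
print(high_aval)
```

In the long form, year-based criteria become ordinary row filters instead of sums over named columns.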
The variable availability can be used to subset sites that have
certain variables in specific years. The BASENAME column indicates the
variable’s base name (i.e., ignoring position qualifier), and can be
used to get a coarse-level variable availability.
See AmeriFlux
website for definitions of base names and qualifiers.
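To make the relation between a variable name and its base name concrete, the sketch below splits a few names into a base name and a trailing position qualifier, assuming the AmeriFlux _H_V_R convention (horizontal position, vertical position, replicate). It is illustrative only: the availability table already provides BASENAME, the helper `get_index()` is hypothetical, and the pattern is deliberately simplified and ignores other kinds of qualifiers.

```r
# Variable names as they appear in the tables of this vignette
vars <- c("WS_1_1_1", "USTAR_1_2_1", "WS_1_3_1", "LW_IN")

# Simplified pattern: base name followed by three integer indices _H_V_R
pat <- "^(.*)_([0-9]+)_([0-9]+)_([0-9]+)$"
has_qual <- grepl(pat, vars)

# Hypothetical helper: extract capture group k as an integer,
# leaving NA where the name carries no position qualifier
get_index <- function(k) {
  out <- rep(NA_integer_, length(vars))
  out[has_qual] <- as.integer(sub(pat, paste0("\\", k), vars[has_qual]))
  out
}

parsed <- data.frame(
  VARIABLE = vars,
  BASENAME = ifelse(has_qual, sub(pat, "\\1", vars), vars),
  H = get_index(2),
  V = get_index(3),
  R = get_index(4),
  stringsAsFactors = FALSE
)
print(parsed)

# Coarse-level view: number of variables per base name
print(table(parsed$BASENAME))
```

Grouping by the base name collapses the two WS variables into one entry, which is the coarse-level view described above; keeping the full variable name preserves the individual measurements.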
# down-select cropland & grassland sites based on the available wind speed (WS) and
# friction velocity (USTAR) data in 2015-2018, regardless of their qualifiers
data_aval_sub <- data_aval[data_aval$BASENAME %in% c("WS", "USTAR"),
                           .(SITE_ID, BASENAME, Y2015, Y2016, Y2017, Y2018)]
# calculate mean availability of WS and USTAR for each site and each year
data_aval_sub <- data_aval_sub[, lapply(.SD, mean),
                               by = .(SITE_ID),
                               .SDcols = c("Y2015", "Y2016", "Y2017", "Y2018")]
# sub-select sites whose WS and USTAR availability, averaged over
# 2015-2018, exceeds 75%
crop_ls3 <- data_aval_sub[(Y2015 + Y2016 + Y2017 + Y2018) / 4 > 0.75]
pander::pandoc.table(crop_ls3)
| SITE_ID | Y2015 | Y2016 | Y2017 | Y2018 |
|---|---|---|---|---|
| US-ARM | 0.5772 | 0.9871 | 0.9683 | 0.9826 |
| US-Ne1 | 0.77 | 0.7861 | 0.756 | 0.7167 |
| US-Ne2 | 0.7636 | 0.7878 | 0.7594 | 0.7442 |
| US-SRG | 0.9669 | 0.9851 | 0.9775 | 0.9997 |
| US-Tw3 | 0.9689 | 0.9569 | 0.9763 | 0.4005 |
| US-Var | 0.9983 | 1 | 0.9455 | 1 |
| US-Wkg | 0.9973 | 0.9909 | 0.9965 | 0.9848 |
Finally, users sometimes look for sites with multiple measurements
of similar variables (e.g., multilevel wind speed, soil temperature).
The VARIABLE column in the variable availability can be used to get a
fine-level variable availability.
# down-select cropland & grassland sites by available wind speed (WS) data,
# mean availability of WS during 2015-2018
data_aval_sub2 <- data_aval[data_aval$BASENAME %in% c("WS"),
.(SITE_ID, VARIABLE, Y2015_2018 = (Y2015 + Y2016 + Y2017 + Y2018)/4)]
# calculate number of WS variables per site, for sites that
# have any WS data during 2015-2018
data_aval_sub2 <- data_aval_sub2[Y2015_2018 > 0, .(.N, Y2015_2018 = mean(Y2015_2018)), .(SITE_ID)]
pander::pandoc.table(crop_ls4 <- data_aval_sub2[N > 1, ])
| SITE_ID | N | Y2015_2018 |
|---|---|---|
| US-ARM | 3 | 0.8766 |
| US-Ne1 | 4 | 0.7027 |
| US-Ne2 | 4 | 0.709 |
| US-Ne3 | 4 | 0.7287 |
| US-Wkg | 2 | 0.9942 |
A companion function amf_plot_datayear() can be used for visualizing
the data availability in an interactive figure. However, it is strongly
advised to subset the sites, variables, and/or years for faster
processing and better visualization.
#### not evaluated, to reduce vignette size
# plot data availability for WS & USTAR
# for selected sites in 2015-2018
amf_plot_datayear(
  site_set = crop_ls4$SITE_ID,
  var_set = c("WS", "USTAR"),
  nonfilled_only = TRUE,
  year_set = c(2015:2018)
)
Get data summary
In addition, users can use amf_summarize_data() to query the summary
statistics of specific variables in the BASE data.
amf_summarize_data() provides summary statistics for each variable
(e.g., percentiles) before downloading the data.
By default, amf_summarize_data() returns variable summary (selected
percentiles) for all variables and sites. The site_set and var_set
parameters can be used to subset the sites or variables of interest.
## get data summary for selected sites & variables
data_sum <- amf_summarize_data(site_set = crop_ls3$SITE_ID,
                               var_set = c("WS", "USTAR"))
pander::pandoc.table(data_sum[c(1:10), ])
Table continues below
|   | SITE_ID | VARIABLE | BASENAME | GAP_FILLED | DATA_RECORD |
|---|---|---|---|---|---|
| 4165 | US-ARM | WS_1_1_1 | WS | FALSE | 353556 |
| 4168 | US-ARM | USTAR_1_1_1 | USTAR | FALSE | 353556 |
| 4221 | US-ARM | WS_1_2_1 | WS | FALSE | 353556 |
| 4224 | US-ARM | USTAR_1_2_1 | USTAR | FALSE | 353556 |
| 4248 | US-ARM | WS_1_3_1 | WS | FALSE | 353556 |
| 4251 | US-ARM | USTAR_1_3_1 | USTAR | FALSE | 353556 |
| 10404 | US-Ne1 | USTAR_1_1_1 | USTAR | FALSE | 175320 |
| 10476 | US-Ne1 | WS_1_1_1 | WS | FALSE | 175320 |
| 10477 | US-Ne1 | WS_1_2_1 | WS | FALSE | 175320 |
| 10478 | US-Ne1 | WS_1_3_1 | WS | FALSE | 175320 |

Table continues below
|   | DATA_MISSING | Q01 | Q05 | Q10 | Q15 | Q20 |
|---|---|---|---|---|---|---|
| 4165 | 23542 | 0.5092 | 1.06 | 1.444 | 1.749 | 2.021 |
| 4168 | 23086 | 0.02901 | 0.05339 | 0.07707 | 0.1014 | 0.1271 |
| 4221 | 32209 | 0.8247 | 1.725 | 2.389 | 2.894 | 3.331 |
| 4224 | 31081 | 0.03152 | 0.05623 | 0.07908 | 0.1017 | 0.1259 |
| 4248 | 51797 | 0.988 | 2.126 | 3.002 | 3.689 | 4.278 |
| 4251 | 45785 | 0.03332 | 0.05838 | 0.08043 | 0.1016 | 0.1243 |
| 10404 | 15578 | 0.024 | 0.049 | 0.071 | 0.093 | 0.116 |
| 10476 | 118560 | 0.55 | 0.94 | 1.19 | 1.37 | 1.53 |
| 10477 | 10493 | 0.8 | 1.2 | 1.49 | 1.72 | 1.95 |
| 10478 | 11666 | 0.52 | 0.77 | 0.99 | 1.19 | 1.39 |

Table continues below
|   | Q25 | Q30 | Q35 | Q40 | Q45 | Q50 | Q55 |
|---|---|---|---|---|---|---|---|
| 4165 | 2.274 | 2.531 | 2.796 | 3.075 | 3.361 | 3.669 | 3.999 |
| 4168 | 0.1533 | 0.1794 | 0.2052 | 0.2305 | 0.2556 | 0.281 | 0.3068 |
| 4221 | 3.732 | 4.113 | 4.476 | 4.838 | 5.207 | 5.575 | 5.951 |
| 4224 | 0.1517 | 0.1788 | 0.2056 | 0.2325 | 0.2597 | 0.2868 | 0.3142 |
| 4248 | 4.799 | 5.289 | 5.763 | 6.229 | 6.685 | 7.143 | 7.611 |
| 4251 | 0.1495 | 0.1775 | 0.2063 | 0.2361 | 0.2657 | 0.2951 | 0.3253 |
| 10404 | 0.14 | 0.164 | 0.188 | 0.213 | 0.238 | 0.263 | 0.289 |
| 10476 | 1.68 | 1.81 | 1.96 | 2.1 | 2.26 | 2.44 | 2.63 |
| 10477 | 2.17 | 2.4 | 2.64 | 2.89 | 3.16 | 3.44 | 3.75 |
| 10478 | 1.59 | 1.79 | 2 | 2.22 | 2.45 | 2.71 | 3.01 |

Table continues below
|   | Q60 | Q65 | Q70 | Q75 | Q80 | Q85 | Q90 |
|---|---|---|---|---|---|---|---|
| 4165 | 4.353 | 4.736 | 5.155 | 5.628 | 6.174 | 6.836 | 7.701 |
| 4168 | 0.3334 | 0.3613 | 0.3919 | 0.4253 | 0.4635 | 0.509 | 0.5668 |
| 4221 | 6.34 | 6.757 | 7.216 | 7.738 | 8.362 | 9.123 | 10.16 |
| 4224 | 0.342 | 0.3711 | 0.4012 | 0.4341 | 0.4714 | 0.5148 | 0.5697 |
| 4248 | 8.09 | 8.577 | 9.086 | 9.631 | 10.23 | 10.96 | 11.94 |
| 4251 | 0.3567 | 0.3886 | 0.4227 | 0.4596 | 0.5006 | 0.5491 | 0.6111 |
| 10404 | 0.315 | 0.343 | 0.373 | 0.406 | 0.4438 | 0.49 | 0.551 |
| 10476 | 2.85 | 3.1 | 3.38 | 3.69 | 4.04 | 4.48 | 5.06 |
| 10477 | 4.1 | 4.49 | 4.93 | 5.42 | 6 | 6.7 | 7.62 |
| 10478 | 3.34 | 3.73 | 4.17 | 4.69 | 5.28 | 5.98 | 6.87 |

|   | Q95 | Q99 |
|---|---|---|
| 4165 | 8.963 | 11.3 |
| 4168 | 0.6528 | 0.8272 |
| 4221 | 11.71 | 14.47 |
| 4224 | 0.6519 | 0.8205 |
| 4248 | 13.43 | 16.32 |
| 4251 | 0.706 | 0.9164 |
| 10404 | 0.645 | 0.852 |
| 10476 | 5.93 | 7.61 |
| 10477 | 9.01 | 11.65 |
| 10478 | 8.19 | 10.61 |
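Beyond percentiles, the record counts in the summary can give a rough sense of how complete each variable is. The sketch below is a minimal, offline illustration on a hand-built table, not a call to amf_summarize_data(). It assumes the two count columns are named DATA_RECORD (total records) and DATA_MISSING (missing records); the values are copied from the first two US-ARM rows above, and `FRAC_MISSING` is an illustrative name.

```r
library(data.table)

# Stand-in for two rows of the data summary (US-ARM).
# Column names DATA_RECORD and DATA_MISSING are assumptions.
data_sum_toy <- data.table(
  SITE_ID = "US-ARM",
  VARIABLE = c("WS_1_1_1", "USTAR_1_1_1"),
  DATA_RECORD = c(353556, 353556),
  DATA_MISSING = c(23542, 23086)
)

# Fraction of records that are missing, per variable
data_sum_toy[, FRAC_MISSING := DATA_MISSING / DATA_RECORD]
print(data_sum_toy[, .(SITE_ID, VARIABLE, FRAC_MISSING)])
```

Under these assumptions, both variables are missing for well under a tenth of their records, which could serve as one more screening criterion alongside the per-year availability.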
Alternatively, a companion function amf_plot_datasummary() provides
an interactive visualization of the data summary.
#### not evaluated, to reduce vignette size
## plot data summary of USTAR for selected sites
amf_plot_datasummary(
  site_set = crop_ls3$SITE_ID,
  var_set = c("USTAR")
)
#### not evaluated, to reduce vignette size
## plot data summary of WS for selected sites,
# including clustering information
amf_plot_datasummary(
  site_set = crop_ls3$SITE_ID,
  var_set = c("WS"),
  show_cluster = TRUE
)
Once a target site list is ready, users can download these sites’ data
and metadata using the site IDs. See Data
import for data download and import examples.