Site selection

Get a site list with general info

AmeriFlux data are organized by individual sites, so a data query typically begins with site search and selection. A full list of AmeriFlux sites with general info can be obtained using the amf_site_info() function.

Convert the site list to a data.table for easier manipulation. Also see link for variable definition.

# get a full list of sites with general info
sites <- amf_site_info()
sites_dt <- data.table::as.data.table(sites)

pander::pandoc.table(sites_dt[c(1:3), ])

Table continues below
SITE_ID	SITE_NAME	COUNTRY	STATE	IGBP
AR-CCa	Carlos Casares agriculture	Argentina	Buenos Aires	CRO
AR-CCg	Carlos Casares grassland	Argentina	Buenos Aires	GRA
AR-Cel	CELPA Mar Chiquita BA	Argentina	Buenos Aires	WET

Table continues below
TOWER_BEGAN	URL_AMERIFLUX	TOWER_END
2012	https://ameriflux.lbl.gov/sites/siteinfo/AR-CCa	NA
2018	https://ameriflux.lbl.gov/sites/siteinfo/AR-CCg	NA
2018	https://ameriflux.lbl.gov/sites/siteinfo/AR-Cel	2018

Table continues below
LOCATION_LAT	LOCATION_LONG	LOCATION_ELEV	CLIMATE_KOEPPEN	MAT	MAP
-35.62	-61.32	83	Cfa	16.1	1060
-35.92	-61.19	84	Cfa	16.1	1060
-37.7	-57.42	1	Cfb	14	926

DATA_POLICY	DATA_START	DATA_END
LEGACY	2012	2020
LEGACY	2018	2020
LEGACY	NA	NA

The site list provides a quick summary of all registered sites and sites with available data.

It’s often important to understand the data use policy under which the data are shared. In 2021, the AmeriFlux community moved to the AmeriFlux CC-BY-4.0 License. Most site PIs now share their sites’ data under the CC-BY-4.0 license. Data for some sites are shared under the historical AmeriFlux data-sharing policy, now called the AmeriFlux Legacy Data Policy.

Check link for data use policy and attribution guidelines.

# total number of registered sites
pander::pandoc.table(sites_dt[, .N])

634


# total number of sites with available data
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N])

469


# get number of sites with available data, grouped by data use policy
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = .(DATA_POLICY)])

DATA_POLICY	N
LEGACY	92
CCBY4.0	377
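
As a quick consistency check, the grouped counts can be compared against the total and turned into percentages. The sketch below uses base R and only the counts printed in the table above; the object names are illustrative.

```r
# Counts of sites with available data, copied from the table above.
n_by_policy <- c("LEGACY" = 92, "CCBY4.0" = 377)

# The grouped counts should add up to the number of sites with available data.
total <- sum(n_by_policy)

# Percentage of data-providing sites under each data use policy.
share <- round(100 * n_by_policy / total, 1)

print(total)
print(share)
```

With these counts, roughly four out of five data-providing sites are shared under CC-BY-4.0.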

Further group sites based on IGBP.

# get a summary table of sites grouped by IGBP
pander::pandoc.table(sites_dt[, .N, by = "IGBP"])

IGBP	N
CRO	129
GRA	85
WET	114
DNF	1
EBF	12
WSA	12
MF	16
ENF	101
DBF	59
OSH	40
WAT	10
CSH	13
URB	15
BSV	7
CVM	11
SAV	8
SNO	1


# get a summary table of sites with available data, & grouped by IGBP
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = "IGBP"])

IGBP	N
CRO	85
GRA	67
WET	61
DNF	1
WSA	8
EBF	8
ENF	94
DBF	55
MF	12
OSH	32
CSH	11
BSV	5
WAT	8
CVM	8
URB	7
SAV	6
SNO	1


# get a summary table of sites with available data, 
#  & grouped by data use policy & IGBP
pander::pandoc.table(sites_dt[!is.na(DATA_START), .N, by = .(IGBP, DATA_POLICY)][order(IGBP)])

IGBP	DATA_POLICY	N
BSV	CCBY4.0	3
BSV	LEGACY	2
CRO	LEGACY	15
CRO	CCBY4.0	70
CSH	LEGACY	5
CSH	CCBY4.0	6
CVM	CCBY4.0	8
DBF	CCBY4.0	49
DBF	LEGACY	6
DNF	CCBY4.0	1
EBF	CCBY4.0	6
EBF	LEGACY	2
ENF	CCBY4.0	78
ENF	LEGACY	16
GRA	LEGACY	14
GRA	CCBY4.0	53
MF	CCBY4.0	8
MF	LEGACY	4
OSH	CCBY4.0	25
OSH	LEGACY	7
SAV	CCBY4.0	6
SNO	CCBY4.0	1
URB	CCBY4.0	5
URB	LEGACY	2
WAT	CCBY4.0	8
WET	CCBY4.0	44
WET	LEGACY	17
WSA	CCBY4.0	6
WSA	LEGACY	2

Once the selection criteria are decided, users can query a target site list based on them, e.g., IGBP, data availability, data policy, and geolocation.


# get a list of cropland and grassland sites with available data,
#  shared under the CC-BY-4.0 data policy,
#  located between 30 and 50 degrees N in latitude;
#  return a site list with site ID, name, and data start/end year
crop_ls <- sites_dt[IGBP %in% c("CRO", "GRA") &
                      !is.na(DATA_START) &
                      LOCATION_LAT > 30 &
                      LOCATION_LAT < 50 &
                      DATA_POLICY == "CCBY4.0",
                    .(SITE_ID, SITE_NAME, DATA_START, DATA_END)]
pander::pandoc.table(crop_ls[c(1:10),])

SITE_ID	SITE_NAME	DATA_START	DATA_END
CA-ER1	Elora Research Station	2015	2021
US-A32	ARM-SGP Medford hay pasture	2015	2017
US-A74	ARM SGP milo field	2015	2017
US-AR1	ARM USDA UNL OSU Woodward Switchgrass 1	2009	2012
US-AR2	ARM USDA UNL OSU Woodward Switchgrass 2	2009	2012
US-ARb	ARM Southern Great Plains burn site- Lamont	2005	2006
US-ARc	ARM Southern Great Plains control site- Lamont	2005	2006
US-ARM	ARM Southern Great Plains site- Lamont	2003	2023
US-Bi1	Bouldin Island Alfalfa	2016	2023
US-Bi2	Bouldin Island corn	2017	2023
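
When the criteria need to be varied repeatedly, the same filter can be wrapped in a small function. The following is a minimal base-R sketch, not part of amerifluxr: select_sites() is a hypothetical helper, and the toy table contains made-up placeholder rows (not real AmeriFlux records) that only reuse the column names of the site list.

```r
# Toy site table with the same column names as the site list.
# All rows are made-up placeholders for illustration only.
toy_sites <- data.frame(
  SITE_ID      = c("XX-Aa1", "XX-Bb2", "XX-Cc3", "XX-Dd4"),
  IGBP         = c("CRO", "GRA", "CRO", "ENF"),
  LOCATION_LAT = c(41.2, 55.0, 38.7, 45.1),
  DATA_POLICY  = c("CCBY4.0", "CCBY4.0", "LEGACY", "CCBY4.0"),
  DATA_START   = c(2010, 2015, 2012, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep sites that match the vegetation types,
# fall inside the latitude band (exclusive bounds), are shared under
# the given data policy, and have data available.
select_sites <- function(sites, igbp, lat_range, policy) {
  keep <- sites$IGBP %in% igbp &
    !is.na(sites$DATA_START) &
    sites$LOCATION_LAT > lat_range[1] &
    sites$LOCATION_LAT < lat_range[2] &
    sites$DATA_POLICY == policy
  sites[keep, c("SITE_ID", "IGBP", "DATA_START")]
}

picked <- select_sites(toy_sites, c("CRO", "GRA"), c(30, 50), "CCBY4.0")
print(picked)
```

In the toy table only the first site passes all four conditions; the others fail on latitude, data policy, or vegetation type and data availability.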

Get metadata availability

In some cases, users may want to know if certain types of metadata are available for the selected sites. The amf_list_metadata() function provides a quick summary of metadata availability before actually downloading the data and metadata.

By default, amf_list_metadata() returns a full site list with the number of available entries (i.e., counts) for all BADM groups. Check the AmeriFlux webpage for definitions of all BADM groups.

# get metadata availability for all sites
metadata_aval <- data.table::as.data.table(amf_list_metadata())
pander::pandoc.table(metadata_aval[c(1:3), c(1:10)])

Table continues below
SITE_ID	GRP_ACKNOWLEDGEMENT	GRP_CLIM_AVG	GRP_COUNTRY	GRP_DOI
AR-CCa	1	1	1	1
AR-CCg	1	1	1	1
AR-Cel	1	1	1	0

Table continues below
GRP_DOI_CONTRIBUTOR	GRP_DOI_ORGANIZATION	GRP_DOM_DIST_MGMT
1	2	1
1	2	2
0	0	1

GRP_FLUX_MEASUREMENTS	GRP_HEADER
2	1
2	1
6	1

The site_set parameter of amf_list_metadata() can be used to subset the sites of interest.

metadata_aval_sub <- data.table::as.data.table(amf_list_metadata(site_set = crop_ls$SITE_ID))

# down-select cropland & grassland sites by a BADM group of interest,
#  e.g., canopy height (GRP_HEIGHTC)
crop_ls2 <- metadata_aval_sub[GRP_HEIGHTC > 0, .(SITE_ID, GRP_HEIGHTC)][order(-GRP_HEIGHTC)]
pander::pandoc.table(crop_ls2[c(1:10), ])

SITE_ID	GRP_HEIGHTC
US-Ne2	196
US-Tw3	162
US-Twt	133
US-Ne3	128
US-Ne1	119
US-Bi1	112
US-Var	105
US-Snd	70
US-Bi2	54
US-ARM	45

Get data availability

Users can use amf_list_data() to query the availability of specific variables in the flux/met data (the so-called BASE data product). It provides a quick summary of variable availability (per site and year) before downloading the data.

By default, amf_list_data() returns a full site list of variable availability (data percentages per year) for all variables. The site_set parameter of amf_list_data() can be used to subset the sites of interest.

# get data availability for selected sites
data_aval <- data.table::as.data.table(amf_list_data(site_set = crop_ls2$SITE_ID))
pander::pandoc.table(data_aval[c(1:10), ])

Table continues below
SITE_ID	VARIABLE	BASENAME	GAP_FILLED
US-AR1	CO2	CO2	FALSE
US-AR1	FC	FC	FALSE
US-AR1	G	G	FALSE
US-AR1	H	H	FALSE
US-AR1	H2O	H2O	FALSE
US-AR1	LE	LE	FALSE
US-AR1	LW_IN	LW_IN	FALSE
US-AR1	LW_OUT	LW_OUT	FALSE
US-AR1	NETRAD	NETRAD	FALSE
US-AR1	P	P	FALSE

Table continues below
Y1994	Y1995	Y1996	Y1997	Y1998	Y1999	Y2000	Y2001	Y2002	Y2003
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0

Table continues below
Y2009	Y2010	Y2011	Y2012
0.5905	0.9866	0.9941	0.6654
0.6082	0.976	0.9886	0.6621
0.6421	0.9965	0.9999	0.9961
0.6123	0.9867	0.9938	0.6666
0.6092	0.971	0.9792	0.6633
0.6101	0.9816	0.9936	0.6647
0.6416	0.9965	0.9999	0.9961
0.6416	0.9965	0.9999	0.9961
0.5447	0.9964	0.9996	0.996
0.6422	0.9965	0.9999	0.9961

Y2013	Y2014	Y2015	Y2016	Y2017	Y2018	Y2019	Y2020	Y2021	Y2022	Y2023
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0

The variable availability can be used to subset sites that have certain variables in specific years. The BASENAME column indicates the variable’s base name (i.e., ignoring position qualifier), and can be used to get a coarse-level variable availability.

See AmeriFlux website for definitions of base names and qualifiers.
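
To make the relation between the VARIABLE and BASENAME columns concrete, the sketch below strips a trailing position qualifier from a few variable names. It is a simplified illustration in base R, assuming the position qualifier is a trailing _H_V_R triple of integers (horizontal position, vertical position, replicate); other qualifier types are not handled, and this is not how amerifluxr itself derives base names. get_index() is a hypothetical helper.

```r
# Example names as they appear in the VARIABLE column.
vars <- c("WS_1_1_1", "WS_1_2_1", "USTAR_1_1_1", "LW_IN")

# Simplified rule (an assumption): a trailing "_H_V_R" triple of integers
# is the position qualifier; everything before it is the base name.
pos_pattern <- "_([0-9]+)_([0-9]+)_([0-9]+)$"

# Hypothetical helper: extract the i-th index (1 = H, 2 = V, 3 = R),
# returning NA for names without a position qualifier.
get_index <- function(x, i) {
  out <- rep(NA_integer_, length(x))
  hit <- grepl(pos_pattern, x)
  out[hit] <- as.integer(sub(paste0("^.*", pos_pattern), paste0("\\", i), x[hit]))
  out
}

parsed <- data.frame(
  VARIABLE = vars,
  BASENAME = sub(pos_pattern, "", vars),
  H = get_index(vars, 1),
  V = get_index(vars, 2),
  R = get_index(vars, 3),
  stringsAsFactors = FALSE
)
print(parsed)
```

Grouping by BASENAME pools all positions of a variable, while grouping by VARIABLE keeps each position separate.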

# down-select cropland & grassland sites based on the available wind speed (WS) and 
# friction velocity (USTAR) data in 2015-2018, regardless of their qualifiers
data_aval_sub <- data_aval[data_aval$BASENAME %in% c("WS","USTAR"),
                           .(SITE_ID, BASENAME, Y2015, Y2016, Y2017, Y2018)]

# calculate mean availability of WS and USTAR in each site and each year
data_aval_sub <- data_aval_sub[, lapply(.SD, mean), 
                               by = .(SITE_ID),
                               .SDcols = c("Y2015", "Y2016", "Y2017", "Y2018")]

# sub-select sites whose mean WS and USTAR availability
#  during 2015-2018 exceeds 75%
crop_ls3 <- data_aval_sub[(Y2015 + Y2016 + Y2017 + Y2018) / 4 > 0.75]
pander::pandoc.table(crop_ls3)

SITE_ID	Y2015	Y2016	Y2017	Y2018
US-ARM	0.5772	0.9871	0.9683	0.9826
US-Ne1	0.77	0.7861	0.756	0.7167
US-Ne2	0.7636	0.7878	0.7594	0.7442
US-SRG	0.9669	0.9851	0.9775	0.9997
US-Tw3	0.9689	0.9569	0.9763	0.4005
US-Var	0.9983	1	0.9455	1
US-Wkg	0.9973	0.9909	0.9965	0.9848
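
Note that the filter applies to the four-year mean, so a site can pass with one poorly covered year (e.g., US-ARM in 2015, US-Tw3 in 2018). The sketch below, in base R with the values copied from the table above, compares this criterion with a stricter alternative that requires every year to exceed the threshold. The stricter rule is an illustration only and is not used elsewhere in this document.

```r
# Yearly mean WS/USTAR availability, copied from the table above.
avail <- data.frame(
  SITE_ID = c("US-ARM", "US-Ne1", "US-Ne2", "US-SRG", "US-Tw3", "US-Var", "US-Wkg"),
  Y2015 = c(0.5772, 0.77, 0.7636, 0.9669, 0.9689, 0.9983, 0.9973),
  Y2016 = c(0.9871, 0.7861, 0.7878, 0.9851, 0.9569, 1, 0.9909),
  Y2017 = c(0.9683, 0.756, 0.7594, 0.9775, 0.9763, 0.9455, 0.9965),
  Y2018 = c(0.9826, 0.7167, 0.7442, 0.9997, 0.4005, 1, 0.9848),
  stringsAsFactors = FALSE
)
years <- c("Y2015", "Y2016", "Y2017", "Y2018")
threshold <- 0.75

# Criterion used above: four-year mean above the threshold.
avail$MEAN <- rowMeans(avail[, years])

# Stricter alternative (illustrative): the worst year must also pass.
avail$MIN <- apply(avail[, years], 1, min)

by_mean <- avail$SITE_ID[avail$MEAN > threshold]
by_min  <- avail$SITE_ID[avail$MIN > threshold]

print(avail[, c("SITE_ID", "MEAN", "MIN")])
print(by_min)
```

All seven sites pass the mean-based filter, whereas only US-SRG, US-Var, and US-Wkg stay above 75% in every single year.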

Lastly, users sometimes look for sites with multiple measurements of similar variables (e.g., multilevel wind speed, soil temperature). The VARIABLE column in the variable availability can be used to get a fine-level variable availability.


# down-select cropland & grassland sites by available wind speed (WS) data,
#  mean availability of WS during 2015-2018
data_aval_sub2 <- data_aval[data_aval$BASENAME %in% c("WS"),
                            .(SITE_ID, VARIABLE, Y2015_2018 = (Y2015 + Y2016 + Y2017 + Y2018)/4)]

# calculate number of WS variables per site, for sites that 
#  have any WS data during 2015-2018
data_aval_sub2 <- data_aval_sub2[Y2015_2018 > 0, .(.N, Y2015_2018 = mean(Y2015_2018)), .(SITE_ID)]
# keep sites with more than one WS variable
crop_ls4 <- data_aval_sub2[N > 1, ]
pander::pandoc.table(crop_ls4)

SITE_ID	N	Y2015_2018
US-ARM	3	0.8766
US-Ne1	4	0.7027
US-Ne2	4	0.709
US-Ne3	4	0.7287
US-Wkg	2	0.9942

A companion function amf_plot_datayear() can be used for visualizing the data availability in an interactive figure. However, it is strongly advised to subset the sites, variables, and/or years for faster processing and better visualization.

#### not evaluated, to reduce vignette size
# plot data availability for WS & USTAR
#  for selected sites in 2015-2018
amf_plot_datayear(
  site_set = crop_ls4$SITE_ID,
  var_set = c("WS", "USTAR"),
  nonfilled_only = TRUE,
  year_set = c(2015:2018)
)

Get data summary

In addition, users can use amf_summarize_data() to query the summary statistics of specific variables in the BASE data. It provides summary statistics (e.g., percentiles) for each variable before downloading the data.

By default, amf_summarize_data() returns variable summary (selected percentiles) for all variables and sites. The site_set and var_set parameters can be used to subset the sites or variables of interest.

# get data summary for selected sites & variables
data_sum <- amf_summarize_data(site_set = crop_ls3$SITE_ID,
                               var_set = c("WS", "USTAR"))
pander::pandoc.table(data_sum[c(1:10), ])

Table continues below
	SITE_ID	VARIABLE	BASENAME	GAP_FILLED	DATA_RECORD
4165	US-ARM	WS_1_1_1	WS	FALSE	353556
4168	US-ARM	USTAR_1_1_1	USTAR	FALSE	353556
4221	US-ARM	WS_1_2_1	WS	FALSE	353556
4224	US-ARM	USTAR_1_2_1	USTAR	FALSE	353556
4248	US-ARM	WS_1_3_1	WS	FALSE	353556
4251	US-ARM	USTAR_1_3_1	USTAR	FALSE	353556
10404	US-Ne1	USTAR_1_1_1	USTAR	FALSE	175320
10476	US-Ne1	WS_1_1_1	WS	FALSE	175320
10477	US-Ne1	WS_1_2_1	WS	FALSE	175320
10478	US-Ne1	WS_1_3_1	WS	FALSE	175320

Table continues below
	DATA_MISSING	Q01	Q05	Q10	Q15	Q20
4165	23542	0.5092	1.06	1.444	1.749	2.021
4168	23086	0.02901	0.05339	0.07707	0.1014	0.1271
4221	32209	0.8247	1.725	2.389	2.894	3.331
4224	31081	0.03152	0.05623	0.07908	0.1017	0.1259
4248	51797	0.988	2.126	3.002	3.689	4.278
4251	45785	0.03332	0.05838	0.08043	0.1016	0.1243
10404	15578	0.024	0.049	0.071	0.093	0.116
10476	118560	0.55	0.94	1.19	1.37	1.53
10477	10493	0.8	1.2	1.49	1.72	1.95
10478	11666	0.52	0.77	0.99	1.19	1.39

Table continues below
	Q25	Q30	Q35	Q40	Q45	Q50	Q55
4165	2.274	2.531	2.796	3.075	3.361	3.669	3.999
4168	0.1533	0.1794	0.2052	0.2305	0.2556	0.281	0.3068
4221	3.732	4.113	4.476	4.838	5.207	5.575	5.951
4224	0.1517	0.1788	0.2056	0.2325	0.2597	0.2868	0.3142
4248	4.799	5.289	5.763	6.229	6.685	7.143	7.611
4251	0.1495	0.1775	0.2063	0.2361	0.2657	0.2951	0.3253
10404	0.14	0.164	0.188	0.213	0.238	0.263	0.289
10476	1.68	1.81	1.96	2.1	2.26	2.44	2.63
10477	2.17	2.4	2.64	2.89	3.16	3.44	3.75
10478	1.59	1.79	2	2.22	2.45	2.71	3.01

Table continues below
	Q60	Q65	Q70	Q75	Q80	Q85	Q90
4165	4.353	4.736	5.155	5.628	6.174	6.836	7.701
4168	0.3334	0.3613	0.3919	0.4253	0.4635	0.509	0.5668
4221	6.34	6.757	7.216	7.738	8.362	9.123	10.16
4224	0.342	0.3711	0.4012	0.4341	0.4714	0.5148	0.5697
4248	8.09	8.577	9.086	9.631	10.23	10.96	11.94
4251	0.3567	0.3886	0.4227	0.4596	0.5006	0.5491	0.6111
10404	0.315	0.343	0.373	0.406	0.4438	0.49	0.551
10476	2.85	3.1	3.38	3.69	4.04	4.48	5.06
10477	4.1	4.49	4.93	5.42	6	6.7	7.62
10478	3.34	3.73	4.17	4.69	5.28	5.98	6.87

	Q95	Q99
4165	8.963	11.3
4168	0.6528	0.8272
4221	11.71	14.47
4224	0.6519	0.8205
4248	13.43	16.32
4251	0.706	0.9164
10404	0.645	0.852
10476	5.93	7.61
10477	9.01	11.65
10478	8.19	10.61
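
The summary columns can be combined into simple screening metrics. The sketch below, in base R, uses four wind speed rows copied from the tables above and derives the fraction of missing records and the interquartile range. It assumes DATA_RECORD is the total number of records and DATA_MISSING the number of missing ones, which is inferred from the column names rather than stated here.

```r
# Four WS rows copied from the summary tables above.
smry <- data.frame(
  SITE_ID      = c("US-ARM", "US-ARM", "US-ARM", "US-Ne1"),
  VARIABLE     = c("WS_1_1_1", "WS_1_2_1", "WS_1_3_1", "WS_1_1_1"),
  DATA_RECORD  = c(353556, 353556, 353556, 175320),
  DATA_MISSING = c(23542, 32209, 51797, 118560),
  Q25          = c(2.274, 3.732, 4.799, 1.68),
  Q75          = c(5.628, 7.738, 9.631, 3.69),
  stringsAsFactors = FALSE
)

# Fraction of missing records (assumes DATA_RECORD counts all records).
smry$MISSING_FRAC <- smry$DATA_MISSING / smry$DATA_RECORD

# Interquartile range as a simple measure of spread.
smry$IQR <- smry$Q75 - smry$Q25

print(smry[, c("SITE_ID", "VARIABLE", "MISSING_FRAC", "IQR")])
```

Under that assumption, WS_1_1_1 at US-Ne1 is missing for about two thirds of the record, while the three US-ARM wind speed variables are each missing for less than 15%.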

Alternatively, a companion function amf_plot_datasummary() provides an interactive visualization of the data summary.

#### not evaluated, to reduce vignette size
## plot data summary of USTAR for selected sites
amf_plot_datasummary(
  site_set = crop_ls3$SITE_ID,
  var_set = c("USTAR")
)

#### not evaluated, to reduce vignette size
## plot data summary of WS for selected sites, 
#  including clustering information
amf_plot_datasummary(
  site_set = crop_ls3$SITE_ID,
  var_set = c("WS"),
  show_cluster = TRUE
)

Once a target site list is ready, users can download the data and metadata for these sites using the site IDs. See Data import for data download and import examples.

Housen Chu
