
Data and scripts from: “Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification”

Creators: Timothy Johnsen, Xiangyu Bi, Chunwei Chou, Charuleka Varadharajan, Yuxin Wu, Jonathan Skone, Lavanya Ramakrishnan

Year: 2025

DOI: 10.15485/2561511

License: CC-BY 4.0

Location: ER-PHS1, ph1, Ecohydrology Sensor Monitoring Station at PumpHouse northeast facing hillslope, East River, Colorado. Shrubland.

Temporal extent: 2019-10-12 to 2023-04-11

Bounding box: 38.920°N to 38.920°N, 106.950°W to 106.950°W

Publisher: ESS-DIVE

Description

This data package includes data and scripts from the manuscript “Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification”. The study addressed common challenges in environmental sensing and modeling, including uncertain input data, missing sensor observations, and high-dimensional datasets with interrelated but redundant variables. Point-scale meteorological and soil sensor observations were perturbed with noise and missing values, and denoising autoencoder (DAE) neural networks were developed to reconstruct the perturbed data and then predict evapotranspiration (ET). The study concluded that (1) the reconstruction quality of each variable depends on its cross-correlation and alignment with the underlying data structure, (2) uncertainties from the models were overall larger than those from the data corruption, and (3) there was a tradeoff between reducing bias and reducing variance when evaluating the uncertainty of the machine learning models.

This package includes:

(1) Four IPython notebooks (.ipynb): “DAE_train.ipynb” trains and evaluates DAE neural networks, “DAE_predict.ipynb” makes predictions from the trained DAE models, “ET_train.ipynb” trains and evaluates ET prediction neural networks, and “ET_predict.ipynb” makes predictions from trained ET models.

(2) One Python file (.py): “methods.py” contains all user-defined functions and Python code used in the notebooks.

(3) A “sub_models” folder containing five trained DAE neural networks (PyTorch format, .pt), which can ingest input data before it is passed to the downstream ET models in “ET_train.ipynb” or “ET_predict.ipynb”.

(4) Two data files (.csv): “df_data.csv” contains daily meteorological, vegetation, and soil data, and “df_meta.csv” contains the location and time information for “df_data.csv”. Each row (index) in “df_meta.csv” corresponds to the row with the same index in “df_data.csv”. These data files follow the data structure required by the notebooks and can be used in them directly, and they have been shuffled chronologically for training machine learning models. The meteorological and soil data were collected with point sensors between 2019 and 2023 at:

(4.a) Three shrub-dominated field sites in East River, Colorado (named “ph1”, “ph2”, and “sg5” in “df_meta.csv”, where “ph1” and “ph2” were located at PumpHouse Hillslopes and “sg5” was at Snodgrass Mountain meadow).

(4.b) One outdoor, mesoscale, herbaceous-dominated experiment in Berkeley, California (named “tb” in “df_meta.csv”, short for Smartsoils Testbed at Lawrence Berkeley National Lab).

- See "df_data_dd.csv" and "df_meta_dd.csv" for variable descriptions and the Methods section for additional data processing steps. See "flmd.csv" and "README.txt" for brief file descriptions.
- All notebooks and Python files are written in Python and require Python to run.
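Because the rows of “df_meta.csv” and “df_data.csv” are matched by position, the two files can be combined row by row before selecting a site. The following is only a minimal sketch of that pairing, not code from the package: the helper names `read_rows` and `pair_by_row`, the column names ("air_temp", "site", "date"), and the toy values are all hypothetical stand-ins, and the real column names should be taken from "df_data_dd.csv" and "df_meta_dd.csv".

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file object into a list of dicts, one per row."""
    return list(csv.DictReader(handle))


def pair_by_row(data_rows, meta_rows):
    """Merge data and metadata rows that share the same position."""
    if len(data_rows) != len(meta_rows):
        raise ValueError("data and metadata must have the same number of rows")
    # Row i of the metadata describes row i of the data.
    return [{**meta, **data} for meta, data in zip(meta_rows, data_rows)]


# Toy stand-ins for "df_data.csv" and "df_meta.csv".
# Column names and values are hypothetical, not taken from the package.
toy_data = io.StringIO("air_temp\n1.5\n12.0\n-3.2\n")
toy_meta = io.StringIO(
    "site,date\nph1,2020-01-05\ntb,2021-06-30\nsg5,2019-12-01\n"
)

merged = pair_by_row(read_rows(toy_data), read_rows(toy_meta))

# Select the rows of one site using the (hypothetical) "site" column.
ph1_rows = [row for row in merged if row["site"] == "ph1"]
print(ph1_rows)
```

With the real files, the `io.StringIO` objects would be replaced by file handles such as `open("df_data.csv", newline="")`. The length check is there because positional pairing silently misaligns rows if either file has been filtered or reordered on its own.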


Related Works

Items connected by shared entities, co-authorship, citations, or semantic similarity.

Article

Challenges in Building an End-to-End System for Acquisition, Management, and Integration of Diverse Data From Sensor Networks in Watersheds: Lessons From a Mountainous Community Observatory in East River, Colorado

2019, IEEE Access, 2 shared authors

Article

Variations in bedrock and vegetation cover modulate subsurface water flow dynamics of a mountainous hillslope

2024, Water Resources Research, 2 shared authors

Dataset

Meteorological and Soil Data from Ecohydrology Sensor Towers at Pump House and Snodgrass Mountain in East River Watershed, Colorado, 2019-2025

2026, 74% similar

Dataset

Hybrid predictive modeling approach simulated evapotranspiration and ecosystem respiration data

2020, 73% similar

Dataset

Meteorological Variables and Energy Fluxes at the Pumphouse Site, Crested Butte, CO 2017-2019

2025, 73% similar

Article

Estimation of Evapotranspiration Rates and Root Water Uptake Profiles From Soil Moisture Sensor Array Data

2021, Water Resources Research, 1 shared author


Revising Desertification of Riparian Zones Along Cold Desert Streams

Planning for Drought

Drought is Draining (Denver post yr 2000)