Imputation of contiguous gaps and extremes of subhourly groundwater time series using random forests
Abstract
Machine learning can provide sustainable solutions for gap-filling the groundwater (GW) data needed to adequately constrain watershed models. However, imputing missing extremes is more challenging than imputing other parts of a hydrograph. To impute missing subhourly data, including extremes, in GW time series collected at multiple wells in the East River watershed in southwestern Colorado, we consider a single-well imputation (SWI) approach and a multiple-well imputation (MWI) approach. SWI gap-fills missing GW entries in a well using that well's own time series; MWI gap-fills a well's missing GW entries using the time series of neighboring wells. SWI uses both linear interpolation and random forest (RF) approaches, whereas MWI uses only RF. We also use an information entropy framework to gain insight into how patterns of missing data affect imputation. We found that when gaps occurred at random intervals, SWI could accurately impute up to 90% of missing data over an approximately two-year period. Contiguous gaps were more difficult to impute and required MWI. Information entropy suggested that when gaps were contiguous, up to 50% of missing GW data could be estimated accurately over an approximately two-year period. RF feature importance indicated that a time feature (month) and a space feature (neighboring wells) were the most important predictors in SWI and MWI. We also found that neither SWI nor MWI could capture the missing extremes of a hydrograph. To address this, we developed a new sequential approach and used it to impute missing extremes in a GW time series with high accuracy.
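The linear-interpolation branch of SWI, and why it struggles with extremes, can be illustrated with a small sketch. It assumes missing subhourly readings are stored as `None` in an evenly spaced series; the function name `linear_interpolate` and the example values are hypothetical and not taken from the authors' code or data. Interior gaps are filled along a straight line between the nearest observed readings, so a peak that falls entirely inside a gap cannot be recovered.

```python
from typing import List, Optional


def linear_interpolate(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) along straight lines between observed neighbors.

    Leading and trailing gaps have only one observed neighbor, so they are left as None.
    """
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        span = right - left
        for i in range(left + 1, right):  # empty range when the two readings are adjacent
            frac = (i - left) / span
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


# Hypothetical GW levels (m) whose peak lies entirely inside a gap.
truth = [2.0, 2.5, 3.5, 4.0, 3.5, 2.5, 2.0]
gappy = [2.0, 2.5, None, None, None, 2.5, 2.0]
print(linear_interpolate(gappy))  # [2.0, 2.5, 2.5, 2.5, 2.5, 2.5, 2.0]
print(max(truth), max(linear_interpolate(gappy)))  # 4.0 2.5
```

The straight line between the two readings that bracket the gap flattens the 4.0 m peak to 2.5 m, which is the kind of missed extreme the abstract describes.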
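The abstract does not spell out the entropy framework. As one illustrative way to see why contiguous gaps differ from random ones, the sketch below computes the Shannon entropy of overlapping length-k words in a 0/1 missing-data mask (1 = missing). This block-entropy measure, the helper `block_entropy`, and the synthetic masks are assumptions for illustration, not necessarily the authors' formulation.

```python
import math
import random
from collections import Counter
from typing import List


def block_entropy(mask: List[int], k: int = 2) -> float:
    """Shannon entropy (bits) of the overlapping length-k words in a 0/1 missing-data mask."""
    words = [tuple(mask[i:i + k]) for i in range(len(mask) - k + 1)]
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


n = 1000
# One contiguous gap covering half the record (1 = missing).
contiguous = [0] * (n // 2) + [1] * (n // 2)
# The same number of missing entries scattered at random positions.
rng = random.Random(0)
missing_idx = set(rng.sample(range(n), n // 2))
scattered = [1 if i in missing_idx else 0 for i in range(n)]

for name, mask in [("contiguous", contiguous), ("scattered", scattered)]:
    print(name, round(block_entropy(mask, k=1), 3), round(block_entropy(mask, k=2), 3))
```

With k = 1 the measure depends only on the missing fraction, so both masks score 1 bit. With k = 2 the contiguous mask scores about 1 bit and the scattered mask close to 2 bits, reflecting that a single long gap leaves far less variety in which neighboring readings survive.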
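An MWI-style gap fill can be sketched as follows. The sketch assumes scikit-learn's `RandomForestRegressor` as the RF implementation and uses synthetic data in place of the East River records. The target well is predicted from a neighboring well's level plus a month index, and a single contiguous gap covering 10% of the record is filled. The helper `impute_from_neighbors`, the data generator, and all parameter values (200 trees, gap position, noise levels) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def impute_from_neighbors(target, features, missing, seed=0):
    """Fill target[missing] with RF predictions trained on the observed rows.

    target   : 1-D float array of GW levels for the well being gap-filled (NaN where missing)
    features : 2-D array of predictors (here: a month index and a neighboring well's level)
    missing  : boolean mask, True where the target reading is missing
    """
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(features[~missing], target[~missing])
    filled = target.copy()
    filled[missing] = model.predict(features[missing])
    return filled, model.feature_importances_


# Synthetic example (hypothetical data, not East River measurements).
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
month = (t // 170) % 12 + 1  # coarse stand-in for a calendar month
seasonal = np.sin(2 * np.pi * t / n)
neighbor = 3.0 + seasonal + 0.05 * rng.standard_normal(n)  # neighboring well
truth = 1.0 + 0.8 * neighbor + 0.02 * rng.standard_normal(n)  # target well

missing = np.zeros(n, dtype=bool)
missing[800:1000] = True  # one contiguous gap (10% of the record)
observed = truth.copy()
observed[missing] = np.nan

features = np.column_stack([month, neighbor])
filled, importances = impute_from_neighbors(observed, features, missing)

rmse = np.sqrt(np.mean((filled[missing] - truth[missing]) ** 2))
print(f"RMSE in gap: {rmse:.3f}")
print(dict(zip(["month", "neighbor_well"], importances.round(2))))
```

Because each regression tree predicts an average of training targets, the forest's output cannot fall outside the range of observed values. This is one reason RF-based imputation alone tends to miss extremes that are not represented in the training data.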
Related Works
Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data
The East River, Colorado, Watershed: A mountainous community testbed for improving predictive understanding of multiscale hydrological-biogeochemical dynamics
Differential C-Q Analysis: A New Approach to Inferring Lateral Transport and Hydrologic Transients Within Multiple Reaches of a Mountainous Headwater Catchment
Machine Learning Assisted Gap-Filled Discharge Data for the East River Community Watershed, Colorado, for Water Years 2014-2021
QA/QC-ed Groundwater Level Time Series in PLM-1 and PLM-6 Monitoring Wells, East River, Colorado
Gap-filled meteorological data (2011-2020) and modeled potential evapotranspiration data from the KCOMTCRE2 WeatherUnderground weather station, from the East River Watershed, Colorado.
Riparian Restoration Using Hydrologic Manipulation and Physical Disturbance
The River Basin Model: An Overview
1974 Comparison Average Annual River Basin Water Supply
Cited By (31 times, 4 in Knowledge Hub)
Aerobic respiration controls on shale weathering
From legacy contamination to watershed systems science: a review of scientific insights and technologies developed through DOE-supported research in water and energy security
Downscaled hyper-resolution (400 m) gridded datasets of daily precipitation and temperature (2008-2019) for East Taylor subbasin (western United States)
Modeling Spatial Distribution of Snow Water Equivalent by Combining Meteorological and Satellite Data with Lidar Maps
