Date of Award


Degree Name

Master of Science


Geography and Environmental Resources

First Advisor

Li, Ruopu


Harmful Algal Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broad spatiotemporal scales. This study uses Chlorophyll-a (Chl-a) concentration in water samples collected from lakes in Illinois as an indirect measure of HABs. The major objectives were to assess the spatiotemporal pattern of HABs across Illinois regions in recent decades and to evaluate different machine learning models for predicting Chl-a concentration from publicly available water quality datasets. The Chl-a dataset was compiled from two sources: the regular monitoring program of the Illinois Environmental Protection Agency (IEPA) and the Voluntary Lake Monitoring Program (VLMP), the latter of which was collected primarily by citizen participants. Seven environmental and water quality zones were selected for spatial analyses. Additionally, temporal patterns were assessed using time-series decomposition of monthly Chl-a concentration datasets. The machine learning pipeline includes two tasks: a regression task for predicting Chl-a concentration and a classification task for estimating lake trophic status. Meteorological, land use and land cover, and lake morphometry variables were used as independent variables. Four regression models, i.e., Partial Least Squares Regression (PLSR), Support Vector Machine Regression (SVR), Artificial Neural Network Regression (ANNR), and Random Forest Regression (RFR), were used for the first task of the modeling pipeline, and four classification models, i.e., Logistic Regression Classification (LRC), Support Vector Machine Classification (SVC), Artificial Neural Network Classification (ANNC), and Random Forest Classification (RFC), were used for the second task.
Results indicate that: a) the Collinsville region in the southwestern part of Illinois exhibited a higher mean Chl-a concentration in its lakes than any other region from 1998 to 2018; b) the lakes that showed increasing trends in their monthly mean Chl-a concentrations were also clustered in the southwestern region; c) Random Forest outperformed all other models in both classification (accuracy = 60.06%) and regression (R² = 38.88%); and d) the land use and land cover variables were found to be the most important set of variables in the Random Forest models.
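
As an illustration of the temporal analysis mentioned in the abstract, the following is a minimal sketch of a classical additive time-series decomposition of a monthly series. The abstract does not state which decomposition method the thesis used, so the additive form, the 12-month period, the function name `decompose_monthly`, and the synthetic Chl-a values are all assumptions made for illustration only.

```python
import math


def decompose_monthly(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Hypothetical helper for illustration; assumes an even period (12 months)
    and at least two full periods of data.
    """
    n = len(series)
    half = period // 2
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("need an even period and at least two full periods")

    # Trend: centered 2 x period moving average (half weight on both ends).
    trend = [None] * n
    for t in range(half, n - half):
        interior = sum(series[t - half + 1 : t + half])
        total = 0.5 * series[t - half] + interior + 0.5 * series[t + half]
        trend[t] = total / period

    # Seasonal: average detrended value per calendar position, centered on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    cycle = [v - offset for v in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Residual: what remains where the trend is defined.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic monthly Chl-a values (not thesis data): a slow upward drift
# plus a repeating 12-month cycle.
chl_a = [20.0 + 0.1 * t + 5.0 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, residual = decompose_monthly(chl_a)

# Average monthly change in the trend component over the span where it is defined.
slope = (trend[-7] - trend[6]) / (len(chl_a) - 13)
```

In this sketch, the sign of `slope` gives a rough indication of whether a lake's monthly Chl-a trend is rising or falling once the seasonal cycle has been removed; a formal trend test would be needed before drawing conclusions from real monitoring data.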

Available for download on Wednesday, September 21, 2022




This thesis is available for download only to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.