Date of Award


Degree Name

Master of Science


Plant and Soil Science

First Advisor

Choudhary, Ruplal


There is a growing need to automate routine tasks, such as sorting agricultural produce, in large-scale post-harvest processing. Among the sensors used for such automation, near-infrared (NIR) technology provides a rapid and effective means of quantitatively analyzing quality indices in food products. As industries and farms adopt modern data-driven technologies, the modelling tools need to be evaluated to find the optimal solution for a given problem. This study aims to understand the process of evaluating modelling tools, using near-infrared data obtained from green leafy vegetables. The first part of the study deals with predicting the type of leafy green vegetable from near-infrared reflectance spectra taken non-destructively from the leaf surface. The supervised classification methods used for this task were k-nearest neighbors (KNN), support vector machines (SVM), linear discriminant analysis (LDA), regularized discriminant analysis (RDA), naïve Bayes, bagged trees, random forests, and an ensemble discriminant subspace classifier. The second part of the study deals with predicting total glucosinolate and total polyphenol contents in leaves using Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR). The optimal combination of predictors was chosen using recursive feature elimination. NIR spectra taken from 283 different samples were used for the classification task. Accuracy rates of the tuned classifiers were compared on a standard test set. The ensemble discriminant subspace classifier yielded the highest accuracy rate (89.41%) on the standard test set. Classifiers were also compared in terms of accuracy rates and F1 scores. Learning rates of the classifiers were compared using cross-validation accuracy rates for different proportions of the dataset.
The ensemble discriminant subspace classifier, SVM, LDA, and KNN were found to have similar cross-validation accuracy rates for different proportions of the data. NIR spectra and reference values for total polyphenol content and total glucosinolate content were taken from 40 samples for each analysis. The PLSR model for total glucosinolate prediction, built with spectra treated with a Savitzky-Golay second derivative, yielded an RMSECV of 0.67 μmol/g of fresh weight and a cross-validation R² of 0.63. Similarly, the PLSR model for total polyphenol prediction, built with spectra treated with a Savitzky-Golay first derivative, yielded an RMSECV of 6.56 mg gallic acid equivalents (GAE)/100 g of fresh weight and a cross-validation R² of 0.74. Feature selection for total polyphenol prediction suggested that the NIR region between 1300 and 1600 nm might contain important information about the total polyphenol content of green leaves.
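The two regression figures of merit reported above, RMSECV and cross-validation R², can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the thesis: the abstract does not state which cross-validation scheme was used, so leave-one-out is chosen for simplicity; a one-predictor least-squares line stands in for PLSR to keep the example dependency-free; and all names and numbers (`spectral_feature`, `reference_value`, and the helper functions) are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        # Hold out sample i, fit on the rest, then predict the held-out sample.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds


def rmsecv(y, preds):
    """Root mean square error of the cross-validated predictions."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))


def r2_cv(y, preds):
    """1 - SS_res / SS_tot, computed from the cross-validated predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, NOT data from the thesis.
spectral_feature = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
reference_value = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

cv_preds = loo_predictions(spectral_feature, reference_value)
print(f"RMSECV = {rmsecv(reference_value, cv_preds):.3f}")
print(f"R2 (cross-validation) = {r2_cv(reference_value, cv_preds):.3f}")
```

Because every prediction comes from a model that never saw the held-out sample, RMSECV is expressed in the units of the reference measurement and is usually larger than the calibration error on the same data.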




This thesis is only available for download to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.