Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Potter-McIntyre, Sally


Machine learning algorithms can be used to analyze large datasets and to identify relationships and patterns that might otherwise be missed by more traditional scientific and statistical approaches. The aim of this study is to evaluate the ability of machine learning algorithms to classify mineral systems and to provide insights into the geological processes operating on Earth. The study examines the potential of these algorithms as interpretive tools for identifying geological processes, and additional approaches are implemented to predict how geological processes may have evolved at tourmaline-bearing localities in the United States. Tourmaline mineral occurrence data for localities in the United States were retrieved from mineral databases, and exploratory machine learning algorithms, such as market basket analysis and hierarchical clustering, were used to identify geological and geochemical processes. Common geological processes operating in sedimentary, igneous, metamorphic, and hydrothermal systems were all identified from the presence of diagnostic mineral assemblages, such as actinolite-wollastonite-dravite in metamorphic rocks or microcline-schorl-beryl in igneous deposits. Several iterations of supervised machine learning algorithms were then run, with models incorporating different combinations of mineral occurrence data, environmental data, and geological process labels, in order to predict the geologic evolution of tourmaline-bearing localities. A test dataset was generated by randomly selecting locations within the United States and assigning mineralogy to each site by interpolation. Decision tree and random forest algorithms were then both used to classify this randomly generated test dataset. Cross-validation suggests that the decision trees performed better when classifying the test dataset.
The results discussed throughout this study highlight how machine learning algorithms can be effective and accurate supplementary tools for characterizing tourmaline-bearing deposits. The models discussed in this paper classified different geological processes with accuracies of roughly 90% or higher, and they predicted how geological processes evolved at different tourmaline-bearing localities with an estimated accuracy of about 70%. The most accurate classifications were obtained for deposits that were subjected to higher temperatures and pressures, conditions that generate more distinct mineralogies and so allow machine learning algorithms to identify patterns with greater confidence. Classifiers for tourmaline localities associated with low-temperature hydrothermal and sedimentary environments are much more error-prone, which can be attributed both to a lack of tourmaline-bearing sedimentary deposits in mineral databases and to the fact that sedimentary deposits can record processes from multiple geologic environments that may or may not be related. The strengths and limitations of the trained models are detailed throughout this paper.
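
As a rough illustration of the market basket analysis mentioned in the abstract, the sketch below scores pairwise mineral associations by support, confidence, and lift. It is not the dissertation's implementation: the locality lists are hypothetical toy data built around the two assemblages named above, and the function name `pair_rules` and the `min_support` threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the minerals reported at one locality.
localities = [
    {"microcline", "schorl", "beryl", "quartz"},
    {"microcline", "schorl", "beryl"},
    {"microcline", "schorl", "quartz"},
    {"actinolite", "wollastonite", "dravite"},
    {"actinolite", "wollastonite", "dravite", "quartz"},
    {"dravite", "quartz"},
]


def pair_rules(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for mineral pairs.

    support    = fraction of localities containing both minerals
    confidence = P(rhs present | lhs present)
    lift       = confidence / P(rhs present); values above 1 mean the
                 pair co-occurs more often than independence would predict
    """
    n = len(transactions)
    item_counts = Counter(m for t in transactions for m in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs seen at too few localities
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


rules = pair_rules(localities)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

In this toy data, a mineral found at most localities (quartz) yields low-lift rules, while minerals restricted to one assemblage yield high-lift rules, which is the sense in which an assemblage can be diagnostic of a process.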




This dissertation is only available for download to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.