Date of Award

9-1-2021

Degree Name

Master of Science

Department

Computer Science

First Advisor

Huang, Chun-Hsi

Abstract

Genotype data, consisting of large numbers of markers, are used in demographic and association studies to identify genes related to specific traits or diseases. Handling these datasets usually takes a significant amount of time when inferring population structure. We therefore propose applying PCA to genotype data and then using clustering algorithms to assign individuals to their subpopulations. We collected both real and simulated datasets in this study. We studied PCA and selected significant features, then applied five different clustering techniques to obtain better results. Furthermore, we studied three different methods for predicting the optimal number of subpopulations in a collected dataset. The results on four simulated datasets and two real human genotype datasets show that our approach performs well in the inference of population structure. NbClust was more effective than the other methods at inferring the number of subpopulations in a population. We also showed that centroid-based clustering algorithms, such as k-means and PAM, perform better than model-based, spectral, and hierarchical clustering algorithms. This approach also has the benefit of being fast and flexible in the inference of population structure.
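The pipeline described in the abstract (PCA on genotype data, followed by centroid-based clustering of the leading components) can be illustrated with a minimal sketch. This is not the thesis implementation: the toy simulation, the function names, the use of NumPy, the choice of two components, and the fixed k = 3 are all assumptions made for illustration. In particular, the number of subpopulations is supplied by hand here, whereas the thesis estimates it with methods such as NbClust.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_markers=200, n_pops=3, seed=0):
    """Toy simulation (an assumption, not the thesis data): each subpopulation
    has its own allele frequencies, and genotypes are 0/1/2 allele counts."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_markers))
    labels = np.repeat(np.arange(n_pops), n_per_pop)
    genotypes = rng.binomial(2, freqs[labels])
    return genotypes.astype(float), labels


def pca_scores(X, n_components):
    """Project individuals onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each marker
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            assign = sq_dists(X, centers).argmin(axis=1)
            # Recompute each center; keep the old one if its cluster is empty.
            new_centers = np.array([
                X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = sq_dists(X, centers)
        assign = d.argmin(axis=1)
        inertia = d.min(axis=1).sum()
        if best is None or inertia < best[0]:
            best = (inertia, assign, centers)
    return best[1], best[2]


genotypes, true_labels = simulate_genotypes()
scores = pca_scores(genotypes, n_components=2)  # assumed number of components
clusters, centers = kmeans(scores, k=3)         # k fixed by hand in this sketch
```

Clustering on a few principal components rather than on all markers is what keeps the distance computations cheap, which is consistent with the speed benefit the abstract claims for this approach.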

Access

This thesis is available for download only to the SIUC community. Current SIUC affiliates may also access it off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.

POPULATION STRUCTURE INFERENCE USING PCA AND CLUSTERING ALGORITHMS