Date of Award


Degree Name

Doctor of Philosophy


Department

Computer Science

First Advisor

Che, Dunren


Massive open online courses (MOOCs) have created a pressing need for an efficient, automated approach to identifying keyphrases in MOOC video lectures. Because MOOC content is structured and navigated linearly, learners have difficulty identifying the main knowledge addressed in MOOC video lectures and spend too much time navigating among them to find content that matches their learning goals. A feasible solution is to automatically provide keyphrases associated with MOOC video lectures, which can help learners quickly identify suitable knowledge and navigate efficiently to the desired parts of a lecture, expediting their learning process. Keyphrases in MOOCs exhibit three distinctive features: (1) low-frequency occurrence, (2) advanced scientific or technical concepts, and (3) late occurrence. Existing approaches to automatic keyphrase extraction, whether supervised or unsupervised, do not consider these features and therefore perform unsatisfactorily when used to extract keyphrases from MOOC video lectures. In this dissertation, we propose $SemKeyphrase$, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. $SemKeyphrase$ incorporates a new semantic relatedness method and a ranking algorithm called $PhraseRank$. The proposed semantic relatedness method incorporates a novel metric that combines two scores ($WSem$ and $CSem$) to efficiently compute the semantic relatedness between candidate keyphrases in MOOCs. The $PhraseRank$ algorithm ranks candidate keyphrases in two phases: ranking clusters and reranking top candidate keyphrases. The first phase of $PhraseRank$ leverages the semantic relatedness of candidate keyphrases to the subtopics of a MOOC video lecture to measure the importance of each candidate keyphrase; this importance is then used to rank clusters of candidate keyphrases. Top candidate keyphrases from the top-ranked clusters are then chosen by a proposed selection strategy. The second phase of $PhraseRank$ reranks the top candidate keyphrases using a new ranking criterion and outputs the ranked top-K keyphrases. Experimental results on a real-world dataset of MOOC video lectures show that $SemKeyphrase$ outperforms other state-of-the-art methods.
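The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not define $WSem$, $CSem$, the selection strategy, or the reranking criterion, so everything below is a hypothetical stand-in. In particular, the weighted mix in `combined_relatedness`, the use of mean importance to rank clusters, the `per_cluster` cutoff, and the precomputed `rerank_score` values are all assumptions made only to show the shape of the pipeline.

```python
from typing import Dict, List, Sequence


def combined_relatedness(w_sem: float, c_sem: float, alpha: float = 0.5) -> float:
    """Hypothetical combination of two relatedness scores.

    The abstract only says that WSem and CSem are combined; the convex
    mix with weight `alpha` is an assumption for illustration.
    """
    return alpha * w_sem + (1.0 - alpha) * c_sem


def phrase_rank_sketch(
    clusters: Sequence[Sequence[str]],
    importance: Dict[str, float],
    rerank_score: Dict[str, float],
    per_cluster: int = 2,
    top_k: int = 5,
) -> List[str]:
    """Two-phase ranking sketch: rank clusters, select, then rerank."""
    # Phase 1: rank clusters by the mean importance of their candidates
    # (assumed aggregate; empty clusters are skipped).
    non_empty = [c for c in clusters if c]
    ranked_clusters = sorted(
        non_empty,
        key=lambda c: sum(importance[p] for p in c) / len(c),
        reverse=True,
    )

    # Selection (assumed strategy): keep the `per_cluster` most important
    # candidates from each cluster, visiting clusters in ranked order.
    selected: List[str] = []
    for cluster in ranked_clusters:
        best = sorted(cluster, key=lambda p: importance[p], reverse=True)
        selected.extend(best[:per_cluster])

    # Phase 2: rerank the selected candidates with a second criterion
    # (supplied here as precomputed scores) and keep the top K.
    reranked = sorted(selected, key=lambda p: rerank_score[p], reverse=True)
    return reranked[:top_k]


if __name__ == "__main__":
    clusters = [["gradient descent", "learning rate"], ["video", "slide"]]
    importance = {"gradient descent": 0.9, "learning rate": 0.7,
                  "video": 0.2, "slide": 0.1}
    rerank_score = {"gradient descent": 0.8, "learning rate": 0.9,
                    "video": 0.3, "slide": 0.1}
    print(phrase_rank_sketch(clusters, importance, rerank_score, top_k=3))
```

Keeping the importance scores and the reranking scores as separate inputs mirrors the abstract's distinction between the first-phase importance measure and the second-phase ranking criterion, without committing to how either is actually computed.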




This dissertation is available for download only to the SIUC community. Current SIUC affiliates may also access it off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.