Date of Award


Degree Name

Master of Science


Computer Science

First Advisor

Carver, Dr. Norman


ABSTRACT

MAJID MEMARI, for the Master of Science degree in Computer Science, presented on November 3rd, 2017, at Southern Illinois University, Carbondale, IL.

Title: PREDICTING THE STOCK MARKET USING NEWS SENTIMENT ANALYSIS

Major Professor: Dr. Norman Carver

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. GDELT is the largest, most comprehensive, and highest-resolution open database ever created. It is a platform that monitors the world's news media from nearly every corner of every country, in print, broadcast, and web formats and in over 100 languages, every moment of every day; its archive stretches back to January 1st, 1979, and is updated daily [1].

Stock market prediction is the act of trying to determine the future value of a company stock or other financial instrument traded on an exchange. The successful prediction of a stock's future price could yield significant profit. The efficient-market hypothesis holds that stock prices reflect all currently available information, so any price changes that are not based on newly revealed information are inherently unpredictable [2]. Other studies, however, report that the market is predictable. Stock market prediction has long been an attractive topic and has been studied extensively by researchers in different fields, with numerous studies of the correlation between stock market fluctuations and data sources derived either from the historical data of major world stock indices or from external information such as social media and news [6].

The main objective of this research is to investigate the accuracy of predicting the unseen prices of the Dow Jones Industrial Average using information derived from the GDELT database. The Dow Jones Industrial Average (DJIA) is a stock market index, one of several indices created by Wall Street Journal editor and Dow Jones & Company co-founder Charles Dow.
This research is based on data sets of events from the GDELT database and daily prices of the DJI from Yahoo Finance, all from March 2015 to October 2017. First, several machine learning classification models are applied to the generated data sets; several ensemble methods are then applied as well. In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. The performance of each model is then evaluated using its optimized parameters. The experimental results show that ensemble methods significantly improve prediction accuracy.

Keywords: Big Data, GDELT, Stock Market, Prediction, Dow Jones Index, Machine Learning, Ensemble Methods
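
The idea behind ensemble methods described in the abstract can be illustrated with a minimal sketch of hard (majority) voting. This is not the thesis's implementation: the abstract does not say which ensemble methods were used, and the labels, the three "models", and their predictions below are hypothetical toy values chosen so that each model errs on different days.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several models' label lists into one list by hard voting."""
    combined = []
    # zip(*predictions) groups the votes that all models cast for the same day.
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: 1 = index closed up, 0 = index closed down.
actual = [1, 0, 1, 1, 0, 1, 0, 0]

# Three hypothetical base classifiers, each wrong on two different days.
model_a = [1, 0, 1, 1, 0, 0, 1, 0]
model_b = [1, 1, 1, 0, 0, 1, 0, 0]
model_c = [0, 0, 1, 1, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c)]:
    print(f"model {name}: {accuracy(preds, actual):.2f}")
print(f"ensemble: {accuracy(ensemble, actual):.2f}")
```

In this constructed case each base model is right on 6 of 8 days while the vote is right on all 8, because no two models are wrong on the same day. With real classifiers the gain depends on how uncorrelated their errors are, so the toy numbers say nothing about the accuracy reported in the thesis.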




This thesis is available for download only to the SIUC community. Current SIUC affiliates may also access it off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.