Date of Award


Degree Name

Master of Science


Computer Science

First Advisor

Rahimi, Shahram


In recent decades, the rapid development of information technology in the big data field has created new opportunities to explore the large amount of data available online. The Global Database of Events, Language, and Tone (GDELT) is the largest, most comprehensive, and highest-resolution open-source database of human society. It includes more than 440 million entries capturing information about events covered by local, national, and international news sources since 1979 in over 100 languages. GDELT constructs a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event into a single massive network that captures, every single day, what is happening around the world, what its context is, who is involved, and how the world is feeling about it.

Stock market prediction has also long been an attractive topic and has been studied extensively by researchers in different fields. Numerous studies examine the correlation between stock market fluctuations and data sources derived either from the historical data of major world stock indices or from external information such as social media and news. Support Vector Machine (SVM) and Logistic Regression are two of the most widely used machine learning techniques in recent studies. The main objective of this research project is to investigate the value of information derived from the GDELT project for improving the accuracy of stock market trend prediction, specifically for the next day's price changes.

This research is based on data sets of events from the GDELT database and on daily prices of Bitcoin and of several stock market companies and indices from Yahoo Finance, all from March 2015 to May 2017. Multiple machine learning algorithms, specifically classification algorithms, are applied to the generated data sets, first using only features derived from historical market prices and then adding features derived from an external source, in this case GDELT. The performance of each model is then evaluated over a range of parameters. The experimental results show that using information gained from GDELT has a direct positive impact on prediction accuracy.

Keywords: Machine Learning, Stock Market, GDELT, Big Data, Data Mining
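The two-stage setup described in the abstract (price-only features first, then price features plus an external feature) can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the function names, the choice of lagged daily returns as price features, the single `event_tone` value per day standing in for a GDELT-derived feature, and all sample numbers are assumptions made purely for illustration.

```python
from datetime import date


def daily_returns(closes):
    """Simple return between consecutive closes: (p_next - p_prev) / p_prev."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def build_dataset(dates, closes, event_tone=None, n_lags=2):
    """Build (rows, labels) for next-day direction classification.

    Each row describes one day t and holds the last `n_lags` daily returns
    realised up to and including day t. If `event_tone` (a hypothetical
    dict mapping date -> float) is given, its value for day t is appended
    as one extra feature, with 0.0 used when a day is missing.
    The label is 1 if the close on day t+1 is above the close on day t,
    otherwise 0.
    """
    rets = daily_returns(closes)  # rets[i] is the return realised on day i+1
    rows, labels = [], []
    # Start at n_lags so enough history exists; stop one day early so that
    # the next-day close needed for the label is available.
    for t in range(n_lags, len(closes) - 1):
        features = rets[t - n_lags:t]
        if event_tone is not None:
            features = features + [event_tone.get(dates[t], 0.0)]
        rows.append(features)
        labels.append(1 if closes[t + 1] > closes[t] else 0)
    return rows, labels


# Illustrative, made-up values (not real market or GDELT data).
DATES = [date(2015, 3, d) for d in (2, 3, 4, 5, 6, 9)]
CLOSES = [100.0, 102.0, 101.0, 103.0, 104.0, 102.0]
TONE = {DATES[2]: -1.5, DATES[3]: 0.4, DATES[4]: -2.1}

# Stage 1: price-derived features only.
price_rows, price_labels = build_dataset(DATES, CLOSES)
# Stage 2: the same rows extended with the external feature.
full_rows, full_labels = build_dataset(DATES, CLOSES, event_tone=TONE)

print(price_rows, price_labels)
print(full_rows, full_labels)
```

Both data sets share the same labels, so any classifier (for example logistic regression or an SVM) could be trained on each in turn and the resulting accuracies compared directly. The external feature for day t is paired with the price move from day t to day t+1, so no information from the predicted day leaks into the features.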




This thesis is available for download only to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.