Abstract
Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict a stock market index from historical data alone. This high volatility emphasizes the need to assess the role of external factors in stock prediction. Stock markets can be predicted by applying machine learning algorithms to information contained in social media and financial news, as this information can change investors' behavior. In this paper, we apply such algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy for ten subsequent days. To improve the performance and quality of predictions, feature selection and spam tweet reduction are performed on the data sets. Moreover, we perform experiments to identify the stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, deep learning is used and some classifiers are ensembled. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
1 Introduction
Historically, high market prices have often discouraged investors from investing, while low market prices represent an opportunity. Predicting stock market prices is therefore imperative for investors who want to earn a significant profit. Although predicting financial markets and stock movements is onerous [1], researchers from many fields have studied the problem and used many algorithms and different combinations of attributes to predict market movements. However, these algorithms all rely on the stock price itself, which behaves randomly.
5 Conclusion and Future Scope
This research concludes that machine learning algorithms can be used to predict an increase or decrease in stock market performance. It verifies the dependency of BSE on the factors considered in the study. Our findings confirm that BSE depends most strongly on the gold rate, which has the highest correlation factor, and least on the silver rate, which has the lowest. Of all the machine learning algorithms used, AdaBoost shows the highest accuracy: 76.79% with 70% training data and 75% for untrained data. There is still scope for improvement in this project. It can be extended to include additional variables, such as interest policy and political and economic reforms, to obtain more accurate results.
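The two steps summarized above, ranking factors by their correlation with the index and training a boosted classifier on 70% of the data, can be illustrated with a minimal sketch. It is not the implementation used in this study: the data are synthetic, the names `gold`, `silver`, and `index` are hypothetical stand-ins for the study's variables, the chronological 70/30 split is an assumption, and AdaBoost is written from scratch over simple decision stumps using only the Python standard library.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def stump_predict(row, j, t, pol):
    """Decision stump: predict pol if feature j exceeds threshold t, else -pol."""
    return pol if row[j] > t else -pol

def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, t, pol) for a, j, t, pol in model)
    return 1 if score >= 0 else -1

# Synthetic daily changes (assumption: NOT the study's data). The index is
# constructed to depend strongly on gold and only weakly on silver.
rng = random.Random(0)
gold = [rng.gauss(0, 1) for _ in range(200)]
silver = [rng.gauss(0, 1) for _ in range(200)]
index = [0.8 * g + 0.1 * s + rng.gauss(0, 0.3) for g, s in zip(gold, silver)]

correlations = {"gold": pearson(gold, index), "silver": pearson(silver, index)}

X = list(zip(gold, silver))
y = [1 if v > 0 else -1 for v in index]  # +1 = index rose, -1 = index fell
split = int(0.7 * len(X))                # first 70% of the samples for training
model = adaboost_fit(X[:split], y[:split], rounds=10)
hits = sum(adaboost_predict(model, row) == label
           for row, label in zip(X[split:], y[split:]))
accuracy = hits / (len(X) - split)
print(correlations, round(accuracy, 4))
```

Because the synthetic index is built mostly from the gold series, the sketch reports a much higher correlation for gold than for silver by construction. It shows only the mechanics of the analysis and says nothing about the figures reported above.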