7. Conclusion and future work
Small startup investment companies have limited funds and therefore cannot afford to trade frequently in the stock market. Instead, they are interested in moderate investment periods that last from one week to three months. To predict the stock price trend over such periods, this paper proposes a novel data-driven system, Xuanwu. The system carries out the entire machine learning pipeline, from generating training samples out of the original transaction data to building the prediction models, without any human intervention. It first uses a sliding window method to cut the historical transaction data of each stock into multiple Clips whose length equals a predefined prediction duration. Then, according to the shapes formed by the close prices of these Clips, it uses an unsupervised heuristic algorithm to classify them into four main classes: Up, Down, Flat, and Unknown. The Clips in classes Up and Down are further classified into levels that reflect the extent of their growth or decline rates with respect to both absolute close price and relative return rate. The training sets are derived from these Clips by sampling from the different classes to handle the imbalanced class distribution. Finally, learning models are trained on these training sets with or without feature selection.
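The first two steps of this pipeline can be illustrated with a minimal sketch. It is not the implementation used in Xuanwu: the function names, the `step` parameter, the `flat_band` threshold, and above all the labelling rule are assumptions made for illustration. The rule below looks only at the net return and the price range of a Clip, whereas the heuristic algorithm in the paper classifies Clips by the shape of their close prices.

```python
from typing import List, Sequence


def cut_into_clips(close_prices: Sequence[float], duration: int,
                   step: int = 1) -> List[List[float]]:
    """Slide a window of `duration` points over the series.

    Each window is one Clip. `step` (hypothetical parameter) is the
    offset between the starts of consecutive Clips.
    """
    if duration < 2:
        raise ValueError("duration must be at least 2")
    if step < 1:
        raise ValueError("step must be at least 1")
    last_start = len(close_prices) - duration
    return [
        list(close_prices[start:start + duration])
        for start in range(0, last_start + 1, step)
    ]


def label_clip(clip: Sequence[float], flat_band: float = 0.02) -> str:
    """Assign one of the four main classes to a Clip.

    ASSUMPTION: this is a placeholder rule, not the paper's heuristic.
    It compares the relative return of the Clip with a hypothetical
    threshold `flat_band`, and marks a Clip as Unknown when it swings
    widely but ends close to where it started.
    """
    first, last = clip[0], clip[-1]
    if first <= 0:
        return "Unknown"  # relative return is undefined
    relative_return = (last - first) / first
    if relative_return > flat_band:
        return "Up"
    if relative_return < -flat_band:
        return "Down"
    # Small net change: Flat only if the Clip also stayed in a narrow range.
    relative_range = (max(clip) - min(clip)) / first
    return "Flat" if relative_range <= 2 * flat_band else "Unknown"


if __name__ == "__main__":
    prices = [100.0, 105.0, 110.0, 104.0, 98.0, 99.0, 100.0]
    for clip in cut_into_clips(prices, duration=3):
        print(clip, label_clip(clip))
```

In this sketch, the further division of Up and Down Clips into levels and the class-balanced sampling step are left out. Both would operate on the labelled Clips returned above.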