9. Conclusion
Practical data sets often arrive without demographic background. Such data usually receive only simple calculations, so much of the information they contain goes unused. For such anonymous data, we conclude that the RFM model on Spark is a viable methodology for exploring relationships among users through time series analysis, while MCA can prune the results of the RFM model and build interaction relationships among multiple characteristics.
We proposed a statistics-based approach to valuing latent users in a large-scale data set by segmenting the RFM time intervals as a time series. Using time series analysis, we explored user relationships over consecutive time intervals, and we used the Spark platform to target users and discover quantitative relationships in the RFM model. After adjustment with the k-means method, clustering on the three dimensions of the RFM model performed better than clustering on each interval. We leveraged MCA to associate multiple qualitative characteristics with the quantitative results of the RFM model.
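The RFM-then-cluster step summarized above can be illustrated with a minimal sketch. It is written in plain Python rather than on Spark so that it stays self-contained, and it is not the implementation evaluated in this paper. The record layout `(user, time, amount)`, the helper names `rfm_table`, `min_max_scale`, and `kmeans`, the min-max scaling step, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def rfm_table(records, now):
    """Aggregate (user, time, amount) records into per-user (R, F, M)."""
    last_seen, freq, money = {}, defaultdict(int), defaultdict(float)
    for user, t, amount in records:
        last_seen[user] = max(last_seen.get(user, t), t)
        freq[user] += 1
        money[user] += amount
    # Recency is the elapsed time since the user's latest event.
    return {u: (now - last_seen[u], freq[u], money[u]) for u in last_seen}


def min_max_scale(points):
    """Rescale each dimension to [0, 1] so no single RFM axis dominates."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


# Toy records (hypothetical): user id, event time, amount.
records = [
    ("u1", 1, 5.0), ("u1", 9, 7.0), ("u1", 10, 6.0),
    ("u2", 2, 1.0),
    ("u3", 8, 9.0), ("u3", 10, 8.0), ("u3", 10, 4.0),
    ("u4", 1, 2.0),
]
table = rfm_table(records, now=10)
users = sorted(table)
labels = kmeans(min_max_scale([table[u] for u in users]), k=2)
segments = dict(zip(users, labels))
```

On these toy records, the recent and frequent users `u1` and `u3` fall into one cluster and the inactive users `u2` and `u4` into the other. Clustering on all three scaled dimensions at once, as here, corresponds to the three-dimensional clustering described above; a distributed version would replace the helpers with the equivalent Spark aggregations and clustering routine.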
We are currently working on a prototype that demonstrates trends in user behavior. In the future, we will extend the time range of the telecom service data to one week to obtain more comprehensive analysis results. Furthermore, we can build a prediction model and a recommender model to improve our CRM analysis.