7 Conclusions
With the ubiquity of the social web, there has been explosive growth in user-contributed comments. At the same time, there is growing concern about the widespread presence of social spam embedded in these comments. Given the large volume of user-contributed comments on SMSs, there is a pressing need for novel methodologies and techniques to tackle social spam.
Previous studies have used various features (e.g., user-, text-, graph-, and social network-related attributes) and classification algorithms (e.g., Naïve Bayes and Bayesian networks) to design frameworks for detecting social spam on SMSs (e.g., Facebook, Twitter, Sina Weibo, Myspace, YouTube, and Flickr). However, to the best of our knowledge, no previous study has exploited both probabilistic topic modeling and incremental learning to detect social spam on SMSs. The main contributions of our research are therefore the design and evaluation of a novel social spam detection methodology underpinned by the L-LDA model and incremental learning. More specifically, we exploit word-, topic-, and user-based features to better represent social spam, and we leverage incremental classifiers, such as SVM, logistic regression, perceptron, and ROMMA, to enhance spam detection performance. On several million user comments posted to YouTube, our experimental results show that the proposed methodology achieves an average accuracy of 91.17% and an average F1-measure of 78.43%. According to our paired t-tests, topic-based features improve overall accuracy and precision but may hurt the recall of spam detection. In contrast, user-based features enhance the recall of spam detection but may hurt precision.
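To make the idea of incremental learning over combined features more concrete, the following is a minimal sketch, not the authors' implementation, of a perceptron (one of the incremental classifiers named above) that is updated one labelled comment at a time. Each comment is represented as a sparse mix of word-, topic-, and user-based features. The feature-naming scheme, the `comment_features` helper, the toy topic distributions (standing in for L-LDA output), and the `links_ratio` user attribute are all hypothetical illustrations.

```python
from typing import Dict

Features = Dict[str, float]


class OnlinePerceptron:
    """Binary perceptron updated incrementally (+1 = spam, -1 = ham)."""

    def __init__(self, learning_rate: float = 1.0) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, x: Features) -> float:
        # Linear score w.x + b over sparse features; unseen features weigh 0.
        return sum(self.weights.get(f, 0.0) * v for f, v in x.items()) + self.bias

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0.0 else -1

    def partial_fit(self, x: Features, y: int) -> bool:
        """Update on a single example; return True if it was misclassified."""
        if y * self.score(x) <= 0.0:
            for f, v in x.items():
                self.weights[f] = self.weights.get(f, 0.0) + self.learning_rate * y * v
            self.bias += self.learning_rate * y
            return True
        return False


def comment_features(text: str, topic_dist: Dict[int, float],
                     user: Dict[str, float]) -> Features:
    """Hypothetical feature builder: word counts + topic proportions + user attributes."""
    x: Features = {}
    for tok in text.lower().split():
        x["w=" + tok] = x.get("w=" + tok, 0.0) + 1.0
    for k, p in topic_dist.items():
        x[f"topic={k}"] = p
    for name, val in user.items():
        x["user=" + name] = val
    return x


# Toy stream of labelled comments (illustrative values, not data from the study).
stream = [
    (comment_features("free iphone click my channel", {0: 0.9, 1: 0.1},
                      {"links_ratio": 1.0}), 1),
    (comment_features("great song love the chorus", {0: 0.1, 1: 0.9},
                      {"links_ratio": 0.0}), -1),
]

model = OnlinePerceptron()
for _ in range(5):  # a few passes over the stream; new comments can be added at any time
    for x, y in stream:
        model.partial_fit(x, y)
```

The design point is that `partial_fit` touches only the weights of the features present in the current comment, so the model can absorb newly posted comments without retraining on the full history. Margin-based variants such as ROMMA or online SVMs change the update rule but keep this one-example-at-a-time interface.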