6 Conclusion and future work
Sentiment classification of text data is the starting point for transforming unstructured qualitative data into quantitative data that can be used for decision making in e-commerce. Sentiment classification has therefore attracted a large amount of attention from various research areas, including computational intelligence, machine learning, and computational linguistics. Existing studies, however, have paid little attention to the role of the linguistic properties of the datasets used in sentiment classification and have instead concentrated on proposing more sophisticated and complex algorithms to improve performance. The findings of this study suggest that researchers and practitioners need to consider the properties of their datasets when they choose a sentiment classification algorithm.
The findings also support the contention that appropriately controlling training datasets and choosing algorithms that match those datasets are as important as finding a sophisticated algorithm. In this regard, the study provides practitioners and scholars with guidance on applying different sentiment classification algorithms. The study shows that classification performance can be improved by controlling the data properties of the documents in training datasets.