7. Conclusion
In this paper, a speech emotion recognition framework covering the stages from feature extraction to emotion classification was proposed, motivated by the difficulty of applying speech emotion recognition in human-computer interaction. Within this framework, a feature selection method based on correlation analysis and the Fisher criterion and an ELM decision tree recognition method based on the confusion degree among basic emotions were proposed. The validity of the proposal was verified through a series of contrast experiments comprising four groups of experiments under different experimental conditions. Comparisons of recognition rate and recognition time under these conditions showed that the ELM is the more suitable classifier for the decision tree algorithm. The utility of feature selection based on correlation analysis and the Fisher criterion was confirmed by comparing recognition rates across different emotional feature sets.
In future research, several aspects of this work will be improved. First, novel speech emotional features that carry sufficient emotional information, such as Teager energy operator features [44], will be extracted, and other feature extraction methods, e.g., deep learning [45], will be adopted. Second, we will study feature selection based on evolutionary computation, by which the emotion label information in the feature set can be fully utilized. Third, different speech databases will be considered to verify the practicability of the proposed method.
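As a brief illustration of the Teager energy operator mentioned above, its commonly used discrete form is Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The following is a minimal sketch under that definition only; it is not the feature extraction procedure of [44], and the function name `teager_energy` and the sample signal are hypothetical choices made for illustration.

```python
def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumption: `signal` is a plain sequence of numeric samples. The first
    and last samples are skipped because they lack a neighbour on one side.
    """
    if len(signal) < 3:
        return []
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


# Hypothetical example: one period of sin(pi * n / 2) sampled at n = 0..4.
samples = [0, 1, 0, -1, 0]
print(teager_energy(samples))  # → [1, 1, 1]
```

For a pure sinusoid A sin(Omega n), the operator yields the constant A^2 sin^2(Omega), so it tracks both amplitude and frequency. In practice, statistics of this output computed per frame would still need to be defined before it could serve as an emotional feature.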