This study uses text and data mining to investigate the relationship between the text patterns of annual reports published by US listed companies and their sales performance. Building on previous research, we start from the premise that, although annual reports present only past and present financial information, analyzing their text can identify sentences or patterns that indicate a company's future business performance. As a first step, we examine the relationship between business risk factors and current business performance. For this purpose, we select companies belonging to two categories of the US SIC (Standard Industrial Classification) in the IT sector, 7370 and 7373, which include Twitter, Facebook, Google, Yahoo, etc. We manually collect sales and business risk information for a total of 54 companies in these two categories that submitted an annual report (Form 10-K) in each of the last three years. To establish a correlation between text patterns and sales performance, we set and test four hypotheses. To verify the hypotheses, we use statistical analysis of sales, statistical analysis of text sentences, sentiment analysis of sentences, clustering, dendrogram visualization, keyword extraction, and word-cloud visualization. The results show that text length has some correlation with sales performance, and that patterns of frequently appearing words are correlated with sales performance. However, sentiment analysis indicates that the positive or negative tone of a report is not related to sales performance.
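The check relating text length to sales performance can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the sentence splitter is a rough regular-expression heuristic, the Pearson coefficient is a stand-in for whatever statistic was actually applied, and the function names `count_sentences` and `pearson` are hypothetical.

```python
import math
import re
from typing import Sequence


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting on ., ! or ? followed by whitespace or end of text.

    This is a heuristic: abbreviations such as "U.S." will be over-counted.
    """
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Given one risk-factor text and one sales figure per company, the sentence counts from `count_sentences` and the sales figures can be passed to `pearson`; a coefficient near zero would suggest no linear relationship.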
In this study, we apply text mining to the annual reports of US companies. The aim is to investigate whether word patterns found in selected texts are related to the business performance of the company. We test four hypotheses. Hypothesis 1 postulates that category 7370 companies, which include a large number of companies engaged in SNS activities, have better business performance than category 7373 companies, such as Yahoo. Hypothesis 1 is verified. Hypotheses 2 through 4 are analyzed by applying text and data-mining techniques to the risk factors sections of annual reports. Hypothesis 2 postulates that sales performance affects text statistics such as the number of sentences. There is some evidence of correlation between sales performance and text statistics; however, further research is required. Hypothesis 3 postulates that the tone of the text correlates with sales performance. Sentiment analysis finds no such correlation and, thus, hypothesis 3 is rejected. Hypothesis 4 postulates that word usage in the text is correlated with sales performance, and the hypothesis is provisionally accepted. In summary, by applying text-mining technology we identify a number of correlations between sales performance and the text patterns of company reports. Several themes remain for future study. For hypothesis 2, better results can be expected by changing the data-processing method. Hypothesis 3 is rejected in this instance, but better results may be obtained if the classification method used for sentiment analysis is optimized for the text of annual reports. The analysis framework for hypothesis 4 needs to be redesigned to cover all the data, not only the three highest-ranking and three lowest-ranking companies in terms of sales performance. Also, if key phrases are extracted rather than word counts, the results may be more meaningful. This study provides the following conclusions.
Companies with good financial performance and companies with poor financial performance often use different words.
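One simple way to surface such differences in word usage is to compare relative word frequencies between two groups of texts. The sketch below is an illustrative assumption, not the keyword-extraction method used in this study: the stopword list is a small placeholder and the function names `tokenize` and `distinctive_words` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Small placeholder stopword list; a real analysis would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "or", "our", "the", "to", "we"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def distinctive_words(group_a: Iterable[str], group_b: Iterable[str], top_n: int = 5) -> List[str]:
    """Return the words whose relative frequency is highest in group_a compared with group_b."""
    counts_a = Counter(w for doc in group_a for w in tokenize(doc))
    counts_b = Counter(w for doc in group_b for w in tokenize(doc))
    total_a = sum(counts_a.values()) or 1  # avoid division by zero for empty groups
    total_b = sum(counts_b.values()) or 1
    vocab = set(counts_a) | set(counts_b)
    diff = {w: counts_a[w] / total_a - counts_b[w] / total_b for w in vocab}
    # Sort by descending difference, breaking ties alphabetically for determinism.
    ranked = sorted(diff, key=lambda w: (-diff[w], w))
    return ranked[:top_n]
```

Calling `distinctive_words(high_sales_texts, low_sales_texts)` and then swapping the arguments gives the words that most distinguish each group. Ranking by frequency difference is only one option; key phrases or TF-IDF weights would be natural alternatives.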