Free download of the English article "Multi-criteria web mining with DRSA" - Elsevier 2016

Persian title
وب کاوی چند معیاری با DRSA (Multi-criteria web mining with DRSA)
English title
Multi-criteria web mining with DRSA
Persian article pages
0
English article pages
10
Publication year
2016
Publisher
Elsevier
English article format
PDF
Product code
E7073
Fields related to this article
Computer Engineering, Information Technology
Specializations related to this article
Information Technology Management, Software
Journal
Procedia Computer Science
University
System Analyst - Brazilian Development Bank (BNDES) - Av. República do Chile - Brazil
Keywords
Dominance principle, Rough Set Theory, multi-criteria analysis, web mining
Abstract


This study demonstrates the application of the Dominance principle to a particular case of web (World Wide Web) content search under a multi-criteria approach: searching for "Rio de Janeiro" (city and/or state, in Brazil) followed by other attributes (or criteria). Depending on the content of a search carried out through a search engine on the Internet, the result may fall short of what is desired in terms of the quantity and quality of the sites returned. The Dominance principle, applied after treatment of the information collected on the Internet (unstructured data), was used to reveal patterns (or logical rules) in a set of information and showed how a web content search can become more effective over a significant universe of information. Other techniques and tools have also been applied to mining content on the Web, as noted in this study. The Dominance principle associated with Rough Set Theory was chosen as the multi-criteria decision technique because the data may be inaccurate (inconsistent), because these inaccuracies need to be treated from a mathematical perspective when processing an information system (data table), and because the technique does not require a history of the data. The use of Rough Set Theory and the Dominance principle, together with the probabilistic relationship between conditions and decisions in decision algorithms, is justified by the possibility that uncertain data can still yield an essential set of effectively consistent information.

5. Conclusions and recommendations for future work


In the context of this study, the search for "Rio de Janeiro" followed by eight other words, considered "condition criteria", exemplified a case of web content search. Adding condition criteria made it possible to obtain a more effective and more restricted result. Even so, the number of URLs returned is significant (approximately 468,000). How can the search results be made more effective? From the unstructured data returned by the search engine, it became feasible to draw up a table of structured data by counting the citation frequency of each condition criterion in the summary of each referenced URL. A decision class ("information class") was then associated with this table, expanding it into a "decision table". Subsequently, the Dominance principle was applied to the decision table, which allowed "patterns" (or rules) to be extracted and hence information to be added to the "ranking" of URLs. In this case, a "core" of suggested condition criteria highlighted the subset of criteria that are essential to the information system (decision table) under study and that could not be eliminated without a negative impact on the system [8]. Of the 96 relevant URLs suggested by the search engine ("Google"), it is observed that the best positioned URLs do not always return the desired information. For example, the site at URL "21" (www.riodejaneironow.com/cultura.htm), suggested by Rule "1", shows as much information about "Rio de Janeiro" as, or more than, the site at URL "1" (vejario.abril.com.br/materia/eventos/programacao-450-anos-rio). The search engine presents the significant URLs as a "ranking" according to its own criteria; as these cases exemplify, attempting to analyze a considerable mass of unstructured text manually may become costly.
Thus, the logical rules generated from a "decision table" revealed patterns in the set of URLs returned by the search engine. Other tools and decision support techniques nevertheless exist for "web mining", in particular under uncertainty, for example "document clustering" and "web mining soft" [15], [16]; "rough association rules" [17]; and "rough-fuzzy" and "rough-wavelet" [18].
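
The steps summarized in this section (counting criterion frequencies per URL summary, attaching a decision class, and comparing URLs under the Dominance principle) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The criteria, summaries, decision classes, and function names below are hypothetical, and the sketch assumes that a higher citation frequency is always preferred on every criterion. It only builds the decision table and flags pairs that violate dominance; the rough approximations and rule induction of a full dominance-based analysis are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical condition criteria and URL summaries (illustrative data only,
# not taken from the study).
CRITERIA: List[str] = ["culture", "tourism", "events"]

SUMMARIES: Dict[str, str] = {
    "url_a": "Culture and tourism in Rio de Janeiro: culture events all year",
    "url_b": "Tourism tips for Rio de Janeiro",
    "url_c": "Rio de Janeiro events, culture, tourism and more events",
}

# Hypothetical decision classes assigned by an analyst (higher = more informative).
DECISION: Dict[str, int] = {"url_a": 2, "url_b": 2, "url_c": 1}


def frequency_row(summary: str, criteria: List[str]) -> Tuple[int, ...]:
    """Count how often each criterion appears as a whole word in the summary."""
    words = [w.strip(".,:;!?").lower() for w in summary.split()]
    return tuple(words.count(c) for c in criteria)


def dominates(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """x dominates y if x is at least as good as y on every criterion."""
    return all(a >= b for a, b in zip(x, y))


def inconsistencies(
    table: Dict[str, Tuple[int, ...]], decision: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Pairs (x, y) where x dominates y on the conditions but has a worse class."""
    return [
        (x, y)
        for x in table
        for y in table
        if x != y and dominates(table[x], table[y]) and decision[x] < decision[y]
    ]


# Structured table: one row of criterion frequencies per URL.
table = {url: frequency_row(text, CRITERIA) for url, text in SUMMARIES.items()}

print(table["url_a"])                     # (2, 1, 1)
print(inconsistencies(table, DECISION))   # [('url_c', 'url_b')]
```

In this toy data, "url_c" is at least as good as "url_b" on every criterion yet sits in a lower decision class. This is the kind of inconsistent data that the study cites as a reason for choosing the Dominance principle with Rough Set Theory.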

