Free download of the English article Mining of Bilingual Indian Web Documents - Elsevier 2016

Persian title

کاوش مدارک وب هندی دو زبانه

English title

Mining of Bilingual Indian Web Documents

Persian article pages

English article pages

Year of publication

2016

Publisher

Elsevier

English article format

PDF

Product code

E7072

Fields related to this article

Computer Engineering

Specializations related to this article

Information Technology Management, Software

Journal

Procedia Computer Science

University

Chirala Engineering College - Chirala - India

Keywords

Attribute; bilingual; classification; content extraction; mining; pixel-based approach; voxel

Abstract

Web and mobile communication are growing in popularity both globally and regionally, supporting many forms of information dissemination and producing complex web documents with embedded script, language and media content. Information extraction from such web documents is therefore becoming a real challenge, as one has to handle format and script variations in documented form and media variations in soft-web form. This is especially relevant to the Indian education scenario, where bilingual and multilingual communication and web documents delivered through online courses are considered. Regional native dialects add another dimension of complexity. The present paper focuses on content extraction from such documents through a generic pixel-based approach, with mining through classification. Indian bilingual web documents are considered, and attributes are generated by reducing the pixel matrix. Five different attributes were identified and studied. A clear state-of-the-art comparison between the trained dataset and the test dataset is given. The results show reasonable content extraction with good accuracy on the datasets studied.

5. Conclusions

Careful observation of modern online and offline web pages and files revealed an urgent need for a generic and naïve strategy to handle documents that are structured, semi-structured, unstructured, hybrid or heterogeneous, and that have multi-tasking and multilingual features. A method that uses pixel-map manipulation to extract content from Indian regional web documents was therefore developed. This method was tested with words from other Indian and foreign native languages to form a more elaborate base set. To assess the similarity between trained and tested datasets, a number of new datasets with new words were identified and tested using the present algorithm. Further analysis of new strategies and algorithms is in progress. A detailed state-of-the-art analysis can be done with neural networks [15] and cluster analysis. A comparison of statistical, neural and pattern-matching algorithms will give a better analysis of this generic approach.
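
The general idea of reducing a pixel map to a few attributes and then classifying by similarity to trained samples can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: this page does not list the paper's five attributes or its classifier, so the attributes below (ink density, aspect ratio, centroid position, upper-half ink fraction), the nearest-neighbour rule, and all function and label names are hypothetical stand-ins rather than the authors' method.

```python
from math import dist


def reduce_pixel_matrix(pixels):
    """Reduce a binary pixel matrix (rows of 0/1) to five numeric attributes.

    The attributes are hypothetical examples, not those used in the paper.
    """
    ink = [(r, c) for r, row in enumerate(pixels)
           for c, value in enumerate(row) if value]
    if not ink:
        return (0.0, 0.0, 0.0, 0.0, 0.0)

    # Bounding box of the inked pixels.
    top = min(r for r, _ in ink)
    bottom = max(r for r, _ in ink)
    left = min(c for _, c in ink)
    right = max(c for _, c in ink)
    height = bottom - top + 1
    width = right - left + 1

    density = len(ink) / (height * width)   # share of the box that is inked
    aspect = width / height                 # box width relative to height
    # Centroid offset inside the bounding box, scaled by the box size.
    centroid_x = (sum(c for _, c in ink) / len(ink) - left) / width
    centroid_y = (sum(r for r, _ in ink) / len(ink) - top) / height
    # Share of inked pixels lying in the upper half of the box.
    upper = sum(1 for r, _ in ink if r - top < height / 2) / len(ink)
    return (density, aspect, centroid_x, centroid_y, upper)


def classify(attributes, training):
    """Return the label of the closest trained attribute vector.

    `training` is a list of (attribute_vector, label) pairs; plain
    Euclidean nearest neighbour is an assumed, simple classifier.
    """
    return min(training, key=lambda item: dist(item[0], attributes))[1]


# Toy glyphs standing in for word images (illustrative data only).
VERTICAL = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
HORIZONTAL = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]
TRAINING = [
    (reduce_pixel_matrix(VERTICAL), "vertical_stroke"),
    (reduce_pixel_matrix(HORIZONTAL), "horizontal_stroke"),
]

# An unseen glyph: a taller vertical stroke in a different position.
UNSEEN = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]
predicted = classify(reduce_pixel_matrix(UNSEEN), TRAINING)
print(predicted)
```

Because the attributes are computed relative to the bounding box of the inked pixels, the sketch is insensitive to where a glyph sits in the image, which is one plausible reason to work on a reduced pixel matrix rather than on raw pixels.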