Free download of the English paper "SemLinker: automating big data integration for casual users" (Springer, 2018)

Persian title

SemLinker: اتوماسیون یکپارچگی کلان داده ها برای کاربران تصادفی

English title

SemLinker: automating big data integration for casual users

Pages in Persian article

Pages in English article

Publication year

2018

Publisher

Springer

English article format

PDF

Product code

E6445

Related fields

Computer science, information technology

Related specializations

Data mining

Journal

Journal of Big Data

University

School of Computer Science and Informatics - Cardiff University - UK

Keywords

Data integration, big data, data lake, modeling, schema evolution, schema mapping, metadata management

Abstract

A data integration approach combines data from different sources and builds a unified view for the users. Big data integration inherently is a complex task, and the existing approaches are either potentially limited or invariably rely on manual inputs and interposition from experts or skilled users. SemLinker, an ontology-based data integration system, is part of a metadata management framework for personal data lake (PDL), a personal store-everything architecture. PDL is for casual and unskilled users, therefore SemLinker adopts an automated data integration workflow to minimize manual input requirements. To support the flat architecture of a lake, SemLinker builds and maintains a schema metadata level without involving any physical transformation of data during integration, preserving the data in their native formats while, at the same time, allowing them to be queried and analyzed. Scalability, heterogeneity, and schema evolution are big data integration challenges that are addressed by SemLinker. Large and real-world datasets of substantial heterogeneities are used in evaluating SemLinker. The results demonstrate and confirm the integration efficiency and robustness of SemLinker, especially regarding its capability in the automatic handling of data heterogeneities and schema evolutions.

Conclusion and future work

We have presented SemLinker, an ontology-based data integration system for PDL and other similar data lake implementations. SemLinker allows casual users with limited technical background to integrate, process, and analyze heterogeneous raw data with minimal effort, through a unified conceptual representation of the data schemas in terms of a widely used global ontology. To the best of our knowledge, SemLinker is the first domain-agnostic integration system that offers self-adapting capabilities to automatically integrate big data with frequently evolving schemas based on solid theoretical foundations. SemLinker has been evaluated on large datasets in multiple domains, and the results not only validate its integration effectiveness and functional efficiency, but also indicate that SemLinker's performance is robust and promising, although there is still room for improvement in multiple aspects of the system.

Although SemLinker is a generic integration solution, it targets only structured and semi-structured data, and it is, by no means, a holistic integration solution when unstructured data such as free-text documents and multimedia files are also considered. For such data we have proposed, in an earlier paper [48], SemCluster, an automatic key phrase extraction tool that specializes in extracting keyphrases from free text documents and annotating each keyphrase with ontology-based metadata. One of our planned immediate undertakings is to combine SemLinker and SemCluster into a broader integration solution towards an effective and efficient metadata management framework for the personal data lake.