Free download of the English article SemLinker: automating big data integration for casual users - Springer 2018

Persian title
SemLinker: Automating big data integration for casual users
English title
SemLinker: automating big data integration for casual users
Persian article pages
0
English article pages
26
Publication year
2018
Publisher
Springer
English article format
PDF
Product code
E6445
Fields related to this article
Computer science, information technology
Specializations related to this article
Data mining
Journal
Journal of Big Data
University
School of Computer Science and Informatics - Cardiff University - UK
Keywords
Data integration, big data, data lake, modeling, schema evolution, schema mapping, metadata management
Abstract


A data integration approach combines data from different sources and builds a unified view for the users. Big data integration inherently is a complex task, and the existing approaches are either potentially limited or invariably rely on manual inputs and interposition from experts or skilled users. SemLinker, an ontology-based data integration system, is part of a metadata management framework for personal data lake (PDL), a personal store-everything architecture. PDL is for casual and unskilled users, therefore SemLinker adopts an automated data integration workflow to minimize manual input requirements. To support the flat architecture of a lake, SemLinker builds and maintains a schema metadata level without involving any physical transformation of data during integration, preserving the data in their native formats while, at the same time, allowing them to be queried and analyzed. Scalability, heterogeneity, and schema evolution are big data integration challenges that are addressed by SemLinker. Large and real-world datasets of substantial heterogeneities are used in evaluating SemLinker. The results demonstrate and confirm the integration efficiency and robustness of SemLinker, especially regarding its capability in the automatic handling of data heterogeneities and schema evolutions.
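The idea of a schema metadata level that leaves the raw data untouched can be illustrated with a small sketch. This is not SemLinker's implementation: the concept labels, the source names, the sample records, and the name-similarity matcher (Python's `difflib`) are all assumptions made for illustration, and the abstract does not specify how SemLinker itself computes its mappings.

```python
import csv
import io
import json
from difflib import SequenceMatcher

# Hypothetical global-ontology concepts (labels invented for this sketch).
CONCEPTS = ["name", "email", "birthDate"]

# Raw data stays in its native format; nothing below rewrites it.
RAW_SOURCES = {
    "contacts.csv": ("csv", "full_name,e_mail\nAda Lovelace,ada@example.org\n"),
    "profile.json": ("json", '[{"name": "Alan Turing", "birth_date": "1912-06-23"}]'),
}


def read_records(fmt, text):
    """Parse a source in its native format into a list of dicts."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def normalize(label):
    """Lower-case a label and drop separators so 'e_mail' compares as 'email'."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def best_concept(attribute, threshold=0.6):
    """Naive stand-in matcher: choose the concept with the most similar label."""
    scored = [
        (SequenceMatcher(None, normalize(attribute), normalize(c)).ratio(), c)
        for c in CONCEPTS
    ]
    score, concept = max(scored)
    return concept if score >= threshold else None


def build_mappings(sources):
    """Metadata layer: per source, map each attribute to a concept (or None)."""
    mappings = {}
    for source_id, (fmt, text) in sources.items():
        records = read_records(fmt, text)
        attributes = list(records[0].keys()) if records else []
        mappings[source_id] = {a: best_concept(a) for a in attributes}
    return mappings


def query(concept, sources, mappings):
    """Answer a concept-level query by reading each source in place."""
    results = []
    for source_id, (fmt, text) in sources.items():
        for record in read_records(fmt, text):
            for attribute, value in record.items():
                if mappings[source_id].get(attribute) == concept:
                    results.append((source_id, value))
    return results


MAPPINGS = build_mappings(RAW_SOURCES)
NAMES = query("name", RAW_SOURCES, MAPPINGS)
```

In this sketch, a schema change in a source (for example, a column renamed from `e_mail` to `EmailAddr`) would be handled by rerunning `build_mappings`; only the metadata changes, and the stored data is never transformed.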

Conclusion and future work


We have presented SemLinker, an ontology-based data integration system for PDL and other similar data lake implementations. SemLinker allows casual users with limited technical background to integrate, process, and analyze heterogeneous raw data with minimal effort, through a unified conceptual representation of the data schemas regarding a widely used global ontology. To the best of our knowledge, SemLinker is the first domain-agnostic integration system that offers self-adapting capabilities to automatically integrate big data with frequently evolving schemas based on solid theoretical foundations. SemLinker has been evaluated on large datasets in multiple domains, and the results not only validate its integration effectiveness and functional efficiency, but also indicate that SemLinker’s performance is robust and promising, although there is still room for improvement in multiple aspects of the system.


Although SemLinker is a generic integration solution, it targets only structured and semi-structured data, and it is by no means a holistic integration solution when unstructured data such as free-text documents and multimedia files are also considered. For such data we have proposed, in an earlier paper [48], SemCluster, an automatic keyphrase extraction tool that specializes in extracting keyphrases from free-text documents and annotating each keyphrase with ontology-based metadata. One of our planned immediate undertakings is to combine SemLinker and SemCluster into a broader integration solution towards an effective and efficient metadata management framework for the personal data lake.

