A data integration approach combines data from different sources and builds a unified view for the users. Big data integration is inherently a complex task, and the existing approaches are either potentially limited or invariably rely on manual inputs and intervention from experts or skilled users. SemLinker, an ontology-based data integration system, is part of a metadata management framework for the personal data lake (PDL), a personal store-everything architecture. PDL is intended for casual and unskilled users; therefore, SemLinker adopts an automated data integration workflow to minimize manual input requirements. To support the flat architecture of a lake, SemLinker builds and maintains a schema metadata level without involving any physical transformation of data during integration, preserving the data in their native formats while, at the same time, allowing them to be queried and analyzed. Scalability, heterogeneity, and schema evolution are big data integration challenges that are addressed by SemLinker. Large, real-world datasets with substantial heterogeneity are used in evaluating SemLinker. The results demonstrate and confirm the integration efficiency and robustness of SemLinker, especially regarding its capability in the automatic handling of data heterogeneities and schema evolutions.
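The idea of a schema metadata level that leaves the data in their native formats can be illustrated with a minimal sketch. This is not SemLinker's actual implementation: the class names (`SchemaMapping`, `MetadataLayer`), the record layout, the source name `fitness_app`, and the concept identifier `ex:HeartRate` are all hypothetical and chosen only for illustration. The sketch assumes that each raw record carries its source name and schema version, and that a mapping from source attributes to ontology concepts is kept per schema version, so that a query by concept is resolved at read time and a schema change only requires a new mapping.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Tuple


@dataclass
class SchemaMapping:
    """Metadata only: maps one schema version's attributes to ontology concepts."""
    source: str
    version: int
    attr_to_concept: Dict[str, str]


@dataclass
class MetadataLayer:
    """Hypothetical metadata level; it stores mappings, never the data itself."""
    mappings: Dict[Tuple[str, int], SchemaMapping] = field(default_factory=dict)

    def register(self, mapping: SchemaMapping) -> None:
        # A schema evolution is handled by registering a mapping for the new version.
        self.mappings[(mapping.source, mapping.version)] = mapping

    def query(self, records: Iterable[Dict[str, Any]], concept: str) -> Iterator[Any]:
        """Yield the values of `concept` from raw records without modifying them."""
        for rec in records:
            mapping = self.mappings[(rec["source"], rec["version"])]
            for attr, mapped_concept in mapping.attr_to_concept.items():
                if mapped_concept == concept and attr in rec["data"]:
                    yield rec["data"][attr]


# Two schema versions of the same (hypothetical) source: the attribute was renamed.
layer = MetadataLayer()
layer.register(SchemaMapping("fitness_app", 1, {"hr": "ex:HeartRate"}))
layer.register(SchemaMapping("fitness_app", 2, {"heart_rate_bpm": "ex:HeartRate"}))

# Raw records stay exactly as ingested; only the metadata differs per version.
raw = [
    {"source": "fitness_app", "version": 1, "data": {"hr": 61}},
    {"source": "fitness_app", "version": 2, "data": {"heart_rate_bpm": 64}},
]

values = list(layer.query(raw, "ex:HeartRate"))
print(values)
```

In this sketch the rename from `hr` to `heart_rate_bpm` is absorbed by the second mapping alone, and records written under either schema version answer the same concept-level query. How SemLinker detects such changes and derives the mappings automatically is not covered by the sketch.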
Conclusion and future work
We have presented SemLinker, an ontology-based data integration system for PDL and other similar data lake implementations. SemLinker allows casual users with limited technical background to integrate, process, and analyze heterogeneous raw data with minimal effort, through a unified conceptual representation of the data schemas in terms of a widely used global ontology. To the best of our knowledge, SemLinker is the first domain-agnostic integration system that offers self-adapting capabilities to automatically integrate big data with frequently evolving schemas based on solid theoretical foundations. SemLinker has been evaluated on large datasets in multiple domains, and the results not only validate its integration effectiveness and functional efficiency, but also indicate that SemLinker’s performance is robust and promising, although there is still room for improvement in multiple aspects of the system.
Although SemLinker is a generic integration solution, it targets only structured and semi-structured data, and it is by no means a holistic integration solution when unstructured data such as free-text documents and multimedia files are also considered. For such data we have proposed, in an earlier paper, SemCluster, an automatic keyphrase extraction tool that specializes in extracting keyphrases from free-text documents and annotating each keyphrase with ontology-based metadata. One of our planned immediate undertakings is to combine SemLinker and SemCluster into a broader integration solution, towards an effective and efficient metadata management framework for the personal data lake.