lexiDB: A Scalable Corpus Database Management System - IEEE 2017

Title

lexiDB: A Scalable Corpus Database Management System

Publication year

2017

Publisher

IEEE

Paper format

PDF

Product code

E7296

Related fields

Computer Engineering

Related specializations

Software

Venue

International Conference on Big Data

Abstract

lexiDB is a scalable corpus database management system designed to fulfill corpus linguistics retrieval queries on multi-billion-word, multiply-annotated corpora. It is based on a distributed architecture that allows the system to scale out to support ever larger text collections. This paper presents an overview of the architecture behind lexiDB as well as a demonstration of its functionality. We present lexiDB’s performance metrics based on the AWS (Amazon Web Services) infrastructure with two part-of-speech and semantically tagged billion-word corpora: Historical Hansard and EEBO (Early English Books Online).

V. CONCLUSION AND FURTHER WORK

In this paper, we have presented lexiDB, a new scalable corpus database management system designed specifically to support the indexing of text corpora and their retrieval using the main methods employed in corpus linguistics. While other software achieves conceptually similar scalability, e.g. SketchEngine via virtualisation [4] and KorAP with Lucene/Solr integration, we believe that lexiDB is the first corpus database management system with in-built scalability via a distributed architecture. A key feature of lexiDB is its fast data ingest time for extremely large-scale annotated corpora. Normally, ingest speed has to be traded off against retrieval speed, but we believe that corpus linguists need both capabilities in order to deal with corpus updates, for example in social network analysis, where new data needs to be added regularly. Fast indexing also reduces the time overhead between experiments: if we improve the accuracy of automatic annotation and retag our corpus, we do not need to wait 24 hours for re-indexing to complete before we can start obtaining updated results. lexiDB therefore addresses issues of ‘velocity’ in corpus databases and, in turn, enables support for greater ‘volume’ and ‘variety’ in the corpora indexed in the system. We have demonstrated the capabilities of lexiDB through evaluation on two multiply-annotated corpora on the scale of one billion words, and have shown extremely fast retrieval times for the most frequent words. In addition, because of its distributed design, lexiDB can scale to even larger corpora by adding more nodes, and we will report on these results in further papers.