Free download of the article: Understandable Big Data: A survey

Persian title
کلان داده قابل فهم: بررسی
English title
Understandable Big Data: A survey
Pages in the Persian article
0
Pages in the English article
12
Year of publication
2016
Publisher
Elsevier
Format of the English article
PDF
Product code
E3213
Fields related to this article
Computer engineering
Specializations related to this article
Software
Journal
Computer Science Review
University
France
Keywords
Big Data, Hadoop, reasoning, entity linking, information extraction, ontology alignment
Abstract


This survey presents the concept of Big Data. First, a definition and the features of Big Data are given. Second, the different steps of Big Data processing and the main problems encountered in Big Data management are described. Next, a general overview of an architecture for handling Big Data is given. Then, the problem of integrating a Big Data architecture into an already existing information system is discussed. Finally, this survey tackles semantics (reasoning, coreference resolution, entity linking, information extraction, consolidation, paraphrase resolution, ontology alignment) in the Big Data context.

6. Conclusion


We are living in the era of data deluge, and the term Big Data was coined to describe this age. This paper defines the concept of Big Data and describes its characteristics. In addition, a supply chain and technologies for Big Data management are presented. During that management, many problems can be encountered, especially during semantic gathering. The paper therefore tackles semantics (reasoning, coreference resolution, entity linking, information extraction, consolidation, paraphrase resolution, ontology alignment) with a focus on the “V’s”. It concludes that volume is the most frequently tackled aspect, and that many works leverage Hadoop MapReduce to deal with it [21,40,41,22]. Unlike velocity, the informality and uncertainty of web and social media data are increasingly addressed by researchers. Uncertainty can be handled manually (Ripple Down Rules [44]) or automatically (identification and/or isolation of inconsistencies [88]). Regarding velocity, gazetteers and knowledge bases must be continually updated [88,45] and data must be processed periodically [43,42]. Similarly, tackling variety means dealing with various data formats (tweets in [45,46,88] and natural language texts in [47,80,62,76]) and with distributed data [38,39]. As [13] argues, Big Data must be addressed jointly and on each axis to achieve significant improvement in its management.

