Free download of the English article: Using Distributed Data over HBase in Big Data Analytics Platform - Hindawi 2018

Title
Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services
Pages (Persian translation)
0
Pages (English article)
17
Year of publication
2018
Publisher
Hindawi
English article format
PDF
Product code
E8497
Related fields
Medicine, Computer Engineering
Related specializations
Medical Informatics, Information Security and Cloud Computing
Journal
Computational and Mathematical Methods in Medicine
University
Database Integration and Management - IMIT Quality Systems - Vancouver Island Health Authority - Canada
Abstract

Big data analytics (BDA) is important to reduce healthcare costs. However, there are many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The objective of this study, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files showed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance with high usability for technical support but poor usability for clinical services. A hospital system based on patient-centric data was challenging to implement in HBase, in that not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to secure patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.

Discussion

4. Discussion


The ultimate goal of the study was to test the performance of the Big Data computing framework and its technical specifications across platforms against the challenges specific to its application in healthcare. This goal was accomplished by combining ADT and DAD data through ingestions over Hadoop HDFS and the MapReduce programming framework. High performance of the BDA platform was verified with query times of less than four seconds for 3 billion patient records (regardless of complexity), showing that the challenges of aggregation, maintenance, integration, data analysis, and interpretative value can be overcome by BDA platforms.
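As a rough illustration of what combining ADT and DAD data in a key-value store can look like, the sketch below merges two record sets into HBase-style rows, with one column family per source. It uses plain Python dictionaries rather than an HBase client, and every name in it (the field names, the `adt` and `dad` column families, the row-key layout) is a hypothetical assumption, not the schema used in the study.

```python
from collections import defaultdict


def to_hbase_style_rows(adt_records, dad_records):
    """Merge ADT and DAD records into {row_key: {"family:qualifier": value}}.

    The row-key layout and field names are illustrative assumptions only.
    """
    rows = defaultdict(dict)
    for family, records in (("adt", adt_records), ("dad", dad_records)):
        for record in records:
            # Hypothetical row key: zero-padded patient ID plus encounter ID,
            # so that all encounters of one patient sort next to each other.
            row_key = f"{record['patient_id']:010d}#{record['encounter_id']:06d}"
            for field, value in record.items():
                if field in ("patient_id", "encounter_id"):
                    continue  # already encoded in the row key
                # Values are stored as strings here for simplicity.
                rows[row_key][f"{family}:{field}"] = str(value)
    return dict(rows)


# Simulated (made-up) records for a single encounter.
adt = [{"patient_id": 42, "encounter_id": 7,
        "admit_date": "2016-03-01", "unit": "ICU"}]
dad = [{"patient_id": 42, "encounter_id": 7,
        "diagnosis_code": "I21.9", "los_days": 5}]

rows = to_hbase_style_rows(adt, dad)
```

Because both sources share the same row key, a single key lookup returns the admission and the discharge-abstract columns of an encounter together, which is the general idea behind querying a combined clinical event row.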


4.1. Modeling Patient Data of Hospital System. Many Canadian healthcare systems face analytical challenges because data are aggregated in separate silos. Complex and unique variables are involved, including “(1) information used; (2) preference of data entry; (3) services on different objects; (4) change of health regulations; (5) different supporting plans or sources; and (6) varying definition of database field names in different database systems” [45]. Big Data in healthcare can cover tens of millions or billions of patients and present unprecedented opportunities. Although data from sources such as hospital EHR systems are generally of much lower quality than data carefully collected by researchers investigating specific questions, the sheer volume of data may compensate for its qualitative deficiencies, provided that a significant pattern can be found amid the noise [14, 46]. Ultimately, the platform was designed not only to replicate data but to simulate the entire volume of production and archived data at VIHA, and possibly the Province of British Columbia, so that real patient data from hospitals can eventually be approved for use on the platform. Because the data were simulated, the messiness of real data and its influence on the simulation were not tested, although this could affect accuracy and performance when querying real data.
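Item (6) in the quoted list, varying field names across database systems, can be made concrete with a minimal sketch. The alias table and field names below are hypothetical and are not taken from the study; the example only shows one simple way such differences might be reconciled before records are combined.

```python
# Hypothetical alias table: source-system field name -> canonical field name.
FIELD_ALIASES = {
    "pt_id": "patient_id",
    "PatientID": "patient_id",
    "adm_dt": "admit_date",
    "AdmissionDate": "admit_date",
}


def harmonize(record, aliases=FIELD_ALIASES):
    """Return a copy of record with known aliases renamed.

    Fields without an entry in the alias table are kept unchanged.
    """
    return {aliases.get(name, name): value for name, value in record.items()}


# The same admission as two (made-up) source systems might describe it.
system_a = {"pt_id": 42, "adm_dt": "2016-03-01"}
system_b = {"PatientID": 42, "AdmissionDate": "2016-03-01", "ward": "4B"}

harmonized_a = harmonize(system_a)
harmonized_b = harmonize(system_b)
```

A static lookup table like this only covers naming differences; it does not address the other listed variables, such as differing data-entry preferences or changing regulations.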

