Free download of the English article "A course on big data analytics" - Elsevier 2018

Persian title
دوره تحلیل کلان داده
English title
A course on big data analytics
Persian article pages
0
English article pages
30
Year of publication
2018
Publisher
Elsevier
English article format
PDF
Product code
E8250
Fields related to this article
Computer engineering
Specializations related to this article
Cloud computing
Journal
Journal of Parallel and Distributed Computing
University
Stetson University - DeLand - Florida
Keywords
Curriculum, graduate education, big data, cloud computing
Abstract


This report details a course on big data analytics designed for junior and senior undergraduate computer science students. The course focuses heavily on projects and on writing code for big data processing. It is designed to help students learn parallel and distributed computing frameworks and techniques commonly used in industry. The curriculum includes a progression of projects requiring increasingly sophisticated big data processing, ranging from data preprocessing with Linux tools, to distributed processing with Hadoop MapReduce and Spark, to database queries with Hive and Google’s BigQuery. We discuss hardware infrastructure and experimentally evaluate the costs and benefits of an on-premises server versus Amazon’s Elastic MapReduce. Finally, we showcase outcomes of our course in terms of student engagement and anonymous student feedback.

7. Conclusion


This report detailed a big data analytics course that serves as an elective for upper-level computer science students at Stetson University. The course is project-oriented and engages students with realistic, hands-on practice using modern big data tools and techniques. Different options for supporting hardware infrastructure were explored and experimentally evaluated. Student feedback and academic and professional outcomes conclusively show that the course is a success and that the high-level learning objectives were met.


A course like the one described here must evolve continuously. New technologies are introduced while others fall out of favor. For example, the first offering of this course did not cover Spark; now, such an omission would be unjustifiable. Likewise, we believe it is important for the projects to stay relevant and timely. As new big datasets are made available, projects should be updated to make use of them. For example, at the time of writing, the NYC Yellow Taxi dataset is well known, and several blog posts have been authored detailing different ways to analyze the data. The novelty of an NYC taxi data analysis project is rapidly waning, which suggests that a different project might be more appropriate in the future.

