Abstract
This report details a course on big data analytics designed for junior and senior undergraduate computer science students. The course focuses heavily on projects and on writing code for big data processing. It is designed to help students learn the parallel and distributed computing frameworks and techniques commonly used in industry. The curriculum comprises a progression of projects requiring increasingly sophisticated big data processing, ranging from data preprocessing with Linux tools, through distributed processing with Hadoop MapReduce and Spark, to database queries with Hive and Google’s BigQuery. We discuss hardware infrastructure and experimentally evaluate the costs and benefits of an on-premises server versus Amazon’s Elastic MapReduce. Finally, we showcase the outcomes of our course in terms of student engagement and anonymous student feedback.
7. Conclusion
This report detailed a big data analytics course that serves as an elective for upper-level computer science students at Stetson University. The course is project-oriented and engages students with realistic, hands-on practice using modern big data tools and techniques. Different options for the supporting hardware infrastructure were explored and experimentally evaluated. Student feedback and academic and professional outcomes conclusively show that the course is a success and that its high-level learning objectives were met.
A course like the one described here must evolve continuously. New technologies are introduced while others fall out of favor. For example, in the first offering of this course, we did not cover Spark. Now, such an omission would be unjustifiable. Likewise, we believe it is important for the projects to stay relevant and timely. As new large datasets become available, projects should be updated to make use of them. For example, at the time of writing, the NYC Yellow Taxi dataset is well known, and several blog posts detail different ways to analyze it. The novelty of an NYC taxi data analysis project is rapidly waning, which suggests that a different project might be more appropriate in the future.