Abstract
Due to advances in parallel file systems for big data (e.g., HDFS) and larger-capacity hardware (multicore CPUs, large RAM), it is now feasible to manage and query network data in a parallel DBMS supporting SQL, but performing statistical analysis remains a challenge. On the statistics side, the R language is popular, but it has important limitations: R is limited by main memory, R works in a different address space from query processing, R cannot analyze large disk-resident data sets efficiently, and R has no data management capabilities. Some R libraries allow R to work in parallel, but they still lack data management capabilities. Considering these challenges and limitations, we present a system that combines SQL queries and R functions seamlessly. We argue that a parallel DBMS and the R runtime are two different systems that benefit from a low-level integration. Our parallel DBMS is built on top of HDFS, programmed in Java and C++, and has a flexible scale-out architecture, whereas R is written mainly in C. The user or developer can make calls in both directions: (1) R calling SQL, to evaluate analytic queries or retrieve data from materialized views, transferring result tables into RAM in a streaming fashion and analyzing them in R; and (2) SQL calling R, allowing SQL to convert relational tables to matrices or vectors and perform complex computations on them. We summarize network monitoring tasks at AT&T and present specific programming examples showing language calls in both directions (i.e., R calls SQL and SQL calls R).
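The two call directions can be illustrated with a small analogue. The sketch below is not the authors' system: it assumes Python's built-in `sqlite3` module as a stand-in for the parallel DBMS and Python as a stand-in for R. For "R calls SQL", client code runs a query and streams the result set into an in-memory, column-oriented structure in one pass using `Cursor.fetchmany`. For "SQL calls R", a host-language class is registered with `Connection.create_aggregate`, so SQL hands each value to it and the class computes a statistic over the collected vector. The names `stream_query_to_columns`, `VarianceAgg`, `py_var_samp`, and the `traffic` table are hypothetical.

```python
import sqlite3
import statistics


def stream_query_to_columns(conn, sql, batch_size=2):
    """Hypothetical helper: run a query and build a column-oriented,
    data-frame-like dict in one pass, fetching rows in small batches."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    columns = {name: [] for name in names}
    while True:
        batch = cur.fetchmany(batch_size)  # streaming: only one batch per fetch
        if not batch:
            break
        for row in batch:
            for name, value in zip(names, row):
                columns[name].append(value)
    return columns


class VarianceAgg:
    """Hypothetical user-defined aggregate: SQL passes each value to
    host-language code, which collects a vector and computes its sample variance."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(float(value))

    def finalize(self):
        if len(self.values) < 2:
            return None  # sample variance needs at least two values
        return statistics.variance(self.values)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traffic (router TEXT, mbps REAL)")
    conn.executemany(
        "INSERT INTO traffic VALUES (?, ?)",
        [("r1", 10.0), ("r1", 14.0), ("r2", 3.0), ("r2", 5.0), ("r2", 7.0)],
    )
    # Direction 1 (analogue of R calling SQL): stream a result table into RAM.
    cols = stream_query_to_columns(
        conn, "SELECT router, mbps FROM traffic ORDER BY rowid"
    )
    # Direction 2 (analogue of SQL calling R): SQL invokes host-language code.
    conn.create_aggregate("py_var_samp", 1, VarianceAgg)
    per_router = conn.execute(
        "SELECT router, py_var_samp(mbps) FROM traffic GROUP BY router ORDER BY router"
    ).fetchall()
    conn.close()
    return cols, per_router
```

Calling `demo()` returns the streamed columns and one sample variance per router (`8.0` for `r1`, `4.0` for `r2`). The small `batch_size` is only to make the batching visible; a real integration would tune it to trade memory against round trips.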
CONCLUSIONS
We presented a system that enables fast bi-directional data transfer between a parallel DBMS and the R runtime. In one direction, our system converts SQL relational tables into R data frames or matrices. In the opposite direction, an R data frame or matrix is converted into a relational table, the most common case being a transformed data frame. The system relies on a careful mapping between atomic data types, and it efficiently constructs data structures (i.e., non-atomic data types) in RAM in one pass over a data set. The net gain is that an R script can call an SQL query or a materialized view and analyze the result set. Conversely, an SQL query (not a script or a longer embedded SQL program) can call an R function to perform a mathematical computation as an intermediate step. Our initial prototype opens several research directions. We want to define functional constructs in the R language to transform relational tables into data frames. Similarly, we want to study alternatives for transforming a matrix into an SQL object (a flat table, subscript/value triples, or a binary object). Propagating insertions to materialized views, and from there to a mathematical model computed by R, is a challenging problem. Finally, we need to conduct a detailed performance study on the AT&T network data warehouse.
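Of the matrix-to-SQL alternatives listed above, the subscript/value triple layout is the easiest to illustrate. The sketch below again uses Python's `sqlite3` as a stand-in for the DBMS; the function names and the `m_triples` table are hypothetical, and this is one possible encoding, not the authors' design. Subscripts are 1-based to mirror R's indexing, zero entries are skipped so sparse matrices stay small, and the reverse conversion rebuilds the dense matrix in one pass over the triples.

```python
import sqlite3


def matrix_to_triples(conn, table, matrix):
    """Hypothetical: store a dense matrix as (i, j, v) triples, 1-based like R.
    Zero entries are skipped; `table` must be a trusted identifier."""
    conn.execute(
        f"CREATE TABLE {table} (i INTEGER, j INTEGER, v REAL, PRIMARY KEY (i, j))"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        (
            (i, j, float(v))
            for i, row in enumerate(matrix, start=1)
            for j, v in enumerate(row, start=1)
            if v != 0
        ),
    )


def triples_to_matrix(conn, table, n_rows, n_cols):
    """Hypothetical: rebuild the dense matrix in one pass over the triples.
    Entries with no triple are the zeros skipped on the way in."""
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in conn.execute(f"SELECT i, j, v FROM {table}"):
        matrix[i - 1][j - 1] = v  # convert 1-based subscripts to 0-based lists
    return matrix
```

For example, storing `[[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]` produces three triples, and `triples_to_matrix(conn, "m_triples", 2, 3)` returns the original matrix. The dimensions are passed explicitly because all-zero trailing rows or columns leave no triples behind; a flat table (one column per matrix column) or a binary object would avoid that bookkeeping at the cost of other trade-offs.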