Abstract
Migration from RDBMSs to NoSQL has become an important topic in the big data era. This paper provides a comprehensive study of the key issues in that migration. We discuss the challenges of translating SQL queries; the effects of denormalization, secondary indexes, and join algorithms; and open problems. We focus on HBase, a column-oriented NoSQL database, because it is widely used by Internet enterprises such as Facebook, Twitter, and LinkedIn. Because HBase does not support SQL, we use Apache Phoenix as an SQL layer on top of it. Experimental results using TPC-H show that column-level denormalization with atomicity significantly improves query performance, that secondary indexes on foreign keys are not as effective as in RDBMSs, and that the query optimizer of Phoenix is not very sophisticated. Important open problems include supporting complex SQL queries, automatic index selection, and optimizing SQL queries for NoSQL.