13. Conclusions
Mining high-dimensional correlated subspaces is a challenging but important task for knowledge discovery in multidimensional data. We have introduced 4S, a new scalable subspace search scheme that addresses this task. 4S works in three steps: scalable computation of L2, scalable mining of Lk (k > 2), and a subspace merge that reconstructs fragmented subspaces and reduces redundancy. Our experiments show that 4S scales to data sets of more than 1.5 million records and 5000 dimensions (i.e., more than 1 trillion subspaces). Besides being more efficient than existing methods, 4S also better detects high-quality correlated subspaces that are useful for outlier mining, clustering, and classification. The superior performance of 4S compared to existing methods comes from (a) our new notion of correlated subspaces, which has proved to be more general than existing notions and hence allows the discovery of subspaces missed by such methods, (b) our scalable subspace search scheme, which can discover high-dimensional correlated subspaces, and (c) our subspace merge, which can recover fragmented subspaces and remove redundancy. Directions for future work include a systematic study of our search scheme with different correlation measures, and the integration of the subspace merge into the correlation graph to perform in-process removal of redundancy.