6. Conclusion
We presented frame- and clip-based methods for compressing mocap data with low latency. Exploiting the unique spatial characteristics of mocap data, we proposed a learned spatial decorrelation transform (LSDT) that effectively reduces spatial redundancy. Owing to its data-adaptive nature, the LSDT outperforms commonly used data-independent transforms, such as the discrete cosine transform and the discrete wavelet transform, in terms of decorrelation performance. Experimental results show that the proposed methods achieve higher compression ratios at lower computational cost and latency than state-of-the-art methods. Our current implementation compresses 3D position-based mocap data defined on a skeleton graph; however, it is straightforward to apply our methods to other types of mocap data, such as facial expressions, hand gestures and the motion of human bodies. In the future, we will extend our methods to compress mocap data represented by Euler angles. Due to the nonlinear nature of angles, the hierarchical skeleton structure may produce significant accumulated errors in the reconstructed data (Arikan, 2006; Chew et al., 2011). We will seek effective data-driven techniques to tackle this challenge.
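The advantage of a data-adaptive over a data-independent transform can be illustrated with a toy experiment. The sketch below uses a KLT/PCA basis as a generic stand-in for a learned decorrelation transform (it does not reproduce the proposed LSDT) and compares its energy compaction against a fixed DCT-II basis on synthetic spatially correlated data; the data model and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "mocap" matrix: T frames x J spatial channels.  A random walk
# across channels induces strong spatial correlation, loosely mimicking
# neighbouring joints on a skeleton (illustrative assumption).
T, J = 500, 31
X = np.cumsum(rng.normal(size=(T, J)), axis=1)
X -= X.mean(axis=0)                          # center each channel

# Orthonormal DCT-II basis: a fixed, data-independent transform.
n = np.arange(J)
C = np.sqrt(2.0 / J) * np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * J))
C[0] /= np.sqrt(2.0)

# KLT/PCA basis learned from the data: a data-adaptive transform.
_, eigvecs = np.linalg.eigh(X.T @ X / T)
P = eigvecs[:, ::-1].T                       # rows sorted by decreasing variance

def topk_energy(Y, k):
    """Fraction of total signal energy in the k highest-energy coefficients."""
    e = np.sort(np.sum(Y**2, axis=0))[::-1]
    return e[:k].sum() / e.sum()

e_dct = topk_energy(X @ C.T, 5)
e_klt = topk_energy(X @ P.T, 5)
print(f"top-5 energy fraction: DCT {e_dct:.3f}, KLT {e_klt:.3f}")
assert e_klt >= e_dct - 1e-9   # the learned basis compacts energy at least as well
```

Because the KLT basis is fit to the data itself, its leading components capture the maximum possible energy among orthonormal transforms, which is the intuition behind preferring a learned transform for decorrelation.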
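The error-accumulation concern for hierarchical angle representations can likewise be sketched: in a relative-angle encoding, a small error in one joint angle displaces every descendant joint, so reconstruction error grows along the kinematic chain. The planar chain and error magnitude below are hypothetical toys, not the paper's data or method.

```python
import numpy as np

def fk(rel_angles, lengths):
    """Forward kinematics for a planar chain: joint positions from relative angles."""
    pos = np.zeros((len(rel_angles) + 1, 2))
    theta = 0.0
    for i, (a, l) in enumerate(zip(rel_angles, lengths)):
        theta += a                           # each joint inherits its parent's rotation
        pos[i + 1] = pos[i] + l * np.array([np.cos(theta), np.sin(theta)])
    return pos

n = 20
angles = np.full(n, 0.05)                    # gently curved reference pose
lengths = np.ones(n)

# A small, uniform per-joint angle error, standing in for lossy
# compression of relative angles (hypothetical magnitude).
eps = 0.005
err = np.linalg.norm(fk(angles + eps, lengths) - fk(angles, lengths), axis=1)

print(f"error at joint 1: {err[1]:.4f}, at end effector: {err[-1]:.4f}")
assert err[-1] > err[1]                      # the error accumulates down the hierarchy
```

A position-based representation avoids this propagation, since each joint's coordinates are encoded independently of its ancestors; this contrast is what motivates seeking data-driven techniques for the Euler-angle case.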