6. Discussion
In summary, our analysis reveals three notable facts about the implementation of THP: (1) the lattice-reduction-aided algorithm has better numerical stability but is more sensitive to imperfect CSI; (2) at the preprocessing stage, the computational complexity of some non-linear precoding algorithms (SQRD and RSiegel) is not always greater than that of their linear pre-equalization counterparts, which is an advantage under fast fading scenarios; (3) the parallelism potential of these THP algorithms varies significantly, which implies that their implementation effort varies significantly as well. Overall, SQRD and MMSE are more efficient to implement than V-BLAST and RSiegel. Table 4 summarizes the trade-off space of the different THP algorithms, and some general guidelines can be drawn from it. For example, owing to its better BER performance and lower implementation effort, SQRD is better suited to fast fading scenarios, where the preprocessing has to be performed frequently. Under slow fading scenarios, however, the computational load of the preprocessing phase is negligible compared to that of the IC phase, and the transmitter is more likely to have near-perfect CSI. It is therefore advisable to employ RSiegel in the low-SNR region and V-BLAST in the high-SNR region to achieve high performance.
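As an illustration only, the guidelines above can be written as a simple decision rule. The following Python sketch is not part of the evaluated implementations: the function name, the `Fading` type, and the `snr_threshold_db` parameter are assumptions introduced here, and the boundary between the low- and high-SNR regions is left to the caller because no single value is specified in this discussion.

```python
from enum import Enum


class Fading(Enum):
    """Coarse channel classification (hypothetical helper type)."""
    FAST = "fast"
    SLOW = "slow"


def suggest_thp_algorithm(fading: Fading, snr_db: float, snr_threshold_db: float) -> str:
    """Return the THP algorithm suggested by the guidelines in this section.

    snr_threshold_db is an assumed, caller-supplied boundary between the
    low- and high-SNR regions; the text does not fix a value for it.
    """
    if fading is Fading.FAST:
        # Preprocessing is repeated frequently, so low implementation effort
        # at the preprocessing stage matters most.
        return "SQRD"
    # Slow fading: preprocessing cost is negligible relative to the IC phase
    # and near-perfect CSI is more likely, so choose by SNR region.
    if snr_db < snr_threshold_db:
        return "RSiegel"
    return "V-BLAST"
```

For instance, `suggest_thp_algorithm(Fading.SLOW, snr_db, snr_threshold_db)` returns "RSiegel" whenever `snr_db` lies below the chosen threshold and "V-BLAST" otherwise. A real design decision would also weigh the remaining dimensions of Table 4, such as parallelism potential and sensitivity to imperfect CSI.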