6 Conclusion
Regression estimation using samples constructed via the NNM from two sources is not uncommon in applied economics. This paper has demonstrated that such OLS estimators are generally inconsistent, and thus an appropriate bias correction is required. It has also been shown that the rate at which the OLS estimator converges to its probability limit depends on the number of matching variables and the divergence pattern of the two sample sizes.
Two versions of bias-corrected estimators have been proposed, and each can be interpreted as a variant of indirect inference estimators. The MSII estimator attains the parametric convergence rate when there are at most two matching variables, whereas the MSII-FM estimator achieves the parametric convergence rate when the number of matching variables does not exceed four. Monte Carlo results suggest that a small number of matches works well in practice, and in particular, we should consider the single match when the number of matching variables is two or three.
The paper aims to provide corrections for an established practice, which is to run (parametric) OLS ignoring imputation. In particular, our proposal for MSII-FM is based on a nonparametric series estimation of g2(Z) = E(X2|Z). The nonparametric estimator is employed only when the curse of dimensionality in matching variables prevents MSII from attaining parametric convergence. Alternatively, it is possible to use a nonparametric estimate of g2(Z) as a (generated) regressor in place of X2 in regression (1) from the beginning. As illustrated in Section 2.4, this would result in a partially linear semiparametric model with a measurement error problem. Several different (nonparametric) estimators could be available for models of this class. However, such estimators are not as widely used in practice as the OLS, and hence we leave their development for future work.