# Free download of the English article "An RKHS model for variable selection in functional linear regression" - Elsevier 2018

Persian title
An RKHS model for variable selection in functional linear regression
English title
An RKHS model for variable selection in functional linear regression
Pages (Persian version)
0
Pages (English version)
25
Year of publication
2018
Publisher
Elsevier
English article format
PDF
Product code
E8086
Related fields
Statistics
Related specializations
Mathematical statistics
Journal
Journal of Multivariate Analysis
Keywords
Feature selection, functional linear regression, impact points, variable selection
Abstract

A mathematical model for variable selection in functional linear regression models with scalar response is proposed. By "variable selection" we mean a procedure to replace the whole trajectories of the functional explanatory variables with their values at a finite number of carefully selected instants (or "impact points"). The basic idea of our approach is to use the Reproducing Kernel Hilbert Space (RKHS) associated with the underlying process, instead of the more usual L²[0, 1] space, in the definition of the linear model. This turns out to be especially suitable for variable selection purposes, since the finite-dimensional linear model based on the selected "impact points" can be seen as a particular case of the RKHS-based linear functional model. In this framework, we address the consistent estimation of the optimal design of impact points and we check, via simulations and real data examples, the performance of the proposed method.
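To illustrate the impact-point idea described in the abstract, the sketch below is a minimal stand-in, not the authors' RKHS estimator: it simulates Brownian-motion-like trajectories, generates a response that depends on the curves only through two hypothetical impact points, and then selects instants greedily by correlation with the current residual (a forward-stepwise heuristic). The simulation setup, the point locations, and the selection rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, grid = 200, np.linspace(0, 1, 101)

# Brownian-motion-like trajectories X(t) on [0, 1], one row per curve
X = np.cumsum(rng.normal(size=(n, grid.size)), axis=1) / np.sqrt(grid.size)

# Sparse "true" model: Y depends on X only through two impact points
# (grid indices 30 and 70 are illustrative choices)
t1, t2 = 30, 70
Y = 2.0 * X[:, t1] - 1.5 * X[:, t2] + 0.1 * rng.normal(size=n)

def select_impact_points(X, Y, k):
    """Greedily pick k instants: at each step, choose the grid point whose
    curve values are most correlated (in absolute value) with the current
    residual, then refit ordinary least squares on all points chosen so far."""
    selected, resid = [], Y - Y.mean()
    for _ in range(k):
        corrs = np.abs([np.corrcoef(X[:, j], resid)[0, 1]
                        for j in range(X.shape[1])])
        selected.append(int(np.argmax(corrs)))
        # Refit OLS (with intercept) on the selected points, update residual
        Z = np.column_stack([np.ones(len(Y))] + [X[:, s] for s in selected])
        beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ beta
    return selected

points = select_impact_points(X, Y, 2)
print(points)  # two distinct grid indices, ideally near the true impact points
```

Once the impact points are chosen, the functional problem reduces to an ordinary finite-dimensional linear regression on the curve values at those instants, which is exactly the reduction the abstract describes (though the paper derives the selection consistently within the RKHS framework rather than by this ad hoc stepwise rule).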

7. Conclusions

The RKHS approach we have introduced in this paper provides a natural framework for a formal unified theory of variable selection for functional data. The "sparse" models (those where the variable selection techniques are fully justified) appear as particular cases in this setup. As a consequence, it is possible to derive asymptotic consistency results such as those obtained in the paper. Likewise, it is also possible to consider the problem of estimating the "true" number of relevant variables in a consistent way, as we do in Section 4. This is in contrast with other standard proposals for which the number of variables is fixed in advance as an input, or is determined using cross-validation and other computationally expensive methods. Thus, our proposal is more firmly founded in theory and, at the same time, provides a much faster method in practice, which is important when dealing with large data sets.

The empirical results we have obtained are encouraging. In short, according to our experiments, the RKHS-based method works better than other variable selection methods in those sparse models that fulfill the ideal theoretical conditions we need. In the non-sparse model considered in the simulations, the RKHS method is slightly outperformed by other proposals (but still behaves reasonably). Finally, in the "neutral" field of real data examples, the performance also looks satisfactory and competitive.

Last but not least, from a general, methodological point of view, this paper represents an additional example of the surprising usefulness of reproducing kernels in statistics. Additional examples can be found in [3, 5, 17, 21, 32].
