Feature selection based index tracking: A two-stage approach for optimized sampling

Authors

Thomas R. Holy, Ernst-Abbe-Hochschule, 07745 Jena, Germany
Mario Brandtner, Ernst-Abbe-Hochschule, 07745 Jena, Germany
Christian Pigorsch, Friedrich-Schiller-Universität, 07743 Jena, Germany

Abstract

Equity indices form the basis for index-oriented financial products such as exchange-traded funds (ETFs). The providers of these financial products face the challenge of efficient replication. Optimized sampling is the most attractive approach for replicating an equity index due to its cost-effectiveness, flexibility, and transparency. In this paper, we propose a novel two-stage procedure for optimized sampling. In the first stage, recursive feature elimination (RFE), which can wrap various supervised learning methods, selects the index constituents used for replication; in the second stage, tracking-error optimization determines the weights of the selected tracking portfolio components. Our approach allows explicit control over the number of tracking portfolio components, a significant advantage over existing machine learning-based approaches. Additionally, the optimization procedure accounts for important financial constraints as well as practical and statistical challenges that have been neglected in the recent literature. Based on a dataset covering two major indices, the S&P 500 and the CSI 300, with 16 years of test data, we show that our approach outperforms a state-of-the-art mixed-integer programming (MIP) solver in terms of computing time, regardless of which feature selection method is used in the first stage. Moreover, we show that recently proposed deep learning approaches are associated with significantly higher tracking errors, higher portfolio turnover, and lower portfolio diversity. Our findings have significant implications for researchers and ETF sponsors seeking to develop efficient and practical solutions for tracking portfolio construction.
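
To make the two-stage idea concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Ordinary least squares is the learner wrapped by RFE, whereas the paper considers various supervised learning methods. Weights are long-only and fully invested. The objective is the mean squared difference between portfolio and index returns, used as a tracking-error proxy. Projected gradient descent serves as the solver, and the data are simulated. The function names `rfe_select`, `project_to_simplex`, and `min_tracking_error` are hypothetical.

```python
import numpy as np


def rfe_select(R, y, k):
    """Stage 1 (sketch): recursive feature elimination wrapping OLS (assumed learner).

    R: (T, N) constituent returns, y: (T,) index returns, k: constituents to keep.
    Repeatedly regresses y on the active constituents and drops the one with
    the smallest absolute coefficient until exactly k constituents remain.
    """
    active = list(range(R.shape[1]))
    while len(active) > k:
        coef, *_ = np.linalg.lstsq(R[:, active], y, rcond=None)
        worst = int(np.argmin(np.abs(coef)))  # position within `active`
        del active[worst]
    return active


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def min_tracking_error(R_sel, y, steps=5000):
    """Stage 2 (sketch): long-only, fully invested weights that minimize the
    mean squared return difference to the index, via projected gradient descent."""
    T, k = R_sel.shape
    w = np.full(k, 1.0 / k)  # start from equal weights (a feasible point)
    # Lipschitz constant of the gradient of (1/T) * ||R w - y||^2
    L = 2.0 / T * np.linalg.eigvalsh(R_sel.T @ R_sel)[-1]
    for _ in range(steps):
        grad = 2.0 / T * R_sel.T @ (R_sel @ w - y)
        w = project_to_simplex(w - grad / L)
    return w


# Usage on simulated data (hypothetical index with 30 constituents, keep 10)
rng = np.random.default_rng(0)
T, N, k = 250, 30, 10
R = rng.normal(0.0005, 0.01, size=(T, N))  # simulated daily constituent returns
true_w = rng.dirichlet(np.ones(N))         # simulated index weights
y = R @ true_w                              # simulated index returns

selected = rfe_select(R, y, k)
w = min_tracking_error(R[:, selected], y)
# Annualized tracking error: std of active returns, assuming 252 trading days
te = np.std(R[:, selected] @ w - y) * np.sqrt(252)
print(selected, np.round(w, 4), round(float(te), 4))
```

Here `k` plays the role of the explicit cardinality control mentioned above. Separating selection from weighting means the second stage is a small convex problem over only `k` assets, which is one plausible reason such a pipeline can run faster than solving a single mixed-integer program over all constituents.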

Preprint available on SSRN