ETF PORTFOLIO CONSTRUCTION BY FEATURE SELECTION BASED INDEX TRACKING

📈 The Challenge

Passive investment vehicles such as ETFs must replicate their index precisely, minimizing tracking error while complying with regulatory constraints (e.g., the UCITS 5-10-40 rule). Traditional approaches such as mixed-integer programming (MIP) become computationally expensive for large indices, while emerging machine learning approaches based on autoencoders often neglect practical financial constraints.
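
To make the regulatory constraint concrete: the 5-10-40 rule is commonly summarized as "no single position above 10% of the portfolio, and all positions above 5% together at most 40%". The check below is a minimal sketch of that simplified reading. The function name is hypothetical, and the exemptions in the actual regulation are ignored.

```python
def satisfies_5_10_40(weights, tol=1e-9):
    """Check portfolio weights (fractions summing to ~1) against the 5-10-40 rule.

    Simplified reading (an assumption of this sketch): every position is at
    most 10%, and the positions larger than 5% add up to at most 40%.
    """
    if any(w > 0.10 + tol for w in weights):
        return False
    large_positions = sum(w for w in weights if w > 0.05 + tol)
    return large_positions <= 0.40 + tol


# 4 positions of 9% (36% in total above the 5% mark) plus 16 positions of 4%
ok_portfolio = [0.09] * 4 + [0.04] * 16
# 5 positions of 9% (45% above the 5% mark) plus 11 positions of 5%
bad_portfolio = [0.09] * 5 + [0.05] * 11

print(satisfies_5_10_40(ok_portfolio))   # True
print(satisfies_5_10_40(bad_portfolio))  # False
```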

🔧 Our Solution

We propose a two-stage approach that combines the strengths of machine learning and portfolio optimization:

1️⃣ Feature Selection: Recursive Feature Elimination (RFE) selects the most important stocks from the index, yielding a smaller, efficient subset while explicitly controlling portfolio size. RFE is applied with linear regression methods (Ridge, LASSO, SVR, Elastic Net, OLS) and tree-based methods (Random Forest, XGBoost).
2️⃣ Portfolio Optimization: The weights of the selected stocks are determined under financial constraints (no short-selling, diversification rules) to minimize tracking error.
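
The first stage can be sketched in a few lines. The example below is an illustration rather than the paper's implementation: it hand-rolls RFE with a closed-form ridge regression in NumPy, regressing index returns on stock returns and repeatedly dropping the stock with the smallest absolute coefficient. The function names, the `alpha` value, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def ridge_coefficients(X, y, alpha=1e-4):
    """Closed-form ridge regression without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def rfe_select(X, y, n_select, alpha=1e-4):
    """Recursive feature elimination: refit, drop the stock with the smallest
    absolute coefficient, and repeat until n_select stocks remain."""
    active = list(range(X.shape[1]))  # column indices still in the portfolio
    while len(active) > n_select:
        coef = ridge_coefficients(X[:, active], y, alpha)
        weakest = int(np.argmin(np.abs(coef)))  # position within `active`
        del active[weakest]
    return active


# Synthetic data (illustrative only): 250 days, 20 stocks, and an "index"
# that is driven by just five of them.
rng = np.random.default_rng(0)
stock_returns = rng.normal(0.0, 0.01, size=(250, 20))
true_weights = np.zeros(20)
true_weights[[1, 4, 7, 12, 18]] = [0.30, 0.25, 0.20, 0.15, 0.10]
index_returns = stock_returns @ true_weights

selected = rfe_select(stock_returns, index_returns, n_select=5)
print(sorted(selected))
```

The `n_select` argument is what gives explicit control over portfolio size. In practice, scikit-learn's `RFE` class provides the same elimination loop for any estimator that exposes coefficients or feature importances.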

🎯 Key Contributions

A Novel Approach: We propose a feature selection approach for constructing tracking portfolios.
Computational Efficiency: Our approach achieves tracking error, diversification, and portfolio turnover comparable to state-of-the-art MIP solvers, with significantly faster computation.
Superior Performance Compared to Autoencoders: Significantly lower tracking error and turnover, with much greater diversification.
Regulatory Compliance: Integrates diversification constraints, avoiding overfitting and sector concentration.
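
To illustrate how diversification constraints can enter the weight optimization, here is a minimal sketch under stated assumptions. It minimizes the mean squared difference between portfolio and index returns subject to long-only, fully invested weights with a 10% cap per stock. Projected gradient descent is used only to keep the example dependency-free; the solver, the function names, and the synthetic data are assumptions, and the aggregate 40% limit of the 5-10-40 rule is not enforced here.

```python
import numpy as np


def project_capped_simplex(v, cap):
    """Euclidean projection onto {w : 0 <= w <= cap, sum(w) = 1}, found by
    bisection on the shift tau in w = clip(v - tau, 0, cap)."""
    lo, hi = v.min() - cap, v.max()
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def tracking_weights(X, y, cap=0.10, n_iter=2000):
    """Minimize ||X w - y||^2 / T over long-only, fully invested weights
    with a per-stock cap, using projected gradient descent."""
    T, n = X.shape
    assert n * cap >= 1.0, "cap too tight to be fully invested"
    step = T / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant
    w = np.full(n, 1.0 / n)                       # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y) / T
        w = project_capped_simplex(w - step * grad, cap)
    return w


# Synthetic data (illustrative only): the first stock has a 20% index weight,
# which exceeds the 10% cap, so the tracking portfolio cannot simply copy it.
rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, size=(250, 11))
index_weights = np.array([0.20] + [0.08] * 10)
index_returns = returns @ index_weights

w = tracking_weights(returns, index_returns, cap=0.10)
print(np.round(w, 3))
```

In this example the first weight ends up at the cap and the excess is spread over the remaining stocks. A production setup would more likely hand the same quadratic program to a dedicated QP solver.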

📊 Empirical Validation

Datasets: S&P 500 and CSI 300 (16 years of data, multiple market regimes).
Results: RFE with linear regression methods matches MIP in tracking error, turnover, and diversification.
Autoencoders underperform and suffer from high complexity and a lack of interpretability.
Tree-based methods perform worse than linear regression methods on these evaluation criteria and do not appear suitable for portfolio construction.
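
For readers who want to compute comparable evaluation criteria themselves, the sketch below uses common textbook definitions. These are assumptions and not necessarily the exact measures used in the paper: tracking error as the sample standard deviation of return differences, one-way turnover as half the sum of absolute weight changes, and the inverse Herfindahl index as a diversification proxy.

```python
import numpy as np


def tracking_error(portfolio_returns, index_returns):
    """Sample standard deviation of the per-period return differences."""
    diff = np.asarray(portfolio_returns) - np.asarray(index_returns)
    return float(np.std(diff, ddof=1))


def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes at a rebalance."""
    change = np.asarray(new_weights) - np.asarray(old_weights)
    return 0.5 * float(np.abs(change).sum())


def effective_holdings(weights):
    """Inverse Herfindahl index, 1 / sum(w^2): equals N for N equal weights."""
    w = np.asarray(weights)
    return 1.0 / float(np.sum(w ** 2))


print(tracking_error([0.01, 0.02, 0.03], [0.01, 0.01, 0.01]))
print(turnover([0.5, 0.5, 0.0], [0.25, 0.5, 0.25]))   # 0.25
print(effective_holdings([0.25, 0.25, 0.25, 0.25]))   # 4.0
```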

💡 Implications

For Researchers: Provides a novel, feature selection-based approach for financial index tracking and highlights the limitations of complex "black-box" machine learning models (i.e., autoencoders) in constrained financial optimization.
For Practitioners: Provides a scalable, transparent framework for UCITS-compliant ETFs, reducing costs and computational overhead.

📈 Conclusion

Our work bridges the gap between modern machine learning and practical portfolio construction, emphasizing simplicity, interpretability, and regulatory compliance. The proposed framework is particularly advantageous for large indices and dynamic rebalancing scenarios.

📚 Read the Preprint