The Limits of Out-of-Sample Testing
Revisiting Out-of-Sample Accuracy in Trading Models
In trading system design, out-of-sample (OOS) testing is a critical step in assessing robustness. It is necessary, but not sufficient. In this edition, I'll explore some issues with OOS testing.
In this issue:
Latest Posts
Sentiment as Signal: Forecasting with Alternative Data and Generative AI (12 min)
Behavioral Biases and Retail Options Trading (10 min)
The Rise of 0DTE Options: Cause for Concern or Business as Usual? (11 min)
How Machine Learning Enhances Market Volatility Forecasting Accuracy (12 min)
Predicting Corrections and Economic Slowdowns (11 min)
How Well Do Overfitted Trading Systems Perform Out of Sample?
In-sample overfitting is a serious problem in trading strategy design: a strategy tuned too closely to past market conditions may fail when those conditions change.
One way to detect in-sample overfitting is out-of-sample testing, in which you evaluate your strategy on data that was not used to develop it. Reference [1] examined how well in-sample-optimized trading strategies perform out of sample.
Findings
In-sample overfitting occurs when trading strategies are tailored too closely to historical data, making them unreliable in adapting to future, changing market conditions and behaviors.
The study applied support vector machines with 10 technical indicators to forecast stock price directions and explored how different hyperparameter settings impacted performance and profitability.
Results showed that while models often performed well on training data, their out-of-sample accuracy significantly dropped—hovering around 50%—highlighting the risk of misleading in-sample success.
Despite low out-of-sample accuracy, about 14% of tested hyperparameter combinations outperformed the traditional buy-and-hold strategy in profitability, revealing some potential value.
The highest-performing strategies exhibited chaotic behavior; their profitability fluctuated sharply with minor changes in hyperparameters, suggesting a lack of consistency and stability.
There was no identifiable pattern in hyperparameter configurations that led to consistently superior results, further complicating strategy selection and tuning.
These findings align with classic financial theories like the Efficient Market Hypothesis and reflect common challenges in machine learning, such as overfitting with complex, high-dimensional data.
The paper stresses caution in deploying overfitted strategies, as their sensitivity to settings can lead to unpredictable results and unreliable long-term performance in real markets.
The results indicated that most models achieved high in-sample accuracy but only around 50% accuracy on out-of-sample data. Nonetheless, a significant proportion of the models outperformed the buy-and-hold strategy in terms of profitability. However, the most profitable strategies were highly sensitive to hyperparameter settings, which is a cause for concern.
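To make this pattern concrete, here is a minimal sketch, not the paper's exact setup: synthetic random-walk prices, two illustrative indicator-style features, and deliberately extreme SVC hyperparameters chosen to force memorization. In-sample accuracy is near perfect while out-of-sample accuracy collapses toward a coin flip:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# synthetic random-walk price series (no real predictability by construction)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))

def make_features(p, i):
    # two simple, illustrative technical-style indicators
    return [p[i] / p[i - 10] - 1.0,           # 10-day momentum
            p[i] / p[i - 20:i].mean() - 1.0]  # gap to 20-day average

X = np.array([make_features(prices, i) for i in range(20, len(prices) - 1)])
y = (np.diff(prices)[20:] > 0).astype(int)    # next-day direction

split = len(X) * 2 // 3                       # chronological split, no shuffling
model = SVC(C=100.0, gamma=1e6)               # deliberately overfit settings
model.fit(X[:split], y[:split])

in_acc = model.score(X[:split], y[:split])
oos_acc = model.score(X[split:], y[split:])
print(f"in-sample accuracy:     {in_acc:.2f}")
print(f"out-of-sample accuracy: {oos_acc:.2f}")
```

With `gamma` this large, the RBF kernel effectively memorizes the training points, which is exactly the overfitting regime the study warns about: impressive in-sample numbers with no genuine out-of-sample edge.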
Reference
[1] Yaohao Peng and João Gabriel de Moraes Souza, Chaos, overfitting and equilibrium: To what extent can machine learning beat the financial market?, International Review of Financial Analysis, Volume 95, Part B, October 2024, 103474.
How Reliable Is Out-of-Sample Testing?
Out-of-sample testing is a crucial step in designing and evaluating trading systems, allowing traders to make more informed and effective decisions in dynamic and ever-changing financial markets. But is it free of well-known biases such as overfitting, data-snooping, and look-ahead? Reference [2] investigated these issues.
Findings
Out-of-sample testing plays a vital role in evaluating trading systems by assessing their ability to generalize beyond historical data and perform well under future market conditions.
Although useful, out-of-sample testing is not immune to biases such as overfitting, data-snooping, and especially look-ahead bias, which can distort the validity of results.
A common issue arises when models are developed or tuned using insights gained from prior research, creating an indirect dependency between development and test data.
Researchers found that excessively high Sharpe ratios in popular multifactor models can be largely explained by a subtle form of look-ahead bias in factor selection.
Many out-of-sample research designs still overlap with datasets used in earlier studies, leading to results that reflect known patterns rather than genuine model performance.
The ongoing and iterative nature of financial research makes it difficult to construct fully unbiased validation frameworks that truly represent out-of-sample conditions.
When alternative evaluation methods were applied, Sharpe ratio estimates dropped significantly, indicating the extent to which traditional approaches may inflate performance expectations.
This reduction in Sharpe ratios is actually encouraging, as it better reflects the realistic outcomes investors can expect when implementing these models in real time.
Despite these findings, the paper emphasizes that multifactor models still improve on CAPM, though the improvements are smaller than widely claimed.
In short, out-of-sample testing also suffers, albeit subtly, from biases such as overfitting, data-snooping, and look-ahead.
We agree with the authors. We also believe that out-of-sample tests, such as walk-forward analysis, suffer from selection bias.
Then how do we minimize these biases?
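One way to see how look-ahead selection alone can inflate Sharpe ratios is a minimal simulation (all numbers below are synthetic): generate many pure-noise "strategies" whose true Sharpe ratio is zero, pick the best one using the full sample, and then evaluate it on data generated after the selection:

```python
import numpy as np

rng = np.random.default_rng(42)
n_strategies, n_days = 500, 1250              # ~5 years of daily data

# every candidate is pure noise: the true Sharpe ratio of each is zero
returns = rng.normal(0.0, 0.01, (n_strategies, n_days))

def ann_sharpe(r):
    # annualized Sharpe ratio of a daily return series (zero risk-free rate)
    return r.mean() / r.std() * np.sqrt(252)

# look-ahead selection: pick the winner using the entire history
sharpes = np.array([ann_sharpe(r) for r in returns])
best = int(np.argmax(sharpes))
in_sample_sr = sharpes[best]

# the same "strategy" evaluated on data generated after selection
oos_sr = ann_sharpe(rng.normal(0.0, 0.01, n_days))

print(f"selected in-sample Sharpe: {in_sample_sr:.2f}")
print(f"post-selection Sharpe:     {oos_sr:.2f}")
```

The strategy chosen with full-sample knowledge typically shows an annualized Sharpe well above 1 despite having no true edge, while its post-selection Sharpe hovers around zero, mirroring the kind of deflation the study reports.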
Reference
[2] Easterwood, Sara, and Paye, Bradley S., High on High Sharpe Ratios: Optimistically Biased Factor Model Assessments (2023). SSRN 4360788
Closing Thoughts
The studies discussed here indicate that models with high in-sample accuracy often achieve only around 50% accuracy on out-of-sample data. While out-of-sample testing is an essential tool for evaluating trading strategies, it is not entirely free from biases such as overfitting, data-snooping, and look-ahead. Research shows that these biases can inflate performance metrics such as Sharpe ratios, leading to overly optimistic expectations.
Educational Video
Building Trading Strategies that Work with Walk Forward Analysis
In this video, Bob Pardo explains how walk-forward analysis offers a practical solution to the common problem of overfitting in trading strategy development. By distinguishing signal from market noise and adapting to shifting market regimes, this method enables more accurate performance evaluation, robust parameter selection, and continuous strategy refinement. When applied correctly, walk-forward analysis helps create trading systems that are both adaptive and resilient in real-world market conditions.
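As a rough sketch of the mechanics (an illustration on synthetic prices, not Pardo's exact procedure), the walk-forward loop below optimizes a moving-average lookback on each in-sample window, then applies the frozen parameter to the next, unseen window, stitching the out-of-sample results together:

```python
import numpy as np

rng = np.random.default_rng(7)
# synthetic price series with a slight upward drift
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 2000)))

def strategy_pnl(p, lookback):
    # long when price closes above its moving average, flat otherwise;
    # returns the total log-return P&L of the rule over the window
    ma = np.convolve(p, np.ones(lookback) / lookback, mode="valid")
    sig = (p[lookback - 1:] > ma).astype(float)
    r = np.diff(np.log(p))[lookback - 1:]
    return float((sig[:-1] * r).sum())

lookbacks = [5, 10, 20, 40, 60]
is_len, oos_len = 500, 125        # in-sample / out-of-sample window lengths

window_pnls = []
start = 0
while start + is_len + oos_len <= len(prices):
    # 1) optimize on the in-sample window only
    is_window = prices[start:start + is_len]
    best = max(lookbacks, key=lambda lb: strategy_pnl(is_window, lb))
    # 2) apply the frozen parameter to the next, unseen window
    #    (a warm-up of `best` days so the MA is defined from the first OOS day)
    oos_window = prices[start + is_len - best:start + is_len + oos_len]
    window_pnls.append(strategy_pnl(oos_window, best))
    start += oos_len

oos_pnl = sum(window_pnls)
print(f"{len(window_pnls)} walk-forward windows, stitched OOS P&L: {oos_pnl:.4f}")
```

The stitched out-of-sample record is what gets evaluated, which is precisely what makes walk-forward analysis more honest than a single in-sample optimization, though, as discussed above, it is still exposed to selection bias across the parameter grid.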
Volatility Weekly Recap
The figure below shows the term structures for the VIX futures (in colour) and the spot VIX (in grey).

Stocks fell to start the week due to uncertainty regarding tariffs and the upcoming FOMC meeting. Friday brought a broader sell-off in the markets, triggered by a weaker-than-expected jobs report and renewed concerns over tariff tensions.
For the week, the S&P 500 fell 2.36% and the Nasdaq lost 2.17%. Oil prices rose during the first half of the week but declined on Friday following news of a potential supply increase. Gold dropped early in the week but bounced back on Friday. Bitcoin traded back in the $113,000 range.

On the volatility front, both spot and VIX futures increased, but they remain in contango. The roll yield, however, decreased sharply and is now near zero. Looking at the overall volatility chart, we can observe an upward trend since the start of the year.
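The roll yield mentioned above can be defined in several ways; one simple convention, sketched below, annualizes the percentage gap between the front-month future and spot VIX (the function name and figures are hypothetical, not the readings behind the chart):

```python
def vix_roll_yield(front_future, spot_vix, days_to_expiry):
    # positive in contango (future above spot), negative in backwardation
    return (front_future / spot_vix - 1.0) * (365.0 / days_to_expiry)

# hypothetical readings: mild contango with the front future just above spot
print(f"{vix_roll_yield(16.1, 16.0, 21):.1%}")
```

As the gap between the front future and spot narrows, this measure moves toward zero even while the curve technically remains in contango.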

Around the Quantosphere
Equity pullback by macro and quant funds may signal trouble ahead, strategist says. (Investing)
Ultimate Guide to Quant Trading Interviews. (Trademath)
Jane Street revealed the secrets of how its high-tech trading floor works. (Efinancialcareers)
The rise of the quant-engineer-infra hybrid. (Efinancialcareers)
The Quant Winter of 2025: Market Structure Shifts and AI Limitations Expose Hidden Vulnerabilities. (Ainvest)
Stock-picking hedge funds regain investor favour amid market volatility. (Hedgeweek)
Hedge Funds Just Got Burned: Retail Traders Trigger $2.5B Short Squeeze in July. (Yahoo)
Traders Hedging Record Rally Dabble in Exotic Options. (Bloomberg)
Digital asset funds see record monthly inflows of $1.12bn. (Hedgeweek)
Disclaimer
This newsletter is not investment advice. It is provided solely for entertainment and educational purposes. Always consult a financial professional before making any investment decisions.
We are not responsible for any outcomes arising from the use of the content and codes provided in the outbound links. By continuing to read this newsletter, you acknowledge and agree to this disclaimer.