Every trading strategy looks brilliant in hindsight. The equity curve slopes upward, the Sharpe ratio is impressive, and the win rate inspires confidence. But experienced practitioners know that the gap between a compelling backtest and a profitable live strategy is where fortunes are lost. This guide teaches you how to read backtests critically.
A backtest is not a prediction. It's a statement about what would have happened under specific assumptions about the past. The quality of those assumptions determines whether the results generalize to the future.
The Key Metrics
Before diving into pitfalls, let's establish what the key backtest metrics actually measure:
Total Return — The cumulative profit or loss. Context matters: 50% over 10 years is poor; 50% over 6 months is exceptional. Always annualize for comparison.
Sharpe Ratio — Risk-adjusted return (excess return per unit of volatility). Generally: below 0.5 is poor, 0.5-1.0 is acceptable, 1.0-2.0 is strong, above 2.0 is exceptional (or suspicious).
Win Rate — Percentage of profitable trades. A 40% win rate can be highly profitable if winners are much larger than losers. Don't evaluate in isolation.
Max Drawdown — The largest peak-to-trough decline. This is the number that determines whether you can psychologically and financially survive the strategy. A 40% drawdown means you need 67% to recover.
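The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the Equity.Finance engine's implementation: the function names, the 252-trading-day convention, and the use of a sample standard deviation are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def annualized_return(total_return, years):
    """Convert a cumulative return (0.50 = 50%) into a compound annual rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def sharpe_ratio(period_returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (sample std deviation)."""
    excess = [r - risk_free_per_period for r in period_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def expectancy(win_rate, avg_win, avg_loss):
    """Average profit per trade; avg_loss is passed as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def gain_to_recover(drawdown):
    """Gain needed to return to the previous peak after a drawdown."""
    return drawdown / (1.0 - drawdown)


print(f"50% over 10 years:  {annualized_return(0.50, 10):.1%} per year")
print(f"50% over 6 months:  {annualized_return(0.50, 0.5):.1%} per year")
print(f"40% win rate, 300 avg win, 100 avg loss: {expectancy(0.40, 300, 100):.0f} per trade")
print(f"Gain needed after a 40% drawdown: {gain_to_recover(0.40):.0%}")
```

Annualized, 50% over 10 years works out to roughly 4.1% a year, while 50% over 6 months compounds to 125% a year. The 40% win rate example still earns money per trade because the average winner is three times the average loser, and the recovery figure comes from 0.40 / (1 - 0.40), which is about 67%.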
[Figure: Sharpe ratio decay, backtest vs live (illustrative, 5-year strategies)]
Strategy Comparison
Let's look at three common strategies and their backtested performance. These are live results from the Equity.Finance backtest engine running on real market data:
These backtests run on the same price data (AAPL) with the same capital and time period. The differences in results illustrate how strategy choice, not just stock selection, drives outcomes.
The Five Pitfalls
Overfitting
The most common and most dangerous pitfall. Overfitting occurs when a strategy is optimized to perform well on historical data but captures noise rather than signal. A strategy with 15 parameters that was tuned on 10 years of data will almost certainly underperform out of sample.
Red flag: Unusually high Sharpe ratios (>3.0) on standard equity strategies. If someone claims a simple moving average crossover generates a Sharpe of 4, the backtest is likely overfit.
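One way to see why heavy optimization is dangerous is to "optimize" over strategies that have no edge at all. The sketch below is a toy simulation under stated assumptions: every candidate is pure Gaussian noise with a true mean of zero, and the function name and parameters are hypothetical. Picking the best in-sample candidate is a stand-in for parameter tuning.

```python
import random
from statistics import mean


def best_of_noise(n_strategies=200, n_days=250, seed=0):
    """Select the best in-sample performer among strategies with no real edge."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    candidates = []
    for _ in range(n_strategies):
        # Pure noise: daily returns with a true mean of zero and 1% volatility.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        candidates.append((mean(in_sample), mean(out_of_sample)))
    # "Optimization": keep whichever candidate looked best on in-sample data.
    best_in, best_out = max(candidates, key=lambda pair: pair[0])
    return {
        "best_in_sample": best_in,
        "same_strategy_out_of_sample": best_out,
        "all_in_sample": [c[0] for c in candidates],
    }


result = best_of_noise()
print(f"in-sample mean daily return:     {result['best_in_sample']:.5f}")
print(f"out-of-sample mean daily return: {result['same_strategy_out_of_sample']:.5f}")
```

The winner's in-sample average is positive simply because it is the maximum of 200 random draws. Its out-of-sample average is just another draw from a zero-mean distribution, so it will usually sit much closer to zero. The more candidates (or parameter combinations) you search, the better the best one looks, and the less that look means.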
Survivorship Bias
If you only backtest on companies that exist today, you're missing all the companies that went bankrupt, were delisted, or were acquired at distressed valuations. This bias inflates returns because your universe is pre-selected for success.
Red flag: Any backtest on "current S&P 500 constituents" without adjusting for index changes. The S&P 500's composition changes significantly over a decade.
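A toy calculation shows how much the choice of universe can matter. The tickers and returns below are made up for illustration and are not real data. The only point is the gap between averaging over survivors and averaging over everything that was tradable at the start.

```python
from statistics import mean

# Hypothetical 10-year total returns for an invented universe (not real data).
# "survived" marks whether the company is still listed at the end of the period.
universe = [
    {"ticker": "AAA", "total_return": 1.20, "survived": True},
    {"ticker": "BBB", "total_return": 0.60, "survived": True},
    {"ticker": "CCC", "total_return": 0.30, "survived": True},
    {"ticker": "DDD", "total_return": -1.00, "survived": False},  # went bankrupt
    {"ticker": "EEE", "total_return": -0.70, "survived": False},  # delisted
]


def average_return(stocks):
    """Equal-weighted average total return across a list of stocks."""
    return mean(s["total_return"] for s in stocks)


survivors_only = average_return([s for s in universe if s["survived"]])
full_universe = average_return(universe)

print(f"Survivors only: {survivors_only:.0%}")
print(f"Full universe:  {full_universe:.0%}")
```

In this invented example the survivors average 70% while the full starting universe averages 8%. A backtest that only sees today's constituents is effectively reporting the first number.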
Look-Ahead Bias
Look-ahead bias means using information that would not have been available at the time of the trading decision. Common examples: using adjusted close prices that incorporate future splits, or using financial data before its actual publication date.
Red flag: Strategies that trade on earnings data as of the date the database assigns to it (often the fiscal period end) rather than the day it actually became public, which can be days or weeks later.
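The same mistake shows up in price-based rules when a signal is applied to the very bar it was computed from. The sketch below uses a deliberately naive rule ("be long if today's close is above yesterday's") and a hypothetical `lag` parameter to show the difference. The prices are invented for the example.

```python
def strategy_returns(closes, lag):
    """Returns of a 'long if the close rose' rule, applied with a given lag.

    lag=0 applies each signal to the same bar it was computed from (look-ahead).
    lag=1 applies it to the following bar, the earliest a trader could act.
    """
    # bar_returns[i] is the return from closes[i] to closes[i + 1].
    bar_returns = [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    # signals[i] is only known once bar_returns[i] has been fully realized.
    signals = [1 if r > 0 else 0 for r in bar_returns]
    return [signals[i - lag] * bar_returns[i] for i in range(lag, len(bar_returns))]


closes = [100.0, 102.0, 101.0, 103.0, 99.0, 100.0]  # invented prices

with_look_ahead = strategy_returns(closes, lag=0)
tradable = strategy_returns(closes, lag=1)

print("lag=0:", [f"{r:+.2%}" for r in with_look_ahead])
print("lag=1:", [f"{r:+.2%}" for r in tradable])
```

With `lag=0` the rule never records a losing bar, because it only "enters" bars that it already knows closed higher. With `lag=1` the same rule on the same prices loses money. A backtest that never loses on the bar where it enters is worth checking for exactly this kind of off-by-one error.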
Transaction Cost Assumptions
Many backtests assume zero or minimal transaction costs. For strategies that trade frequently, realistic costs (commissions, bid-ask spread, market impact) can easily erode 50-80% of gross returns.
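A rough way to size the problem is to subtract a linear cost estimate from the gross return. The sketch below assumes the full capital is turned over on every round trip and that cost per round trip is constant. Both are simplifications, and the figures (12% gross, 0.15% per round trip) are illustrative assumptions rather than measured values.

```python
def net_annual_return(gross_return, round_trips_per_year, cost_per_round_trip):
    """Gross annual return minus a simple linear cost estimate.

    All inputs are fractions of capital: 0.12 = 12%, 0.0015 = 15 basis points.
    Assumes the whole account is traded on every round trip.
    """
    return gross_return - round_trips_per_year * cost_per_round_trip


def share_of_gross_lost(gross_return, net_return):
    """Fraction of the gross return that costs consumed."""
    return (gross_return - net_return) / gross_return


gross = 0.12
for trips in (4, 50):
    net = net_annual_return(gross, trips, 0.0015)
    lost = share_of_gross_lost(gross, net)
    print(f"{trips:>3} round trips/year: net {net:.1%}, {lost:.1%} of gross lost")
```

Under these assumptions, 4 round trips a year cost 5% of the gross return, while 50 round trips a year cost 62.5% of it. The strategy logic is identical in both cases. Only the trading frequency changes.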
[Figure: Transaction cost drag by strategy type (% annual return lost)]
Sample Period Selection
A strategy backtested from March 2009 (the GFC bottom) to any subsequent date will show great results for almost anything that's long equities. The choice of start and end dates dramatically affects results.
Red flag: Backtests that start at obvious market troughs or end at peaks. Always ask: what happens if I shift the window by 6 months?
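The "shift the window" question is easy to automate. The sketch below measures a simple buy-and-hold return over the original window and over the same window moved 6 months earlier and later. The monthly price series is invented, with a trough placed at the window start on purpose, and the function names are hypothetical.

```python
def window_annualized_return(prices, start, end, periods_per_year=12):
    """Annualized buy-and-hold return between two indices of a price series."""
    growth = prices[end] / prices[start]
    years = (end - start) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def window_sensitivity(prices, start, end, shift, periods_per_year=12):
    """Annualized return for the window as given and shifted by +/- `shift` bars."""
    results = {}
    for offset in (-shift, 0, shift):
        s, e = start + offset, end + offset
        if s >= 0 and e < len(prices):  # skip windows that fall off the data
            results[offset] = window_annualized_return(prices, s, e, periods_per_year)
    return results


# Invented monthly closes with a trough at index 6 (not real data).
prices = [100, 95, 90, 80, 70, 60, 50, 55, 62, 70, 78, 85, 92,
          98, 104, 110, 115, 120, 124, 128, 131, 134, 136, 138, 140]

for offset, value in window_sensitivity(prices, start=6, end=18, shift=6).items():
    print(f"window shifted {offset:+d} months: {value:.1%} annualized")
```

On this made-up series, the 12-month window that starts at the trough shows 148% annualized. The same window 6 months earlier shows a loss of 8%, and 6 months later a gain of about 52%. If a strategy's headline number moves this much with the dates, the dates are doing much of the work.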
How to Evaluate a Strategy
When someone presents a backtest, ask these questions in order:
- What's the hypothesis? A strategy should be based on an economic or behavioral rationale, not just pattern matching.
- How many parameters were optimized? More parameters = higher overfit risk.
- What's the out-of-sample performance? Split the data: train on one period, test on another.
- What's the worst drawdown? Can you survive it financially and psychologically?
- What are the transaction cost assumptions? Double them and see if it's still attractive.
If a strategy can't survive a 2x increase in assumed transaction costs, it probably isn't robust enough for live trading. The market is constantly adapting, and edges erode over time.
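Two of the questions above, out-of-sample testing and the doubled-cost check, translate directly into small helpers. This is a minimal sketch with hypothetical function names. The 75/25 split and the zero hurdle are arbitrary defaults chosen for the example, not recommendations.

```python
def chronological_split(returns, train_fraction=0.75):
    """Split a return series in time order: earlier part to tune, later to test.

    The split is never shuffled, so the test period is strictly after training.
    """
    cut = int(len(returns) * train_fraction)
    return returns[:cut], returns[cut:]


def survives_doubled_costs(gross_return, assumed_cost, hurdle=0.0):
    """True if the annual return still clears the hurdle with costs doubled."""
    return gross_return - 2.0 * assumed_cost > hurdle


train, test = chronological_split([0.01, -0.02, 0.03, 0.00, 0.01, -0.01, 0.02, 0.01])
print(len(train), "periods to tune on,", len(test), "periods held out")

print(survives_doubled_costs(gross_return=0.10, assumed_cost=0.03))  # 10% - 6%
print(survives_doubled_costs(gross_return=0.10, assumed_cost=0.06))  # 10% - 12%
```

Keeping the split chronological matters: shuffling the data before splitting would let information from later periods leak into the tuning step, which is a form of look-ahead bias.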
The Equity.Finance Backtest Tool
The backtest engine on our Lab page implements three strategies — RSI Mean Reversion, SMA Crossover, and MACD Cross — with configurable parameters. It uses real OHLC data from Alpha Vantage and includes the key metrics discussed above.
Use it to build intuition about how different strategies behave across different market conditions. The goal isn't to find the "best" strategy — it's to understand how strategy parameters affect risk and return, and to practice the critical evaluation skills described in this guide.