Back-testing, the process of testing a trading strategy on historical market data, doesn’t sound very exciting, but it has always intrigued me. Although it is a tool used by quantitative analysts, traders, and investors, back-testing is not well understood, provokes skepticism and debate, and is regarded by many in the financial industry as an unsound and inexact science. The ambiguity surrounding the practice leads many to place real-money trades on the strength of less-than-optimal back-test results. The outcome, unsurprisingly, may not be as expected. Hence the CFTC warning, which cautions us about back-tests and about those who would use back-test results to sell products:
This standard CFTC warning is required whenever back-tests are used: (Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. There is a possibility that you may sustain a loss equal to or greater than your entire investment regardless of which asset class you trade (equities, options futures or Forex) therefore, you should not invest or risk money that you cannot afford to lose.) The wording here is important.
It is safe to say that a more balanced approach is needed, along with better-educated back-testers and better-educated consumers of back-tested strategies. This in turn may help reduce over-reliance on the descriptive statistics presented in back-test performance summaries and by their advocates. Reducing the weight given to the back-test when selecting a trading strategy for real-time use makes solid sense. Let’s review some of the issues surrounding back-testing.
Author Back-testing Bias
Hindsight Bias - Hindsight bias occurs when the author embraces past events as something that he or she believes could have been predicted with more accuracy than probability suggests.
Confirmation Bias - Confirmation bias occurs when the author's beliefs lead him or her to search only for those variables that affect the test in a positive way, rather than searching across all possibilities.
Unrealistic Competition Bias - This form of bias occurs when results are reviewed against a weak benchmark that unfairly generates a comparison in favor of our back-test outcome, overstating future ability to profit.
Incorrect Data - Incorrect data undermines the accuracy of the back-test, and every step should be taken to ensure that the historical data used in the back-testing process is “clean”. Methods for checking data need to be incorporated into any good back-test regime.
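As an illustration of what such checks might look like, here is a minimal sketch in Python. The bar layout (a dict with date, open, high, low, and close), the function name find_data_problems, and the 50% jump threshold are all assumptions made for this example, not a standard.

```python
from datetime import date

def find_data_problems(bars, max_abs_return=0.5):
    """Return (index, reason) pairs for bars that look suspicious.

    `bars` is a list of dicts with keys date, open, high, low, close.
    This layout is hypothetical and chosen only for the example.
    """
    problems = []
    prev = None  # last bar that had usable prices
    for i, bar in enumerate(bars):
        prices = [bar["open"], bar["high"], bar["low"], bar["close"]]
        if any(p is None or p <= 0 for p in prices):
            problems.append((i, "missing or non-positive price"))
            continue
        top = max(bar["open"], bar["close"])
        bottom = min(bar["open"], bar["close"])
        if bar["high"] < top or bar["low"] > bottom:
            problems.append((i, "high/low inconsistent with open/close"))
        if prev is not None:
            if bar["date"] <= prev["date"]:
                problems.append((i, "date out of order or duplicated"))
            change = bar["close"] / prev["close"] - 1.0
            if abs(change) > max_abs_return:
                # Large one-bar moves are often unadjusted splits or bad ticks.
                problems.append((i, "suspicious jump, possibly an unadjusted split"))
        prev = bar
    return problems

# Small hand-made sample: bar 2 is malformed, bar 3 jumps by 60%.
sample_bars = [
    {"date": date(2020, 1, 2), "open": 100.0, "high": 101.0, "low": 99.0, "close": 100.5},
    {"date": date(2020, 1, 3), "open": 100.5, "high": 102.0, "low": 100.0, "close": 101.0},
    {"date": date(2020, 1, 3), "open": 101.0, "high": 100.0, "low": 99.0, "close": 100.0},
    {"date": date(2020, 1, 6), "open": 40.0, "high": 41.0, "low": 39.0, "close": 40.0},
]
for index, reason in find_data_problems(sample_bars):
    print(index, reason)
```

A flagged bar is not necessarily wrong; the point is that a person looks at it before the back-test does.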
Sensitivity Testing Forgotten - When testing, one should always perform a sensitivity test, sometimes called a robustness test, in which the parameters are perturbed so that the full spectrum of results may be observed. Tunnel vision, not a formal term in back-test review but worth a mention, occurs when a set of parameters is chosen and no testing is ever performed in the neighborhood of those parameters. Results can easily deteriorate if the model under investigation is not solid and does not contain a valid signal. Moving to the left or right of a parameter could cause profits to fall off a cliff quickly.
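A sensitivity test can be sketched in a few lines of Python. The moving-average rule below is only a stand-in strategy picked for the example, and the names sma, backtest_sma, and sensitivity are hypothetical. The idea is simply to re-run the same back-test for every parameter value around the chosen one and look at the whole row of results.

```python
def sma(values, n):
    """Simple moving average; None until n values are available."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= n:
            total -= values[i - n]
        out.append(total / n if i >= n - 1 else None)
    return out

def backtest_sma(prices, lookback):
    """Total return of a toy long-only rule.

    If the close at bar t is above its moving average, hold the
    position from bar t to bar t + 1; otherwise stay flat.
    """
    avg = sma(prices, lookback)
    equity = 1.0
    for t in range(lookback - 1, len(prices) - 1):
        if prices[t] > avg[t]:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0

def sensitivity(prices, center, width):
    """Back-test every lookback within `width` of the chosen `center`."""
    low = max(2, center - width)
    return {n: backtest_sma(prices, n) for n in range(low, center + width + 1)}
```

If the result at the chosen parameter is strong while its neighbors are weak or negative, that is the cliff described above, and the chosen value is more likely a product of the data than of any real signal.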
Selection or Population Bias / Survivorship Bias - These two forms of bias are often intertwined. The tester errs when he or she does not select a large enough population for testing, or uses a population that is not or cannot be used during the next phase of testing, the out-of-sample phase.
Survivorship Bias - This bias is interesting because it can creep into back-testing in ways that are not easily discerned. One must ensure that results from the initial back-test are not selectively carried into another part or phase of the test. This may occur when performing a statistical review of the results (a trimmed mean of the returns, for example) and selecting only the profitable results as a subset when creating aggregated bins to analyze the trajectory across time. Clearly this is unintentional, but it does happen. Much thought must be given to each step in the back-test process so the tester does not fall into the back-test abyss.
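Another type of survivorship bias concerns companies that no longer exist. Testing only on the companies that survive today leaves out those that failed, merged, or were delisted, which flatters the results. That is why it is important to make sure that your data is reviewed for mergers and acquisitions, splits, and more.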
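The effect of companies that no longer exist is easy to see with a toy calculation. The four-stock universe below is invented purely for illustration (the symbols and returns are not real data), and average_return is a hypothetical helper.

```python
def average_return(universe, include_delisted):
    """Equal-weighted average total return over the chosen universe."""
    returns = [
        stock["total_return"]
        for stock in universe
        if include_delisted or not stock["delisted"]
    ]
    return sum(returns) / len(returns)

# Hypothetical universe as it stood at the START of the test period.
universe = [
    {"symbol": "AAA", "total_return": 0.20, "delisted": False},
    {"symbol": "BBB", "total_return": 0.10, "delisted": False},
    {"symbol": "CCC", "total_return": -1.00, "delisted": True},   # went bankrupt
    {"symbol": "DDD", "total_return": -0.50, "delisted": True},   # delisted after a collapse
]

survivors_only = average_return(universe, include_delisted=False)
full_universe = average_return(universe, include_delisted=True)
print(survivors_only, full_universe)
```

In this made-up case the survivors alone show a gain while the full universe shows a loss. A trader standing at the start of the period could not have known which names would survive, so only the full universe is a fair test.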
Finally, when incorporating fundamental data into trading models and back-testing on this type of information, one must take care in handling the stated earnings of companies involved in an M&A. Restated financial reporting is a requirement and may continue for years afterward. Does your back-testing data contain this type of information?
Thoughts on Processing
Curve Fitting - Curve fitting is the overuse of optimal parameterization. The tester finds variables that impact the model and keeps only the values that perform perfectly. The parameters end up tuned to the test data, and outside the data-set used in testing those 'optimal' values will cause the model to break down.
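One common guard against curve fitting is to choose parameters on one stretch of data and then score the chosen value on data it has never seen. The sketch below assumes a 70/30 split and a toy scoring rule; optimize_then_validate and lagged_momentum_score are hypothetical names, and any back-test function with the same (data, parameter) shape could be passed in instead.

```python
def optimize_then_validate(data, candidates, score, split=0.7):
    """Pick the best parameter in-sample, then score it out-of-sample.

    `score(data, param)` is any back-test function returning a number
    where higher is better.
    """
    cut = int(len(data) * split)
    in_sample, out_of_sample = data[:cut], data[cut:]
    best = max(candidates, key=lambda p: score(in_sample, p))
    return best, score(in_sample, best), score(out_of_sample, best)

def lagged_momentum_score(returns, lag):
    """Toy rule: take bar t's return if the return `lag` bars earlier was positive."""
    return sum(
        returns[t] for t in range(lag, len(returns)) if returns[t - lag] > 0
    )
```

A large gap between the in-sample and out-of-sample scores suggests the parameter was fitted to the data rather than to anything that persists.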
Data Snooping - Data snooping occurs when relationships and statistics within the chosen data-set are highlighted as significant but may not be significant outside of that data-set (it may also be related to curve fitting or over-optimization). The data snooping police often point to the word persistence: too much snooping, and profits earned in the back-test will not persist into the future during real-world trading.
A certain amount of data mining and/or data snooping is allowed in back-testing. Sound trading signals that are NOT noise, coincidence, or random effects may be discerned through additional tests, and these tests must be performed to compare output against buy and hold, noise, and other quantitative metrics.
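One way to compare a strategy against noise is a shuffle test: keep the market returns fixed, scramble the timing of the strategy's positions many times, and count how often a randomly timed strategy does at least as well. This is a minimal sketch of that idea, not the only valid test. The function names, the 0/1 position encoding, and the default of 1000 trials are assumptions made for the example.

```python
import random

def total_return(returns, positions):
    """Compound per-bar returns, counting only bars where position is 1."""
    equity = 1.0
    for r, pos in zip(returns, positions):
        equity *= 1.0 + r * pos
    return equity - 1.0

def noise_test(returns, positions, trials=1000, seed=0):
    """Compare a strategy with buy and hold and with randomly timed trading.

    The random strategies hold the same number of bars as the real one,
    only at shuffled times. A small p_value means few random timings
    matched or beat the strategy.
    """
    rng = random.Random(seed)
    actual = total_return(returns, positions)
    shuffled = list(positions)
    matched_or_beaten = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if total_return(returns, shuffled) >= actual:
            matched_or_beaten += 1
    return {
        "strategy": actual,
        "buy_and_hold": total_return(returns, [1] * len(returns)),
        # The +1 terms count the strategy itself as one of the arrangements.
        "p_value": (matched_or_beaten + 1) / (trials + 1),
    }
```

A strategy that beats buy and hold but sits in the middle of the shuffled distribution has shown exposure to the market, not timing skill.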
Look-ahead Bias - Look-ahead bias (LAB) occurs when the author uses information or data in a back-test simulation that would not have been known or available during the period being analyzed. This inflates the back-test, so real-time results may turn out sub-par.
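A very common source of look-ahead bias is acting on a bar's close before that bar has finished. The sketch below uses a deliberately naive rule (hold whenever the signal bar was up) and a hypothetical function name, backtest, to show how a single bar of lag changes the result.

```python
def backtest(closes, lag):
    """Toy back-test: hold a bar when the signal bar had a positive return.

    lag=0 uses the SAME bar's return as its own signal, which is
    information a trader would not have until the bar is over.
    lag=1 uses the previous bar's return, which was known in time.
    """
    returns = [closes[t] / closes[t - 1] - 1.0 for t in range(1, len(closes))]
    signals = [1 if r > 0 else 0 for r in returns]
    equity = 1.0
    for t in range(lag, len(returns)):
        equity *= 1.0 + returns[t] * signals[t - lag]
    return equity - 1.0

closes = [100.0, 101.0, 100.0, 102.0, 101.0, 103.0]
leaky = backtest(closes, lag=0)    # peeks at the bar it is trading
honest = backtest(closes, lag=1)   # waits for the signal bar to close
print(leaky, honest)
```

With lag=0 the rule can never lose, because it only holds bars it already knows went up. That is the signature of a leak: results that look too good to question.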
Other Important Points
- Multi-asset class testing must also take into consideration the differences between symbols, groups of symbols, or a large universe of symbols from different asset classes.
- Forex has a cost inherent in holding a position in any pair overnight or longer. Here we have time-horizon trade effects.
- Futures have the ever-problematic expiration and rollover problem.
- Equities have illiquidity issues, inactive companies, mutual funds, etc., where the bid-ask spread weighs heavily on back-test, walk-forward, and real-time trading execution and costs.
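Two of the costs in the list above, the bid-ask spread and the charge for holding a position overnight, can be folded into a back-test with very little code. This is a simplified sketch: net_trade_return is a hypothetical helper, it handles long trades only, it assumes half the spread is paid on each side, and it treats the overnight charge as a flat fraction of notional per night. Real financing and rollover terms vary by broker and instrument.

```python
def net_trade_return(entry_mid, exit_mid, spread, nights_held=0, overnight_rate=0.0):
    """Return of one long trade after spread and overnight holding costs.

    entry_mid, exit_mid: mid prices at entry and exit
    spread: full bid-ask spread in price units (assumed constant)
    overnight_rate: cost per night as a fraction of notional (assumed flat)
    """
    buy_price = entry_mid + spread / 2.0    # buy at the ask
    sell_price = exit_mid - spread / 2.0    # sell at the bid
    gross = sell_price / buy_price - 1.0
    return gross - nights_held * overnight_rate

# The same 1% move in the mid price, with and without costs.
frictionless = net_trade_return(100.0, 101.0, spread=0.0)
with_spread = net_trade_return(100.0, 101.0, spread=0.2)
with_carry = net_trade_return(100.0, 101.0, spread=0.2, nights_held=3, overnight_rate=0.0001)
print(frictionless, with_spread, with_carry)
```

For a strategy that trades often or targets small moves, these deductions can account for most of the back-tested profit.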
One must carefully consider whether one has the expertise, resources, and skill-sets required to attack back-testing in the first place. Hiring talented professionals in this area, or outsourcing, may save the investor the pains and losses that may accompany trading in real-time, on the dark side of the back-test moon.
Final note - Quite often the simplest strategies 'work' because they embrace the most reality. From hyperbolic rallies to flash crashes and other shocks to the system, remember that when the modeler adds a parameter to ensure the capture of one of these market regimes, we must degrade his or her strategy, add an additional parameter, degrade it again, ad infinitum.
For more information, visit www.strategydb.com
StrategyDB.com is an innovator in developing trading strategy tools and applications. It
has developed the Strategy Ticker / Strategy Matrix concept, one of the fastest growing
applications in the field of analytics.