This post originally appeared on BumeBox.com
No campaign can remain successful indefinitely, so a key company objective should be to continuously create new successful campaigns. One way to get better results is through A/B testing.
The goal of A/B testing is to improve your conversion rates over time, whatever your goal metric may be (purchase, registration, click, vote, Like, etc.). A/B testing only works when there is equal historical bias for the campaigns you are launching, i.e. they are launched under the same conditions. When the launch conditions are not identical, you are doing something I call Trying Something New (TSN), and any resulting data must be treated carefully. A/B testing is very different from TSN; let me show you why. Everything follows from one rule.
Axiom 1: Conversion rates change over time
The direction, amplitude, and order of the change are not important; the conclusions are the same. For simplicity, consider the base case: negative linear decay. Its behavior would look something like this:
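To make the axiom concrete, here is a minimal sketch in Python of a linearly decaying conversion rate. The 5% starting rate and the decay speed are made-up numbers for illustration, not data from this post:

```python
# A minimal sketch of Axiom 1: a conversion rate that decays linearly
# with exposure. The 5% starting rate and 0.1-point weekly decay are
# illustrative assumptions.
def conversion_rate(weeks_live, start=0.05, decay=0.001):
    """Hypothetical conversion rate after `weeks_live` weeks."""
    return max(start - decay * weeks_live, 0.0)

for week in range(0, 25, 4):
    print(f"week {week:2d}: {conversion_rate(week):.3f}")
```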
So when you launch a proper A/B test with several versions, your results would look like this after some time t:
The amount of data necessary to make your test valid requires its own statistical discussion, but let's say you meet the requirements to make the data significant. That means that after time t you can pick a clear winner. In this case the winner is A, and you decide to run this version full time. Some time after the test you get crafty and run a new version against A. Another perfectly valid A/B test, right? Nope. A/B tests are only valid when all versions of the test carry equal historical bias. This is TSN, and it is not valid as an A/B test. The conversion rates look like this when you introduce new version E:
And herein lies the difference between A/B testing and TSN. The results of an A/B test carry equal weight because the versions were launched in the same environment. The validity of TSN, however, depends completely on which point in history you observe the results. At time t* campaign E looks like a clear winner against campaign A. Not so when you let it mature: at time t' it's a clear loser.
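Here is a minimal sketch of that trap, assuming linear decay per Axiom 1. Every rate, decay speed, and launch week below is invented for illustration, not data from this post:

```python
# A minimal sketch of why the observation point matters for TSN.
# Campaign A launches at week 0, campaign E at week 10, and each
# conversion rate decays linearly from its own launch (Axiom 1).
# Every number here is illustrative, not real data.
def rate(start, decay, launch, week):
    """Conversion rate at a calendar week, decaying from launch."""
    return max(start - decay * (week - launch), 0.0)

A = dict(start=0.060, decay=0.002, launch=0)
E = dict(start=0.050, decay=0.004, launch=10)

for week, label in [(11, "t* (E is fresh)"), (20, "t' (E matured)")]:
    a, e = rate(week=week, **A), rate(week=week, **E)
    print(f"week {week:2d} {label}: A={a:.3f}  E={e:.3f}")
# week 11: E beats A only because it is fresher (0.046 vs 0.038).
# week 20: the "winner" flips; E is now the clear loser (0.010 vs 0.020).
```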
A proper A/B test is different from a simple TSN. It's important to be able to distinguish between the two cases so you can draw reliable conclusions.
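As an aside, when you do have a proper A/B test (all versions launched together), calling the winner is a standard statistics problem. Below is a minimal sketch using a two-proportion z-test with made-up counts; this is one common approach, not necessarily the one used here:

```python
# A minimal sketch of calling a winner in a proper A/B test, where
# both variants launched at the same time. The counts are made up.
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: conversions `conv` out of `n` trials."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # pooled std error
    return (p_a - p_b) / se

# A: 560 conversions / 10,000 impressions; B: 480 / 10,000
z = z_score(560, 10_000, 480, 10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 -> significant at the 95% level
```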
There are big offline implications too. Ideally, a political campaign is one long A/B test in which the candidate tries a bunch of things they think will work, observes what does, and iterates quickly. In 2008 President Obama campaigned under the slogans HOPE and CHANGE. It seems they worked. If he campaigned in 2012 with the same slogans, what do you think would happen? They certainly would not have the same impact, because voters now have a historical bias for these campaigns. My bet is his conversion rate (votes) would decay in a fashion similar to <graphic 1>. (Author's note: this is not a political endorsement, merely a case study.)
So how would a savvy strategist plan President Obama's 2012 presidential campaign? Comparing survey and sentiment data from similar historical environments is the only way to do it. Asking a voter now whether she prefers HOPE/CHANGE over some new slogan is not a valid assessment and would lose him the election. Instead, the campaign should survey voters on several new slogans and compare their success rates against the rates HOPE/CHANGE had in 2007. Granted, the current political environment is not identical to that of 2007, but under Axiom 1 it's certainly a better barometer.
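As a back-of-the-envelope sketch of that comparison (every rate and slogan name below is invented for illustration):

```python
# A minimal sketch of the recommended comparison: judge new slogans
# against the rate HOPE/CHANGE posted in a comparable historical
# environment (2007), not against each other at different maturities.
# Every number and name here is hypothetical.
BASELINE_2007 = 0.31   # hypothetical HOPE/CHANGE survey rate in 2007

new_slogans = {        # hypothetical survey rates for fresh slogans
    "SLOGAN X": 0.33,
    "SLOGAN Y": 0.28,
    "SLOGAN Z": 0.24,
}

for slogan, r in sorted(new_slogans.items(), key=lambda kv: -kv[1]):
    verdict = "beats" if r > BASELINE_2007 else "trails"
    print(f"{slogan}: {r:.0%} {verdict} the 2007 baseline ({BASELINE_2007:.0%})")
```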