Analysis: The LGF “Front PageView Effect”
Last week, we exposed an “error” in the way the custom-built LGF page view counter reacted to visitors’ clicks, and touched on a few other causes of page view inflation. Since then, CJ appears to have corrected the IE problem that I demonstrated in our video (although we’re not convinced that this was the only “bug”), but the largest remaining culprit behind inflated thread page view numbers is the one in plain sight: the “front page effect” (fpe).
First, we say “plain sight” because CJ did explain exactly how it works and admitted that it would significantly increase the page view number that is displayed at the top of each thread. So, while this explanation was buried in the comment section of an unrelated thread, we can’t claim that this trick was snuck in without telling anyone about it. For the sake of thoroughness, here is CJ’s comment one more time:
Instead, The Boiler Room was naturally curious whether there was a way to quantify this effect, and therefore get an idea of the level of bias it adds when comparing page view numbers to all the other websites which don’t employ this technique (and/or when it is used to trash tweet). Additionally, this kind of data might come in handy if another blogger were thinking about doing something similar. What we found is that this isn’t that hard to do with some sampling and a little statistical analysis.
For our analysis, here’s what we have to work with:
- CJ has set 12 threads to display on LGF’s front page (at the time the fpe was announced, it was set at 10).
- Each front page thread gets a “view” count when the front page is hit.
- The view counters are observable.
- Each thread is timestamped to the minute.
We’ve also got some smart and resourceful people here in The Boiler Room, and we can set things up so that the data can be gleaned from automated samples and fed into a database to be charted and graphed. In short, we can track the reported view increases for any LGF thread from publication until it drops off the front page (and beyond).
What CJ may or may not have realized is that, with just those few things, we can actually get a pretty good idea of levels and patterns in LGF’s front page traffic by simply tracking what happens to these page view counters over time. Apply a little math and logic, and we can separate the approximate fpe number from the “real” views by applying 2 rules (and these are key, so they deserve bolding):
1. The fpe # can never be greater than the lowest view increase amongst the 12 front page threads over the sample period (except in cases where a new thread is published in between samples and yields the lowest number). In other words, the increase from the “deadest” thread on the front page contains the highest % of fpe views.
2. The greater the sampling frequency, the more accurate our estimate of the fpe becomes, and the % of fpe views in the increase approaches 100.
For 1, we can’t assume that the lowest view increase # is 100% front page views; it may still include a few other views that come from click-throughs, referrers, searches, etc. We do know, however, that it will be the closest to the true fpe #. Based on observation, and knowing generally what happens to views as a thread ages and moves down the front page of a blog, along with the fact that we have 12 threads to sample for the “deadest” and do so frequently, we can say that it’s going to be a very close estimate.
For 2, we realize that we must balance accuracy against the effect that our own samples have on the data: every time we take a sample, we register a front page view ourselves, so we wanted to limit our influence to only 1-2% if possible. We found this balance by taking samples a few times an hour.
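As a rough check on that 1-2% target, the sampler’s share of front page views can be sketched in a few lines of Python. This is only an illustration: the function name and the figures in the usage note are hypothetical, not numbers from our data.

```python
def sampler_share(own_samples: int, fpe_increase: int) -> float:
    """Fraction of the estimated front page views that came from our own sampling.

    own_samples  -- how many times we hit the front page during the period
    fpe_increase -- estimated front page views over the same period
                    (the "deadest" thread's counter increase)
    """
    if fpe_increase <= 0:
        raise ValueError("fpe_increase must be positive")
    return own_samples / fpe_increase
```

For example, with made-up figures, `sampler_share(4, 300)` comes out between 0.01 and 0.02, which would sit inside the 1-2% range.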
So, there you have the methodology. Take snapshots of the view increase of a thread, and each time subtract the increase of the “deadest” thread on the page, and what you’re left with is the increase that couldn’t have come from fpe (therefore, “real” views). Make sense?
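For anyone who wants to try this at home, here is a minimal Python sketch of that subtraction. The snapshot format (a dict mapping a thread id to its counter reading), the function name, and the thread ids are assumptions for illustration, not the actual Boiler Room tooling. It also assumes each snapshot contains only the threads on the front page at sample time.

```python
from typing import Dict

Snapshot = Dict[str, int]  # hypothetical format: thread id -> view counter reading


def split_increase(prev: Snapshot, curr: Snapshot) -> Dict[str, Dict[str, int]]:
    """Split each thread's counter increase into an fpe estimate and "real" views."""
    # Only compare threads present in both samples. A thread published between
    # samples has no earlier reading, so it cannot be picked as the "deadest"
    # thread (the exception noted in rule 1).
    common = [thread for thread in curr if thread in prev]
    if not common:
        return {}

    increases = {thread: curr[thread] - prev[thread] for thread in common}

    # Rule 1: the fpe can be no greater than the lowest increase on the page.
    fpe = min(increases.values())

    return {
        thread: {"increase": inc, "fpe": fpe, "real": inc - fpe}
        for thread, inc in increases.items()
    }


# Made-up readings for illustration only.
previous = {"a": 1000, "b": 500, "c": 200}
current = {"a": 1100, "b": 540, "c": 230, "d": 15}  # "d" is newly published
result = split_increase(previous, current)
print(result["a"])  # {'increase': 100, 'fpe': 30, 'real': 70}
```

By construction the “deadest” thread always shows zero “real” views for the period, which is why the result is best read as a lower bound on real views rather than an exact count.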
But, before we reveal the graph and the data, we should ask ourselves: Knowing about this fpe effect, what would we expect the page view counter increases for any given thread on a relatively popular community-style blog like LGF to look like, from the time it’s first published to where it later moves down (and eventually off) the front page?
A: We’d expect it to increase very rapidly when first published, because in addition to the fpe, you have the lizards and lurkers who will click through to the comments, the outside referrers (from twitter, other blogs etc.), and refreshing while the thread is “active”. Then, as the thread ages and moves down the front page, we’d expect the increases to level off slightly, as the extra views from this thread activity die off and you’re left with mostly fpe views increasing the counter steadily (with “waves”, as time of day will affect the front page view rate) until it reaches the bottom of that front page. Finally, we’d expect the increases to virtually flat-line the minute it is bumped off the front page and becomes thread #13, as it will no longer get fpe increases.
And what would we expect a non-fpe counter to look like for the same thread?
A: We should also see a steep increase at first (although not as steep, and not in the same quantities, obviously), and see that taper off as it becomes older and moves down the front page. After the thread got to be a day old or about 4 spots down on the front page, the thread would essentially be dead for most commenting activity, but we should still see some increases from delayed lurker click throughs, lizards coming back to read comments they missed, searches, etc., and perhaps even a “bump” if/when it sees late hits from other sites. It’s obviously going to vary a bit by the nature of the thread (for example, we wouldn’t expect an “open thread” to get late traffic from outside referrers, where others may get a lot more; so again, 12 to sample from helps), but for the most part, “real” page view increases should reduce themselves to a creeping pace with periodic bumps by the time the thread is a day old.
Well, we tracked and charted one, so what did we find?
Using a random thread that shall remain anonymous*, from the moment it was hatched to beyond the front page (the #s indicate the changes in its position on the page):
The red line represents page views recorded from the counter. Now, remember that with rule #1, the blue and green lines are estimates; the exact values are much more difficult to pin down. Again, this blue line represents the lowest that “real” views could possibly be, and the true line is undoubtedly a little higher for this particular thread (if another thread were sampled, we may see a blue line that is significantly closer to the green). But, since we believe that our methodology is sound, we can say that we’re darn close (to the point where you wouldn’t see much difference in the graph).
A lot of this is fairly intuitive, since the effect stipulates that these dead threads will keep accumulating “views” as long as they’re on the front page. No one should be surprised to see the view counters on these threads show higher and higher numbers as you scroll down to #12, simply because those threads have been there longer. So the effect is fairly clear to anyone who stops by LGF and takes a quick glance at all the view counters.
In conclusion, the point of this exercise was not to prove beyond a doubt that thread 37xxx really got only x number of “real” views (as most blogs count them), but to demonstrate the magnitude of the fpe inflation, and show that the technique renders the individual view counters meaningless. Specifically, the “Front PageView Effect” puts so much weight on the counters that you can’t discern whether one thread has a higher count than another because it was particularly insightful/important, or because thread scheduling meant it just happened to sit on the front page longer. That’s why normal blogs have a separate counter for “front page” views, and probably the biggest reason why a claim like this
is rather ridiculous, and deserves to be smacked down.
*the thread # is anonymous for IP security reasons
(Hat tip: The Boiler Room)