Big Data Problems – Lies, Damned Lies & Statistics

Big Data Problems: Does Your Data Add Up?

More than at any other time in history, you have access to thousands of metrics that tell you how well your marketing is performing. Likes, follows, shares, comments, views, pageviews, unique visitors, completed sales and retweets are just a few of the metrics that help you along the way.

We use this data every day to make business decisions. Post this, don’t post that, use this title because it performs better. The last time we did this, X happened. Let’s try that again.

But can you trust what your own data is telling you? My experiences lately with data say maybe not.

LinkedIn Likes Lost

I have published a handful of times on LinkedIn so I “know the drill.” I promote my posts as soon as they are live and I share them among my social streams. In the case of my last post, I even promoted it on a few paid sources. Yet the numbers seem to be … off.

Look closely – my post with 138 views has as many likes and as many comments as my next two posts – which have, in theory, 800 views. Was the headline that much better? Was the article just that good? Unfortunately not – with just 138 views it also means nobody shared the article and it essentially went nowhere despite paid and social intervention.

So I dug a bit deeper. I noticed that someone liked my post but LinkedIn never reported this like.

[Image: LinkedIn post liked]

This “like” appeared in my feed – I waited a few minutes to see if there was a lag effect.

Then I waited an hour. This ‘like’ never materialized. Did Andrew immediately unlike it? Nope – still there in my activity stream over a day later. When I noticed this, I had another LinkedIn user like my post. It also never came through. I waited over a day for this data to come through but it never materialized. Why are LinkedIn stats wrong? A glitch?

Facebook Link Clicks? Facebook vs Google Analytics & Statcounter

After my experience with dubious analytics on LinkedIn, I thought I should check the rest of my data to ensure I’m acting on reality, not on what a social network chooses to show me. One of the hardest things about data is that you usually have only one source for each metric. I have to trust Twitter that I have X followers and that my post got Y retweets. If they didn’t bother showing me some, I’d never know.

So I tested Facebook with verifiable data: traffic sent to my website. Surely the number of clicks I received from a post is straightforward to measure.

[Image: Facebook traffic fraud]

For this post Facebook says it sent 42 link clicks through to my website. A fairly low but acceptable result for my effort. I was more concerned with getting the 2000 post likes to boost my Edgerank (or whatever they call its replacement now), but that’s for another post. 42 link clicks: OK, that’s verifiable.

I opened my Statcounter and Google Analytics pages for social referrals. I found my Statcounter hits from Facebook to this page … 3. THREE.

[Image: Statcounter vs Facebook]

How is that even possible? Surely Statcounter isn’t recording accurately here. Let’s check Google Analytics.

[Image: Google Analytics vs Facebook Insights]

At this point I wanted to punch something because NOTHING MAKES SENSE. Google shows the same post for the same time period bringing in 58 hits from Facebook.

Let’s extrapolate this out – according to Analytics I got almost 60 hits. Facebook says 40. Statcounter says 3. If we were looking at bigger numbers, the difference between 40,000 hits and 60,000 could mean the difference between someone staying in business and going out of business. 60,000 vs 3,000 is a massive difference.

In fact, I tried to match up the other posts and on some, Analytics was higher. On others it was lower than the Facebook number. Statcounter rarely matched within 10% of either one.
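To put a number on how far apart the sources are, here is a minimal Python sketch that compares every pair of counts quoted above (42, 58 and 3). The function names and the choice to express each gap as a fraction of the larger count are assumptions made for illustration, not something any of these tools report.

```python
from itertools import combinations

# Counts quoted above for the same post and the same time period.
counts = {"facebook_link_clicks": 42, "google_analytics": 58, "statcounter": 3}

def relative_gap(a, b):
    """Gap between two counts as a fraction of the larger one (0.0 to 1.0)."""
    larger = max(a, b)
    return 0.0 if larger == 0 else abs(a - b) / larger

def pairwise_gaps(counts):
    """Relative gap for every pair of sources."""
    return {
        (x, y): relative_gap(counts[x], counts[y])
        for x, y in combinations(counts, 2)
    }

for (x, y), gap in pairwise_gaps(counts).items():
    print(f"{x} vs {y}: {gap:.0%} apart")
```

Because the gap is a ratio, it stays the same whether the counts are 42 and 58 or 42,000 and 58,000, which is why small-number mismatches are worth taking seriously.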

Other Stats: Pageviews, Traffic, Hits …

Someone suggested to me that perhaps Google and Facebook are measuring different things. Clicks vs Sessions, Pageviews, Visits, Hits … we don’t know that the terminology is the same. Maybe someone visited my post twice and Analytics counts it twice while Facebook doesn’t? Not sure.
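The "different things" idea is easy to illustrate. The sketch below takes one small, entirely hypothetical log of page loads and counts it three ways: as pageviews, as unique visitors, and as sessions. The 30-minute inactivity cutoff is an assumption used for illustration; each tool defines and tunes its own rules, so treat this as a simplified model rather than how any particular product actually counts.

```python
from datetime import datetime, timedelta

# Hypothetical page-load log: (visitor_id, timestamp). Not real data.
hits = [
    ("a", datetime(2015, 8, 4, 9, 0)),
    ("a", datetime(2015, 8, 4, 9, 5)),   # same visitor reloads 5 minutes later
    ("a", datetime(2015, 8, 4, 14, 0)),  # same visitor returns in the afternoon
    ("b", datetime(2015, 8, 4, 10, 0)),
]

def count_pageviews(hits):
    """Every page load counts once."""
    return len(hits)

def count_unique_visitors(hits):
    """Each visitor counts once, no matter how many loads."""
    return len({visitor for visitor, _ in hits})

def count_sessions(hits, timeout=timedelta(minutes=30)):
    """A new session starts when a visitor has been idle longer than `timeout`."""
    sessions = 0
    last_seen = {}
    for visitor, ts in sorted(hits, key=lambda h: h[1]):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > timeout:
            sessions += 1
        last_seen[visitor] = ts
    return sessions

print(count_pageviews(hits), count_unique_visitors(hits), count_sessions(hits))
```

The same four page loads give 4 pageviews, 2 unique visitors and 3 sessions, so three dashboards could all be "right" and still disagree.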

After these experiences I compared my Google Analytics traffic to my Statcounter numbers. Once I filtered the spam bots out of Analytics, it was under-counting by at least 50% compared with Statcounter. Analytics is installed in the header of my site, so it should load *before* Statcounter, yet most of the time it under-counted. Sometimes it over-counted.

Conclusions

I tried fixing my data and accounting for the mistakes. Were the timezones set the same? Yes. Should page views (GA) = pageviews (Statcounter)? Yes. Should “link clicks” = “page views”? In my opinion they should … but they never actually match up. I am in the early stages of reviving this blog but am already having big data problems.
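The timezone question is worth checking because it can shift daily totals even when two tools record exactly the same hits. Here is a small sketch with made-up timestamps; the UTC+10 reporting offset is just an assumed example, and `daily_totals` is a hypothetical helper, not part of any analytics product.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical hits, all recorded in UTC. Not real data.
utc_hits = [
    datetime(2015, 8, 4, 1, 30, tzinfo=timezone.utc),
    datetime(2015, 8, 4, 13, 0, tzinfo=timezone.utc),
    datetime(2015, 8, 4, 23, 30, tzinfo=timezone.utc),
]

def daily_totals(hits, tz):
    """Count hits per calendar day as seen from the reporting timezone `tz`."""
    return Counter(hit.astimezone(tz).date().isoformat() for hit in hits)

reporting_utc = timezone.utc
reporting_plus10 = timezone(timedelta(hours=10))  # assumed reporting offset

print(daily_totals(utc_hits, reporting_utc))
print(daily_totals(utc_hits, reporting_plus10))
```

With a UTC report all three hits land on August 4; with the UTC+10 report the late-night hit moves to August 5, so the two daily reports disagree without either tool losing a hit.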

My biggest conclusion from all my data testing is that none of the data is likely to be 100% accurate or to mean exactly what you think it means. Always work to clean your data, and double-check important metrics as much as possible. You never know when a number is going to be off by 50% or more.

In the future, I’ll be doing more comprehensive data tests … this was the result of something I randomly saw but I will continue to study the metrics we get from various sources and try to match them up with reality.

Thoughts?

Is your data telling you the right story? As I restart my site I'm noticing massive data issues.

Posted by Matt Antonino on Tuesday, August 4, 2015
