Big data? Little data is tricky too.

Big data is a hardy perennial of scary topics. We gaze in wonder at sophisticated organisations who have mastered the beast and stride off into the future as we peer at the mountains of our own data – or what we think are mountains of data.

Over many years and many data sets I have joined the struggle of analysts trying to master the art of insight. This blog post from last week reminded me of the problem. It has been my most popular so far on this site and I don’t really know why. I can tell you exactly how popular, where in the world it is most and least popular, which search terms were most effective and so on. Can’t tell you why though. I have some theories (it was personal and timely), but I suspect I would have to ask you why. And that is the navigation challenge of the user data ocean.

At Ask Jeeves, even all those years ago, we had multiple millions of queries to understand. We could count them, see where they came from, measure their $ value, see how common they were, categorise them and then and then… All of these measures could then be compared against a forecast (that was an interesting struggle too) and against the previous day, week, quarter and year. Data gave the business its rhythm. There were four measures of the health of the period: search volumes (how much), search share (how much more/less than the market), revenue (worth how much), profit (how much was the business worth). These measures made the happy days happy and the sad days sad. Of course, there was a multitude of other measures to colour in the triangle beneath this point (yield, marketing spend, seasonal data, distribution fees etc.), but those four were the ones that counted.

What we struggled to do was answer the why question. Why more last week? Why less revenue than last year? Many hours were spent in much debate. Not always conclusive debate either.

Similarly, at the BBC, traffic data is a common currency (or almost common, some of the elders cling to ratings, of course). Reach, users, visits, referrals, frequency and depth of visits paint the picture of a healthy or an ailing product. In the case of Bitesize, the numbers describing this health are really quite large.

Again, the hard part was discerning the cause. Why was that game less popular than this one, or why was one podcast downloaded three times as much as the average? We all had good theories, and experience counted in weighing our guesses. We never really knew though. Or, when we did know, it was because we asked. We surveyed or ran focus groups to find out. We created another data set to explain the first one.

What those masters of big data probably do better is use the data to inform smaller, closer decisions in iterative development. One insight into user need is tested live as a new product, feature or piece of content is created. Then the data from usage is used to tweak and adjust the next step of development. And so on. The size of the data is made smaller by using it as evidence in the test of a clear hypothesis. There is less need to worry about all the data all the time in this approach. You still need to ask users why and what they think but the steering along the way is probably more accurate. Social tools make those questions easier to answer now as well and have become vital to any investigation (more data too, however).

Maybe one response to big data anxiety is to ask smaller questions of smaller data sets and leave the ocean alone. Maybe. But I would need to check.

(If you ever feel like telling me why a post is better or worse, please do. It will save me at least that one struggle.)
