Why should we be data-genic? In this series, I’ve attempted several answers to this question. I started off with the curious case of Little Data: how we, at Mindworks, extracted significant value by working with smaller data sets. And that’s what got us started on being more data-genic.
I also shared the varied business benefits that have accrued to us since—sharper customer centricity, new revenue streams and multiplicity of insights.
In the previous two articles, I shared the operational discoveries we have made, including some of the hacks we learnt in making Little Data work for us (start with a prototype, rely less on automation, tap easier data sources first). I also explained why we needed to embed aspects of storytelling in our practice: because only through stories do people really connect with data.
Continuing to drill down, let’s explore data design in this piece. By this I mean how to go about building data processes in the organisation: which parts of the operational process to data-enable, which data sets will be relevant, which dashboards to build, and so on. Again, my perspective remains that of smaller companies, which have only limited resources to bring to this new operational practice.
Also, keep in mind these are rules-of-thumb that we discovered through our iterations. You may quite possibly find other ways to make it work in your context.
Business users should lead the process design
In many ways, that’s the obvious thing to do. After all, they are the ones who will make it work, and derive value from it. But we all know the real problem: how do you get these users, who have little affinity for or experience in data handling, interested in the first place?
We figured this out after a few iterations. We brought the business users in by having them start on the process manually. In our case, this meant going to about 50 content websites and extracting information from specific sections. We told them upfront that we didn’t know how to design the data-driven process, or even how data could help them.
So these users started on the manual process, which was quite time-intensive. But by starting in this fashion, they also began identifying the key process variables, and hacks to manage those variables well.
We also embedded our data analyst in the business team to watch the evolving manual process closely. Now and then, he would suggest hacks to speed up their process, so the users could see some early gains the data team could provide.
In time, the process matured enough that the business team knew how to make the manual version work well. More importantly, they had a pretty good idea of their pain points: needlessly repetitive steps, challenges in information tagging, how to derive insights from outcomes, and so on.
This way, the key data design parameters were sharply identified for the data team. Better still, there was already buy-in from the business users. Effective data design began from there.
So we now always try to get the business user to take the lead in defining the process.
Model in Excel or Drive first
Another significant challenge often crops up when we start working with real data.
In the example above, when the data team attempted to bring in the same data through RSS feeds and scraping scripts, it often differed from what had been gathered during the manual process.
Thus began the work of modifying the manual process.
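The article doesn’t show the actual scripts (our stack was PHP and MySQL). As a rough illustration only, in Python and with made-up feed and manual data, reconciling an automated feed against the manually collected version of the same source might look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: what an RSS feed returned for one source site.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>New SUV launched</title></item>
  <item><title>Fuel prices rise again</title></item>
</channel></rss>"""

# Hypothetical: what the business user collected manually from the same site.
manual_titles = {"New SUV launched", "Festive season discounts", "Fuel prices rise again"}

def feed_titles(rss_xml: str) -> set:
    """Extract item titles from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return {item.findtext("title") for item in root.iter("item")}

def reconcile(feed: set, manual: set) -> dict:
    """Show where the automated feed and the manual process diverge."""
    return {
        "missing_from_feed": manual - feed,  # items the feed never surfaced
        "extra_in_feed": feed - manual,      # items the manual pass missed
    }

report = reconcile(feed_titles(RSS_SAMPLE), manual_titles)
print(report)
```

A report like this makes the gap concrete for the business user, and tells the team whether to fix the feed, the scraper, or the manual process itself.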
It makes sense to design the data-driven process in Excel first. We use Excel when prototyping with one or two test users, and Google Drive if more users are involved. Our test business users would work with crude dashboards in Excel, and help fine-tune them.
Once all major aspects of the process were modelled and found adequate (by which we mean delivering at least 20–30% better results than the manual process), we coded it into a custom application, almost always in PHP and MySQL.
The initial prototype should be quick to set up, and easy to work with. That allows quick iterations and fine-tuning.
Don’t pre-configure around one key parameter
Or even two, for that matter. For a long time, on our auto site, CarToq.com, we focused our data analysis on traffic metrics. That was our key business objective: we wanted to break into the Top 3 sites in auto content. But it also led to some skewed decisions.
And that’s another dynamic that better data design can manage. As you start using data in your decision-making, decisions get influenced more and more by the data you are seeing. That is a healthy place to be, unless your data view is skewed.
At CarToq, our selective bias for traffic data meant we failed to notice an alarming slide in our engagement metrics. We breached the 1-million monthly visits mark, but it came at the cost of very poor user engagement numbers.
Since then, we have adopted another rule-of-thumb for data analytics: build dashboards around three data buckets, ideally, each with its own key and supporting metrics. At CarToq, these buckets are traffic, engagement and ad performance.
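A minimal sketch of that three-bucket structure, with hypothetical metric names and numbers (the article names only the buckets themselves), might look like this:

```python
# Hypothetical dashboard definition: three buckets, each with one key
# metric and a few supporting metrics. Only the bucket names come from
# the article; everything else is illustrative.
dashboard = {
    "traffic": {
        "key": ("monthly_visits", 1_050_000),
        "supporting": {"unique_visitors": 610_000, "pageviews": 2_400_000},
    },
    "engagement": {
        "key": ("avg_session_minutes", 1.4),
        "supporting": {"pages_per_visit": 1.6, "bounce_rate_pct": 71},
    },
    "ad_performance": {
        "key": ("monthly_ad_revenue", 9_500),
        "supporting": {"fill_rate_pct": 82, "ecpm": 1.9},
    },
}

def key_metrics(dash: dict) -> dict:
    """Pull the key metric of every bucket, so all three are reviewed
    together and no single bucket dominates decision-making."""
    return {bucket: detail["key"] for bucket, detail in dash.items()}

print(key_metrics(dashboard))
```

The point of the structure is that a review always surfaces all three key metrics side by side, which is exactly what a single-parameter view fails to do.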
This multiplicity of variables, often pulling in different directions, does make decision-making more difficult, and less linear. But this data-driven model is a closer approximation to the real world, and thus helps optimise outcomes better.
In fact, this could be the common strand that runs across our data design practices. The data framework we build should capture ground-level realities. It is an aid to existing processes, not a new process by itself.
That’s why we rely on business users to lead the process, and then we iterate the data model with prototypes. Data design must be embedded in real-world dynamics.