Data: how to manage your company’s most valuable commodity
Christy Haragan, Principal Sales Engineer and Global GDPR Lead at MarkLogic, explains how she thinks companies need to structure and manage their data, and the issues they might face. Data Infrastructure is a topic in this month’s Business Chief and Haragan offers comment on the subject from 15 years’ worth of experience in the IT world.
How valuable a commodity is data?
Business is a science in today’s world, and data is what drives effective decisions – so data is arguably the most important asset any company has. We all know that the best product in the world is useless unless you can sell it. Selling it means knowing which market to address, what messaging to use and how to reach that market. Data is the most important component in answering these questions and achieving these goals.
Are some companies getting it wrong?
Organisations fail when they take decisions that aren’t backed by the right data, so knowing what data you need is equally critical. The challenge we face in the modern world is data overload. The paradox is that data is what enables us to sell effectively and operate efficiently, yet we have so much of it that finding the right data can seem impossible.
How is ‘big data’ being used in the market these days?
The worlds of ‘Big Data’ and ‘Data Science’ are relatively new fields. However, the more traditional field of ‘Data Management’ has become more important in this new era of data-driven decision making. People have learnt that simply ‘dumping’ data into a ‘data lake’ will result in them ending up with a ‘swamp’. In fact, statistics have shown that data scientists, those highly skilled, highly expensive PhD hires, spend 80% of their time simply trying to manage data.
So why do people go down this route of dumping data together? Because data sits in silos. A medium-sized organisation will typically have 100 or so systems, while large organisations can have thousands. And this is to say nothing of the external data feeds organisations are looking to leverage. Each system was designed to solve a particular problem with a particular set of data, not to integrate different types of data. Organisations spend billions each year on integration software, but traditional approaches are failing to keep pace with the business and with the rate at which data changes.
What are the key steps organisations need to take to ensure they structure and manage data correctly?
Agile data management. What does this mean? Traditional data management typically follows the same process as building a bridge:
- Gather all requirements (the possible questions we would want to ask of the data)
- Analyse these to build a data model (something that captures all the data points necessary to answer the questions)
- Go to each data source or system that will be required to fill in the data model and look at their data models (which will have been built to serve the original purpose that data is used for)
- Merge them all together to make them look like your target model
Only once all this is done can you actually start asking questions of your data. However, by then the business will have moved on, new data sources will have appeared, and people will have grown so bored that they will likely have opted for the ‘dump the data in one place’ approach mentioned previously.
Instead, an agile data management approach involves trying to answer one question at a time – working through just the data that supports that question, delivering the results to the data scientists, and letting them decide which further questions to ask to improve their analysis. With data management improved, data scientists can spend more of their time gaining insight from data, giving businesses the quicker and better insights they need to make decisions that help them grow.
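As a rough illustration of the ‘one question at a time’ idea, here is a minimal Python sketch. The two source systems, their field names and the question itself (revenue by region) are hypothetical and chosen only for illustration. The point is that only the fields needed for the current question are mapped, and everything else in each silo is left alone until a later question calls for it.

```python
# Hypothetical records from two siloed systems, each with its own field names.
crm_records = [
    {"cust_id": 1, "region": "EMEA", "fax": "n/a"},
    {"cust_id": 2, "region": "EMEA", "fax": "n/a"},
    {"cust_id": 3, "region": "APAC", "fax": "n/a"},
]
billing_records = [
    {"customer": 1, "amount": 120.0, "ledger_code": "X1"},
    {"customer": 2, "amount": 80.0, "ledger_code": "X2"},
    {"customer": 3, "amount": 50.0, "ledger_code": "X1"},
]


def revenue_by_region(crm, billing):
    """Answer one question using only the fields that question needs."""
    # Only 'cust_id' and 'region' are taken from the CRM system.
    region_of = {row["cust_id"]: row["region"] for row in crm}
    totals = {}
    # Only 'customer' and 'amount' are taken from the billing system.
    for row in billing:
        region = region_of.get(row["customer"], "unknown")
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals


result = revenue_by_region(crm_records, billing_records)
print(result)
```

Fields such as `fax` and `ledger_code` are never modelled here. If a later question needs them, they can be mapped at that point rather than up front.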
Data management and compliance
As a final note, agile data management does not equate to process-free data management. Governance, security and control are all aspects of traditional data management that enable the business to trust that the data is fit for purpose. As data is brought together in an agile fashion, part of the process should be adding metadata (information about the data). This metadata would include details like: Where did the data come from? Is this data personal data (necessary for GDPR compliance)? What quality requirements does this data have? Adding this metadata means that the data can be classified and brought under specific controls and processes to ensure it is held to the standards and governance required by the business and by law.
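To make the metadata idea concrete, here is a minimal Python sketch, offered as an illustration rather than a prescribed implementation. The `DatasetMetadata` class, the `required_controls` function and the control names are all hypothetical. They simply record the three questions above (source, personal data, quality requirements) and use the answers to decide which controls a dataset should fall under.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record attached to a dataset as it is brought in."""
    source_system: str             # Where did the data come from?
    contains_personal_data: bool   # Is this personal data (relevant to GDPR)?
    quality_requirements: list = field(default_factory=list)  # What quality rules apply?


def required_controls(meta):
    """Classify a dataset into illustrative controls based on its metadata."""
    controls = {"provenance_logging"}  # every dataset keeps a record of its source
    if meta.contains_personal_data:
        controls |= {"access_restriction", "retention_policy"}
    if meta.quality_requirements:
        controls.add("quality_checks")
    return controls


crm_export = DatasetMetadata(
    source_system="crm",
    contains_personal_data=True,
    quality_requirements=["email field must not be empty"],
)
controls = sorted(required_controls(crm_export))
print(controls)
```

The real set of controls would come from the organisation’s own governance policies and legal obligations. The sketch only shows that, once the metadata exists, classification can be automated.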