Unravel Data: Bust the big data blocks
Any organisation looking to truly optimise its big data stack knows it faces enemies blocking the way of DataOps and DevOps teams. It can feel like climbing a mountain; the fabled golden city of uptime and peak efficiency always beyond another crag, around another boulder.
It’s a tough area, particularly when the IT team and the wider business have conflicting ideas about how to spend that time. Yet it’s a fantastic challenge to grapple with for those who like nothing better than overcoming whatever block that business, technology, or chance throws into their path. For the DevOps and DataOps teams operationalising big data and ensuring models and services remain in production, certain blocks loom larger than the rest. Here’s how teams can look at these blockages with determination rather than fear.
Let’s get the most obvious block out of the way first, because when the business talks analytics and big data, the number-one, front-of-mind issue is volume. It’s one of the V’s of big data, and it’s part of why we call it… ‘big’ data.
The volume of data can be a challenge. Expanding giga-, tera-, and petabytes need some ‘tin’ to store it on. Everyone knows that the digital society is producing great quantities of data, and the volume is rising day-on-day from digital transactions and records, the internet of things (IoT), and ever-greater digitalisation and the ‘chips-with-everything’ mentality that’s connecting so many new categories of device each year. Even that sentence was high in word volume.
And so, with rising data volumes available, enterprises continue to leverage this resource to create new business value. The trend fuels a plethora of new applications spanning an alphabet soup of ETL, AI, IoT, and ML, aimed at many business drivers. As these applications are deployed on new data platforms, they need an application performance management solution to meet robust enterprise production requirements.
Then there are the systems that begin to creak or fall over when data volumes start pushing at their technical boundaries. These are often solutions like relational database management systems, or statistics or visualisation software. Many such tools struggle to manage truly big data.
Data applications (i.e., data consumers) don’t exist in isolation from the underlying big data stack. Endlessly looking for more storage space, reconfiguring clusters, and ensuring that databases are optimised is no mean feat for the DevOps and DataOps teams.
Applications are threaded together with many different systems (e.g., ETL, Spark, MapReduce, Kafka, Hive, etc.). How the stack performs has a direct impact on downstream consumers. Managing applications is highly complex and requires an end-to-end solution, especially to meet service-level agreements (SLAs).
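The idea of watching each stage of a multi-system pipeline against its SLA can be sketched in a few lines. This is a hypothetical illustration, not Unravel Data’s product: the stage names and thresholds below are invented for the example.

```python
# Hypothetical sketch: flag pipeline stages whose run time breached an SLA.
# Stage names and SLA budgets are illustrative, not from any real system.

SLA_SECONDS = {
    "etl_ingest": 600,        # ingest should finish within 10 minutes
    "spark_transform": 1200,  # Spark job budget: 20 minutes
    "hive_load": 300,         # Hive load budget: 5 minutes
}

def find_sla_breaches(run_durations: dict[str, float]) -> list[str]:
    """Return the stages whose observed run time exceeded their SLA budget."""
    return [
        stage
        for stage, seconds in run_durations.items()
        if seconds > SLA_SECONDS.get(stage, float("inf"))
    ]

observed = {"etl_ingest": 540, "spark_transform": 1500, "hive_load": 290}
print(find_sla_breaches(observed))  # → ['spark_transform']
```

Even a toy check like this makes the end-to-end point: a breach in one stage (here the Spark transform) is what downstream consumers actually feel, regardless of which system in the chain caused it.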
Stitch those siloes together
The barriers and blocks that get in the way of a strong big data stack and a good analytics process go deeper, almost inevitably encompassing the siloes most organisations have built up as data pools in departments and teams. Each guards its information and narrowly optimised processes, so the business grows a complex data estate of co-mingled assets. Each silo is a barrier and a block to a single version of the truth, and to a strong analytic process that encompasses the whole organisation. As is said in the context of customer service, ‘it’s easier when there’s one throat to choke’.
In fact, merging data sources and cataloguing data (best-practice aspects in busting those siloes) really helps combat another block…
The data pipeline is only as good as its weakest link. Unexplained run-time problems in your applications often occur because one part of the analytics pipeline has changed, moved, been reconfigured, or been starved of compute resources. Troubleshooting is time-intensive and complex, and configuration variables are a pain to parse through. When aiming for perfection, the wise DataOps person knows that “perfect is the enemy of good”.
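One practical first step when a job suddenly slows down is to compare the configuration it ran with last time against the configuration it ran with today. A minimal sketch of that idea, assuming configs are available as flat key-value snapshots (the Spark property names below are real settings, but the values are invented):

```python
# Hypothetical sketch: diff two configuration snapshots to surface the
# change that may explain a run-time regression.

def diff_configs(before: dict, after: dict) -> dict:
    """Map each changed key to its (old, new) value pair.

    Keys present in only one snapshot appear with None on the other side.
    """
    keys = set(before) | set(after)
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }

last_run = {"spark.executor.memory": "8g", "spark.sql.shuffle.partitions": 200}
this_run = {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": 200}

print(diff_configs(last_run, this_run))
# → {'spark.executor.memory': ('8g', '4g')}
```

In this example the halved executor memory jumps straight out, instead of hiding among dozens of unchanged variables the operator would otherwise have to parse through by hand.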
Automation, your friend
Automation is one big data block-busting ally. The DataOps team have some big fish to fry, and only so many pairs of hands with which to work. In fact, there’s a real skills shortage for good big data architects and cloud specialists.
Morgan McKinley, the recruiters, published an IT salary guide showing soaring demand and a massive shortage of available talent in emerging technologies and specialist fields. The dearth of talent has really driven up industry salaries. A cloud architect could expect £100,000 to £130,000 with the right industry experience on top of their technology acumen.
Given the lack of freely available talent, it makes sense to safeguard the teams you have and let them use their skills to best effect, automating the routine parts of their roles so they can focus on higher-skill work. In case you were wondering, the hottest data technologies are still SQL, R, Python, and Hadoop within data science, and Kafka, Scala, and Spark underpinned with Java within data engineering – according to Morgan McKinley.
Giving the existing DataOps and analytical troubleshooting talent a hand with a strong application performance management solution will help them bust through blockages, troubleshooting and optimising every aspect of the big data stack more easily. Not only does it make sense to optimise the process for better, faster business results – but given the strength of the DataOps team’s negotiating power in the current skills shortage, it’s wise to keep them happy and let them enjoy their jobs, rather than spending all day debugging and chasing their own tails.
Kunal Agarwal, CEO at Unravel Data
Kunal Agarwal co-founded Unravel Data in 2013 and serves as CEO. Kunal has led sales and implementation of Oracle products at several Fortune 100 companies. He co-founded Yuuze.com, a pioneer in personalised shopping and what-to-wear recommendations. Before Yuuze.com, he helped Sun Microsystems run big data infrastructure such as Sun's Grid Computing Engine. Kunal holds a bachelor's in Computer Engineering from Valparaiso University and an MBA from The Fuqua School of Business, Duke University.
Intelliwave SiteSense boosts APTIM material tracking
We’ve been engaged with the APTIM team since early 2019, providing SiteSense, our mobile construction SaaS solution, for their maintenance and construction projects, allowing them to track materials and equipment, and manage inventory.
We have been working with the APTIM team to standardize material tracking processes and procedures, ultimately with the goal of reducing the amount of time spent looking for materials. Industry studies show that better management of materials can lead to a 16% increase in craft labour productivity.
Construction is one of the oldest industries, but comparatively it’s one of the least tech-driven. About 95% of engineering and construction data captured goes unused, 13% of working hours are spent looking for data, and around 30% of companies have applications that don’t integrate.
With APTIM, we’re looking at early risk detection, through predictive analysis and forecasting of material constraints, integrating with the ecosystem of software platforms and reporting on real-time data with a ‘field-first’ focus – through initiatives like the Digital Foreman. The APTIM team has seen great wins in the field, utilising bar-code technology, to check in thousands of material items quickly compared to manual methods.
There are three key areas when it comes to successful materials management on the software side – culture, technology, and vendor engagement.
Given the state of world affairs, access to data needs to be off-site via the cloud to support remote working, providing a ‘single source of truth’ accessible by many parties. The tech sector is always growing, so companies need faster and more reliable access to that cloud data. And digital supply chain initiatives engage vendors much earlier in the process, driving collaboration with clients and giving more assurance, since there is more emphasis on automating data capture.
It’s been a challenging period with the pandemic, particularly for the supply chain. Look at what happened in the Suez Canal – events can suddenly impact material costs and availability, and you really have to be more efficient to survive and succeed. Virtual system access can solve some of these issues, and you need to cast a wider net when looking at data access.
Solving problems comes down to better visibility, proactively resolving issues with vendors, and enabling construction teams to execute their work. The biggest cause of delays is not being able to provide teams with what they need.
On average, 2% of materials are lost or re-ordered, and that figure only reflects the material cost. What it doesn’t capture is the duplicated effort of procurement, vendor, and shipping costs, all of which also have an environmental impact.
As things start to stabilise, APTIM continues to utilise SiteSense to boost efficiencies and solve productivity issues proactively. Integrating with 3D/4D modelling only scratches the surface of what we can do. Access to data can help you firm up bids to win work and make better cost estimates, and AI and ML are the next phase, providing an ecosystem of tools.
A key focus for Intelliwave and APTIM is to increase the availability of data, whether that’s creating a data warehouse for visualisations or adding integrations to provide additional value. We want to move to more of an enterprise usage phase – up to now it’s been project-based – so more people can access data in real time.