Unifying the data estate for the GenAI edge
By Adam Beavis, Vice President & Country Manager, Databricks Australia
Generative Artificial Intelligence (GenAI) has the potential to increase global gross domestic product (GDP) by up to US$10 trillion, according to J.P. Morgan Research, but businesses must get the data foundation right if they’re to take advantage of this enormous opportunity.
Organisations have been experimenting with Artificial Intelligence (AI) and preparing to deploy AI models for years. However, many businesses face challenges ranging from poor data quality and complex workflows to disparate AI data and platforms, as well as rising costs and unclear business value.
Despite these challenges, businesses are moving AI models into production at a rapid rate. According to the Databricks State of Data and AI report, 11 times more AI models moved into production than in the previous year, and on average organisations became over three times more efficient at putting AI models into production.
However, one of the biggest remaining barriers to successful GenAI implementation is the state of the enterprise data estate. For many organisations, the data estate is fragmented, with data silos and discrete systems making it hard to bring everything together to train AI models properly.
Break down data silos
Businesses often adopt multiple data platforms and software solutions over time, leaving their data in incompatible and proprietary formats across the organisation. Not only can this lead to vendor lock-in, it also creates data silos and a fragmented approach to security and governance, resulting in slow, complex and expensive data platforms.
Such fragmentation presents particular privacy and security issues when data is used to train AI models, especially GenAI models, which often require very large datasets. Moreover, because most proprietary GenAI models are ‘black boxes’, it’s difficult for organisations to work out exactly what happens to their data once it has been fed into the model.
Even when using open source GenAI models, if there isn’t an overarching data governance infrastructure to ensure that data is where it needs to be and is managed the way it needs to be managed, the chances of it ending up where it shouldn’t be, or being used in a way it shouldn’t be used, increase.
Address fragmentation to create a unified data estate
Bringing the data estate together into a unified and cohesive whole is perhaps the most effective way to avoid falling afoul of data regulations, especially when it comes to privacy and security. But how can businesses defragment their data assets and unify them effectively if they’re spread across multiple systems and solutions?
As a first step, such fragmentation can be solved by storing data in open formats and in a unified platform that doesn’t require enterprises to hand data to vendors. With open formats, it becomes easier for businesses to own their data, since it doesn’t need to be tied to a particular solution or system for readability.
The implementation of a data intelligence platform is key. Such a platform, built on a ‘lakehouse’ foundation, works to democratise access to analytics and intelligent applications by marrying an organisation’s data with AI models that are personalised to suit the needs of the business deploying them.
Drawing on the innate capabilities of a lakehouse foundation, which combines the best features of a data lake and a data warehouse, a data intelligence platform can deliver reliability and performance without sacrificing open standards, enabling an organisation to unify its data warehousing and AI use cases on a single platform.
With open data formats and a data intelligence platform in place, it is much easier to establish a stringent data governance regime — the key capability underpinning regulatory compliance. This is because an effective data governance solution helps companies protect their data from unauthorised access and ensures that rules are in place to comply with regulatory requirements.
Pathways to growth and innovation
As organisations continue to accumulate vast amounts of data and GenAI models become ever more powerful, the ability to unify data through open formats and data intelligence platforms will become even more valuable.
With a powerful data intelligence platform and tools such as UniForm, which can make multiple open storage formats work together seamlessly, businesses have a much greater chance of overcoming data estate fragmentation and moving their GenAI projects beyond the proof-of-concept phase into production, driving a new era of innovation and growth in the process.