Generative AI: 7 steps to make sure your data is ready

Photo: Eugene Mymrin/Getty Images.

Everyone wants to harness the power of generative AI and large language models, but there’s a catch. For artificial intelligence to live up to the high expectations placed on it, organizations need usable, high-quality data, and that is where things get difficult.

Constant pressure to “do something with generative AI”

McKinsey notes that companies face constant pressure to “do something with generative AI”. But that pressure comes with other problems: “If your data is not ready for generative AI, your company is not ready for generative AI,” the firm warns in a recent report on the subject by Joe Caserta and Kayvaun Rowshankish.

The authors suggest that IT and data leaders “will need to develop a clear view of the data implications of generative AI”. Companies could consume generative AI through off-the-shelf services, through application programming interfaces (APIs), or through company-specific models; the last option will require “a sophisticated data labeling and tagging strategy, as well as larger investments”.

Perhaps the biggest challenge is “the ability of generative AI to work with unstructured data, such as chats, videos and code,” according to the report’s authors. “Data organizations are used to working with structured data, especially data organized in tables.”

Organize data to take advantage of technological developments

This shift in data-related concerns means that companies need to rethink their overall data architecture to support generative AI initiatives. “This may not seem new, but while a company could get by before, generative AI will cause big problems on this front. Many of the benefits of generative AI will simply be out of reach without a solid data foundation,” the report’s authors warn.

Across the industry, a growing number of executives worry about their company’s ability to handle the huge influx of data needed to meet emerging challenges, generative AI first among them. “Digital transformation, driven by constant innovation and technological progress, is changing the way organizations operate,” says Jeff Heller, vice president of technology and operations at Faction, Inc.

“In this rapidly changing environment, virtually every department, from research and development (R&D) teams to operational roles, is experiencing remarkable expansion, driven by the proliferation of devices and advanced technologies,” he says.

Data: an asset or a handicap?

Moreover, AI is not the only factor driving the need for more efficient and responsive data architectures. “Customers will continue to demand personalized services and communications, which of course depend heavily on accurate data,” says Bob Brauer, founder and CEO of Interzoid.

“Growing reliance on analytics and visualization tools, which are vital for strategic decision-making, creates a strong dependence on data. And as artificial intelligence gains in importance, data becomes essential for training its models,” he adds.

According to Jeff Heller, the message is clear: it is time for companies to define a strategy and adopt cutting-edge technologies to “ensure that data remains an invaluable asset rather than an overwhelming handicap”.

7 areas of focus for organizations

To prepare their data for the emerging era of artificial intelligence, experts recommend that organizations consider the following:

Implementing a data governance strategy. “By defining the right priorities, the right teams, the right governance, the right tools and a mandate from management, companies can turn their data quality challenges from a weakness into a significant competitive advantage,” says Bob Brauer. Creating a “working group (or the appropriate equivalent, depending on the size of the organization) to study how emerging innovations such as generative AI, large language models and other new AI-based technologies can be applied to gain a competitive advantage” could be a first step toward winning organization-wide support for its data, which underpins AI and other initiatives.

Implementing a data storage strategy. Finding a place to store all this data, while keeping it accessible and easy to find, is essential. Recent industry studies show that “more than half of stored data (60%) is inactive, meaning it is rarely, if ever, accessed,” reports Brian Pawlowski, head of development at Quantum. “Even so, companies do not want to part with it, because they understand that data can deliver valuable insights and business value in the coming years, especially with the advent of widespread use of generative AI,” he says. In his view, this situation calls for a reassessment of existing capabilities in order to “establish modern, automated storage architectures that let people easily access active and inactive data and work with it throughout its life cycle”.
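To make the active/inactive distinction concrete, here is a minimal Python sketch that sorts stored objects into “active” and “inactive” groups by time since last access. The 90-day threshold, the `StoredObject` type and the file names are hypothetical choices for illustration, not Quantum’s method or any vendor’s API. In a real architecture, inactive objects would be moved to a cheaper tier and kept accessible, not deleted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """Hypothetical record describing one stored file or object."""
    name: str
    last_accessed: datetime


def assign_tier(obj: StoredObject, now: datetime, cold_after_days: int = 90) -> str:
    """Classify an object as 'active' if accessed within the window, else 'inactive'."""
    age = now - obj.last_accessed
    return "active" if age <= timedelta(days=cold_after_days) else "inactive"


def plan_tiering(objects: list[StoredObject], now: datetime,
                 cold_after_days: int = 90) -> dict[str, list[str]]:
    """Group object names by tier; inactive data is kept, just planned for a colder tier."""
    plan: dict[str, list[str]] = {"active": [], "inactive": []}
    for obj in objects:
        plan[assign_tier(obj, now, cold_after_days)].append(obj.name)
    return plan


def inactive_share(plan: dict[str, list[str]]) -> float:
    """Fraction of objects classified as inactive (0.0 if there are none)."""
    total = len(plan["active"]) + len(plan["inactive"])
    return len(plan["inactive"]) / total if total else 0.0


# Usage with made-up objects and a fixed "now" so the result is reproducible
now = datetime(2024, 1, 1)
objects = [
    StoredObject("q4_report.pdf", now - timedelta(days=3)),
    StoredObject("2019_sensor_logs.tar", now - timedelta(days=800)),
    StoredObject("training_images.zip", now - timedelta(days=200)),
]
plan = plan_tiering(objects, now)
print(plan)
print(inactive_share(plan))
```

Classifying by last access is only one possible policy; access frequency, business value or regulatory retention rules could drive the same decision.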

Data quality. Preparing the data architecture to handle new AI-driven demands must “start by making data quality a strategic priority,” advises Bob Brauer. “A good starting point would be to appoint a data manager (or an equivalent function) with a dedicated budget and resources for data quality initiatives.”

Measuring progress. “An organization’s leadership must make it a priority to assess data across the company and to set metrics and objectives for measuring success,” says Bob Brauer.
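One way to read “objectives for measuring success” is as a data quality scorecard. The minimal sketch below computes two common data quality dimensions, completeness and uniqueness, over a small in-memory table and checks each against a target. The metric choice, the 95% and 100% targets and the sample customer records are hypothetical, not Brauer’s or Interzoid’s method.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> float:
    """Share of required (record, field) cells that are present and non-empty."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def uniqueness(records: list[dict[str, Any]], key: str) -> float:
    """Distinct key values divided by record count (1.0 means no duplicate keys)."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def scorecard(records: list[dict[str, Any]], fields: list[str], key: str,
              targets: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """Compare each metric with its target; returns {metric: (value, meets_target)}."""
    metrics = {
        "completeness": completeness(records, fields),
        "uniqueness": uniqueness(records, key),
    }
    return {name: (value, value >= targets[name]) for name, value in metrics.items()}


# Made-up sample: one empty email, one missing country, one duplicated id
customers = [
    {"id": 1, "email": "a@example.com", "country": "FR"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 2, "email": "c@example.com", "country": None},
    {"id": 4, "email": "d@example.com", "country": "DE"},
]
report = scorecard(customers, ["email", "country"], "id",
                   targets={"completeness": 0.95, "uniqueness": 1.0})
print(report)
```

Tracking the same scorecard over time, rather than as a one-off check, is what turns it into a measure of progress.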

Managing unstructured data capacity. Data quality problems are more pronounced with generative AI models than with classical machine learning models “because there is much more data, and most of it is unstructured, which makes it difficult to use existing monitoring tools,” explain the authors of the McKinsey report. “Unstructured data accounts for about 90% of the data that will be created in the future, and global capacity is set to grow at an average annual rate of 25% over the next five years,” reports Brian Pawlowski. “This unstructured data is stored in files and objects: high-definition video and images, complex medical data, genome sequencing, input data for machine learning models, scientific data captured from the natural world (mapping of oil and gas fields, for example) and simulations of reality (notably special effects, animation and augmented reality). It is essential that organizations deploy solutions that manage the data lifecycle automatically and that use advanced technologies, such as AI, to help extract greater business value.”
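To put the growth figure in perspective: if the 25% rate is read as compounding annually for five years (our interpretation of the quote), capacity ends up roughly three times its current size, since 1.25^5 ≈ 3.05. The short sketch below does that arithmetic; the function name is illustrative.

```python
def project_capacity(current: float, annual_rate: float, years: int) -> float:
    """Capacity after compounding `annual_rate` growth for `years` years."""
    return current * (1 + annual_rate) ** years


multiplier = project_capacity(1.0, 0.25, 5)
print(f"Capacity multiplier after 5 years: {multiplier:.2f}")  # → 3.05
```

Applied to, say, 100 units of storage today, the same calculation projects about 305 units after five years.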

Integrating new capabilities into the data architecture to support expanded use cases. According to the authors of the McKinsey report, organizations must not forget to “integrate relevant capabilities (such as vector databases and data pre- and post-processing pipelines) into the existing data architecture, in particular to support unstructured data”.
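For readers unfamiliar with vector databases, the sketch below shows the core idea in plain Python: a pre-processing step splits text into chunks, and a toy in-memory store ranks stored vectors by cosine similarity to a query. The names `chunk_text` and `ToyVectorStore`, the chunk size and the three-dimensional vectors are all hypothetical; in practice the vectors would come from an embedding model and the store would be a dedicated vector database, typically with approximate nearest-neighbor indexing instead of this brute-force search.

```python
import math
from dataclasses import dataclass, field


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Pre-processing: split a document into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ToyVectorStore:
    """In-memory stand-in for a vector database, using exact (brute-force) search."""
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, key: str, vector: list[float]) -> None:
        self.vectors[key] = vector

    def search(self, query: list[float], k: int = 3) -> list[str]:
        """Return the keys of the k stored vectors most similar to the query."""
        ranked = sorted(self.vectors.items(),
                        key=lambda item: cosine(query, item[1]), reverse=True)
        return [key for key, _ in ranked[:k]]


# Usage with made-up text and hand-written vectors standing in for embeddings
print(chunk_text("contracts invoices and support chats", max_words=2))
store = ToyVectorStore()
store.add("invoice_chunk", [1.0, 0.0, 0.2])
store.add("contract_chunk", [0.1, 1.0, 0.0])
print(store.search([0.9, 0.1, 0.1], k=1))
```

A post-processing step (for example, filtering or re-ranking the retrieved chunks before they reach a model) would sit after `search` in the same pipeline.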

Using AI to help build AI. “Use generative AI to help you manage your own data,” suggests the McKinsey team. “Generative AI can accelerate existing tasks and improve how they are performed across the data value chain, from data engineering to data governance and analysis.”