Over the years I have heard phrases like “data is the new oil” or “data is the new gold.” However, the more we look at and discuss how data is managed and used, the more an accurate comparison emerges: data is like radioactive material.
Like radioactive substances, data has enormous potential to create positive change and innovation. However, it also carries inherent risks that must be carefully managed. Just as improper handling of radioactive materials can lead to catastrophic consequences, reckless handling of data can result in serious harm.
As AI creators and users, we need to think about data the way we think about handling radioactive materials: recognizing its potential for both good and harm, and taking proactive steps to ensure it is used responsibly and beneficially.
Data evolution and AI
In the 2010s, the era of Big Data arrived, marked by an unprecedented influx of information. This surge in data was essential to powering large-scale models, which require enormous amounts of information. However, as we entered the 2020s, there was a noticeable shift toward collecting data for specific use cases. This change emphasized quality over quantity and the importance of targeted data collection.
More recently, the development of generative artificial intelligence (GenAI) has changed the type of content we consider data. Data is no longer limited to spreadsheets and structured datasets. Data now includes articles, videos and more.
This expansion broadens the possibilities for artificial intelligence initiatives, but also introduces new complexities and threats. With content as data, not only will the complexity of AI projects increase, but so will the risk of data becoming a liability for businesses.
When is data an asset or a liability?
While data can be a valuable asset that delivers tangible business results, it has some serious limitations and can become a huge liability if not managed well.
This is especially true in the face of GenAI and maturing privacy regulations. To quote Dominique Shelton-Leipzig’s book Trust, “recalibration is needed to avoid collisions between data innovation and data privacy. If a data breach were a country and the $6 trillion loss was GDP, the country where the data breach occurred would be the third-largest GDP in the world, behind the United States and China.” Gone are the days when data was stored by default, especially if that data does not generate value.
Even organizations that are good at data management are generally ill-prepared to apply the same level of discipline to the mass of new content data sources now available in the form of reports, PDFs, meeting recordings, presentations, and other multimedia resources.
Here are some scenarios where data becomes a liability for businesses:
- Collecting data without a purpose or using data for multiple purposes. For example, data may originally be collected for transactional purposes (e.g., physician notes must be included in patient records to document diagnoses and treatment plans), but attempts to use the same data for another, unspecified purpose do not always work.
- Storing massive amounts of data. Data consumes enormous amounts of energy to store, secure and process, resulting in an increased carbon footprint.
- Data poses security risks. Cybercriminals are attracted to organizations with huge amounts of data. As the amount of data you store increases, are you ready to mitigate the additional risks that come with it?
- Poor data quality leads to poorly trained models. Artificial intelligence and machine learning rely on clean data to function properly. Without it, companies could make costly mistakes.
Fortunately, there are several strategies to avoid such data pitfalls.
Strategies for turning data into assets
Data subject to the strictest protection guidelines often comes from a human source, whether you’re observing users, capturing transaction information, creating conversational agents, or performing other human-centric ML activities. People are complicated, and sometimes foolish and unreliable, which means the data reflects some of these flaws.
As Dun and Bradstreet put it, “When data is dirty, there is usually an underlying problem with the business process that needs to be solved.” In other words, inaccurate or incomplete data is often the result of poor data collection practices, a lack of data management, and mismatches between IT and business goals. Don’t assume that what you capture is an accurate representation of the world.
In my experience working with hospitals, it is not uncommon for patient cases to be re-reviewed and updated with new data because an incorrect diagnosis was made or lab work performed outside the healthcare system needed to be added to their records.
When you work with the raw data directly, such corrections are manageable. However, they have a cascading effect on models built on the original, incomplete or uncorrected data. While data may never be perfect, it’s worth ensuring that your data hygiene processes target not only the data, but also the models that depend on it.
Whenever you decide to collect new data, consider the risks of (1) data collection and (2) data storage. Will it only increase your company’s liability, or is it tied to a fair use and therefore worth storing (read: protecting)?
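One way to make data hygiene reach the models, not just the data, is to keep a record of which models were trained on which datasets, so that a correction to a dataset flags every dependent model for review. The following is a minimal sketch under stated assumptions: the `DatasetRegistry` class, its method names, and the dataset and model names are all hypothetical and are not taken from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Tracks which models were trained on which datasets (hypothetical helper)."""

    # dataset name -> names of the models trained on it
    subscribers: dict[str, set[str]] = field(default_factory=dict)
    # models whose training data changed after they were last trained
    stale_models: set[str] = field(default_factory=set)

    def subscribe(self, model: str, dataset: str) -> None:
        """Record that `model` was trained on `dataset`."""
        self.subscribers.setdefault(dataset, set()).add(model)

    def record_correction(self, dataset: str) -> set[str]:
        """Mark every model trained on `dataset` as needing review."""
        affected = set(self.subscribers.get(dataset, set()))
        self.stale_models |= affected
        return affected

    def mark_retrained(self, model: str) -> None:
        """Clear the stale flag once a model has been retrained or re-validated."""
        self.stale_models.discard(model)


# Illustrative names only.
registry = DatasetRegistry()
registry.subscribe("readmission_risk", "patient_records")
registry.subscribe("triage_priority", "patient_records")
registry.subscribe("supply_forecast", "inventory")

# A patient record is corrected after a re-review.
affected = registry.record_correction("patient_records")
print(sorted(affected))  # ['readmission_risk', 'triage_priority']
```

In practice this bookkeeping would more likely live in a data catalog or lineage tool than in application code; the point is only that a correction should trigger a question about every model downstream of it.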
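The collect-and-store question can be turned into a routine check over a data inventory. The sketch below is illustrative only: the `DataAsset` fields, the `retention_concerns` function, and the 365-day idle threshold are assumptions made for this example, not an established standard or a legal test.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataAsset:
    name: str
    purpose: Optional[str]       # declared business purpose; None if never stated
    contains_personal_data: bool
    last_used: Optional[date]    # None if there is no recorded use


def retention_concerns(asset: DataAsset, today: date, max_idle_days: int = 365) -> list[str]:
    """Return reasons why storing this asset may add liability rather than value."""
    concerns = []
    if not asset.purpose:
        concerns.append("no declared purpose")
    if asset.last_used is None:
        concerns.append("never used")
    elif (today - asset.last_used).days > max_idle_days:
        concerns.append(f"idle for more than {max_idle_days} days")
    # Personal data with no clear value is the riskiest combination.
    if asset.contains_personal_data and concerns:
        concerns.append("personal data held without clear value")
    return concerns


# Illustrative inventory; the names and dates are made up.
inventory = [
    DataAsset("meeting_recordings", None, True, None),
    DataAsset("order_history", "billing and fulfilment", True, date(2024, 5, 1)),
]
today = date(2024, 6, 1)
for asset in inventory:
    print(asset.name, retention_concerns(asset, today))
```

An empty list does not prove a dataset is worth keeping; it only means none of these simple warning signs apply.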
Don’t be a company that strives for perfect data. Often, building a model through rapid prototyping will allow you to determine the nature of the missing data and make it easier for you to capture data for the right purpose.
Overall, we need to stop treating data as valuable by default. Cassie Kozyrkov put it best on LinkedIn: “I wish we would all stop pronouncing data with a capital “D.” Data isn’t magic – just because you have a spreadsheet full of numbers doesn’t guarantee you’ll be able to get anything useful out of it.”
Good data arises as a function of process. As the amount of data needed to harness the power of GenAI increases, investing in data quality has never been more important. Data only gains value through process and smart investment. It may not be gold waiting to be found, but a diamond in the making.