
AI Starts With Data: Why Most Organizations Are Not Ready Yet

  • thaborwalbeek
  • Jan 12
  • 5 min read

Garbage In, Garbage Out — Data Preparation for AI


Series 1: Preparing Your Data for AI (Part 1 of 4)

AI is evolving quickly. Where organizations at first struggled to understand how AI could help them, more and more projects are now being started to experiment with AI, and some organizations have even moved on from experiments to full-scale implementations.

Within organizations, budgets are allocated to support these AI projects. The debate is no longer about whether AI matters or will be used in the future; organizations now want to know how fast they can implement these projects and when they will deliver value.

More and more people and teams within these organizations are equipped with AI knowledge and are starting to understand the benefits of applying it.

Yet, despite AI developing and evolving fast, many organizations do not seem able to get their AI projects out of the Proof of Concept (PoC) phase, or they face issues when running those projects in production.

In these organizations, the blame typically falls on the model: it is not accurate enough, it does not yield the expected outcome, or its predictions cannot be explained to the rest of the organization.

From there, teams explore other models and more advanced techniques, or simply tune the model's parameters.

The real bottleneck in most projects, however, is the data itself. Data is not ingested, organized, cleaned, and transformed properly before it is fed into an AI model. The result is a lot of Garbage In, Garbage Out.


The AI Expectation Gap

For years, data has been used for analytical purposes and business intelligence, and management and users have gained trust in the dashboards and reports they see and work with on a daily basis.

This data also comes from trustworthy source systems that have evolved over the years and have been customized to meet the organization's business requirements.

This creates a sense of: we have the data, therefore we can do AI.

And that is a pitfall when it comes to AI systems. Those dashboards and reports are perfectly fine for running the business, because humans consume data in one way; AI systems consume it in quite another.

Those dashboards and reports work well because, as humans, we can deal with ambiguity. We know what the data is showing us, which exceptions might occur, and how to interpret the numbers we see. We compensate for missing values and for definitions that are not (or only partly) declared, and we understand the business rules because we work with them on a daily basis.

Models, and AI models in particular, do not have this human ability. They learn their patterns from the data itself, so whatever is presented to the model is all it can work with. Missing data, inconsistencies, and the assumptions we normally make during analysis are invisible to the model, leading to minor or major issues while the model learns.
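
To make this concrete, here is a minimal sketch of that difference, using pandas; the country spellings and column names are invented for illustration and do not come from any real system:

    import pandas as pd

    # Three rows a human instantly reads as "the same country, one missing amount".
    orders = pd.DataFrame({
        "country": ["NL", "Netherlands", "nl "],  # one concept, three spellings
        "amount": [120.0, None, 95.0],            # a gap a human would work around
    })

    # What a model's input pipeline actually sees after one-hot encoding:
    print(pd.get_dummies(orders["country"]))
    # -> three separate columns; the model cannot know they mean the same thing

    # Only after explicit normalization does the human's "obvious" pattern
    # exist in the data at all:
    normalized = orders["country"].str.strip().str.upper().replace({"NETHERLANDS": "NL"})
    print(pd.get_dummies(normalized))
    # -> a single NL column

A person glancing at this table immediately collapses the three spellings into one country; the model's encoded input keeps them apart until someone normalizes them explicitly.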

And that is exactly where most AI projects will fail.



Why Data Is the Real Bottleneck for AI

AI models do not create intelligence on their own. They simply read the data and work with what they are given. If there are inconsistencies (missing values, missing logic, and so on) within a dataset, and especially between different sources, the model cannot separate real patterns from noise; it just absorbs those 'weaknesses' and builds itself on them.

This explains the common situation in many organizations:

  • A promising AI use case is identified

  • Data is extracted (from different sources)

  • Models are built and evaluated

  • Results are not as expected and hard to trust


At this point, most teams focus on the last two bullets, adjusting the model and tweaking its parameters to get a better result. In reality, the problem is usually that the data was never properly ingested and prepared for AI.

It is therefore essential to have a good data quality system in place, to ensure that the data ingested into AI systems is fully prepared in a way the AI system understands and can work with.
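
What such a system checks will differ per organization, but even a handful of explicit rules catches a lot of garbage early. A minimal sketch, assuming a simple orders table whose column names and allowed country codes are invented for illustration:

    import pandas as pd

    # Explicit quality checks run before any data reaches a model.
    def quality_report(df: pd.DataFrame) -> dict:
        return {
            "rows": len(df),
            "missing_amount_pct": round(df["amount"].isna().mean() * 100, 1),
            "duplicate_order_ids": int(df["order_id"].duplicated().sum()),
            "negative_amounts": int((df["amount"] < 0).sum()),
            "unknown_countries": sorted(set(df["country"]) - {"NL", "BE", "DE"}),
        }

    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 3],
        "country": ["NL", "BE", "BE", "XX"],
        "amount": [120.0, None, 80.0, -5.0],
    })

    print(quality_report(orders))
    # {'rows': 4, 'missing_amount_pct': 25.0, 'duplicate_order_ids': 1,
    #  'negative_amounts': 1, 'unknown_countries': ['XX']}
    # In a real pipeline, any non-zero count here would block the run
    # instead of letting the model train on flawed data.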


What “Preparing for AI” Actually Involves

Preparing data for AI is essential and cannot be done after the model(s) have been created. These steps must be executed while ingesting the data and while making design choices for the AI model.

At a high level this preparation would entail:

  • Data ingestion

    Bringing data into the platform in a way that preserves source meaning, structure, and lineage.

  • Data organization

    Structuring data around clear entities and relationships instead of ad-hoc tables and reports.

  • Data cleaning as a system

    Replacing one-off fixes with explicit, reusable, and testable rules.

  • Data transformation for learning

Shaping data so models can learn meaningful patterns, without leakage or distortion (a leakage sketch follows this list).

  • Handling multiple sources with shared meaning

    Reconciling different systems that describe the same business concepts in different ways.
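
The leakage mentioned in the transformation step deserves a concrete illustration. The sketch below uses scikit-learn and synthetic data to show the classic mistake: fitting a scaler on all rows before splitting, so information about the test set leaks into the training data:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Synthetic data standing in for prepared business features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))

    X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

    # Leaky: the scaler is fitted on ALL rows, so the test set's distribution
    # quietly influences how the training data is transformed.
    leaky_scaler = StandardScaler().fit(X)
    X_train_leaky = leaky_scaler.transform(X_train)

    # Safe: fit the transformation on the training data only, then apply it to both.
    scaler = StandardScaler().fit(X_train)
    X_train_safe = scaler.transform(X_train)
    X_test_safe = scaler.transform(X_test)

The two variants look almost identical in code, which is exactly why leakage slips into so many pipelines unnoticed.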


The preparation phase is not optional. Skipping it, or doing it only partly, leads to results that are unexpected and cannot be trusted. To make the "cleaning as a system" step concrete as well, a second sketch follows below.
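
In this sketch, each cleaning rule is an explicit, reusable, testable function rather than a one-off fix buried in a notebook; all column names and mappings are invented for illustration:

    import pandas as pd

    def normalize_country(df: pd.DataFrame) -> pd.DataFrame:
        """Map free-text country spellings onto one shared code."""
        mapping = {"NETHERLANDS": "NL", "HOLLAND": "NL", "BELGIUM": "BE"}
        out = df.copy()
        codes = out["country"].str.strip().str.upper()
        out["country"] = codes.replace(mapping)
        return out

    def drop_impossible_amounts(df: pd.DataFrame) -> pd.DataFrame:
        """Remove rows a human would recognize as data-entry errors."""
        return df[df["amount"] >= 0].copy()

    raw_orders = pd.DataFrame({
        "country": [" holland ", "NL", "Belgium"],
        "amount": [120.0, -5.0, 80.0],
    })

    # Rules compose into a pipeline that runs the same way every time:
    clean = drop_impossible_amounts(normalize_country(raw_orders))
    print(clean)

    # Because each rule is a plain function, it can be unit-tested in isolation:
    def test_normalize_country():
        raw = pd.DataFrame({"country": [" holland "], "amount": [1.0]})
        assert normalize_country(raw)["country"].iloc[0] == "NL"

    test_normalize_country()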


Why This Series Exists

Data cleaning and transformation is normally not seen as a primary concern; teams do it iteratively along the way. That works for dashboards and reports because they are forgiving: if something does not work, you take a step back and make changes. For AI systems, this approach does not work.

This means that all the human interpretation that normally happens while building analyses, reports, and dashboards now has to be managed beforehand. An AI system effectively shows you how your data is organized and how disciplined your organization is with its data.


This series focuses on the layer between raw business data and AI models. It goes through all the steps needed to get the data ready so that an AI model can succeed.


The intent is to combine:

  • Conceptual clarity about why certain data problems matter for AI

  • Practical direction on what needs to change

  • Implementation-oriented thinking on how to approach data preparation without tying the discussion to specific tools or vendors


How This Series Is Structured

Posts are published in monthly series of four, with each series focusing on one stage of preparing data for AI.

This first series focuses on foundations:

  • Why existing data often fails AI

  • Why ingestion and organization matter more than expected

  • Why multi-source data requires shared meaning


Later series will go deeper into data cleaning systems, transformations for learning, governance, and operating AI-ready data at scale.

Each post stands on its own, but together they form a coherent journey from AI ambition to data reality.


What Comes Next

In the next post, we will examine a common assumption that quietly undermines many AI initiatives: the belief that historical business data is already suitable for learning.

Understanding why that assumption fails is a critical step toward preparing data that AI can actually use.



This article is part of the series Garbage In, Garbage Out — Data Preparation for AI, exploring how organizations can build data foundations that actually work for AI.
