The AI Paradox in Companies: Everyone Wants Fast Change, No One Wants the Hard Work
Everyone talks about AI. Hardly anyone talks about data. This research-based article explains why most AI projects in SMEs fail not because of the technology, but because of silos, poor data quality, and unclear ownership—and what companies must do now to become truly AI-ready.

Artificial intelligence was the defining topic of 2025 and will remain so in 2026—especially for SMEs and mid-sized businesses hoping for greater efficiency, automation, and better decision-making. The vision is compelling, whether it involves large language models, traditional machine learning, or specialized AI systems such as computer vision or predictive analytics. Many LLMs can already solve useful tasks today without requiring extensive customer-specific datasets, for example in knowledge retrieval, draft writing, or general analysis.
The catch becomes clear in the details: Truly business-critical, data-driven AI projects in companies rarely fail because of the technology—they fail because of the data foundation. This bottleneck is often underestimated, especially in small and medium-sized companies where resources are limited and IT landscapes have evolved over time. For all AI initiatives based on company data—from forecasting and automation to decision support—data quality becomes the limiting factor. Poor, incomplete, or biased data makes such solutions useless, no matter how powerful the model is. That is the well-kept secret many companies only learn after their first failed AI initiative.
Why data matters: Garbage in, Garbage out
AI needs one thing: high-quality training data. Period.
Poor data leads to unreliable models, incorrect recommendations, and failed projects [1] [2].
Data scientists are widely reported to spend around 80% of their time cleaning and preparing data rather than doing machine learning [3] [4] [5]. The effort is hidden, but it comes back like a boomerang. Even brilliant algorithms cannot deliver high-quality results if the input data is flawed, a principle known as “Garbage in, Garbage out.”
The quality of training data is fundamental for high-performing machine learning models with strong generalization capability [1].
Current research shows: data quality is critical for the safety, fairness, and robustness of AI systems [8] [9] [10]. This applies equally to large language models, traditional ML systems, and all other AI approaches.
The uncomfortable truth: German companies are not AI-ready
This is the core problem: Around 70% of German companies do not have a unified data management system [22]. They struggle with fragmented data structures, silos between departments, and IT landscapes that have grown organically over decades. The ERP system here, the CRM there, sensor data somewhere else, and no one manages to bring it all together.
Globally, companies consistently report: data quality is the main obstacle to AI readiness [7]. It ranks among the top three central barriers, right alongside lack of resources and unrealistic expectations. A recent meta-analysis of data-readiness metrics shows that poor data quality leads to inaccurate and ineffective AI models, which can result in incorrect or unsafe applications [7].
Where things go wrong: The usual suspects
- Data silos: Every department manages its data in isolation. Sales, production, HR: each has its own system and its own standards. Consolidation? Not planned. Excel spreadsheets used as shadow processes add to the chaos.
- Legacy systems without interfaces: Old ERP and CRM systems often lack modern APIs. Extracting data without jeopardizing the system? Technically complex, expensive, and risky.
- Missing governance: Who “owns” the data? Who is responsible for its quality? Too many companies have no answer to these questions. The result: problems are ignored, standards do not exist, and chaos spreads [11].
- Incomplete and inconsistent data: Missing values, duplicates, and errors that no one noticed, because before the AI project the data was simply never analyzed this intensively [7].
In working with dozens of projects, a consistent pattern emerges: almost every project struggles with data quality issues. They are not always the same—one lacks historical data, another has unstructured or faulty data. Some companies have redundant data, others work with messy datasets. Rarely is there only one issue; usually there are several at the same time. This is not the exception—this is the rule.
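A first look at these issues does not require special tooling. The following is a minimal sketch, not a full data quality assessment: it counts missing values per field and duplicate rows by key. The function name `profile_records`, the sample records, and the rule that blank strings count as missing are assumptions made for illustration.

```python
from collections import Counter


def profile_records(records, key_fields):
    """Return the missing-value rate per field and the number of duplicate rows."""
    fields = sorted({name for row in records for name in row})
    total = len(records)

    missing_rate = {}
    for name in fields:
        # Assumption: a value is missing if the field is absent, None, or blank.
        missing = sum(
            1 for row in records
            if row.get(name) is None or str(row.get(name)).strip() == ""
        )
        missing_rate[name] = missing / total if total else 0.0

    # Rows sharing the same normalized key (trimmed, lowercased) count as duplicates.
    keys = Counter(
        tuple(str(row.get(k) or "").strip().lower() for k in key_fields)
        for row in records
    )
    duplicate_rows = sum(count - 1 for count in keys.values() if count > 1)

    return {"rows": total, "missing_rate": missing_rate, "duplicate_rows": duplicate_rows}


# Hypothetical sample data, e.g. a customer export from a CRM system.
customers = [
    {"customer_id": "C-001", "email": "a@example.com", "city": "Berlin"},
    {"customer_id": "C-002", "email": "", "city": "Hamburg"},
    {"customer_id": "c-001 ", "email": "a@example.com", "city": None},
    {"customer_id": "C-003", "email": "c@example.com"},
]

report = profile_records(customers, key_fields=["customer_id"])
print(report)
```

Running a profile like this per source system before any model work gives a rough, comparable picture of where the gaps and redundancies are.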
The disaster: Starting AI without a data foundation
The classic scenario: A mid-sized company launches its first AI project with great optimism. After just a few weeks, it becomes clear: the models are useless. Why? Because the training data is not meaningful [2]. What happens then?
- AI gets discredited as “hype without value”
- Manual effort for data cleaning explodes—time and budget are gone
- Projects become 2–5x more expensive and take much longer than planned
- They are canceled or end up as a “Proof of Concept without impact”
From consulting practice, the most typical scenario looks like this: a manufacturing company plans a predictive maintenance project using sensor data. Expectations are high. Then the data analysis begins: the sensors deliver incomplete data, the structure is inconsistent, and historical faults are not documented. What was supposed to be a 6-month project turns into an open-ended data clean-up mission. The project is stopped or drastically reduced. The potential remains unused.
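Incomplete sensor data of this kind can be made visible early, before any model is trained. The sketch below assumes a sensor that should report at a fixed interval and lists the gaps in its timestamps. The function name `find_gaps`, the one-minute interval, and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (start, end, missing_readings) for every gap above the expected interval."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        delta = current - previous
        if delta > expected_interval + tolerance:
            # Number of readings that should have arrived inside the gap.
            missing = round(delta / expected_interval) - 1
            gaps.append((previous, current, missing))
    return gaps


# Hypothetical sensor that should report once per minute.
start = datetime(2025, 1, 1, 0, 0)
readings = [start + timedelta(minutes=m) for m in (0, 1, 2, 3, 7, 8, 12)]

gaps = find_gaps(readings, expected_interval=timedelta(minutes=1))
missing_total = sum(gap[2] for gap in gaps)

# Share of expected readings that actually arrived.
coverage = len(readings) / (len(readings) + missing_total)

for gap_start, gap_end, missing in gaps:
    print(f"gap from {gap_start} to {gap_end}: {missing} readings missing")
print(f"coverage: {coverage:.0%}")
```

A coverage figure per sensor, computed before the project is scoped, helps decide whether the historical data can support a predictive maintenance model at all.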
The right path: First the data foundation, then AI
Step-by-step approach
- Understand the status quo. What data sources exist? How do the flows work? This creates clarity about the real size of the problem.
- Identify silos and interface issues. Where can data not communicate? Where is the greatest integration potential?
- Clean and harmonize data. Fix errors, define standards, remove duplicates [6]. Studies show that handling missing values correctly can improve results in classification tasks by up to 20% [7]. A structured data quality assessment is essential before any ML project [12].
- Build a central data model. A data lake, data warehouse, or modern cloud data layer—as a collection point for cleaned, standardized data.
- Introduce data governance. Clear responsibilities, standards, quality metrics. Governance is the key to sustainability [7] [11]. Automated governance frameworks reduce errors and accelerate implementation [13].
- Integrate systems. Connect ERP, CRM, IoT, and MES via standardized interfaces.
- Build AI infrastructure. Only now: cloud platforms, ML tools, analytics, and LLM platforms.
It sounds like a lot of work—and it is. But it is the only real foundation for AI success, regardless of which AI technology is used.
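The "clean and harmonize" and "integrate systems" steps above can be sketched in a few lines. The example assumes two exports, one from a CRM and one from an ERP system, that describe the same customers with different formatting. The schema, the function names `normalize_customer` and `merge_sources`, and the merge rule (earlier sources win, later sources only fill blanks) are illustrative assumptions, not a prescribed method.

```python
def normalize_customer(record):
    """Map a raw record onto one shared schema with normalized values."""
    return {
        "customer_id": record["customer_id"].strip().upper(),
        # Collapse repeated whitespace and unify capitalization.
        "name": " ".join((record.get("name") or "").split()).title() or None,
        "email": (record.get("email") or "").strip().lower() or None,
    }


def merge_sources(*sources):
    """Merge records by customer_id; earlier sources win, later ones fill blanks."""
    merged = {}
    for source in sources:
        for raw in source:
            clean = normalize_customer(raw)
            existing = merged.setdefault(clean["customer_id"], clean)
            for field, value in clean.items():
                if existing.get(field) is None and value is not None:
                    existing[field] = value
    return list(merged.values())


# Hypothetical exports from two systems.
crm = [
    {"customer_id": " c-001", "name": "anna  schmidt", "email": "Anna@Example.com "},
    {"customer_id": "C-002", "name": "Ben Weber", "email": ""},
]
erp = [
    {"customer_id": "C-001", "name": "Anna Schmidt", "email": None},
    {"customer_id": "c-002", "name": "Ben Weber", "email": "ben@example.com"},
    {"customer_id": "C-003", "name": "Clara Vogel", "email": "clara@example.com"},
]

unified = merge_sources(crm, erp)
for customer in unified:
    print(customer)
```

In practice the hard part is agreeing on the rules, such as which system is authoritative for which field. That is a governance decision, not a coding one.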
The pragmatic starting point for SMEs
Not everything at once:
- Data value workshop: Clarify internally: where does our data hold real potential? Which use cases solve real business problems?
- Discovery workshop: Outline concrete AI applications: predictive maintenance? Fraud detection? Demand forecasting? Automation with LLMs?
- AI readiness workshop: Evaluate honestly: how AI-ready is our data really? What is needed? This should be based on standardized metrics [7]. A structured maturity framework for ML quality helps set realistic expectations.
- Pilot project: Start small, e.g. with an analytics project that also improves data quality. Quick wins create momentum.
- Scaling: Use measurable success (e.g. “40% time savings in reporting”) to justify larger investments in the data foundation.
The advantage: The entry point is low-threshold, pragmatic, and measurable. The data foundation improves in parallel.
What does this look like in reality?
Many mid-sized companies know full well that their data foundation is a problem, but they do not know how to proceed in concrete terms. The work ranges from the initial assessment to technical consolidation and the establishment of governance structures. This requires not only technical know-how, but also experience with mid-market contexts: limited resources, ongoing business operations, and organizational resistance.
The key is a structured, proven approach combined with pragmatic solutions that fit the company. Companies that follow this path systematically report:
- 20–40% efficiency gains in data integration and management
- Significantly faster time-to-value for subsequent AI projects
- Higher acceptance of AI solutions because the data quality is right
- Long-term advantages through stable, maintainable data structures
The regulatory reality: EU AI Act
There is also added pressure from above: the EU AI Act sets higher requirements for data quality and documentation. Companies using high-risk AI (recruitment, lending, safety) must prove that their data is representative, complete, and free from bias [23] [24]. That is impossible without a stable data foundation.
The EU AI Act, which entered into force on 1 August 2024, obliges providers of high-risk AI systems to establish a Quality Management System (QMS) [21]. Specifically, Article 10 of the EU AI Act sets strict requirements for training, validation, and test datasets: they must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of their intended purpose [23]. In addition, companies must examine their datasets for possible bias and document the measures taken to detect and mitigate it, especially with regard to possible discrimination and effects on fundamental rights [23].
For companies in high-risk areas, the rule is clear: without demonstrably high-quality training data, AI systems are not legally compliant and therefore not deployable. Compliance thus becomes an economic factor. Companies that professionalize their data foundation today benefit twice: they meet the regulatory requirements of the EU AI Act and are technically better prepared at the same time [24].
Conclusion: The window is now
Competitive pressure is increasing. Companies that modernize their data foundation now will have a massive advantage in 2–3 years. They will be able to scale AI quickly and successfully, while competitors are still struggling with silos.
Data is becoming a strategic asset. Like technology, talent, and capital. Those who have good data benefit—not only in AI, but across all digital initiatives.
The problem is avoidable. Many AI investments fail not because of technology or money, but because of poor data management. Companies that address this dramatically increase their chances of success [2].
Without robust infrastructure, clear governance, and consistently managed data quality, every AI strategy remains theoretical:
- Infrastructure: Without scalable, integrated data and AI infrastructure—from data platforms and interfaces to deployment processes—AI remains fragmented and cannot be transferred into operational use [21].
- Governance: Clear roles, responsibilities, policies, and control mechanisms for handling data and AI are necessary to ensure quality, compliance, and traceability, especially in the context of the EU AI Act [23].
- Data Quality: Data quality is the operational translation of all ambitions: only if data is correct, complete, consistent, current, and largely free from systematic bias can ML models and LLM-based solutions reliably create value in a business context [24].
The core insight is simple: AI is not primarily a tech problem. It is a data problem. All business-relevant AI systems, whether LLMs, ML models, or specialized AI, are only as good as the data they use. Those who stabilize their data foundation now instead of chasing AI hype will benefit. Those who wait will fall behind.
The time to act is not tomorrow. It is today!
The first step is not a tool, but clarity. The free SME blueprint as an AI readiness self-check helps assess data quality, infrastructure, and governance in just a few minutes and prioritize the next steps in a structured way.
Author: Denis Appelganz, IT Consultant at WaveAccess