Reality check: GenAI in software development between hype and real value

by Michael Heß, Area Manager Software Development
Beyond the hype cycle
The integration of artificial intelligence into software development has reached a critical turning point. After the initial hype cycle, a more complex and nuanced reality is emerging for technology leaders. The widespread adoption of AI is undeniable: the DORA 2024 report shows that 89% of companies prioritise integrating AI into their applications. 76% of technologists say they rely on AI for parts of their daily work.1 The promise is accelerated workflows, increased creativity and unprecedented productivity.
However, there is a significant gap between perception and performance. While over 75% of developers now use AI coding assistants and report feeling more productive, many companies are not seeing a corresponding improvement in overall delivery speed or tangible business results.2 This phenomenon, known as the AI productivity paradox, is not a sign of AI’s failure, but rather a symptom of a profound, systemic disconnect between a powerful new technology and established delivery systems.
This report deconstructs this paradox through a series of four critical, data-driven observations. It examines the impact of AI from the perspective of the individual developer, the health of the codebase, the limitations of the delivery system, and the dynamics of the human organisation. The analysis shows that the challenges associated with today’s AI assistants are harbingers of the much larger architectural and strategic changes required to navigate the coming era of autonomous, agent-based AI.
1. The paradox of the developer experience
Perceived acceleration vs. objective reality
Developers who use generative AI report significant improvements in their daily work experience. Research shows that intensive AI users spend more time in a state of flow, are more satisfied with their work overall, and are less likely to suffer from burnout.3 This is confirmed by studies showing that developers who use AI spend more of their time on satisfying work and are more engaged.4 This strong psychological advantage creates a powerful perception of accelerated workflows and increased personal productivity.

However, this perception is challenged by objective performance data. A randomised controlled trial (RCT) conducted by METR in 2025 found that experienced open-source developers took an average of 19% longer to complete their tasks when using AI tools.5 The discrepancy is significant: the same developers who were objectively slower estimated that the AI tools had made them 20% faster.
Why this discrepancy?
The cause of this discrepancy seems to lie in a redefinition of what constitutes “work”. In the past, developers equated effort with the time they actively spent writing code. However, AI assistants drastically reduce this specific activity, creating a strong sense of acceleration. The cognitive load has not been eliminated, but has shifted from code generation to new, less tangible tasks: prompt engineering, output verification, debugging AI-generated logic, and integrating the code into a larger system. Since these new activities are not yet categorised as “work” in the same way as typing, developers underestimate the total time required to complete them. They may be faster at the task at hand, but the overall task has expanded to include a new, often hidden layer of AI management effort.
The dilemma of “valuable work”
A popular argument for using AI is that it automates routine tasks, freeing developers for more demanding work such as system architecture, complex problem-solving or user research. However, the data suggests that the opposite is true. The same DORA study3 that highlighted higher job satisfaction also found that developers who use AI spend less time on what they consider to be “valuable work”. At the same time, the time spent on tedious or routine tasks has not decreased accordingly.
This counterintuitive finding can be explained by how the time freed up by AI is actually reallocated. According to Atlassian’s 2025 State of DevEx Survey7, developers save over 10 hours per week by using AI, but lose a comparable amount of time due to organisational inefficiencies and friction in collaboration. The time saved in the IDE is immediately consumed by systemic delays elsewhere in the software development cycle.
Furthermore, the nature of current AI tools contributes to this trend. Due to their limited context windows, AI assistants tend to generate localised boilerplate code rather than reasoning across system-wide abstractions. This encourages a work pattern in which the time saved on one low-value task is immediately reinvested in generating the next. Without a clear strategic mandate to focus on high-value work, the gap created by AI is filled not with architecture design sessions but with the generation of more lines of code. The result is a state of high activity and positive morale in which the proportion of time spent on work that truly creates long-term value nonetheless declines.
2. The hidden tax of code inflation
While the developer experience is a complex mix of perceived benefits and hidden costs, the impact of AI on the codebase itself is far less ambiguous. There is growing evidence that the sheer generative power of AI leads to a deterioration in code quality and maintainability, imposing a kind of “tax” in the form of technical debt that development teams will have to pay down for years to come.
The erosion of engineering discipline
The “GitClear 2025 AI Code Quality Report” is a comprehensive analysis of 211 million lines of code that were changed between 2020 and 2024. It provides a clear, quantitatively sound picture of this trend. The data shows a dramatic erosion of established best practices in engineering that is directly related to the rise of AI assistants.
The most important finding is the sharp decline in refactoring. The percentage of “moved” code – a strong indicator of refactoring activity, where a developer identifies a reusable piece of logic and moves it to a common location – fell from 24.1% of all code changes in 2020 to just 9.5% in 2024.
Instead, developers are generating huge amounts of new and duplicate code. The proportion of “added” lines of code in commits rose from 39% to 46% over the same period. Even more alarming is that the proportion of copied and pasted lines has risen sharply. 2024 was the first year in which the volume of duplicated code in commits exceeded that of refactored code.
This phenomenon is a direct consequence of how AI tools work. They are designed to deliver a localised, immediate solution based on the context provided in an input prompt. This makes it much easier to generate a new, slightly modified code block than to search the codebase for an existing abstraction that can be reused.
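The difference between the two behaviours is easy to see in code. The following hypothetical Python snippet (all names invented for this report) contrasts the duplication pattern that AI assistants encourage with the “moved” code that GitClear counts as refactoring:

```python
# Constructed illustration of the pattern GitClear measures; the function
# names are invented for this example.

# What an AI assistant typically produces: a second, slightly modified copy
# of logic that already exists elsewhere ("added" / duplicated lines).
def validate_customer_email(email: str) -> bool:
    return "@" in email and "." in email.split("@")[-1]

def validate_supplier_email(email: str) -> bool:  # near-duplicate
    return "@" in email and "." in email.split("@")[-1]

# What "moved" code looks like: the shared logic is extracted into one
# reusable helper, and both call sites are updated to use it.
def is_valid_email(email: str) -> bool:
    return "@" in email and "." in email.split("@")[-1]

def validate_customer(record: dict) -> bool:
    return is_valid_email(record["email"])

def validate_supplier(record: dict) -> bool:
    return is_valid_email(record["email"])
```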
Direct follow-up costs: code churn and wasted engineering effort
The negative effects of this suboptimal, duplicated code are not a distant problem. The GitClear report shows a significant increase in “churn.” This term refers to the percentage of new code that is changed or deleted within two weeks of its creation. This metric is a direct measure of wasted work and premature commits, and it rose from 3.1% in 2020 to 5.7% in 2024. This suggests that developers are spending more time correcting AI-assisted code, negating the initial speed advantages. The time “saved” by generating code in seconds must be repaid through immediate rewrites, debugging, and future maintenance cycles.
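Churn can be expressed as a simple calculation: the share of newly added lines that are modified or deleted again within two weeks. The sketch below is a minimal illustration of that definition, assuming a hypothetical per-line input format; it is not GitClear’s actual methodology:

```python
from datetime import datetime, timedelta

def churn_rate(added_lines: list[dict]) -> float:
    """Share of newly added lines reworked within two weeks.

    added_lines: one dict per added line, e.g.
    {"added_at": datetime, "removed_at": datetime or None}
    (input format is an assumption for illustration).
    """
    window = timedelta(days=14)
    churned = sum(
        1 for line in added_lines
        if line["removed_at"] is not None
        and line["removed_at"] - line["added_at"] <= window
    )
    return churned / len(added_lines) if added_lines else 0.0

# Example: 2 of 35 new lines rewritten within two weeks -> ~5.7% churn
lines = (
    [{"added_at": datetime(2024, 3, 1), "removed_at": datetime(2024, 3, 8)}] * 2
    + [{"added_at": datetime(2024, 3, 1), "removed_at": None}] * 33
)
print(f"{churn_rate(lines):.1%}")  # 5.7%
```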
3. Systemic limitations
The acceleration of code generation at the individual developer level collides with the limited capacity of the broader software delivery system. This collision leads to bottlenecks that absorb and negate productivity gains. It is therefore essential for managers to shift their focus from measuring individual activities to measuring end-to-end system performance. The data clearly shows that without a holistic view, companies run the risk of optimising a single part of the process at the expense of the whole.
The following table provides a consolidated overview of the conflicting signals that define the AI productivity paradox, contrasting positive indicators at the individual level with worrying trends at the code and system levels.
| Metric Category | Metric | Observed Impact of AI Adoption |
| --- | --- | --- |
| Individual perception & activity | Time in Flow State | ▲ Increases significantly |
| Individual perception & activity | Perceived Speed | ▲ Increases significantly |
| Individual perception & activity | Job Satisfaction | ▲ Increases |
| Code volume & churn | Lines of “Added” Code | ▲ Increases significantly |
| Code volume & churn | Duplicated Code | ▲ Increases sharply (4x–8x) |
| Code volume & churn | Early Rework (Churn) | ▲ Increases significantly |
| Code quality & maintainability | “Moved” Code (Refactoring) | ▼ Decreases sharply |
| System throughput & stability | PR Review Time | ▲ Increases sharply (+91%) |
| System throughput & stability | Delivery Stability (CFR) | ▼ Decreases (−7.2%) |
| System throughput & stability | Overall Org. Throughput | ▬ No measurable improvement |
| System throughput & stability | Task Completion Time | ▲ Increases (19% slower) |
Bottleneck: Code review
The main point of conflict is the code review process. Research by Faros AI2 based on telemetry data from over 10,000 developers provides an important data point: in teams with high AI acceptance, developers merge 98% more pull requests (PRs), but the time these PRs spend waiting to be reviewed increases by 91%. This reveals a classic system bottleneck. The newly gained speed in code generation is not offset by an increase in the capacity of the system to review, test and integrate this code. The gains made in one part of the system are completely consumed by the queue in the next, slowest step.
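Elementary queueing arithmetic shows why the gains disappear. In a simple M/M/1 model, the mean time a pull request spends in the review system is 1/(μ − λ), where λ is the arrival rate and μ the review capacity; as arrivals approach capacity, waiting time grows non-linearly. The numbers in the sketch below are invented to show the shape of the effect, not drawn from the Faros data:

```python
# Illustrative M/M/1 queueing arithmetic (numbers invented): when PR
# arrivals rise but review capacity stays fixed, time in review grows
# non-linearly as utilisation approaches 100%.
def mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a PR spends in the review system: W = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue is unstable at >= 100% load"
    return 1.0 / (service_rate - arrival_rate)

review_capacity = 10.0                # PRs a team can review per day
for prs_per_day in (6.0, 8.0, 9.5):  # AI-driven growth in PR volume
    w = mean_wait(prs_per_day, review_capacity)
    print(f"{prs_per_day:>4} PRs/day -> avg {w:.2f} days in review")
# 6.0 -> 0.25 days, 8.0 -> 0.50 days, 9.5 -> 2.00 days
```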
Small batch sizes
This bottleneck is exacerbated by the tendency of AI-powered workflows to produce larger work packages. AI makes it easy to generate hundreds of lines of code at once, resulting in larger and more complex pull requests. However, this runs counter to a basic principle of high-performing DevOps organisations and lean product development: small batch sizes. Small, frequent changes are easier to review, less risky to deploy, and enable faster feedback loops. The DORA report suggests that the temptation to create large batches with AI is a major reason for the observed decline in delivery stability.
The primacy of system-level metrics
This highlights the danger of focusing on misleading or incomplete metrics. Tool-specific telemetry data such as daily AI usage, suggestion acceptance rate, or number of chat interactions are merely early indicators of adoption, not business impact. They measure activity, not success. For example, a high acceptance rate for AI suggestions is meaningless if the code contains errors or sits in a review queue for days.
The only reliable method for measuring the actual impact of a technological or process-related change is through holistic, results-oriented system metrics. The four key DORA metrics – lead time for changes, deployment frequency, change failure rate (CFR) and mean time to restore service (MTTR) – remain the gold standard for this purpose. They measure the performance of the entire delivery system from commit to production.13
The evidence suggests that the impact of AI is not yet positive from this perspective. For example, DORA’s own research found a correlation between increased AI adoption and a 7.2% reduction in delivery stability, as measured by an increase in change failure rate. Similarly, the Faros AI report2 found no measurable improvement in overall organisational throughput or DORA metrics despite widespread AI adoption. The message: the key factor for success is not how fast an individual developer feels, but whether the entire system delivers value to users faster, more frequently and more reliably.
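As a minimal illustration of measuring at this level, the sketch below computes two of the four DORA metrics, change failure rate and lead time for changes, from deployment records; the record format and figures are assumptions for the example:

```python
from datetime import datetime
from statistics import median

# Minimal sketch: two DORA metrics computed from deployment records.
# The record format and values are assumptions for illustration.
deployments = [
    {"committed": datetime(2025, 9, 1, 9), "deployed": datetime(2025, 9, 2, 9), "failed": False},
    {"committed": datetime(2025, 9, 2, 9), "deployed": datetime(2025, 9, 5, 9), "failed": True},
    {"committed": datetime(2025, 9, 4, 9), "deployed": datetime(2025, 9, 6, 9), "failed": False},
]

# Change failure rate: share of deployments that cause a failure in production.
cfr = sum(d["failed"] for d in deployments) / len(deployments)

# Lead time for changes: time from commit to running in production (median, hours).
lead_times = [(d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deployments]

print(f"Change failure rate: {cfr:.0%}")               # 33%
print(f"Median lead time: {median(lead_times):.0f}h")  # 48h
```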
4. The organisational dynamics of AI adoption
The most effective levers for unlocking the potential of AI do not always lie in choosing a particular model or tool, but in establishing clear governance, proactive risk management, and fostering trust through transparent leadership. Companies that neglect these human and systemic levels run the risk of having their technical efforts repeatedly undermined.
Governance, compliance, and risk management
Studies show that in some companies, up to 60% of new code already comes from AI tools. At the same time, only 18% of these companies have binding guidelines for their use.17 The result is a governance vacuum. Teams are stuck in a “compliance treadmill”: every new use case or tool leads to uncertainty and improvised checks.
This is risky because AI significantly amplifies existing risks. A 2025 Veracode study found that 45% of the coding tasks examined using AI tools resulted in security vulnerabilities; Java was particularly critical, with over 70% of AI-generated results containing vulnerabilities.18 The problems range from insecure programming patterns and the use of outdated libraries to hard-coded secrets and licence violations caused by the unclear origin of code snippets. With 81% of companies admitting to shipping vulnerable code under time pressure, unvetted AI output quickly becomes a ready source of security incidents.
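Two of these recurring weaknesses, hard-coded secrets and injection-prone query construction, are easy to illustrate. The snippet below is constructed for this report, not taken from the Veracode study:

```python
import os
import sqlite3

# Constructed example of two vulnerability classes commonly flagged in
# AI-generated code (not from the Veracode study).

# Risky pattern: a hard-coded secret plus SQL built via string formatting,
# which is open to injection.
API_KEY = "sk-live-123456"  # hard-coded secret ends up in version control
def find_user_unsafe(conn: sqlite3.Connection, name: str):
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'")  # injectable

# Safer pattern: secret read from the environment, parameterised query.
API_KEY_SAFE = os.environ.get("API_KEY")  # injected at deploy time
def find_user_safe(conn: sqlite3.Connection, name: str):
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,))
```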
However, the answer to this does not lie in banning AI tools, but in a stable, proactive governance framework. Companies need to move away from ad hoc checks and towards clear, reusable rules for data protection, security standards, intellectual property and responsible use. The goal is to create a structure that provides a safe environment for experimentation while reducing the effort required to introduce new tools. This allows teams to focus on creating value instead of navigating a maze of compliance uncertainties.
The critical role of clear frameworks
DORA’s research has identified four concrete, data-driven measures that executives can use to promote trust, create psychological safety and drive effective adoption. The impact of these non-technical measures is enormous.
- Establish a simple, understandable acceptable use policy. This is the most effective lever for driving adoption: companies with clear policies see an estimated 451% increase in AI adoption among their teams compared to companies without them. Without clear rules, every developer is forced to bear the cognitive burden of a risk assessor, constantly wondering whether the use of a tool is safe or permissible. A clear policy relieves individuals of this burden, gives them back their mental energy, and creates the psychological safety to use the tools confidently and effectively.
- Approve and schedule time for learning. When learning time is officially sanctioned rather than relegated to evenings and weekends, acceptance within the team increases. This signals that the company values the development of new skills and accepts that a learning curve is necessary to master these new tools.
- Address concerns about job security. The fear of being replaced by AI is a significant and understandable source of anxiety. Leaders who address these concerns directly and transparently can increase acceptance within the team. Open communication about how AI is intended to complement rather than replace developers builds trust, allowing teams to engage with the technology as a partner rather than a threat.
- Share a transparent roadmap. Simply publishing a clear plan for how the company intends to use AI can increase acceptance among the team. Transparency reduces uncertainty and helps align individual efforts with the overall corporate strategy.
These measures directly counteract the “empathy gap” identified in Atlassian’s research: 63% of developers feel that their managers do not understand their problems, a significant increase from 44% in the previous year.7 This gap often arises when managers view AI as a simple time-saving tool and try to “hoard” those savings without addressing the underlying systemic frictions and new anxieties. By focusing on these four human-centred strategies, managers can build the trust and clarity that are essential to any successful technological transformation.
Outlook
The observations described in this report—the paradox of developer experience, the hidden costs of code inflation, the reality of systemic bottlenecks, and the urgent need for organisational leadership—are the first signs of a much larger shift in software development. The industry is rapidly moving from an era of AI augmentation, where tools assist human developers, to an era of AI automation, where autonomous agents can independently perform complex, multi-step development tasks.
Agentic coding is emerging rapidly. AI systems already generate large amounts of code, and autonomous agents will only amplify this trend. What an AI assistant suggests today as an oversized pull request, an AI agent may tomorrow implement at ten times the scale in a fraction of the time. The result is massive code inflation that can no longer be managed manually, exponentially exacerbating problems such as oversized batch sizes, bottlenecks in system integration and growing security risks.
This makes overcoming the challenges of current AI assistants a crucial prerequisite for the future. The technical disciplines, automated security measures and system-level metrics required for effective management of AI-supported workflows form the indispensable basis for the safe use of agent-based AI.
Contact
Are you looking for an experienced and reliable IT partner?
We offer customised solutions to meet your needs – from consulting, development and integration to operation.
References
1. Adopt generative AI – DORA, accessed on September 2, 2025, https://dora.dev/research/ai/adopt-gen-ai/
2. The AI Productivity Paradox Research Report – Faros AI, accessed on September 2, 2025, https://www.faros.ai/blog/ai-software-engineering
3. How gen AI affects the value of development work – DORA, accessed on September 2, 2025, https://dora.dev/research/ai/value-of-development-work/
4. How does generative AI impact Developer Experience? – Microsoft, accessed on September 2, 2025, https://devblogs.microsoft.com/premier-developer/how-does-generative-ai-impact-developer-experience/
5. Measuring the Impact of Early-2025 AI on Experienced Open … – METR, accessed on September 2, 2025, https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
6. Study finds that AI tools make experienced programmers 19% slower. But that is not the most interesting find… : r/programming – Reddit, accessed on September 2, 2025, https://www.reddit.com/r/programming/comments/1lxh8ip/study_finds_that_ai_tools_make_experienced/
7. Atlassian research: AI adoption is rising, but friction persists – Work … – Atlassian, accessed on September 2, 2025, https://www.atlassian.com/blog/developer/developer-experience-report-2025
8. Leveraging AI for Software Engineering Productivity: Best Practices for Cost Reduction and Revenue Growth – DB Services, accessed on September 2, 2025, https://dbservices.pt/leveraging-ai-for-software-engineering-productivity-best-practices-for-cost-reduction-and-revenue-growth/
9. Report Summary: GitClear AI Code Quality Research 2025 – jonas.rs, accessed on September 2, 2025, https://www.jonas.rs/2025/02/09/report-summary-gitclear-ai-code-quality-research-2025.html
10. AI Copilot Code Quality 2025 – Scribd, accessed on September 2, 2025, https://www.scribd.com/document/834297356/AI-Copilot-Code-Quality-2025
11. AI-Generated Code Statistics 2025: Can AI Replace Your … – Netcorp, accessed on September 2, 2025, https://www.netcorpsoftwaredevelopment.com/blog/ai-generated-code-statistics
12. February | 2025 – Rob Bowley, accessed on September 2, 2025, https://blog.robbowley.net/2025/02/
13. DORA Metrics: Complete guide to DevOps performance measurement (2025) – DX, accessed on September 2, 2025, https://getdx.com/blog/dora-metrics/
14. Measuring the productivity impact of AI coding tools: A practical guide for engineering leaders – Swarmia, accessed on September 2, 2025, https://www.swarmia.com/blog/productivity-impact-of-ai-coding-tools/
15. Measuring AI code assistants and agents – GetDX, accessed on September 2, 2025, https://getdx.com/research/measuring-ai-code-assistants-and-agents/
16. DORA Metrics: How to measure Open DevOps Success – Atlassian, accessed on September 2, 2025, https://www.atlassian.com/devops/frameworks/dora-metrics
17. AI-generated code surges as governance lags – AI, Data & Analytics Network, accessed on September 2, 2025, https://www.aidataanalytics.network/data-science-ai/news-trends/ai-generated-code-surges-as-governance-lags
18. AI-Generated Code Poses Major Security Risks in Nearly Half of All … – Security Today, accessed on September 2, 2025, https://securitytoday.com/articles/2025/08/05/ai-generated-code-poses-major-security-risks-in-nearly-half-of-all-development-tasks.aspx
19. AI-generated Code: How to Protect Your Software From AI-generated Vulnerabilities – OX Security, accessed on September 2, 2025, https://www.ox.security/blog/ai-generated-code-how-to-protect-your-software-from-ai-generated-vulnerabilities/
20. Secure AI Framework (SAIF): A Conceptual Framework for Secure AI Systems – Google for Developers, accessed on September 2, 2025, https://developers.google.com/machine-learning/resources/saif
21. Google’s Secure AI Framework – Google Safety Center, accessed on September 2, 2025, https://safety.google/cybersecurity-advancements/saif/
22. Emerging agentic AI trends reshaping software development – GitLab, accessed on September 2, 2025, https://about.gitlab.com/the-source/ai/emerging-agentic-ai-trends-reshaping-software-development/
23. How agentic AI is transforming software development – AWS, accessed on September 2, 2025, https://aws.amazon.com/isv/resources/how-agentic-ai-is-transforming-software-development/
24. Agentic code generation: The future of software development – AI Accelerator Institute, accessed on September 2, 2025, https://www.aiacceleratorinstitute.com/agentic-code-generation-the-future-of-software-development/
25. PromptPilot: Exploring User Experience of Prompting with AI-Enhanced Initiative in LLMs – ResearchGate, accessed on September 2, 2025, https://www.researchgate.net/publication/391984354_PromptPilot_Exploring_User_Experience_of_Prompting_with_AI-Enhanced_Initiative_in_LLMs
26. Vibe Coding: Why Microservices Are Cool Again | by Zak Mandhro | PullFlow – Medium, accessed on September 2, 2025, https://medium.com/pullflow/vibe-coding-why-microservices-are-cool-again-bbee690cdf50
27. Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces – arXiv, accessed on September 2, 2025, https://arxiv.org/html/2504.14320v1
28. Designing systems for AI agents: What makes a good AX? – Standard Beagle Studio, accessed on September 2, 2025, https://standardbeagle.com/designing-systems-for-ai-agents/
29. The Agentic AI Revolution: Transforming Software as a Service | by Rahul Krishnan – Medium, accessed on September 2, 2025, https://solutionsarchitecture.medium.com/the-agentic-ai-revolution-transforming-software-as-a-service-a7c915172b33