Tuesday, 17 June 2025

How AI Is Reshaping Network Operations at Deutsche Telekom

Michal Sewera, an experienced technology leader at Telekom Deutschland GmbH (TDG), part of the Deutsche Telekom Group, recently offered a rare behind-the-scenes view of how AI is being used to manage and optimise telco cloud operations. As head of TDG’s cloud-native 5G core DevOps team, he has led the shift to a new operating model built on cloud-native principles, automation and AI.

Speaking at the FutureNet World conference in London (7–8 May 2025), Michal explained that TDG’s journey to cloud-native began with the realisation that cloud is not simply about virtualisation or containers. The real transformation lies in a fundamental change in architecture and operations. Moving to a GitOps operational model, with declarative deployments and an explicitly declared desired network state, has allowed TDG to move from infrequent bulk updates to continuous, incremental changes. In this approach, change is no longer treated as an exception but as an asset.
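To make the idea of a desired network state more concrete, here is a minimal Python sketch of the reconciliation step at the heart of GitOps: compare the declared state with what is actually running and derive only the incremental changes needed to close the gap. It is an illustration under simplifying assumptions, not TDG’s tooling. The network function names and the flat dictionary model are hypothetical; in practice the desired state would be read from Git and the observed state from the cluster by a GitOps controller such as Argo CD or Flux.

```python
# Minimal sketch of GitOps-style reconciliation (illustrative only).
# Hypothetical model: each network function maps to a flat dict of settings.

State = dict[str, dict[str, object]]


def plan_changes(desired: State, observed: State) -> list[tuple[str, str]]:
    """Return the incremental actions that move `observed` towards `desired`."""
    actions: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_changes(desired: State, observed: State, actions: list[tuple[str, str]]) -> State:
    """Simulate applying the planned actions and return the new observed state."""
    new_state = {name: dict(spec) for name, spec in observed.items()}
    for action, name in actions:
        if action == "delete":
            del new_state[name]
        else:  # "create" or "update": copy the declared spec
            new_state[name] = dict(desired[name])
    return new_state


# Hypothetical desired and observed states for three 5G core functions.
desired = {"amf": {"replicas": 3}, "smf": {"replicas": 2}}
observed = {"amf": {"replicas": 2}, "upf-legacy": {"replicas": 1}}

actions = plan_changes(desired, observed)
print(actions)  # [('update', 'amf'), ('create', 'smf'), ('delete', 'upf-legacy')]

observed = apply_changes(desired, observed, actions)
print(plan_changes(desired, observed))  # [] once the states have converged
```

Because the plan is empty once the two states match, running the loop repeatedly is harmless, which is what makes small, frequent changes practical.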

However, this shift comes with its own challenges. Cloud-native telco systems are composed of highly distributed microservices, open-source components and loosely coupled layers. This creates what Michal refers to as the butterfly effect, where even a small change can lead to unexpected consequences elsewhere in the system. Traditional approaches to validation, configuration and assurance are simply no longer sufficient.

To address this, TDG has integrated AI tools across all stages of the network lifecycle: development, rollout and operations. In the development phase, TDG uses an AI-based validation framework that collects data from the application, platform and infrastructure layers. It analyses complex interdependencies by applying pattern recognition to 3GPP signalling, KPIs, logs, Kubernetes, Container Network Interface (CNI) plugins and the service mesh. This approach replaces traditional regression testing with intelligent analysis that highlights functional issues and pinpoints root causes early in the pipeline.
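Pattern recognition across layers can take many forms. As a deliberately simple stand-in, the Python sketch below ranks signals from different layers by how strongly they move with a degraded KPI, using Pearson correlation. It is not TDG’s framework: the signal names and sample values are made up for illustration, and correlation only points to candidates worth investigating rather than proving a root cause.

```python
import math

# Illustrative stand-in for cross-layer analysis: rank signals from different
# layers by the strength of their linear correlation with a degraded KPI.


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_suspects(kpi: list[float], signals: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort signals by absolute correlation with the KPI, strongest first."""
    scores = {name: abs(pearson(kpi, series)) for name, series in signals.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples: a success-rate KPI plus one signal per layer.
kpi = [99.9, 99.8, 97.0, 95.5, 99.7]
signals = {
    "k8s:pod_restarts": [0, 0, 3, 5, 0],
    "cni:packet_drops": [1, 1, 1, 2, 1],
    "mesh:latency_ms": [10, 10, 10, 10, 10],
}

for name, score in rank_suspects(kpi, signals):
    print(f"{name}: {score:.2f}")
```

Here the pod restarts track the KPI dips most closely and would be examined first, while the constant latency signal scores zero.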

During rollout, the AI-powered Network Configuration Co-Pilot supports configuration changes across distributed clusters. The tool goes well beyond Git automation bots, using a mix of reusable configuration patterns, chat-based interaction with embedded vendor knowledge and natural language integration with systems like Kubernetes. This allows engineers to handle the massive complexity of telco configurations more efficiently and with greater confidence.
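One ingredient mentioned here, reusable configuration patterns, can be illustrated with a template rendered once per cluster from shared defaults plus cluster-specific overrides. The Python sketch below uses the standard library’s `string.Template`; the pattern fields, cluster names and values are hypothetical and do not reflect TDG’s or any vendor’s actual configuration schema. Because `Template.substitute` raises `KeyError` for any placeholder without a value, an incomplete configuration is rejected before it could reach a cluster.

```python
from string import Template

# Hypothetical reusable pattern for a network function configuration fragment.
SMF_PATTERN = Template(
    "nf: smf\n"
    "cluster: $cluster\n"
    "replicas: $replicas\n"
    "dnn: $dnn\n"
)


def render_for_clusters(
    pattern: Template,
    defaults: dict[str, object],
    overrides: dict[str, dict[str, object]],
) -> dict[str, str]:
    """Render one configuration per cluster: shared defaults, then per-cluster overrides."""
    rendered = {}
    for cluster, values in overrides.items():
        params = {**defaults, **values, "cluster": cluster}
        # substitute() raises KeyError if any placeholder is left without a value.
        rendered[cluster] = pattern.substitute(params)
    return rendered


configs = render_for_clusters(
    SMF_PATTERN,
    defaults={"replicas": 2, "dnn": "internet"},
    overrides={"cluster-a": {}, "cluster-b": {"replicas": 4}},
)
print(configs["cluster-b"])
```

Keeping the pattern in one place and varying only the overrides means a fix to the pattern propagates to every cluster in the next render.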

In live operations, TDG employs a combination of active and passive monitoring across its Platform as a Service layer. Probes and telemetry continuously monitor performance while AI-driven root cause analysis tools detect anomalies and correlate them with platform and network data. This enables early detection of degradation and supports predictive fault analysis. TDG also applies AI to canary testing and deployment. New releases are gradually introduced in production environments under close AI-assisted monitoring, allowing issues to be caught before full rollout. This model is a marked departure from the old reliance on staging environments and lab testing.
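The canary pattern itself is straightforward to sketch. In the Python example below, a new release receives a growing share of traffic and is promoted only if its error rate stays within a tolerance of the current release at every step; otherwise it is rolled back at the first failing step. The fixed threshold is a simple stand-in for the AI-assisted analysis described above, and the traffic steps, tolerance and `observe` callbacks are hypothetical.

```python
from typing import Callable

# Hypothetical traffic shares (percent) for a staged canary rollout.
STEPS = (1, 5, 25, 50, 100)


def run_canary(
    observe: Callable[[int], tuple[float, float]],
    max_degradation: float = 0.005,
    steps: tuple[int, ...] = STEPS,
) -> tuple[str, int]:
    """Advance the canary step by step; roll back as soon as it degrades.

    `observe(percent)` returns (baseline_error_rate, canary_error_rate)
    measured while the canary carries `percent` of the traffic.
    """
    for percent in steps:
        baseline_rate, canary_rate = observe(percent)
        if canary_rate - baseline_rate > max_degradation:
            return ("rollback", percent)
    return ("promote", steps[-1])


def healthy(percent: int) -> tuple[float, float]:
    return 0.010, 0.011  # canary marginally worse, within tolerance


def regresses_under_load(percent: int) -> tuple[float, float]:
    return 0.010, (0.050 if percent >= 25 else 0.010)  # fault appears at 25%


print(run_canary(healthy))               # ('promote', 100)
print(run_canary(regresses_under_load))  # ('rollback', 25)
```

Stopping at the first failing step confines a faulty release to a small slice of production traffic, which is what allows issues to be caught before full rollout.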

TDG’s new operational model, grounded in GitOps and driven by AI, offers a compelling example of how operators can adapt to the complexity and speed of change in cloud-native environments. The shift transforms telecom networks from silent, black-box systems into transparent, data-rich platforms where actionable insight can be extracted and acted upon in near real time.

Michal’s insights make clear that AI is not an optional add-on in this new environment. It is a fundamental enabler that allows the telco cloud to scale, evolve and remain resilient. For operators looking to modernise their networks, TDG’s experience offers valuable lessons in how to harness automation and intelligence to meet the demands of the future.

