How a Swiss client reduced TCO and future-proofed their tech

A large Swiss-based insurance company was already going through a digital transformation. I joined one of their teams to transition their DevOps from a legacy toolset supporting monoliths to a modern Microsoft Azure-based Kubernetes environment.

The transition was driven by management, as the on-prem data center posed multiple maintenance challenges. Given Microsoft's commitment to EU data protection regulations (especially the GDPR), its ISO/IEC certifications, and its leadership in implementing the EU Data Boundary, it made sense to adopt an industry-proven cloud platform for their operations.

By the end of my time on this project, the team had automated 80% of their application deployment tasks and laid the groundwork for infrastructure automation as well. The move drastically reduced the forecasted TCO and improved team efficiency, which in turn gave the team capacity to consider further enhancements.

In my opinion, this engagement was an ideal template for terms like "digital transformation" and "cloud-native adoption".

Previously

The team dealt with transactional data management for IT assets and configurations, enabling their ITSM systems to pinpoint the root causes of failures and take corrective actions. Accuracy of the information and of the asset relationships maintained was crucial. The adoption of cloud made it harder to keep track of transient resources, which meant managing huge volumes of important data.

The team had previously built an application to fetch this information from multiple data sources and consolidate it into a common configuration management database (CMDB). This application was written in Go, built as a monolith, and hosted in a private data center. On one hand, this made it easier to manage network security based on local contracts; on the other, it posed technical challenges with scaling and maintenance.

This slowed the team down: they found themselves resolving issues full time, with little to no time left for enhancements. As a result, budget was allocated to transform this legacy system into a modern tech stack with modern DevOps practices.

Opportunities
  • As time passed, the monolithic deployment grew, and scaling, releasing new features, and shipping bug fixes all became difficult.
  • A few components ran short of instance resources, so additional instances were allocated behind a load balancer, adding further cost.
  • API changes in the data sources produced inconsistent responses, so the application often failed to gather accurate information. The setup also made it difficult to roll out the corresponding response-processing changes quickly, resulting in data loss that had to be fixed manually in the target CMDB.
  • The application depended heavily on memory, as it loaded all transactional data to function. This meant provisioning an absurd amount of RAM on instances for no good reason, further inflating infrastructure cost.
  • In spite of leveraging DevOps tools, the production rollout was highly manual. Each deployment required running metadata-management scripts to ensure the newly deployed application could process incoming response data.
  • The self-hosted DevOps toolchain also suffered frequent downtime, disrupting deployments and wasting team bandwidth.
  • Lack of containerization led to inconsistent provisioning of prod and sub-prod environments, causing unexpected issues and extra maintenance activities.

Under these conditions, it was practically impossible to maintain the high data accuracy that is the minimum required for this application component to make sense in the larger landscape. The impact of inaccurate data was felt by other teams as well: trust was low, and they often doubted the correctness of the "enriched" CMDB.

Impact

I was brought into the team primarily to transition the DevOps toolchain from the self-hosted setup to Azure DevOps, while contributing to the overall architectural decomposition from a monolith to Kubernetes-based microservices. I was also responsible for introducing SecOps by upgrading the pipelines to perform SAST and DAST validations. The business service improved to the point where I could lay the foundation for infrastructure automation using Terraform. The results were delivered in the phases described below.

  • Phase 1: In the initial phase, the team held multiple meetings to understand all the functions the monolith performed, and to plan and identify the potential microservices that would continue serving those functions. I owned and led the DevOps side of the planning while staying in sync with the rest of the team. By the end of this phase, the full target architecture was documented and finalized.
  • Phase 2: While the monolith was being refactored by reusing the existing code, I developed a template for the new CI/CD pipelines to perform the build, push, and deployment each microservice required. I made it a point to standardize the pipeline so it could be reused for any number of microservices: 80% of the microservices were onboarded onto the new DevOps setup within a couple of hours, while the rest needed a few special tweaks that didn't take long.
  • Phase 3: At the organizational level, a new homegrown Kubernetes management wrapper was released, and all teams were pushed to adopt it, as it addressed consumption reporting, auditing, and centralized maintenance of the underlying infrastructure. This phase involved incorporating the new wrapper into the pipelines, along with the respective integrations with reporting tools.
  • Phase 4: This is where I incorporated Aquascan, Nexus IQ, SonarQube, and Netsparker to address the SecOps aspects. During this phase, many vulnerabilities and code smells were identified, and the team spent additional time fixing them, resulting in more robust application components.
  • Phase 5: After six months of the work above, we released the newly designed application to the production Kubernetes clusters using the DevSecOps pipelines. The go-live was successful with hardly any issues, and was followed by a couple of weeks of close observation.
  • Phase 6: In the following months, several enhancements to the application and the DevSecOps setup followed. During this time, I further automated pipeline features such as automatic versioning and tagging, to keep track of the source code running on each container in each environment. The team regrouped to explore more enhancement opportunities, where I proposed using Terraform to further automate infrastructure provisioning, adopting better secret-management services, and more.
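The reusable pipeline template from Phase 2 can be sketched as a parameterized Azure DevOps template. This is a minimal illustration, not the client's actual configuration: the parameter names, service connection, and manifest paths below are all hypothetical placeholders.

```yaml
# azure-pipelines-template.yml -- hypothetical sketch of a reusable
# build/push/deploy template; every name here is illustrative.
parameters:
  - name: serviceName            # microservice to build, e.g. "asset-sync"
    type: string
  - name: registryConnection     # container registry service connection
    type: string
    default: acr-connection
  - name: k8sNamespace
    type: string
    default: cmdb

stages:
  - stage: BuildAndPush
    jobs:
      - job: Build
        steps:
          - task: Docker@2
            inputs:
              command: buildAndPush
              containerRegistry: ${{ parameters.registryConnection }}
              repository: ${{ parameters.serviceName }}
              tags: $(Build.SourceVersion)   # tag image with the git commit SHA
  - stage: Deploy
    dependsOn: BuildAndPush
    jobs:
      - deployment: DeployToAKS
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@1
                  inputs:
                    action: deploy
                    namespace: ${{ parameters.k8sNamespace }}
                    manifests: manifests/${{ parameters.serviceName }}.yaml
```

A new microservice then onboards by invoking the template with its own `serviceName`, which is what keeps onboarding down to hours rather than days.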
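The automatic versioning and tagging from Phase 6 can be sketched as a pipeline step that derives a traceable image tag from the commit. Again a hedged sketch under assumptions: the `VERSION` file, variable names, and repository name are hypothetical, not the team's actual scheme.

```yaml
# Hypothetical sketch: tag every image with the semantic version plus the
# short git SHA, so each running container maps back to its exact commit.
steps:
  - script: |
      SHORT_SHA=$(echo "$(Build.SourceVersion)" | cut -c1-7)
      IMAGE_TAG="$(cat VERSION)-${SHORT_SHA}"     # e.g. 1.4.2-a1b2c3d
      echo "##vso[task.setvariable variable=imageTag]${IMAGE_TAG}"
    displayName: Derive image tag
  - task: Docker@2
    inputs:
      command: buildAndPush
      repository: $(serviceName)    # illustrative variable
      tags: |
        $(imageTag)
        latest
```

With a tag like `1.4.2-a1b2c3d` on every container, identifying the source code running in any environment is a registry lookup rather than an investigation.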

Wins

The new production environment ran in parallel with the old setup for a quarter before the old infrastructure was decommissioned. The team conducted user training, created API documentation, and made sure all stakeholders transitioned to the APIs in the new environment.

The application now runs on Azure Kubernetes Service, while DevSecOps pipelines make daily releases possible. The constraints imposed by the self-hosted DevOps tools (Bamboo, JFrog) were removed, as Azure offers highly available managed services. Management appreciated the effort, as it resulted in:

  • A highly robust application and flexible deployment routines.
  • Trivial scaling: scaling an application component is as simple as changing a value in a YAML file.
  • Improved trust in the configuration data enriched by the new application.
  • A 90% reduction in the team's bug-fix and maintenance workload.
  • No more manual activities, and therefore fewer human errors.
  • Onboarding a new microservice from development to production in just a couple of hours.
  • Reduced total cost of ownership.
  • Better security.
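The "change in a YAML file" scaling win typically looks like the following Kubernetes manifest fragment. A minimal sketch with assumed names: the deployment, image, and resource figures are illustrative, not the client's actual values.

```yaml
# Hypothetical Deployment fragment: scaling out is a one-line change to
# spec.replicas, versus provisioning a new VM behind a load balancer
# in the old on-prem setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: asset-sync               # illustrative microservice name
spec:
  replicas: 3                    # bump this number to scale horizontally
  selector:
    matchLabels:
      app: asset-sync
  template:
    metadata:
      labels:
        app: asset-sync
    spec:
      containers:
        - name: asset-sync
          image: registry.example.com/asset-sync:1.4.2
          resources:
            requests:
              memory: "256Mi"    # right-sized, unlike the RAM-heavy monolith
```

In practice a HorizontalPodAutoscaler can adjust `replicas` automatically, removing even this one manual step.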