I was recently reminiscing with a friend and former colleague about product and service releases we had both lived through: releases that were under tremendous pressure to deliver a new feature or capability and were already behind schedule. In that highly charged environment, people were tired and quality issues started creeping into the builds. Our teams had the best of intentions, but being behind on a critical release ultimately took a toll on quality. And when we finally managed to get these challenging releases “out the door,” too many defects escaped with them and made just about everything worse. Not only were we delaying revenue and missing opportunities, but the high-priority need to fix critical defects also flattened our capacity for new projects, compounding frustration and opportunity costs.
Building and maintaining great CI/CD pipelines is hard, but doable
These formative (and painful) experiences instilled in me a passion to commit my professional career to resolving two critical issues:
- How to make the engineering team more productive, so we can accelerate releases.
- How to efficiently identify and resolve offending commits early in the release cycle and gate them from reaching production.
Fortunately, early in my career I had the opportunity to learn from folks who were very good at building efficient release processes that delivered consistently high-quality releases. In particular, I learned a lot from BEA WebLogic’s engineering team, which built one of the most efficient and automated release processes, with a strong emphasis on quality. I was also part of a talented team at VMware that was responsible for setting up the release processes and CI/CD pipelines for on-prem and cloud-native applications. It was painful, but we successfully integrated the CI/CD pipelines with a few monitoring tools to help developers debug pipeline issues.
Why doesn’t everyone build great CI/CD pipelines to accelerate software delivery?
We weren’t the only ones with both the scars of painful releases and the experience of well-executed release processes. If we had learned these lessons, lots of other folks had as well. So why wasn’t everyone running release processes that followed best practices?
For one thing, it can be pretty challenging to keep pre-production environments stable and fully functional. One very simple and insidious reason is that when we run automated tests in pre-production environments, we invariably face the dilemma of what to do when the tests fail. Is it a “real” failure, or is it an artifact of the test environment? If testing in pre-production environments is perceived to be flaky, it becomes easy for people to start ignoring the test failures and keep promoting commits to the next stage in the pipeline. This behavior, in turn, lets regressions slip into production environments, where they are easily 10x more expensive to fix.
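To make the “real failure or environment artifact?” question concrete, here is a minimal sketch of one possible triage heuristic. It is not how any particular tool works: the `RunResult` record and the `triage_failures` function are hypothetical names, and the rule itself (a test that both passed and failed on the same commit is probably flaky) is an assumption chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One execution of one test against one commit (hypothetical record)."""
    test_name: str
    commit: str
    passed: bool


def triage_failures(runs):
    """Label each failing (test, commit) pair as 'flaky' or 'real'.

    Assumed heuristic: if the same test both passed and failed on the same
    commit, the code did not change between runs, so the failure is likely
    environmental. If it only ever failed, treat it as a real failure.
    """
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit)].add(run.passed)

    labels = {}
    for key, seen in outcomes.items():
        if False in seen:  # only pairs with at least one failure need triage
            labels[key] = "flaky" if True in seen else "real"
    return labels


runs = [
    RunResult("test_login", "abc123", False),
    RunResult("test_login", "abc123", True),      # passed on retry
    RunResult("test_checkout", "abc123", False),
    RunResult("test_checkout", "abc123", False),  # failed every time
    RunResult("test_search", "abc123", True),
]
labels = triage_failures(runs)
```

Even a crude rule like this gives a failing test an explicit label, which makes it harder to wave the failure through without anyone looking at it.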
The sad truth is that it is all too common to discover major regressions in production that pre-production tests had already caught, because nobody examined and triaged the failure before the code was promoted. Not only do these production regressions consume time and money; all the effort we spent automating and running tests is wasted when the results are ignored.
So, why does this happen over and over again?
- Not because we don’t understand what’s going on
- Not because we don’t know how to minimize these failures
- Not because industry analysts haven’t been telling us how important effective automation is
It turns out that what keeps most companies from investing in a fully automated CI/CD pipeline is:
- Historically, it has been hard and resource intensive
- The CI/CD ecosystem is constantly evolving, and existing tools have simply not kept up with rapidly changing developer requirements
- An outage in production is very visible because of its business impact, but pre-production issues are ignored because the tangible benefit of stable pre-production environments is not explicitly visible to decision makers
Existing CI/CD tools are failing developers (and businesses that depend on them)
Developers, and the businesses that depend on them, need a platform that helps them to diagnose the build, deploy and test issues quickly AND that encourages consistent “high standards of governance and efficiency” across an organization.
Succinctly, developers need an aggregated view of all the sources of issues to help them to find the root cause of the problem quickly. That will reduce the MTTR for fixing the pre-production issues and keep the pre-production environments functional and stable.
In a typical CI/CD environment, builds, tests, and deployments run at full throttle, and being able to troubleshoot, fix, and redeploy quickly is key to optimizing the process. DevOps environments generate an overload of logs, and that overload can make root cause analysis (RCA) a nightmare. Hours, if not days, are spent manually troubleshooting issues, especially when they manage to creep into the production environment. Often the problem is not a lack of logs, metrics, traces, or information in general, but an excess of it. Quickly sifting through the noise, the maze of alerts, warnings, and irrelevant logs, to focus on the ‘errors that matter’ is crucial to converging on the root cause.
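As a rough illustration of what sifting for the ‘errors that matter’ can look like, the sketch below drops low-severity lines and groups the remaining errors by a normalized signature so repeated failures collapse into one entry. The log format (`<timestamp> <LEVEL> <message>`), the set of levels, and the function names are all assumptions made for this example, not the behavior of any specific product.

```python
import re
from collections import Counter

# Assumed severity names; real pipelines may use different ones.
ERROR_LEVELS = {"ERROR", "FATAL", "CRITICAL"}


def signature(message):
    """Collapse variable parts (hex ids, numbers) so repeats group together."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    message = re.sub(r"\d+", "<n>", message)
    return message.strip()


def errors_that_matter(lines):
    """Return (signature, count) pairs for error lines, most frequent first."""
    counts = Counter()
    for line in lines:
        # Assumed line format: "<timestamp> <LEVEL> <message>"
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        _, level, message = parts
        if level.upper() in ERROR_LEVELS:
            counts[signature(message)] += 1
    return counts.most_common()


lines = [
    "2024-01-01T10:00:00 INFO Starting build 4812",
    "2024-01-01T10:00:05 ERROR Connection to db-7 timed out after 30s",
    "2024-01-01T10:00:09 WARN Retrying step 3",
    "2024-01-01T10:00:12 ERROR Connection to db-9 timed out after 30s",
    "2024-01-01T10:00:15 ERROR Artifact upload failed",
]
summary = errors_that_matter(lines)
```

Here five raw lines reduce to two distinct error signatures, with the repeated timeout ranked first. The same idea scales to the thousands of lines a busy pipeline produces.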
Complexity is costly: draining productivity & quality
Side note: In many cases, pre-production environments only require monitoring when there are code promotion events. Personally, I don’t see a need to keep collecting logs 24×7 if those environments are used only once or twice per day to validate code commits. Instead, we need an observability solution that collects the relevant logs and metrics ONLY when we really need them to debug CI/CD pipeline issues, including test issues.
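One simple way to picture this “collect only around promotion events” idea is to keep log records that fall inside a time window around each promotion and discard the rest. The sketch below assumes records are `(timestamp, message)` tuples and uses arbitrary window sizes (5 minutes before, 30 minutes after); both the names and the numbers are illustrative assumptions.

```python
from datetime import datetime, timedelta


def promotion_windows(promotion_times,
                      before=timedelta(minutes=5),
                      after=timedelta(minutes=30)):
    """Build a (start, end) collection window around each promotion event."""
    return [(t - before, t + after) for t in promotion_times]


def logs_worth_keeping(records, windows):
    """Keep only records whose timestamp falls inside some window."""
    return [
        (ts, msg)
        for ts, msg in records
        if any(start <= ts <= end for start, end in windows)
    ]


# One promotion at 10:00 gives a window of 09:55 to 10:30.
promotions = [datetime(2024, 1, 1, 10, 0)]
records = [
    (datetime(2024, 1, 1, 3, 0), "nightly heartbeat"),
    (datetime(2024, 1, 1, 9, 58), "deploy started"),
    (datetime(2024, 1, 1, 10, 20), "integration test failed"),
    (datetime(2024, 1, 1, 15, 0), "idle"),
]
kept = logs_worth_keeping(records, promotion_windows(promotions))
```

In this example only the two records near the 10:00 promotion survive; the overnight heartbeat and the afternoon idle message are never stored.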
Introducing ReleaseIQ: A better way to build and manage complex, heterogeneous release pipelines to Accelerate Software Delivery
This is why we started ReleaseIQ, to bridge the gap between these areas and solve real world problems for product teams.
ReleaseIQ’s intelligent, people-centric software delivery platform is designed to do exactly that. Our platform unifies CI/CD with observability so that developers and testers can fix build, deploy, and test failures quickly, which results in higher productivity and release efficiency.
ReleaseIQ’s key capabilities include:
- Build Pipelines Quickly with No Code, Drag and Drop Pipeline Orchestrator – Allows both developers and DevOps engineers to rapidly orchestrate CI/CD pipelines with customized workflows and processes by seamlessly integrating with their SDLC tools.
- Maintain Pipeline Integrity with Intelligent Root Cause Analysis – Integrated AI-driven troubleshooting capability brings insights into every step of the release process for build, deploy, and test failures. Leverage root cause traceability to improve engineering efficiency and productivity.
- Ensure Pipeline Operation with End-to-end Process Visibility – Provides end-to-end visibility into the release status of each commit and build that goes into production by foreseeing potential roadblocks and helping teams take timely action to avoid delays in software releases.
- Get the Right Information to the Right Team Members with Role-based Productivity Dashboards – Role-based productivity dashboards enable tracking and measuring performance across teams. Improve organizational governance with enterprise-class BI visualizations.
This results in faster deployment, shorter lead time for new features, faster service restoration, and lower change failure rate.
If increasing release velocity with no compromise on quality sounds interesting, I invite you to sign up for a demo so that the team can show you the future of software delivery and release management. You can also learn more about ReleaseIQ here.