On my current project we have started working on a continuous delivery process. We started moving in that direction as a way to improve and extend our development feedback loop, but really ended up with a way to deliver better software and to deliver it faster and more often with less risk.
If you haven’t had the opportunity to read through the Continuous Delivery book by Jez Humble and David Farley, I recommend making the time to do so. We have been proponents of continuous integration (CI) for a long time, and when we started reading the Continuous Delivery book, a lot of the suggestions just seemed like the next logical steps. Many of the pieces of the process were things we had always seen as future improvements but had never gotten around to.
Our Release Process
We use an agile development process, but that only extends so far. Within each of our two-week iterations we develop code to fulfill user stories, test those stories, and package them for release. Once we release a package, a separate team installs it on their systems and then runs smoke tests, functional tests, performance tests, and security scans before scheduling a push to production.
This post-release, pre-installation cycle often takes a few weeks, so we tended to push to production once per month or less. That meant at least a two-iteration lag between releasing the software from development and installing the software to production. Since the goal of agile development is to have releasable software at the end of each iteration, we felt it was important to extend the agile process by bringing many of the post-release tasks into the iteration.
Functional tests were easy to bring into the development cycle. The team developed a functional test suite that covered our application as much as practical. We test on multiple browsers using roles with different privilege levels, which gives us high confidence that the system is performing as expected.
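To make the "multiple browsers, multiple roles" idea concrete, here is a minimal sketch of how such a test matrix could be enumerated. This is not our actual suite; the browser and role names are hypothetical placeholders, and a real suite would hand each pair to a browser automation tool instead of printing it.

```python
from itertools import product

# Hypothetical browser and role names, for illustration only.
BROWSERS = ["firefox", "chrome"]
ROLES = ["admin", "editor", "viewer"]


def build_test_matrix(browsers, roles):
    """Return every (browser, role) pair the functional suite should cover."""
    return list(product(browsers, roles))


matrix = build_test_matrix(BROWSERS, ROLES)
for browser, role in matrix:
    # A real runner would launch the suite here; this sketch only lists the runs.
    print(f"run functional suite: browser={browser} role={role}")
```

Each added browser or role multiplies the number of runs, which is part of why a full pass can take hours.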
Our current test suite takes about five hours to execute, so we run it overnight via the CI engine. Each afternoon we simply need to make sure that the latest and greatest version of our software is installed and configured on the test system.
The only way to do that reliably was to make sure we had an automated installation. We actually thought our installation process was pretty well automated, right up until we tried to make it 100% automated. After all, it was just one step to install an RPM. And of course the checks to make sure all the prerequisite packages were on the system. And a little bit of configuration after the installation. And we also had to remember to restart the servers after the configuration was done. Simple really.
By using Puppet, we boiled that down to a true automated install starting from a base operating system. Now, the CI system can deploy our software automatically so that it is ready for the test run later that evening.
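The idea behind that kind of install is that each step checks the current state and only acts when something is missing, so running it twice is harmless. Below is a minimal Python sketch of that check-then-apply pattern. It is not Puppet and not our real manifests; the package names, the `Step` class, and the in-memory `state` dictionary are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    is_done: Callable[[Dict], bool]  # does the system already satisfy this step?
    apply: Callable[[Dict], None]    # bring the system into the desired state


def converge(steps: List[Step], state: Dict) -> List[str]:
    """Run steps in order, skipping satisfied ones; return names of steps applied."""
    changed = []
    for step in steps:
        if not step.is_done(state):
            step.apply(state)
            changed.append(step.name)
    return changed


def install(pkg: str) -> Step:
    """Build a step that ensures a (hypothetical) package is present."""
    def apply(state: Dict) -> None:
        state["packages"].add(pkg)
    return Step(f"install {pkg}", lambda s: pkg in s["packages"], apply)


def configure(state: Dict) -> None:
    state["configured"] = True
    state["needs_restart"] = True  # a config change requires a restart


def restart(state: Dict) -> None:
    state["needs_restart"] = False


# Prerequisites, the application package, configuration, then the restart.
STEPS = [
    install("prereq-lib"),  # hypothetical prerequisite package
    install("our-app"),     # hypothetical application package
    Step("configure", lambda s: s["configured"], configure),
    Step("restart servers", lambda s: not s["needs_restart"], restart),
]

# A simulated base operating system with nothing installed yet.
state = {"packages": set(), "configured": False, "needs_restart": False}
first_run = converge(STEPS, state)
second_run = converge(STEPS, state)
print(first_run)
print(second_run)
```

On the simulated base system the first run applies all four steps and the second run applies none, which is the repeatability property we were after.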
With the tests running every night, we began to see our bug count go down. Regression errors are no longer creeping into our code, and unintended side effects of changes to functionality and configuration are seen right away. The quality of our code is up with very little direct effort from the developers. The tighter feedback loop just makes it easier to stay on top of the issues.
Also, our tests were written in parallel with the development. That gave us independent validation of the functionality, since the tester and developer both needed to read and interpret the user stories. Frequently, that led to discussions between them early in the process as they debated behaviors and interfaces.
Since we are running the functional test suite each night, our confidence in each change has increased. Because the installation is done automatically, we know it is 100% repeatable. And since the automated install is handling all the packages that are being brought in to the system, our configuration management is being controlled much more tightly, not to mention documented via the Puppet scripts.
All of that has led to a much higher confidence in the code and therefore less risk. There are the obvious benefits of knowing that whatever the development team delivered will work in the downstream systems, including production. That cuts down on a lot of troubleshooting and leads to a lot less finger-pointing when things don’t work. No more guessing as to how the software will be installed or what the configuration of the target system is.
With less risk came more courage. Bigger changes can be made and the effects seen right away. That means not only easier and safer refactoring for developers, but also bigger changes to the underlying system, whether system upgrades or core functionality changes.
Since the risk is lower and the development cycle is shorter, we can push to production more often. That feeds on itself quickly. The more often you push to production, the smaller the set of changes and the easier it is to fix any problems. In turn, the risk continues to go down because you’re pushing small changes to production frequently. Each production push is no longer a big event; rather, it’s a small, regular event, so you are able to do it even more frequently. Less stress, less risk.
Also, the decision to push to production can be put in the hands of the business owners. The choice is not this month or next, but rather this week or next. Business owners can decide to hold back until significant new functionality is available, or they can push out quickly when a bug is found. If they need a release to coincide with or precede an event, it is easy to accommodate them. Their decisions can be based on business need, and not on an operations-driven monthly release window.
We’ve still got a long way to go in our process. Performance testing and security testing are next on the roadmap. That will again reduce risk and give us confidence to push more often to production. We plan to drive each of those via the functional tests that are already in place, thereby reducing the ongoing maintenance. As we pull more of the post-release cycle into the iteration, our feedback loop will become even tighter, again driving down risk.