A Little History
A number of years ago, I was part of a team supporting a US Federal Government program office responsible for bringing DevOps best practices to that agency’s programs. We acted as curators of tools, advocates of best practices, and maintainers of the deployment pipelines and procedures. Through this, we ensured the successful DevOps transformation of each project. As part of that responsibility, we supported and maintained the delivery pipeline build tools: Jenkins, Nexus, SonarQube, etc.
Our team mostly adhered to agile best practices. Operating on a two week deployment schedule, we pushed changes and updates to the build tools as needed. We limited the changes to ones agreed upon and successfully tested during the iteration.
In Creeps the Fear
One time a couple of days prior to the next planned deploy, the program office informed us that they wanted to add an additional item to the list. My team was hesitant after considering the circumstances of this request. We were told:
Don’t be afraid.
Apparently, the person attempting to comfort us had already manually validated the change in a test environment. According to them, they had not faced any issues performing this manual tool upgrade, so there would be no issues when my team performed the upgrade — for the first time — during the upcoming production deployment. There were no notes to accompany this change, only the assurance that nothing could go wrong. What once a different person successfully accomplished in a different environment, another could replicate without any insurmountable hurdles faced. At least, that is what the customer insisted.
Being a manual change was bad. Having a test environment that did not completely match production was worse. Expecting those performing the production upgrade to not once have practiced it in an earlier environment was worse still.
The customer’s expectation was that during the deployment window any potential complications could be resolved. They couldn’t understand why we didn’t think that was a perfectly fine plan to follow. Though it seemed obvious to us, we were unable to sway their opinion. Thankfully, we performed the change without any problems. But that is besides the point.
How Not to Be Afraid
Surprises are bad. Surprises in production, unacceptable. It is essential to test deployments in production-like environments before ever being attempting them in production. Sometimes production requires a manual deployment. In those instances, the individuals responsible for the the deployment must first follow the same process in a testing environment.
Properly configured automation clearly eliminates many of these problems. If two similarly configured environments have the same deployment done onto them, the outcome should be the same. The expectation is that issues found and resolved during the testing process will prevent those same issues occurring during the production deployment. Further, issues not present in the testing environment should not appear during the production deployment. You should not attempt a production deployment until the process has achieved a clean result in a production-like environment.
I recognize that it can be a challenge — whether by design or policy — to perfectly replicate an environment. In some cases it isn’t possible to move the data into a less secure environment. In other cases, scrubbing the data of personal information requires impractically long lead times. Techniques such as Blue-Green deployments can help address this by allowing you to validate the deployment on an inactive production-exact environment.
A production deployment should be unexciting, predictable, and a non-event. When it is, you can stop being afraid.