In my last two posts I went through setting up CI for your PHP project. As promised, this post walks through the CD pipeline that I set up for the project. The Continuous Integration jobs were a good start, but I wanted to throw in additional testing on architecture that better mimics production. I created two additional test environments, QA and Staging, and finally the code was able to proceed to Production.
[Architecture diagram]

QA

I created a QA cycle that runs every 6 hours. This cycle spins up a new machine to test on, executes the tests, then terminates the machine upon completion. Similar tools and structure are used as in the DevInt environment, but tweaked to expand the coverage. The below sections outline each of the Jenkins jobs set up for this testing cycle.
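
The 6-hour cadence is just a periodic Jenkins build trigger; a cron spec along the lines of the below in the job's 'Build periodically' field does the trick (the exact schedule string here is illustrative, not pulled from the project's configuration).

H */6 * * *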

Deploy To QA

The first thing that we do upon kicking off a QA cycle is to spin up a new machine. When we were testing in DevInt, I designed the machine to run everything, acting as both the app server and the database; for QA, those roles are split onto separate machines. This involved not only spinning up the app server on EC2 using the same knife ec2 server script we used in DevInt, but also spinning up a database on RDS. The command is similarly simple, as seen below:

aws rds create-db-instance \
--db-instance-identifier $machine_name \
--vpc-security-group-ids sg-7979be00 \
--no-auto-minor-version-upgrade \
--db-instance-class $size \
--engine MySQL \
--engine-version 5.5.42 \
--storage-encrypted \
--license-model general-public-license \
--no-multi-az \
--db-name $db_name \
--port 3306 \
--master-username $rds_master_user \
--master-user-password $rds_master_password \
--allocated-storage $drive_size \
--db-subnet-group-name default-vpc-118d1a75 \
--no-publicly-accessible

I actually create the RDS instance before I spin up the EC2 machine, as the database is slower to spin up. That way, once the app server is up, we can check to see if the database is finished.

#see if our db has connection information yet
endpoint=`aws rds describe-db-instances --db-instance-identifier $machine_name | grep ENDPOINT | cut -f2`
while [ -z "$endpoint" ]; do
sleep 10
endpoint=`aws rds describe-db-instances --db-instance-identifier $machine_name | grep ENDPOINT | cut -f2`
done
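
The app server itself is launched with the same knife ec2 server script we used in DevInt. For reference, a rough sketch of such a command is below; the AMI, flavor, subnet, security group, and key values are placeholders, and the exact flag names vary a bit between knife-ec2 versions.

knife ec2 server create \
--config ~/.chef/knife.rb \
--image ami-xxxxxxxx \
--flavor $size \
--node-name $machine_name \
--run-list 'role[app-server]' \
--subnet subnet-xxxxxxxx \
--security-group-ids sg-xxxxxxxx \
--ssh-user ubuntu \
--identity-file ~/.ssh/[key-name].pem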

Unlike the DevInt machine, which installed the application from an image in Jenkins, this job installs the latest package from the snapshots repository in Nexus. We don't care exactly which version we are installing; we just want the latest one in that repository, as we have confidence in that build because it passed all of the tests in DevInt. I determined the latest version of the app with the below script.

app_version=`find /var/lib/nexus/storage/snapshots/[app-name]/ -type f \
-name \*.tgz -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d' ' | cut -f8 -d'/'`
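
From there, the package gets pushed to the new app server and unpacked. A rough stand-in for that step might look like the below; the remote user, key, and target paths are placeholders rather than the project's actual values.

#push the latest snapshot to the new app server and unpack it (illustrative only)
scp -i ~/.ssh/[key-name].pem \
/var/lib/nexus/storage/snapshots/[app-name]/$app_version/*.tgz \
ubuntu@${private_ip}:/tmp/app.tgz
ssh -i ~/.ssh/[key-name].pem ubuntu@${private_ip} \
"sudo tar -xzf /tmp/app.tgz -C /var/www/[app-name]"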

In order to tell the application that the database is on a separate machine, our environment is configured via chef, and users are also set up in the database to ensure that only connections from the app server are allowed.
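
Conceptually, that database-user restriction boils down to something like the following; the app_user name and the password placeholder are illustrative only.

#limit the application user to connections from the app server's private IP
mysql -h "$endpoint" -u "$rds_master_user" -p"$rds_master_password" <<SQL
CREATE USER 'app_user'@'${private_ip}' IDENTIFIED BY '[app-db-password]';
GRANT SELECT, INSERT, UPDATE, DELETE ON \`${db_name}\`.* TO 'app_user'@'${private_ip}';
FLUSH PRIVILEGES;
SQL
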
Finally, I used route53 to set up an internal DNS mapping so that this QA machine could easily be accessed by a static name if needed.

#tie our application to an IP using R53
echo ""
echo ""
echo "Waiting for DNS update to complete."
changeid=`aws route53 change-resource-record-sets \
--hosted-zone-id [ZONE ID] \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "qa.[DOMAIN]",
"Type": "A",
"TTL": 60,
"ResourceRecords": [{
"Value": "'${private_ip}'"
}]
}
}]
}' | cut -f2`
status=`aws route53 get-change --id $changeid | cut -f3`
while [ "$status" != "INSYNC" ]; do
sleep 10
status=`aws route53 get-change --id $changeid | cut -f3`
done

Smoke Test On QA

The smoke tests are run the same way on the QA machine as they were on the DevInt machine. This again ensures that the application was installed successfully, and since they complete in under 30 seconds, there is little time wasted. As with DevInt, if any tests fail, we jump immediately to the Destroy QA job; if all tests pass, we move on to acceptance testing.

Acceptance Test On QA

The acceptance tests are run the same way on the QA machine as they were on the DevInt machine. All of our Selenium tests that have the 'Smoke' tag are executed. To do this, I wrote an xml file, shown below, that is read in by the testing framework.

<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">

<suite name="Selenium Smoke Test Suite" parallel="methods" thread-count="20" verbose="3">
    <test name="Selenium Smoke">
        <groups>
            <run>
                <include name="Smoke.*"/>
                <exclude name="Regression"/>
            </run>
        </groups>
        <packages>
            <package name="tests.selenium" />
        </packages>
    </test>
</suite>
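
For reference, this suite gets kicked off the same way as the regression suite shown further down, just pointed at the smoke xml (the smoke.xml filename below is illustrative).

ant clean run replace junit-report -k -DappURL=https://${Private_IP} -Dtest-suite=smoke.xml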

As with DevInt, if any tests fail, we jump immediately to the Destroy QA job; if all tests pass, we move on to regression testing.

Regression Test On QA

This is the first new job added to the pipeline, exercising functionality of the application beyond what DevInt covered. Instead of only running the Selenium tests that are labeled 'Smoke', we run all of the Selenium tests. Additionally, I configured Selenium to run the tests through a proxy. By doing this, we could scan every request and response made during our tests, to security test our application even more thoroughly. OWASP ZAP was again used, but this time it was set up and run as a proxy; once launched, ZAP simply stays open, proxying traffic. A script similar to the one we ran for the quick scan in DevInt was used.

#!/bin/bash

#cleanup old results
rm -rf zapScan.log zapReport.jsp

#launch the proxy
echo ""
echo ""
/opt/zap/zap.sh -cmd -newsession -quickout $WORKSPACE/zapScan.xml -port 9090

Finally, we needed to configure our Selenium tests to run through this proxy. To do this, I simply needed to add another two parameters to the execution of the SecureCI Testing Framework tests, providing the proxy information.

-DproxyHost=localhost -DproxyPort=9090

So the whole command looks like:

ant clean run replace junit-report -k -DappURL=https://${Private_IP} -Dtest-suite=regression.xml -DproxyHost=localhost -DproxyPort=9090

While the SecureCI Testing Framework is set up to handle running tests through a proxy, a generic Selenium configuration needs to be modified if you are not using this framework. These modifications are shown below for Java.

// Java example of routing Selenium browser traffic through the local ZAP proxy
import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.CapabilityType;
import org.openqa.selenium.remote.DesiredCapabilities;

String PROXY = "[proxyHost]:[proxyPort]";

// send both HTTP and HTTPS traffic through the proxy
Proxy proxy = new Proxy();
proxy.setHttpProxy(PROXY);
proxy.setSslProxy(PROXY);
DesiredCapabilities cap = new DesiredCapabilities();
cap.setCapability(CapabilityType.PROXY, proxy);

WebDriver driver = new [browser]Driver(cap);

As with all of our previous jobs in QA, if any tests fail we jump to Destroy QA, and if all tests pass we move forward to our full security scan.

Full Security Scan On QA

For our last set of tests on QA, we wanted to run a more in-depth security scan. In DevInt, we did a cursory scan of the system without being authenticated. For the scan on QA, we wanted to run through the entire system fully authenticated. To accomplish this, I manually created a ZAP 'context' which had credentials stored in it, and saved that off. Instead of running from the command line like we did with the quick scan, I ended up needing to use the Jenkins ZAP plugin in order to pass in the context, as ZAP doesn't currently support authentication from the command line. Setting this up was very straightforward, and fit nicely into our pipeline. Because a variable couldn't be passed into the URL, I used the Route 53 domain that I had set up previously.
If this scan passed, then we ran the next job of promoting our application as a release candidate, and either way, we destroyed the QA machine.

Destroy QA

This job runs any time any tests fail, without waiting to go through the rest of the pipeline. Because we have both an app server and a database machine, I needed two commands to ensure everything gets shut down.

#!/bin/bash

knife ec2 server delete ${Instance_ID} \
--config ~/.chef/knife.rb \
--purge \
--node-name ${Tag_Name} \
--yes

aws rds delete-db-instance \
--db-instance-identifier ${Tag_Name} \
--skip-final-snapshot

Promote As Candidate Release

Finally, if all of our tests pass, we consider this package viable as a release candidate. This is the same package that passed through all of our DevInt tests, only this round of testing took closer to two hours. I created a custom Nexus repository called 'releasecandidates', which this package is then pushed into. In order to save space on our SecureCI box, instead of copying the same files over, I actually just created a soft link from the 'snapshots' repo into the 'releasecandidates' repo.

#!/bin/bash

if [ -z "${Snapshot}" ]; then
    echo "No snapshot provided, build failing"
    exit 1
fi

if [ ! -L /var/lib/nexus/storage/releasecandidate/[app-name]/${Snapshot} ]; then
    ln -s /var/lib/nexus/storage/snapshots/[app-name]/${Snapshot} /var/lib/nexus/storage/releasecandidate/[app-name]/
fi

And that wraps up our QA environment.

Staging

Staging runs very similarly to QA, with just a few more differences. We attempt to move closer to our production-like setup in our deployment, and our tests take a bit longer. Because of this, our Staging cycle only kicks off every night. If we want to execute these tests sooner, it's a simple task to manually trigger the job. The below sections outline each of the Jenkins jobs set up for this testing cycle.
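
As with QA, the nightly kickoff is just a periodic build trigger; something like the below in the 'Build periodically' field runs it once a night (the exact hour shown is illustrative).

H 2 * * *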

Deploy To Staging

The first thing that we do upon kicking off a Staging cycle is to spin up a new environment to test on. To better replicate production, we wanted two app servers and two databases. Again, I decided to spin up the database first, but instead of running MySQL for our database, I decided to run Aurora. From AWS: “Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora provides up to five times better performance than MySQL with the security, availability, and reliability of a commercial database at one tenth the cost.” So, I ran the RDS CLI create-db-cluster command, instead of create-db-instance.

aws rds create-db-cluster \
--db-cluster-identifier $machine_name \
--vpc-security-group-ids sg-7979be00 \
--engine Aurora \
--engine-version 5.5.42 \
--storage-encrypted \
--database-name $db_name \
--port 3306 \
--master-username $rds_master_user \
--master-user-password $rds_master_password \
--db-subnet-group-name default-vpc-118d1a75

Additionally, instead of spinning up just one app server, I spun up two machines to better replicate production. I was able to do this using the exact same commands as above (see Deploy To QA), and the application was installed on each machine. The application installed was the latest one in the 'releasecandidates' Nexus repository. I then wanted to spin up a loadbalancer to sit in front of these app servers, which I did using the below command:

aws elb create-load-balancer \
--load-balancer-name $machine_name \
--listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
"Protocol=HTTPS,LoadBalancerPort=443,InstanceProtocol=HTTPS,InstancePort=443,SSLCertificateId=arn:aws:iam::[aws-id]:[cert location]" \
--scheme internal \
--subnets subnet-25b098e \
--security-groups sg-7979be00

Next, I used similar code from the QA Deploy job to link the loadbalancer to an internal DNS name, 'staging.[DOMAIN]'. I then needed to register the EC2 instances with the loadbalancer, so I ran this simple command for each EC2 machine:

aws elb register-instances-with-load-balancer \
--load-balancer-name $machine_name \
--instances $instance_id
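
One note on that DNS step: the loadbalancer exposes a DNS name rather than a fixed IP, so the staging record points at that name instead of an instance address. A sketch of how that might look is below; the --query usage and the CNAME record type are illustrative choices, and the same INSYNC polling loop from the QA deploy would follow.

#grab the loadbalancer's DNS name and point staging.[DOMAIN] at it
elb_dns=`aws elb describe-load-balancers --load-balancer-names $machine_name \
--query 'LoadBalancerDescriptions[0].DNSName' --output text`
changeid=`aws route53 change-resource-record-sets \
--hosted-zone-id [ZONE ID] \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "staging.[DOMAIN]",
"Type": "CNAME",
"TTL": 60,
"ResourceRecords": [{
"Value": "'${elb_dns}'"
}]
}
}]
}' | cut -f2`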

Finally, I needed to configure my EC2 machines to point to my database cluster, which I could do the same way as I did on QA.
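
The main difference is where the endpoint comes from. A sketch of grabbing the Aurora cluster endpoint, mirroring the polling loop from the QA deploy, might look like the below (the --query expression is shorthand in place of the earlier grep).

#wait for the Aurora cluster endpoint to become available
endpoint=`aws rds describe-db-clusters --db-cluster-identifier $machine_name \
--query 'DBClusters[0].Endpoint' --output text`
while [ -z "$endpoint" ] || [ "$endpoint" = "None" ]; do
sleep 10
endpoint=`aws rds describe-db-clusters --db-cluster-identifier $machine_name \
--query 'DBClusters[0].Endpoint' --output text`
done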

Smoke Test On Staging

This runs the same as it did on QA.

Acceptance Test On Staging

This runs the same as it did on QA.

Load Test On Staging

This job runs pre-recorded JMeter tests, recorded using the JMeter GUI. Multiple scenarios were recorded, many of which mimic the Selenium smoke tests. These tests are also stored in the source code repository (under their own folder in test), to ensure the tests stay in sync with the code. The jmx files start off testing with 300 users, ramping up by adding a user every half second. Subsequent tests go up to 1,000 simultaneous users. These jmx files are relatively easy to change to increase the number of users, but for our initial release, these numbers covered the number of users the system needed to support. To execute these tests, the below command was run:

jmeter -n -t tests/jmeter/testLogin.jmx -l testLogin.jtl

The results are automatically scanned for 500 errors and timeouts, and response times are checked to be under 1 second. If nothing is found, the tests are determined to have passed. Similar to our other testing jobs, if any tests fail we jump to our Destroy Staging job, and if all tests pass we move forward to stress testing.
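
A minimal sketch of that results check might look like the below, assuming the default CSV .jtl output (elapsed time in column 2, response code in column 4).

#!/bin/bash

#fail the build on any 5xx response or any request slower than 1 second
bad=`awk -F, 'NR>1 && ($4 ~ /^5/ || $2+0 > 1000)' testLogin.jtl | wc -l`
if [ "$bad" -gt 0 ]; then
    echo "$bad requests returned a 5xx or exceeded 1 second"
    exit 1
fi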

Stress Test On Staging

While the load testing was just to ensure the application could handle the desired number of users, a subsequent performance test is run to try to push the application to its limit. LoadUI is used in conjunction with SoapUI to push our application as far as it will go. As this is a much more complicated task, stay tuned for a future post on how exactly to set this up, and how the results are parsed and checked for passes and failures.
Big surprise: if any tests fail we jump to our Destroy Staging job, and if all tests pass we move on to our last testing phase, penetration testing.

Penetration Test On Staging

The last test we run on Staging is a penetration test, further testing the security of the overall system. I decided to use w3af for our tooling. Similar to the stress test, this is a much more complicated task, so stay tuned for a future post on how exactly to set this up, and how the results are parsed and checked for passes and failures.
Finally, once this test run is complete, we destroy the Staging environment. If the penetration tests pass, then the Promote As Release Jenkins job is also run.

Destroy Staging

This job runs any time any tests fail, without waiting to go through the rest of the pipeline. We again needed to tweak our scripts to kill the machines, due to the database cluster that was launched and the loadbalancer sitting in front of the EC2 instances.

#!/bin/bash

while read Instance_ID; do
knife ec2 server delete ${Instance_ID} \
--config ~/.chef/knife.rb \
--purge \
--node-name ${Tag_Name} \
--yes
done < instances.txt

aws rds delete-db-cluster \
--db-cluster-identifier ${Tag_Name} \
--skip-final-snapshot

aws elb delete-load-balancer \
--load-balancer-name ${Tag_Name}

Promote As Release

This runs the same as it did for QA, except instead of pushing into the 'releasecandidates' repository, it pushes into the 'releases' repository.
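
In other words, it is the same soft-link trick, just one repository further along; a sketch, assuming the 'releases' repo follows the same on-disk layout as the release candidate script above:

if [ ! -L /var/lib/nexus/storage/releases/[app-name]/${Snapshot} ]; then
    ln -s /var/lib/nexus/storage/releasecandidate/[app-name]/${Snapshot} /var/lib/nexus/storage/releases/[app-name]/
fi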

And that wraps up our Staging environment.

Production

Our production environment again changes a bit, expanding to be more dynamic for a larger user base. This environment has a sharded database system, with multiple app servers sitting behind a loadbalancer. Because this is production, there is only one job, the deployment of the application, and while it runs fairly similarly to Staging, there are again a few differences.

Deploy To Production

Our production environment is similar to Staging in that we launch a loadbalancer and EC2 instances. The latest package in the 'releases' Nexus repository is installed, using the same commands as above. This time, however, we don't launch a new RDS cluster. Because we want to keep using the same data (we are going to production), the old RDS cluster is used. For this reason, we have Jenkins archive the database connection information. The section of the script that handles this is below, checking whether an RDS cluster already exists.

#check to see if some of these have already been set; if not, we need to create a new db
if [ ! -f ./db.vars ] || [ "$New_Database" = true ]; then
    old_db_name='a-bad-db-name'
    #load our old db name from the archived connection info
    if [ -f ./db.vars ]; then
        . ./db.vars
        old_db_name=$db_name
        rm ./db.vars
    fi

    #setup our new db vars
    db_name="Production-$BUILD_NUMBER"
    rds_master_password=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
    db_password=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
    app_key=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)

    #save them
    echo "db_name=\"$db_name\"" >> ./db.vars
    echo "rds_master_password=\"$rds_master_password\"" >> ./db.vars
    echo "db_password=\"$db_password\"" >> ./db.vars
    echo "app_key=\"$app_key\"" >> ./db.vars

    echo ""
    echo ""
    echo "New DB Vars"
    cat ./db.vars
    echo ""
    echo ""

    New_Database=true
fi

We then have Jenkins archive the artifact as a post build action.
As before, once everything is up and running and connected properly, we use Route 53 to update the DNS. Once the Route 53 change status reaches INSYNC, we then kill our old production instances. This effectively performs a blue-green deployment. This process runs once a week, but can also be manually kicked off to speed up the promotion process.
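
A rough sketch of that cut-over is below, reusing the Route 53 pattern from the QA deploy; the record name, the $elb_dns variable, and the old_instances.txt file are placeholders rather than the actual production script.

#repoint production DNS at the new loadbalancer
changeid=`aws route53 change-resource-record-sets \
--hosted-zone-id [ZONE ID] \
--change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "www.[DOMAIN]",
"Type": "CNAME",
"TTL": 60,
"ResourceRecords": [{
"Value": "'${elb_dns}'"
}]
}
}]
}' | cut -f2`

status=`aws route53 get-change --id $changeid | cut -f3`
while [ "$status" != "INSYNC" ]; do
sleep 10
status=`aws route53 get-change --id $changeid | cut -f3`
done

#retire the old (blue) app servers; chef node cleanup omitted here
while read Instance_ID; do
knife ec2 server delete ${Instance_ID} --config ~/.chef/knife.rb --yes
done < old_instances.txt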

And that is it for our entire DevOps pipeline deployment. While you may follow different architectural models, use different tools, and follow different processes, hopefully this outline of the pipeline will serve to help you get your application automatically tested and deployed, repeatably and in a timely fashion.
