In the previous blog post, I shared the environments we have and how we use them, described how we deploy applications using CD (Continuous Delivery), and went over the technologies we use for CD deployments.
Today, I’ll take you through the step-by-step process we use for creating our Jenkins deploy job. You will get a glimpse of the Python methods we wrote to execute many of the deployment orchestration steps. I hope this information helps you generate ideas about how you can automate your own deploys.
Jenkins jobs
Each deploy job is written to include a method for each deploy orchestration step. This allows us to create jobs that are self-documenting and easy to follow. We also have the flexibility to have a different deploy orchestration for different apps and/or environments.
Step 1: Set up the HipChat plugin
We use the HipChat notifications plugin to display the status of the build in various rooms. Enter the Project Room you want the job status posted to and select “Start Notification.”
Step 2: Restrict where the project runs
Since we use Func to run commands remotely, we must restrict the job to run on the Func master.
Step 3: Create build environment variables
With a few variables, we can use the same scripts for different application deploys in any environment.
# Colon-separated list of applications to deploy
app=app1:app2:app3
# Environment we are deploying to
env=QE1
# A page in the app to verify the app is successfully serving requests
h_page=/page1
# Used to verify the deploy automation is working properly without executing the steps
debug="false"
# Run-by user for QE automation testing
runid=CD_AUTO_DEPLOY
# QE verification tests to run
test_name=RUN-ACCEPTANCE-TEST
# The Func master for this environment
node=<func-server>
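These build parameters surface as environment variables inside the job. Our internal `ctct.override_vars_env` helper isn’t public; a minimal sketch of the behavior the deploy script below relies on (fall back to a default when the variable is unset, which is an assumption on my part) might look like:

import os

def override_vars_env(name, default):
    # Return the Jenkins build parameter (exposed to the build as an
    # environment variable) if it is set; otherwise use the default.
    value = os.getenv(name)
    return value if value is not None else default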
Step 4: Create deployment build step
For this, we add an “Execute Python Script” step. In the deploy orchestration section, we include any number of steps in the order we choose, giving us flexibility to have different orchestration steps for individual applications, as needed. Here’s the content of our deploy job:
import os, sys

# Import CTCT libraries
import healthchecks
import notifications
import common as ctct
import common_func as fc
import deploy
import jenkins
from logger import Logger

logger = Logger('jenkins_job', 'INFO').logger
deploy.debug = os.getenv("debug")

# Set variables from the Jenkins build environment
env_vars = [ 'app', 'env', 'h_page', 'debug', 'runid', 'test_name', 'ignore' ]
for var in env_vars:
    exec_str = "%s = ctct.override_vars_env(\"%s\", None)" % (var, var)
    exec exec_str

# Use 0 for a downtime deploy. Any number greater than 0 will
# deploy to servers in that group
cell=0
deploy.debug=debug

# Retrieve server list from database
svrs = ctct.get_servers(env,app,cell,ignore)
if not svrs:
    logger.critical("No servers found for app %(app)s in environment %(env)s and cell %(cell)s." % locals())
    sys.exit(1)  # nothing to deploy to; abort the job

# Func ping. Verifies Func service is responding.
fc.ping(svrs)

# Setup globals
puppet_server = ctct.p_svr(env)
splayTime = ctct.splay_time(env, svrs)

# Setup a client to func
client = fc.client_setup(env, svrs)

# Deploy orchestration
notifications.disable_nagios_notifications(env,app,cell,svrs,ignore)
deploy.stop_jboss_apps(client, svrs)
deploy.stop_httpd_service(client, svrs)
deploy.stop_puppetd_service(client, svrs)
deploy.puppetrun(client, svrs, splayTime, puppet_server)
deploy.start_puppetd_service(client, svrs)
deploy.start_jboss_apps(client, svrs)
deploy.start_httpd_service(client, svrs)
healthchecks.json_healthcheck(env,app,cell,ignore,h_page)
jenkins.run_job(test_name, params={ 'runId' : runid })
deploy.online_jboss_apps(client, svrs)
notifications.enable_nagios_notifications(env,app,cell,svrs,ignore)

logger.info("Deployment Successful for cell " + str(cell))
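The `fc` (`common_func`) calls above wrap Func’s Overlord client. Our wrappers aren’t shown in full in this post; below is a minimal sketch of what `client_setup` and `execute_func` might look like on top of Func’s client API, with error handling simplified and the env parameter unused (both assumptions):

import func.overlord.client as func_client

def client_setup(env, svrs):
    # svrs is a semicolon-separated host list, which is also the
    # multi-target format Func's Overlord accepts.
    return func_client.Overlord(svrs)

def execute_func(client, cmd):
    # command.run returns a dict keyed by minion host; each value is
    # [exit_code, stdout, stderr].
    results = client.command.run(cmd)
    for host, (code, out, err) in results.items():
        if code != 0:
            raise RuntimeError("'%s' failed on %s: %s" % (cmd, host, err))
    return results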
Step 5: Post-deploy steps
Here we added the “Log Parsing” plugin step, which color-codes log levels and makes it easier to spot failures in the deploy job.
We also added a “Groovy Postbuild” step that calls a Groovy script to capture job stats and write them to a log. We analyze these stats and look for trends (a sketch of that analysis follows the list). The script gathers data such as:
- Job Name
- Success/Failure/Aborted
- Number of Applications Deployed
- Duration
- Executed By
- Date/Time
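The stats log format and field order below are illustrative assumptions, but a few lines of Python are enough to pull a trend such as average duration per job out of such a log:

from collections import defaultdict

# Hypothetical pipe-delimited format, one build per line:
# job_name|result|app_count|duration_secs|executed_by|timestamp
durations = defaultdict(list)
with open('/var/log/jenkins/deploy_stats.log') as f:
    for line in f:
        job, result, apps, duration, user, ts = line.strip().split('|')
        durations[job].append(float(duration))

for job, values in sorted(durations.items()):
    print "%s: %d runs, avg %.1fs" % (job, len(values), sum(values) / len(values))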
The last post-deploy step is the “HipChat Notifications” plugin, which lets us post the job status in a specific development team chat.
Deployment automation
Here’s a look at some of the code behind the deploy job. All of our Python scripts give us the flexibility to change deployment orchestration as needed.
Enabling/disabling Nagios notifications
We have a script on our Nagios servers that leverages the Nagios API to enable or disable Nagios notifications or to schedule a downtime window. Our deploy automation scripts use Func to call the Nagios API on our Nagios servers.
def disable_nagios_notifications(env,app,cell,svrs,ignore):
    action="disable"
    cmd="/nagios_disable_notifications.sh "
    nagios(env,app,cell,ignore,action,cmd,svrs)

def downtime_nagios_notifications(env,app,cell,svrs,ignore,duration,user,msg):
    action="downtime"
    # "svr" is a placeholder token, replaced with each real hostname below
    cmd="/nagios_schedule_downtime.sh svr " + duration + " " + user + " " + msg
    nagios(env,app,cell,ignore,action,cmd,svrs)

def enable_nagios_notifications(env,app,cell,svrs,ignore):
    action="enable"
    cmd="/nagios_enable_notifications.sh "
    nagios(env,app,cell,ignore,action,cmd,svrs)

def nagios(env,app,cell,ignore,action,cmd,svrs):
    n_svr=ctct.n_svr(env)
    client = ctct_func.client_setup(env,n_svr)
    # Call nagios api for each individual server
    t_dir = "/some/directory"
    for svr in svrs.split(";"):
        logger.info(action + " nagios notifications on " + svr)
        svr=svr.replace("some.domain","")
        if (action=="downtime"):
            t_cmd = cmd.replace("svr",svr)
        else:
            t_cmd = cmd + svr
        ctct_func.execute_func(client,t_dir + t_cmd)
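There is also the downtime variant for longer maintenance windows; a hypothetical call (the duration, user, and message values here are made up) looks like:

# Schedule 30 minutes of Nagios downtime across the deploy targets
downtime_nagios_notifications(env, app, cell, svrs, ignore,
                              "1800", "jenkins", "CD deploy in progress")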
Stopping our applications
Before running Puppet to update our application code and config, we stop all instances of Apache and JBoss.
def _do_action_on_all_ctct_services(action, service_type='jboss'):
    return 'for a in `ls /etc/init.d/' + service_type + '-*`; do /sbin/service $(basename "$a") ' + action + '; done'

# Do action on all ctct apps
def ctct_apps(client, svrs, action, service_type='jboss'):
    cmd = _do_action_on_all_ctct_services(action, service_type)
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)

# Stop httpd service
def stop_httpd_service(client, svrs):
    logger.info(log_noop("Stopping Apache for %(svrs)s" % locals()))
    cmd = "/sbin/service httpd stop"
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)

# Stop all jboss apps
def stop_jboss_apps(client, svrs):
    logger.info(log_noop("Stopping all JBoss apps for %(svrs)s" % locals()))
    cmd = _do_action_on_all_ctct_services('stop')
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)
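For example, `_do_action_on_all_ctct_services('stop')` expands to a shell loop that stops every JBoss init script on the box:

>>> _do_action_on_all_ctct_services('stop')
'for a in `ls /etc/init.d/jboss-*`; do /sbin/service $(basename "$a") stop; done'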
Running Puppet
We use Puppet to install our application code and configuration. We store our individual war and knob files in Nexus with each environment tied to a different Nexus branch. For example, environment QE1 pulls application code from the integration branch and environment QE3 pulls application code from the integration-tested branch. Here’s the code we use for running Puppet:
# Deploy jboss apps
def puppetrun(client, svrs, splayTime, puppetServer, tags=None):
    logger.info(log_noop("Updating App(s) for %(svrs)s" % locals()))
    global debug
    pdebug = debug
    if puppetServer == 'localhost':
        if debug: debug = 'true'
    if tags:
        tags_str = '--tags ' + str(tags)
    else:
        tags_str = ''
    cmd = "/usr/sbin/puppetd -v --server=" + puppetServer + " --color=false --onetime --no-daemonize --no-usecacheonfailure --ignorecache --summarize --splay --splaylimit " + splayTime + " --no-noop " + tags_str
    log_cmd(cmd)
    debug = pdebug
    if debug == 'true': return 0
    return fc.execute_func(client,cmd)
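`splay_time` feeds `--splaylimit` so that agents don’t all hit the Puppet master at once. The real helper isn’t shown here; a minimal sketch (the per-server scaling and the five-minute cap are assumptions) might be:

def splay_time(env, svrs):
    # Spread puppet runs out as the server list grows, capped at five
    # minutes; puppetd expects the limit as a string of seconds.
    count = len(svrs.split(";"))
    return str(min(30 * count, 300))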
Checking app health post-deployment
We test the app health prior to kicking off QE verification tests. We can fail quickly by pinging an application page instead of waiting 30 or 60 minutes for QE verification tests to complete.
import json
import urllib2
from urlparse import urljoin
# ctct_url, replace_appname_in_healthcheck_url, hc_good and @retry are
# internal CTCT helpers imported elsewhere in this module.

def json_healthcheck(env,app,cell,ignore,h_page):
    errors = []
    hosts = ctct_url.get_url_list(env,app,cell,ignore).split(';')
    # Verify healthcheck against each individual server
    for host in hosts:
        if '#app#' in h_page:
            url = replace_appname_in_healthcheck_url(host, h_page)
        else:
            url = urljoin('http://' + host, h_page)
        try:
            logger.info("Checking " + url)
            check_url(url,hc_good)
        except urllib2.HTTPError, e:
            errors.append(host)
            pass
        except urllib2.URLError, e:
            logger.critical(e.reason)
    if errors:
        failed = ', '.join([a for a in errors])
        logger.critical("Healthcheck failed on %s" % (failed))

@retry(6, 2, ExceptionToCheck=(urllib2.HTTPError, urllib2.URLError)) # 6 tries, 2 sec. delay, backoff of 2
def check_url(url, src):
    con = ctct_url.get_http_body(url)
    rv = False
    for t_src in src:
        hc = json.loads(con)
        if hc[t_src[0]] == t_src[1]:
            logger.info(url + " passed.")
            rv = True
            break
    return rv
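`hc_good` holds the key/value pairs the healthcheck JSON is expected to contain. The real keys in our healthcheck pages aren’t shown in this post, so the shape below is illustrative only:

# Hypothetical expectations: the page at h_page returns JSON such as
# {"status": "OK", "db": "OK"}; check_url passes when any expected
# pair matches.
hc_good = [ ("status", "OK"), ("db", "OK") ]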
Kicking off QE verification tests
We use a Jenkins Python library to start Jenkins jobs on another server. The library reports the job status every 30 seconds and waits for the job to finish before moving on. That way, we can promote the build to another environment once the QE verification tests succeed.
# The calls below match the jenkinsapi Python library; JENKINSURL is a
# module-level default defined elsewhere.
from jenkinsapi.jenkins import Jenkins

def run_job(job, params={}, block=True, skip_if_running=True, jenkinsurl=JENKINSURL, username=None, password=None):
    jenkins_obj = Jenkins(jenkinsurl, username=username, password=password)
    job = jenkins_obj.get_job(job)
    job.invoke(None, block, skip_if_running, 3, 15, params)
    build_obj = job.get_last_build_or_none()
    if not build_obj:
        logger.warn("Could not get build object")
        return None
    is_good = build_obj.is_good()
    log_str = "%s result is %s" % (str(build_obj), build_obj.get_status())
    if is_good: logger.info(log_str)
    else: logger.critical(log_str)
    return is_good
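Because `run_job` blocks until the build finishes and returns whether it passed, a caller can gate promotion on the result. A hypothetical example (the job names here are made up):

# Run acceptance tests and promote the build only if they pass
if run_job("RUN-ACCEPTANCE-TEST", params={ "runId" : "CD_AUTO_DEPLOY" }):
    run_job("PROMOTE-BUILD-TO-QE3")
else:
    logger.critical("Acceptance tests failed; build not promoted")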
Conclusion
By improving our deployment automation with better tools and reusable code, we can support the varied deployment needs of our development teams. Here at Constant Contact, a handful of technologies work together to power our CD deployment automation. These improvements have allowed our development teams to deploy to any environment, including production, and to get bug fixes and enhancements out to our customers more quickly than ever before. As we move all of our applications to CD, the improved tool set lets us better support daily deployments.
How has the need to support CD changed the way your organization uses tools and systems? Share your experiences in the comments section below.
Comments
Interesting post. Thanks for the detail. I’m wondering how you manage the relationship between your Puppet modules, which, in addition to deploying the application artifacts maintained in Nexus, also deploy server configurations. Puppet master servers typically hold the “latest” state of all modules, which changes outside of any single application’s lifecycle. For example, an update to the Puppet JBoss module will land on the Puppet master as an event unrelated to an application deployment. How are you managing Puppet changes alongside application changes and deployments to ensure that everything (the application and the server configuration) stays in sync? And what if a rollback is required?
Thanks again for the article.
Lance
Lance, we try never to roll back because it is hard to know the state of the configuration at any given build. Instead, we made the process easy enough that, if issues come up, we can rapidly deploy a new set of changes.
We are looking into versioning our configuration and having this tie to a version of our application. Once we find something that works for us, we’ll be happy to share this with you.