Deployment Automation for Continuous Delivery, Pt. 2

In the previous blog post, I shared with you the various environments we have and how we use them. I also shared how we deploy applications using CD (Continuous Delivery) and went over the various technologies used for CD deployments.

Today, I’ll take you through the step-by-step process that we use for creating our Jenkins deploy job. You will get a glimpse at the Python methods we wrote to execute many of the deployment orchestration steps. I hope this information will help you generate ideas about how you can automate your deploys.

Jenkins jobs

Each deploy job is written to include a method for each deploy orchestration step. This allows us to create jobs that are self-documenting and easy to follow. We also have the flexibility to have a different deploy orchestration for different apps and/or environments.

Step 1: Set up the HipChat plugin

We use the HipChat notifications plugin to display the build status in various rooms. Enter the Project Room where you want the job status to appear and select “Start Notification.”

Step 2: Restrict where the project runs

Since we use Func to run commands remotely, we must restrict the job to run on the Func master.

Step 3: Create build environment variables

With a few variables, we can use the same scripts to deploy different applications in any environment.

# Colon-separated list of applications to deploy
app=app1:app2:app3

# Environment we are deploying to
env=QE1

# A page in the app to verify the app is successfully serving requests
h_page=/page1

# Used to verify the deploy automation is working properly without executing the steps
debug="false"

# Run by user for QE automation testing
runid=CD_AUTO_DEPLOY

# QE verification tests to run
test_name=RUN-ACCEPTANCE-TEST

# The func master for this environment
node=<func-server>
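As a rough illustration of how a script might consume these variables, here is a minimal sketch. The helper names `read_deploy_vars` and `split_apps` and the fallback values are assumptions made for this example; they are not the `ctct.override_vars_env` helper used in the deploy job itself.

```python
import os

# Hypothetical fallback values; the real job sets its own values in Jenkins.
DEFAULTS = {"debug": "false", "runid": "CD_AUTO_DEPLOY"}


def read_deploy_vars(names, environ=None):
    """Return a dict of deploy settings read from environment variables.

    Variables that are not set fall back to DEFAULTS, then to None.
    """
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, DEFAULTS.get(name)) for name in names}


def split_apps(app_value):
    """Split the colon-separated 'app' variable into a list of app names."""
    return [a for a in (app_value or "").split(":") if a]


# A plain dict stands in for the Jenkins build environment here.
settings = read_deploy_vars(
    ["app", "env", "debug"],
    environ={"app": "app1:app2:app3", "env": "QE1"},
)
apps = split_apps(settings["app"])
```

Passing the environment in as a dictionary keeps the helper easy to exercise outside of Jenkins.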

Step 4: Create deployment build step

For this, we add an “Execute Python Script” step. In the deploy orchestration section, we include any number of steps in the order we choose, giving us flexibility to have different orchestration steps for individual applications, as needed. Here’s the content of our deploy job:

import os, sys

# Import CTCT libraries
import healthchecks
import notifications
import common as ctct
import common_func as fc
import deploy
import jenkins

from logger import Logger
logger = Logger('jenkins_job', 'INFO').logger

deploy.debug = os.getenv("debug")

# Set variables: each name becomes a variable in this script,
# populated from the build environment
env_vars = ['app', 'env', 'h_page', 'debug', 'runid', 'test_name', 'ignore']
for var in env_vars:
    exec_str = "%s = ctct.override_vars_env(\"%s\", None)" % (var, var)
    exec(exec_str)

 

# Use 0 for a downtime deploy. Any number greater than 0 will
# deploy to servers in that group
cell = 0
deploy.debug = debug

# Retrieve server list from database
svrs = ctct.get_servers(env, app, cell, ignore)
if not svrs:
    logger.critical("No servers found for app %(app)s in environment %(env)s and cell %(cell)s." % locals())

 

# Func ping. Verifies Func service is responding.
fc.ping(svrs)

# Set up globals
puppet_server = ctct.p_svr(env)
splayTime = ctct.splay_time(env, svrs)

# Set up a client to Func
client = fc.client_setup(env, svrs)

# Deploy orchestration
notifications.disable_nagios_notifications(env, app, cell, svrs, ignore)
deploy.stop_jboss_apps(client, svrs)
deploy.stop_httpd_service(client, svrs)
deploy.stop_puppetd_service(client, svrs)
deploy.puppetrun(client, svrs, splayTime, puppet_server)
deploy.start_puppetd_service(client, svrs)
deploy.start_jboss_apps(client, svrs)
deploy.start_httpd_service(client, svrs)
healthchecks.json_healthcheck(env, app, cell, ignore, h_page)
jenkins.run_job(test_name, params={'runId': runid})
deploy.online_jboss_apps(client, svrs)
notifications.enable_nagios_notifications(env, app, cell, svrs, ignore)

logger.info("Deployment Successful for cell " + str(cell))
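The job above calls each orchestration step in sequence. One possible way to express a per-application orchestration as data is sketched below. The `run_steps` helper and the placeholder step functions are hypothetical and are not part of the libraries shown in this post.

```python
def run_steps(steps, log=print):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns the list of step names that completed successfully.
    """
    completed = []
    for name, step in steps:
        log("Running step: " + name)
        try:
            step()
        except Exception as exc:
            log("Step %s failed: %s" % (name, exc))
            break
        completed.append(name)
    return completed


# Placeholder steps standing in for the deploy.* calls in the job.
def stop_apps():
    pass


def run_puppet():
    raise RuntimeError("puppet run failed")


def start_apps():
    pass


messages = []
done = run_steps(
    [("stop_apps", stop_apps), ("run_puppet", run_puppet), ("start_apps", start_apps)],
    log=messages.append,
)
```

With this shape, a different application or environment would only need a different list of steps; whether that is preferable to a plain sequence of calls is a matter of taste.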

Step 5: Post-deploy steps

Here we added the “Log Parsing” plugin step. This color-codes log levels and makes it easier to find failures in the deploy job.

We also added a “Groovy Postbuild” step that calls a Groovy script to capture job stats and write them to a log. We analyze these stats and look for trends. The script gathers data such as:

  • Job Name
  • Success/Failure/Aborted
  • Number of Applications Deployed
  • Duration
  • Executed By
  • Date/Time
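Our script for this is written in Groovy and is not shown here. Purely to illustrate the shape of such a record, here is a small Python sketch that builds one log line from the fields above. The function name, the pipe-delimited layout, and the sample values are all assumptions.

```python
from datetime import datetime


def format_job_stats(job_name, result, app_count, duration_secs, executed_by, when):
    """Build one pipe-delimited log line from the job statistics."""
    fields = [
        when.strftime("%Y-%m-%d %H:%M:%S"),
        job_name,
        result,
        str(app_count),
        "%ds" % duration_secs,
        executed_by,
    ]
    return " | ".join(fields)


line = format_job_stats(
    job_name="deploy-QE1",   # hypothetical job name
    result="SUCCESS",
    app_count=3,
    duration_secs=412,
    executed_by="jenkins",   # hypothetical user
    when=datetime(2014, 1, 15, 9, 30, 0),  # made-up timestamp
)
```

A fixed field order like this keeps the log easy to split and aggregate when looking for trends.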

The last post-deploy step is the “HipChat Notifications” plugin, which posts the job status in a specific development team chat.

Deployment automation

Here’s a look at some of the code behind the deploy job. All of our Python scripts give us the flexibility to change deployment orchestration as needed.

Enabling/disabling Nagios notifications

We have a script on our Nagios servers that uses the Nagios API to enable or disable Nagios notifications or to schedule downtime. Our deploy automation scripts use Func to call that script on our Nagios servers.

def disable_nagios_notifications(env,app,cell,svrs,ignore):
    action="disable"
    cmd="/nagios_disable_notifications.sh "
    nagios(env,app,cell,ignore,action,cmd,svrs)

def downtime_nagios_notifications(env,app,cell,svrs,ignore,duration,user,msg):
    action="downtime"
    cmd="/nagios_schedule_downtime.sh svr " + duration + " " + user + " " + msg
    nagios(env,app,cell,ignore,action,cmd,svrs)

def enable_nagios_notifications(env,app,cell,svrs,ignore):
    action="enable"
    cmd="/nagios_enable_notifications.sh "
    nagios(env,app,cell,ignore,action,cmd,svrs)

 

def nagios(env,app,cell,ignore,action,cmd,svrs):
    n_svr=ctct.n_svr(env)
    client = ctct_func.client_setup(env,n_svr)

    # Call nagios api for each individual server
    t_dir = "/some/directory"
    for svr in svrs.split(";"):
        logger.info(action + " nagios notifications on " + svr)
        svr=svr.replace("some.domain","")
        if (action=="downtime"):
            # The downtime command carries a "svr" placeholder
            t_cmd = cmd.replace("svr",svr)
        else:
            # The other commands take the server as a trailing argument
            t_cmd = cmd + svr

        ctct_func.execute_func(client,t_dir + t_cmd)
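One thing to watch in the code above is that `cmd.replace("svr", svr)` rewrites every occurrence of "svr" in the command, including any that happen to appear in the user name or message. A possible alternative, sketched below, is a named `{svr}` placeholder filled in with `str.format`. The function name, the `.example.com` suffix, and the sample script arguments are assumptions, and the `t_dir` prefix is left out for brevity. Note that `str.format` has its own caveat: a message containing literal braces would need escaping.

```python
def build_nagios_commands(cmd_template, svrs, domain_suffix=".example.com"):
    """Return one command string per server.

    cmd_template contains the placeholder {svr}; svrs is a ';'-separated
    string of host names, as in the code above.
    """
    commands = []
    for svr in svrs.split(";"):
        if not svr:
            continue  # tolerate a trailing ';'
        if domain_suffix and svr.endswith(domain_suffix):
            svr = svr[:-len(domain_suffix)]  # Nagios knows the short name
        commands.append(cmd_template.format(svr=svr))
    return commands


cmds = build_nagios_commands(
    "/nagios_schedule_downtime.sh {svr} 30 deployer maintenance",
    "web01.example.com;web02.example.com;",
)
```

Because the function only builds strings, it can be checked without touching Func or Nagios.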

Stopping our applications

Before running Puppet to update our application code and config, we stop all instances of Apache and JBoss.

def _do_action_on_all_ctct_services(action, service_type='jboss'):
    return 'for a in `ls /etc/init.d/' + service_type + '-*`; do /sbin/service $(basename "$a") ' + action + '; done'

# Do action on all ctct apps
def ctct_apps(client, svrs, action, service_type='jboss'):
    cmd = _do_action_on_all_ctct_services(action, service_type)
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)

# Stop httpd service
def stop_httpd_service(client, svrs):
    logger.info(log_noop("Stopping Apache for %(svrs)s" % locals()))
    cmd = "/sbin/service httpd stop"
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)

# Stop all jboss apps
def stop_jboss_apps(client, svrs):
    logger.info(log_noop("Stopping all JBoss apps for %(svrs)s" % locals()))
    cmd = _do_action_on_all_ctct_services('stop')
    log_cmd(cmd)
    if debug == 'true': return
    fc.execute_func(client,cmd)
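Each function above repeats the same pattern: log the command, return early when `debug` is the string 'true', otherwise execute. That pattern could be factored into a single helper, as in the sketch below. `run_command` and the "[NOOP]" prefix are assumptions, and `execute` stands in for a call such as `fc.execute_func` already bound to a client.

```python
def run_command(cmd, execute, debug="false", log=print):
    """Log cmd, then run it through execute() unless debug is 'true'.

    Returns the result of execute(cmd), or None when the command is skipped.
    """
    prefix = "[NOOP] " if debug == "true" else ""
    log(prefix + cmd)
    if debug == "true":
        return None
    return execute(cmd)


executed = []
logged = []

# Dry run: the command is logged but never executed.
run_command("/sbin/service httpd stop", executed.append,
            debug="true", log=logged.append)

# Real run: the command is logged and handed to the executor.
run_command("/sbin/service httpd stop", executed.append,
            debug="false", log=logged.append)
```

Keeping the dry-run check in one place makes it harder for a new step to skip it by accident.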

Running Puppet

We use Puppet to install our application code and configuration. We store our individual war and knob files in Nexus with each environment tied to a different Nexus branch. For example, environment QE1 pulls application code from the integration branch and environment QE3 pulls application code from the integration-tested branch. Here’s the code we use for running Puppet:

# Deploy jboss apps
def puppetrun(client, svrs, splayTime, puppetServer, tags=None):
    logger.info(log_noop("Updating App(s) for %(svrs)s" % locals()))
    global debug
    pdebug = debug
    if puppetServer == 'localhost':
        if debug: debug = 'true'
    if tags:
        tags_str = '--tags ' + str(tags)
    else:
        tags_str = ''

    cmd = "/usr/sbin/puppetd -v --server=" + puppetServer + " --color=false --onetime --no-daemonize --no-usecacheonfailure --ignorecache --summarize --splay --splaylimit " + splayTime + " --no-noop " + tags_str
    log_cmd(cmd)
    # Restore the original debug setting before deciding whether to execute
    debug = pdebug
    if debug == 'true': return 0
    return fc.execute_func(client,cmd)
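Long command strings built by concatenation are easy to get wrong, for example by dropping a space between two flags. As an alternative sketch, the same flags can be assembled from a list and joined once. The function name and the sample server name are assumptions; the flags are the ones used in the command above.

```python
def build_puppet_command(puppet_server, splay_limit, tags=None):
    """Assemble the puppetd command line, one list item per argument."""
    args = [
        "/usr/sbin/puppetd",
        "-v",
        "--server=" + puppet_server,
        "--color=false",
        "--onetime",
        "--no-daemonize",
        "--no-usecacheonfailure",
        "--ignorecache",
        "--summarize",
        "--splay",
        "--splaylimit", str(splay_limit),
        "--no-noop",
    ]
    if tags:
        args += ["--tags", str(tags)]
    return " ".join(args)


# "puppet.example.com" is a placeholder host name.
cmd = build_puppet_command("puppet.example.com", 30, tags="jboss")
```

Converting `splay_limit` with `str()` also means the caller can pass either a number or a string.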

Checking app health post deployment

We test the app health prior to kicking off QE verification tests. We can fail quickly by pinging an application page instead of waiting 30 or 60 minutes for QE verification tests to complete.

def json_healthcheck(env,app,cell,ignore,h_page):
    errors = []
    hosts = ctct_url.get_url_list(env,app,cell,ignore).split(';')

    # Verify healthcheck against each individual server
    for host in hosts:
        if '#app#' in h_page:
            url = replace_appname_in_healthcheck_url(host, h_page)
        else:
            url = urljoin('http://' + host, h_page)

        try:
            logger.info("Checking " + url)
            check_url(url,hc_good)
        except urllib2.HTTPError as e:
            # The server answered with an error status: record the host
            errors.append(host)
        except urllib2.URLError as e:
            # The server could not be reached at all
            logger.critical(e.reason)

    if errors:
        failed = ', '.join(errors)
        logger.critical("Healthcheck failed on %s" % (failed))

 

@retry(6, 2, ExceptionToCheck=(urllib2.HTTPError, urllib2.URLError)) # 6 tries, 2 sec. delay, backoff of 2
def check_url(url, src):
    con = ctct_url.get_http_body(url)
    rv = False
    for t_src in src:
        hc = json.loads(con)
        if hc[t_src[0]] == t_src[1]:
            logger.info(url + " passed.")
            rv = True
            break
    return rv
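The `retry` decorator applied to `check_url` is not shown in this post. To illustrate the idea, here is a minimal sketch of a retry decorator with exponential backoff. Its keyword names (`exceptions`, `sleep`) are assumptions and differ from the `ExceptionToCheck` argument used above; the injectable `sleep` exists only so the example runs without real delays.

```python
import functools
import time


def retry(tries, delay, backoff=2, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function up to `tries` times.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each further failure. The last failure is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


waits = []   # records the requested sleeps instead of really sleeping
calls = []


@retry(4, 2, backoff=2, exceptions=(IOError,), sleep=waits.append)
def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    calls.append(1)
    if len(calls) < 3:
        raise IOError("not ready")
    return "ok"


result = flaky()
```

A freshly restarted application often needs a few seconds before it answers requests, which is why a short series of retries is useful before declaring the health check failed.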

Kicking off QE verification tests

We use the Jenkins Python library to start Jenkins jobs on another server. The library reports the job status every 30 seconds and waits for the jobs to finish before moving on. That way, we can promote the build to another environment when the QE verification tests succeed.

def run_job(job, params={}, block=True, skip_if_running=True, jenkinsurl=JENKINSURL, username=None, password=None):
    jenkins_obj = Jenkins(jenkinsurl, username=username, password=password)
    job = jenkins_obj.get_job(job)
    job.invoke(None, block, skip_if_running, 3, 15, params)
    build_obj = job.get_last_build_or_none()
    if not build_obj:
        logger.warn("Could not get build object")
        return None
    is_good = build_obj.is_good()
    log_str = "%s result is %s" % (str(build_obj), build_obj.get_status())
    if is_good:
        logger.info(log_str)
    else:
        logger.critical(log_str)
    return is_good
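In the code above, the waiting is handled inside the library's `invoke` call. To show the general idea of polling until a job finishes, here is a generic sketch that does not use the Jenkins library at all. The function name, its parameters, and the simulated job are assumptions.

```python
import time


def wait_for_result(get_status, interval=30, timeout=3600,
                    sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns a result or time runs out.

    get_status returns None while the job is still running.
    Raises RuntimeError if no result arrives within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status is not None:
            return status
        if clock() >= deadline:
            raise RuntimeError("job did not finish within %s seconds" % timeout)
        sleep(interval)


# Simulated job: still running for two polls, then successful.
responses = iter([None, None, "SUCCESS"])
sleeps = []
status = wait_for_result(lambda: next(responses), interval=30,
                         sleep=sleeps.append)
```

A timeout matters here because a hung test job would otherwise block the deploy job indefinitely.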

Conclusion

By improving our deployment automation with better tools and reusable code, we can support the various deployment needs of our development teams. Here at Constant Contact, we use a few different technologies that help with our deployment automation for CD. These improvements have allowed our development teams to deploy to any environment, including production, and to get bug fixes and enhancements out to our customers much more quickly than before. As we move all of our applications to CD, we can better support daily deployments thanks to the improvements to our tool set.

How has the need to support CD changed the way your organization uses its tools and systems? Share your experiences in the comments section below.

Comments

  1. Lance says:

    Interesting post. Thanks for the detail. I’m wondering how you manage the relationship between your Puppet modules that, in addition to deploying the application artifacts maintained in Nexus, are also deploying server configurations? Puppet master servers typically have the “latest” state of all modules, which typically changes outside of any single application’s lifecycle. For example, an update to the Puppet JBoss module will get deployed to the Puppet master as an event unrelated to an application deployment. How are you managing the Puppet changes along with the application changes/deployments to ensure everything (the application and server configuration) is in sync? What if a rollback is required?

    Thanks again for the article.

    Lance

    • Ron Duphily says:

      Lance, we try to never roll back because it is hard to know the state of the configuration at any given build. We tried to make the process easy, so if issues come up, we can rapidly deploy a new set of changes.

      We are looking into versioning our configuration and having this tie to a version of our application. Once we find something that works for us, we’ll be happy to share this with you.
