In my previous blog post about server provisioning automation, I described our pre-provisioning phase and how we automated the out-of-band interface configuration to make machines remotely accessible and ready to kickstart. In this installment, I go through the provisioning phase and show how we use Puppet to automatically kickstart physical machines.
The Challenge
In the “old way” of doing things, when a server needed to be built, the Datacenter Managers filled out a Web form. They’d enter the server’s basic information (its name, IP addresses, netmask, default gateway, type of server, and the OU to which it belonged) and pick a disk layout. Submitting the form built the PXE configuration for the server. They’d then go to the server console (physical or remote), PXE boot the host, and choose the host’s name from the kickstart menu, at which point the host would be kickstarted.
This works well for single hosts and one-offs, but it obviously doesn’t scale. This became apparent two summers ago, when we built a new remote datacenter and were faced with building several hundred hosts from scratch. At the time, we wrote a bunch of ad-hoc scripts to help with the task: one extracted relevant information from the inventory system, another used the serial numbers obtained to connect to each iDRAC and get the host interface’s MAC address, and a third cross-referenced it all and built PXE boot and kickstart configurations for each server. Assembling the data in a usable format was the most time-consuming challenge; with that in place, it was a simple matter of using IPMI to set all servers to PXE boot once and trigger a reset.
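The cross-referencing step can be pictured as a join on the serial number. The following is only a sketch of the idea, not the scripts we actually wrote: the function name and both CSV layouts (serial,hostname,ip and serial,mac) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical input formats (assumptions for this sketch):
#   inventory file: serial,hostname,ip
#   MAC file:       serial,mac
# Prints serial,hostname,ip,mac for every serial present in both files.
cross_reference() {
    inv_sorted=$(mktemp) || return 1
    mac_sorted=$(mktemp) || return 1
    # join(1) needs both inputs sorted on the join field, using the
    # same collation, so the locale is pinned for sort and join alike.
    LC_ALL=C sort -t, -k1,1 "$1" > "$inv_sorted"
    LC_ALL=C sort -t, -k1,1 "$2" > "$mac_sorted"
    LC_ALL=C join -t, "$inv_sorted" "$mac_sorted"
    rc=$?
    rm -f "$inv_sorted" "$mac_sorted"
    return "$rc"
}
```

Hosts that appear in only one of the two files are silently dropped by the join, so in practice it would be worth comparing line counts before building any configuration from the result.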
With this in mind, we started work on operationalizing the above process and packaging it all up so that our Datacenter Managers could use it. What we came up with uses Puppet both as the source of record for host information and as the trigger for kickstarts. Infrastructure provisioning became “code” in the form of Puppet resources checked into source control.
How Puppet Kickstarts
In comparison to the old Web-form-driven process, the new Puppet-driven kickstarts really come in handy when a request to provision a large number of machines comes in. For example, our Datacenter Managers no longer cringe (as much) when they need to build a new 72-node Cassandra cluster. They’ll allocate suitable hardware from the inventory system, noting each machine’s serial number and the switch ports it is connected to. Next they’ll allocate IP addresses from our IP provisioning system, and finally they’ll define the appropriate server type and the VLAN the machines belong to. They’ll feed this data through a couple of scripts: one generates DNS records; the other outputs Puppet resources that they can check in to Git and thus trigger kickstarts.
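To give a feel for the second script, here is a minimal sketch of a generator that turns one CSV record into a kickstart::image resource. It is not our actual script: the function name, the CSV field order, and the way dracurl is derived from the serial number are all assumptions made for illustration.

```shell
#!/bin/sh
# Emit one kickstart::image resource from a single CSV record.
# Hypothetical field order (an assumption for this sketch):
#   hostname,mac,builddate,ip,netmask,gateway,serial,switchport
emit_resource() {
    printf '%s\n' "$1" | {
        IFS=, read -r host mac builddate ip mask gw serial port
        printf "kickstart::image { '%s':\n" "$host"
        printf '  mac           => "%s",\n' "$mac"
        printf '  builddate     => "%s",\n' "$builddate"
        printf '  clientip      => "%s",\n' "$ip"
        printf '  clientmask    => "%s",\n' "$mask"
        printf '  clientgateway => "%s",\n' "$gw"
        # Assumed naming scheme: iDRAC hostname built from the serial.
        printf '  dracurl       => "idrac-%s.DRACDOMAIN",\n' "$serial"
        printf '  netlocal      => ["%s"],\n' "$port"
        printf '}\n'
    }
}

# Usage: one resource per input line, e.g.
#   while IFS= read -r line; do emit_resource "$line"; done < hosts.csv
```

Because the generated text is what gets checked in to Git, the output can be reviewed like any other change before it triggers a kickstart.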
Here’s an example of what each host’s definition looks like:
    Kickstart::Image {
      pxeosversion => "centos6.3-64",
      puppetmaster => "p5-puppet1.DOMAIN",
      ns           => ["10.60.92.152","10.60.92.151"],
      ou           => "Linux/P5/DBA",
      link         => "link",
      type         => "cassandra-dse",
    }

    kickstart::image { 'p5-qacass725':
      mac           => "01-08-3b-cb-6d-1e-ed",
      builddate     => "09262012",
      clientip      => "10.60.92.29",
      clientmask    => "255.255.255.192",
      clientgateway => "10.60.92.62",
      dracurl       => "idrac-SERIAL.DRACDOMAIN",
      netlocal      => ["p5-r7-rsw16 ge-1/0/18"],
    }
The above uses the kickstart::image definition to populate the kickstart configuration file and several helper files on the deploy servers. Here’s a simplified version of kickstart::image:
    # Definition that populates the kickstart file and
    # helper files that allow for unattended kickstarting.
    define kickstart::image (
      $clientip,
      $clientmask,
      $clientgateway,
      $ns,                  # DNS resolvers
      $pxeosversion,        # OS version, also the PXE image directory
      $mac,                 # dash-separated MAC prefixed by 01-
      $dracurl,             # iDRAC URL
      $netlocal,            # switchport info for future use
      $puppetmaster,        # Puppetmaster fqdn
      $ensure='file',       # or 'absent'
      $type='app',          # app, mysql, cassandra, mail, etc.
      $link='link',         # ethX, emX, or link
      $ou='Linux/QA/App',   # OU to put host into
      $builddate='000000000'
    ) {

      # Where to put files
      $pxecfgdir='/tftpboot/path/to/pxelinux.cfg'
      $ipmibin='/path/to/kickstart/ipmibin'
      $ksdir='/path/to/kickstart/cfg-files'

      # File resource defaults: every generated file must be
      # in place before the exec below is allowed to run.
      File {
        ensure => $ensure,
        owner  => 'root',
        group  => 'root',
        mode   => '0644',
        before => Exec["execute_${name}-${dracurl}"],
      }

      # The kickstart file
      file { "${ksdir}/${name}_ks.cfg":
        content => template("${module_name}/${type}_ks.cfg.erb"),
      }

      # Pointer file linking the host's MAC address to its
      # intended kickstart file
      file { "${pxecfgdir}/${mac}":
        content => template("${module_name}/pxeks.erb"),
        require => File["${ksdir}/${name}_ks.cfg"],
      }

      # Helper script: sets the next boot mode to PXE
      # and reboots the host
      file { "${ipmibin}/${name}-${dracurl}.sh":
        mode    => '0700',
        content => template("${module_name}/drackick.sh.erb"),
      }

      # Execute above shell script, but only when it has changed
      exec { "execute_${name}-${dracurl}":
        command     => "${ipmibin}/${name}-${dracurl}.sh",
        subscribe   => File["${ipmibin}/${name}-${dracurl}.sh"],
        refreshonly => true,
      }
    }
The key helper file above is the shell script that Puppet executes last to start the unattended kickstart process. Here’s its template:
    #!/bin/bash
    # Only act on the day the host is scheduled to be built.
    TODAY=$(date +%m%d%Y)
    DRACP=$(/path/to/dracp/retrieval)
    if [ "${TODAY}" = "<%= builddate %>" ]
    then
      # Boot from PXE on the next boot, then reset the host.
      ipmitool -I lanplus -U root -P "${DRACP}" -H <%= dracurl %> chassis bootdev pxe
      ipmitool -I lanplus -U root -P "${DRACP}" -H <%= dracurl %> power reset
    fi
The other key file is the pointer file that associates the host’s MAC address with its intended kickstart file:
    default bootmeup
    prompt 0
    label bootmeup
      kernel <%= pxeosversion %>/vmlinuz
      append initrd=<%= pxeosversion %>/initrd.img ramdisk_size=9689 ks=http://<%= ipaddress %>/path/to/kickstart/cfg-files/<%= name %>_ks.cfg ksdevice=<%= link %>
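The pointer file’s name matters: PXELINUX looks for a configuration file named after the client’s MAC address, written in lowercase hex with dashes and prefixed by 01 (the ARP hardware type for Ethernet). That is why the mac parameter is passed in that form. If the MAC is collected in the more common colon-separated notation, a small conversion along these lines would do; the function name is hypothetical and not part of our tooling.

```shell
#!/bin/sh
# Convert a colon-separated MAC address into the file name PXELINUX
# looks for: "01-" followed by the MAC in lowercase, dash-separated.
mac_to_pxe_name() {
    # tr maps A-F to a-f and ':' to '-' in a single pass.
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Usage:
#   mac_to_pxe_name 08:3B:CB:6D:1E:ED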
The deploy server runs Puppet in daemon mode, so within 30 minutes (or sooner if the Datacenter Managers force a run) Puppet applies the configuration to it. Puppet builds the PXE boot and kickstart configuration files, and then executes the IPMI-based commands that set servers to PXE boot and reset them.
Safeguards and Quality Control
To prevent accidental (re)kickstarts of servers, there are safeguards built on Puppet’s subscribe metaparameter and the builddate parameter of kickstart::image. A kickstart is triggered only if builddate matches today’s date AND a generated file that the exec subscribes to (the helper script, in the simplified definition above) has changed during the current Puppet run. In addition, machines are moved to other VLANs once they’re built, and PXE booting is not available in those VLANs, so even if the safeguards failed, the worst that could happen is that a machine gets rebooted.
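The date half of the safeguard is easy to check in isolation. Below is a dry-run sketch that prints the IPMI commands instead of running them; the function name and the optional third argument (which overrides today’s date for testing) are assumptions for illustration, and the password option is deliberately left out of the printed commands.

```shell
#!/bin/sh
# Dry run of the date gate: print the IPMI commands only when
# builddate (MMDDYYYY) equals today's date.
# Arguments: builddate, dracurl, and optionally a date to use
# instead of today (hypothetical, for testing only).
kick_if_due() {
    builddate=$1
    dracurl=$2
    today=${3:-$(date +%m%d%Y)}
    if [ "$today" = "$builddate" ]; then
        echo "ipmitool -I lanplus -U root -H $dracurl chassis bootdev pxe"
        echo "ipmitool -I lanplus -U root -H $dracurl power reset"
        return 0
    fi
    echo "skipping $dracurl: builddate $builddate is not today ($today)" >&2
    return 1
}
```

A resource whose builddate lies in the past therefore stays inert even if its helper script is regenerated, which is what makes it safe to leave old definitions in source control.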
Lessons Learned and Future
There are still several manual steps in this process: Datacenter Managers need to run separate scripts to provision IPs, create DNS records, and generate Puppet code. Ideally, they’d put the minimum necessary information into Puppet and the back end would fill in the missing pieces: it would reach into the inventory database, provision an IP, create DNS records, obtain the MAC address from the iDRAC, and then do the kickstart. We’re fairly far along with IP provisioning and we’re already doing scripted DNS creation; the missing link is an API to the inventory system.
However, the biggest obstacle to automating it all end-to-end remains switch port configuration. As the example above shows, we already collect that information (the netlocal parameter), and we hope that someday it can be fed to automation that makes the necessary switch port configuration changes for us.