Virtualization for developers, Part 3
Welcome back to the third and final part of our series on creating dynamic, version-controlled virtual development environments using Vagrant, VirtualBox, and Puppet. In this post we’ll focus on the final piece of the puzzle: using Puppet itself to provision your virtual environment exactly to your specifications.
Introducing Puppet Manifests and Modules
In Part 2 of this series we showed how Vagrant and Puppet are tied together through the config.vm.provision key in Vagrant’s Vagrantfile. To refresh your memory, that looked something like this:
config.vm.provision :puppet do |puppet|
  puppet.options = "--verbose --debug"
  puppet.manifests_path = "puppet/manifests"
  puppet.module_path = "puppet/modules"
  puppet.manifest_file = "site.pp"
  puppet.facter = {
    "vagrant" => true,
  }
end
Moving on to Puppet itself: when Vagrant provisions your newly booted VM, it runs Puppet using the configuration values in the block above. The entry point to our Puppet manifests is the file named by the puppet.manifest_file key (in our case site.pp), located in the directory named by the puppet.manifests_path key (in our case puppet/manifests). Thus, to get started we need to create a puppet/ directory in our project, along with a puppet/manifests/site.pp file and a puppet/modules directory.
For the purposes of explanation, I am going to reference a PHP skeleton application I created using these concepts. It is freely available on GitHub.
Here’s what our bare-bones LAMP stack skeleton app looks like within puppet:
├── manifests
│   ├── nodes
│   │   └── default.pp
│   └── site.pp
└── modules
    ├── app
    │   ├── files
    │   │   ├── config
    │   │   │   ├── development
    │   │   │   │   └── public
    │   │   │   └── ec2
    │   │   │       └── public
    │   │   └── ec2
    │   │       └── aws_assign_ip.sh
    │   ├── manifests
    │   │   ├── codebase.pp
    │   │   ├── database.pp
    │   │   └── webserver.pp
    │   └── templates
    │       └── apache
    │           └── virtualhost
    │               └── vhost.conf.erb
    ├── ec2
    │   ├── files
    │   │   └── ec2-api-tools.zip
    │   ├── manifests
    │   │   └── init.pp
    │   └── templates
    │       └── aws_assign_ip.erb
    └── zendserver
        └── manifests
            ├── apt
            │   └── repo
            │       └── zend.pp
            ├── init.pp
            ├── params.pp
            └── prerequisites.pp
The entry point to our Puppet manifests is, as previously mentioned, the puppet/manifests/site.pp file, but before we get too deep, let’s first discuss how Puppet actually works.
The puppet concept
In some ways, Puppet looks like a programming language, and as such it has a tendency to confuse developers who think of puppet manifests in terms of a language. In reality, puppet manifests describe more what has to be done than how it is done. The ultimate result is a tree of dependencies puppet then resolves on the target machine. Confused yet? It takes a little getting used to, but let’s try to clear things up.
Puppet manifests contain declarations of what Puppet calls Types. There are many different types for describing the various requirements of your machine. For example, the exec type defines a shell command that needs to be executed on the machine. The documentation for this type lists the following possible parameters for a declaration:
exec { 'resource title':
  command     => # The actual command to execute.
  creates     => # A file to look for before running
  cwd         => # The directory from which to run
  environment => # Any additional environment
  group       => # The group to run the command as.
  logoutput   => # Whether to log command output
  onlyif      => # If this parameter is set, then
  path        => # The search path used for command
  provider    => # The specific backend to use for
  refresh     => # How to refresh this command.
  refreshonly => # The command should only be run...
  returns     => # The expected return code(s).
  timeout     => # The maximum time the command
  tries       => # The number of times execution
  try_sleep   => # The time to sleep in seconds
  umask       => # Sets the umask to be used
  unless      => # If this parameter is set, then
  user        => # The user to run the command as.
}
Every type declaration, regardless of where it appears, is first given a unique resource title that identifies it in the dependency tree, followed by a series of parameters applicable to the type. For example, the following is taken from puppet/modules/ec2/manifests/init.pp in my skeleton application:
exec { "ec2-api-tools-unzip" :
  command => "/usr/bin/unzip /tmp/ec2-api-tools.zip -d /usr/local",
  creates => "/usr/local/ec2-api-tools-1.6.12.0",
  require => [ File['/tmp/ec2-api-tools.zip'], Package['unzip'], Package['default-jre'] ]
}
This command installs the Amazon AWS EC2 command line tooling in the virtual machine using the standard unzip shell command. It can be translated into the following description:
“Execute the shell command and identify it as ec2-api-tools-unzip. This command creates a /usr/local/ec2-api-tools-1.6.12.0 directory, so if this directory exists, do not run the command again. Before running this shell command, make sure the /tmp/ec2-api-tools.zip File type has been completed, as well as the unzip and default-jre Package types.”
Thus, when Puppet processes this particular type it will first check (via creates) that it hasn’t previously been executed, and it will not execute it until the dependencies listed in the require section have run. This is a key difference between a Puppet manifest and a programming language: the position of a type entry in a manifest is not what guarantees the order in which types are processed; the declared dependencies are. In our example this exec definition could execute any time after its requirements have been satisfied, regardless of when or where it was actually defined.
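To make the ordering point concrete, here is a minimal sketch with two resources deliberately written in the "wrong" order. The resource title, the archive name, and the /opt/example path are hypothetical and not part of the skeleton application; only the require relationship matters.

```puppet
# Written first in the manifest, but applied second: the `require`
# relationship, not the position in the file, is what orders it.
exec { 'hypothetical-unpack-example':
  command => '/bin/tar -xzf /tmp/example.tar.gz -C /opt',
  # Assumes unpacking the archive produces this directory; if it
  # already exists, the command is skipped on later runs.
  creates => '/opt/example',
  require => File['/tmp/example.tar.gz'],
}

# Written second, but applied first because the exec above depends on it.
file { '/tmp/example.tar.gz':
  ensure => file,
  # Hypothetical file shipped in the app module's files/ directory.
  source => 'puppet:///modules/app/example.tar.gz',
}
```

Swapping the two declarations should produce the same result, since the file has to be in place before the exec is allowed to run either way.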
Fundamentally, a Puppet configuration is simply a collection of these type definitions with their dependencies documented. In fact one could, in theory, place all of the necessary types in a single manifest. However, as a matter of best practice, these types are typically broken down in a more reasonable fashion through the use of classes and modules.
The puppet manifests
Now that we have at least a basic understanding of Puppet types and dependencies, let’s take a look at the entry point for our Puppet manifests, the puppet/manifests/site.pp file:
info("Configuring '${::fqdn}' (${::site_domain}) using environment '${::environment}'")

# Fix for Puppet working with Vagrant
group { 'puppet': ensure => 'present', }

# Setup global PATH variable
Exec { logoutput => true, path => [
  '/usr/local/bin',
  '/opt/local/bin',
  '/usr/bin',
  '/usr/sbin',
  '/bin',
  '/sbin',
  '/usr/local/zend/bin',
], }

import 'nodes/*.pp'
In this manifest we do a few things. We output a little debugging information, we use the group type to make sure the puppet group exists (to sidestep an issue with some VMs in Vagrant), and we declare an Exec type definition before importing additional manifests from puppet/manifests/nodes/*.pp.
This demonstrates another important notion regarding Puppet manifests. If you recall, in our original exec type example we used the lower-case version of exec to define the action, whereas in the site.pp example above we use the upper-cased Exec. This is not a typo; these two declarations mean different things to Puppet.
In the first example, the lower case signifies an actual definition and action to perform. Using the capitalized version of the type, however, is akin to defining default values. In this case, we use Exec to define some default paths for shell executions and to set the logoutput configuration value to true by default.
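As a small illustration of the difference, the sketch below sets defaults with the capitalized form and then declares two lower-case exec resources. The resource titles and commands are made up for the example and do not come from the skeleton application.

```puppet
# Capitalized: nothing is executed here. This only sets default
# attribute values for exec resources declared in this scope.
Exec {
  path      => ['/usr/bin', '/bin'],
  logoutput => true,
}

# Lower case: an actual resource. It can use a bare command name
# because the default path above is applied to it.
exec { 'hypothetical-list-tmp':
  command => 'ls /tmp',
}

# An attribute set on the resource itself overrides the default.
exec { 'hypothetical-list-var':
  command   => 'ls /var',
  logoutput => false,
}
```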
We’re not done yet, though. Let’s take a look at the single node manifest, puppet/manifests/nodes/default.pp, which is pulled in by the import statement in site.pp:
node default {
  include apt
  include stdlib
  include git

  case $::environment {
    development: {
      include app::database
      include app::webserver
      include app::codebase
      sysctl::value { 'vm.overcommit_memory': value => '1' }
    }
    ec2 : {
      include app::codebase
      include app::webserver
      include app::database
      include ec2
    }
  }

  package { 'unzip' :
    ensure => present
  }

  package { 'vim' :
    ensure => present
  }

  package { 'autoconf' :
    ensure => present
  }

  package { 'make' :
    ensure => present
  }

  sysctl::value { 'fs.file-max': value => '100000' }

  exec { "apt-get clean" :
    command => "/usr/bin/apt-get clean"
  }

  exec { "apt-update":
    command => "/usr/bin/apt-get update",
    require => [ Exec['apt-get clean'] ]
  }
}
While it is outside the scope of this particular series, I will make note of a new construct in this example, the node construct. Puppet is designed to allow you to create a single collection of manifests that represents different types of machines and build configurations. One of the ways a build configuration is chosen is by the name of the machine (typically its fully qualified domain name) when the manifests are applied, which is matched by the node construct (e.g. node 'www.example.com' instead of node default). For our purposes, using the special default node is sufficient.
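Here is a sketch of what a name-specific node definition could look like next to the default one. The hostname is a placeholder and the class lists are abbreviated for illustration; the skeleton application only defines node default.

```puppet
# Applied only to the machine whose node name matches this
# (placeholder) hostname.
node 'www.example.com' {
  include app::webserver
}

# Fallback for any machine without a more specific node definition,
# which is the case for the Vagrant VM in this series.
node default {
  include app::webserver
  include app::database
}
```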
Next, you can see an example of one of the few control structures available within manifests, the case statement, which functions very similarly to a switch statement in a language such as PHP. Here we look at the $::environment variable and, based on its value, decide which modules and classes to include. Since these manifests are designed to deploy either to a Vagrant VM or to an AWS instance, we do slightly different things in each case, and other things regardless of environment, such as ensuring that basic packages like unzip are installed.
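Note that the case statement in default.pp has no catch-all branch, so an unexpected environment value simply includes none of the app classes. If you would rather fail loudly, one option is a default branch. This is a sketch of an addition that is not in the skeleton application, with the class lists shortened:

```puppet
case $::environment {
  'development': {
    include app::database
  }
  'ec2': {
    include ec2
  }
  # Assumed addition: stop compilation when the value matches
  # neither of the branches above.
  default: {
    fail("Unsupported environment '${::environment}'")
  }
}
```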
Using puppet modules
Now that we are through the basics of Puppet, let’s move on to the construction of classes and modules. You’ll notice that in our default.pp example above we include a number of classes, such as app::webserver. These references tie into the modules defined in the puppet/modules directory, specifically the puppet/modules/app module.
Puppet modules are organized in a standard fashion, and are broken down into the following (basic) structure:
├── app
│   ├── files
│   │   ├── config
│   │   │   ├── development
│   │   │   │   └── public
│   │   │   └── ec2
│   │   │       └── public
│   │   └── ec2
│   │       └── aws_assign_ip.sh
│   ├── manifests
│   │   ├── codebase.pp
│   │   ├── database.pp
│   │   └── webserver.pp
│   └── templates
│       └── apache
│           └── virtualhost
│               └── vhost.conf.erb
At a minimum, a module consists of a manifests/ directory containing files that represent the classes in the module. The init.pp file, if it exists, maps to a class with the same name as the module; otherwise <modulename>::<classname> maps to <modulename>/manifests/<classname>.pp.
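The naming rule can be sketched with a hypothetical module called mymodule, which is not part of the skeleton application. Each class below would live in its own file, named in the comment above it:

```puppet
# File: puppet/modules/mymodule/manifests/init.pp
# init.pp holds the class named after the module itself.
class mymodule {
  include mymodule::service
}

# File: puppet/modules/mymodule/manifests/service.pp
# Any other class is <modulename>::<classname>, stored in <classname>.pp.
class mymodule::service {
  notice('mymodule::service was included')
}
```

With this layout, include mymodule in a node definition would load init.pp, which in turn pulls in service.pp through the mymodule::service class name.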
So in our case, if we are loading, for example, the app::webserver class, we will be looking at the puppet/modules/app/manifests/webserver.pp file, which is as follows:
class app::webserver {
  class { 'composer':
    target_dir    => '/usr/local/bin',
    composer_file => 'composer',
  }

  class { 'apache': }

  class { 'zendserver':
    php_version => $::php_version,
    use_ce      => false
  }

  file { "/usr/local/bin/pear" :
    target  => '/usr/local/zend/bin/pear',
    ensure  => 'link',
    require => [ Class['zendserver'] ]
  }

  apache::vhost { $::site_domain :
    docroot       => "/vagrant/public",
    ssl           => true,
    priority      => '000',
    env_variables => [
      "APPLICATION_ENV $::environment"
    ],
    require       => [ Package['apache'] ]
  }

  exec { "bootstrap-zs-server" :
    command => "/usr/local/zend/bin/zs-manage bootstrap-single-server --acceptEula TRUE -p 'password'; touch /var/local/zs-bootstrapped",
    cwd     => "/usr/local/zend/bin/",
    require => [ Class['zendserver'] ],
    creates => "/var/local/zs-bootstrapped"
  }

  file { "/etc/profile.d/server_env.sh" :
    content => "export APPLICATION_ENV=$::environment",
    owner   => root,
    group   => root,
    # Quoted octal string so the mode is never read as a decimal number
    mode    => '0755'
  }

  # Disable the default (catch-all) vhost
  exec { "disable default virtual host from ${name}":
    command => "a2dissite default",
    onlyif  => "test -L ${apache::params::config_dir}/sites-enabled/000-default",
    notify  => Service['apache'],
    require => Package['apache'],
  }
}
While we won’t go through every single line of code in this manifest, the job of this class is basically to set up the web server on the machine. It makes use of a number of third-party modules (discussed in Part 2 of this series) to install and configure Apache, Zend Server, and Composer.
Let’s take a look at one more small example, specifically the app::codebase class, which is used to initialize the code base for the application and can take care of things such as running composer update.
class app::codebase {
  info("Deploying Codebase for environment ${::environment}")

  file { "/vagrant/public/.htaccess" :
    group  => "www-data",
    owner  => "root",
    # Quoted octal string so the mode is never read as a decimal number
    mode   => '0775',
    source => "puppet:///modules/app/config/$::environment/public/.htaccess"
  }

  composer::exec { 'update-codebase' :
    cmd       => "update",
    cwd       => "/vagrant",
    logoutput => true
  }
}
The reason I wanted to show this specific piece of code was to explain something in the file type used in this class: the source attribute. The source attribute determines where the contents of the file being referenced will be loaded from, which can be within the Puppet module itself, in the <module>/files directory. So, in this example, the source of:
puppet:///modules/app/config/$::environment/public/.htaccess
will reference the puppet/modules/app/files/config/<env>/public/.htaccess file. Any changes made to this file will automatically update the virtual machine’s copy every time the machine is provisioned, and it’s an excellent way to keep configuration files version-controlled and managed.
Conclusion
I’ll be the first to admit that this is most certainly a crash course on Puppet in the context of virtualization for developers. We did not come close to covering everything in Puppet, but we did cover the basics. Coupled with the skeleton PHP application, you should have a pretty good foundation to start hacking around on your own. Keeping the Puppet docs handy is helpful, and I’m happy to answer any questions you have. Throw us a comment below.