An Overview of Engine Yard’s Chef ‘node’ Object

Many of our customers who write their own custom Chef recipes need to know how to reference information about their environment from within the virtual machine running those recipes.

When Engine Yard’s infrastructure runs Chef, it runs a version of Chef called chef-solo. This version doesn’t make use of chef-server or any other related components from Chef (formerly Opscode); instead, the platform downloads the latest tar’d and gzipped version of your recipes as we know them (which were uploaded via ey recipes upload), removes everything on the instance under /etc/chef-custom, then unpacks the archive there and runs it with chef-solo.

This can happen as a single run on its own using ey recipes apply, or at the end of a full Chef run by clicking “apply” in the Engine Yard Cloud dashboard.
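
Under the hood, a custom run on the instance boils down to roughly the following shell sequence. This is a hedged sketch: the download URL, tarball name, and solo.rb path are illustrative assumptions, not the literal commands our automation executes.

# Rough sketch of a custom Chef run on the instance (paths and names are illustrative).
rm -rf /etc/chef-custom/*                              # obliterate the previous copy of your recipes
curl -s -o /tmp/recipes.tgz "$UPLOADED_RECIPES_URL"    # download the latest uploaded archive
tar -xzf /tmp/recipes.tgz -C /etc/chef-custom          # unpack it under /etc/chef-custom
chef-solo -c /etc/chef-custom/solo.rb -j /etc/chef-custom/dna.json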

Basic Anatomy of a Chef Run on Engine Yard Cloud

When Chef is run on Engine Yard Cloud, there are two types of recipes:

  • Main (also called “base”) recipes
  • Custom (your custom chef recipes)

These are run in the order of “main, custom” if you click the “apply” button on the dashboard. This tells the Engine Yard architecture to form a connection to the instance, obliterate everything under /etc/chef, download the base recipes that constitute our “stack”, unpack them there, and run them. This happens every time a base run is fired off.

The second type is the “custom” chef run. This is your custom code that you upload to Engine Yard Cloud using the ey command from the engineyard gem. These recipes are uploaded from your computer to our infrastructure. Then, when the time comes to run those recipes, the same architecture sends a command to the server(s) in question to obliterate everything under /etc/chef-custom, download the latest version of your custom chef recipes as we know them, unpack them in that location, then run that code with Chef.

Knowing this, you should be able to see an obvious caveat right off the bat: you can’t (and shouldn’t) modify Chef recipes directly on the instance and expect those changes to stick. Since the directory where your custom Chef code lives is removed prior to the next run and replaced with a “fresh” copy of your recipes, you’ll lose any changes you make on the server.

Finally, there’s a bit of a “loophole” to the order in which these recipes are run as mentioned earlier. If you use the “apply” button, recipes do indeed run in the “main, custom” order, waiting for a full run of the base recipes that we provide, regardless of environment state, before running your custom chef recipes. This can be undesirable in various situations, so you can use the engineyard gem to run only your custom recipes with the following command:

ey recipes apply -c YourAccountName -e YourEnvironmentName

This will not upload the recipes on your computer to the cloud; instead, it simply instructs our platform to execute a custom Chef run on the specified environment. Base recipes are not run in this case.

An Example dna.json

Every time a custom Chef run happens, a file located at /etc/chef-custom/dna.json is read. That file is put into place by the Engine Yard Cloud automation system and contains a great deal of information about the instances in your environment. Below is an example of what that dna.json file might look like; it serves as a reference for anyone who needs to look up the structure.

Note that the values for these keys have been removed and replaced with descriptive values. In some cases, another key called “_comment” was added to explain an entire “block” of data; these don’t exist in the actual dna.json file.

For another reference of this same file, feel free to use this GitHub gist.

{
  "alert_email": "email address where you want automated warnings to go",
  "backup_interval": "int value for db backups, configurable on dashboard",
  "backup_window": "int value for db backups, configurable on dashboard",
  "ruby_version": "Ruby 1.9.3 - could be 2.0.0 or something else",
  "db_host": "internal hostname of your database master",
  "db_slaves": [
    "db replica hosts appear here. only works with [Postgre|My]SQL"
  ],
  "user_ssh_key": [
    "ssh keys for each user will appear in this array",
    "and they will be written out to /home/deploy/.ssh/authorized_keys"
  ],
  "utility_instances": [
    {
      "hostname": "an internal amazon ip",
      "name": "resque_1 (or whatever you named it in the dashboard)"
    },
    {
      "hostname": "a different internal amazon ip",
      "name": "elasticsearch_1 (or whatever you named it in the dashboard)"
    }
  ],
  "environment": {
    "name": "the name of the env - could be 'prod', 'staging', 'wibble' etc.",
    "framework_env": "actual framework env - staging, production, etc.",
    "stack": "nginx_unicorn - internal variable specifying app server stack"
  },
  "haproxy": {
    "username": "deploy (default for logging in at http://appmasterip/haproxy?stats)",
    "password": "password for the deploy user for above url"
  },
  "users": [
    {
      "username": "deploy ('deploy' is always our default user)",
      "password": "randomly generated password",
      "gid": "1000",
      "uid": "1000",
      "comment": ""
    }
  ],
  "packages_to_install": [
    "if you have it enabled in the dashboard (which is NOT recommended)",
    "a list of packages to install via portage will be listed here.",
    "we don't recommend this because that installation should be managed by,",
    "and frankly is better suited to, chef. We may retire support for this",
    "feature at any point in the future.",
    "Normally this would look like any valid package atom, like this:",
    "=www-servers/nginx-1.4.7",
    "which is installed with emerge via a command like:",
    "emerge -v =www-servers/nginx-1.4.7",
    "however, one must be sure the package is UNMASKED in",
    "/etc/portage/package.keywords/local by explicit, specific atom (the =...",
    "...thing) before this will install. That's why chef is a better option."
  ],
  "gems_to_install": [
    {
      "name": "jazor",
      "version": "0.1.8"
    },
    {
      "name": "other gems you list in the dashboard will also be installed",
      "version": "here, however we STRONGLY recommend using bundler instead!"
    },
    {
      "name": "because bundler will properly manage dependencies and help you",
      "version": "avoid dependency hell!"
    },
    {
      "name": "Finally, remember that when Chef runs on our platform, it's running with an isolated",
      "version": "version of Ruby separate from the main version your app runs, and therefore has a separate set of gems. You won't be able to call bundler and its gems from inside Chef, so plan accordingly."
   }
  ],
  "applications": {
    "my_application_name_here": {
      "deploy_key": "the deploy key we generated for you and you've hopefully
                     added to your git repo host (e.g. GitHub)",
      "repository_name": "[email protected]:youraccount/yourapp.git",
      "type": "could be rails3, rails4, sinatra, rack, etc. (node.js, java)",
      "branch": "which git branch you want to deploy - usually master",
      "deploy_action": "deploy",
      "migration_command": "command you want to use to migrate.
                            Usually like rake db:migrate but could be whatever
                            you want it to be, esp. for PHP, Node.js etc. apps",
      "revision": "SHA for branch",
      "run_deploy": false,
      "run_migrations": false,
      "newrelic": false,
      "vhosts": [
        {
          "role": "staging - like framework_env",
          "name": "valid DNS hostname entered in dashboard, e.x. example.com"
        },
        {
          "role": "staging",
          "name": "example2.com - remember, you can have more than one app
                   on an environment, but be sure to give them *different*
                   hostnames so as not to confuse nginx configuration."
        }
      ],
      "recipes": [
        "memcached",
        "monit",
        "nginx",
        "unicorn",
        "These are the basic recipes that *our* platform will run from our own
         cookbooks that we maintain on our own. You don't need to worry about
         this one iota."
      ],
      "http_bind_port": 80,
      "https_bind_port": 443,
      "auth": {
        "active": false
      },
      "services": [
        {
          "resource": "mongrel",
          "mongrel_base_port": 5000,
          "mongrel_mem_limit": 150,
          "mongrel_instance_count": 3,
          "_comment": "This is left over from earlier versions of the product.
                       Just ignore it since nobody uses mongrel anymore."
        },
        {
          "resource": "memcached",
          "base_port": 11211,
          "mem_limit": 128,
          "_comment": "memcached comes installed out of the box by default, but
                       you can make changes to that configuration with custom
                       chef recipes."
        }
      ]
    }
  },
  "crons": [
    "If you set up cron jobs in the dashboard, they'd go here. Instead of this",
    "however, we recommend setting up cron in your own custom chef recipes.",
    "Not a whole lot of customers make use of this because they'd like to have",
    "revision control over their cron changes, which you get with custom chef,",
    "so we might remove this feature in the future. Just use custom chef for",
    "this instead. It's better all around."
  ],
  "master_app_server": {
    "public_ip": "public EC2 hostname, managed by their dns",
    "private_dns_name": "EC2's internal hostname for the VM",
    "_comment": "The application master is where the EIP (Elastic IP) for the
                 environment is anchored, meaning all traffic goes right there
                 on port 80 or 443, which is where haproxy runs and then
                 forwards requests to one of N nginx backends, where N is equal
                 to the number of application instances in your environment.
                 This is a round-robin load balancing scheme."
  },
  "members": [
    "this array contains a list of known application machines in the",
    "environment based on their internal IP from AWS. This is how haproxy",
    "knows which backends (app servers) to send requests to later when that",
    "recipe runs."
  ],
  "engineyard": {
    "environment": {
      "alert_email": "where you want automated warning emails sent. This could
                      take the form of '<some server ip> using X% memory' to
                      warn you about memory/swap use (common with overnight
                      admin/backup tasks/cron jobs), or '<ip> CPU usage at
                      100%' or something similar. Recommend this is an alias
                      to your entire development team including your ops person
                      if you have one.",
      "backup_interval": 24,
      "backup_window": 10,
      "stonith_endpoint": "https://cloud.engineyard.com/stonith - this is where
                           application machines in the environment will 'check
                           in' every so often to be sure they're up. If not,
                           Engine Yard Cloud will *automatically* 'shoot the
                           other node in the head' - STONITH. This is how
                           automated application server failover happens. If the
                           app master fails to check in with stonith, EY Cloud
                           kills it of its own volition then promotes one of
                           your app slaves to be the master. This is why we
                           always recommend running at minimum 2 app machines
                           in any given production environment. Instances can
                           become unstable or unresponsive for multiple reasons,
                           most commonly infrastructure failure (network, host)
                           or customer code not behaving as expected.",
      "newrelic_key": null,
      "ruby_version": null,
      "stack_name": "nginx_unicorn",
      "name": "staging - environment name, again (could just as easily be
               'wibble' or 'foobar')",
      "framework_env": "staging (or production, or something else)",
      "stats_password": "password for haproxy stats as noted above",
      "ssh_username": "deploy - always our default user",
      "ssh_password": "password for the account, though you'll never use it
                       since we require using ssh keys instead",
      "ssh_keys": [
        "again, a list of ssh keys will appear in this array, one per user.",
        "these are added in the dashboard under tools -> ssh keys. Once you",
        "add an SSH key, you need to edit the environment you want it on, then",
        "check the box for that key in that environment. Click update, then go",
        "back to the environment overview page and click 'apply'. When done,",
        "your SSH key should be installed in /home/deploy/.ssh/authorized_keys",
        "and you should be able to SSH up as the deploy user to the EC2 host."
      ],
      "db_stack_name": "postgres9_3 - could be mysql5_1 or something",
      "monitoring": "monit (default, not changable w/o your own chef)",
      "region": "us-east-1 (could be us-west-# or eu-west-# or whatever)",
      "backup_bucket": "internal location where db backups are stored - note
                        that we don't mean snapshots here, we mean an
                        honest-to-god SQL dump from mysqldump or similar",
      "components": [
        {
          "key": "ruby_193 (or ruby_200 or whatever)"
        },
        {
          "key": "rubygems",
          "version": "1.8.25"
        },
        {
          "key": "lock_db_version
                  - This is a dashboard option that lets you prevent the
                    environment from upgrading the minor version of the db.
                    For example, if you're on PostgreSQL 9.2.4, and a platform
                    update comes from us to bump you to 9.2.7, if this is true
                    that update will not happen for you. Modifiable on the env
                     edit page.",
          "value": false
        },
        {
          "key": "ext4 (whether or not you have the ext4 fs feature)",
          "value": null
        },
        {
          "key": "environment_metadata (internal stuff, ignore)",
          "descriptive_hostname": "true"
        },
        {
          "key": "metadata",
          "clusters": [

          ]
        }
      ],
      "instances": [
        {
          "id": "i-xxxxxxxx (EC2 instance ID)",
          "name": null,
          "reporting_url": "internal, ignore",
          "role": "db_master (could be app_master, app, util, db_master,
                   or db_slave)",
          "enabled": true,
          "public_hostname": "EC2 public hostname",
          "private_hostname": "EC2 internal only hostname",
          "awsm_token": "internal, ignore",
          "stonith_config": {
            "endpoint_uri": "https://cloud.engineyard.com/stonith",
            "endpoint_token": "token for EY stonith to identify this machine",
            "endpoint_id": "i-xxxxxxxx (internal, ignore)",
            "monitor_host": "EC2 internal hostname (internal, ignore)"
          },
          "instance_api_config": {
            "base_url": "https://cloud.engineyard.com/instance_api
                         This is where our API for instance configuration is",
            "instance_id": "internal db id for the instance in question",
            "token": "specific token for this instance",
            "core_url": "https://api.engineyard.com/ - internal api url"
          },
          "components": [
            {
              "key": "ssmtp - used as a sendmail replacement"
            },
            {
              "tags": [
                "product=cloud",
                "ey.domain=api.engineyard.com",
                "ey.server.id=some number",
                "ey.environment.id=some number",
                "sso_id=internal uuid, ignore it"
              ],
              "key": "appfirst"
            },
            {
              "instance_id": 123456789101112,
              "domain": "api.engineyard.com",
              "key": "appfirst_tags - appfirst related stuff, ignore it"
            }
          ]
        },
        {
          "id": "i-xxxxxxxx",
          "name": "resque_1 - specific name given to this instance in the dash",
          "reporting_url": "https://cloud.engineyard.com/reporting/some id",
          "role": "util (this one's a util machine named resque_1)",
          "enabled": true,
          "public_hostname": "EC2 public hostname",
          "private_hostname": "internal private hostname",
          "awsm_token": "internal uuid - ignore",
          "stonith_config": {
            "endpoint_uri": "https://cloud.engineyard.com/stonith",
            "endpoint_token": "internal token to ID this machine",
            "endpoint_id": "i-xxxxxxxx",
            "monitor_host": "internal ec2 hostname, ignore"
          },
          "instance_api_config": {
            "base_url": "https://cloud.engineyard.com/instance_api",
            "instance_id": 1234567891011213,
            "token": "some token",
            "core_url": "https://api.engineyard.com/"
          },
          "components": [
            {
              "key": "ssmtp - sendmail replacement"
            },
            {
              "tags": [
                "product=cloud",
                "ey.domain=api.engineyard.com",
                "ey.server.id=some id",
                "ey.environment.id=some id",
                "sso_id=some id"
              ],
              "key": "appfirst"
            },
            {
              "instance_id": 1234567891011213,
              "domain": "api.engineyard.com",
              "key": "appfirst_tags"
            }
          ]
        },
        {
          "id": "i-xxxxxxxx",
          "name": null,
          "reporting_url": "https://cloud.engineyard.com/reporting/some id",
          "role": "app_master - this is where requests first get to the
                   cluster. From here haproxy round robin balances them to
                   one of the other application machines in this cluster. The
                   app master is also responsible for running a 'git pull'
                   of your repository; then, instead of each machine pulling
                   from git, it rsyncs the files over to each app and utility
                   machine. This cuts down on deploy times and lowers load on
                   git hosts",
          "enabled": true,
          "public_hostname": "public ec2 hostname",
          "private_hostname": "private ec2 hostname",
          "awsm_token": "some token, ignore it",
          "stonith_config": {
            "endpoint_uri": "https://cloud.engineyard.com/stonith",
            "endpoint_token": "ignore",
            "endpoint_id": "i-xxxxxxxx",
            "monitor_host": "internal hostname"
          },
          "instance_api_config": {
            "base_url": "https://cloud.engineyard.com/instance_api",
            "instance_id": 3145798675309123,
            "token": "ignore",
            "core_url": "https://api.engineyard.com/"
          },
          "components": [
            {
              "key": "ssmtp - replacement for sendmail"
            },
            {
              "tags": [
                "product=cloud",
                "ey.domain=api.engineyard.com",
                "ey.server.id=some id",
                "ey.environment.id=some id",
                "sso_id=some id"
              ],
              "key": "appfirst"
            },
            {
              "instance_id": 123456789000001,
              "domain": "api.engineyard.com",
              "key": "appfirst_tags"
            }
          ]
        },
        {
          "id": "i-xxxxxxxx",
          "name": "elasticsearch_1 - here's our elasticsearch util (example)",
          "reporting_url": "https://cloud.engineyard.com/reporting/some id",
          "role": "util",
          "enabled": true,
          "public_hostname": "public ec2 hostname",
          "private_hostname": "internal hostname",
          "awsm_token": "internal token, ignore",
          "stonith_config": {
            "endpoint_uri": "https://cloud.engineyard.com/stonith",
            "endpoint_token": "internal token, ignore",
            "endpoint_id": "i-xxxxxxxx",
            "monitor_host": "internal ec2 hostname"
          },
          "instance_api_config": {
            "base_url": "https://cloud.engineyard.com/instance_api",
            "instance_id": 12345678900000000001,
            "token": "internal token, ignore",
            "core_url": "https://api.engineyard.com/"
          },
          "components": [
            {
              "key": "ssmtp sendmail replacement"
            },
            {
              "tags": [
                "product=cloud",
                "ey.domain=api.engineyard.com",
                "ey.server.id=some id",
                "ey.environment.id=some id",
                "sso_id=some token"
              ],
              "key": "appfirst"
            },
            {
              "instance_id": 123456789000000000000123,
              "domain": "api.engineyard.com",
              "key": "appfirst_tags"
            }
          ]
        }
      ],
      "apps": [
        {
          "deploy_key": "This is the deploy key EYC creates for you",
          "repository_name": "[email protected]:yourorg/yourapp.git",
          "type": "rails3",
          "branch": "master",
          "deploy_action": "deploy",
          "migration_command": null,
          "revision": "revision",
          "run_deploy": false,
          "run_migrations": false,
          "name": "name of the application in the dashboard. For example,
                   'bob_loblaws_law_blog' - note, this is NOT the runtime env,
                   e.g. 'staging' or 'production'",
          "bundled": null,
          "newrelic": false,
          "database_name": "name of the database itself ([Postgre|My]SQL only)",
          "components": [
            {
              "collection": [
                {
                  "name": "New Relic",
                  "config": {
                    "label": "New Relic",
                    "vars": {
                      "new_relic_account_id": 11111111111111151111111111111111,
                      "license_key": "new relic license key - this is obtained
                                      automatically through new relic's
                                      partnership with engine yard; if you want
                                      to bring your own, no problem, you'll
                                      just have to use custom chef to install
                                      it!",
                      "api_key": "new relic api key, also obtained
                                  via partnership"
                    }
                  }
                },
                {
                  "name": "appfirst",
                  "config": {
                    "label": "appfirst",
                    "vars": {
                      "daily_supplement_path": "/etc/appfirst",
                      "api_key": 11111111111111151111111111111111
                    }
                  }
                }
              ],
              "key": "addons"
            },
            {
              "key": "app_metadata"
            }
          ],
          "gems": [
            {
              "name": "jazor",
              "version": "0.1.8",
              "source": null,
              "_comment": "This is the same as above - gems installed on the
                           machine by default - but applies at the *app* level"
            }
          ],
          "ebuilds": [
            "ignore for now - part of internal portage/gentoo stuff",
            "if you need a custom ebuild for something, submit a support",
            "ticket at https://support.cloud.engineyard.com and we'll hook",
            "you up as soon as possible. Then you'd use custom chef to",
            "install it once the ebuild is in our portage tree."
          ],
          "vhosts": [
            {
              "domain_name": "example.com - domain name for this app",
              "ssl_cert": null,
              "_comment": "If you had an SSL cert installed here, it would
                           show up as a string above instead of null. These
                           certs are installed in the dashboard under tools ->
                           ssl certificates. It would be applied by our nginx
                           recipe automatically."
            }
          ]
        }
      ],
      "crons": [
        "If you had cron jobs set up, they'd be listed here. However, note",
        "that this is a deprecated practice and only here to support legacy",
        "customers. Do this with custom chef instead, it's a lot better."
      ]
    },
    "this": "i-xxxxxxxx"
  },
  "instance_role": "util",
  "reporting_url": "https://cloud.engineyard.com/reporting/some uuid",
  "name": "resque_1",
  "run_list": "recipe[ey-base]",
  "_comment": "This run_list tells chef-solo on the server what recipes to run,
               which in this case is the ey-base recipe which will bring in all
               necessary and appropriate elements of our stack based on your
               environment configuration selections. In a full chef run, our
               'main' recipes are run first to configure a sane system. Then,
               once that's done and successful, your custom recipes (if any)
               are downloaded into /etc/chef-custom, obliterating anything
               that was previously there (so don't try to edit recipes on the
               instance directly, as we'll just nuke everything in
               /etc/chef-custom and re-download a fresh copy) and run to
               configure your environment to your liking, according to your
               code."
}

Examining the node object

So, having this JSON file is great, but how do you actually use it? How do you get at the information it contains?

When Engine Yard’s Chef run starts, it reads this file and places the keys into a hash called node. This hash can then be referenced using Ruby symbols to find the value you’re looking for. Here are a few example key/value pairs that are frequently used.

  • node[:name] - Instance name. For example, if you have a util named “elasticsearch”, this value evaluates to “elasticsearch” on that specific instance.
  • node[:instance_role] - Instance role. One of util, app, app_master, db_master, or db_slave.
  • node[:instances] - An array of hashes, one per instance in the environment. For example, if your environment has five instances in total (say an app master, an application instance, a database master, a database replica, and a util), this is an array of five hashes, each representing one of those instances.
  • node[:applications] - An array of hashes representing every application deployed to the environment. Remember, you can deploy more than one application to a single environment on Engine Yard Cloud (not always recommended due to memory contention, but possible). Most dedicated production environments will only have one entry here.
  • node[:environment][:name] - Name of the environment, for example “myapp_production”. This is what you called it when you first created it in the dashboard.
  • node[:environment][:framework_env] - The framework_env variable, usually something like “staging” or “production”. It can be arbitrary depending on what you put in the dashboard (we’ve seen “development”, “test”, “qa”, “acceptance”, etc.).
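
As a quick illustration, here is a minimal recipe snippet that just logs a few of these values during a custom Chef run. The keys used on the per-instance hashes (role, name) follow the dna.json example above; treat this as a sketch, not canonical code.

# A minimal sketch: log a few node values so they show up in the Chef run output.
Chef::Log.info "Running on #{node[:name]} (#{node[:instance_role]})"
Chef::Log.info "Environment: #{node[:environment][:name]} / #{node[:environment][:framework_env]}"

# node[:instances] is an array of per-instance hashes; here we collect the
# names of the utility instances in the environment.
util_names = node[:instances].select { |i| i[:role] == 'util' }.map { |i| i[:name] }
Chef::Log.info "Utility instances: #{util_names.join(', ')}"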

A few examples

Personally, I don’t learn much from reading documentation - I need to see simple code samples. What follows is my best attempt to give you some simple examples that you can start to use right away.

Bear in mind, Engine Yard’s truly epic support team maintains a ready-to-use Chef cookbook repository on GitHub at https://github.com/engineyard/ey-cloud-recipes. THESE ARE NOT OFFICIALLY SUPPORTED so don’t ask our support team for help with those recipes. If you find a bug, submit a pull request. These are an easy way to get off the ground, but won’t be a perfect fit for every project or team. Use them as training wheels and a scaffold to get you started, then spruce things up your own way for your own purposes later on.

Conditional logic on instance name and role (or type)

Let’s say you want a Chef recipe to run on any instance named “elasticsearch” something or other.

if node[:name].match(/elasticsearch/i)
  # ... do things here ...
end

This isn’t necessarily the best way to go about it, however; the conditional is missing a check that helps avoid accidents. What if somebody on your team accidentally names one of your database replicas “elasticsearch” after editing the wrong object? You want to be more explicit here and match the instance_role, too:

if node[:name].match(/elasticsearch/i) && node[:instance_role] == 'util'
  # ... do things to your elasticsearch machine here ...
end

However, let’s say you want to add more than one elasticsearch machine for some reason. You should probably number them - e.g. elasticsearch_1, elasticsearch_2, and so on. Or, suffix them with an explicit purpose: elasticsearch_customers, elasticsearch_invoices, and so on.

In this case, our String#match statement isn’t going to work that well. We need to expand it to do the right thing on the right virtual machines.

if node[:instance_role] == 'util' && node[:name].match(/elasticsearch_customers/i)
  # ... do something with the customers elasticsearch machine directly ...
end

if node[:instance_role] == 'util' && node[:name].match(/elasticsearch_invoices/i)
  # ... do things specifically for the invoices elasticsearch machine ...
end

if node[:instance_role] == 'util' && node[:name].match(/elasticsearch/i)
  # ... do things on ALL elasticsearch instances since
  # String#match is going to match any elasticsearch string
  # in case-insensitive manner (the /i at the end).
  # This would be useful for installing elasticsearch in one block,
  # then configuring specific items pertaining to, in this example,
  # customers, then invoices, in different blocks for different
  # machines.
end

Make your bash prompt more useful

When you SSH to an instance, normally you’ll just get a basic prompt that doesn’t necessarily contain the information you want to know about. This is fine for small environments, but larger, more complex ones could use a little more… panache.

To do this, we’re going to utilize two files in a single chef cookbook: a template and the recipe itself.

Create the following directory structure:

cookbooks/ # your primary cookbooks directory
    main/ # the default 'main' recipes to run
        recipes/ # the recipes subdirectory
            default.rb # a blank, empty ruby file (for now)
    bash_customization/ # the cookbook for your bash customization
        recipes/ # recipes subdirectory for bash_customization
            default.rb # a blank, empty ruby file (for now)
        templates/ # a directory to hold our templates
            default/ # for the default recipe only
                bashrc.erb # the template which will be parsed with erb

The first task is to build out our ~/.bashrc. Let’s do that with a template now, and then we can populate the variables later.

cookbooks/bash_customization/templates/default/bashrc.erb

# Set the prompt variable. This will eventually be a
# pure shell script after being parsed with erb.
prompt="<%= @name %> (<%= @role %>, <%= @env_name %>) : <%= @public_hostname %> \w \n\$"
export PS1="$prompt"

Now we have to tell Chef what those variables are supposed to be. Enter the recipe itself.

cookbooks/bash_customization/recipes/default.rb

# We're going to need net/http to initiate an HTTP request to AWS.
require 'net/http'

# Start by running on all machines. We do this by simply
# omitting the if/unless logic. We want this on all machines in our
# environment, don't we?

# Put something in the Chef log
Chef::Log.info "Parsing and writing out custom .bashrc for deploy..."

# Grab the public hostname for this instance. This recipe
# will be run *from* the instance, which means that the following
# IP address will be resolved internally from Amazon, which
# is good because it's an Amazon-specific, internal IP
# that they use for instance metadata.
public_hostname = Net::HTTP.get(URI('http://169.254.169.254/latest/meta-data/public-hostname'))

template "/home/deploy/.bashrc" do
  action :create # overwrites if existing
  owner  "deploy"
  group  "deploy"
  mode   0640 # deploy can read/write, deploy's group can read, no one else can do anything
  source "bashrc.erb"
  variables({
    :name => node[:name],
    :role => node[:instance_role],
    :env_name => node[:environment][:name],
    :public_hostname => public_hostname
  })
end

Now that you have these files written and in the proper directories, use the engineyard gem to upload them to your environment and apply them:

ey recipes upload -c YourAccountName -e YourEnvironmentName --apply

If you look in the dashboard, you should see the instances “spinning”, which indicates Chef is running. In a few minutes it should be done, and you can SSH up to an instance to see your new bash prompt.

Explore Further

This is a cursory overview with some examples of the information you can get from dna.json. If you want to dig into it, I would suggest you scp the file from your instance(s) to your local machine and take a look around at it. Hint: an editor like Atom makes collapsing/browsing the huge JSON hash that is dna.json much easier.

ssh deploy@myhost_as_it_appears_on_the_dashboard
# <you'll get SSH'd up. feel free to look over dna.json with jazor as is recommended in the motd.>
sudo cp /etc/chef/dna.json ~ # sudo is password-free for the deploy user
sudo chown deploy.deploy ~/dna.json # make sure deploy can read it
exit

# <now, back on your local machine...>
scp deploy@myhost_as_it_appears_on_the_dashboard:/home/deploy/dna.json ~/Desktop/dna.json

You should now have dna.json on your desktop (OS X, desktop Linux distributions). Open it in any editor and poke around.
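
If you’d rather explore it from the command line instead of an editor, plain Ruby works fine (the jazor gem mentioned earlier is another option). A quick sketch using only the standard library, assuming the file is on your desktop:

cd ~/Desktop
# Print a couple of top-level values from dna.json with Ruby's standard library.
ruby -rjson -e 'dna = JSON.parse(File.read("dna.json")); puts dna["environment"]["name"], dna["instance_role"]'
# Pretty-print one branch of the hash to browse its structure.
ruby -rjson -rpp -e 'pp JSON.parse(File.read("dna.json"))["environment"]'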

Some Common Questions and Answers

“Do I need to ‘sudo’ my commands in Chef?” No, you don’t. Chef, when it runs on Engine Yard Cloud, will run as the root user so you won’t need to “sudo” anything.

execute "something" do
  command "sudo monit reload" # not this
  command "monit reload" # this - it'll already be run as root
  action  :run
end

“I get an error when running chef that monit can’t do something because it’s trying to restart <some process/thing here>” This commonly happens because monit - our system monitoring daemon - is busy trying to restart something else. I recommend taking a look at the syslog:

tail -f /var/log/syslog

to see if you can find monit stuck in a loop, restarting the same thing and failing each time. This usually happens when the init.d system in Gentoo starts to act up: there are cases where `/etc/init.d/<service> start` will claim a process is already started, but it doesn't show up when running `ps auxfww`. In that case:

  1. Run `sudo monit unmonitor all -g <groupname>`, where <groupname> is the name of the group given in the service's monitrc file, located at `/etc/monit.d/<service>.monitrc`.
  2. Run `sudo /etc/init.d/<service> zap` to force init.d to reset that service to a "stopped" state.
  3. Run `ps auxfww | grep -i <service>` (a partial string match is fine) to make sure it's really not running.
  4. If it isn't, run `sudo /etc/init.d/<service> start` to see if it starts, then verify with `ps` again.
  5. Once that's done, run `sudo monit monitor all -g <groupname>` to tell monit to start watching it again.
  6. Finally, run `tail -f /var/log/syslog` to ensure monit can start it correctly and doesn't flip out anymore.
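
As a concrete sketch, assume a hypothetical utility service named resque whose monit group is also called resque; the recovery sequence above would look like this:

sudo monit unmonitor all -g resque    # stop monit from fighting you while you work
sudo /etc/init.d/resque zap           # force init.d to reset the service to "stopped"
ps auxfww | grep -i resque            # confirm the process really isn't running
sudo /etc/init.d/resque start         # start it by hand
ps auxfww | grep -i resque            # verify it came up
sudo monit monitor all -g resque      # hand it back to monit
tail -f /var/log/syslog               # make sure monit isn't flapping anymore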

“Something’s wrong and I’m not sure what. Where are the logs?”

There are two ways to get at your Chef logs. The easiest is to go to the dashboard (https://cloud.engineyard.com/) and click on the environment in question; any server where there was a Chef failure should show a red exclamation mark. Click the “Custom Logs” link to see the custom logs.

If for some reason that doesn’t seem to have the correct information, SSH up to the instance and look at /var/log/chef.custom.log. This is a symbolic link to the latest custom Chef run log, which should be date stamped. The main run log, chef.main.log, can be found at the same path.
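
For example, from an SSH session on the instance:

ls -l /var/log/chef*.log               # chef.custom.log and chef.main.log link to the latest run logs
tail -n 100 /var/log/chef.custom.log   # read the most recent custom run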

Additional Resources

Now that you have a better feel for the node object that Engine Yard Cloud’s Chef run will make available to you, you can start looking at other documentation we have on our version of Chef.

One thing to make note of is that Engine Yard totally side-steps the need for chef-server by having our own infrastructure setup. In short, it works like this:

  1. You write your own custom Chef code, then use the engineyard gem to upload to us: ey recipes upload.
  2. The gem runs a tar command on the recipes, then gzips them and sends them to Engine Yard, where we securely store the recipes for later use.
  3. When the time comes for running those recipes, our automation goes to your instance(s), deletes everything under /etc/chef-custom (where your custom recipes go), downloads the most recently uploaded copy of your recipe’s compressed tarball, decompresses it into /etc/chef-custom, then runs chef-solo against those recipes.

This means that there’s no need for knife, chef-server or the like. It also means that there may be some differences between our Chef and the canonical Chef from Opsco-err, “Chef” as they are now known.

Now you are armed with some knowledge about Chef on Engine Yard! Here are some additional resources:

About J. Austin Hughey

J. Austin Hughey is a member of our Professional Services team. He focuses on Ruby, Rails and development operations, and organizes OpenHack Austin.