Measuring Application Performance & Business Metrics using StatsD
Note: This post was written by Michael Forhan, one of our friends at AppFirst.
StatsD is a wonderful tool created by the developers at Etsy for keeping track of application metrics with minimal overhead. To stay small, StatsD was written to deliver metrics over UDP to a Graphite server for processing and visualization. At AppFirst we realized that the power of StatsD was enormous: what could be better than seeing application performance, resource bottlenecks and real-time business metrics together? The AppFirst collector already picks up system, process, log and Nagios plugin data, so extending application awareness with StatsD was only natural.
The developers at AppFirst took StatsD one step further. While we work with Etsy’s deployment of StatsD, we streamlined application metrics by:
- Doing away with maintaining your own Graphite server
- Reducing network overhead
- Ensuring delivery of content using TCP
- Providing assistance for using StatsD in your application with a support site, blogs and videos
If you are just learning about StatsD, this post should help you understand its basics, its ease of use and the power it can add to your application, so you can build high-performance applications.
StatsD implements only a few types of metrics: counters, gauges and timers. These metric types should cover the bases for anything you want to measure in your application.
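For orientation, the three metric types differ on the wire mainly by a type suffix. The sketch below formats one metric of each type in the plain-text StatsD line format (bucket:value|type). The helper name statsd_line and the bucket names are made up for illustration, and the AppFirst collector may transport metrics differently.

```ruby
# Hypothetical helper: formats one metric in the plain-text StatsD line format.
TYPE_SUFFIX = { counter: "c", gauge: "g", timer: "ms" }.freeze

def statsd_line(bucket, value, type)
  suffix = TYPE_SUFFIX.fetch(type) # raises KeyError for an unknown type
  "#{bucket}:#{value}|#{suffix}"
end

puts statsd_line("Cart.Add", 1, :counter)             # one event counted
puts statsd_line("Db.OpenConnections", 12, :gauge)    # a current level
puts statsd_line("Cart.ProcessPayment", 320, :timer)  # a duration in milliseconds
```

A client library hides this formatting, but seeing it once makes it easier to read packet captures or to write a quick script without a library.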
Try AppFirst free on Engine Yard and get full-stack visibility into your systems, apps and metrics.
Namespaces
Namespaces, also known as buckets, are where your metrics are collected, aggregated and displayed. Namespaces are dot-delimited, allowing you to build a natural grouping for searches, correlations and dashboard display. Namespaces are completely user-defined, but I recommend the format <Application>.<Server>.<Function> as a starter. The reason I use this format is that with cloud instances your <Server> field can change quickly, and your application name is an easily recognizable way to group metrics. For example: EcommerceApp.AWS-12542383.Cart.Add. The other nice part about this format is that if you decide to build your own tools using our REST API, you can group your Cart metrics together to get total transactions, or ratios of add to buy or add to delete.
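As a rough sketch of the recommended format, the helper below joins components into a dot-delimited namespace. The name build_namespace is hypothetical, and replacing dots and whitespace inside a component with a hyphen is my own assumption, made so that a fully qualified hostname does not add extra levels to the hierarchy.

```ruby
require "socket"

# Hypothetical helper: joins components into a dot-delimited namespace.
# Dots and whitespace inside a component are replaced with "-" so they
# cannot create accidental extra levels.
def build_namespace(*parts)
  parts.map { |part| part.to_s.strip.gsub(/[.\s]+/, "-") }.join(".")
end

puts build_namespace("EcommerceApp", "AWS-12542383", "Cart", "Add")
# prints EcommerceApp.AWS-12542383.Cart.Add

# On a real host the server component can come from the machine itself.
puts build_namespace("EcommerceApp", Socket.gethostname, "Cart", "Add")
```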
Counters
Designed to measure the number of events in a minute, a counter can be incremented or decremented in code. Counters are great for measuring items like number of incoming connections, number of resource requests, or even the frequency of a method or function call. The call to push a StatsD metric is easy. AppFirst has developed collectors for quite a few languages, but in this example I’ll use Ruby:
require 'socket'
require 'afstatsd'
# instantiate a metric connector
statsd = Statsd.new
# define our namespace
statsd.namespace = "myApplication.#{Socket.gethostname}"
# Your Code
# strip all whitespace from the product name so it forms a single bucket segment
statsd.increment "ProductRequest.#{product.gsub(/\s+/, '')}"
This example generates a metric named myApplication.<SERVER>.ProductRequest.<ProductName> and increments the count by one each time it is executed. In a shopping cart method, you could see product velocity (your product managers and marketing team will love you for this). Using StatsD you can increment or decrement counters, giving you flexibility for displaying information on a per-minute basis.
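To make the per-interval behavior concrete, here is a small in-memory tally showing how increments and decrements net out and then reset when an interval is reported. It is an illustration of the semantics only: the class name CounterTally is invented, and this is not how the AppFirst collector or the afstatsd library is implemented.

```ruby
# Illustrative only: an in-memory tally showing how increments and
# decrements net out within one reporting interval.
class CounterTally
  def initialize
    @counts = Hash.new(0)
  end

  def increment(bucket, by = 1)
    @counts[bucket] += by
  end

  def decrement(bucket, by = 1)
    @counts[bucket] -= by
  end

  # Returns the totals for the interval and starts a fresh one.
  def flush
    totals = @counts
    @counts = Hash.new(0)
    totals
  end
end

tally = CounterTally.new
3.times { tally.increment("Cart.Add") }
tally.decrement("Cart.Add")
puts tally.flush["Cart.Add"] # prints 2
puts tally.flush["Cart.Add"] # prints 0, because the previous flush reset it
```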
Gauges
Counters are great for minute by minute instrumentation, but if you are looking to monitor values that shouldn’t change often, a gauge is more appropriate. Gauges are designed for metrics of volume: memory pools, process/thread counts or how many database connections are open. The idea is that on a minute by minute basis, the values either don’t change much or your StatsD metric isn’t called frequently. If you are using a polled data script, or even a cron job to monitor the status of a service, then a gauge is a great choice.
For our example, let's talk about an iptables script. This script runs every so often to check your log files for failed login attempts. If the script finds a source with multiple failed login attempts, it creates a temporary iptables rule and writes the information to a file. The temporary rules have an exponential blocking period: 5 minutes for the first event, 25 minutes for the second event, 2 hours and 5 minutes for the third event, and so on up to 1 month. The information you want to keep track of is the number of iptables rules in effect. With normal operations this number should be consistent. With StatsD you could add the following lines to your bash script:
# Code from Etsy statsd-client.sh example @ github
# Setup UDP Socket
exec 3<> /dev/udp/${STATSD_HOST:-127.0.0.1}/${STATSD_PORT:-8125}
# Send data over Socket
printf "iptables.totalrules:%s|g" "$numberOfRules" >&3
# Close Socket
exec 3<&-
exec 3>&-
With this script addition, if you have a variable numberOfRules, you can push that value to StatsD and have it collected on the AppFirst big data store for aggregation and correlation.
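If the polling script is written in Ruby rather than bash, the same gauge can be sent with the standard library alone. This is a minimal sketch under the same assumptions as the bash version (a StatsD-compatible listener on UDP port 8125, overridable through STATSD_HOST and STATSD_PORT). The method name send_gauge and the placeholder value are mine, not part of any library.

```ruby
require "socket"

# Sends one gauge reading as a single UDP datagram in the StatsD line format.
# Host and port defaults mirror the bash example (127.0.0.1:8125).
def send_gauge(bucket, value,
               host: ENV.fetch("STATSD_HOST", "127.0.0.1"),
               port: Integer(ENV.fetch("STATSD_PORT", "8125")))
  socket = UDPSocket.new
  socket.send("#{bucket}:#{value}|g", 0, host, port)
ensure
  socket&.close
end

number_of_rules = 7 # placeholder; a real script would count the active rules
send_gauge("iptables.totalrules", number_of_rules)
```

Because UDP is fire-and-forget, the call does not fail when nothing is listening, which keeps a monitoring hiccup from breaking the script that reports to it.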
Timers
The last item I'll discuss is timers. Timers are easy enough to understand: they report a time value so you can measure where the performance bottlenecks in your application may be. Once timers have shown you the bottlenecks, you can decide where refactoring time is best spent.
The easiest way to record this metric is to wrap the function call itself in a timer. A Ruby example:
require 'socket'
require 'afstatsd'
# instantiate a metric connector
statsd = Statsd.new
# define our namespace
statsd.namespace = "myApplication.#{Socket.gethostname}"
# Your Code
# parentheses are required so the block binds to statsd.time, not to the string
statsd.time('Cart.ProcessPayment') { cart_processing }
This script will execute your cart_processing method and store the execution time in a StatsD metric named myApplication.<SERVER>.Cart.ProcessPayment.
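Conceptually, a timer call does little more than read a clock before and after the block. The sketch below shows that idea with the standard library only. It is not the afstatsd implementation: measure_ms is a hypothetical helper, and the cart_processing body is a stand-in for real work.

```ruby
# Runs the block and returns [result, elapsed_ms]. A monotonic clock is
# used so wall-clock adjustments cannot skew the measurement.
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [result, (elapsed * 1000).round]
end

# Stand-in for the real payment logic.
def cart_processing
  sleep 0.05
  :paid
end

result, elapsed_ms = measure_ms { cart_processing }
puts "Cart.ProcessPayment:#{elapsed_ms}|ms" # the timer in StatsD line format
puts "result: #{result}"
```

Returning the block's result alongside the timing means existing code can be wrapped without changing what it returns.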
These simple examples show that the range of information you can collect with StatsD is nearly limitless. With StatsD libraries available for a large number of languages, you can easily add StatsD metrics to your application and then visualize that information on the AppFirst dashboard.