September 2nd 2016

By Chris White

TAGS:
Open Source

Software Evaluation, Part One: Basic Suitability

When working on a software project, making use of third-party products, tools, or libraries can often save a lot of time and effort. But we have to be careful. If we choose poorly, we might be causing work further down the line.

As a member of Engine Yard’s distribution team, I am constantly reviewing open source projects for their inclusion in our stack. Having been through this many times in the past, I thought I’d share with you some of the things I take into consideration.

In part one of this miniseries, we’ll take a look at indicators of basic suitability. And in part two, we’ll look at more in-depth evaluation areas such as ease of modification, dependencies, size, and community.

How much weight you put on any of these factors is up to you, and will often vary from one project to the next based on how critical it is, how you expect to use it, and so on.

Sometimes you won’t be able to satisfy everything, but that doesn’t have to be a showstopper. More often than not, compromises have to be made. The goal in assessing any project is to understand the risks involved.

Breaking Changes

One of the most important things to consider when evaluating a project is how they handle breaking changes. A breaking change is any change that is not backwards compatible with the previous Application Binary Interface (ABI), Application Programming Interface (API), or any other interface between the software and the systems that use it.

If you upgrade a dependency to a new version with no breaking changes then all your existing code should continue to run just fine. Bugs can be fixed, features can be added, but existing features shouldn’t go away or behave differently than expected.

One way of handling breaking changes that is gaining popularity in the open source world is called semantic versioning.

Semantic versioning, in a nutshell, says that all version numbers look like X.Y.Z (major, minor, and patch). Releases that only contain backwards-compatible bugfixes bump the Z number. Releases that add features in a backwards-compatible way bump the Y number. And releases that introduce breaking changes (i.e. modify existing functionality in a way that is not backwards compatible) bump the X number.

This completely separates release numbers from the big marketing pushes that have in the past focused on major version changes. Instead, it turns version numbers into a sort of contract between the project maintainers and the downstream users.

For instance, if you’re on version 1.4.2 and version 1.5.0 is released, you know from the version number that you should be able to upgrade to get new features without having to worry about any of your existing code breaking.

But if you see a 2.0.0 release, you know there are breaking changes. And hopefully the project has included documentation in the release notes that gives you more info (such as what to expect, how to change your code, and so on).
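To make the contract concrete, here is a minimal sketch that classifies an upgrade from the version numbers alone. The function names are hypothetical, and it assumes plain X.Y.Z strings with no pre-release or build suffixes, which real projects often use.

```python
def parse_version(text):
    """Split an 'X.Y.Z' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_upgrade(current, candidate):
    """Describe what kind of bump takes `current` to `candidate`."""
    cur = parse_version(current)
    new = parse_version(candidate)
    if new <= cur:
        return "not an upgrade"
    if new[0] != cur[0]:
        return "major: expect breaking changes"
    if new[1] != cur[1]:
        return "minor: new features, existing code should keep working"
    return "patch: bugfixes only"


# The two scenarios described above.
print(classify_upgrade("1.4.2", "1.5.0"))
print(classify_upgrade("1.4.2", "2.0.0"))
```

Of course, this only tells you what the maintainers are promising. Whether they keep that promise is a separate question.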

To get a better idea of how an open source project handles breaking changes, it’s a good idea to look at their release policies. Projects that take a lot of care over this have likely documented exactly how they handle things. Otherwise, simply looking at past releases and their release notes should give you some sort of idea what to expect.

If the project doesn’t document their releases, as a last resort, the project’s source repository commit log can give an idea of what has changed since the last release. Though, given how tedious this will be, whether it is worth doing each time you want to use a new release is going to be an important question.

Testing

Of course, saying that a new release does not contain any breaking changes and actually being sure of that are two different things. And one of the best ways to validate that promise is with a comprehensive test suite.

Taking a look at the state of a project’s tests is one way to gauge how serious they are about not introducing breaking changes or regressions.

But more than that, you can actually double check new releases by running the previous test suite against them. This should give you an idea of what to expect before you start trying to integrate the new release into your project.

Better yet, you can write your own tests that test the areas of your project that interface with external dependencies.
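As a rough sketch of what such a test might look like, the example below uses Python’s standard json module as a stand-in for a third-party dependency. The load_settings function and the settings fields are hypothetical. The point is that the tests pin down only the behavior your own code relies on, so they can be rerun unchanged against a new release.

```python
import json
import unittest


def load_settings(text):
    """Hypothetical project code that leans on a dependency (here, json)."""
    settings = json.loads(text)
    return {"name": settings["name"], "retries": int(settings.get("retries", 3))}


class DependencyBoundaryTest(unittest.TestCase):
    """Checks the behavior we rely on at the boundary with the dependency."""

    def test_reads_expected_fields(self):
        result = load_settings('{"name": "api", "retries": "5"}')
        self.assertEqual(result, {"name": "api", "retries": 5})

    def test_default_retries(self):
        self.assertEqual(load_settings('{"name": "api"}')["retries"], 3)

    def test_malformed_input_raises_value_error(self):
        # Our callers catch ValueError, so the dependency must keep raising it.
        with self.assertRaises(ValueError):
            load_settings("{not json")


def run_boundary_tests():
    """Run the boundary tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DependencyBoundaryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_boundary_tests() before and after upgrading the dependency gives a quick signal about whether anything you depend on has changed.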

In the event that a test breaks, it’s important to figure out why that is. Sometimes the test itself is broken, and other times it may be something wrong with the code being tested. This is going to make a big difference when deciding whether to upgrade or not.

Release Branches

A project that uses a system similar to semantic versioning is likely going to end up with multiple active release branches, also called release lines.

Release branches typically correspond with major version numbers. So, there will be a 1.x release branch from which all 1.x releases (1.0, 1.1, 1.2, and so on) are cut. And there will be a 2.x release branch, from which all 2.x releases (2.0, 2.1, 2.2, and so on) are cut.
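The mapping from releases to release lines can be sketched in a few lines. This assumes purely numeric, dot-separated version strings, and the function name is made up for illustration.

```python
from collections import defaultdict


def group_into_release_lines(versions):
    """Group version strings by major number, e.g. '1.2' goes under '1.x'."""
    lines = defaultdict(list)
    for version in versions:
        parts = tuple(int(piece) for piece in version.split("."))
        lines[f"{parts[0]}.x"].append(parts)
    # Sort numerically so that 1.10 comes after 1.9, then turn back into strings.
    return {
        line: [".".join(str(n) for n in parts) for parts in sorted(members)]
        for line, members in lines.items()
    }


releases = ["1.0", "2.1", "1.2", "2.0", "1.1"]
print(group_into_release_lines(releases))
```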

If a project jumps from version 1.5.0 to version 2.0.0, there is a good chance the maintainers will keep the 1.x branch around for people who are not ready for the breaking changes in the 2.x release line.

How many of these release branches are actively maintained, and to what extent, will differ from project to project.

Some projects will keep many release branches around, with a few of them dating back several years. They may even have dedicated teams of people who continue to backport features and apply bugfixes and security patches.

Other projects may only have a few, or even no maintained release branches. Sometimes all you have is a master branch with changes made to it continuously.

What you need from a project is going to depend a lot on how you plan to use it, what sort of software it is, and so on.

For example, a comprehensive set of well managed release lines for a database might be very prudent. And it might even be good to select and work with one of the long term support releases instead of a more recent one. In this case, you are sacrificing nice-to-have functionality from the “latest and greatest” release for the additional stability you get from using an older, actively maintained release line.

But for something less critical, maybe you’re happy to deal with a project that is regularly “moving fast and breaking things”. Though even then, there are ways to mitigate risk. For example, you can fork the project yourself, essentially creating your own pseudo release line. From there you can then cherry-pick the changes you want.

Licensing

And finally we come to licensing.

In the open source world, compatibility between licenses can be an issue. A complex issue. In fact, so complex that I won’t even attempt to go into it here.

Instead, I recommend you read up on whatever license your project uses. There are probably resources available that discuss its compatibility with other licenses. And of course, when in doubt, seek actual legal advice.

But given that licensing can make or break a project’s suitability, why did I leave this to the end of the post? Well, sometimes there are ways around problematic licensing. And if you look at licensing first, there’s a risk you’ll see an incompatible license and discount the project before looking into your options.

What sort of options? Well, depending on the project, my first suggestion would be to ask the maintainers if they would consider relicensing, or making the software available under an additional license. This sounds like a big ask, but it does happen. And especially if the project is a small one, some maintainers might be more than happy to accommodate you.

There’s also the possibility of relicensing your own code!

Wrap-Up

In this post we looked at a few of the basic top-level things you need to consider when you’re deciding whether to add a third-party project as a dependency for your code.

How does the project handle breaking changes? Do they do it with care, or are you going to run into problems? Do they have tests that validate their promises around breaking changes? Do you have your own tests that do the same?

How does the project handle release branches? Are there lots of well maintained release branches offering a degree of choice with regard to stability and feature set? Or are you going to be forking a master branch and doing it yourself?

Finally, is the license compatible with your own license? And if not, is there anything you can do about it?

If you’ve answered these questions, you should have a fairly decent view of whether a third-party software project is going to be a good fit and whether the risks associated with adding it as a dependency are going to be worth it.

However, let’s not stop there. In part two of this miniseries we’ll go deeper and start to look at the code itself, as well as the community that surrounds it. Stay tuned!

About Chris White

Chris White is a Distribution team member at Engine Yard and works on the automation of many of the virtualization solutions used at the company. For more than 10 years, he has enjoyed hacking away on the Gentoo Linux Distribution. While not checking out what’s under the hood of widely used technologies, Chris also enjoys brushing up on his Japanese.