MongoDB Best Practices
Hello from the Engine Yard Data Team! We wanted to let you know what we’ve been up to since the last time we blogged.
When the team was formed earlier in the year, our first job was to expand our stack with MongoDB. However, we felt it would be a disservice to you, our customers, if we added a NoSQL datastore into the stack without first updating the relational databases that we support. So, we decided to pause MongoDB development to update both MySQL and PostgreSQL. As of today, MySQL 5.5 has gone into Beta, PostgreSQL 9.1 Beta is coming soon, and we plan to GA both releases in Q1 2012.
As we focus more on MongoDB, we have assisted several customers with custom MongoDB environments. During this process we discovered a variety of potentially problematic settings. So, we wanted to take this opportunity to share Engine Yard’s best practices for MongoDB.
If you have a custom installation of MongoDB, please make sure to check your installation against this post. We recommend that you make changes as necessary. If you need help (until we offer Mongo in our product), our Professional Services organization can lend you a hand.
General NoSQL best practices
Many articles have been written to address the NoSQL selection process. The factors that influence your choice of database are your application’s needs for read/write throughput, durability, data consistency, latency, and so on. These criteria are nicely summarized by Nathan Hurst in his “Visual Guide to NoSQL Systems”.
Selecting the right NoSQL database is beyond the scope of this post, but please do your research. It will pay off in the end, as no single solution fits all scenarios. This article assumes that your research has led you to choose MongoDB for your application. We at Engine Yard recommend that you:
Test exhaustively
Test within the context of your application and against traffic patterns that are representative of your production system. A test environment that does not resemble your production traffic will block you from discovering performance bottlenecks and architectural design flaws. Examine your queries closely and always collect metrics.
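As a rough illustration of collecting metrics while you test, the sketch below times a repeated operation and reports latency percentiles. The names `measure_latencies` and `percentile` are hypothetical helpers, and the `operation` callable stands in for whatever query your application actually runs.

```python
import time


def measure_latencies(operation, runs):
    """Call operation() `runs` times and return each call's duration in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Smallest rank whose share of the samples is at least pct percent.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[int(rank) - 1]


if __name__ == "__main__":
    # Placeholder workload; replace with a real query against a test cluster.
    samples = measure_latencies(lambda: sum(range(1000)), runs=200)
    for pct in (50, 95, 99):
        print("p%d: %.6f s" % (pct, percentile(samples, pct)))
```

Tail percentiles (p95, p99) tend to reveal bottlenecks that an average hides, which is why the sketch reports them rather than a single mean.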
Don’t assume that what worked for your RDBMS will translate
Whatever worked on your SQL database may not work on MongoDB, so make sure that your expectations are realistic and aligned with the features of the database. For better performance, design your documents and queries according to what 10gen recommends. Understand that your application might need to be re-architected in order to migrate to a non-relational data store. Read “The cost of Migration” for more information on migrating to NoSQL.
Think about the consistency and durability needs of your data
We cannot emphasize this enough. During your research you will find that MongoDB offers durability through replication. We never recommend running a standalone MongoDB in production; make sure you understand why.
Understand what to expect from EBS volumes
If you are an Engine Yard Cloud customer (AWS EC2), you should know that the performance of Amazon’s Elastic Block Store (EBS) can be inconsistent. Collect throughput metrics over time when benchmarking your application and plot your data. Engine Yard Managed customers do not have this limitation.
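One way to quantify that inconsistency before plotting is to summarize the spread of your throughput samples. The following is a minimal sketch; `summarize_throughput` is a hypothetical helper and the sample values are invented for illustration, not measurements.

```python
import statistics


def summarize_throughput(samples_mb_per_s):
    """Summarize throughput samples (MB/s) taken at regular intervals."""
    mean = statistics.mean(samples_mb_per_s)
    stdev = statistics.pstdev(samples_mb_per_s)
    return {
        "min": min(samples_mb_per_s),
        "max": max(samples_mb_per_s),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean else 0.0,
    }


if __name__ == "__main__":
    # Invented numbers; real samples would come from your own benchmark runs.
    steady = [40.0, 41.0, 39.0, 40.0]
    noisy = [40.0, 12.0, 55.0, 20.0]
    print("steady cv: %.3f" % summarize_throughput(steady)["cv"])
    print("noisy cv:  %.3f" % summarize_throughput(noisy)["cv"])
```

A higher coefficient of variation across runs suggests that a single benchmark pass is not representative and that you should sample over a longer period.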
MongoDB Best Practices
Here are the guidelines we follow as we work on releasing MongoDB into our stack.
Always use replica sets
Replica sets provide high availability through automatic failover. If your primary node fails, a secondary node will be elected as primary and your cluster will remain functional. We will not support a non-replicated MongoDB for production environments. Consider a hosted solution if the cost of replicating Mongo is too high. Engine Yard has established partnerships with MongoHQ and MongoLab. See our Partners page for more information regarding offerings for Engine Yard customers.
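To make the failover behavior more concrete, here is a small sketch that builds a replica set configuration document of the shape passed to `rs.initiate()` and computes how many members can fail while a majority remains to elect a primary. The set name and hostnames are hypothetical, and the helper names are ours.

```python
def build_replica_set_config(set_name, hosts):
    """Build a replica set configuration document for the given 'host:port' members."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


def tolerated_failures(member_count):
    """Voting members that can be lost while a majority can still elect a primary."""
    majority = member_count // 2 + 1
    return member_count - majority


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    config = build_replica_set_config(
        "rs0", ["mongo1:27017", "mongo2:27017", "mongo3:27017"]
    )
    print(config)
    print("members that can fail:", tolerated_failures(len(config["members"])))
```

Note that a two-member set tolerates no failures under this majority rule, which is one reason a third voting member is commonly added.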
Keep current with versions
Please keep your version of MongoDB current. 10gen rolls out numerous fixes within each release that help your cluster run more smoothly. Version 2.0.x includes significant performance and concurrency improvements, index changes, bug fixes, and a compaction command, and it even makes it easier to upsize your cluster. If you are still using 1.6.3, please be sure to upgrade as soon as possible.
Don’t run MongoDB on 32-bit systems
MongoDB has a ~2.5GB data limit on 32-bit systems. Its storage engine uses memory-mapped files for performance and they are tied to the available memory addressing. With Engine Yard Cloud, you should use a Large instance as your base installation. We will only support production MongoDB on 64-bit instances.
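The arithmetic behind the limit is easy to check: a 32-bit pointer can address only 4 GiB in total, and part of that is reserved for the operating system and the process itself. The sketch below shows the calculation and a quick pointer-width check. It inspects the Python interpreter it runs in, so it only hints at the host's architecture and says nothing definitive about the `mongod` binary; both helper names are ours.

```python
import struct


def addressable_gib(pointer_bits):
    """Total address space, in GiB, reachable with pointers of the given width."""
    return 2 ** pointer_bits / 2 ** 30


def is_64bit_process():
    """Return True when this Python interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


if __name__ == "__main__":
    print("32-bit address space: %.0f GiB" % addressable_gib(32))
    print("running as a 64-bit process:", is_64bit_process())
```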
Turn journaling on by default
MongoDB supports write-ahead journaling of operations to facilitate crash recovery and node durability. We strongly recommend that you turn on journaling by default.
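If you want to audit existing installations, a check along these lines can flag configuration files that do not turn journaling on explicitly. This is a sketch that assumes the `key = value` configuration file format and the `journal` / `nojournal` options; the function names and the sample file contents are hypothetical.

```python
def parse_mongod_conf(text):
    """Parse 'key = value' lines into a dict, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def journaling_explicitly_enabled(settings):
    """True only if journal is set to true and nojournal is not."""
    return settings.get("journal") == "true" and settings.get("nojournal") != "true"


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "dbpath = /data/mongodb\njournal = true  # crash recovery\n"
    print(journaling_explicitly_enabled(parse_mongod_conf(sample)))
```

The check is deliberately strict: it reports a file that relies on a build's default as not explicitly enabled, so that the setting is visible in your recipes.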
Mind the location of your data files
Check your recipes to make sure that your MongoDB data files exist in a persistent volume (example: /data/mongodb). Using ephemeral drives is possible, but you should be extremely careful when deciding to do so, as it will influence your cluster architecture. We recommend using EBS for your MongoDB data.
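A simple guard in your deployment tooling can catch a data path that lands on the wrong volume. The sketch below assumes that `/data` is where your persistent volume is mounted, matching the example path above; the `PERSISTENT_MOUNTS` list and the function name are hypothetical and should be adapted to your environment.

```python
import posixpath

# Assumption: mount points backed by persistent storage in your environment.
PERSISTENT_MOUNTS = ("/data",)


def on_persistent_volume(dbpath, mounts=PERSISTENT_MOUNTS):
    """Return True if dbpath is one of the given mount points or lies beneath one."""
    path = posixpath.normpath(dbpath)
    for mount in mounts:
        mount = posixpath.normpath(mount)
        if path == mount or path.startswith(mount + "/"):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("/data/mongodb", "/mnt/mongodb"):
        print(candidate, on_persistent_volume(candidate))
```

This compares path strings only; it does not verify that the mount point is actually backed by an EBS volume on the running instance.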
Your working set should fit in memory
Being able to keep the working set (and indexes) in memory is an important factor in overall cluster performance. If you notice the number of page faults increasing, there is a very high chance that your working set is larger than your available RAM.
You have two options when your data exceeds your available RAM: increasing the size of your MongoDB instance or sharding. We recommend increasing instance size first.
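To watch for rising page faults, you can compare two snapshots of the server's status output taken some time apart. The sketch below assumes each snapshot is a dictionary with an `extra_info.page_faults` counter, as reported by `serverStatus` on Linux; the snapshot values shown are invented and the function name is ours.

```python
def page_fault_rate(earlier, later, seconds):
    """Page faults per second between two serverStatus-style snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    faults = later["extra_info"]["page_faults"] - earlier["extra_info"]["page_faults"]
    return faults / seconds


if __name__ == "__main__":
    # Invented snapshots; real ones would come from the serverStatus command.
    first = {"extra_info": {"page_faults": 1000}}
    second = {"extra_info": {"page_faults": 1600}}
    print("faults/sec: %.1f" % page_fault_rate(first, second, seconds=60))
```

A rate that keeps climbing under a steady workload is the signal described above; what counts as too high depends on your storage and traffic, so track the trend rather than a fixed number.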
Scale up if your metrics show heavy use
If your instance shows a load over 65%, you should consider scaling up. Your load should be consistently below this threshold during normal operations. This also impacts recovery and vertical scaling scenarios. If you need to increase your instance size, AWS recommends the following upgrade path: Large, Extra Large, High Memory 4XL. We have also observed less latency on larger EBS volumes.
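The rule of thumb above can be expressed as a small decision helper. This sketch makes two assumptions that you should adjust: load is given as a fraction between 0 and 1, and "consistently" is approximated by the average of the samples. The function name and sample values are hypothetical.

```python
UPGRADE_PATH = ["Large", "Extra Large", "High Memory 4XL"]
LOAD_THRESHOLD = 0.65


def next_instance_size(current, load_samples, threshold=LOAD_THRESHOLD):
    """Return the next size on the upgrade path if average load exceeds the threshold."""
    average = sum(load_samples) / len(load_samples)
    if average <= threshold:
        return current
    index = UPGRADE_PATH.index(current)
    # Stay at the largest size once the end of the path is reached.
    return UPGRADE_PATH[min(index + 1, len(UPGRADE_PATH) - 1)]


if __name__ == "__main__":
    print(next_instance_size("Large", [0.70, 0.80, 0.72]))
    print(next_instance_size("Large", [0.30, 0.45, 0.40]))
```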
Be careful when sharding
Sharded installations require careful understanding of your application’s data access patterns. Please take the time to understand how MongoDB sharding works and if you really need it. Also remember that selecting a good sharding key is important as it will affect your performance.
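To see why the choice of key matters, consider a toy simulation of range-based chunks. It compares new inserts whose key always increases with inserts whose key values are spread across the range. The chunk boundaries, key ranges, and the MD5-derived spreading function are all invented for illustration and do not reflect how MongoDB assigns chunks internally.

```python
import bisect
import hashlib


def chunk_counts(keys, boundaries):
    """Count keys per range chunk; chunk i covers [boundaries[i-1], boundaries[i])."""
    counts = [0] * (len(boundaries) + 1)
    for key in keys:
        counts[bisect.bisect_right(boundaries, key)] += 1
    return counts


def spread_key(n):
    """Map n to a pseudo-random value in 0..999 (illustration only)."""
    digest = hashlib.md5(str(n).encode()).hexdigest()
    return int(digest[:8], 16) % 1000


if __name__ == "__main__":
    boundaries = [250, 500, 750]   # four chunks over existing keys 0..999
    new_ids = range(1000, 1100)    # 100 new documents
    print("increasing key:", chunk_counts(new_ids, boundaries))
    print("spread key:    ", chunk_counts([spread_key(n) for n in new_ids], boundaries))
```

With the increasing key, every new document falls into the last chunk, so a single shard absorbs all of the write load; the spread key distributes inserts across chunks at the cost of less efficient range queries on that key.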
Config servers are critical to the health of your cluster. You MUST have 3 configuration servers in a sharded production environment. NEVER delete their data, always make sure you back them up frequently, and refer to them, if you can, by name using an /etc/hosts file (this will make your cluster more resilient).
Config servers are light processes but they must also live on 64-bit instances. Don’t put all 3 config servers in the same instance! You can schedule a consultation with Engine Yard Professional Services if you are considering a sharded install.
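The config server rules above lend themselves to an automated sanity check. The sketch below validates a list of `host:port` addresses against three of them: exactly three servers, no two on the same host, and hostnames rather than IP addresses. The function name and the example hostnames are hypothetical.

```python
import ipaddress


def check_config_servers(addresses):
    """Return a list of problems found in 'host:port' config server addresses."""
    problems = []
    hosts = [address.rsplit(":", 1)[0] for address in addresses]
    if len(addresses) != 3:
        problems.append("expected exactly 3 config servers, got %d" % len(addresses))
    if len(set(hosts)) != len(hosts):
        problems.append("two or more config servers share a host")
    for host in hosts:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so it is a name
        problems.append("%s is an IP address; prefer a hostname" % host)
    return problems


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(check_config_servers(["cfg1:27019", "cfg2:27019", "cfg3:27019"]))
    print(check_config_servers(["cfg1:27019", "cfg1:27020"]))
```

An empty list means the addresses pass these checks; it does not confirm that the servers are reachable or backed up.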
Use Mongo MMS to graphically monitor your service
If you are not doing so already, try using Mongo MMS. 10gen is actively developing this product and it is proving to be an excellent way to visually evaluate the health of your cluster.
Keep up with MongoDB resources
Keep informed, as things change rapidly. MongoDB resources:
- Documentation: http://www.mongodb.org/display/DOCS/Home
- Google Group: http://groups.google.com/group/mongodb-user
- Bugs: https://jira.mongodb.org
- Blog: http://blog.mongodb.org/
Want to hear more? Give us feedback!
Your feedback is used to speed release processes, plan blog posts, prioritize roadmap tickets and much more! Please let us know if you try any of the alpha/beta releases, or if you have any questions. We will continue to ensure that our relational databases are current and optimally configured, so expect ongoing improvements.
Share your thoughts with @engineyard on Twitter