November 15th 2013

By Ines Sombra

TAGS:
Technology
Databases

On the state of HStore

Earlier this year, Engine Yard began to sponsor some exciting development as part of our first ever OSS Data Sponsorship. The sponsored team (which includes PostgreSQL hackers Oleg, Teodor, and Alexander) recently got back from PGConf.EU, where they gave two talks: "Next Generation of GIN" and "Binary storage for nested data structures and application to hstore data type".

We had an opportunity to ask Oleg about their experience. The team has been quite busy building the next generation of HStore, the extension that allows you to store key/value (or schema-less) data in PostgreSQL. Even more updates have been committed to HStore v2.0 and we couldn’t be more excited to share them with you!

Oleg’s blog post reviews their latest release in exciting detail (cross-posted) but here are a few highlights:

HStore support for scalars and types

We reworked the binary storage for nested hstore to add support for scalars and types. The supported types are numeric, boolean, string, and NULL.

postgres=# select 'a'::hstore, 't'::hstore;
hstore | hstore
--------+--------
"a"    | t

We added a bunch of new operators and functions to work with these types (slides 42-46). Types are a huge improvement, since hstore and json can now be converted to each other without any problem. From slide 40:

=# select '{"a":3.14}'::json::hstore::json;
json
---------------
{"a": 3.14}
=# select '3.14'::json::hstore::json;
json
----------
3.14

HStore and MongoDB Performance

We added a performance comparison with MongoDB. MongoDB is very slow at loading data (slide 59): 8 minutes versus 76s for hstore. Sequential scan speed is the same, about 1s. MongoDB's index scan is very fast: 1ms versus 17ms for hstore with the GIN fast-scan patch. But we managed to create a new opclass (slides 61-62) for hstore that hashes full paths concatenated with values, and got 0.6ms, which is faster than MongoDB! In the table below, GIN++ is GIN with the fast-scan patch.

Method	HStore seqscan	HStore GiST	HStore GIN	HStore GIN++	HStore GINHash	MongoDB
Index size		64MB	815MB	815MB	349MB	100MB
Query time	0.98s	0.3s	0.1s	0.017s	0.0007s	0.001s
Speedup		3x	10x	60x	1400x	1000x
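
The Speedup row appears to be each method's query time relative to the sequential scan, rounded. As a quick check, here is a small Python sketch that recomputes it from the Query time row. The dictionary and variable names are ours for illustration and do not come from the slides.

```python
# Query times in seconds, copied from the table above.
query_time = {
    "HStore seqscan": 0.98,
    "HStore GiST": 0.3,
    "HStore GIN": 0.1,
    "HStore GIN++": 0.017,
    "HStore GINHash": 0.0007,
    "MongoDB": 0.001,
}

# The sequential scan is the baseline every other method is compared to.
baseline = query_time["HStore seqscan"]

# Speedup = sequential-scan time divided by the method's own time.
speedup = {method: baseline / t for method, t in query_time.items()}

for method, factor in speedup.items():
    print(f"{method:15s} {factor:8.1f}x")
```

The unrounded ratios come out to roughly 3.3, 9.8, 58, 1400, and 980, which agrees with the rounded 3x, 10x, 60x, 1400x, and 1000x in the table.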

It’s worth noting that the MongoDB index is a very “narrow” index, while hstore’s indexes can speed up more kinds of queries.
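
To give a feel for what "hashing of full-paths concatenated with values" could mean, here is a toy Python model of the idea. It is only a sketch under our own assumptions and not the team's opclass: nested dicts stand in for nested hstore values, CRC32 stands in for whatever hash function the real code uses, and every name (`flatten`, `entry_hash`, `HashedPathIndex`, `lookup`) is hypothetical.

```python
import zlib
from collections import defaultdict

_MISSING = object()  # sentinel for "path not present in this document"


def flatten(doc, path=()):
    """Yield (full_path, value) pairs for every leaf of a nested dict."""
    for key, value in doc.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


def entry_hash(path, value):
    """Hash a full path concatenated with its value (CRC32 is a stand-in)."""
    text = "\x00".join(path) + "\x01" + repr(value)
    return zlib.crc32(text.encode("utf-8"))


def lookup(doc, path):
    """Follow `path` into a nested dict; return _MISSING if it is absent."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return _MISSING
        doc = doc[key]
    return doc


class HashedPathIndex:
    """Toy inverted index: hash(path + value) -> set of row ids."""

    def __init__(self):
        self.rows = {}
        self.postings = defaultdict(set)

    def insert(self, row_id, doc):
        self.rows[row_id] = doc
        for path, value in flatten(doc):
            self.postings[entry_hash(path, value)].add(row_id)

    def contains(self, query):
        """Return ids of rows that contain every leaf of `query`."""
        wanted = list(flatten(query))
        if not wanted:
            return set(self.rows)
        candidates = None
        for path, value in wanted:
            ids = self.postings.get(entry_hash(path, value), set())
            candidates = ids if candidates is None else candidates & ids
        # Recheck the stored rows, since distinct entries can share a hash.
        return {
            row_id
            for row_id in candidates
            if all(lookup(self.rows[row_id], p) == v for p, v in wanted)
        }


index = HashedPathIndex()
index.insert(1, {"user": {"name": "oleg", "active": True}, "tags": "gin"})
index.insert(2, {"user": {"name": "teodor", "active": True}, "tags": "gin"})

print(index.contains({"user": {"name": "oleg"}}))
print(index.contains({"tags": "gin"}))
```

In this toy model a containment query touches one small posting list per path and value pair, which is one plausible reason such an index can be both compact and fast. The trade-off in the sketch is that it can only answer exact path-plus-value lookups.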

For more information, see Part 1, Part 2, and "GIN fast-scan and speedup of @> operator" in their HStore v2.0 blog series. The team has formally submitted their work to the PostgreSQL 9.4 commitfest, where it awaits the PostgreSQL community’s feedback.

We are very excited to see the work we have sponsored reach the next stage of acceptance towards becoming part of the 9.4 release.

What does it mean for you?

Oleg, Teodor, and Alexander’s work not only means that you’ll have a better and more powerful HStore available, it also means that PostgreSQL’s native json support has the opportunity to share its binary representation with HStore. This ultimately results in faster and more efficient support for json.

If you still haven’t tried HStore in your application and want to give it a go, we have a few resources to help you get started:

See our own J. Austin Hughey’s guide to using PostgreSQL HStore in a Rails application on Engine Yard Cloud.
If you have RailsCasts Pro, there is a step-by-step guide on using HStore.
Craig Kerstiens answers the question of hstore vs. JSON - Which to use in Postgres.
Andrew Godwin gave an excellent presentation at All Your Base on using HStore for Rapid Schema Development with PostgreSQL.

Finally, the next release of PostgreSQL could use your help! The third commit fest (out of four in the PostgreSQL 9.4 development cycle) starts this Friday, November 15th, and patch reviewers are needed. Don’t be intimidated: all you need to know about reviewing patches has been explained.
