On the state of HStore

Earlier this year, Engine Yard began to sponsor some exciting development as part of our first ever OSS Data Sponsorship. The sponsored team (which includes PostgreSQL hackers Oleg, Teodor, and Alexander) recently got back from PGCon EU where they gave two talks - Next Generation of GIN and Binary storage for nested data structures and application to hstore data type.

We had an opportunity to ask Oleg about their experience. The team has been quite busy building the next generation of HStore, the extension that allows you to store key/value (or schema-less) data in PostgreSQL. Even more updates have been committed to HStore v2.0 and we couldn’t be more excited to share them with you!

Oleg’s blog post reviews their latest release in exciting detail (cross-posted) but here are a few highlights:

HStore support for scalars and types

We reworked the binary storage for nested hstore to add scalars and types support. Types are now supported in hstore. They are numeric, boolean, strings and NULL.

postgres=# select 'a'::hstore, 't'::hstore;
hstore | hstore
--------+--------
"a"    | t

We added a bunch of new operators and functions to work with these types (slides 42-46). Types are huge improvement, since now hstore and json could be converted to each other without any problem. Slide 40:

=# select '{"a":3.14}'::json::hstore::json;
json
---------------
{"a": 3.14}
=# select '3.14'::json::hstore::json;
json
----------
3.14

HStore and MongoDB Performance

We added performance comparison with MongoDB. MongoDB is very slow on loading data (slide 59) - 8 minutes vs 76s__Sequential scan speed is the same - about 1s, index scan is very fast - 1ms vs 17 ms with GIN fast-scan patch. But we managed to create a new opclass (slides 61-62) for hstore using hashing of full-paths concatenated with values and got 0.6ms, which is faster than mongodb ! Here GIN++ is the GIN with fast-scan patch.

Method HStore seqscan HStore GiST HStore GIN HStore GIN++ HStore GINHash MongoDB
Index size 64MB 815MB 815MB 349MB 100MB
Query time 0,98s 0.3s 0.1s 0.017s 0.0007s 0.001s
Speedup 3x 10x 60x 1400x 1000x

It’s worth noticing, that MongoDB index is very “narrow” index, while hstore’s indexes could speed up more queries.

For more information see Part 1, Part 2 , and GIN fast-scan and speedup of @> operator of their HStore v2.0 blog series. The team has formally submitted their work to the Postgresql 9.4 commitfest where it awaits the PostgreSQL community’s feedback.

We are very excited to see the work we have sponsored reach the next stage of acceptance towards becoming part of the 9.4 release.

What does it mean for you?

Oleg, Teodor, and Alexander’s work not only means that you’ll have a better and more powerful HStore available, it also means that PostgreSQL’s native json support has the opportunity to share its binary representation with HStore. This ultimately results in a faster and more efficient support for json.

If you still haven’t tried HStore in your application and want to give it a go, we have a few resources to help you get started:

Finally the next release of PostgreSQL could use your help! The the third commit fest (out of four in the PostgreSQL 9.4 development cycle) will start this Friday November 15th and patch reviewers are needed. Don’t be intimidated, all you need to know about reviewing patches has been explained.