
July 26, 2013

Rainstor Offers Mastery of Time and Schema

Isaac Lopez

Rainstor announced this week that they have been awarded two new patents for their high-compression data storage system, which they say add function and flexibility to their database products. The patents, they explain, enable database admins to be masters of time and schema.

Rainstor has been addressing the challenges of big data since before it existed as a mass-conscious phenomenon. Birthed in the British Ministry of Defense, their database technology was created for the explicit purpose of storing large volumes of data that remain available for query regardless of how large the data set grows.

The company’s claim to fame is their hyper-efficient compression scheme, which they say can reduce storage footprints by factors that frankly look ridiculous in print (in their press release, they claim that reductions of up to 97% are possible under optimal conditions). Lofty numbers aside, Rainstor has many believers – their partner list looks like a veritable “Who’s Who” of systems technology, with giants such as IBM, HP, Dell, and Teradata all on it, not to mention three of the leading Hadoop distro vendors: Cloudera, Hortonworks, and MapR.

The Rainstor compression scheme, in essence, is about deduplication in the database – finding repeated values and storing each of them only once. However, it’s not merely about single values, says Rainstor Chief Architect Mark Cusack, but also about the patterns that develop within the data.

“It’s all about spotting those patterns within a relational table and storing them only once,” he explains. “As records are inserted into a Rainstor table, what we’re doing is spotting to see whether the values have existed in previous records and then basically storing a pointer as you go forward.”

Cusack explains that as new values come into the database, large repeating patterns develop that can each be stored as a single instance, so the database grows more slowly than it would if every record were stored in full. This is especially true, says Cusack, for financial services and telecom companies, which he identifies as Rainstor’s two key verticals.
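
To make the idea concrete, here is a minimal sketch of value-level deduplication, in which each distinct value is stored once and every record holds only pointers to it. This is not Rainstor's patented implementation, which the article does not describe in detail; the class name DedupTable and the sample trading records are hypothetical, and the sketch ignores the multi-value patterns Cusack mentions.

```python
class DedupTable:
    """Toy table that stores each distinct value once (illustrative only)."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.values = []   # every distinct value seen so far, stored once
        self.index = {}    # value -> its position in self.values
        self.rows = []     # each row is a tuple of pointers into self.values

    def insert(self, record):
        pointers = []
        for col in self.columns:
            value = record[col]
            # Store the value only if no earlier record contained it.
            if value not in self.index:
                self.index[value] = len(self.values)
                self.values.append(value)
            pointers.append(self.index[value])
        self.rows.append(tuple(pointers))

    def get(self, row_number):
        # Follow the pointers to rebuild the original record.
        pointers = self.rows[row_number]
        return {col: self.values[p] for col, p in zip(self.columns, pointers)}


# Hypothetical sample data: three trades with heavily repeated values.
table = DedupTable(["trader", "symbol", "side"])
table.insert({"trader": "alice", "symbol": "IBM", "side": "BUY"})
table.insert({"trader": "alice", "symbol": "IBM", "side": "SELL"})
table.insert({"trader": "bob", "symbol": "IBM", "side": "BUY"})

print(len(table.values), "distinct values stored for", len(table.rows) * 3, "cells")
```

In this toy example the nine cells of the three records are backed by only five stored values, and the gap widens as more records repeat values already seen.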

“We’ve built this from the ground up,” explains Cusack. “All of our IP is our own and allows us to do things that we think no other database can do, have optimizations that no other system can do – particularly around compression and query performance.”

Adding to their list of intellectual property, the company announced two new US patents this week. The first patent, says Cusack, makes it relatively easy for users to examine changes in a database over time. “What the archiving patent is all about is the ability to take snapshots of database tables and high level indexes which are really cheap to recreate and can be time stamped.”

“Imagine basically having a slider over a time scale which you can see the content of that database changing,” illustrates Cusack, who says that this feature is useful in applications such as financial services, where you might want to do a sort of “instant replay” of a trader’s actions during a particular day or period of time.
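
The "slider" Cusack describes can be pictured as a lookup over time-stamped snapshots: given a point in time, return the most recent snapshot taken at or before it. The sketch below is only an illustration of that lookup, not Rainstor's method; the SnapshotHistory class, the ISO-style timestamps, and the trade strings are all hypothetical, and for simplicity it copies the full table at each snapshot rather than sharing data between snapshots.

```python
import bisect


class SnapshotHistory:
    """Toy store of time-stamped table snapshots (illustrative only)."""

    def __init__(self):
        self.timestamps = []  # snapshot times, kept in ascending order
        self.snapshots = []   # snapshots[i] is the table content at timestamps[i]

    def take_snapshot(self, timestamp, rows):
        # Assumption: snapshots arrive in chronological order.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("snapshots must be taken in chronological order")
        self.timestamps.append(timestamp)
        self.snapshots.append(tuple(rows))

    def as_of(self, timestamp):
        # Find the latest snapshot taken at or before the requested time.
        i = bisect.bisect_right(self.timestamps, timestamp)
        if i == 0:
            return ()  # nothing had been recorded yet
        return self.snapshots[i - 1]


# Hypothetical trading day; ISO-formatted strings sort chronologically.
history = SnapshotHistory()
history.take_snapshot("2013-07-26T09:30", ["BUY 100 IBM"])
history.take_snapshot("2013-07-26T10:00", ["BUY 100 IBM", "SELL 40 IBM"])
history.take_snapshot("2013-07-26T15:45", ["BUY 100 IBM", "SELL 40 IBM", "SELL 60 IBM"])

midday = history.as_of("2013-07-26T12:00")
print(midday)
```

Moving the requested time forward or backward plays the role of the slider: each position resolves to whichever snapshot was current at that moment.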

The second patent revolves around giving a database operational flexibility when its schema changes. Cusack explains that traditionally, adding a new column to a table meant locking the entire database and adding the column to every single record that had gone before. Rainstor instead uses a thin wrapper around a table that adds the new column without changing the legacy data, making schema changes relatively painless and forward looking.
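
One way to picture such a wrapper is a table that records new columns in its metadata only and fills in a default when older records are read. The sketch below assumes that interpretation; it is not drawn from the patent itself, and the WrappedTable class, its method names, and the sample columns are hypothetical.

```python
class WrappedTable:
    """Toy table whose schema can grow without rewriting old rows (illustrative only)."""

    def __init__(self, columns):
        self.columns = list(columns)  # current schema
        self.defaults = {}            # default value for each column added later
        self.rows = []                # rows exactly as they were written

    def add_column(self, name, default=None):
        # Metadata-only change: no stored row is touched.
        self.columns.append(name)
        self.defaults[name] = default

    def insert(self, record):
        self.rows.append(dict(record))

    def read(self, row_number):
        stored = self.rows[row_number]
        # Present the row under the current schema, filling gaps with defaults.
        return {col: stored.get(col, self.defaults.get(col)) for col in self.columns}


# Hypothetical example: a legacy row written before the schema change.
table = WrappedTable(["trade_id", "symbol"])
table.insert({"trade_id": 1, "symbol": "IBM"})

table.add_column("venue", default="UNKNOWN")
table.insert({"trade_id": 2, "symbol": "HPQ", "venue": "NYSE"})

print(table.read(0))
print(table.read(1))
```

The cost of add_column here does not depend on how many rows already exist, which is the property that matters when the legacy data spans decades.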

“We’re talking seven, ten, twenty, thirty years of data, depending on the retention requirements,” he explains, noting that a schema change of that level on a classic relational database would be cost prohibitive. “In Rainstor, you could have your entire historical dataset going back many years – it could be petabytes of data.”

Taken together, says Cusack, the two patents let Rainstor perform operations efficiently at petabyte scale that would be massively cost-inefficient in other relational systems.

“It’s very much about our compression,” concludes Cusack. “The compression is a driver for efficiency, not just in terms of the footprint of the data, but also in terms of the speed of query… We like to think of Rainstor as a traditional database from the outside in, but under the hood, it’s very different.”


BigDATAwire