hornbeck

thoughts on business, community and customer service

First thoughts on MongoDB

Recently I decided to move a large portion of an RDBMS-backed application to a document database. The reason is that I’m consuming a lot of data from many different sources, and the fields provided are not always consistent. The data normally arrives as CSV files, each containing at least fifty thousand rows. While the files are never fully consistent with one another, every row contains at least some of the same fields every time. To date we have loaded the CSV files into Ruby, pulled the common fields that we knew would be there, and discarded the rest. However, we would like to keep all of the fields around, either for legacy reasons or to run back-end queries against. The current approach is to keep all of the files in a directory within the app. This consumes space and is wasteful, as a lot of that information is duplicated within the database.
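To make the difference concrete, here is a minimal sketch of turning each CSV row into a hash that keeps every non-empty column, while still exposing the common fields the app relies on today. The field names, the sample feed and the `csv_to_documents` helper are made up for illustration; our real feeds look different.

```ruby
require "csv"

# Hypothetical names for the fields that show up in every feed.
COMMON_FIELDS = %w[sku name price].freeze

# Parse CSV text into one hash per row, keeping every non-empty column
# instead of only the common ones.
def csv_to_documents(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_h.reject { |_, value| value.nil? || value.strip.empty? }
  end
end

# A tiny made-up feed: the second row has no value for "color".
feed = <<~FEED
  sku,name,price,color
  A1,Widget,9.99,red
  B2,Gadget,4.50,
FEED

documents = csv_to_documents(feed)

# The common subset is still easy to pull out of each document.
common = documents.map { |doc| doc.slice(*COMMON_FIELDS) }
```

Hashes like these are the kind of thing a document database can store as-is, which is what removes the need to keep the original files around.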

Since I’m an avid Peepcode subscriber, the first document database that I looked into was CouchDB. While CouchDB is an amazing project and I will use it in the future, it proved not to be a good fit for my current needs. I need SQL-like queries for a lot of what I’m going to do with the data, and CouchDB’s views did not fit that. Researching CouchDB did lead me to a couple of other projects, one of which was MongoDB. Their website describes MongoDB like so:

MongoDB is a high-performance, open source, schema-free document-oriented database. MongoDB is written in C++ and offers the following features:

* Collection oriented storage – easy storage of object-style data
* Dynamic queries
* Full index support, including on inner objects
* Query profiling
* Replication and fail-over support
* Efficient storage of binary data including large objects (e.g. videos)
* Auto-sharding for cloud-level scalability (Q209)

A key goal of MongoDB is to bridge the gap between key/value stores (which are fast and highly scalable) and traditional RDBMS systems (which are deep in functionality).

So a goal of MongoDB is to bridge the gap between key/value stores and RDBMS. This sounded very good to me. It’s also worth noting that they supply drivers for many different languages on their site for connecting to and working with the database. In my case I needed Ruby, which they supplied.

MongoDB seemed to fit exactly what we needed: it allowed me to load my full data sets into it and pull out only the fields that I needed. While I was not fully happy with the Ruby libraries provided, they did their job. Today, John Nunemaker released his MongoMapper library, which gives you ActiveRecord-like functionality within your Rails app while using MongoDB. In some cases this will allow us to do nothing more than remove the ActiveRecord inheritance from a model and add in

include MongoMapper::Document

plus whatever keys we want to access, and keep moving along our way. I did not find this type of integration and functionality with any of the other document databases I looked at, and it is what really makes MongoDB stand out for this project. I’m sure I will use other document databases and key/value stores in the future. They are great alternatives to an RDBMS, allowing you to store similar data in groups without worrying about whether every record contains exactly the same fields. While an RDBMS is often a great solution, it is not always the best solution for the job.

* MongoDB Chef recipe for Engine Yard Solo : fork me

Written by hornbeck

June 28, 2009 at 3:59 am

Posted in couchdb, mongodb, ruby

5 Responses

  1. Excellent post, John.

    We’ve been using MongoDB with Ruby since March with the MongoRecord Ruby gem. It has been great for storing data coming from an API (in our case, Twitter) without worrying about keeping an RDBMS schema in sync.

    The dynamic queries are great, too. This is also the reason we moved away from CouchDB.

    Jim

    June 28, 2009 at 5:15 am

  2. Thanks for the linky. Glad MongoMapper looks interesting. Let me know how it works out for you.

    John Nunemaker

    June 28, 2009 at 10:34 pm

  3. I recently conducted a similar exercise, evaluating various document databases for a new project I’m working on. I found MongoDB’s query engine to be very convenient for my app, which allows users to make arbitrary queries, but I ran into an interesting limitation with it.

    Since MongoDB uses periods to dereference hashes, hash keys used in queries cannot contain periods. I’ve considered encoding periods before storing them in the database, and then encoding search queries before passing them to MongoDB, but I’m curious how others have worked around this issue.

    Otherwise, I was very happy with MongoDB.

    Clay McClure

    June 29, 2009 at 2:10 am

  4. I work on MongoDB, so I was wondering if you had any suggestions for making the Ruby libraries better. We’d love any feedback you have. Thanks!

    Kristina

    June 29, 2009 at 3:47 pm

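On the period-in-keys limitation Clay raises above, here is a minimal sketch of the encoding idea he describes: escape periods in hash keys before storing, and escape each key segment the same way when building a dotted query path. The helper names and the choice of percent-style escapes are my own assumptions, not anything MongoDB or its Ruby driver provides.

```ruby
# Escape "%" as well as "." so that decoding is unambiguous.
# Keys are converted to strings, so symbol keys come back as strings.
def encode_key(key)
  key.to_s.gsub(/[%.]/, "%" => "%25", "." => "%2E")
end

def decode_key(key)
  key.gsub(/%2E|%25/, "%2E" => ".", "%25" => "%")
end

# Walk nested hashes and arrays, rewriting every hash key with the
# given key function and leaving the values untouched.
def transform_keys_deep(value, &key_fn)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[key_fn.call(k)] = transform_keys_deep(v, &key_fn)
    end
  when Array
    value.map { |v| transform_keys_deep(v, &key_fn) }
  else
    value
  end
end

def encode_keys(doc)
  transform_keys_deep(doc) { |k| encode_key(k) }
end

def decode_keys(doc)
  transform_keys_deep(doc) { |k| decode_key(k) }
end

# Build a dotted path for a query from raw (unescaped) key segments.
def query_path(*segments)
  segments.map { |s| encode_key(s) }.join(".")
end

original = { "hosts" => { "example.com" => { "hits" => 3 } } }
stored   = encode_keys(original)
path     = query_path("hosts", "example.com", "hits")
```

The stored document never contains a literal period in a key, and `decode_keys` restores the original keys when reading it back.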

