I just stumbled upon this question on stackoverlow. I began typing my answer and when I was finished I realized that it was pretty long and it might be a good idea to do this in a blog post. So here is what I answer when I am asked whether to choose RavenDB or MSSQL as the storage technology for your greenfield application.
Disclaimer: before accusing me of being biased towards RavenDB, please note that we’ve just started two projects where we had chosen SQL server as our database. Obviously I love RavenDB, but it’s definitely not a one-for-all thing.
RavenDB is said to be very fast and you often hear that it’s faster than sql server. That’s not true. It is not RavenDB itself that is faster as a database, instead it’s the application built on top of RavenDB that is faster most of the times. There are a few reasons for that:
- Reduced amount of database requests
RavenDB enables you to store whole object graphs and collections inside a single document. That’s why you generally need much fewer requests when using RavenDB compared to a relational database.
- Less joins when querying
Although RavenDB has some concepts to support joining between documents, a document database embraces the use of denormalization. So instead of picking all the necessary information for one query together from different documents, you probably already have everything inside one single document. RavenDB is very fast in loading single documents so that’s the reason for the performance boost here.
- Indexes: precomputed aggregations and calculations
Map/reduce indexes can be used to do intensive calculation in the background (and also distributed) and store the calculated results on the disk, so that querying those is cheap because all the calculations have already been done. The same goes for normal map indexes as well. RavenDB can do calculations and store the results inside indexes for fast querying. You should note however, that indexes have BASE consistency, that means, they can be stale and need to catch up latest write operation on the database. While that means that in busy systems you don’t have a guarantee that you get most up-to-date results from an index, this is also a big selling point on the other hand, because it means, that write operations are not affected by the number of indexes. They simple complete and RavenDB updates the indexes in the background without slowing down insert/update perfomance.
Although RavenDB is putting a lot of efforts into making your experience as easy as possible, there’s still a steep learning curve related to the mindshift in data modeling. You will need to think about your data in a different way than you probably had been doing before when designing relational databases. This is something you definitely want to think about before choosing RavenDB as the storage technology for a new application with a tight development deadline.
However, learning document modeling and working with RavenDB is a rewarding experience because you get immediate feedback and see how much easier and faster you can be when you’re finished.
Although we have EF4.1 code-first, FNH with automappings and even micro-orms like Massive or Petapoco that do a very good job in simplifiying our development efforts when working with sql server, it’s still far easier to work with RavenDB (please note, this is my personal opinion here). You just don’t have to think about mappings and normalization. Just put your objects in and you’re done.
Tooling and ecosystem
This is something you will definitely miss if you’re coming from sql server and are used to tons of third-party software that helps you work with and manage your sql servers. RavenDB has made a few big improvements in this regard recently and I expect a massive improvement with the 1.2 release, however it’s still far beyond what you get when using sql server. If you don’t feel comfortable with configuration files and are more the wizard-magic guy then you probably don’t want to use RavenDB (right now).
Another pain point in working with RavenDB (and NoSQL databases in general) is that it’s nearly impossible to get good ad-hoc reporting support. RavenDB has a bundle that let’s you replicate data out to sql server for reporting purposes but obviously that’s much more friction than just querying on the datasets in the first place.
Which to choose?
The question which database to choose obviously depend on your concrete scenario, the skills of your team, your enviroment (existing licenses), etc. but here is what I think could help you:
We choose RavenDB when…
- we can think of our data in terms of aggregates with mostly independent chunks of data (e.g. customer, order, product, etc.)
- we need to have good performance on aggregation and calculation queries
- we need to have complex searching (full-text, facets, etc.)
- we need to be able to scale
- we need high availability at low costs
We choose SQL Server when…
- when we need to support user generated reports and highly dynamical data analysis
- we have to deal with mostly relation data (e.g. accounting, statistics)
- we want to use Windows Azure
- our customer definitely wants us to choose sql server without knowing better
I hope these arguments help you choose one or the other storage technology for your application…
But wait! You don’t have to choose one or the other database. Actually you can mix them and pick the advantages of both databases. So here is the deal: if you feel your application is complex enough or it would benefit from one or the other database, just use them both.