How to handle relations in RavenDB

One thing that confuses a lot of people when they dive into RavenDB is how to handle relations and references between documents. RavenDB is a document database and thus is not built around the idea of relations. Instead, most of the advantages of document databases come from document-oriented modeling, which treats every single document as isolated and meaningful on its own. This reduces the number of requests to the database (to serve a web request, for instance) and enables easy horizontal scaling (sharding across multiple servers).

However, the real world has relations, and so do most applications built on top of RavenDB. Let’s consider a simple scenario. Say we have a web application that stores customers and orders. It’s clear that each order needs a reference to its customer. This is a typical 1:n relationship.

In a relational database there would be two separate tables, “Customer” and “Order”. In RavenDB we likewise have two kinds of documents, one for each.

Now, on the order details page of our application, we want to display not only the order and its items, but also the name of the customer (as a link to their details page, for instance). In SQL, this would be done using a single join between both tables. Since we’re working with a document database and thus have no built-in joins available, what shall we do? The good news is that RavenDB offers a lot of great features that make it easy to handle those situations. Let’s see what options we have:

Disclaimer: I list every possible way I can think of here. The ones marked with a sad smiley are those I would not recommend in any case. Look for the happy ones!

[sad] Go to the database twice

This is a very naïve solution:

class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

class Order
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
    public string CustomerId { get; set; }
}

var order = documentSession.Load<Order>("orders/1");
var customer = documentSession.Load<Customer>(order.CustomerId);
// do whatever you like with order and customer

You first fetch the order and then the customer in two separate roundtrips to the database. While this will run pretty fast in most situations (because getting a document by id is fast), it’s considered bad practice because of the unnecessary request. I suggest avoiding it.

[sad] Include one document inside the other

class Order
{
    public string Id { get; set; }
    public string Number { get; set; }
    public Customer Customer { get; set; }
}

var order = documentSession.Load<Order>("orders/1");
// do whatever you like with order and order.Customer

Another way is to include the whole Customer inside the Order. Because the Customer itself is an independent document, a full copy of it will be stored within every order. This not only bloats the database, but is also very dangerous, since you then have to maintain all the copies. If a property changes (for instance the customer’s category), you might need to find all orders containing the changed customer and update them accordingly. Very inefficient and bad practice, obviously.

[normal] Implement a read trigger to do server-side joins

Skip this paragraph if you’re just learning RavenDB.
I see absolutely no reason why one would want to go down this path, as it’s a whole lot of work and there are much better alternatives! However, since I cannot think of any critical drawback either, I list it here anyway.

RavenDB has a lot of extension points. You can write your own trigger that is executed on the server whenever a document is read. All you need to do is reference RavenDB’s source, compile a .dll and drop it into the plugin directory on the server (“/Plugins” by default). Inside your read trigger you could check whether the document being loaded is an Order and, if so, load the referenced customer and inject it into the order. Here is the documentation for read triggers.

[normal] Implement a custom responder

Again, you will not want to do this in reality. This is just to show you what is possible. Skip it if you’re just interested in building a great application…

Another way to extend RavenDB on the server side is to write a custom responder. At its core, RavenDB is just (you know what I mean…) an HTTP server. It accepts incoming requests, tries to figure out what to do with them and then responds accordingly. It’s actually quite simple. For every incoming request, the server iterates through all registered responders and checks whether one of them can serve the request (defined using HTTP verbs and a matching URL pattern). If so, that responder’s “Respond” method is called. That’s all there is to it.
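To make that dispatch loop a bit more concrete, here is a minimal in-memory sketch. It is not RavenDB’s code: IResponderSketch, OrderWithCustomerResponder and DispatcherSketch are hypothetical names I made up for illustration, and the real AbstractRequestResponder works against an HTTP context rather than plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a responder; NOT RavenDB's AbstractRequestResponder.
public interface IResponderSketch
{
    string UrlPattern { get; }        // regex matched against the request path
    string[] SupportedVerbs { get; }  // e.g. "GET", "PUT"
    string Respond(string verb, string path);
}

// Hypothetical responder that would serve a joined order/customer view.
public sealed class OrderWithCustomerResponder : IResponderSketch
{
    public string UrlPattern => @"^/orders-with-customer/(.+)$";
    public string[] SupportedVerbs => new[] { "GET" };

    public string Respond(string verb, string path)
    {
        var id = Regex.Match(path, UrlPattern).Groups[1].Value;
        // A real responder would load and merge the documents here.
        return "joined view for " + id;
    }
}

public static class DispatcherSketch
{
    // Walks the registered responders and lets the first match serve the request.
    public static string Dispatch(IEnumerable<IResponderSketch> responders, string verb, string path)
    {
        foreach (var responder in responders)
        {
            bool verbMatches = responder.SupportedVerbs.Contains(verb, StringComparer.OrdinalIgnoreCase);
            if (verbMatches && Regex.IsMatch(path, responder.UrlPattern))
                return responder.Respond(verb, path);
        }
        return null; // no responder claimed the request
    }
}
```

The point of the sketch is only the shape of the loop: verb check, URL pattern check, then Respond on the first responder that matches.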

Just as RavenDB adds its core functionality to the HTTP server component through a set of responders shipped out of the box, you can add your own. The procedure is similar to writing a read trigger: reference RavenDB’s source, implement AbstractRequestResponder, compile, copy the result into the plugin directory, and you’re done.

There’s nothing that prevents you from joining multiple documents within your own responder. Read more about that here: Raven Request Responders.

[happy] Use the .Include<T>() method

RavenDB also has a great way to load multiple documents in one database request. Conceptually it is similar to eager loading in NHibernate (for those who are familiar with it). I won’t explain it in detail here, since a very good explanation already exists: Optimizing referenced documents load

Unless you are sharding across multiple servers (or considering doing so in the future), this is the approach I recommend, because it is very easy and natural to implement while being reasonably fast.
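For the order details page from our scenario, an include could look roughly like the sketch below. It reuses the Customer and Order classes from the first example. OrderDetailsLoader is a hypothetical helper of mine, and the using directive assumes a RavenDB 4.x or later client; please check the call shape against the documentation for the client version you actually use.

```csharp
using Raven.Client.Documents.Session; // RavenDB 4.x+ client; older clients expose IDocumentSession in Raven.Client

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
    public string CustomerId { get; set; }
}

// Hypothetical helper, not part of RavenDB.
public static class OrderDetailsLoader
{
    public static (Order Order, Customer Customer) Load(IDocumentSession session, string orderId)
    {
        // Ask the server to send the referenced customer along with the order.
        Order order = session.Include<Order>(o => o.CustomerId).Load(orderId);
        if (order == null)
            return (null, null); // no such order

        // The customer is already tracked by the session, so this call
        // should be answered locally instead of causing a second roundtrip.
        Customer customer = session.Load<Customer>(order.CustomerId);

        return (order, customer);
    }
}
```

The documents stay exactly as they were in the “go to the database twice” option; only the way they are loaded changes.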

If you want to shard your data, you can either make sure that related data is stored on the same server and use a normal DocumentSession instead of the sharded one, or use the following approach (which is what I recommend):

[happy] Denormalize your references

Yes, now we’re into what scares (SQL) database architects! Denormalize your data. In our case that simply means that we store parts of the customer’s information, along with the customer’s identifier, inside our Order document. That way, we only need to load the Order document and have all the information needed to display the page, including the customer’s name, address etc. If you think about it, I’m pretty sure that you would have done the same in a SQL database, because a customer’s name and address are something you normally don’t want to change on existing orders.

What does that look like?

class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string OtherProperty { get; set; }
}

class Order
{
    public string Id { get; set; }
    public string Number { get; set; }
    public string CustomerId { get; set; }
    public string CustomerName { get; set; }
    public string CustomerAddress { get; set; }
}

var order = documentSession.Load<Order>("orders/1");
// do whatever you like with order and order.CustomerName, order.CustomerAddress, …

That’s it. If you apply naming conventions carefully, tools like AutoMapper can help you write a lot less code.
A very nice way to implement denormalized references and a more in-depth explanation can be found here.
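If you don’t want to pull in a mapping tool, copying the fields by hand at the moment the order is created is simple enough. The following is just a minimal sketch using the classes from above; OrderFactory and CreateFor are hypothetical names, not something RavenDB provides.

```csharp
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string OtherProperty { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public string Number { get; set; }
    public string CustomerId { get; set; }
    public string CustomerName { get; set; }
    public string CustomerAddress { get; set; }
}

// Hypothetical helper that snapshots the customer data an order needs.
public static class OrderFactory
{
    public static Order CreateFor(Customer customer, string number)
    {
        return new Order
        {
            Number = number,
            CustomerId = customer.Id,           // keep the reference to the full document
            CustomerName = customer.Name,       // denormalized copy
            CustomerAddress = customer.Address  // denormalized copy
        };
    }
}
```

Note that OtherProperty is deliberately not copied: only what the order page needs goes into the Order document.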

Attention: There are cases where you want to update your denormalized data. If you’re interested in how to handle them, read on.

Say we have denormalized a property “Category” on our order and we want to update it on all orders when the customer is moved to another category. To do that, we can use RavenDB’s built-in support for set-based operations to update denormalized data. However, this is somewhat limited (no conditional updates) and can get tricky very quickly. That’s why I personally only use denormalization for data that won’t change, and otherwise use .Include<T>() or another approach instead.

In theory, there are two more ways you could use to keep your denormalized data up-to-date. “In theory”, because I don’t believe anyone wants to implement them in practice (you can safely skip this):
1) Update your denormalized data by just loading and updating all your documents containing denormalized data. Very dangerous in fact.
2) Implement a PUT trigger on the server side that updates all denormalized references. This sounds interesting, but it would be so much development work to do for every single reference that it just isn’t practical…

[happy] Use Multi Maps / Reduce indexes to join documents

Multi Maps / Reduce indexes are one of my favorite features of RavenDB. They allow you to write strongly-typed indexes that map information out of different types of documents and reduce it together, so you can do some really cool stuff. Ayende has a post describing how to use them here.

Using them, you can easily join multiple documents on the server side and persist the result. That way you get performance characteristics similar to those of materialized views in the relational world. Because the indexes are processed asynchronously in the background, both querying and writing are very fast.
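To give a feeling for the data flow, here is a plain in-memory LINQ sketch of the idea: two maps that emit the same shape, and a reduce that groups by the customer id. This is not the RavenDB index API (a real index runs on the server in the background and persists its result). CustomerOrders and JoinSketch are hypothetical names, and the classes are trimmed to the properties the join needs.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public string Number { get; set; }
    public string CustomerId { get; set; }
}

// Common shape that both maps and the reduce step produce.
public class CustomerOrders
{
    public string CustomerId { get; set; }
    public string CustomerName { get; set; }
    public string[] OrderNumbers { get; set; }
}

public static class JoinSketch
{
    public static List<CustomerOrders> Run(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        // Map 1: customers contribute their name and no orders.
        var fromCustomers = customers.Select(c => new CustomerOrders
        {
            CustomerId = c.Id,
            CustomerName = c.Name,
            OrderNumbers = new string[0]
        });

        // Map 2: orders contribute their number and no name.
        var fromOrders = orders.Select(o => new CustomerOrders
        {
            CustomerId = o.CustomerId,
            CustomerName = null,
            OrderNumbers = new[] { o.Number }
        });

        // Reduce: group all map outputs by customer id and merge them.
        return fromCustomers.Concat(fromOrders)
            .GroupBy(r => r.CustomerId)
            .Select(g => new CustomerOrders
            {
                CustomerId = g.Key,
                CustomerName = g.Select(r => r.CustomerName).FirstOrDefault(n => n != null),
                OrderNumbers = g.SelectMany(r => r.OrderNumbers).ToArray()
            })
            .ToList();
    }
}
```

The important design constraint carries over to real indexes: every map and the reduce must produce the same shape, which is why the customer map emits an empty order list and the order map emits no name.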

I will write a post that shows a simple join using Multi Maps / Reduce in the next few days (probably tomorrow).

Conclusion

If you’re coming from the SQL world, chances are you will be confused by the lack of relations in document databases. However, if you’re running RavenDB you’ve got plenty of options to address this trade-off. I personally cannot think of any situation where I’d wish for SQL Server back because of this (there could be other reasons). If your experience has been different, please drop me a comment below…

8 Responses to How to handle relations in RavenDB

  1. Dave January 4, 2012 at 3:01 am #

    Nice post. You lay out all the options quite nicely. I’ve linked to you from my blog.

  2. tp January 18, 2012 at 11:22 pm #

    funny, working with nosql databases the last 15 years (Lotus Notes forever), same problem, similar solutions :-)

  3. Leniel Macaferi February 17, 2012 at 9:30 pm #

    WOW! I’ve been looking for this all afternoon. Your post put me on the right track to use the Includes approach…

    Thanks Daniel.

  4. Chris Marisic March 19, 2012 at 2:36 pm #

    I’d have to put some criticism against: “1) Update your denomalized data by just loading and updating all your documents containing denormalized data. Very dangerous in fact”

    That isn’t a true statement on its own. To update denormalized data like this, you need to do it with the application in an offline state, so you don’t have to contend with people modifying or inserting data that you miss on your pass through.

    Introducing a large system change that requires this isn’t that big of a deal, however you need to weigh the cost of how often you would need to do this. Every deployment would lead to tons of outages, if it’s “well i can’t ever think of why i’d want to update this” and then Z occurs and you need to refresh the data, it’s not a big deal. Unless this starts to happen continuously which would imply you strongly want to rearchitect something.

    There’s also a question of scope here. If you have a handful of documents where denormalizing the data greatly improves the simplicity of the entire application design, and only one specific operation needs to update the batch of those documents in one shot, that can be reasonable. The major concern is that very few people end up with this scenario, as most databases grow faster than linearly rather than staying within the M*N items for which this could reasonably be sustained.

    • Daniel Lang March 19, 2012 at 3:02 pm #

      Chris, thank you for your feedback. I couldn’t agree more. However, I don’t consider this to be criticism as it includes my original statement of ‘it can be dangerous’…
