Using dataloaders

Overfetching problem

It’s time to start talking a bit about performance. You may have noticed that your server now has multiple resolvers that fetch data from MongoDB. What happens if you make a query for 10 links, including the users that posted and voted on each of them? All of this data will be fetched as expected, but due to the decentralized way in which the resolvers work, you may find that your server is making multiple requests for the same data!
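
For instance, a query along these lines would trigger that scenario (the `postedBy` and `votes` field names here are the ones assumed by this tutorial’s schema; adjust them if yours differs):

```graphql
{
  allLinks {
    url
    postedBy { name }
    votes { user { name } }
  }
}
```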

The following code will log all requests to the db server and number them, so you can easily tell how many were made.
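
Since the snippet itself isn’t reproduced here, this is a sketch of what the db connection module might look like with logging enabled. It uses the `Logger` class available in the 2.x/3.x versions of the `mongodb` driver (it was removed in 4.x); the `MONGO_URL` value and collection names are placeholders matching this tutorial’s earlier steps:

```js
const {MongoClient, Logger} = require('mongodb');

const MONGO_URL = 'mongodb://localhost:27017/hackernews'; // adjust for your setup

module.exports = async () => {
  const db = await MongoClient.connect(MONGO_URL);

  // Print a numbered line for every request the driver sends to MongoDB,
  // making it easy to count how many a single GraphQL query triggers.
  let logCount = 0;
  Logger.setCurrentLogger((msg, state) => {
    console.log(`MONGODB REQUEST ${++logCount}: ${msg}`);
  });
  Logger.setLevel('debug');
  Logger.filter('class', ['Db']); // only log top-level db operations

  return {
    Links: db.collection('links'),
    Users: db.collection('users'),
    Votes: db.collection('votes'),
  };
};
```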

You should be able to see the logs in the terminal running your server. Take the case where the db has exactly 10 links, all posted by the same user.

In that scenario, this simple query triggers 12 requests to MongoDB! One of them fetches the links data, but all of the others are for the exact same user! That’s not good at all; you should be able to reuse data that has already been fetched. But that would mean extra logic for handling a cache across different resolver calls…

Thankfully, there are already great solutions out there for this kind of problem, so you don’t really need much extra code to handle it.

User Dataloader

You’re going to be using a library from Facebook called DataLoader for this. It’s very useful for avoiding unnecessary repeat requests to services like MongoDB. To achieve that, it covers not only caching but also request batching, which is just as important. Consider a different use case, where you have multiple links posted by different users: without DataLoader, a separate request is made to fetch each of these users; with it, they would all be fetched in a single batched request to MongoDB instead.
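
Since the file being discussed isn’t reproduced here, the following is a minimal sketch of it, assuming a dataloaders module and the `Users` collection from the connector above (install the library first with `npm install --save dataloader`). The numbered comments match the steps below:

```js
const DataLoader = require('dataloader');

// 1. Batch function: fetch every requested user in a single MongoDB query.
// DataLoader expects the results in the same order as the given keys, so
// the fetched documents are mapped back onto the key order.
async function batchUsers(Users, keys) {
  const users = await Users.find({_id: {$in: keys}}).toArray();
  return keys.map(key => users.find(user => user._id.equals(key)));
}

// 2. Export a factory instead of a single loader, so a fresh set of
// loaders (and thus a fresh short-term cache) is created per request.
module.exports = ({Users}) => ({
  // 3. cacheKeyFn normalizes MongoDB ObjectID keys to strings, so that
  // equal ids compare as equal for caching purposes.
  userLoader: new DataLoader(
    keys => batchUsers(Users, keys),
    {cacheKeyFn: key => key.toString()},
  ),
});
```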

Let’s go through this code step by step:

  1. As was mentioned before, DataLoader handles batching by default, and for that it needs you to provide a batch function to be called whenever it has multiple items to fetch together. This loader will be used for user data, so the keys are going to be user ids; the batch function just needs to make a single call to MongoDB with all the given ids.
  2. One important thing to know about data loaders is that a loader instance is not supposed to be reused between different GraphQL requests. Its caching feature should be short-term, to avoid duplicate fetches happening within the same query. Check out its docs for a more detailed explanation. Because of this, the file returns a function for creating the data loaders, which will later be called once for each request (see the wiring sketch after this list).
  3. Finally, create the user data loader, passing it the batch function. In this case you also need to set a data loader option called cacheKeyFn. That’s because the user ids returned by MongoDB, which will be passed as keys to the data loader, are not actually strings but ObjectID instances, which may fail equality checks even when the ids are equal. This option lets you normalize keys so that they compare correctly for caching purposes.
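
As a sketch of that per-request wiring (the endpoint setup and resolver field names here are assumptions based on this tutorial’s earlier chapters, which use graphqlExpress), the factory is called once per incoming request, and resolvers then call the loader instead of hitting MongoDB directly:

```js
const buildDataloaders = require('./dataloaders');

// Wherever the GraphQL endpoint is set up, build a fresh set of
// loaders for each request and expose them through the context:
graphqlExpress(async (req) => ({
  schema,
  context: {
    dataloaders: buildDataloaders({Users}),
  },
}));

// A resolver can then replace its direct MongoDB lookup with the loader;
// repeated loads of the same id within one query hit the cache instead:
const resolvers = {
  Link: {
    postedBy: async ({postedById}, args, {dataloaders: {userLoader}}) =>
      await userLoader.load(postedById),
  },
};
```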

If you try restarting the server and running that same allLinks query again, you’ll clearly see far fewer requests logged, using the same db as before (10 links posted by the same user).

The number of requests dropped by 75%, from 12 to just 3! It could become only 2 if the authentication process fetched the user by id instead of by email address. But even if it has to be email-based, there’s a way to reuse that data as well. If you’re curious, take a look at DataLoader’s prime function, which can help with this, as in the sketch below.
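
For instance (a sketch; the `Users` collection and `userLoader` come from the snippets above), after authenticating by email you could seed the loader’s cache with the fetched user, so later lookups by id resolve from the cache instead of MongoDB:

```js
// Authenticate by email, then prime the loader's cache with the result.
// Subsequent userLoader.load(user._id) calls will hit the cache.
const user = await Users.findOne({email, password});
if (user) {
  dataloaders.userLoader.prime(user._id, user);
}
```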

There are still other areas of this server that could benefit from DataLoader, like fetching votes and links. The steps for these are the same as the ones you’ve just learned, so we won’t repeat them here, but it’s a good chance to practice more; feel free to work on that if you’d like to.
