Working with Firestore: Building a simple database model

Published in ProAndroidDev, May 19, 2018

One of the most frequently asked questions about Firebase on Stack Overflow and in other communities is how to avoid complex or nested queries and how to build a good, simple database model. These two concepts go together: if your model is good, it’s quite unlikely that you will need nested queries.

This article covers general concepts to keep in mind when building a database model for Firestore or the Realtime Database. A few concepts and tips will help you shift your thinking away from the relational database models that we are used to working with. Let’s start with:

Avoid Deep-data

In a deeply nested model, the structure that we are building nests the child models and all of their attributes deeper and deeper.
This makes our model grow in size and become unwieldy whenever we don’t need all the information it contains. For example, if we want the user’s data, the information about their articles is unnecessary. In this case, normalizing our database will help us. What does normalizing mean? It simply means separating our information as much as we can.
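A minimal sketch of the idea, assuming hypothetical field names (`name`, `title`, `text`, `authorId`, `articleId`) and a hypothetical `normalize` helper; only the ID `user_id_1` comes from the article. It takes a nested tree and splits it into flat, top-level collections that point at each other by ID:

```typescript
// Hypothetical nested shape: users contain articles, which contain comments.
interface NestedArticle {
  title: string;
  comments: { [commentId: string]: { text: string } };
}
interface NestedUser {
  name: string;
  articles: { [articleId: string]: NestedArticle };
}

// Hypothetical flat shape: one top-level collection per entity, linked by IDs.
interface FlatDatabase {
  users: { [userId: string]: { name: string } };
  articles: { [articleId: string]: { title: string; authorId: string } };
  comments: { [commentId: string]: { text: string; articleId: string } };
}

// Split a deeply nested tree into flat collections.
function normalize(nestedUsers: { [userId: string]: NestedUser }): FlatDatabase {
  const db: FlatDatabase = { users: {}, articles: {}, comments: {} };
  for (const userId of Object.keys(nestedUsers)) {
    const user = nestedUsers[userId];
    db.users[userId] = { name: user.name };
    for (const articleId of Object.keys(user.articles)) {
      const article = user.articles[articleId];
      db.articles[articleId] = { title: article.title, authorId: userId };
      for (const commentId of Object.keys(article.comments)) {
        const comment = article.comments[commentId];
        db.comments[commentId] = { text: comment.text, articleId: articleId };
      }
    }
  }
  return db;
}

const flatDb = normalize({
  user_id_1: {
    name: "Jane",
    articles: {
      article_id_1: {
        title: "Working with Firestore",
        comments: { comment_id_1: { text: "Nice article!" } },
      },
    },
  },
});
```

After normalizing, reading `flatDb.users["user_id_1"]` returns only the user’s own fields.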

By doing this, our structure gets flattened and we can fetch the data of user_id_1 without retrieving any data related to the articles, or vice versa.
In this example, we can also separate comments from articles by moving them to a sub-collection, which leads us to our next question:

When should I nest a collection in Firestore?

There are many cases where we might consider creating a collection inside a document, or just using an array or map to keep a list of data in the current document. Before deciding, we have to ask ourselves some questions:

Is this data associated with the parent document?
We have to check whether there is a direct relation between the two, such as the comments of an article, or the likes or shares of a post.
Do I want to show this data together with its parent data?
If you want to keep the data separate from its parent when displaying it, you can make your model lighter by moving this data to a collection. Think of a Facebook post, where you see the text that your friend has posted, but the comments associated with the post are shown only when you click on comments.
Is my list going to grow too much?
If you want to nest a few tens or hundreds of objects in your POJO, you can keep them inside an array. Just take into account that your model will grow and become heavy to download on the client. This is very important because a Firestore document can’t exceed 1 MiB.
What if I want to download only part of the nested data?
Move the data to a collection. It’s the only way you will be able to limit the number of retrieved documents.
Do any of my parameters have to trigger a Cloud Function?
There could be cases where you don’t need to save your data in a collection but you want that data to trigger a Firestore Cloud Function. Moving that data to a collection is the right move because you will be able to be more selective about which documents trigger your Cloud Function.

When would we want to do this? A good example is when we want to update many paths in our database at the same time, atomically. One such case is updating duplicated data in our database… Duplicated data in my database! This guy is crazy!

Duplicating data is a common practice when working with non-relational databases such as Firebase. It saves us from performing extra queries, which makes data retrieval faster and easier. Think about our current model above: if we want to show the data of the author of an article, we need an extra query. Now think about a list where you want to show lots of articles. That means lots of extra queries to retrieve the authors’ data. Let’s fix it:
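A minimal sketch of this kind of duplication, assuming hypothetical field names (`name`, `profilePic`, `email`) and hypothetical helpers (`toAuthorSnapshot`, `createArticle`) that are not taken from the original model. Each article carries a copy of the author fields it needs to render:

```typescript
// Hypothetical user document.
interface UserProfile {
  id: string;
  name: string;
  profilePic: string;
  email: string;
}

// The duplicated slice of the user that is stored inside each article.
interface AuthorSnapshot {
  id: string;
  name: string;
  profilePic: string;
}

interface ArticleWithAuthor {
  id: string;
  title: string;
  author: AuthorSnapshot;
}

// Copy only the fields the article list needs to display.
function toAuthorSnapshot(user: UserProfile): AuthorSnapshot {
  return { id: user.id, name: user.name, profilePic: user.profilePic };
}

function createArticle(id: string, title: string, user: UserProfile): ArticleWithAuthor {
  return { id: id, title: title, author: toAuthorSnapshot(user) };
}

const jane: UserProfile = {
  id: "user_id_1",
  name: "Jane",
  profilePic: "https://example.com/jane.png",
  email: "jane@example.com",
};

const articleList: ArticleWithAuthor[] = [
  createArticle("article_id_1", "Working with Firestore", jane),
  createArticle("article_id_2", "Another article", jane),
];

// Rendering the list needs no extra lookup per article: the author is embedded.
const renderedRows: string[] = articleList.map(
  (article) => `${article.title} by ${article.author.name}`
);
```

The trade-off is that the copies have to be kept in sync whenever the user changes their name or picture.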

We can observe that every article has an object called author which contains all the data related to the user. In the same way, we could do this for comments if we want to show the profile picture and name of the comment’s author in our app. These scenarios bring us to the next point in the article: atomicity.

Atomicity

What is atomicity? Wikipedia explains it better than me:

An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.
A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright.

Atomic operations guarantee that the data we duplicate in our model is updated all at once, avoiding possible inconsistencies. This is usually done with Cloud Functions running on the server or with WriteBatches on the client.

In both cases, maintaining atomicity requires a few changes to our model. In the model above, how would we know which articles or comments were written by a given user? We would have to run many queries across the whole database to get the IDs of the articles and comments related to our user. That would be a lot of wasted work and time.

That’s why we keep relation-paths in our models to do multi-path updates.

Relation-path and Multi-path updates

By keeping the IDs and the needed data of the articles and comments made by a user, we can easily access all of them at once and apply the atomic operation to all the paths, allowing us to modify the username and profile picture everywhere in our model.

In this example, you can save the article ID in authorOfComments. This will allow you to build the whole path:
articles/{articleId}/comments/{commentId}
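As a rough sketch of how relation-paths feed a multi-path update: the code below assumes the user document keeps a hypothetical `authorOfArticles` list of article IDs, and that `authorOfComments` maps each comment ID to the ID of its article. From those two fields it builds every document path that holds a copy of the author. The `buildAuthorUpdates` helper and the `name`/`profilePic` fields are illustrative, not part of any SDK.

```typescript
// Relation-paths kept on the user document (shapes are assumptions).
interface UserRelations {
  authorOfArticles: string[];                         // article IDs
  authorOfComments: { [commentId: string]: string };  // comment ID -> article ID
}

interface AuthorFields {
  name: string;
  profilePic: string;
}

// One entry per document that duplicates the author data.
type PathUpdates = { [documentPath: string]: { author: AuthorFields } };

function buildAuthorUpdates(relations: UserRelations, author: AuthorFields): PathUpdates {
  const updates: PathUpdates = {};
  for (const articleId of relations.authorOfArticles) {
    updates[`articles/${articleId}`] = { author: author };
  }
  for (const commentId of Object.keys(relations.authorOfComments)) {
    const articleId = relations.authorOfComments[commentId];
    // articles/{articleId}/comments/{commentId}
    updates[`articles/${articleId}/comments/${commentId}`] = { author: author };
  }
  return updates;
}

const authorUpdates = buildAuthorUpdates(
  {
    authorOfArticles: ["article_id_1"],
    authorOfComments: { comment_id_9: "article_id_2" },
  },
  { name: "Jane Doe", profilePic: "https://example.com/jane-new.png" }
);

const pathsToUpdate: string[] = Object.keys(authorUpdates).sort();
```

In a real app each entry would be added to a single Firestore WriteBatch with `update` and committed once, so that either every copy changes or none does. That step is left out here to keep the sketch runnable without the SDK.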

Tip: if you need some meta-information about a collection, a good practice is to add an extra field and keep it up to date with a Cloud Function. Item count and last update time are perfect examples for this use case.

You can also find a lot of information about building non-relational databases on the web if you’re interested. I highly recommend reading the article NoSQL Data Modeling Techniques by Ilya Katsov.

What should I do if I still need nested queries?

Graph-based services like social networks are quite common nowadays. In these services, a small piece of information such as a user’s follow/unfollow status can force updates to thousands of duplicated records and affect nested information.
We want to avoid these scenarios in any way we can. I’ll show you some techniques that may be useful for these cases with the following example.

Our objective is to show a user a feed made up of all the articles written by every user in their following list.
This is hard to do directly in Firestore, because we can’t express it as a traditional database query, such as a complex where over an arbitrary number of values. We need to extract the feed into a new collection associated with the user. This collection will contain duplicates of every post that we want to display:

Duplication solves the problem in this case. We just need to do a single query over {userId}/feed.
However, this kind of duplication might introduce a new set of problems. Imagine that you follow Lady Gaga on Twitter and you need to copy to your feed the 2 million tweets that she has posted over the past years. You would copy information that the user may never see. Similarly, any time Lady Gaga edits a tweet, an update would be required for each follower. This would become a big bottleneck.
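To make the cost of this fan-out concrete, here is a minimal sketch that computes the writes needed when one article is published. The `users/{followerId}/feed/{articleId}` layout, the `FeedEntry` fields and the `fanOutToFeeds` helper are assumptions for illustration; the article only names `{userId}/feed`.

```typescript
// Hypothetical copy of an article as stored in each follower's feed.
interface FeedEntry {
  articleId: string;
  authorId: string;
  title: string;
  publishedAt: number; // milliseconds since the epoch
}

type FeedWrites = { [documentPath: string]: FeedEntry };

// One write per follower: the same entry is duplicated into every feed.
function fanOutToFeeds(followerIds: string[], entry: FeedEntry): FeedWrites {
  const writes: FeedWrites = {};
  for (const followerId of followerIds) {
    writes[`users/${followerId}/feed/${entry.articleId}`] = entry;
  }
  return writes;
}

const feedWrites = fanOutToFeeds(["user_id_2", "user_id_3", "user_id_4"], {
  articleId: "article_id_1",
  authorId: "user_id_1",
  title: "Working with Firestore",
  publishedAt: Date.UTC(2018, 4, 19),
});

const feedWriteCount: number = Object.keys(feedWrites).length;
```

The number of writes grows linearly with the number of followers, and every later edit of the article repeats the same fan-out.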

There are some cases where normalizing our information and doing a few extra queries is not a bad approach. If you find this case in your data model, you can apply some good practices to improve performance:

Avoid copying 100% of the information. Duplicating the data from the last 1–3 months is enough for these cases.
Evaluate your information when you want to update data in multiple places. For example, update the data of the users who have been active during the last month. The rest can be updated with Cloud Functions or a batch process during the night.
Normalize the database for this case if your data is going to be modified many times. Keep the IDs to do individual queries and the timestamp to index them. It’s better to do 100 individual queries to load the user’s data than to update it thousands of times each time a post is edited.
Consider using other database types if your model becomes too complex.
Firebase is amazing, but it is not always the best solution.
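The first two practices can be sketched as two small filters. The helper names, the `lastActiveAt` field and the concrete windows (90 days for copying, 30 days for "active") are assumptions chosen to match the "1–3 months" and "last month" figures above, not fixed rules:

```typescript
const DAY_IN_MS = 24 * 60 * 60 * 1000;

// Hypothetical follower record carrying the last time the user was active.
interface FollowerActivity {
  id: string;
  lastActiveAt: number; // milliseconds since the epoch
}

// Practice 1: only duplicate posts that are recent enough to be seen.
function isRecentEnoughToCopy(publishedAt: number, now: number, maxAgeDays: number): boolean {
  return now - publishedAt <= maxAgeDays * DAY_IN_MS;
}

// Practice 2: update active followers right away, defer the rest to a nightly job.
function planFeedUpdate(
  followers: FollowerActivity[],
  now: number,
  activeWindowDays: number
): { updateNow: string[]; deferToNightlyJob: string[] } {
  const updateNow: string[] = [];
  const deferToNightlyJob: string[] = [];
  for (const follower of followers) {
    if (now - follower.lastActiveAt <= activeWindowDays * DAY_IN_MS) {
      updateNow.push(follower.id);
    } else {
      deferToNightlyJob.push(follower.id);
    }
  }
  return { updateNow: updateNow, deferToNightlyJob: deferToNightlyJob };
}

const planningTime = Date.UTC(2018, 4, 19);

const feedPlan = planFeedUpdate(
  [
    { id: "user_id_2", lastActiveAt: planningTime - 2 * DAY_IN_MS },
    { id: "user_id_3", lastActiveAt: planningTime - 60 * DAY_IN_MS },
  ],
  planningTime,
  30
);

const copyTenDayOldPost = isRecentEnoughToCopy(planningTime - 10 * DAY_IN_MS, planningTime, 90);
const copyOldPost = isRecentEnoughToCopy(planningTime - 200 * DAY_IN_MS, planningTime, 90);
```

Here `user_id_2` is updated immediately and `user_id_3` is left for the nightly job, while only the ten-day-old post qualifies for copying.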

You can find an example of the good practices and techniques explained in this article, using Firebase Firestore Cloud Functions, in my GitHub account:
https://github.com/FrangSierra/firestore-cloud-functions-typescript

I hope you find this article useful and, as always, any feedback is very welcome.