Principles & Practice in Repository Layer

Data-mapping, Cache, Concurrency & Flow

Chen Zhang

Published in

ProAndroidDev

5 min readDec 11, 2022

Introduction

All modern Android apps adopt some architectural variations, MVVM or MVI being the most common ones. Regardless which architecture, they all define a layer with the Repository pattern. I recently dived into some refactoring in the repository layer and noticed a few gotchas. I started this as my own memo. But realized that some notes & code may be helpful for others to avoid pitfalls.

Principles of Repository Layer

Why do we need any application architecture? Because it helps us manage the complexity by breaking it down into modules with distinct & cohesive responsibilities. The Repository layer is no exception. Here are the few distinct responsibilities in this layer:

Data Mapping

A repository mediates between domain and data-source. It maps data into domain models, so that the Domain layer only need to deal with the domain models for business logic. Thanks to the repositories, all network data models should be concealed from Domain layer.

Take GraphQL as example, the network data is generated by GQL schema, which is defined by backend services. We don’t want those externally defined network-models to leak into any layers beyond repositories. So Repositories deal with the data mapping and hide the underlying network mechanism from Domain layer.

Cache

If data cache is used, repositories should encapsulate the underlying caching mechanism. Domain layer should be agnostic about them.

Concurrency

Caches could work under the concurrent environment. Ex. When app cold starts, multiple call-sites could require the same essential piece of user data for respective functionality. Modern Android apps use coroutines, which achieve asynchrony by suspending and resuming work via the implicit Continuations. Older Android apps could use parallel processing with multi-threads. But either way, the Repository layer with cache should be concurrency-safe. That is especially important if our code manages own caches.

Single Source-of-truth

Since Repository layer takes on responsibilities and encapsulates data-mapping, caching & concurrency-safety, it should become the single source-of-truth for the corresponding domain-models in the entire application. Watch out for other sources in the app for the same domain-models or sub-models. They are probably the false sources and should be deprecated. It is especially important when concurrency is involved and the unsafe sources could cause bugs under race conditions. Ex, a common scenario is when app starts, a few entry points of the application could be looking for the same user status, from Activities, Application object, Lifecycle events, etc. In that case, it is important to route the call paths to the single source of truth.

Sample Code for Concurrency-safe Cache

A common use case for Repository pattern is to support cache. If we use libraries to manage their own underlying caches, they should be the concurrency-safe but ensure to check their documentation. But If our code manages own cache, in order to be concurrency-safe, we want to ensure the accesses(reads & writes) to the cache are synchronized. Here is a snippet to illustrate a simple in-memory cache in concurrent environment with a few callouts.

Mutex is an idiom in Kotlin coroutines for mutually exclusive lock
When cache is available(assuming non-null means available), we return cache immediately
When cache is unavailable, use mutex to allow only the 1st coroutine to execute within withLock{} lambda. All concurrent coroutines are waiting for the access to mutex lock.
The first coroutine would perform network operation in I/O-thread, map network to domain model, save it to cache and return
After the first coroutine finished its execution in withLock{} lambda, the next coroutine that was waiting for the lock before can enter withLock{} now. They should always check if cache is already available as the result of the 1st coroutine. This is important, because those await coroutines already got passed the 1st cache check and waiting at the mutex lock entry. When they enter withLock{}, if they don’t check cache again, they would repeat executing the network query again to cause inefficiency and possible errors. That is why we want to check cachedAccount for the 2nd time. It is for a different purpose in concurrency environment.
Noticed that for both cache checks, if cache is available, no switching to I/O dispatcher is needed. We return cache immediate in the same thread.

The callouts above are implemented in order to achieve safety & efficiency. Hope they make sense. The principles of the implementation should be applicable to other caching, concurrency or networking models.

StateFlow from Repository

I also want to mention a few words about Kotlin’s StateFlow as it has become more popular nowadays thanks to a common use case in Android to implement the state-holder observable pattern as a substitute to the traditional Android LiveData.

StateFlow can not only be used in ViewModel to expose observables to the UI layer, it can also be applied to Repository layer. For example, if a user logged out, we want to observe the the login-state and clear all caches for that user. In that case, we can expose a flow for the login state, so when user logs in or out, the StateFlow emits a new state. Here is an over-simplified snippet:

There are few attributes about StateFlow that we want to be aware of, in order to apply it in the right situation and use it in the right way:

StateFlow is like other Flow APIs, the producer side(flow builder) is not suspending and does not require any coroutine scope. But the subscriber (consumer) side with the terminal operations (like collect{}), have to be in a coroutine scope.
StateFlow can be subscribed by multiple consumers. So StateFlow is “hot”, which means the producer side are emitting items proactively. This is different than many other Flow APIs, which are “cold”, meaning they are lazy and do not start emitting until the consumer subscribed to it with terminal operations.
StateFlow always replays the last item. That means, when a consumer subscribes to it, the most recent item will be received first, then will the subsequent emissions.

Final Thought

This post captured the main principles of repositories. For the sake of brevity, I omitted some nuances such as delegated functions, nullable returns and exception handling.

Hope those notes & tips around repository layer could help others solve common problems and avoid pitfalls.