ProAndroidDev

The latest posts from Android Professionals and Google Developer Experts.

Follow publication

Collecting the Garbage: A brief history of GC over Android versions

Augusto Herbel
ProAndroidDev
Published in
10 min readFeb 28, 2020

When I came to the Android world about two years ago, I didn’t know anything about how things were done. I came from a Java world learned in college and I learned about the Garbage Collector and how this piece of software affects the performance of an application, especially mobile devices where the resources are limited.

So I wondered: “How do we overcome this penalty in a device that has very limited resources?”. This was crucial to understand how my application was going to perform and how I can avoid major penalties by taking advantage of the work that the Android Dalvik/ART Team was doing, and also to understand the Android platform better.

So I started my research and this post is the summary of it.

The post will be organized following a historical timeline, from the oldest implementations to the latest implementations/updates.

So, let’s start!

When I was doing my research I noticed two ways to classify the changes that were done to the GC. Those changes are:

  1. Changes to the ways that GC does its work: this means that the Android Team changes the implementation of GC, meaning changes in the algorithms that GC uses to allocate/free memory and how they search for reachable objects and spaces to allocate objects.
  2. The optimizations that were done in those algorithms: these are improvements in the algorithms mentioned before.

It’s worth noticing those two categories because there were major changes in some versions and I think it’s important to recognize when a change is a total reimplementation or just an improvement because that defines historical breakpoints in the platform.

So, there were four “versions” of GC in the Android OS.

  1. Dalvik GC: the first GC implementation. It was a conservative, “stop the world” GC. It stops all the threads in the VM and does its work.
  2. ART GC (Lollipop & Marshmallow): the major and biggest change. The ART/Dalvik Android Team rewrites the entire GC. This GC was called the “Generational GC” because the objects now have “generations” based on the time they live. Several other big improvements were introduced here, including the way that GC allocates objects.
  3. ART GC (Nougat): the ART/Dalvik Android Team rewrites the entire allocation process in assembly code.
  4. ART GC (Oreo): an improvement of the ART GC v1. This time, the ART/Dalvik Android Team improves the way that GC does its work by making it concurrent. This was called the “Concurrent Copying Garbage Collector” among a lot of other improvements.

Dalvik

Dalvik was the runtime for Android until KitKat. In KitKat, ART (Android Runtime, the new runtime that Google was working in) was introduced to the platform alongside Dalvik Runtime (you can enable ART with a flag in Developer Options). The goal of having ART alongside Dalvik was to obtain feedback from developers and partners. In Lollipop, Dalvik was replaced by ART entirely.

Allocation and collection of objects at this time were slow. This is because the GC in Dalvik is single-threaded, meaning that the Dalvik VM pauses all the threads in the VM (all the system threads, the app threads, etc.) and fires the GC to make the collections. So the recommendation was to avoid allocations whenever possible.

The GC in Dalvik uses a Concurrent Mark-And-Sweep algorithm. When using the mark-and-sweep algorithm, unreferenced objects are not reclaimed immediately. Instead, the collection is delayed until all available memory has been exhausted, meaning that short-lived objects remain in memory a lot longer after they become unused.

And it only compacts for apps in the background, leaving a very fragmented heap.

Why should I care about this?

Because this is the cause of heap fragmentation. When the GC frees up an object from the heap, that space remains as a gap between two other allocations, and these gaps tend to be really small, being useless to allocate new objects because the GC needs a contiguous space to allocate a new object. So you end up with a heap with dozens of these tiny little spaces where nothing can fit.

Little gaps left by the GC after collection.

So, when garbage collection can be triggered? There are three situations:

  • When an allocation fails
  • When the size of the heap hits some soft limit
  • When a GC was explicitly requested

When one of these scenarios happens, the GC has to reclaim memory.

How Garbage Collector reclaims memory in Dalvik?

The Dalvik GC has four phases to reclaim memory:

  1. FIND ROOT SETS: The GC pauses all the threads in the system to find the root sets. “Root sets” are typically local variables, threads, static variables, etc (static variables, stack entries, and registry entries technically speaking). In short, all the objects that can be reached in your application and are alive. This takes time, and during this time your application cannot do anything.
  2. MARK REACHABLE OBJECTS (I): In this phase, the GC marks all the root sets found in the previous phase as reachable, letting the unreachable objects be collected later. This phase is concurrent. This means that your app is running again, but the concurrency brings a problem here. Since your app is running again, it can allocate objects.
  3. MARK REACHABLE OBJECTS (II): When a new allocation happens, the GC needs to find all the root sets again, which means pausing all your threads again. This phase is not concurrent so your application starts to have little “pauses” or “stops” that feel like the phone is doing a lot of heavy work.
  4. COLLECT: The GC collects all the objects that have not been marked as live. This phase is concurrent.
Image took from Wikipedia — By M17 — Illustrative only

Do you remember that GC_FOR_ALLOC logcat message in KitKat? Well, that message shows up when the GC fires a garbage collection because of a failed allocation. When this happens, a collection is triggered, and if there is not enough space after the collection, two things can happen:

  • The heap grows (if it’s not at the max heap size already)
  • Or an OutOfMemoryError is thrown (if it’s at the max heap size already)

This typically happens when you try to allocate big objects (e.g. bitmaps).

ART

The ART (Android Runtime) was the replacement for the Dalvik runtime in Lollipop. It came with a lot of improvements, including AOT (Ahead-of-time) compilation, removing entirely the JIT compiler and the interpreter, which was a big win, and of course, it came along with a lot of improvements for the GC.

Lollipop & Marshmallow

The algorithm to allocate and reclaim memory still was the CMS (Concurrent Mark-and-Sweep), but with improvements.

One of the biggest improvements in the way that the GC allocates objects is the replacement of the dlmalloc algorithm for a RosAlloc algorithm. Those algorithms are the ones used when you call malloc in native code or new in, e.g., C++ or Java code. The great thing about RosAlloc is that it is thread-aware, meaning that it is able to do allocations that are specific to a particular thread. In Oreo, this change enables a huge improvement.

The second important improvement is that the GC now groups small allocations in chunks and page-aligned large allocations, making it much more performant than before. The reason why page-aligned allocations are possible is that ART does not have the memory constraints that Dalvik had. Dalvik only allowed a process to grow for a total of 36 Mb in memory (depending on the specific device configuration), but ART has no limit on the amount of memory a process can request.

Another big improvement is that the GC now compacts the heap in the background and “eventually” in the foreground. This is something really good, because it frees up memory even in the foreground if needed, in contrast with Dalvik that would crash if that scenario happened. And also brings the benefit of having a lesser fragmented heap, having to fire fewer garbage collections.

Other improvements in this version are fine-grained locks, making GC blocking much less code than before to do the garbage collection, and making the 1st phase (FIND ROOT SETS) of the mark-and-sweep algorithm concurrent, reducing the time needed to run the mark-and-sweep algorithm from ~10 ms to ~3 ms.

And finally, the “Generational” part of the GC. This was the biggest change in performance. The idea here is that GC keeps track of all the objects that have been allocated since the last major garbage collection. Then, those objects are classified into “generations”, based on the expected life and size of the object. The objects are inserted into the “Young generation” first. When an object stays active long enough, it can be promoted to an older generation, followed by a permanent generation.

Each heap generation has its own dedicated upper limit on the amount of memory that objects there can occupy. When a generation is reaching that limit, the system fires a garbage collection in that generation. The duration of the collection depends on the generation being collected and the number of active objects in there.

This allows the GC to save a lot of time during allocations because it tries to free up space collecting generation by generation and only if necessary. If freeing up the young generation is enough to allocate the object requested, it doesn’t go through the rest of the generations.

Note: if you want to learn more about the Generational GC, this post is a great source:

Nougat

In this version, there was only one big improvement, but not less important than the others. The ART/Dalvik Team rewrite the entire Garbage Collector allocations in Assembly code, allowing the allocations to be 10 times faster than KitKat.

Oreo

Well, here comes a really big improvement on the GC. They rewrote the entire collector. The new collector was named “Concurrent Heap Compaction Collector”.

This new version comes with an always-defragmentation strategy for the heap compaction problem. They made the GC heap compacts always necessary, even in the foreground. This brings a really big improvement in device-wide memory because now the GC can compact more often the heap of the applications that are always in the foreground, like System Services, Play Services, SystemUI, etc. along with the apps in the background too. According to Google, this brings about a 30% savings on overall memory requirements.

The Concurrent Compaction part of the new GC is a great improvement also. In this version, the GC divides the heap into “buckets” (e.g. 256k buckets to allocate objects). When a thread needs memory, the GC gives that thread a “bucket” to allocate the objects locally. So, the thread can locally allocate or collect the objects in that bucket, avoiding locking the entire system to do the work.

It also brings another improvement: when a bucket gets defragmented, if it is less than 70% or 75% utilization, that bucket will get collected and the objects in there will be moved to another bucket, emptying the first one.

So, do you remember that I told you that RosAlloc is important in Oreo? Well, this technique (RosAlloc) merged with the bump-the-pointer technique (introduced in this version) enables something called thread-local bump-allocator. This technique brings an awesome 18 times faster allocation time than Dalvik and 70% faster than Nougat. But, how this is done?

The bump-the-pointer technique for allocating objects is efficient because the only thing that is needed to allocate a new object is to move a pointer in a certain bucket. When a bucket is given to a thread, it looks up if the new object fits in the remaining space between the pointer and the end of the bucket. If yes, the new object is allocated and the pointer is updated with the address of the new object.

But there is also a little drawback with this update… the Generational part of the GC was removed, losing all of that advantages.

Android 10 (Q)

The Generational GC is back again! Along with all the features of the Oreo version, bringing an even faster Garbage Collector than before.

Conclusion

So, as you can see there have been a lot of changes made to the Garbage Collector throughout the Android versions. The introduction of ART and its new design allow the ART/Dalvik Team to do a lot of improvements, and was, for sure, a great change made to the platform itself.

The Garbage Collector has evolved to a more mature and robust GC than the one in Dalvik, allowing for a much more powerful and efficient allocation and a more fine-grained collection.

It also gets improved the way it locks, locking only the threads that need collecting and not the entire VM.

In Oreo, it brings the power of compaction even to the foreground, enabling the compacting of the heap of the processes that seldom enter in the background saving a lot of memory in the entire device.

Bibliography

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Published in ProAndroidDev

The latest posts from Android Professionals and Google Developer Experts.

Responses (2)

Write a response