ProAndroidDev

The latest posts from Android Professionals and Google Developer Experts.

Follow publication

Collections vs Sequences: War of use-cases!

A definite use-case driven guide to Collections vs Sequences in Kotlin.

Photo by Joshua Eckstein on Unsplash

Kotlin has 2 APIs to perform operations on a list or a collection of data. Namely the Collections API and the Sequences API. Both of these have differences in the way they perform operations on the underlying elements.

While there are numerous blogs and articles suggesting their usages and documenting their behavior, there are very few describing it from a use-case perspective. Not to forget the varied information provided by all of them. Thus the end goal of this post would be, to become a definite use-case driven guide of choosing either of these powerful APIs.

TL;DR?

For people more into visuals, here’s link to the presentation on the same given at a local Meetup.

This post is divided as follows:

  1. Similarities
  2. Differences
    a. Operation Behavior
    b. Memory footprint
    c. Length
    d. Collection mutations
    e. States
    f. Inlining
  3. Extra points
  4. Verdict
  5. Links

1. Similarities:

Before we begin discussing which one to use, let’s discuss the similarities that both of these APIs share.

  • Both of these APIs are Kotlin specific. They’re analogous to the Collections and Streams API available from Java 8 and above except for a few differences.
  • They both operation on a collection of elements a list of sorts.
    Note: Collection is different from the Collections API in discussion here. Collections operate on collections (the Iterable interface) :P
    A collection is any class that implements the Collection interface in Java. Like List , Map , Set etc.
  • They both have a very similar operation set, with almost equal number of operations being supported by both of them, subject to fundamental implementation problems, which will be discussed laterin this post.
  • And, last but not the least, they’re both difficult to choose!

2. Differences:

Though both of them have some similarities, their differences are quite many. Make sure you don’t judge any of them midway and wait till the end to select which one of them works the best for you.

a. Operation Behavior

  • Collections are “eagerly” evaluated
  • Sequences are “lazily” evaluated

To understand this better, consider the image below from a Blog by Dave Leeds on “When to use sequences”

Collection and sequence operation chain differences

If we look at the code generating this, it would look something like:

Here, if we carefully look at the chain of operations performed by both the APIs, we can clearly see that Collections perform each operation on all the elements (eager behavior), while Sequences perform all operations on each element (lazy behavior).

Thus for use-cases which take “n” element from the operation chain aka short-circuiting operations:

  • Collections will perform the entire chain of operations and then pluck the “n” elements.
  • Sequences will perform the operations one-by-one sequentially for the “n” elements.

Thus, so far: Collections: 0 :: Sequences: 1. As Sequences suit better for “take n” operations.

b. Memory footprint

In the above snippet, if we read the comments carefully, we see that:

  • Collections will create a fresh new copy of the underlying collection instance after every operation/transformation.
  • Sequences will perform each operation/transformation on the same collection instance.

Thus, sequences would have a lower memory footprint compared to collections as it doesn’t have to create intermediate lists of data. Hence, for use-cases which are very memory conservative, sequences might be really useful* (* subject to the disadvantages discussed in point ‘d’ and ‘f’).

So far we’re at: Collections: 0 :: Sequences: 2

c. Length

  • Collections are finite
    - Collections, since they’re processed eagerly, have an end.
  • Sequence are theoretically infinite or zero
    - Sequences, divide the operations set into two types: intermediate and terminal.
    - Until a terminal operation is called, the operation chain in the sequence is not triggered, thus making sequence have theoretically infinite length or no length at all as it has not begun its processing yet.

Consider the snippet below:

Here, as we can read in the comments as well, the sequence only triggers its chain of operations once a terminal operation, which in this case is the toList operation is called. While the collection already having performed the operation has a finite definite length after the operation itself.

While, this doesn’t have any direct impact for most use-cases, it just has one small added advantage, which is that you can create a sequence and keep on appending operations to it and in the end perform a terminal operation to it, thus providing you with more flexibility.

Thus, Collections: 0 :: Sequences: 3

So far, we’re all good to go with Sequences as a silver bullet for all our use cases, it’s the API we need for any kind of operation or transformation on a collection of data!

Kotlin devs vouching for sequences!

But, as we discussed before, let’s not jump on to a conclusion so soon.

d. Collection Mutations

The collection we’re referring here, is the collections API in Java, which as mentioned before includes List , Map etc and are an implementation of the Collection interface.

  • Collections don’t mutate the underlying collection instance but rather generate a new instance after each operation as seen in ‘b’.
  • Sequences mutate the underlying collection instance as they do not create a new instance after every operation.

To understand how this is an important point to be discussed, let’s dig deep into how collection mutations are handled inside Java, for simplicity let’s just discuss ArrayList mutations.

ArrayList Mutation handling in Java:

  • An ArrayList is implemented using an Object[] in Java.
  • Arrays as data structures are immutable in terms of their length.
  • When adding a new element, if the underlying array is unable to accommodate the new element, a new array is created with 50% more elements than the initial capacity, while the default initial capacity being 10 elements.
  • An array copy operation is then performed from the old array to the new array. This array copy operation is a heavy, time consuming operation (linear time).
  • Thus, if we were to add a lot of elements (let’s say 1000), iteratively. We would be performing a lot of array copy operations, each time increasing 50% of the initial capacity.
  • A useful SO thread

Now, with this new knowledge, let’s re-read the above mentioned points on both of them. Clearly collections outperform here!

Thus, if a use-case involves a large number of elements in the output stage of an operation, sequences might not be a good choice, as the number of array copy operations in it would be huge!

Finally, a point in favor of collections! Making: Collections: 1 :: Sequences: 3

e. States

Before we being this, let’s discuss what are stateful and stateless operations:

  • Stateless operations are the ones where each element is operated on without any context or knowledge of the other elements present in the collection.
    - These operations are sandboxed and don’t care about other elements and their values.
    - The map operation is a good stateless operation.
  • Stateful operations are the ones where each element is operated on with full context or knowledge of all the other elements.
    - These operations cannot do what they do, without having knowledge about all the other elements.
    - The distinct and sort operations are some good examples in this category.

Let’s see how collections and sequences perform for both of these operations:

  • Stateless operations: Both collections and sequences perform equally in this category if we ignore the earlier mentioned points, thus this is not something we actually care for, for the sake of our discussion.
  • Stateful operations: Collections perform much better here. The problem with sequences is that it’s not designed to perform stateful operations, as it fundamentally doesn’t know all the elements it has! It gets to know them sequentially. As a matter of fact, the implementation for stateful operations is more of a hack in sequences, parallel data structures are created to implement this functionality. For ex. a HashSet is created internally for the distinct operation, an ArrayList for the sort operation so on.

Clearly, collections are much better for stateful operations.

Thus for use-cases where stateful operations are required, sequences might not be a good choice.

Thus, Collections: 2 :: Sequences: 3

f. Inlining

Inlining is a very important operation for optimizing the runtime stage for lambdas. The advantages of inlining and how it works is a massive topic and can be covered in a different article for itself. For ref: this could be a good point to start with, I’ve tried covering almost all aspects in the 2 part series about the inline keyword in Kotlin.

The operations for both Collections are Sequences are lambdas. They sometimes even have closure values.

  • Collections inline the operation lambda, they have a very simple implementation.
  • Sequences need to keep the instance of the operations in the lambda as they need to pass the value down the chain, thus they do not inline the operations inside the lambda.

Not inlining the operation lambda is a big performance impact. Collections definitely win here!

For use-cases where a lot of lambda invocations might be predicted, sequences should be avoided.

So, the final scoreboard looks like: Collections: 3 :: Sequences: 3

It’s a tie!

Since we have a tie here, let’s see we can discuss a few points which are not actually differences but can still help us in deciding which one to choose!

3. Extra points

  • Collections API is a bit richer in terms of methods and operations
  • There are some operators which are not available or supported in Sequences, thus again making Collections superior. These are, again, not supported by sequences cause of their operation behavior.
    - Eg, foldRight, takeLast etc.
  • Sequences do not support “reversing” it’s an infra problem :’(
  • Sequences are not parallel streams. Though they’re very similar to Java streams, they cannot support parallel processing which is done in Java at the platform level.

Thus collections do win at a meta level, though which one should we choose?

4. Verdict

As we can see, choosing one is a tough job. To answer simply, there’s no silver bullet. It completely depends on the type of use case we have.

  • For operations which are short-circuiting and not lengthy, sequences are better
  • For stateful ones, collections outperform sequences.
  • For stateless operations having a lot of operators chained, for small lists, sequences are better as they don’t iterate the date multiple times
  • But for stateless operations having a lot of operators chained, for bigger lists, sequences are not better as they’ll have a lot of array copy operations inside, hence you should choose collections.

In the end, it totally depends on your use case and what you wish to do. If there’s ever a confusion on which one to use, revert to Collections as not using Sequences properly, will just create more problems than it will solve. It’s like opening a can of worms :)

Thank you!

5. Links:

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Published in ProAndroidDev

The latest posts from Android Professionals and Google Developer Experts.

Written by Suraj Shah

Principal Engineer @Headout, Founding Engineer @SuperShare

Responses (2)

Write a response