Until a few months ago while developing android application I never thought much about data structures. This is partly because there are many more problems to solve while developing for android as compared to competitive programming where only running time matters. And partly because I started android development before coming to college, learning data structures and algorithms. Many of the habits and thought processes I developed during the initial days have stuck. I would dismiss data structures thinking that it would be trading code clarity for a little bit of efficiency and since we generally don’t have to handle a lot of data in apps, it would make sense to stick to the very basic data structures and keep code maintainable. But this opinion was wrong.
The reasons why now I think choosing the right data structure is important:
- Every data structure puts constrains on how you can access your data.
- Choosing a more constrained data structure makes it more obvious how you are going to access the data. Reduces the chances of astonishing your fellow programmers. Resulting in cleaner, more maintainable code.
- Programming abstractions (interfaces, abstract classes, inheritance) put constrains on what you can and cannot do. All programming constrains minimize the chances of you shooting your own foot.
- Choosing the right data structure leads you to the right algorithm so that also improves efficiency as a side effect.
- And lastly it’s scary how many resort to
ArrayListno matter the task. I was guilty of this too.
Array rather than
ArrayList when the number of elements is fixed. The most important reason to do this is that this changes your fixed length constrain from words to code. The
 syntax clearly speaks out that this is a normal array with a fixed length to your fellow developers. Another reason which is a bit subjective is that
 syntax is more compact than
ArrayList itself uses
Array to actually store the data. Then how can it magically have changing lengths? The documentation says
As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
Constant amortized time is just a fancy way of saying that it is constant in most of the cases. If you look at the source code, it turns out
ArrayList starts with default length 10 and increases by 50% every time it reaches full capacity. So if you declare a new
ArrayList and proceed to fill 100 elements. It would actually create arrays of length 10, 15, 22, 33, 50, 75, 113 !! And every time the capacity is exceeded it would copy over all the elements in the last array to the new one.
Inserting and removing an element are also costly operations. Say you remove element 10 from an
ArrayList of length 100. All the elements from position 11 onward would have to be shifted left! Similarly inserting would shift all elements to the right.
LinkedList, it does not need to move all the elements of the list to insert or remove an element in between. Adding an element is always constant time as it actually has dynamic capacity. Other operations such as
indexOf have same cost in both list implementations. And
sort costs the same for all
Collections because the sort method first dumps everything into an array. But
LinkedList comes with an added constrain of not being able to access elements randomly. You can do
get(index) but it would be a costly operation. This is the only place where
ArrayList is better than a
LinkedList. And I believe
ArrayList should only be preferred over a
LinkedList if random access is a necessity. More often than not I find that my code is using this feature due to my own laziness. Not because it was the best approach. To counter this
ArrayList addiction start with initializing all your lists as
LinkedList and only switch to an
ArrayList if you can’t live without random access. If you are going to fetch a list of posts from a server and then iterate over them, using a
LinkedList would make your code more elegant. Since instead of
list.get(i) you would write
list.previous() which is much more easier to understand that seeing a
list.get(i) and then trying to figure out how
i is changing. Using an
ArrayList is justified for a music app where the user might hit the random button and so
i has to be randomly changed.
The second most used data structure in android dev is
HashMap. Like the
ArrayList argument. Use a
EnumMap over a
HashMap if the keys are known before hand. They provide the similar advantages in code maintainability by being more constraint. And as a side effect, your app would be a little faster.
Iterating over a
HashMap provides no guaranty of the order in which elements will be traversed. Say you are building a shopping cart and store the orders in a map of product -> quantity. The user now wants to see all the items in his cart. Since a
HashMap is being used, the card would show all the items but the order would be random. Instead using a
LinkedHashMap is much better here, since you can iterate over it like a normal
LinkedList. Using a
LinkedHashMap here to show the products in the order they were added would be nice.
HashMap variants are
WeakHashMap stores only a weak reference to its keys which can help prevent memory leak. For example, you are storing
View references in a
HashMap, which is being shared between
Activity and background
Service. You would want the
View references to be garbage collected when the
Activity finishes. If you use a normal
HashMap in this case it would hold a strong reference causing memory leak.
TreeMap is useful when you want entries to be sorted by key.
Infrequently used Data Structures
If you know that a list is only going to have unique elements use a
HashSet instead of a
As in the case of
HashMap we also have
HashSet. For example if you have a list of configuration keys you can represent the “selected” keys use an
EnumSet. If order is important, similar to
LinkedHashMap, here we have
LinkedHashSet which preserves insertion order and
TreeSet which keeps the elements sorted.
Your data would hardly ever be of the form of a
Tree but one very frequent use case is that of nested comments. I naively implemented nested comments using an
ArrayList once. As a result all operations on the data (adding, removing, editing a comment) because incredibly complex. For example, removing a comment would be simply removing the comment node from the tree data structure, but since it was stored in a
ArrayList I had to iterate forward from the index of the comment to remove all its children too. The work that should have been abstracted away by the data structure was now being handled when the data structure was used. Unfortunately java does not have a general tree implementation.
Stacks and Queues
To be honest I have never used a
Stack during android development. However they are used in Android itself (for example the activity back stack). I did
Queue use, once in a background
IntentService, to buffer work requests.
Queue have very rare use during android development. But there are still important to know just in case the use ever comes up. Have you every used these during development ?
Concurrency is an important issue too while choosing the data structure which is very much ignored. When the same data structure is modified and used in two different threads they should be synchronized. Concurrency issues rarely happen but when they do its very difficult to point it out since they are hard to reproduce. I have mostly encountered multi-threading using
Service or using
rxjava. As a thumb rule, when using these classes simply check if you are modifying a data structure from any thread aside from the
UI thread. You need to use a concurrency compatible version of your data structure, even if you are not modifying the data structure but modifying the elements in it.
Inheriting from these data structures
Finally if you have some special constrains on the data, its better to have a separate class to encapsulate it by extending from the in built data structures. Say for example you want to have an
File objects but you constrain it to only store
Files that actually exist. You could check every time before adding a file to the list. Or you could have a new class
ExistingFileList which extends from
ArrayList and wraps the add function.
For more in depth analysis of data structures and algorithms I recommend Prof. Clifford A. Shaffer’s Algorithm book. I like this book because it does not ignore other aspects of software development like most resources of the data structures and algorithm resources on the internet.
If you think I missed something important or I was wrong somewhere, please let me know through the comments :)