Data Structures and Algorithms Week10: Comparing and making informed decisions
Comparing and making informed decisions
Content
- Recap of abstract data types introduced
- Comparison of implementations
- The Java Collections API: The good, the bad and the ugly
- Determining the best abstract data type and implementation for a task
Disclaimers
- This session may be heavy in places but will give you some useful insights
- Points regarding memory are not absolute due to JVMs (Java Virtual Machines) handling memory differently. Other programming languages may also behave differently.
A note on Typesetting
- Any text in
monospacedfont explicitly denotes either a Java class / interface (e.g.ArrayList/List) or source code. - There are a number of times these words are used in a more general sense and are not specific to Java.
Abstract Data Types
- List
- Set
- Map / Dictionary
- Graph
- Queue
- Priority Queue
- Double-ended queue
- Stack
Some other Abstract Data Types
- Multiset / Bag
Same as a Set, but multiple equal values allowed - Multimap
Same as a Map, but a key can have multiple values - Double-ended priority queue
Double-ended queue, based on a priority
Which data structure is suitable
- Task:
Check a user’s login session is active - Information available:
A user’s session ID and a collection of active session IDs
Question:
What data structure is best suited to the task?
- Task:
Retrieve a user’s details by their username - Information available:
A user’s username
Question:
What data structure is best suited to the task?
List - Expected Behaviour
- Duplicates allowed: Yes
- Order: Determined by the insertion point
- Actions:
- append / prepend to the list
- retrieve from a list
- empty check
- Use cases:
- Many, when we want a collection of objects with duplicates and dynamic sizing
- e.g. Storing adjacent nodes in a graph
Set - Expected Behaviour
- Duplicate allowed: No
- Order: Depends on implementation. In a standard set, order is not guaranteed.
- Actions:
- add value to set
- contains value
- check if empty
- Example use cases:
- Retaining visited objects in a depth/breadth-first search of a graph
- Any task where we want a collection but no duplicates
Map - Expected Behaviour
-
Duplicate keys allowed: No
-
Duplicate values allowed: Yes
-
Order: Depends on implementation
-
Actions:
- add mapping (key → value)
- retrieve value by key
- empty check
-
Example use case:
- Lookup tables (mostly)

Graph - Expected Behaviour
- Duplicate allowed: Each ‘vertex’ is unique
- Order: Depends on the traversal algorithm
- Actions:
- add / remove vertices
- add / remove edges
- get vertex value
- update vertex value
- get vertex neighbours
- adjacency between vertices
- Example use cases:
- Social network friendship connections
Queue - Expected Behaviour
- Duplicates allowed: Yes
- Order: First In First Out (FIFO)
- Actions:
- add to the back of the queue (enqueue)
- remove from the front of the queue (dequeue)
- peek at the front of the queue
- check if empty
- Example use cases:
- Process transactions
- Breadth-first search of a graph
Priority Queue - Expected Behaviour
- Duplicates allowed: Yes
- Order: Best In First Out (BIFO)
- Actions:
- add to the queue with priority
- remove highest-priority from the queue
- peek at the highest-priority element in the queue
- check if empty
- Example use cases:
- Process transactions based on priority
- Pathfinding (Dijkstra, A*)
Stack - Expected Behaviour
- Duplicates allowed: Yes
- Order: Last In First out (LIFO)
- Actions:
- add to the top of the stack (push)
- remove from the top of the stack (pop)
- peek at the top of the stack
- check if empty
- Example use cases:
- Undo/redo functionality/Depth-first search of a graph
- Iterative in-order traversal of a binary search tree
List implementations
- There are two common list implementations:
- Array based (
ArrayListandVector) - Linked list (
LinkedList)
- Array based (
- Which is best?
- Array-based implementations offer O(1) retrieval for access of any element in the list. However, the array must be resized when full O(n).
- Linked lists offer O(1) time complexity for appending/prepending and retrieval for the head and tail and resizing is not needed.
List implementations Time complexity big-O
| Operation | Array list | Linked list |
|---|---|---|
| append | O(1) / O(n) if resize needed | O(1) |
| prepend | O(n) | O(1) |
| get first | O(1) | O(1) |
| get last | O(1) | O(1) |
| get by index | O(1) | O(n) |
| remove first | O(n) | O(1) |
| remove last | O(1) | O(1) |
| remove by index | O(n) | O(n) |
| size | O(1) | O(1) |
Arrays and Memory Random Access
- Arrays are random access(direct access) structures
- This makes arrays O(1) for accessing any element by its index
- How?
- An array is allocated:
size × data type size memory - We know the start location of the array
- So, jumping to the correct memory location is determined by:
start location + (index × data type size) - This is constant O(1), i.e., not depends on n
- An array is allocated:
Arrays and Memory
- Arrays are usually a contiguous block of memory for storing values (references in the case of reference types)
- A lot of space can be wasted, due to needing free space to allow the insertion of new elements
- Object references in Java require 32 bits (for 32-bit systems) or 64 bits (for 64-bit systems).
- The following array:
Object[] array = new Object[1000];
Requires 64 ×1,000 bits of contiguous memory allocation = 640,000 bits (80,000 bytes) even if the list is empty
Not drastic, but still a waste of memory (1,000 is a very small list in terms of real systems)
Arrays and Memory Crude Example
Consider the following:
public class Obj { private double a, b;}public class Main { public static void main(String[] args) { Obj[] arr = new Obj[20]; for (int i = 0; i < 4; i++) { arr[i] = new Obj(); } }}- Each block in this example is 8 bytes (64 bits)
- Grey → Unused array space (but space still allocated)
- Green → Used
- White → Unused

Objects require 8 bytes ‘housekeeping’ then space for references/ primitives (in this case object Obj has two double primitives, each 8 bytes)
Arrays and Memory Pros and Cons
- Although space is ‘wasted’, arrays benefit from modern CPU caching and pre-fetching during iteration as the memory is in one contiguous block
- The primary issue with arrays is finding contiguous memory sufficient for storing the array.
Imagine a list intended to cater for 1 billion objects = 8 bytes * 1 billion = 8 GB of contiguous memory and this is just to store the object references!
- Consider that the above array was resized when attempting to add the ½ billion to 1 item (original array size ½ billion). Until garbage collection occurs, 4GB for the original array + 8GB for the new array is required
- Moral: big data requires lots of memory and arrays ‘struggle’ with a large amount of data due to their contiguous data and the need to resize in order not to waste memory
- The max size of an array-based list is
Integer.MAX_VALUE→ 2,147,483,647 - Some of the problems above could be worked around by linking multiple arrays, but this introduces overheads
List iteration
public static void displayList(List<Object> list) { for (int i = 0; i < list.size(); i++){ System.out.println(list.get(i)); }}List iterationList<Object> arrayList = new ArrayList<>();List<Object> linkedList = new LinkedList<>();…displayList(arrayList); // Time complexity fordisplayList(linkedList); // these two calls?Array list iterator implementation
class ArrayListIterator<E> implements Iterator<E> { int index = 0; @Override public boolean hasNext() { return index < array.length; } @Override public E next() { return array[index++]; }};Linked list iterator implementation
class Node<E> { Node<E> next; E data;}class LinkedListIterator<E> implements Iterator<E> { Node<E> current = head; @Override public boolean hasNext() { return current != null; } @Override public E next() { E result = current.data; current = current.next; return result; }};Enhanced for loop Syntactical sugar
for (Object obj : iterable) { System.out.println(obj);}The enhanced for loop above is just syntactical sugar for:
for (Iterator<Object> iter = iterable.iterator(); iter.hasNext(); /** incrementer not needed **/) { System.out.println(iter.next());}List iteration - Take Two
public static void displayList(List<Object> list) { for (Object obj : list) { System.out.println(obj); }}List<Object> arrayList = new ArrayList<>();List<Object> linkedList = new LinkedList<>();…displayList(arrayList); // Time complexity fordisplayList(linkedList); // these two calls?Sets
Question:
How do you retrieve a specific element from a set?
Answer: That’s a silly question.
The only way you could retrieve a specific element from a set is by having the element to begin with.
Sets - What are they good for?
- Checking the presence of an element within a set, e.g. whether a vertex has already been visited in a depth-first / breadth-first search of a Graph
Faster than a list, O(1) hash set or O(log n) tree set - Avoiding duplicates
- Checking whether a subset exists in another set
- Obtaining intersections/differences (e.g. Venn diagram)
- Efficiently maintaining non-index order (sorted sets such as tree set [i.e. binary search tree])
Set implementations Time complexity big-O
| Operation | Hash set | Tree set |
|---|---|---|
| add | O(1) / O(n) if resize needed | O(log n) |
| remove | O(1) | O(log n) |
| contains | O(1) | O(log n) |
| size | O(1) | O(1) |
- Hash set wins hands-down for the basic operations, but the order is chaotic.
- Whenever an order is required, a tree set should be used
Tree Sets
- You have been taught that tree set implementations (i.e. binary search trees) offer O(log n) time complexity
- On average this is the case for a self-balancing binary search tree, but the worst-case time complexity is actually O(h), where h is the height of the tree.
- A balanced tree with 1024 to 2047 (inclusive) items will have a height of 10.
Hence, O(log n) as log n == h, if balanced
- If you are struggling with logarithms, think about it this way:
level 0 = 1 node (root)
level 1 = 2 * previous level nodes = 2 nodes
level 2 = 2 * previous level nodes = 4 nodes
level 3 = 2 * previous level nodes = 8 nodes
level 4 = 2 * previous level nodes = 16 nodes
……………
level 9 = 2 * previous level nodes = 512 nodes
level 10 = 2 * previous level nodes = 1024 nodes
[height == 10]
1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024 = 2,047

If a balanced tree were to contain double the number of elements (i.e. 2 × 2047 = 4094 elements), the height of the tree would only increase by 1.
Question:
For a balanced binary search tree containing 1 billion elements, at most how many iterations would be required (on average) to find any element in the tree?
Answer:
iterations (floored result above)
Comparison to a list:
O(n) = 1,000,000,000 iterations
- ~ 34.483 million times more than the balanced binary search tree
Digression: Calculate logarithms
int log2Est(int n) { return n > 1 ? (1 + log2Est(n/2)) : 0;}or
int log2Est(int n) { assert n > 0; int y = 0; while (n > 1) {n = n/2; y++;} return y;}the number of times you can divide it by 2 before you go below 1
Map implementations Time complexity big-O
| Operation | Hash map | Tree map |
|---|---|---|
| put | O(1) / O(n) if resize needed | O(log n) |
| get | O(1) | O(log n) |
| remove | O(1) | O(log n) |
| contains | O(1) | O(log n) |
| size | O(1) | O(1) |
- In reality, a Map is just a Set of keys that map to a Collection of Objects.
- In the Java API, Maps are used to implement the Sets…
JAVA SET Implementation, It’s just a MAP…
e.g. Inside HashSet<E>:
private transient HashMap<E,Object> map;// Dummy value to associate with an Object in the// backing Mapprivate static final Object PRESENT = new Object();public boolean add(E e) { return map.put(e, PRESENT)==null;}Java Collections API
- The Java Collections API provides the following implementations of common abstract data types (some omitted)
- List →
ArrayList, LinkedListandVector - Set →
HashSet, LinkedHashSetandTreeSet - Map →
HashMap, LinkedHashMapandTreeMap - Graph → You wish
- Queue →
LinkedList, ArrayDeque - Priority queue →
PriorityQueue - Double-ended queue →
LinkedList, ArrayDeque - Stack →
Stack, ArrayDeque, LinkedList
- List →

Java Collections API Issues
- Design decisions and changes to the Java language have resulted in some notable issues with the Collections API:
- No Stack interface. There is a class
Stack, which extendsVector, an old ‘thread safe’ list implementation, which introduces overhead. HashSetusesHashMapinternally. This creates extra bloat, since although there is just a value, a key value pair is stored (2 object references)TreeSetusesTreeMapinternally. Less efficient, as per theHashSetproblem above- Legacy ‘
Object’ methods, where generics should be used. Introduces type safety issues.
- No Stack interface. There is a class
- As a developer you should be aware that the Collections API is designed to be flexible. This has efficiency and performance consequences.
- Use of generic types or
Objectmeans no direct storing of primitive types. - Consider an
ArrayListfor storing integers (the problem applies to all of the Collections API implementations…)ArrayList’s underlying array is:Object[] elementData;On a 64-bit system, this means each element requires 8 bytes.- Additionally the wrapper class
Integerincurs the overhead of being a reference type (16 bytes). - A primitive
intonly requires 4 bytes.
| Comparison | |
|---|---|
| Object[] | int[] |
| 8 bytes × array length + 16 bytes per Integer wrapper |
4 bytes × array length, stored directly in the array |
|
100 elements in a full array = 800 bytes + 1,600 bytes Total = 2,400 bytes |
100 elements in a full array Total = 400 bytes |
As a result, a full ArrayList of Integer elements uses 6 times more memory than an efficient implementation designed explicitly for storing int primitives.(This is ignoring the additional overhead of the static attributes in the Integer class and auto-boxing / unboxing).
Java Collections API Conclusion
- The Collections API is great if you want to a straight-out-of-the-box implementation of your favorite abstract data type
- However when time and space complexity is of utmost importance, you should consider a custom implementation
- For the most part you will be fine using the Collections API and will benefit from the range of utilities available and interoperability with other libraries
Which data structure is suitable
- Task:
Check a user’s login session is active - Information available:
A user’s session ID and a collection of active session IDs
Question:
What data structure is best suited to the task?
Example List implementation:
boolean isSessionActive(List<String> active, String user) { boolean found = false; Iterator<String> iter = active.iterator(); while (!found && iter.hasNext()) { found = iter.next().equals(user); } return found;}Example Set implementation:
boolean isSessionActive(Set<String> active, String user) { return active.contains(user);}- Conclusion:
- Both are equally functional, but a
Listwould offer O(n) regardless of implementation, whereas a hash-table basedSetimplementation would offer O(1). - Note: A
SortedSetimplementation such asTreeSetwould offer O(log n).
- Both are equally functional, but a
- Task:
Retrieve a user’s details by their username - Information available:
A user’s username
Question:
What data structure is best suited to the task?
- List?
- Maybe, but would require an O(n) search of the list and check that the User object’s username matched what we were looking for (also assumes the username is stored in this object)
- Set?
- No. The Set would contain the User objects, so to get it we would need to have the User object to begin with…
- Queue? Stack?
- No. Sure if the interface allows us to do a linear search, but we’ve established this is O(n)
- Map?
- A mapping of the username → User object would work a treat!
- If we don’t care about order (or rarely care) then a Hash map implementation is the way to go O(1)
- Sometimes the right data structure isn’t immediately apparently.
- We should first program to an interface (e.g. Set / List) and then change the implementation to suit the use the case.
- For example:
- If I want to store a key value pair, a Map is the obvious choice. It’s specifically designed for this use. Then
- But should I use a Tree map or a Hash map?
- In most cases the answer will be a Hash map. It’s quick with O(1) time complexity on most operations.
- If however, the contents of the Map rarely change and we frequently want to display the keys in order, then the Tree map is a better choice.
- Hash maps also waste memory, since the underlying implementation is an array. Java’s
HashMap(using default load factor 0.75) always wastes at least 0.25 * table size * reference size(8 bytes) bytes
The implications of a poor decision
Consider the algorithm for a breadth-first traversal of a graph:
class Node { List<Node> adjacent; String value;}Consider the algorithm for a breadth-first traversal of a graph:
void bft(Node start) { List<Node> nodes = new ArrayList<>(); List<Node> visited = new ArrayList<>(); nodes.add(start); while (!nodes.isEmpty()) { Node current = nodes.remove(0); if (!visited.contains(current)) System.out.println(current.value); visited.add(current); for (Node n : current.adjacent) { if (!visited.contains(n)) nodes.add(n); } }}void bft(Node start) { List<Node> nodes = new ArrayList<>(); List<Node> visited = new ArrayList<>(); nodes.add(start); while (!nodes.isEmpty()) { // isEmpty O(1) Node current = nodes.remove(0); // remove O(n) if (!visited.contains(current)) // contains O(n) System.out.println(current.value); visited.add(current); // add O(1) for (Node n : current.adjacent) { if (!visited.contains(n)) // contains O(n) nodes.add(n); // add O(1) } }}The implications of a better decision
void bft(Node start) { Queue<Node> nodes = new LinkedList<>(); Set<Node> visited = new HashSet<>(); nodes.add(start); while (!nodes.isEmpty()) { // isEmpty O(1) Node current = nodes.remove(0); // remove O(1) if (!visited.contains(current)) // contains O(n) System.out.println(current.value); visited.add(current); // add O(1) for (Node n : current.adjacent) { if (!visited.contains(n)) // contains O(1) nodes.add(n); // add O(1) } }}支持与分享
如果这篇文章对你有帮助,欢迎分享给更多人或赞助支持!