Linked Lists
The linked list data structure explained from first principles — nodes, pointers, O(1) insert at a known position, and why arrays still win most of the time.
The mental model: a treasure hunt, not a row of lockers
The array mental model is a row of numbered lockers — every locker is the same size, so you compute an address and jump straight there. A linked list is more like a treasure hunt. The first clue tells you where to find the second clue. The second clue tells you where to find the third. To get to clue 50, you have to read clues 1 through 49 in order.
That sounds obviously worse, and for random access it is. But here is what the treasure hunt model buys you: rearranging the hunt is trivially cheap. Want to insert a new clue between clue 3 and clue 4? Just update clue 3 to point at the new one, and have the new one point at the old clue 4. Two writes, done. You didn't move anything else.
flowchart LR
H["head"] --> A["4 | •"]
A --> B["8 | •"]
B --> C["15 | •"]
C --> D["16 | •"]
D --> E["23 | null"]
style H fill:#22d3ee,color:#0a0a0f
style C fill:#7c5cff,color:#fff
Each node is a small object: a value, and a pointer (or two, for doubly linked lists). In Python that looks like:
class Node:
def __init__(self, val):
self.val = val
self.next = None # singly linked
And a list is just a reference to the head node plus, optionally, a tail pointer so you can append in O(1):
class LinkedList:
def __init__(self):
self.head = None
self.tail = None
self.size = 0
That is the whole data structure. No backing array, no capacity, no pre-allocation.
Under the hood: what a pointer rewire actually looks like
Let's make this concrete. Say you have a list 4 → 8 → 15 → 16 → 23 and you want to insert 42 right after the 8 node.
flowchart LR
A["4 | •"] --> B["8 | •"]
B --> C["15 | •"]
C --> D["16 | •"]
D --> E["23 | null"]
N["42 | ?"]
style N fill:#00ff9d,color:#0a0a0f
Step 1: Point the new node's next at node 15.
Step 2: Point node 8's next at the new node.
flowchart LR
A["4 | •"] --> B["8 | •"]
B --> N["42 | •"]
N --> C["15 | •"]
C --> D["16 | •"]
D --> E["23 | null"]
style N fill:#00ff9d,color:#0a0a0f
Two pointer writes. That is it. This is O(1) as long as you already have a reference to node 8. If you have to search for node 8 first, finding it is O(n) — but the insertion itself, once you're there, is O(1). The distinction matters.
Compare this with an array: inserting 42 between index 1 and index 2 requires shifting 15, 16, and 23 one slot to the right. With 10 million elements after the insertion point, that is 10 million moves.
{ "type": "linked-list", "title": "insert at head — two pointer rewires", "data": [8, 15, 16, 23] }
Hit the insert-at-head operation. Every element stays exactly where it is in memory. The only change is two pointers. Now imagine doing that 10 million times in a hot loop — the linked list pays a constant per insert; the array accumulates shifting debt.
Singly vs. doubly linked
A singly linked list gives each node one pointer: next. It is simple and lean.
A doubly linked list gives each node two: next and prev. You can walk backward, and more importantly, deleting a node you already have a reference to is truly O(1) — no need to find the predecessor first.
flowchart LR
A["4"] <-->|next/prev| B["8"] <-->|next/prev| C["15"] <-->|next/prev| D["16"] <-->|next/prev| E["23"]
style A fill:#7c5cff,color:#fff
style E fill:#22d3ee,color:#0a0a0f
With a singly linked list, deleting node C requires knowing node B so you can point B.next at C.next. If all you have is a pointer to C and no prev, you have to walk from the head to find B — that is O(n). Doubly linked lists eliminate that walk.
The cost: one extra pointer per node. On a 64-bit system that is 8 bytes per node. For a million-node list that is 8 MB. Usually not a problem, but worth knowing.
Use doubly linked when:
- You need to delete arbitrary nodes efficiently (LRU cache is the classic case).
- You need backward traversal.
- You're implementing a deque (double-ended queue).
Stick with singly linked when:
- You only ever insert/delete at the head or tail.
- Memory is very tight.
- You're doing stack/queue work where you never need the predecessor.
Python's collections.deque is a doubly linked list under the hood, and it is why appendleft and popleft are O(1) while the same operations on a list are O(n).
The two canonical problems
Every linked list interview question is some variation of one of two problems. Know these cold.
1. Reverse a linked list
Given the head of a singly linked list, return the head of its reversal. Do it in O(n) time, O(1) space.
The trick is three pointers. Before you cut a node's next connection to walk forward, save where you were going.
def reverse_list(head):
prev = None
curr = head
while curr:
next_node = curr.next # 1. save the forward link
curr.next = prev # 2. reverse the arrow
prev = curr # 3. advance prev
curr = next_node # 4. advance curr
return prev # prev is now the new head
Trace through 4 → 8 → 15 manually. After iteration 1: null ← 4, curr is at 8. After iteration 2: null ← 4 ← 8, curr is at 15. After iteration 3: null ← 4 ← 8 ← 15, curr is null, loop ends, prev is 15. Return 15 as the new head.
The thing people trip over: they forget to save curr.next before overwriting curr.next. Do step 1 before step 2, always.
2. Detect a cycle
Does this linked list loop back on itself? If you naively walk it and it has a cycle, you loop forever.
Floyd's algorithm — also called fast and slow pointers, or the tortoise and hare — handles this in O(n) time and O(1) space. No hash set of visited nodes needed.
def has_cycle(head):
slow = head
fast = head
while fast and fast.next:
slow = slow.next # one step
fast = fast.next.next # two steps
if slow is fast:
return True
return False # fast hit null — no cycle
Why does this work? If there is no cycle, fast reaches the end. If there is a cycle, fast enters it first and starts lapping slow. Because fast is always one step ahead of slow each time they're in the cycle, the gap between them closes by exactly one per iteration — they are guaranteed to meet.
This is a special case of the fast and slow pointers technique, which solves a whole family of problems: finding the middle of a list, finding the start of a cycle, detecting duplicates in a constrained integer array. Worth learning the pattern rather than just memorizing this one problem.
Building the core operations
Here is a clean, self-contained singly linked list with the operations you actually need:
class Node:
def __init__(self, val):
self.val = val
self.next = None
class LinkedList:
def __init__(self):
self.head = None
self.tail = None
self._size = 0
def prepend(self, val):
"""Insert at head — O(1)."""
node = Node(val)
node.next = self.head
self.head = node
if self.tail is None: # first element
self.tail = node
self._size += 1
def append(self, val):
"""Insert at tail — O(1) with a tail pointer."""
node = Node(val)
if self.tail:
self.tail.next = node
else:
self.head = node # empty list
self.tail = node
self._size += 1
def delete_head(self):
"""Remove head — O(1)."""
if not self.head:
return None
val = self.head.val
self.head = self.head.next
if self.head is None:
self.tail = None
self._size -= 1
return val
def find(self, val):
"""Walk until found — O(n)."""
curr = self.head
while curr:
if curr.val == val:
return curr
curr = curr.next
return None
def to_list(self):
"""Snapshot as a Python list — O(n). Useful for debugging."""
result, curr = [], self.head
while curr:
result.append(curr.val)
curr = curr.next
return result
A few things worth noting here. First, the tail pointer is not optional if you care about O(1) appends. Without it, every append has to walk to the end — O(n). Second, delete_head has to handle the edge case where the list becomes empty, otherwise self.tail will point at a dangling node. That specific bug — tail not cleared on last deletion — shows up in real code more than you'd expect.
The cache miss problem nobody mentions in textbooks
Here is what the Big-O table doesn't tell you. On modern hardware, accessing a random memory address takes roughly 100 ns when it's a cache miss (RAM), versus ~1 ns for something in L1 cache. An array scan is blazing fast not just because it's O(n) but because each element is right next to the last one — the CPU prefetcher loads the next cache line before you even ask for it.
A linked list destroys that. Each node pointer points somewhere unpredictable in the heap. For a list of 1 million nodes, virtually every curr = curr.next is a cache miss. In practice, scanning a linked list can be 10–50x slower than scanning an array of the same length, even though both are technically O(n).
This matters when you're deciding between the two. The theoretical O(1) insert advantage often evaporates in benchmarks because the constant factors are terrible. Jeff Bonwick (who designed the slab allocator for Solaris) found that per-node allocation overhead and cache behavior dominated practical linked list performance. It is not a toy concern.
The lesson: reach for a linked list when the structural property — O(1) splice at a known position — is genuinely the bottleneck. Not just when you think it might matter.
The LRU cache: where linked lists actually shine
The canonical real-world use of a doubly linked list is the LRU (Least Recently Used) cache. The idea: keep the most recently accessed item at the front, the least recently accessed at the back. When the cache is full, evict from the back.
flowchart LR
H["HEAD\n(most recent)"] --> A["key:A"] --> B["key:B"] --> C["key:C"] --> T["TAIL\n(evict next)"]
style H fill:#22d3ee,color:#0a0a0f
style T fill:#ff4d8d,color:#0a0a0f
When you access an item, you move its node to the front. When you evict, you remove the tail node. Both operations are O(1) — if you have a doubly linked list and a hash map pointing at each node. The hash map gives you O(1) lookup to find the node; the doubly linked list lets you move or remove it without a predecessor search.
That is the combination you almost always see in practice: a linked list for O(1) pointer rewiring, a hash map for O(1) lookup. Either alone is incomplete.
Pitfalls people actually hit
Losing the head. The single most common linked list bug. If head is your only reference to the list and you do head = head.next without saving, you have permanently lost the original head and leaked memory (in languages with manual management). Always save before you advance.
Off-by-one on the tail. When you insert the first node, both head and tail should point at it. When you delete the last node, both should become null. Every list implementation I've seen in code review has had a bug in one of these two edge cases.
Using a singly linked list when you need O(1) delete. If you're implementing an LRU cache or a priority queue with arbitrary eviction, you need doubly linked. Reaching for singly linked and then wondering why your delete is O(n) is a common mistake.
Thinking a stack needs a linked list. It doesn't. A stack is push-and-pop-at-one-end, which an array with an index handles in O(1) with better cache behavior. Use list.append and list.pop() in Python — they are O(1) amortized and faster in practice than pointer-chasing.
When NOT to use a linked list
Be honest with yourself. The cases where a linked list genuinely wins are narrower than the Wikipedia article makes them sound.
- Random access needed? Use an array. Full stop.
- Just need a stack or queue? Use Python
listorcollections.deque. Both are backed by more cache-friendly structures. - Frequent membership tests? A hash set is O(1) average; a linked list scan is O(n).
- Sorting the collection? Merge sort on a linked list is O(n log n) but the constant factor is miserable. Sort an array, transform back if needed.
- The list barely changes? If you're mostly reading and only occasionally inserting, the cache win of an array dominates completely.
The linked list's domain is a small set of genuinely pointer-rewiring-heavy workloads: LRU caches, undo/redo stacks that splice in the middle, OS scheduling queues, memory allocators managing free blocks. Outside those domains, you are paying the cache miss penalty for a theoretical O(1) that your access pattern never actually uses.
If you want to go deeper, the fast and slow pointers technique page shows how a huge class of linked list interview problems — cycles, middle-finding, nth-from-end — all collapse to the same two-pointer pattern. And the stack page shows the single most useful linked-list-backed structure, implemented two ways with a complexity comparison.
Frequently asked questions
▸What is a linked list and how does it differ from an array?
A linked list is a sequence of nodes where each node holds a value and a pointer to the next node. Unlike an array, the nodes are scattered anywhere in memory — there is no contiguous block. That means you cannot jump to node i with arithmetic; you must walk from the head. The trade-off: because there is no contiguity requirement, inserting or deleting at a known position is O(1) — just rewire pointers — instead of O(n) like an array.
▸When is a linked list faster than an array?
When you need O(1) insertions or deletions at a position you already hold a reference to — for example, a constantly-reordered list, an LRU cache eviction, or a task scheduler that plucks items from the middle. In practice those situations are rarer than they sound, and the cache miss penalty of pointer-chasing often swamps the theoretical O(1) advantage. Arrays win the vast majority of real workloads.
▸What is the difference between a singly linked list and a doubly linked list?
A singly linked list has one pointer per node — next only. You can walk forward, but to delete a node you need either the node before it or to walk from the head to find the predecessor. A doubly linked list stores both next and prev, so deletion of a known node is truly O(1) without any traversal. The cost is one extra pointer per node (8 bytes on a 64-bit system), which is usually worth it when you need backward traversal or O(1) delete.
▸How do you detect a cycle in a linked list?
Floyd's cycle-detection algorithm (fast and slow pointers) is the standard answer. Run two pointers through the list simultaneously — slow moves one step at a time, fast moves two. If there is no cycle, fast hits null first. If there is a cycle, fast laps slow and they meet inside the loop. It runs in O(n) time and O(1) space with no visited set needed.
▸How do you reverse a linked list in place?
Walk the list once with three pointers — prev (starts null), curr (starts at head), and next. On each step: save curr.next into next, point curr.next back to prev, advance prev to curr, advance curr to next. After the loop, prev is the new head. One O(n) pass, O(1) extra space, no recursion needed (though recursion works too).
You may also like
Union-Find (Disjoint Set Union)
The near-magic data structure for connectivity queries — are these two nodes in the same component? Nearly O(1) per operation with union by rank and path compression. Kruskal's MST, redundant connections, account merging, and more.
Tries (Prefix Trees)
The data structure powering autocomplete, spell-check, and IP routing — a tree where each edge is a character and every path from the root spells a prefix. Insert, search, and prefix queries all run in O(L) on word length alone, completely independent of how many words you store.
Strings
Strings look like simple text, but they hide a brutal trap: naive concatenation in a loop is O(n²). Learn why, how encoding actually works, and the handful of patterns — sliding window, two pointers, hashing — that solve 80% of string interview problems.