Searching: Linear vs Binary
Binary search vs linear search explained from first principles — why halving a sorted list gives you O(log n), when it crushes linear search, and why hash tables beat both.
The problem every program has
Your program stores data. At some point, it needs to answer the question: is this value here, and if so, where?
That question shows up constantly. Is this username already taken? Does this product ID exist in inventory? Which log line has this error code? The naive answer — just look at everything — works fine when there are 50 items. It quietly murders you when there are 50 million.
Let's build both strategies from scratch and understand exactly when each one wins.
Linear search: honest and slow
Linear search is what your brain does when you lose your keys. You check your desk. Then your jacket pocket. Then the counter. Then the couch cushions. You check places one by one until you find the keys or run out of places to look.
def linear_search(arr, target):
for i, val in enumerate(arr):
if val == target:
return i # found it at index i
return -1 # not here
There is no cleverness here, and that is the point. The algorithm makes exactly one assumption: that you can look at elements one at a time. No sorting required, no ordering required, no structure at all. You can run this on a shuffled list, a list with duplicates, a list where the elements are strings. It just works.
The cost is linear in the obvious sense: in the worst case, the element is either at the very end or not there at all, which means checking every one of the n elements. That is O(n). With 10 million rows, you might do 10 million comparisons.
For small data, that is fine. For anything the size of a real database table, it is not.
Binary search: exploit the order
Here is the key insight — if the list is sorted, every comparison you make tells you more than "is this it?" It also tells you which half to throw away.
Imagine you are guessing a number between 1 and 100 and I tell you "higher" or "lower" after each guess. A smart player guesses 50 first. If I say "lower," they have just eliminated every number from 50 to 100 — half the possibilities, in one guess. They guess 25 next. Another half gone. After seven guesses, they've cornered the answer regardless of what it is.
That is binary search. The implementation:
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2 # exact middle index
if arr[mid] == target:
return mid # found it
elif arr[mid] < target:
lo = mid + 1 # target is in the right half
else:
hi = mid - 1 # target is in the left half
return -1 # not here
Two pointers, lo and hi, define the "still possible" region. Each iteration either finds the target or cuts that region in half. The loop runs until the region is empty.
Watch what happens on a concrete example. Searching for 38 in [3, 7, 12, 19, 24, 31, 38, 45, 50, 58]:
flowchart TD
A["[3, 7, 12, 19, 24, 31, 38, 45, 50, 58]\nlo=0, hi=9, mid=4 → arr[4]=24\n24 < 38, so lo = mid+1 = 5"]
B["[31, 38, 45, 50, 58]\nlo=5, hi=9, mid=7 → arr[7]=45\n45 > 38, so hi = mid-1 = 6"]
C["[31, 38]\nlo=5, hi=6, mid=5 → arr[5]=31\n31 < 38, so lo = mid+1 = 6"]
D["[38]\nlo=6, hi=6, mid=6 → arr[6]=38\nFound at index 6!"]
A --> B --> C --> D
style A fill:#7c5cff,color:#fff
style D fill:#00ff9d,color:#0a0a0f
Four comparisons to find one element in a ten-item list. Not dramatic on this scale — but scale it up and the math gets interesting.
Why halving gives you log n
The reason binary search is O(log n) is not a theorem you memorize; it falls out of the halving directly.
Start with n elements. After 1 comparison, you have at most n/2 left. After 2, n/4. After k comparisons, you have at most n / 2^k left. You stop when that region hits 1 (or 0). So you need k such that:
n / 2^k ≤ 1
2^k ≥ n
k ≥ log₂(n)
That is where log₂ comes from — it is just asking "how many times can you halve n before you get to 1?" For n = 1,000,000, log₂(1,000,000) ≈ 20. Twenty comparisons. Not 1,000,000.
The practical consequence:
| Elements | Linear (worst case) | Binary (worst case) |
|---|---|---|
| 100 | 100 | 7 |
| 10,000 | 10,000 | 14 |
| 1,000,000 | 1,000,000 | 20 |
| 1,000,000,000 | 1,000,000,000 | 30 |
Each time you multiply the data by 1,000, linear search needs 1,000x more work. Binary search needs 10 more steps. That gap is why O(log n) sits so comfortably between O(1) and O(n) in the hierarchy — it barely grows.
The only catch: you have to sort first
Binary search requires sorted data. That is a real constraint, not a fine-print footnote.
If your data arrives unsorted and you sort it just to run one binary search, you have spent O(n log n) on the sort plus O(log n) on the search — total O(n log n). A single linear search would have cost O(n). You came out behind.
The math tips the other way when you search repeatedly. Sort once for O(n log n), then every subsequent search is O(log n). If you run m searches:
- Linear:
O(m × n)— it stacks - Sort + binary:
O(n log n + m × log n)— after the one-time sort cost, searches are cheap
If m is large, the binary approach wins decisively. This is the "sort-once-search-many" pattern you will see in real systems constantly — load the data, sort it, then answer queries against it.
The other practical note: arr[mid] requires O(1) random access by index, which means binary search needs an array, not a linked list. The linked list cannot jump to the middle in constant time, so you can't halve cheaply.
Play with it
Here is that sorted array again. Try targeting a value that does not exist — watch how binary search terminates as soon as the "possible region" collapses to empty rather than scanning to the end.
{ "type": "array", "title": "binary search on sorted data", "data": [3, 7, 12, 19, 24, 31, 38, 45, 50, 58] }
Then try targeting something near the beginning — notice it still only takes a few steps, where linear search would have gotten lucky and stopped early. The average case for linear search is n/2, not n. Binary search's worst case is log n. Binary search still wins.
What to watch out for
Binary search has a reputation for being "simple but easy to get wrong." The off-by-one errors are real.
The classic overflow bug. In many languages, (lo + hi) / 2 overflows when lo + hi exceeds the integer max. The safe version is lo + (hi - lo) / 2. Python's integers don't overflow, so this matters more in Java, C, and Go — but it is worth knowing.
Open vs. closed interval confusion. The version above uses a closed interval: lo and hi are both valid candidate indices. Some implementations use a half-open interval where hi is one past the last valid index. The loop condition changes (while lo < hi instead of while lo <= hi) and the midpoint update changes. Both are correct; mixing conventions is not. Pick one and be consistent.
Finding the first or last occurrence. If the array has duplicates, the version above finds an occurrence, not necessarily the first or last. For "find the leftmost occurrence of 7," you need to keep going left even after a match. That variant is worth its own workout — it is covered in detail in the binary search technique deep dive.
Don't binary search an unsorted array. This sounds obvious but has bitten people: you sort a copy, binary-search it, get an index into the copy, and that index means nothing in the original. Or worse, you forget to sort at all and the algorithm returns a wrong answer confidently. Binary search on unsorted data does not error out — it just silently lies.
The O(1) alternative: hash lookups
Before you reach for binary search, ask one question: do you actually care about order?
If you only need "is X in the set?" or "what is the value for key X?", a hash table does both in O(1) average time — faster than binary search's O(log n). Python's set and dict, Java's HashSet and HashMap, JavaScript's Map and Set — all of them are hash-table-backed.
# O(log n) — binary search approach
sorted_ids = sorted(product_ids)
if binary_search(sorted_ids, target_id) != -1:
...
# O(1) — just use a set
id_set = set(product_ids)
if target_id in id_set:
...
The hash table wins on pure lookup speed. But it has real trade-offs:
- No range queries. "Find all elements between 100 and 200" — a sorted array can do this in O(log n + k) time. A hash table has no idea.
- No ordered iteration. Want the next-largest element? Binary search (or a sorted array with
bisect) handles that cleanly. - More memory. Hash tables carry overhead from their bucket structure, typically running at 50–70% fill to keep collision rates low.
So: hash tables for membership tests and key lookups, binary search for sorted data with range queries or order-sensitive operations. Both beat linear search for large data.
When to just use linear search
Linear search is not a consolation prize. There are concrete situations where it is the correct answer:
- The list is short. For under ~20 elements the overhead and code complexity of binary search are not worth it. Linear search is also branchless in many optimized implementations and plays nicer with CPU prefetchers.
- The data isn't sorted and you're searching once. Sorting costs O(n log n). If you only need one answer, linear search's O(n) beats sort-then-binary-search.
- You need to find all matches, not one. Binary search finds a match. If you need every occurrence of a value, you either scan outward from the match (which degrades to O(n) anyway) or just scan linearly from the start.
- The structure doesn't support random access. A linked list, a stream, a generator — binary search can't halve a structure it can't index directly.
Setting up for the deep dive
The basic form of binary search is a short while lo <= hi loop. But the real power of the pattern is how it generalizes.
You can binary search on any monotone function — not just a sorted array of values. "Find the minimum capacity such that this warehouse can handle the load." "Find the first day where the cumulative total exceeds a threshold." "Find the smallest integer where this property becomes true." All of these are binary search problems in disguise, even though there is no explicit sorted array in sight.
That generalization is what the binary search technique page covers in full, with worked problems of increasing difficulty. The intuition you want to leave this lesson with: whenever you can ask a yes/no question about a value and the answer transitions from no to yes (or yes to no) exactly once as the value grows, you can binary search for the transition point.
That one insight opens up a huge category of problems that look nothing like "find 38 in a sorted list."
Next up: Binary Search (technique) — templated code, boundary conditions, and problems where you binary-search on the answer rather than the array.
Frequently asked questions
▸What is the difference between linear search and binary search?
Linear search checks every element from the start until it finds what it wants — O(n) in the worst case. Binary search only works on sorted data, but it eliminates half the remaining candidates on each step, hitting the target in O(log n) comparisons. On a list of a million items, linear search takes up to 1,000,000 steps; binary search takes at most 20.
▸Why does binary search require sorted data?
Binary search works by comparing the target to the middle element and deciding which half to discard. That decision is only valid if the data is ordered — if the array is not sorted, the element you are looking for could be on either side of the midpoint, so you cannot throw either half away safely.
▸When should I use linear search instead of binary search?
Use linear search when the data is unsorted and sorting first would cost more than the search saves, when the list is very short (under ~20 elements the overhead of binary search is not worth it), or when you are searching a data structure that does not support random access by index, like a linked list. Also use it any time you need to find all elements that match, not just one.
▸Is binary search always faster than linear search?
On large sorted datasets, yes — by a wide margin. But if your data is not sorted, you have to sort it first (O(n log n)), which is worth it only if you will search many times. For a single search on unsorted data, linear search wins. For short arrays, linear search is simpler and often fast enough. Binary search is the right tool for repeated lookups on sorted data.
▸What is faster than binary search for lookup?
A hash table gives O(1) average-case lookup — faster than binary search for pure membership tests and key-value retrieval. The tradeoffs are that hash tables use more memory, do not support range queries, and have worst-case O(n) behavior on hash collisions. Binary search on a sorted array is the right tool when you need range queries, ordered iteration, or space efficiency.