Sorting & Searching
How to arrange data in order and locate items within it efficiently. The algorithms every language's standard library uses under the hood — and how a bug in Java's sorting implementation corrupted data in millions of Android apps in 2015 without throwing a single error.
Why this matters at your level
Implement binary search without off-by-one errors. Know the complexity of merge sort, quicksort, and insertion sort. Know what "stable sort" means and when it matters for multi-key sorting.
Recognise "binary search on the answer" problems (find the minimum speed to deliver in D days). Implement merge sort from scratch. Know when counting sort achieves O(n) for bounded integers.
Design systems where data is kept pre-sorted (event streams, time-series databases, append-only logs). Know how database B-tree indexes relate to binary search. Understand sort stability implications for multi-key sorts in distributed systems.
TimSort invented by Tim Peters for Python — later adopted by Java, JavaScript V8, and most modern languages
Researchers publish formal proof that Java's TimSort violates its own invariant on 67+ element inputs in a specific pattern
Android developers report silently incorrect sort output in production apps — contacts, leaderboards, sorted UI lists affected
Java applies the one-line invariant fix; an Android update is pushed to devices
Millions of older Android devices never update — silent wrong sort persists for years
The question this raises
Sorting algorithms can have bugs in edge cases that produce wrong output without crashing. How do engineers verify that data is actually sorted correctly, not just returned quickly?
You have a sorted list of 1,000,000 product IDs and need to check if a specific ID exists. Your colleague says "just loop through and check each one." What is the fastest you can possibly do this?
What you'll learn
- Sorting algorithms have different strengths: merge sort is O(n log n) always and stable; quicksort is O(n log n) average but O(n²) worst case on sorted input
- Binary search requires sorted data — O(log n) vs O(n) linear scan; 20 steps are enough to search over 1 million elements
- Stable sort preserves the original order of equal elements — critical for multi-key sorting (sort by date, then by name within same date)
- TimSort (used by Python, Java, JavaScript) detects naturally sorted runs and handles them in O(n) — it is O(n log n) worst case and O(n) best case
- The most common interview mistake: applying a full O(n log n) sort when a size-K heap (O(n log K)) or counting sort (O(n)) would be faster
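Stability is easy to see directly in Python, whose built-in sorted() is guaranteed stable. A minimal sketch of the sort-by-date-then-name pattern (the sample records are made up):

```python
# Stable multi-key sort: sort by the secondary key first, then the
# primary key. Stability preserves the name order within each date.
records = [
    ("2024-01-02", "carol"),
    ("2024-01-01", "bob"),
    ("2024-01-02", "alice"),
    ("2024-01-01", "alice"),
]

by_name = sorted(records, key=lambda r: r[1])            # secondary key first
by_date_then_name = sorted(by_name, key=lambda r: r[0])  # primary key last

print(by_date_then_name)
# [('2024-01-01', 'alice'), ('2024-01-01', 'bob'),
#  ('2024-01-02', 'alice'), ('2024-01-02', 'carol')]
```

If the second sort were unstable, alice and bob could swap within 2024-01-01 and the multi-key ordering would be lost.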
Lesson outline
What this looks like in real life
The library vs the phone book
Finding a book in an unsorted library: check each shelf until you find it — O(n). Finding a name in a phone book: open the middle, eliminate half, repeat — O(log n). The phone book only works because it is sorted. This is the core tradeoff: pay O(n log n) once to sort, then get O(log n) searches forever after. Or pay O(n) per search forever if you never sort.
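A minimal sketch of that tradeoff using Python's standard-library bisect module (the ID values are illustrative):

```python
import bisect

ids = [7, 42, 19, 88, 3, 56]      # unsorted "library"

# Linear scan: O(n) per lookup, no preparation needed.
found_linear = 88 in ids

# Sort once (O(n log n)), then every lookup is O(log n).
sorted_ids = sorted(ids)
i = bisect.bisect_left(sorted_ids, 88)
found_binary = i < len(sorted_ids) and sorted_ids[i] == 88

print(found_linear, found_binary)  # True True
```

The sort is a one-time cost; every subsequent lookup pays only O(log n), which is why "sort once, search many times" wins whenever searches outnumber insertions.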
Binary Search
Requires a sorted array. O(log n) time, O(1) space. Left and right pointers narrow the range around the midpoint each step.
Use for: Reach for binary search whenever your data is sorted and you need to find a specific value or insertion point. Any time you hear "sorted array" in an interview problem, binary search is almost certainly involved
The insight that makes it work
The key insight: binary search does not find the item — it eliminates the half of candidates that cannot contain the item. Each step does not move you closer to the answer by one; it removes half the remaining possibilities. After 20 steps you have searched 1,048,576 elements.
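The halving argument can be checked with a few lines of arithmetic: the worst-case comparison count for n elements is floor(log2 n) + 1 (the helper name is illustrative):

```python
import math

# Worst-case comparisons for binary search on n elements:
# each comparison halves the candidate range, so floor(log2 n) + 1.
def max_comparisons(n: int) -> int:
    return math.floor(math.log2(n)) + 1

print(max_comparisons(7))          # 3  (matches the walkthrough below)
print(max_comparisons(1_000_000))  # 20
```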
How this concept changes your thinking
I need to check if a value exists in a sorted list of 1 million items
“I loop through the list from the beginning and check each item — seems natural”
“Binary search: 20 comparisons instead of up to 1 million. I check sorted-ness first — if not sorted, I sort once in O(n log n) then binary search forever after.”
I need to sort data before returning it from an API
“I use whatever .sort() my language provides without thinking about it”
“I check: is the data nearly sorted already? TimSort will be O(n) on nearly-sorted data. Is the key range small enough for counting sort — O(n) regardless? Knowing why matters.”
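A minimal counting-sort sketch, assuming non-negative integer keys with a small known maximum (the function name and sample data are illustrative):

```python
# Counting sort — O(n + k) for integer keys in the range 0..k.
# No comparisons at all: tally each key, then emit keys in order.
def counting_sort(nums: list[int]) -> list[int]:
    if not nums:
        return []
    counts = [0] * (max(nums) + 1)
    for x in nums:
        counts[x] += 1             # tally each key
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)    # emit each key `count` times
    return out

print(counting_sort([4, 1, 3, 1, 0, 4]))  # [0, 1, 1, 3, 4, 4]
```

The catch: space is proportional to the key range k, so it only pays off when k is comparable to n.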
I need the Kth largest element in an unsorted array
“I sort the whole array O(n log n) and take index K”
“I use a min-heap of size K: O(n log K), faster when K is small. Or quickselect: O(n) average. Sorting is not always the right first step.”
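The heap approach can be sketched with Python's standard-library heapq (the helper name is illustrative):

```python
import heapq

# Kth largest via a min-heap of size K: O(n log K).
# The heap root is always the smallest of the K largest seen so far.
def kth_largest(nums: list[int], k: int) -> int:
    heap = nums[:k]
    heapq.heapify(heap)
    for x in nums[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the smallest, push x
    return heap[0]

print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

When K is small relative to n, log K is far cheaper than log n per element, which is the whole advantage over a full sort.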
Walk through it in plain English
Binary search: find target 37 in sorted array [2, 9, 15, 24, 37, 48, 60]
1. Start with the full range. Set LEFT = index 0 (value 2), RIGHT = index 6 (value 60). We are considering all 7 elements.
2. Find the middle. MID = (0+6)/2 = 3 (integer division). The value at index 3 is 24.
3. Compare and eliminate. Target 37 > 24, so the target is in the RIGHT half. Set LEFT = MID + 1 = 4. We will never look at indices 0-3 again.
4. Find new middle. LEFT=4, RIGHT=6. MID = (4+6)/2 = 5. Value at index 5 is 48.
5. Compare again. Target 37 < 48, so the target is in the LEFT portion. Set RIGHT = MID - 1 = 4. Remaining: just index 4.
6. Converge. LEFT=4, RIGHT=4. MID=4. Value at index 4 is 37. Match found — return index 4.

Key: 7 elements, 3 comparisons. Linear search would need up to 7 checks. At 1 million elements: binary search needs at most 20 comparisons.
Now in code
Binary search — the clean implementation. The off-by-one errors live entirely in the loop condition and pointer updates. Get these two lines wrong and the function skips valid answers or loops forever.
def binary_search(arr: list, target: int) -> int:
    left, right = 0, len(arr) - 1         # inclusive bounds
    while left <= right:                  # <= not <: checks single-element range
        mid = left + (right - left) // 2  # avoids integer overflow
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1                # target in right half, exclude mid
        else:
            right = mid - 1               # target in left half, exclude mid
    return -1                             # target not present

- Both bounds are inclusive: left = 0 and right = len(arr) - 1 cover the first and last valid indices. The invariant: the target, if it exists, is somewhere in arr[left..right].
- left <= right, not left < right. Using < exits before checking the last remaining element, so a single-element array containing the target incorrectly returns -1.
- mid = left + (right - left) // 2 prevents integer overflow. In Java or C with 32-bit integers, (left + right) can overflow when both are large. This form is mathematically identical but safe.
- left = mid + 1, not left = mid. Setting left = mid makes no progress once mid equals left — the loop repeats forever.
The trap — what most people try first
Binary search has two classic bugs: using < instead of <= in the while condition (misses single-element arrays), and using mid instead of mid+1 in the pointer update (causes infinite loops). Both look almost correct and pass most test cases.
Off-by-one in loop condition and pointer update
# BROKEN: two subtle off-by-one errors
def binary_search_broken(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:          # BUG 1: misses single-element array
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid           # BUG 2: infinite loop when left==right-1
        else:
            right = mid - 1
    return -1

# On [5], target=5: loop never runs, returns -1 (wrong)
# On [3,5], target=5: left stays 0 forever (infinite)

# CORRECT: both off-by-ones fixed
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:         # inclusive: handles single element
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1       # exclude mid, move right
        else:
            right = mid - 1
    return -1

The left < right bug means the loop exits before checking the last remaining element: a single-element array containing the target always returns -1. The left = mid bug means that when left and right are adjacent, mid equals left forever — an infinite loop. Both bugs pass almost all test cases because they only trigger on arrays of 1 or 2 elements. Production binary search needs explicit tests for: empty array, single element, two elements, target at first index, target at last index.
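Those five edge cases translate directly into assertions. This sketch repeats the corrected function so it runs standalone:

```python
# Edge-case tests for binary_search — each one catches a bug that
# "almost correct" versions sail past.
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

assert binary_search([], 5) == -1        # empty array
assert binary_search([5], 5) == 0        # single element
assert binary_search([3, 5], 5) == 1     # two elements
assert binary_search([1, 2, 3], 1) == 0  # target at first index
assert binary_search([1, 2, 3], 3) == 2  # target at last index
print("all edge cases pass")
```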
When to reach for this
How fast is it?
| Scenario | What happens | Speed |
|---|---|---|
| Binary search on sorted array of 1 million items | Eliminates half the candidates each step — found in ≤20 comparisons | Very fast (O(log n)) |
| Linear search on unsorted array of 1 million items | Checks items one by one — target could be the last one | Slow (O(n)) |
| Merge sort on any input | Always divides evenly — no input triggers a worse case | Fast always (O(n log n)) |
| Naive-pivot quicksort on sorted or reverse-sorted input | A first- or last-element pivot is always the min or max — partitions degenerate | Slow worst case (O(n²)) |
| TimSort on nearly-sorted data | Detects existing runs and merges them — approaches linear | Very fast (O(n) best case) |
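The quicksort row can be made concrete by counting partition comparisons with a deliberately naive first-element pivot. This is an illustration of the degenerate case, not how library quicksorts choose pivots:

```python
import random

# With a first-element pivot, sorted input makes every partition
# maximally lopsided: one side is empty every time.
def quicksort_comparisons(nums: list[int]) -> int:
    comparisons = 0

    def qs(arr):
        nonlocal comparisons
        if len(arr) <= 1:
            return arr
        pivot, rest = arr[0], arr[1:]  # naive pivot choice
        comparisons += len(rest)       # one comparison per element partitioned
        lo = [x for x in rest if x < pivot]
        hi = [x for x in rest if x >= pivot]
        return qs(lo) + [pivot] + qs(hi)

    qs(nums)
    return comparisons

n = 200
sorted_cost = quicksort_comparisons(list(range(n)))  # n(n-1)/2 = 19900
random.seed(42)
shuffled = list(range(n))
random.shuffle(shuffled)
shuffled_cost = quicksort_comparisons(shuffled)      # roughly n log2 n
print(sorted_cost, shuffled_cost)
```

On sorted input the cost is exactly n(n-1)/2 comparisons; shuffled input costs an order of magnitude less, which is the O(n²) vs O(n log n) gap made visible.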
Exam Answer vs. Production Reality
Sorting & Binary Search
📖 What the exam expects
Merge sort: O(n log n) time, O(n) space, stable. Quicksort: O(n log n) average, O(n²) worst case, O(log n) space. Binary search: O(log n) time, O(1) space — requires sorted input. State complexity before writing code.
How this might come up in interviews
Binary search appears in two forms: literally (find a value in a sorted array) and disguised as "binary search on the answer" (find the minimum or maximum value that satisfies a condition). The disguised form is common in medium and hard problems: find the minimum speed to arrive on time, find the minimum capacity for shipping packages, find the K-th smallest element in a matrix.
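A sketch of the disguised form, using the classic minimum-speed setup (the function name and problem framing are illustrative): if a speed works, every higher speed also works, so the answer space itself is "sorted" and can be binary searched.

```python
import math

# "Binary search on the answer": search over candidate speeds, not an
# array. feasible() is monotone — false, false, ..., true, true — so we
# binary search for the first true.
def min_speed(piles: list[int], hours: int) -> int:
    def feasible(speed: int) -> bool:
        return sum(math.ceil(p / speed) for p in piles) <= hours

    lo, hi = 1, max(piles)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works — try slower
        else:
            lo = mid + 1      # mid too slow
    return lo

print(min_speed([3, 6, 7, 11], 8))  # 4
```

Note the pattern shift: lo < hi with hi = mid (find the first value that satisfies the condition), rather than the lo <= hi membership test used earlier.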
Common questions:
- Binary search (basic — implement correctly from memory)
- Search in rotated sorted array
- Find minimum in rotated sorted array
- Kth largest element (quickselect or heap)
- Find first and last position of element in sorted array
Strong answer: states the sortedness invariant before coding / handles empty array and single-element array as explicit test cases / uses overflow-safe mid calculation / explains left <= right vs left < right and WHY / can extend to find-first-occurrence or find-insertion-point variants without prompting
Red flags: uses left < right (misses single-element arrays) / sets left = mid instead of mid+1 (causes infinite loop) / cannot write binary search from memory / does not check whether input is sorted / confuses binary search (sorted array) with BFS (graph traversal)
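The find-first-occurrence variant mentioned in the strong answer switches to an exclusive right bound, mirroring Python's bisect_left. A sketch:

```python
# Find the FIRST index of target in a sorted array with duplicates.
# Instead of returning on equality, keep tightening the right bound
# until the range collapses on the leftmost index where arr[i] >= target.
def find_first(arr: list, target) -> int:
    left, right = 0, len(arr)     # right is EXCLUSIVE here
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid           # arr[mid] >= target: mid stays a candidate
    if left < len(arr) and arr[left] == target:
        return left
    return -1

print(find_first([1, 2, 2, 2, 3], 2))  # 1
```

With an exclusive right bound, left < right and right = mid are correct; mixing conventions (inclusive bound with right = mid, or exclusive with <=) reintroduces the infinite-loop and missed-element bugs.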
💡 Analogy
Sorting is like organising a library. Insertion sort: pick up each book one at a time and slide it into the right position among the books you already have — fast for small collections, painfully slow for thousands. Merge sort: split the library in half, sort each half, then merge them together — does the same work regardless of how random the starting order was. Binary search: you can only find a specific book instantly if the library is already sorted alphabetically — then you open the middle shelf, decide left or right, and repeat.
⚡ Core Idea
Sorting rearranges data into order. Binary search finds data within that order. Binary search eliminates half the candidates at each step, running in O(log n). The sorting algorithms differ in how they achieve O(n log n): merge sort divides and conquers, quicksort partitions around a pivot, TimSort uses the natural order already present in the data. For interviews: know merge sort and binary search cold.
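A minimal merge sort sketch in Python, for practicing the divide-and-merge structure by hand (not a production implementation):

```python
# Merge sort — always O(n log n), stable, O(n) extra space.
def merge_sort(arr: list) -> list:
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # divide
    right = merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one side is already exhausted

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The <= in the merge is what makes it stable: when elements tie, the one from the left half (earlier in the original order) is taken first.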
🎯 Why It Matters
Every database index is a sorted structure. Every binary search in production assumes sorted data. Every ORDER BY in SQL uses a sort algorithm. Choosing the wrong sort algorithm causes silent performance cliffs: naive-pivot quicksort degenerates to O(n²) on already-sorted data — a very common input pattern — which is part of why Python and Java both use TimSort instead.