Sorting & Searching
How to arrange data in order and locate items within it efficiently. The algorithms every language's standard library uses under the hood — and how a bug in Java's sorting implementation corrupted data in millions of Android apps in 2015 without throwing a single error.
Why this matters at your level
Implement binary search without off-by-one errors. Know the complexity of merge sort, quicksort, and insertion sort. Know what "stable sort" means and when it matters for multi-key sorting.
Recognise "binary search on the answer" problems (find the minimum speed to deliver in D days). Implement merge sort from scratch. Know when counting sort achieves O(n) for bounded integers.
Design systems where data is kept pre-sorted (event streams, time-series databases, append-only logs). Know how database B-tree indexes relate to binary search. Understand sort stability implications for multi-key sorts in distributed systems.
TimSort invented by Tim Peters for Python — later adopted by Java, JavaScript V8, and most modern languages
Researchers publish formal proof that Java's TimSort violates its own invariant on 67+ element inputs in a specific pattern
Android developers report silently incorrect sort output in production apps — contacts, leaderboards, sorted UI lists affected
Java applies the one-line invariant fix; an Android update is pushed to devices
Millions of older Android devices never update — silent wrong sort persists for years
The question this raises
Sorting algorithms can have bugs in edge cases that produce wrong output without crashing. How do engineers verify that data is actually sorted correctly, not just returned quickly?
You have a sorted list of 1,000,000 product IDs and need to check if a specific ID exists. Your colleague says "just loop through and check each one." What is the fastest you can possibly do this?
What you'll learn
- Sorting algorithms have different strengths: merge sort is O(n log n) always and stable; quicksort is O(n log n) average but O(n²) worst case on sorted input
- Binary search requires sorted data — O(log n) vs O(n) linear scan; 20 steps are enough to search over 1 million elements
- Stable sort preserves the original order of equal elements — critical for multi-key sorting (sort by date, then by name within same date)
- TimSort (used by Python, Java, JavaScript) detects naturally sorted runs and handles them in O(n) — it is O(n log n) worst case and O(n) best case
- The most common interview mistake: applying a full O(n log n) sort when a size-K heap (O(n log K)) or counting sort (O(n)) would be faster
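Stability is easy to see directly in Python, whose built-in sorted() is guaranteed stable. A minimal sketch of the sort-by-date-then-name pattern (the sample records are made up):

```python
# Stable multi-key sort: sort by the secondary key first, then the
# primary key. Stability preserves the name order within each date.
records = [
    ("2024-01-02", "carol"),
    ("2024-01-01", "bob"),
    ("2024-01-02", "alice"),
    ("2024-01-01", "alice"),
]

by_name = sorted(records, key=lambda r: r[1])            # secondary key first
by_date_then_name = sorted(by_name, key=lambda r: r[0])  # primary key last

print(by_date_then_name)
# [('2024-01-01', 'alice'), ('2024-01-01', 'bob'),
#  ('2024-01-02', 'alice'), ('2024-01-02', 'carol')]
```

If the second sort were unstable, alice and bob could swap within 2024-01-01 and the multi-key ordering would be lost.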
Lesson outline
What this looks like in real life
The library vs the phone book
Finding a book in an unsorted library: check each shelf until you find it — O(n). Finding a name in a phone book: open the middle, eliminate half, repeat — O(log n). The phone book only works because it is sorted. This is the core tradeoff: pay O(n log n) once to sort, then get O(log n) searches forever after. Or pay O(n) per search forever if you never sort.
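A minimal sketch of that tradeoff using Python's standard-library bisect module (the ID values are illustrative):

```python
import bisect

ids = [7, 42, 19, 88, 3, 56]      # unsorted "library"

# Linear scan: O(n) per lookup, no preparation needed.
found_linear = 88 in ids

# Sort once (O(n log n)), then every lookup is O(log n).
sorted_ids = sorted(ids)
i = bisect.bisect_left(sorted_ids, 88)
found_binary = i < len(sorted_ids) and sorted_ids[i] == 88

print(found_linear, found_binary)  # True True
```

The sort is a one-time cost; every subsequent lookup pays only O(log n), which is why "sort once, search many times" wins whenever searches outnumber insertions.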
Binary Search
Requires a sorted array. O(log n) time, O(1) space. Left and right pointers narrow the range around the midpoint each step.
Use for: Reach for binary search whenever your data is sorted and you need to find a specific value or insertion point. Any time you hear "sorted array" in an interview problem, binary search is almost certainly involved
The insight that makes it work
The key insight: binary search does not find the item — it eliminates the half of candidates that cannot contain the item. Each step does not move you closer to the answer by one; it removes half the remaining possibilities. After 20 steps you have searched 1,048,576 elements.
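The halving argument can be checked with a few lines of arithmetic: the worst-case comparison count for n elements is floor(log2 n) + 1 (the helper name is illustrative):

```python
import math

# Worst-case comparisons for binary search on n elements:
# each comparison halves the candidate range, so floor(log2 n) + 1.
def max_comparisons(n: int) -> int:
    return math.floor(math.log2(n)) + 1

print(max_comparisons(7))          # 3  (matches the walkthrough below)
print(max_comparisons(1_000_000))  # 20
```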
How this concept changes your thinking
I need to check if a value exists in a sorted list of 1 million items
“I loop through the list from the beginning and check each item — seems natural”
“Binary search: 20 comparisons instead of up to 1 million. I check sorted-ness first — if not sorted, I sort once in O(n log n) then binary search forever after.”
I need to sort data before returning it from an API
“I use whatever .sort() my language provides without thinking about it”
“I check: is the data nearly sorted already? TimSort will be O(n) on nearly-sorted data. Is the key range small enough for counting sort — O(n) regardless? Knowing why matters.”
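A minimal counting-sort sketch, assuming non-negative integer keys with a small known maximum (the function name and sample data are illustrative):

```python
# Counting sort — O(n + k) for integer keys in the range 0..k.
# No comparisons at all: tally each key, then emit keys in order.
def counting_sort(nums: list[int]) -> list[int]:
    if not nums:
        return []
    counts = [0] * (max(nums) + 1)
    for x in nums:
        counts[x] += 1             # tally each key
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)    # emit each key `count` times
    return out

print(counting_sort([4, 1, 3, 1, 0, 4]))  # [0, 1, 1, 3, 4, 4]
```

The catch: space is proportional to the key range k, so it only pays off when k is comparable to n.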
I need the Kth largest element in an unsorted array
“I sort the whole array O(n log n) and take index K”
“I use a min-heap of size K: O(n log K), faster when K is small. Or quickselect: O(n) average. Sorting is not always the right first step.”
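The heap approach can be sketched with Python's standard-library heapq (the helper name is illustrative):

```python
import heapq

# Kth largest via a min-heap of size K: O(n log K).
# The heap root is always the smallest of the K largest seen so far.
def kth_largest(nums: list[int], k: int) -> int:
    heap = nums[:k]
    heapq.heapify(heap)
    for x in nums[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the smallest, push x
    return heap[0]

print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

When K is small relative to n, log K is far cheaper than log n per element, which is the whole advantage over a full sort.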
Walk through it in plain English
Binary search: find target 37 in sorted array [2, 9, 15, 24, 37, 48, 60]
1. Start with the full range. Set LEFT = index 0 (value 2), RIGHT = index 6 (value 60). We are considering all 7 elements.
2. Find the middle. MID = (0+6)/2 = 3 (integer division). The value at index 3 is 24.
3. Compare and eliminate. Target 37 > 24, so the target is in the RIGHT half. Set LEFT = MID + 1 = 4. We will never look at indices 0-3 again.
4. Find new middle. LEFT=4, RIGHT=6. MID = (4+6)/2 = 5. Value at index 5 is 48.
5. Compare again. Target 37 < 48, so the target is in the LEFT portion. Set RIGHT = MID - 1 = 4. Remaining: just index 4.
6. Converge. LEFT=4, RIGHT=4. MID=4. Value at index 4 is 37. Match found — return index 4.

Key: 7 elements, 3 comparisons. Linear search would need up to 7 checks. At 1 million elements: binary search needs at most 20 comparisons.
Now in code
Binary search — the clean implementation. The off-by-one errors live entirely in the loop condition and pointer updates. Get these two lines wrong and the function skips valid answers or loops forever.
def binary_search(arr: list, target: int) -> int:
    left, right = 0, len(arr) - 1         # inclusive bounds
    while left <= right:                  # <= not <: checks single-element range
        mid = left + (right - left) // 2  # avoids integer overflow
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1                # target in right half, exclude mid
        else:
            right = mid - 1               # target in left half, exclude mid
    return -1                             # target not present

- Both bounds are inclusive: left = 0 and right = len(arr) - 1 cover the first and last valid indices. The invariant: the target, if it exists, is somewhere in arr[left..right].
- left <= right, not left < right. Using < exits before checking the last remaining element, so a single-element array containing the target incorrectly returns -1.
- mid = left + (right - left) // 2 prevents integer overflow. In Java or C with 32-bit integers, (left + right) can overflow when both are large. This form is mathematically identical but safe.
- left = mid + 1, not left = mid. Setting left = mid makes no progress once mid equals left — the loop repeats forever.
The trap — what most people try first
Binary search has two classic bugs: using < instead of <= in the while condition (misses single-element arrays), and using mid instead of mid+1 in the pointer update (causes infinite loops). Both look almost correct and pass most test cases.
Off-by-one in loop condition and pointer update
# BROKEN: two subtle off-by-one errors
def binary_search_broken(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:          # BUG 1: misses single-element array
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid           # BUG 2: infinite loop when left==right-1
        else:
            right = mid - 1
    return -1

# On [5], target=5: loop never runs, returns -1 (wrong)
# On [3,5], target=5: left stays 0 forever (infinite)

# CORRECT: both off-by-ones fixed
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:         # inclusive: handles single element
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1       # exclude mid, move right
        else:
            right = mid - 1
    return -1

The left < right bug means the loop exits before checking the last remaining element: a single-element array containing the target always returns -1. The left = mid bug means that when left and right are adjacent, mid equals left forever — an infinite loop. Both bugs pass almost all test cases because they only trigger on arrays of 1 or 2 elements. Production binary search needs explicit tests for: empty array, single element, two elements, target at first index, target at last index.
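Those five edge cases translate directly into assertions. This sketch repeats the corrected function so it runs standalone:

```python
# Edge-case tests for binary_search — each one catches a bug that
# "almost correct" versions sail past.
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

assert binary_search([], 5) == -1        # empty array
assert binary_search([5], 5) == 0        # single element
assert binary_search([3, 5], 5) == 1     # two elements
assert binary_search([1, 2, 3], 1) == 0  # target at first index
assert binary_search([1, 2, 3], 3) == 2  # target at last index
print("all edge cases pass")
```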
When to reach for this
How fast is it?
| Scenario | What happens | Speed |
|---|---|---|
| Binary search on sorted array of 1 million items | Eliminates half the candidates each step — found in ≤20 comparisons | Very fast (O(log n)) |
| Linear search on unsorted array of 1 million items | Checks items one by one — target could be the last one | Slow (O(n)) |
| Merge sort on any input | Always divides evenly — no input triggers a worse case | Fast always (O(n log n)) |
| Naive-pivot quicksort on sorted or reverse-sorted input | A first- or last-element pivot is always the min or max — partitions degenerate | Slow worst case (O(n²)) |
| TimSort on nearly-sorted data | Detects existing runs and merges them — approaches linear | Very fast (O(n) best case) |
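The quicksort row can be made concrete by counting partition comparisons with a deliberately naive first-element pivot. This is an illustration of the degenerate case, not how library quicksorts choose pivots:

```python
import random

# With a first-element pivot, sorted input makes every partition
# maximally lopsided: one side is empty every time.
def quicksort_comparisons(nums: list[int]) -> int:
    comparisons = 0

    def qs(arr):
        nonlocal comparisons
        if len(arr) <= 1:
            return arr
        pivot, rest = arr[0], arr[1:]  # naive pivot choice
        comparisons += len(rest)       # one comparison per element partitioned
        lo = [x for x in rest if x < pivot]
        hi = [x for x in rest if x >= pivot]
        return qs(lo) + [pivot] + qs(hi)

    qs(nums)
    return comparisons

n = 200
sorted_cost = quicksort_comparisons(list(range(n)))  # n(n-1)/2 = 19900
random.seed(42)
shuffled = list(range(n))
random.shuffle(shuffled)
shuffled_cost = quicksort_comparisons(shuffled)      # roughly n log2 n
print(sorted_cost, shuffled_cost)
```

On sorted input the cost is exactly n(n-1)/2 comparisons; shuffled input costs an order of magnitude less, which is the O(n²) vs O(n log n) gap made visible.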
Exam Answer vs. Production Reality
Sorting & Binary Search
📖 What the exam expects
Merge sort: O(n log n) time, O(n) space, stable. Quicksort: O(n log n) average, O(n²) worst case, O(log n) space. Binary search: O(log n) time, O(1) space — requires sorted input. State complexity before writing code.
How this might come up in interviews
Binary search appears in two forms: literally (find a value in a sorted array) and disguised as "binary search on the answer" (find the minimum or maximum value that satisfies a condition). The disguised form is common in medium and hard problems: find the minimum speed to arrive on time, find the minimum capacity for shipping packages, find the K-th smallest element in a matrix.
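A sketch of the disguised form, using the classic minimum-speed setup (the function name and problem framing are illustrative): if a speed works, every higher speed also works, so the answer space itself is "sorted" and can be binary searched.

```python
import math

# "Binary search on the answer": search over candidate speeds, not an
# array. feasible() is monotone — false, false, ..., true, true — so we
# binary search for the first true.
def min_speed(piles: list[int], hours: int) -> int:
    def feasible(speed: int) -> bool:
        return sum(math.ceil(p / speed) for p in piles) <= hours

    lo, hi = 1, max(piles)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works — try slower
        else:
            lo = mid + 1      # mid too slow
    return lo

print(min_speed([3, 6, 7, 11], 8))  # 4
```

Note the pattern shift: lo < hi with hi = mid (find the first value that satisfies the condition), rather than the lo <= hi membership test used earlier.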
Common questions:
- Binary search (basic — implement correctly from memory)
- Search in rotated sorted array
- Find minimum in rotated sorted array
- Kth largest element (quickselect or heap)
- Find first and last position of element in sorted array
Strong answer: states the sortedness invariant before coding / handles empty array and single-element array as explicit test cases / uses overflow-safe mid calculation / explains left <= right vs left < right and WHY / can extend to find-first-occurrence or find-insertion-point variants without prompting
Red flags: uses left < right (misses single-element arrays) / sets left = mid instead of mid+1 (causes infinite loop) / cannot write binary search from memory / does not check whether input is sorted / confuses binary search (sorted array) with BFS (graph traversal)
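The find-first-occurrence variant mentioned in the strong answer switches to an exclusive right bound, mirroring Python's bisect_left. A sketch:

```python
# Find the FIRST index of target in a sorted array with duplicates.
# Instead of returning on equality, keep tightening the right bound
# until the range collapses on the leftmost index where arr[i] >= target.
def find_first(arr: list, target) -> int:
    left, right = 0, len(arr)     # right is EXCLUSIVE here
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid           # arr[mid] >= target: mid stays a candidate
    if left < len(arr) and arr[left] == target:
        return left
    return -1

print(find_first([1, 2, 2, 2, 3], 2))  # 1
```

With an exclusive right bound, left < right and right = mid are correct; mixing conventions (inclusive bound with right = mid, or exclusive with <=) reintroduces the infinite-loop and missed-element bugs.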
💡 Analogy
Sorting is like organising a library. Insertion sort: pick up each book one at a time and slide it into the right position among the books you already have — fast for small collections, painfully slow for thousands. Merge sort: split the library in half, sort each half, then merge them together — does the same work regardless of how random the starting order was. Binary search: you can only find a specific book instantly if the library is already sorted alphabetically — then you open the middle shelf, decide left or right, and repeat.
⚡ Core Idea
Sorting rearranges data into order. Binary search finds data within that order. Binary search eliminates half the candidates at each step, running in O(log n). The sorting algorithms differ in how they achieve O(n log n): merge sort divides and conquers, quicksort partitions around a pivot, TimSort uses the natural order already present in the data. For interviews: know merge sort and binary search cold.
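A minimal merge sort sketch in Python, for practicing the divide-and-merge structure by hand (not a production implementation):

```python
# Merge sort — always O(n log n), stable, O(n) extra space.
def merge_sort(arr: list) -> list:
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # divide
    right = merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one side is already exhausted

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The <= in the merge is what makes it stable: when elements tie, the one from the left half (earlier in the original order) is taken first.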
🎯 Why It Matters
Every database index is a sorted structure. Every binary search in production assumes sorted data. Every ORDER BY in SQL uses a sort algorithm. Choosing the wrong sort algorithm causes silent performance cliffs: naive-pivot quicksort degenerates to O(n²) on already-sorted data — a very common input pattern — which is part of why Python and Java both use TimSort instead.