67 - Find Median From Data Stream
The median is the middle value in a sorted list of integers. For lists of even length, there is no middle value, so the median is the mean of the two middle values.
Implement the MedianFinder class with:
- addNum(int num): Adds an integer to the data stream.
- findMedian(): Returns the median of all elements so far.
Examples¶
Example 1
Input:
["MedianFinder", "addNum", 1, "findMedian", "addNum", 3, "findMedian", "addNum", 2, "findMedian"]
Output:
[null, null, 1.0, null, 2.0, null, 2.0]
Solution¶
Java Code¶
class MedianFinder {
private PriorityQueue<Integer> small; // Max-heap: stores the smaller half
private PriorityQueue<Integer> large; // Min-heap: stores the larger half
public MedianFinder() {
small = new PriorityQueue<>(Collections.reverseOrder());
large = new PriorityQueue<>();
}
public void addNum(int num) {
// 1. Add to the smaller half (max-heap)
small.add(num);
// 2. Balance: ensure small <= large (value-wise)
// Move the largest element from 'small' to 'large'
large.add(small.poll());
// 3. Keep heaps balanced in size
// If 'large' is bigger, move one back to 'small'
// This ensures 'small' always has >= elements than 'large'
if (small.size() < large.size()) {
small.add(large.poll());
}
}
public double findMedian() {
if (small.size() > large.size()) {
// Odd total elements: median is the top of the larger heap (small)
return small.peek();
} else {
// Even total elements: average of roots
return (small.peek() + large.peek()) / 2.0;
}
}
}
Intuition¶
Calculating the median requires a sorted order. Sorting the entire list every time an element is added would be \(O(N \log N)\) per addition. We can do better.
Two Heaps Strategy¶
We can divide the data stream into two halves:
1. Lower Half: Stored in a Max-Heap (small). The root is the largest of the small values.
2. Upper Half: Stored in a Min-Heap (large). The root is the smallest of the large values.
By maintaining these two heaps such that their sizes differ by at most 1, we can find the median in \(O(1)\) time:
- If total size is odd, the median is the top of the heap with more elements (we design it to be small).
- If total size is even, the median is the average of the tops of both heaps.
Balancing Logic¶
Every time we add a number:
1. Put it in the max-heap (small).
2. Take the maximum element of small and move it to the min-heap (large). This ensures that the upper half actually contains larger numbers than the lower half.
3. If large now has more elements than small, move the minimum element of large back to small.
Complexity Analysis¶
Time Complexity¶
- addNum: \(O(\log n)\) due to the heap insertions and extractions.
- findMedian: \(O(1)\) directly peeking at the roots.
Space Complexity¶
- \(O(n)\) to store all the numbers in the heaps.
Key Takeaways¶
- The two-heap approach is the gold standard for dynamic median tracking.
- Max-heap stores small numbers; Min-heap stores large numbers.
- Always be careful with integer division when calculating the average for even lengths.