Statement
Given a sequence of rectangular buildings viewed from your position, efficiently compute the combined skyline which you will see.
Each building in the input is given as a triple (left, width, height)
. The sequence of buildings is not necessarily sorted in any meaningful order.
The skyline returned should consist of pairs (left, height)
and be returned in order sorted by left
(low to high). The right side of each rectangular section can be omitted as it will be the same as the next section's left. The final section should look like (left, 0)
.
Solution
As this post is intended to be a discussion of implementation details in Python, we can refer to the excellent post by Brian Gordon, which discusses the problem and the outline of its solution.
The final algorithm (including the update from Ivan Malison) roughly boils down to:
for each point where a building starts or ends: add any starting buildings to the heap while the building on top of the heap ends to the left of the point: remove this building from the heap yield the height of the top of the heap
Implementation
Which Heap?
Whilst this is not the first section of the script that will run, it is the place where the majority of operations in the algorithm are going to happen, making it the most important for consideration.
The Python standard library comes with two implementations of (min-)heaps:
heapq
queue.PriorityQueue
(the module isQueue
in legacy Python)
heapq
is the default method of dealing with heaps in Python, it provides a collection of methods to turn a standard Python list
object into an in-place heap.
The standard workflow for this module begins with either an empty python list or a call of heapq.heapify(your_list)
, which will transform the list into a heap in-place in linear time. Once a heap is established, items can be added and removed via heapq.heappush(heap, item)
and heapq.heappop(item)
.
As the heap is still just a list (albeit with a specific ordering), it is still accessible to any normal functions taking a list as input, although care must be taken not to reorder items in a way that breaks the heap ordering. Of particular importance to this problem, we can view the top of the heap (the first element in the list) without making any changes.
queue.PriorityQueue
is one of several queueing classes included in the queue
module which share a common object based interface. Elements are added and removed with the .put(item)
and .get()
methods, whilst size can be determined with .qsize()
or .empty()
. The difference between the classes in this module comes down to the output order of elements. In particular the underlying data structure for an instance of PriorityQueue
is a heap managed by heapq
, meaning that elements are returned in order from smallest to largest.
As a PriorityQueue
instance relies on heapq
, it will necessarily be the slower of the two implementations, especially as it provides no ability to inspect elements of the heap without removing them. The counter to this disadvantage is that PriorityQueue
, as with all classes from queue
is thread-safe, allowing for parallelization. We could in theory have multiple threads checking several elements of the heap simultaneously, but in the case of this algorithm, the only processing being done on the received items is a simple comparison, so the heap operations would still be the bottleneck.
heapq
is the clear winner for this problem.
Heap Implementation
Let's call our building heap heights
. As we only start the skyline when the first building is encountered, we can just initialize this as an empty list to start with.
When we add a building (left, width, height)
to the heap, there are two key pieces of information that need to be stored:
- The
height
of the building. - The position of its right-hand edge.
We don't need to store information about the left-hand edge of a building as it will only be added to the heap when the critical point being considered is its left-hand edge.
We want to sort the heap by height
from high to low, so we should store the buildings with the negation of height as the first element in a tuple (remember heapq
works with a min-heap).
heapq.heappush(heights, (-height, left + width))
Critical Values
As our input of buildings is unordered and critical values are needed for both the left and right sides of each building, we will need to create some new structure to hold and iterate over the critical values. We could use another heap for this, but the values only need to be sorted once, and multiple changes may happen at the same critical value, so a dict
of lists sorted before iteration will end up being simpler.
For each building, we know that the information needed at its left-hand side is its height
and the position of its right-hand side. For the critical point at its right-hand side, no information is needed aside from the presence of a critical point.
To make things even simpler, we can use the defaultdict
container from the standard library module containers
to hand off the complication of checking whether a given key
exists before adjusting its value
. This leaves us with the following code:
crit_points = collections.defaultdict(list) for left, width, height in buildings: crit_points[left].append((-height, left + width)) crit_points[left + width]
The final line may seem strange, but accessing the value for the right-hand side of the building will force defaultdict
to initialize a list there if one does not already exist, whilst making no change if there is one.
Full Implementation
Most of the interesting points have now been discussed, leaving only the output. Since we are outputting a sequence of values and are unsure of their use, we may as well make use of the yeild
keyword to make our output a generator with lazy evaluation.
This leaves us with the final code:
from collections import defaultdict from heapq import heappop, heappush def skyline(buildings): # Assemble a the critical values crit_points = defaultdict(list) for left, width, height in buildings: crit_points[left].append((-height, left + width)) crit_points[left + width] # Calculate the skyline heights = [] for position, buildings in sorted(crit_points.items()): for building in buildings: heappush(heights, building) while heights and heights[0][1] <= position: heappop(heights) if heights: height = -heights[0][0] if not skyline or height != skyline[-1][1]: yield position, height else: yield position, 0