Statement

Given a sequence of rectangular buildings viewed from your position, efficiently compute the combined skyline which you will see.

Each building in the input is given as a triple (left, width, height). The sequence of buildings is not necessarily sorted in any meaningful order.

The skyline returned should consist of pairs (left, height) and be returned in order sorted by left (low to high). The right side of each rectangular section can be omitted as it will be the same as the next section's left. The final section should look like (left, 0).

Solution

As this post is intended to be a discussion of implementation details in Python, we can refer to the excellent post by Brian Gordon, which discusses the problem and the outline of its solution.

The final algorithm (including the update from Ivan Malison) roughly boils down to:

for each point where a building starts or ends:
    add any starting buildings to the heap

    while the building on top of the heap ends to the left of the point:
        remove this building from the heap

    yield the height of the top of the heap

Implementation

Which Heap?

Whilst this is not the first section of the script that will run, it is the place where the majority of operations in the algorithm are going to happen, making it the most important for consideration.

The Python standard library comes with two implementations of (min-)heaps:

  • heapq
  • queue.PriorityQueue (the module is Queue in legacy Python)

heapq is the default method of dealing with heaps in Python, it provides a collection of methods to turn a standard Python list object into an in-place heap.

The standard workflow for this module begins with either an empty python list or a call of heapq.heapify(your_list), which will transform the list into a heap in-place in linear time. Once a heap is established, items can be added and removed via heapq.heappush(heap, item) and heapq.heappop(item).

As the heap is still just a list (albeit with a specific ordering), it is still accessible to any normal functions taking a list as input, although care must be taken not to reorder items in a way that breaks the heap ordering. Of particular importance to this problem, we can view the top of the heap (the first element in the list) without making any changes.

queue.PriorityQueue is one of several queueing classes included in the queue module which share a common object based interface. Elements are added and removed with the .put(item) and .get() methods, whilst size can be determined with .qsize() or .empty(). The difference between the classes in this module comes down to the output order of elements. In particular the underlying data structure for an instance of PriorityQueue is a heap managed by heapq, meaning that elements are returned in order from smallest to largest.

As a PriorityQueue instance relies on heapq, it will necessarily be the slower of the two implementations, especially as it provides no ability to inspect elements of the heap without removing them. The counter to this disadvantage is that PriorityQueue, as with all classes from queue is thread-safe, allowing for parallelization. We could in theory have multiple threads checking several elements of the heap simultaneously, but in the case of this algorithm, the only processing being done on the received items is a simple comparison, so the heap operations would still be the bottleneck.

heapq is the clear winner for this problem.

Heap Implementation

Let's call our building heap heights. As we only start the skyline when the first building is encountered, we can just initialize this as an empty list to start with.

When we add a building (left, width, height) to the heap, there are two key pieces of information that need to be stored:

  • The height of the building.
  • The position of its right-hand edge.

We don't need to store information about the left-hand edge of a building as it will only be added to the heap when the critical point being considered is its left-hand edge.

We want to sort the heap by height from high to low, so we should store the buildings with the negation of height as the first element in a tuple (remember heapq works with a min-heap).

heapq.heappush(heights, (-height, left + width))

Critical Values

As our input of buildings is unordered and critical values are needed for both the left and right sides of each building, we will need to create some new structure to hold and iterate over the critical values. We could use another heap for this, but the values only need to be sorted once, and multiple changes may happen at the same critical value, so a dict of lists sorted before iteration will end up being simpler.

For each building, we know that the information needed at its left-hand side is its height and the position of its right-hand side. For the critical point at its right-hand side, no information is needed aside from the presence of a critical point.

To make things even simpler, we can use the defaultdict container from the standard library module containers to hand off the complication of checking whether a given key exists before adjusting its value. This leaves us with the following code:

crit_points = collections.defaultdict(list)
for left, width, height in buildings:
    crit_points[left].append((-height, left + width))
    crit_points[left + width]

The final line may seem strange, but accessing the value for the right-hand side of the building will force defaultdict to initialize a list there if one does not already exist, whilst making no change if there is one.

Full Implementation

Most of the interesting points have now been discussed, leaving only the output. Since we are outputting a sequence of values and are unsure of their use, we may as well make use of the yeild keyword to make our output a generator with lazy evaluation.

This leaves us with the final code:

from collections import defaultdict
from heapq import heappop, heappush

def skyline(buildings):
    # Assemble a the critical values
    crit_points = defaultdict(list)
    for left, width, height in buildings:
        crit_points[left].append((-height, left + width))
        crit_points[left + width]

    # Calculate the skyline
    heights = []
    for position, buildings in sorted(crit_points.items()):
        for building in buildings:
            heappush(heights, building)

        while heights and heights[0][1] <= position:
            heappop(heights)

        if heights:
            height = -heights[0][0]
            if not skyline or height != skyline[-1][1]:
                yield position, height
        else:
            yield position, 0