PostGIS marker culling

published 2020-03-29

When placing markers on maps there is a delicate balance between too many markers and too clumsy UX.

The most popular solution to this is marker clustering, where overlapping or visually clustered markers are collapsed into single nodes indicating the number of nodes in said cluster. Clicking the cluster node then opens up the cluster, possibly zooming in to the map to reveal the nodes underneath.

However, clustering is not a perfect solution. By showing the user too many options (or markers), you’re also subjecting them to a choice overload. Additionally, clustering the nodes is visibly hiding relevant information from the user, resulting in a subconscious annoyance and constant zooming back and forth.

Airbnb had a clever solution to this: selectively remove markers to keep the number of visible markers pleasantly low.

I’m calling the trick to achieve this marker culling (after face culling)

Marker culling in PostGIS

Here’s the full query, assuming we have a table places containing lonlat entries (of st_point type), we want to cluster the places into 10 clusters, and we only select top 5 places from each cluster.

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

The SQL statement is split into multiple Common Table Expressions:

sparce_places

Pick the columns we will be using (lonlat for culling, id for results, count as a limit for kmeans later)

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

clustered

Do the actual clustering using PostGIS’ ST_ClusterKMeans. In k-means clustering you provide the number of wanted clusters and each point is converged to a cluster index. Importantly, the number of clusters provided to the function must be lower or equal to the number of input rows, which is why we had to take the count in sparce_places.

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

culling_candidates

Finally, we run a window function over the clusters assigning an index to each place within each cluster. If you want to implement ordered culling (e.g. prefer places with lower distance or cost to others when culling), you can add an ORDER BY clause within the OVER().

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

main query

Finally, the main query returns places from culling_candidates with row_number less than 5 (namely, five or less places from each cluster).

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

Here’s roughly what our clustering looks like

Having arbitrary data hidden is a UX issue

While this is a nice trick to make maps look less busy, it also arbitrarily hides some of the data that might otherwise be within user search filter parameters. This is a clear UX issue and should be dealt with in some capacity, if this way of hiding data is used.

One way would be to clearly highlight the total number of results and how many of them are not visible in the map view. Additionally, encouraging users to finetune search filters to be more exact may work well to have the user see more relevant results and reduce the need for marker culling in the first place.