### Task

Take a look at the following table.San Francisco | Sacramento | Los Angeles | Las Vegas | |
---|---|---|---|---|

San Francisco | - | 121 km | 559 km | 670 km |

Sacramento | 121 km | - | 582 km | 622 km |

Los Angeles | 559 km | 582 km | - | 361 km |

Las Vegas | 670 km | 622 km | 361 km | - |

San Francisco | Sacramento | Los Angeles | Las Vegas | |
---|---|---|---|---|

San Francisco | - | 1 h 50 min | 7 h 18 min | 17 h 7 min |

Sacramento | 2 h 12 min | - | 11 h 30 min | 16 h 51 min |

Los Angeles | 7 h 40 min | 10 h 0 min | - | 6 h 14 min |

Las Vegas | 14 h 45 min | 18 h 28 min | 6 h 5 min | - |

The similarity of both color maps shows that, generally, the travel time is proportional to the distance between two cities. However, there are small irregularities: the distance between San Francisco and Los Angeles is roughly the same as the distance between San Francisco and Las Vegas. But the time needed to travel to Las Vegas is more than twice the time needed to travel to Los Angeles from San Francisco. So either San Francisco and Los Angeles are exceptionally well-connected or San Francisco and Las Vegas exceptionally badly.

Overall, we see that the distance between two cities does not necessarily correlate with their proximity by public transport. So, if we try to visualize, how well cities are connected to each other, their actual geolocation may not be representative. A better solution is to take the table of travel times and calculate a set of coordinates so that the distance between those coordinates resembles the travel time as best as possible. I will describe the required steps in detail below. But first, you can take a look at the result:

### Result

I created a small tool to visualize the results of this post. In the upper left corner of the map below, you will find a search box. Select four to ten cities, which are reachable by public transportation. In the left / upper map, you will see their actual locations. The right / lower map will display the cities in a way that well-connectedâ€‹ cities are closer to each other.**Distances:**

**Travel durations by public transport (next Monday, 12 pm):**

### Solution

**Given:**The cities' geolocations and a matrix of travel times between their central stations.

##### Multidimensional Scaling

We start with the matrix of travel times. Given a matrix of distances / similarities between points, it is possible to calculate coordinates for each point that depict the similarities between the points when drawn onto a coordinate system. This process is called Multidimensional Scaling. However, the distance between two points cannot always be equal to the value given by the matrix. Imagine a matrix with small entries for A-B and B-C but a large entry for A-C. How can you draw A,B and C into a coordinate system without breaking the constraints given by the matrix?

Generally, we draw the points into Euclidean space, the usual two-dimensional space of a coordinate system (see notes for further details on why we use Euclidean space). But if the distances given by the matrix of travel times do not strictly follow Euclidean rules, like in the example above, it is impossible to draw points in a coordinate system so that their distances match the entries of the matrix. Instead, we need to draw points that **approximate** the given travel times.

Ben Frederickson describes in his blog post, how you can use the Singular Value Decomposition of the squared distance matrix to obtain a set of points with optimal coordinates. The coordinates will be optimal in a way that the sum of all squared errors is minimal. The error of a pair of points is defined as the difference of their distance given by the matrix and the distance in the obtained set of points. Given a matrix \(A\) and the set \(P\) with \(N\) points, the overall error \(e\) is defined as $$e = \sum _{i=1}^{N}{~} \sum _{j=1, j\ne i}^{N}{~}(a_{i,j}~-~dist(p_i,~p_j))^2$$ with \(dist(p_i, p_j)\) being the Euclidean distance between a pair of points. This error is minimized when using Multidimensional Scaling. As said above, it will be zero, if the given matrix entries follow Euclidean geometry rules. We, however, deal with travel times, which are not strictly proportional to the distances between cities, since trains do not always go the same speed and not every city is directly connected to all others. So we obtain a set of points, whose distances approximate the given travel times.

##### Transforming the points to match the geolocations

The points obtained by Multidimensional Scaling will not match the geolocations of the cities. This is due to two facts:

- The travel times are given in seconds, while the distances between cities are measured in meters. As these are two completely different scales, the points' coordinates first have to be scaled to geo-coordinates.
- Rotating or moving the set of points does not change the distances between its points. So, even if we scale the points to geo-coordinates, they may be upside down or at a completely different place on the earth. But since the set's inner distances do not change, it is still the optimal solution for approximating the matrix of travel times.

**rotate, scale**and

**move**the set of points in order to get meaningful results.

First, we will rotate and scale the points so that they match the actual geo-coordinates as closely as possible. Note that their relative distances will be preserved, so the resulting set of points is still the best approximation for the travel times. We will move the points afterwards, so before rotating and scaling them, we center both the set of points and the set of geo-coordinates. Thus, their centroids will be \((0,0)\). Since we have 2-dimensional points \(p = (p_0, p_1)\), we can rotate and scale them by applying a 2x2 matrix \(R\) to each point. Given a Nx2 matrix \(P\) containing all points, a Nx2 matrix \(G\) containing all geo-coordinates and the rotation matrix \(R\), we transform \(P\) so that $$P R = G.$$

##### How do we find R?

We can transform the equation above to $$R = (P^T P)^{-1} P^T G.$$ Now we have the matrix that scales and rotates the points obtained by MDS so that they match the geo-coordinates as closely as possible. To**move**the points to the right place, we simply calculate the centroid of the geo-coordinates and add it to each point in our set. The resulting points are depicted in the right map above.

### Notes

**Transit vs. Driving:**By a small modification of the JavaScript code, you can also calculate the driving durations. I tried it out and the resulting map did only differ slightly from the real map. So, driving durations seem to be quite proportional to distances between cities.**Euclidean space:**When using MDS, we map the matrix of travel times to a set of two-dimensional points. This is incorrect, as we should rather map them onto a sphere so that their great-circle distance is minimized. But as long as we are not dealing with polar trains, the introduced imprecision should be negligible.**Credits:**to Ben Frederickson for providing insights and the JavaScript code used for Multidimensional Scaling.

### Like what you read?

I don't use Google Analytics or Disqus because they require cookies. I would still like to know, which posts are popular, so if you liked this post you can let me know here.