This weekend, on a whim, I pursued a little side project that had been kicking around in my head for a while. Perhaps you’ve seen Marty Elmer’s very fine Laconic History of the World, in which each country is represented by the most common word from the Wikipedia article on its history.
It’s pretty fun, and I wanted to have a look at doing something similar myself. As usual, I started in Michigan. Here’s a quick map with perfectly ordinary labels.
And here’s a version in which every name has been replaced with a word that appears uncommonly often in that feature’s corresponding Wikipedia article.
I ran these 38 Wikipedia articles through a Python script that calculated the TF-IDF (term frequency-inverse document frequency) score for each word in each article. Basically, it tells me which words appear with unusual frequency in one particular article compared with the rest. If we’re talking about Grand Rapids, for example, the word “furniture” appears a lot in that article but not so much elsewhere, so it gets a high TF-IDF score. The word “city,” on the other hand, might appear a lot in that article, but it also appears a lot in the other articles, so it gets a low score. So TF-IDF is a neat way to figure out which words are distinctive to an article.
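For anyone curious, here’s a minimal sketch of that kind of calculation in plain Python. It isn’t the exact script behind these maps: the function names are hypothetical, and the definitions are assumptions (TF as a word’s count divided by the article’s length, IDF as the natural log of the number of articles over the number of articles containing the word). Libraries such as scikit-learn’s TfidfVectorizer use slightly different smoothing and normalization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return {doc_name: {word: tf-idf score}} for a dict of {doc_name: text}.

    TF  = occurrences of the word in this document / total words in the document
    IDF = log(number of documents / number of documents containing the word)
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: how many documents each word appears in at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return scores


# Tiny made-up corpus, just to show the behavior described above.
docs = {
    "Grand Rapids": "the city is known for furniture furniture",
    "Lansing": "the city is the capital",
    "Flint": "the city had auto plants",
}
scores = tfidf_scores(docs)
gr = scores["Grand Rapids"]
print(max(gr, key=gr.get))  # furniture
print(gr["city"])           # 0.0, since "city" appears in every document
```

A word that shows up in every article gets an IDF of log(1) = 0, so no matter how often it appears, its score drops to zero.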
After running these articles through the TF-IDF script, I picked the highest-scoring word that wasn’t the name of the city/river/etc., or its enclosing county (or state).
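The picking step might look something like the sketch below. The `top_word` helper is hypothetical, and so is the choice to split names like “Kalamazoo County” into single words before excluding them. That choice also drops generic words such as “county,” which is an assumption about how the filtering should work, not necessarily what the original script did.

```python
import re


def top_word(word_scores, excluded_names):
    """Return the highest-scoring word that isn't part of any excluded name.

    word_scores: {word: score} for one article (e.g. from a TF-IDF pass).
    excluded_names: names to skip, such as the feature itself, its county, its state.
    """
    # Break multi-word names like "Grand Rapids" into lowercase tokens to skip.
    skip = {
        token
        for name in excluded_names
        for token in re.findall(r"[a-z]+", name.lower())
    }
    candidates = {w: s for w, s in word_scores.items() if w not in skip}
    if not candidates:
        return None  # every word was excluded
    return max(candidates, key=candidates.get)


# Made-up scores for illustration only, not real output.
example_scores = {"kalamazoo": 0.9, "bronson": 0.5, "county": 0.4, "park": 0.1}
print(top_word(example_scores, ["Kalamazoo", "Kalamazoo County", "Michigan"]))  # bronson
```

The same helper could cover the second pass described later by simply passing a longer list of names to skip (people, ships, nearby lakes and cities).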
A few interesting things stand out in the names that ended up on the map:
- For many cities, the unique words ended up being the names of various locally important people. Titus Bronson founded Kalamazoo, and there’s a park and hospital named after him, so his name comes up often in the article. Likewise, James Jesse Strang founded a kingdom on Beaver Island.
- In other cases, nearby geographic/political features were the top score, such as Georgian Bay for Lake Huron, or the nearby city of Saginaw for Bay City.
- Local industries show up in a couple of cases. Grand Rapids has long been a furniture capital, while there’s many a winery around Traverse City.
- Wikipedia article idiosyncrasies account for some of these. One dedicated Wikipedian listed all the churches in Ironwood. Sault Ste. Marie’s article lists its various TV/radio stations, many of which happen to offer rebroadcasts of various other stations.
- Lake Superior, of course, gets renamed for the tragic loss of the Edmund Fitzgerald.
If you’re not as familiar with the names of nearby geographic features, or locally important families in some of these cities, this map might be less interesting. So, let’s make another pass, in which we skip over names of people, ships, or nearby geographic/political features (lakes, cities, counties, etc.).
This one is equally interesting, in a different way.
- A few cities have unfortunately needed emergency financial managers.
- The Muskegon article lists not only major companies, but what they were formerly called.
- The river articles contain a lot of generic words that still get high TF-IDF scores because they don’t show up much in the other articles, which are about cities and lakes. So even if we skip names of nearby cities/counties, we sometimes still end up with words like “flows” or “downstream.” Though the importance of logging in the history of the Muskegon River now shows through.
Anyway, this was a fun little project that took me way longer than it should have, mostly because I was being too much of a perfectionist. I was originally trying to control for word pluralization, parts of speech, etc., and after a few hours of that I gave up and decided that this fun project clearly didn’t need that level of obsession.
Update: I decided to toss in Wisconsin, too!