My final project, Topic Modeling Detroit, was rooted in my longstanding interest in the history and representation of Detroit. I had previously written about Detroit being represented as ruined in both newspaper articles and photography. The figuration of Detroit as ruined is bound up with a narrative that locates the beginning of its decline in the 1967 riots. That project focused on texts from the 1990s and 2000s, so for this project, I wanted to look at texts from before the riots.
In terms of methods, I was inspired by projects like Robots Reading Vogue, which seemed to take an exploratory, open-ended approach to a group of texts. I didn’t really have any preconceptions as to what I might find in texts about Detroit published prior to 1967, although I did have a broad sense of its history and the centrality of the automobile industry and labor unions to that history. I liked the idea of being able to engage with a group of texts on their own terms, without looking for something specific. The university I work at is a member of HathiTrust, and the HathiTrust Research Center lets you run MALLET, the same tool used in Robots Reading Vogue, on a set of texts, so I decided to conduct my project within the HathiTrust Research Center. This meant I could only use public domain texts digitized by HathiTrust, but the corpus I was able to create was still quite large: over 2000 texts.
Because HathiTrust includes Library of Congress Subject Headings with each item record, I decided to create my corpus using a subject search for Detroit. Initially, I thought I might break up my corpus by publication date, but it turned out that a lot of the texts in my corpus did not have accurate publication dates. This was pretty surprising given how good HathiTrust metadata usually is. I instead tried different settings for the number of words and the number of topics in the topic modeling algorithm, and my final project incorporates perspectives from each version. More topics generally lead to finer-grained topics but also more noise. Fewer topics get at the big picture of what is in the corpus but subsume some interesting distinctions within topics.
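To make that trade-off concrete, here is a minimal sketch of what changing the number of topics looks like in code. It is not what the HathiTrust Research Center or MALLET actually runs: it is a small from-scratch topic model (LDA fit with collapsed Gibbs sampling) written only for illustration. The function names `fit_lda` and `top_words`, the parameter values, and the six-document toy corpus are all invented for this example.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA topic model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns one Counter of word counts per topic.
    Illustrative only; alpha, beta, and iterations are assumed values.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Weight each topic by how well it fits this document and this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word


def top_words(topic_word, num_words):
    """Return up to num_words of the most frequent words for each topic."""
    return [
        [w for w, count in counts.most_common() if count > 0][:num_words]
        for counts in topic_word
    ]


# Invented toy corpus; the real corpus was more than 2000 HathiTrust volumes.
toy_docs = [
    "automobile factory assembly workers automobile engine".split(),
    "union strike workers wages union labor".split(),
    "river harbor shipping lake river ferry".split(),
    "factory engine automobile assembly plant".split(),
    "labor wages strike union workers".split(),
    "lake ferry harbor shipping river".split(),
]

# Re-run the model with different numbers of topics and compare the word lists.
for k in (2, 3, 6):
    model = fit_lda(toy_docs, num_topics=k)
    print(k, top_words(model, num_words=4))
```

The two settings I varied correspond to `num_topics` and `num_words` here. On a corpus this small the results mostly show the mechanics, but comparing the printed lists across values of `k` is the same kind of side-by-side reading I did with the real output.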
The feedback I got mostly indicated that I needed to explain the background of the project more fully, which isn’t surprising. This is a topic I’ve been thinking about for a long time, and I keep up with the scholarly literature on it (it helps that I buy books for the library for American history and studies). I also had to adjust the way I usually approach texts and textual analysis. I generally read the text first, let it sit in my head, and eventually come up with what I think the text is doing. This was a bit different, and it was (is) hard for me to think about this project as showing what the corpus is, what the texts in it are. This project is not making an argument about the texts, which is what I am accustomed to doing; it is asking “what are these texts?” I’m still trying to wrap my head around this, because it is so different from what I generally do.
Now that I’ve looked at the results of the topic modeling several times (and made myself approach them differently), I think they’re actually pretty neat. I would want to combine these results with some closer reading of some of the texts, because to me, what texts do is the more interesting question, but being able to get at what they are (and more than 2000 of them, too) is kind of amazing.