Working with Voyant

Voyant is a tool for discovering and visualizing word frequencies, and trends in those frequencies, across a corpus of multiple documents. After you add text to Voyant, a dashboard appears. The word cloud is a visual representation of the most frequent words and can be modified with a stop word list so that only meaningful words are included. If you click on a word in the word cloud, a “Word Trends” window opens and shows the frequency of that word across documents. The Word Trends graph can be revised to include only specific documents or specific words. The “Words in Documents” window shows a chart of the different documents, the raw and relative counts of the word being graphed in the Word Trends window, and the mean relative count across the corpus. In the Words in Documents window, you can also search for and graph specific words and even create a list of favorites to graph words against each other. An analogous window, “Word in the Corpus,” shows the raw count and the relative frequencies for a word across the different documents of the corpus. This window can also be searched and used to create a list of favorites. In both “Words” windows, you can add columns of statistical information (mean, standard deviation, etc.) to the chart.
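
To make the difference between raw and relative counts concrete, here is a minimal Python sketch. It is not Voyant's code: it assumes that a relative count is simply the raw count divided by the number of words kept in that document after stop words are removed, and the function names, the toy corpus, and the stop word list are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(documents, word, stop_words=frozenset()):
    """Return {document name: (raw count, relative count)} for one word.

    Assumption: relative count = raw count / tokens kept in that document.
    """
    rows = {}
    for name, text in documents.items():
        tokens = [t for t in tokenize(text) if t not in stop_words]
        raw = Counter(tokens)[word]
        relative = raw / len(tokens) if tokens else 0.0
        rows[name] = (raw, relative)
    return rows


def mean_relative(rows):
    """Mean of the per-document relative counts."""
    if not rows:
        return 0.0
    return sum(rel for _, rel in rows.values()) / len(rows)


# Hypothetical two-document corpus and stop word list.
docs = {
    "doc_a": "The river ran past the old mill and the river was wide",
    "doc_b": "We walked to the mill before the rain",
}
stops = {"the", "and", "to", "was", "we"}

rows = word_counts(docs, "river", stops)
for name, (raw, rel) in rows.items():
    print(f"{name}: raw={raw}, relative={rel:.3f}")
print(f"mean relative: {mean_relative(rows):.3f}")
```

Changing the stop word list changes the denominators, so the relative counts shift even though the raw counts for the word stay the same.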

Between the Word Trends and Words in Documents windows is a window called “Keywords in Context.” This lets you briefly see how the word or words you’re graphing appear in the actual text. The “Corpus Reader” window, in the middle of the dashboard, plays a similar role but shows more of the surrounding text. The “Summary” window provides, as its name indicates, an overview of word frequency information about the corpus: a total count of words and a count of unique words; the longest documents; the documents with the highest vocabulary density; the most frequent words and words with notable peaks in frequency; and distinctive words for each document.
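
The idea behind a keywords-in-context view can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Voyant implements it; the function name, the `window` parameter, and the sample sentence are hypothetical.

```python
import re


def keyword_in_context(text, keyword, window=3):
    """Return (left context, match, right context) for each occurrence.

    `window` is the number of words shown on each side of the keyword.
    Matching is case-insensitive; punctuation is dropped by the tokenizer.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text.
sample = "The mill stood by the river. Nobody went near the Mill after dark."
hits = keyword_in_context(sample, "mill", window=2)
for left, match, right in hits:
    print(f"{left:>12} [{match}] {right}")
```

A larger `window` moves the output closer to what a reader view would show; a smaller one keeps the list easy to scan.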

In terms of what this tool allows you to discover, I would rephrase that: the tool allows you to investigate or interrogate a corpus of texts, but you might not actually discover anything. In trying to complete the activities, I had several ideas that didn’t really go anywhere and didn’t reveal anything about the texts. It was only when I explicitly used the Musher article and my (admittedly limited) contextual knowledge to think about what directions might be interesting, or what questions the texts might answer, that I got any meaningful results (and they are debatably meaningful). It’s important not to be limited by presuppositions about the texts, as seen in the Robots Reading Vogue project (who knew Vogue covered art and health as much as this project revealed?), but having some context was important for me when I used Voyant to analyze the WPA Slave Narratives. Gibbs and Cohen echo this: “Prospecting a large textual corpus in this way assumes that one already knows the context of one’s queries, at least in part” (74). Having that context was also important in, for example, understanding that in my first pass the list of distinctive words consisted almost entirely of dialect renderings of what were primarily stop words. This led me to add those words to the stop word list globally in order to reveal a more meaningful list of distinctive words. However, even with this more meaningful list and some sense of the context, I still felt a bit adrift in the activity that asked us to look at distinctive words and compare them, and I ended up redoing it several times. One of the drawbacks of Voyant, I think, is that it doesn’t enable open-ended queries as much as something like the topic modeling done in the Robots Reading Vogue or Signs@40 projects.

Voyant does enable “distant reading” through its statistical analyses and visualizations of words within a corpus, but a significant benefit is that it also enables close reading by letting you move between statistical charts and visualizations and the context of specific words in specific documents. This is important because of the slipperiness of language: we likely have presuppositions about how and why words are being used, and close reading forces us to look at the specific contexts and examine those presuppositions. It’s not entirely related, but I really like Underwood’s point that search and discovery processes need to be articulated and theorized, and I think that an emphasis on specificity and context, in conjunction with the sorts of statistical analyses afforded by Voyant, does some of that work. In some ways, a tool like Voyant also forces us to set aside some of our suppositions by revealing that no, that word isn’t important, but it can also enable the sort of fishing expeditions that Underwood discusses. Randomly, but entirely appropriately, when I was typing this up in Google Docs and trying to link to Underwood’s article, Google did a search and suggested a link to an article about the country singer Carrie Underwood. So yes, algorithms have biases and context matters.
