Category Archives: Projects

Project Progress Update #4

This week I focused on revising my exhibit. Originally I had planned for a two-part exhibit, but when I looked at it recently, it seemed like I had covered in the first half a lot of what I had planned to cover in the second. I ended up revising the last section of the original exhibit to include some of the information and images that were meant for the second part. One of the additions was actually an animated map in GIF form that I created with still images of maps (it was way easier than I thought it would be, thanks internet!). I also created a starting page for the exhibit that hopefully frames some of the issues, and also includes a most amazing ephemeral film about Detroit. I also focused on filling out the “related resources” page and did some digging around for open digital collections. I had already done some research to find blogs. Right now I’m waffling on including books, too, mostly because I don’t want the list to be overwhelming. I also created an “About” page and wrote up a brief explanation. I played around a bit with the site’s appearance, but am pretty happy with it for the moment. It might be time for someone else to look at it, really.

Project Progress Update #3

This week I added a “Related Resources” page using the Simple Pages plugin, which was very easy. I will also probably add an About page, so that users can have a brief, clear explanation as to what the site is for and what they can do on the site. I began creating the list of related resources in a Google Doc. Right now it is primarily links to digital collections, both those specifically tied to Detroit and broader collections like DPLA (which actually links to some great Detroit materials). It would probably be good to have brief annotations for links like this, so I need to work on that. I’ve also gathered some blogs about the history of Detroit – there are several that include original research with primary materials. I’m planning on including books on the list, with links to their WorldCat entries, but am not sure about articles, since there are more of them, and the process of getting them is less straightforward than for books. I’ve not played around with the appearance of the site any more, so I need to work on that, and I also need to finish up the other half of the exhibition. Right now, I’m feeling pretty good because all of the plugins and interactive features are working.

Project Progress Update #2

This week I was inspired by the readings on place, particularly the Boyer and Marcus article about PhillyHistory.org (this would be really neat to do with a project like mine, too), and worked on installing the Omeka Geolocation plugin. This plugin was very straightforward, although I might end up playing around with the configuration to change the appearance. I added a location to an item on the admin side and it worked (an example) and it’s also working on the public site. I have not yet tested it from the contributor side. That’s one of my next steps, as is finishing the other half of the exhibit and continuing to play around with the options for contributors. I also need to build the related resources page/bookshelf (as OutHistory calls it). I also might play around more with the appearance of the site, as that’s something I enjoy doing.

Project Progress Update #1

This is my first “project progress update” post, and it’s late as I had guests last week and didn’t get much of anything done (in any context, sigh). I did set aside some time today to work on my site, though, and managed to make some progress. I’ve set up half of the exhibit, so today I focused on installing the plugins that will allow users to contribute materials. I followed the instructions and managed to get everything working (I created an account, viewed the submission form, looked at users from the admin side, etc.), which is pretty exciting. What I didn’t think about, though, was how to customize everything – What information do I want users to include with their contributions? Do I want to give them the option of creating a profile? What sorts of items do I want them to be able to submit? What are my site’s terms and conditions? There are so many different choices here, and while I did do some thinking about what options would be most appropriate for my site and users, I have a feeling that I will end up revising them in the very near future. This is both a challenge and a next step, as is finishing the other half of the exhibit, thinking about how to incorporate a list of additional resources, and figuring out how to add and configure the geolocation plugin.

Final Project Reflection

My final project, Topic Modeling Detroit, was rooted in my longstanding interest in the history and representation of Detroit. I had previously written about Detroit being represented as ruined in both newspaper articles and photography. The figuration of Detroit as ruined is bound up with a narrative that locates the beginning of its decline in the 1967 riots. That project focused on texts from the 1990s and 2000s, so for this project, I wanted to look at texts from before the riots.

In terms of methods, I was inspired by projects like Robots Reading Vogue, which seemed to take an exploratory, open-ended approach to a group of texts. I didn’t really have any preconceptions as to what I might find in texts about Detroit published prior to 1967, although I did have a broad sense of its history and the centrality of the automobile industry and labor unions to that history. I liked the idea of being able to engage with a group of texts on their own terms, without looking for something specific. Because the university I work at is a member of HathiTrust and the HathiTrust Research Center allows you to use Mallet, the same tool used in Robots Reading Vogue, on a set of texts, I decided to conduct my project within the HathiTrust Research Center. This meant I could only use public domain texts digitized by HathiTrust, but the corpus I was able to create was still quite large – over 2000 texts.

Because HathiTrust includes Library of Congress Subject Headings with each item record, I decided to create my corpus using a subject search for Detroit. Initially, I thought I might break up my corpus by publication date, but it turned out that a lot of the texts in my corpus did not have accurate publication dates. This was pretty surprising given how good HathiTrust metadata usually is. I instead tried different numbers of words and topics in the topic modeling algorithm, and my final project incorporates perspectives from each version. More topics generally led to finer-grained topics but also more noise. Fewer topics got at the big picture of what was in the corpus but subsumed some interesting distinctions within topics.

The feedback I got mostly indicated that I needed to explain the background of the project more fully, which isn’t surprising. This is a topic I’ve been thinking about for a long time, and I keep up on the scholarly literature on the topic (it helps that I buy books for the library for American history and studies). I also had to adjust the way I usually approach texts and textual analysis. I generally read the text first, let it sit in my head, and eventually come up with what I think the text is doing. This was a bit different, and it was (is) hard for me to think about this project as showing what the corpus is, what the texts in it are. This project is not making an argument about the texts, which is what I am accustomed to doing; it is asking “what are these texts?” I’m still trying to wrap my head around this, because it is so different from what I generally do.

Now that I’ve looked at the results of the topic modeling several times (and made myself approach them differently), I think they’re actually pretty neat. I would want to combine these results with some closer reading of some of the texts, because to me, what texts do is the more interesting question, but being able to get at what they are (and more than 2000 of them, too) is kind of amazing.

Topic Modeling Detroit: First Draft

REVISED 12/6/15

[This is a draft/outline version of my final project, which uses topic modeling to analyze public domain books about Detroit in the HathiTrust Research Center.]

“Topic Modeling Detroit” seeks to perform a “distant reading” of public domain books about Detroit digitized by HathiTrust and available in the HathiTrust Research Center. While ultimately it would be ideal to bring this project into conversation with other work on representations of Detroit, this project is primarily exploratory and designed for me to familiarize myself with the textual analysis tools available through the HathiTrust Research Center.

The HathiTrust Research Center is available to universities that are members of the HathiTrust and is designed to “[enable] computational access for nonprofit and educational users to published works in the public domain.” What this means is that once you have created an account, you can create text corpora and then analyze those corpora with eleven different techniques SUPER EASILY. Francesca Giannetti has a very good primer on the HathiTrust Research Center, which also includes information about using the Data Capsule. The Data Capsule allows you to use in-copyright books, unlike the algorithms embedded in the HathiTrust Research Center.

Creating a workset was very easy, since I knew I wanted the items to have “Detroit” as a subject heading. Subject headings get at the topic of an item, unlike full-text searches (too broad) or title searches (too narrow). There are two interfaces for creating worksets:

[Images: workset-1, workset-2]

The workset I created and used for analysis contained 2,364 books whose publication dates were prior to 1963. HathiTrust is usually very good about metadata like publication year, but there were 244 titles whose publication date was given as 1800, and spot checking indicates that that date is not entirely accurate. There were also 52 items with a publication date of “0000,” which is unpossible, and 124 items with a publication date of “1000,” which seems unlikely. 2,324 of the books were in English and 2,248 were published in the United States. Each of these facets (there are a few others) can be used to limit your workset and each can be clicked on to see the list of items that have the selected characteristic.
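
To get a sense of how much of the workset is affected, the counts above can be added up. This is a minimal sketch in Python: the function names are made up for illustration, and treating every 1800 date as suspect is an assumption (spot checking only shows that some of them are wrong), so the share it prints is an upper bound.

```python
from collections import Counter

# Placeholder years treated as suspect; these are the three values named in the post.
SUSPECT_YEARS = {"0000", "1000", "1800"}


def tally_suspect_dates(years):
    """Count how often each suspect placeholder year occurs in a list of year strings."""
    return Counter(y for y in years if y in SUSPECT_YEARS)


def suspect_share(counts, total):
    """Fraction of the workset whose recorded date is one of the suspect placeholders."""
    return sum(counts.values()) / total


# Counts reported in the post for the 2,364-item workset.
reported = Counter({"1800": 244, "0000": 52, "1000": 124})
share = suspect_share(reported, 2364)
print(f"{sum(reported.values())} of 2364 items ({share:.1%}) have a suspect date")
```

With the reported counts, that comes to 420 items, or a little under 18% of the workset.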

[Images: facets, languages]

Each item record is connected to the full view and full catalog record of the item in HathiTrust.

The topic modeling algorithm within the HTRC uses Mallet and allows you to select the number of “tokens” (words) and topics. In this project, I mostly played around with varying numbers of both, rather than limiting by years as I initially thought I might. As I mentioned earlier, the publication dates are incorrect for many items and it’s not possible to limit your search to a date range (years have to be entered individually and joined by Boolean connectors). Running the algorithm involves two clicks, naming the job, and deciding on the number of tokens and topics. It does take a day or two to return the results, as far as I can tell, and they are displayed within the browser, like so:

[Image: results]

This means the word clouds are not able to be manipulated and the best way to capture them is with a screenshot. The results page also includes a text list of the most popular words.
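
Since years have to be entered individually and joined by Boolean connectors, a date range could be typed out by a short script rather than by hand. This is only a sketch: `year_query` is a made-up helper, and the `OR` connector is an assumption about what the workset search accepts, not something taken from the HTRC documentation.

```python
def year_query(start, end, connector="OR"):
    """Join every year from start to end (inclusive) with a Boolean connector."""
    if start > end:
        raise ValueError("start year must not be after end year")
    years = (str(year) for year in range(start, end + 1))
    return f" {connector} ".join(years)


# Example: the first few years of the 1920s as one pasteable string.
print(year_query(1920, 1924))
```

The resulting string would still need to be pasted into whichever search field holds the publication year.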

I ran the topic modeling algorithm four times: 100 tokens/10 topics; 200 tokens/10 topics; 200 tokens/20 topics; 200 tokens/40 topics. Screenshots of the results for each run are below and include between one and three topics. This is due to the total number of topics and the way they are displayed on the results page. Differences in size between topics should be ignored, since the topics are the same size on the results page (that is, I just took screenshots and didn’t resize them). Also, each set of results had at least one topic that consisted of punctuation marks, diacritics, symbols, and other non-word content. I did not include those here.
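
The non-word topics were left out by eye, but the same check could be roughed out in code if the text list of top words were copied from the results page. This is a sketch under stated assumptions: the 50% threshold, the function name, and the example words are all invented for illustration and are not actual results.

```python
def is_noise_topic(words, threshold=0.5):
    """Return True when more than `threshold` of a topic's top words contain no letters."""
    if not words:
        return True
    # A "non-word" here is any token without a single alphabetic character.
    # Letters with diacritics (like "é") still count as letters.
    non_words = sum(1 for w in words if not any(ch.isalpha() for ch in w))
    return non_words / len(words) > threshold


# Invented example topics, not taken from the actual topic models.
topics = {
    "topic_a": ["city", "detroit", "council", "street"],
    "topic_b": ["--", "§", "...", "1", "the"],
}
kept = {name: words for name, words in topics.items() if not is_noise_topic(words)}
print(sorted(kept))
```

A topic that mixes real words with symbols would need a judgment call about where to set the threshold.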

100 tokens/10 topics

[Images: 100tokens-10topics1 through 100tokens-10topics4]

200 tokens/10 topics

[Images: 200tokens-10topics-1 through 200tokens-10topics-3]

200 tokens/20 topics

[Images: 200tokens-20topics1 through 200tokens-20topics7]

200 tokens/40 topics

[Images: 200tokens-40topics-1 through 200tokens-40topics-14]

Analysis

The topic models created with 100 tokens and 10 topics and 200 tokens and 10 topics seem to resemble each other and also to be the most coherent set of topics. These models clearly identify topics or genres within the corpus. They are:

  • history
  • city government/administration/development/public projects
  • biography
  • education/schools
  • geography/maps

The topic model created with 200 tokens and 20 topics refines these categories somewhat and introduces related topics. The topics above are still present, and the following are added:

  • 18th/19th century history
  • construction/building
  • medicine
  • libraries
  • accounting/budgets
  • cars

These are pretty interesting refinements/related topics (and we see the emergence of “cars,” which is of course what Detroit has been associated with throughout the twentieth century), but this topic model also introduces some noise. Two of the topics above are not meaningful, and I removed one consisting of symbols.

The topic model created with 200 tokens and 40 topics further refines the broad topics of the 10 topic models. It includes the following additional subtopics:

  • legal profession/court
  • public works
  • water
  • population/demography
  • books
  • government documents
  • engineering/math
  • church
  • car manufacturing

This topic model also reveals that some of the corpus is in French, although the words included in that topic are primarily stop words.

When I initially reviewed the four topic models, I was kind of disappointed, but in taking another look, they do reveal a fair amount about the items in the corpus, particularly the broad categories they fall into. Compared to a content analysis I did of newspaper articles about Detroit, this is obviously much broader and less detailed, but it could definitely help identify subsets of texts for close reading. Using the HathiTrust Research Center was very easy and I can now show students or faculty how to build a workset and use the analysis tools embedded within HTRC. There are a few drawbacks, however, both specific to this project and more generally. Limiting to public domain texts means that only specific post-1924 texts are included, like government documents, which may overly influence the resulting topics. This is particularly significant with the subject of this specific project, which only really became significant in the twentieth century (I’m particularly interested in the period between 1920 and 1960, and that period is not well-represented in HTRC, or really in any digitized text corpus). I’m also very interested in change over time, so the lack of good publication year metadata for so many of these texts was really disappointing. I had hoped to be able to perhaps look at individual decades, even with the caveat I just mentioned. This could be addressed by manually looking at the catalog records for texts with years of 1000 or 0000, since for at least some of them, the publication date is in the title or text. This would be extremely time-consuming, though, and with uncertain results.
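
Some of that manual checking might be narrowed down with a script that pulls candidate years out of titles. This is a rough sketch only: the record layout, the sample titles, and the 1700 to 1999 year range are all assumptions made for illustration, and any year it finds would still need to be confirmed against the catalog record.

```python
import re

# Four-digit years from 1700 through 1999 that are not part of a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[7-9]\d{2})(?!\d)")


def candidate_years(title):
    """Return every plausible publication year mentioned in a title, in order."""
    return [int(match) for match in YEAR_PATTERN.findall(title)]


def needs_review(recorded_year):
    """True for the placeholder years that cannot be real publication dates."""
    return recorded_year in {"0000", "1000"}


# Invented records for illustration; these are not items from the workset.
records = [
    {"title": "Annual report of the board of education, 1912", "year": "0000"},
    {"title": "History of the city", "year": "1000"},
    {"title": "Street railway map", "year": "1905"},
]

for record in records:
    if needs_review(record["year"]):
        print(record["title"], "->", candidate_years(record["title"]))
```

Titles with no year in them come back with an empty list, so they would still have to be checked by hand.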


Social Media Strategy

This post is meant to outline my social media strategy for sharing my course project (which is a textual analysis of books about Detroit digitized by HathiTrust), but I do have to say that I don’t know if I want to share it as of right now. I ran the algorithms within HathiTrust and am not wild about the results; that is, I don’t think they actually reveal anything interesting, although I will take a closer look over the next week.

These are the social media strategies I developed for different, but sometimes overlapping, groups that I belong to (or sort of belong to, in the second instance). I chose the platforms I did because I already have a presence on those platforms and am connected to those groups via those platforms. For academic librarians and digital humanities librarians and scholars, Twitter does seem to be the preferred social media platform. I am more personally connected to Detroit scholars and activists, which is why I would try to reach them through Facebook. In all three cases, I would maybe consider asking someone prominent in those groups to boost the signal, as it were.

I did not include other social media platforms because I do not have a presence on them and frankly, it is a lot of work to build a presence and then to interact through numerous social media platforms. Some platforms – like Instagram – don’t seem particularly appropriate, since my project will incorporate a fair amount of text in addition to some images.

Social media strategies

Audience: Academic librarians

Platform: Twitter

Messages: The message for this group would probably be primarily an announcement, since it may or may not be something they’re actually professionally interested in.

Measure: I would measure the success of this via favorites, replies, and retweets and also possibly via analytics on my online portfolio.

___________________________________________

Audience: Digital humanities librarians and scholars

Platform: Twitter

Messages: The message for this group would be more about soliciting feedback and initiating discussion about the project.

Measure: I would measure the success of this through conversations (which could be replies on Twitter) and comments on the portfolio post.

___________________________________________

Audience: Detroit scholars/activists

Platform: Facebook

Messages: This would be a combination of the previous two messages: an announcement, but also looking for feedback about the project from the perspective of people less interested in digital humanities per se and more interested in Detroit.

Measure: I would measure the success of this primarily through conversations on Facebook and the portfolio post, but would also consider likes and shares on Facebook and analytics from my portfolio.