As befits a history class, for this post I went back and looked at the portfolio I prepared in May 2016 for this internship (I can’t believe that was a year ago). The first items I listed that I wanted to learn were about working with data, and indeed, I did learn a good amount about how to clean and manipulate data in both Excel and (primarily) OpenRefine. I would like to build on what I’ve learned and have been looking at courses on Excel and data science, especially since I keep seeing humanities data curation and management in librarian job descriptions. This internship confirmed that this is an area I would like to do more with professionally, and I very much enjoyed trying to figure out how best to enhance records. As a public services librarian, I am often frustrated by incomplete records and uninformative metadata, so it was fun being on the other side and drawing on my own experiences working with faculty and students in thinking about what might be most useful to the user.
One of the drawbacks of the internship, and perhaps this would be true of any internship at an organization as large as the Smithsonian Institution, is that the project felt like a very small, very narrow, very specific thing. The work I did still has to go through layers of approval before it’s public, and that can be a little frustrating. Going forward, the program might want to consider internships with smaller institutions that are likely to be underfunded and have fewer volunteers or interns. It’s definitely impressive to work with the Smithsonian, but it can also be limiting in terms of the type of work you’re able to do. As a local, I found it great to be able to visit and meet the people I was working with (and whose data I was working with).
The internship filled in a gap I felt in the coursework – preparing all of that lovely data for digital humanities tools – and the coursework helped me figure out how to use the data I had access to in maps and timelines. The coursework on user needs also helped me think through both the possibilities of and problems with linked data. As a librarian, I was already pretty familiar with metadata, and very happy that so much of our coursework emphasized the importance of it, and the internship work reiterated the value of metadata that is as clean and complete as possible. I keep returning to Sam Wineburg’s notion of the “jagged edges” of history, and what also struck me in this project is the fundamental unknowability of some things. Is this author in the Smithsonian Digital Library actually the same person as in this VIAF authority record? Sometimes it is just not possible to tell based on the information we have. Another idea I keep returning to is the labor that is behind digital public humanities work, especially with regard to things that are less visible, like the creation of metadata or linking data. Relying on OpenRefine or a similar technology to automatically match names will, at this moment, likely result in an unacceptable number of errors, unless the data is fairly complete. It takes a lot of time and effort to figure out what can’t be automated and then to do that work manually. It also takes human judgement (and often additional research) to make decisions in a lot of cases.