What Can You Do with Crowdsourced Digitization?

In answer to the question in the title of this blog post, you can do a whole lot with crowdsourced digitization. Members of the public can transcribe manuscripts and archival materials (as in the Papers of the War Department and Transcribe Bentham), correct OCR errors (as in Trove newspapers), and verify shapes and colors on historical maps (as in Building Inspector). They can also do things like add tags and comments to materials, which helps make them more findable to other users. Trove offers this to members of the public and so do other projects such as Flickr Commons, which I use a lot for historical photographs.

The types of projects and tasks that seem likely to attract contributors are those that appeal to their interests. In the case of Trove, primary contributors are most interested in family and local history and genealogy. In the case of Transcribe Bentham, frequent contributors were interested in Bentham or history and philosophy more broadly. Main contributors to the Papers of the War Department were similarly interested in American history. These tasks and projects also let contributors feel like they are giving back, that they are contributing to something larger and possibly of historical significance. Building Inspector is somewhat different; it seems more like the sort of task that contributors would do while standing in line or waiting for the bus (and since it’s optimized for mobile devices, I imagine they were). Because the New York Public Library is asking for help, though, I suspect that it would still be seen as altruistic or as helping out with a larger, more important project, much as contributors perceive these other projects.

Based on my experiences contributing to the Papers of the War Department and Trove, having a WYSIWYG and easy-to-use interface is crucial. This is particularly true of the Papers of the War Department, since I had to expend a significant amount of brainpower on reading eighteenth-century handwriting. Essentially, the interface can’t stand in the way of the contributor. In terms of community building, it does seem to be helpful to have some sort of community, although that can manifest in different ways. The Trove forums seem to be quite active and a good resource if you’re not quite sure what you’re doing. The Papers of the War Department basically has a conversations tab for each document, on which you can ask questions about the item you’re transcribing. The community of Transcribe Bentham used to be moderated, which was extremely effective but also labor-intensive; now there is a scoreboard, which I’m guessing does some of the same community-building work, but to a lesser degree. The community around Building Inspector is more implied – the same images are shown to three people – but it’s reassuring, as it lets you know that you won’t ruin something.

There is one aspect of crowdsourced digitization that hasn’t come up, and that is its labor politics. Several project creators/managers indicated that they crowdsource transcription and other work because their institutions will never have the ability to pay for that labor. I certainly don’t blame organizations for using crowdsourced labor (yay austerity), but I do sometimes (particularly as a member of the information professions) wonder about how/if crowdsourced digitization replaces the creation of finding aids for manuscript collections or of catalog records and metadata for almost any item. Not everyone appreciates metadata, and even among librarians I frequently hear about how we don’t need metadata when everything is full-text searchable. This makes me want to bang my head on the wall, since metadata searching can be sooooooo much easier and more effective. Using unpaid labor – often interns – is also endemic to libraries, museums, and archives, and even full-time labor is often underpaid and undervalued, as these are historically feminized positions that involve soft skills and emotional labor.

How to Read a Wikipedia Article

Reading a Wikipedia article is fairly straightforward in one sense. In order to see the changes made to the article and who has made those changes, you can click on the “View History” tab. Some users will have profile pages, while others (as in the case of the Digital Humanities article) use their real names, which you can then search. Some profile pages include real names, credentials, and institutional affiliations. Other changes, though, will have been made by users only marked by an IP address or by an unsearchable pseudonym. Some changes will include notes as to why those changes were made, but there is also a “Talk” tab where you can see discussions about the article. Most Wikipedia articles also include a list of references, which serve the same purpose as they do with any other book or article; they show the sources used to create the article and allow the reader to find and read those sources herself. These elements emphasize the transparency that Wikipedia cultivates in the creation of articles.
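For anyone who would rather script this than click through tabs, here is a minimal sketch of the same "View History" check in Python. It builds a query URL for the MediaWiki Action API (the revisions property) and then sorts a list of revision records into edits signed with an IP address and edits signed with a user name. The helper names (history_url, is_ip_editor, summarize_editors) and the sample records are my own illustrations, not real edits, and I am assuming that unregistered edits show an IP address in the user field, as described above.

```python
import ipaddress
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def history_url(title, limit=20):
    """Build a URL asking for the most recent revisions of one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment",  # when, who, and the edit note
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def is_ip_editor(user):
    """Return True if the user field parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def summarize_editors(revisions):
    """Count revisions signed with an IP address versus a user name."""
    summary = {"ip": 0, "named": 0}
    for rev in revisions:
        kind = "ip" if is_ip_editor(rev.get("user", "")) else "named"
        summary[kind] += 1
    return summary


# Hypothetical revision records, shaped like the fields requested in rvprop.
SAMPLE_REVISIONS = [
    {"user": "ExampleHistorian", "timestamp": "2015-03-01T12:00:00Z", "comment": "added reference"},
    {"user": "192.0.2.44", "timestamp": "2015-03-02T08:30:00Z", "comment": ""},
    {"user": "2001:db8::1", "timestamp": "2015-03-03T19:15:00Z", "comment": "typo"},
]

if __name__ == "__main__":
    print(history_url("Digital humanities"))
    print(summarize_editors(SAMPLE_REVISIONS))
```

A count like this only says how the edits were signed; it says nothing about who the editors are or how reliable their changes might be, which is where the profile pages and the "Talk" tab come in.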

What is not necessarily transparent, but should be kept in mind when reading Wikipedia articles, is what both Rosenzweig’s and Auerbach’s articles emphasize: the social context in which Wikipedia operates. Rosenzweig notes the demographics of Wikipedia writers and editors (and Wikipedia’s corresponding “geek priorities”), localism, avoidance of controversy, and emphasis on conventional wisdom. Auerbach discusses the organizational culture of Wikipedia’s editors, including its problematic gender politics. Of course, I would argue (and try to convey to my students) that it’s important to be aware of the social context of any text, since that context should inform both the selection and use of that text. I really don’t like to think of “assessing” any text, including a Wikipedia article, outside of how I plan on using it, so providing a general guide on doing this for Wikipedia is difficult. I primarily use Wikipedia when I quickly want to know something and the stakes for knowing it aren’t high – when I want to know who was in a movie or what year it came out, for example, or for a quick and dirty definition of something like digital humanities. If I were writing an article on that film, though, I would track down a different source, but that has less to do with the quality or trustworthiness of the Wikipedia article than with the conventions of academic writing and publishing.

I do use Wikipedia frequently for succinct explanations that will work in the moment and not much else because it is fundamentally an encyclopedia, as Rosenzweig notes (I really like McHenry’s “blandness of mere information,” which Rosenzweig cites). This is apparent in the digital humanities Wikipedia article, which glosses over complexity and disagreement and contestation to produce a view of digital humanities that is more or less coherent, but lacks depth. Rosenzweig ties this to ideas of objectivity and neutrality, the ban on original research, and the heavy emphasis on citing “published” sources. I think this leads to sentences like this one from the digital humanities article: “The definition of the ‘digital humanities’ is something that is being continually formulated by scholars and practitioners; they ask questions and demonstrate through projects and collaborations with others.” If I were grading this paper, I would probably write “vague” next to this sentence.

If Wikipedia articles tend to be too shallow in some situations, the references and links are almost always valuable, some of which is undoubtedly attributable to Wikipedia’s emphasis on “published” sources. Linkypedia showed that many Wikipedia articles on economics and labor link to pages from the U.S. Bureau of Labor Statistics, and frankly, it’s usually much easier to find the Wikipedia page than any given U.S. government website or document. The Galloway and DellaCorte article discussed how libraries, museums, and archives are and have been adding links to digitized materials, finding aids, and other resources to Wikipedia. Even if the article on, say, the Pittsburgh Courier is not ultimately detailed enough for my purposes, the links to the University of Pittsburgh’s collections would still be valuable. In the digital humanities article, the links to centers, resources, related entries, references, and bibliography would help me move beyond the somewhat superficial article, if I needed to. (The list of references can also help with this, but in many Wikipedia articles – I think the digital humanities article is an exception rather than the rule – the sources cited tend to be things that can be easily found and accessed: no recent journal articles, because those are frequently behind paywalls; not a lot of monographs, unless they’re out of copyright; and so on.) I stress “if I needed to” because it’s key: Wikipedia articles should be read and used with both the context of their creation and the context of their ultimate use in mind.