This is my third blog post about my Smithsonian Libraries internship. I have continued to work with the artists’ names spreadsheet, which consists of about 85,000 names. The organizational names are ready to be loaded, but we’ve been discovering weird things in the dataset, like dates with no names and random blank lines. The reconciliation program more or less ignores these, fortunately, but it does stand out when I’m scanning the data. I’ve also been using the newer version of Refine Viaf, which is called Conciliator and allows more targeted reconciliation – with ORCID numbers, or just VIAF records from the Library of Congress.
I have also hand reconciled a fair amount of records in this dataset, and it is making me think more about the imprecision of language and how that plays out in text searches. And the ongoing importance of human labor because of those shortcomings. I am able to quickly reconcile a lot of artists because the birth year and death year is in the VIAF record, but not in the title of the record that Conciliator is using to reconcile. People with common names are conflated both within VIAF and in the reconciliation of the artists’ names; sometimes I am not certain whether the VIAF record and the artist are the same person (usually due to the types of works they’re associated with), but the name is good enough for Conciliator. There is a good deal of uncertainty in this process that can’t be removed, and it is inevitable that there will be mistakes, but these aspects are hard to wrap our heads around, I think, because we expect less squishyness and more clarity when we interact with technology. I spent the last year or so reading critiques of technology and technological determinism for a writing project, and when working with these datasets, it’s very apparent that humans have had their grubby little hands all over, because there is so much variation, even though the information seems incredibly straightforward: name and life dates.
The next piece I will be working on is writing up best practices/lessons learned from working with Open Refine, Refine VIAF, and Conciliator. I have also been thinking about ways to get better results, generally by slicing up the data, or using a specific set of records, like LC or Getty. And then trying to figure out where to go from here.
I did do a map of artist nationalities (only about 5000 entries, but still neat):