We’ve had a couple of meetings with Karen Smith-Yoshimura and Thomas Hickey of the Scholars’ Contributions to VIAF group to discuss possible collaborations, exchange of information, and the mutual benefits of sharing standards between the SNAP:DRGN project and VIAF (the Virtual International Authority File, a federated authority list of persons drawn from library catalogs, mostly from author or subject fields).
We considered two main questions:
- What can SNAP:DRGN gain from VIAF data or formats? Most concretely, what subset of VIAF person-records, and what fields in them, should we consider ingesting into the SNAP graph?
- How can VIAF benefit from SNAP:DRGN’s work in this area? In what ways can SNAP provide data and information that might be passed back to VIAF for inclusion in the authority file?
Preliminary answers and thoughts below.
1. What can SNAP get from VIAF?
Looking at the VIAF data model, we decided that in most cases the only categories of information we would get from VIAF would be (a) a URI and (b) a name. There was some discussion as to whether we could sometimes also extract from the data (c) some alternative name forms (e.g. in Greek, by searching for characters in the Greek Unicode code-point ranges); (d) a date, which is present in some name strings; and (e) an associated place, which is also present in some name strings. VIAF records that come in via Wikipedia or Wikidata would also give us (f) an alternative ID, in the form of a Wikipedia/DBpedia URI. We didn’t think that the LAWD “attestation/citation” categories were appropriate for modelling the information about books with these persons as authors or subjects, although that is the most useful information one would get by following links from the SNAP graph back into the VIAF data.
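Items (c) and (d) can be approximated with simple string heuristics. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that name forms arrive as plain Unicode strings and that a date, when present, sits in the last comma-separated segment of a heading. That shape is common in library authority headings but is not guaranteed for every VIAF source.

```python
import re
from typing import List, Optional

# Greek and Coptic block (U+0370 to U+03FF) plus Greek Extended (U+1F00 to U+1FFF),
# which holds precomposed polytonic letters with breathings and accents.
GREEK_CHARS = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]")


def greek_name_forms(names: List[str]) -> List[str]:
    """Keep only the name forms containing at least one Greek-script character."""
    return [name for name in names if GREEK_CHARS.search(name)]


def trailing_date(heading: str) -> Optional[str]:
    """Return the last comma-separated segment of a heading if it contains a digit.

    Assumption (hypothetical heuristic): headings look like "Surname, Forename, dates".
    The date is returned as a raw string, not parsed into years.
    """
    parts = [part.strip() for part in heading.split(",")]
    if len(parts) > 1 and any(ch.isdigit() for ch in parts[-1]):
        return parts[-1]
    return None
```

For example, `greek_name_forms(["Homer", "Ὅμηρος", "Omero"])` keeps only the Greek form, and `trailing_date("Cicero, Marcus Tullius, 106 B.C.-43 B.C.")` yields the raw string `"106 B.C.-43 B.C."`, leaving any parsing of eras and ranges to a later step.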
We discussed what subset of the VIAF dataset would be of interest to model in SNAP, and after a few experiments with filtering by date (which is not always given), language (not always given), and contributing collection, we settled on a preliminary export of persons who matched:
- birth OR death date present, and before 1000 C.E.
- AND any one or more of:
  - language: Latin
  - OR language: Ancient Greek
  - OR collection: Perseus Catalog.
This gives us a small corpus of 1,781 ancient authors to experiment with.
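The selection criteria above can be expressed as a single predicate. This is a sketch over a hypothetical, flattened record type (the field names are ours, not VIAF's), and it reads "before 1000 C.E." as "at least one of the birth or death years that is present is earlier than 1000", with B.C.E. years stored as negative integers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PersonRecord:
    """Hypothetical flattened view of a VIAF person record (illustrative field names)."""
    record_id: str
    birth_year: Optional[int] = None  # negative integers for B.C.E.
    death_year: Optional[int] = None
    languages: Set[str] = field(default_factory=set)
    collections: Set[str] = field(default_factory=set)


TARGET_LANGUAGES = {"Latin", "Ancient Greek"}
TARGET_COLLECTIONS = {"Perseus Catalog"}


def is_ancient_author(rec: PersonRecord, cutoff: int = 1000) -> bool:
    # Condition 1: a birth or death year is present and earlier than the cutoff.
    years = [y for y in (rec.birth_year, rec.death_year) if y is not None]
    early = any(y < cutoff for y in years)
    # Condition 2: any one or more of the language or collection criteria holds.
    relevant = bool(rec.languages & TARGET_LANGUAGES) or bool(
        rec.collections & TARGET_COLLECTIONS
    )
    return early and relevant


def select_ancient_authors(records: List[PersonRecord]) -> List[str]:
    """Return the IDs of records that pass both conditions, in input order."""
    return [rec.record_id for rec in records if is_ancient_author(rec)]
```

Under this reading, a Latin-language record whose only date is a birth year of 1265 is rejected, a Perseus Catalog record with a birth year of -428 is kept, and a record with no dates at all is rejected whatever its language, which mirrors the "not always given" problem noted above.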
2. What can SNAP give to VIAF?
VIAF is an authority list of authors, artists, other creators, and people important enough to have a book (or at least a chapter) written about them, so they won’t be interested in hundreds of thousands of names of Greeks and Romans about whom all we know is their gravestones, contracts they signed, or graffiti they left on a theater wall. In order to flag a subset of persons whom VIAF might be interested in importing from the SNAP graph, we are proposing to add a new property to the SNAP ontology: associatedRole. This would allow us to flag poets, historians, authors, potters, sculptors, actors, performers etc., whom VIAF would include in their authority file, even if no works by these people survive. We’ll consider doing so in a later revision of the Cookbook, since version 1.0 is now locked down.
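To illustrate how such a flag could drive an export to VIAF, here is a Python sketch. Everything in it is hypothetical: the property IRI is a placeholder under example.org (the property is only a proposal), roles are modelled as plain string literals rather than as terms from a controlled vocabulary, and the person URIs are invented.

```python
from typing import Dict, List

# Placeholder IRI: associatedRole is only a proposal and is not in SNAP Cookbook 1.0.
ASSOCIATED_ROLE = "http://example.org/snap-proposal#associatedRole"

# Roles of the kind VIAF would be likely to include (taken from the list above).
VIAF_RELEVANT_ROLES = {
    "poet", "historian", "author", "potter", "sculptor", "actor", "performer",
}


def role_triples(person_uri: str, roles: List[str]) -> List[str]:
    """Serialise each role as an N-Triples statement with a plain string literal object."""
    return [f'<{person_uri}> <{ASSOCIATED_ROLE}> "{role}" .' for role in roles]


def viaf_candidates(persons: Dict[str, List[str]]) -> List[str]:
    """Return the sorted URIs of persons flagged with at least one VIAF-relevant role."""
    return sorted(
        uri for uri, roles in persons.items() if VIAF_RELEVANT_ROLES.intersection(roles)
    )
```

Given one person flagged as a poet and another with no roles (known, say, only from a gravestone), `viaf_candidates` returns only the poet's URI, so the unflagged majority of the graph is never offered to VIAF.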
Another way in which the SNAP dataset may be of value to VIAF is through the connections we make between databases by coreferencing and disambiguating unique individuals. If we have a VIAF record for a person who is also in the British Museum person thesaurus, the Trismegistos author table, LGPN and/or PBW, then variant names, dates, citations, alternate identifiers and other information from these databases might enrich the VIAF data for that person, and could be ingested automatically via the linked data we produce.
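The enrichment described above amounts to collecting, for each VIAF-linked person, whatever the coreferenced records add. A Python sketch follows, using invented example data and example.org URIs; the record shape and the `enrich` helper are hypothetical, and for brevity the sketch covers only names and identifiers, not dates or citations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SourceRecord:
    """Hypothetical minimal person record from one partner database."""
    source: str      # e.g. "VIAF", "LGPN", "Trismegistos"
    identifier: str  # the record's URI in that source
    names: Set[str] = field(default_factory=set)


def enrich(viaf: SourceRecord, coreferenced: List[SourceRecord]) -> Dict[str, object]:
    """Gather variant names and alternate identifiers from records that SNAP
    has asserted refer to the same individual as the given VIAF record."""
    variant_names: Set[str] = set()
    alternate_ids: Dict[str, List[str]] = {}
    for rec in coreferenced:
        # Only names VIAF does not already hold count as enrichment.
        variant_names |= rec.names - viaf.names
        alternate_ids.setdefault(rec.source, []).append(rec.identifier)
    return {
        "viaf": viaf.identifier,
        "variant_names": sorted(variant_names),
        "alternate_ids": alternate_ids,
    }
```

Grouping identifiers by source keeps provenance visible, so VIAF could decide per partner database which alternate identifiers to accept.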
Many of the issues discussed above will also come up when we speak to other potential data partners about linking up SNAP records with their data, so it was great to have this preliminary conversation.