Category Archives: data modeling

Cookbook, data modeling, partner datasets

Recommendations for EDH person-data RDF

July 3, 2017 Gabriel Bodard

At the first meeting of the Open Epigraphic Data Unconference (OEDUc) in London in May 2017, one of the working groups that met in the afternoon examined the person-data offered for download on the EDH open data repository, and made some recommendations for making this data more compatible with the SNAP:DRGN guidelines. (We claim to have completed our brief, so do not propose to meet again.)

Currently, the RDF of a person-record in the EDH data (in TTL format) looks like:

<http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1>
    a lawd:Person ;
    lawd:PersonalName "Nonia Optata"@lat ;
    gndo:gender <http://d-nb.info/standards/vocab/gnd/gender#female> ;
    nmo:hasStartDate "0071" ;
    nmo:hasEndDate "0130" ;
    snap:associatedPlace <http://edh-www.adw.uni-heidelberg.de/edh/geographie/11843> ,
        <http://pleiades.stoa.org/places/432808#this> ;
    lawd:hasAttestation <http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001> .

We identified a few problems with this data structure, and made recommendations as follows.

We propose that EDH split the current person references in edh_people.ttl into: (a) one lawd:Person, which has the properties for name, gender, status, membership, and hasAttestation, and (b) one lawd:PersonAttestation, which has the properties dct:source (which points to the URI for the inscription itself) and lawd:Citation. Date, location, etc. can then be derived from the inscription (which is where they belong).
A few observations:
1. lawd:PersonalName is a class, not a property. The recommended property for a personal name as a string is foaf:name.
2. The language tag for Latin should be @la (not @lat).
3. There are currently thousands of empty strings tagged as Greek.
4. Nomisma date properties cannot be used on a person, because the definition is inappropriate (and unclear).
5. As documented, Nomisma date properties refer only to numismatic dates, not epigraphic ones (I would request a modification to their documentation for this).
6. The D-NB ontology for gender is inadequate (which is partly why SNAP has avoided tagging gender so far); a better ontology may be found, but I would suggest plain text values for now.
7. To the person record above we could then add dct:identifier with the PIR number (compare the discussion of plans for disambiguation of PIR persons in another working group).
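
Taken together, the recommendations above might produce something like the following for the sample record. This is only a sketch under several assumptions, not an agreed EDH or SNAP serialization: the prefix declarations are assumed to be the usual namespaces for these vocabularies, the "#attestation" URI is hypothetical, foaf:gender is just one possible home for a plain-text gender value, and lawd:hasCitation with the inscription number as its value is an illustrative guess at how the citation would be attached.

```turtle
# Prefix IRIs are assumptions (the usual namespaces for these vocabularies).
@prefix lawd: <http://lawd.info/ontology/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dct:  <http://purl.org/dc/terms/> .

# (a) The person: name, gender, and a link to the attestation.
# Dates and places are omitted; they are derived from the inscription.
<http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1>
    a lawd:Person ;
    # Observations 1 and 2: foaf:name as the string property, @la as the tag.
    foaf:name "Nonia Optata"@la ;
    # Observation 6: plain-text gender; the choice of foaf:gender is an assumption.
    foaf:gender "female" ;
    # The "#attestation" URI below is hypothetical.
    lawd:hasAttestation <http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1#attestation> .
# Observation 7: where a PIR number exists, dct:identifier could be added to the person.

# (b) The attestation: points to the inscription and carries the citation.
<http://edh-www.adw.uni-heidelberg.de/edh/person/HD000001/1#attestation>
    a lawd:PersonAttestation ;
    dct:source <http://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD000001> ;
    # Citation property and value are illustrative assumptions.
    lawd:hasCitation "HD000001" .
```

Note that the Nomisma date properties and snap:associatedPlace no longer appear on the person; a consumer would follow dct:source to the inscription record to find them.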

data modeling, partner datasets, RDF, Uncategorized

State of the Snap-Nation

November 19, 2014 Faith Lawrence

With the end of the pilot project scarily in sight it is time to review where we are and where we hope to be by the end of December.

The big news is that (hopefully) the first set of SNAP identifiers are now frozen!

What this means is that the first five datasets have now been ingested, SNAP identifiers have been linked to each of the persons, and those identifiers are fixed. There may still be a few tweaks to the RDF descriptive data coming in from the projects, but the identifiers will remain the same. Continue reading State of the Snap-Nation →

co-reference analysis, data modeling, partner datasets

SNAP and VIAF

August 20, 2014 Gabriel Bodard

We’ve had a couple of meetings with Karen Smith-Yoshimura and Thomas Hickey, of the Scholars Contributions to VIAF group, to discuss possible collaborations, exchange of information, and mutual benefits of sharing standards between the SNAP:DRGN project and VIAF (the Virtual International Authority File, a federated authority list of persons from library catalogs, mostly from author or subject fields).

We considered two main questions: Continue reading SNAP and VIAF →

api, data modeling, Workshop

Looking Towards an API for SNAP:DRGN

June 15, 2014 Faith Lawrence

During the first SNAP:DRGN workshop a breakout group was convened to discuss the potential API for the project. Rather than come up with a specific API during that session, we instead focused on creating a “wish list” of applications and functions that we wanted to support. We were then able to abstract the functions that would be needed to support the list. Continue reading Looking Towards an API for SNAP:DRGN →

data modeling, RDF

You Aren’t Gonna Need It

May 22, 2014 Hugh Cayless

We’ve been discussing lately how to merge person records in SNAP, so that when we encounter partner projects that each have a record for the same person, SNAP can provide a useful service by combining those into single, merged records, and we can start to get an idea of the requirements for performing operations like merges on our data. This discussion has proved something of a rabbit hole. Continue reading You Aren’t Gonna Need It →

data modeling, ontology, RDF

The Old Classes vs Properties Debate (or Relationships Are Hard, Part 2)

May 13, 2014 Faith Lawrence

One of the decisions that has to be made when creating an ontology is which concepts you encode as classes and which you encode as properties of those classes. One of the difficulties is that there is no overarching ‘right answer’ (although there are wrong ones) to how you should model your domain; it has to be decided on a case-by-case basis, according to what works best for the type of world view that you are trying to encapsulate within your model. This post is a request for feedback to help us decide which model works best for both the project and the wider community. Continue reading The Old Classes vs Properties Debate (or Relationships Are Hard, Part 2) →

data modeling

Why SNAP IDs?

May 9, 2014 Hugh Cayless

One question that came up during the workshop a couple of weeks ago was: if partner projects already assign their own URIs/ids to their person/name/etc. records, then why should SNAP assign its own identifiers? There are two answers to that, one very practical, and the other a bit more philosophical.

On the practical side, SNAP IDs will be URIs themselves and, when dereferenced in a browser or by an application, will return a result: either a web page listing what SNAP knows about the record in question, or RDF data about it. We can’t do this in a practical way without assigning our own identifiers.
On a more theoretical level, we think that any updates made to data post-ingest shouldn’t be made directly on our partners’ data. We believe, for example, that while SNAP might assert an identity between two person records coming from two partner datasets, it will be up to the partners whether they accept that identification.

Continue reading Why SNAP IDs? →

data modeling, press release, prosopography, Workshop

SNAP at Digital Humanities 2014

April 25, 2014 Faith Lawrence

The SNAP Project is proud to announce the Ontologies for Prosopography: Who’s Who? or, Who was Who? one-day workshop, developed in conjunction with the People of the Founding Era project based at the University of Virginia. The workshop will give SNAP the opportunity to present our data model to a wider audience and to engage with researchers working on similar problems in other periods and geographic areas. Continue reading SNAP at Digital Humanities 2014 →

data modeling, NER

Named Entity Recognition and SNAP

April 14, 2014 Mark Depauw

One of the other breakout sessions of the SNAP workshop dealt with Named Entity Recognition. One can wonder whether setting up a Named Entity Recognition procedure from scratch is worth the effort for what is, after all, a limited and finite set of full-text documents. The experience of Trismegistos People has shown the answer is definitively YES. Continue reading Named Entity Recognition and SNAP →

data modeling, ontology, prosopography, RDF

Tensions

March 10, 2014 Hugh Cayless

Last week, Faith gave a great overview of some of the issues involved in describing the relationships between people. This week, I’m going to come at the problem from the other side, looking at what data we have, and how SNAP plans to represent them.

Our initial datasets include Trismegistos People (TM, described by Mark), the Lexicon of Greek Personal Names (LGPN, described by Sebastian), and a set of names (article headwords) from the Prosopographia Imperii Romani, 2nd edition (PIR²) put together by Tom Elliott. TM has web pages that document the references to names and people found in papyri, many of which are hosted at Papyri.info, as well as resources describing the names and persons; references, names, and persons all have unique identifiers. LGPN comes at the problem of modeling people from a different angle: they start with persons and add names and references; persons and names have unique identifiers. From PIR², we have only persons, with a “principal” name and identifying number (the article number) attached to them. Continue reading Tensions →