One of the conversations that it was really useful to hash out in person, with so many experts and interested parties present at the workshop a couple of weeks ago, was the question of how the SNAP:DRGN Cookbook should recommend that contributing person-datasets represent date information.
It has been our working assumption that the minimalist information SNAP is ingesting would optionally include a single, undifferentiated, very crudely recorded date associated with a person. (By the same token, any place information associated with a person would be given only in very blunt form, inasmuch as it serves almost as an extra name, epithet or identifier for a person. Further, more granular place association, à la Pelagios, might be included in the original prosopography, and/or in the exposed RDF serialization of said dataset, but SNAP will only expect and take advantage of associated place in the most abstract form.) The argument may be at its clearest with respect to dating, however, partly because there are so many strong arguments for including more granular and semantic date information in a prosopographic dataset.
There are many different classes of date information that one might, in theory, want to include in a prosopographical or biographical record of a person. Some dates are firmly known, from a variety of evidence, and might include a person’s exact birth and death dates, or just a secure floruit, when they were known to associate with other persons, take part in historical events, leave behind artistic or written creations, etc. A person attested to in an inscription might be dated (perhaps to the end of their life) if the inscription is datable by context, content, palaeography, style, or some other feature. Other inscriptions might mention a person (e.g. a historical ruler) whose date bears no relation to that of the text or support. Some people are datable very broadly by century or era; for example, a database of Imperial Roman elites with no specific date information for each entry might nonetheless enable us to ascribe a date range of 0001 – 0300 for all the persons therein, which is worth recording if it is all we have. Dating criteria should also be recorded, as should uncertainty, complexity, and editorial attribution for dating decisions (especially in the case, for example, of conflicting dates offered by different sources or editors). Dates might also be attached to events and changes of state within a person’s life, with all of the complexities and varieties described above.
Despite this potential complexity, the value of having robust and reliable dates attached to person-entries in the SNAP graph is compelling. Dates can be used to help disambiguate persons and perform co-reference detection; the date is one of the first things that a human user would want to see in an at-a-glance summary of a person’s information, alongside their name, source, location, and key relationships; and, especially in combination with place, dating information can contribute to the building of a social network of persons. Given the complexity of dating even within a single project, and the massive variety of approaches to dating that historical databases have taken, it is hard to imagine that there will be any consistency in this regard in the SNAP graph of ancient persons, except for that which the SNAP:DRGN project recommends and even demands from contributed data.
Having said this, we have always tried to follow as a sacred mantra the idea that SNAP will only ask for and surface the bare minimum of information needed to identify a person (at a minimum, name and URI in the contributing project), and in a simple format so that there will be as much consistency across the entire network as possible. So a personal name contributed to the SNAP graph should be in plain text with a language tag attached (“Apollonius”@en, “Ἀπολλώνιος”@grc), and may also include a URI for the name in an associated onomasticon, but SNAP will not recognise or take account of, for example, TEI-encoded XML recording the condition and certainty of the name, abbreviations, emendations, lemmata etc. All this information can and should be recorded somewhere, in the originating project, but SNAP isn’t the place to find and use this complexity, which will no doubt be unique to individual projects anyway. Likewise, no amount of specific and qualified dates will be of any use if 90% of the graph is in a different format, has much less granular dating, and doesn’t record the criteria or semantics of its dates. Even after agonising over the possibilities for a seemingly endless afternoon session, there was no consensus on what a more sophisticated dating mechanism that would be feasible across multiple huge and heterogeneous datasets might look like.
The main problem, of course, is that not every date assigned to every person will be selected by a human editor who can individually record their source, certainty, precision, criteria, range and scope of the date. Some datasets will have relatively full dates for all persons, encoded in great detail in TEI or CIDOC-CRM or a bespoke relational database following the Factoid model. Others will have no dates at all attached to persons or names, but a date may be extrapolated from the date of the source document(s) in which each person is attested. These will be a very different quality of date: many will be epitaphs, but might or might not be closely dated; others may be texts clearly written while the person is alive (some might be dated only by the known dates of the person mentioned within them); some may have no relationship between the date of the document and that of the person, mentioning an emperor from several hundred years before the text was written, say. In the case of a dataset like this, should we:
- not indicate any dates in the person-data contributed to SNAP, because in some cases these dates will have no relation to the lives of the persons recorded?
- indicate the extrapolated dates, but with criteria saying that these are a different kind of date (across the whole database, because we have no way of differentiating between the entries)?
- indicate the extrapolated dates, allowing for the very occasional date that might be misleading, because all but a tiny minority of the tens of thousands of entries will still meet SNAP:DRGN standards?
Much to the disapproval of some purists, I think the correct solution for SNAP, which does not record full prosopographical data but only aggregates a summary of it, is the third option. We shall define our date property such that it may contain a very loose superset of all possible dates associated with a person, and tolerate a tiny amount of error in inappropriate dates being recorded with this property.
Even if the source dataset contains detailed dating information in sophisticated formats, please record in the RDF that you submit to the SNAP graph a single date range, expressed as an ISO 8601 time interval (e.g. “0101/0200” for “second century CE”), which, to the best of your project’s ability to ensure it, probably at least overlaps with the lifetime of the person being recorded.
This definition, and a similarly general definition of our associated place property (“place or placename generally associated with this person for any reason”), make it especially important that we use a property formally defined in the SNAP ontology as having this general definition, that will not be used for more specific dates (born, died, floruit, married, reign, moved city, attested, etc.) or places (lived, ruled, visited, founded, attacked, ethnicity, etc.).
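To make the recommended format concrete, here is a minimal Python sketch of how a contributing project might check a [date]/[date] interval before submitting it, and test whether it overlaps another range. The function names are purely illustrative and not part of any SNAP tooling; the sketch assumes zero-padded four-digit CE years and does not attempt to handle BCE dates.

```python
import re

# Accepts YYYY, YYYY-MM or YYYY-MM-DD (assumption: CE years, zero-padded to four digits).
_DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
_INTERVAL = re.compile(rf"({_DATE})/({_DATE})")


def parse_interval(text):
    """Return (start_year, end_year) for an interval such as '0101/0200'.

    Only the year component is kept, in line with the deliberately coarse
    dating described above. Raises ValueError for anything else.
    """
    match = _INTERVAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a [date]/[date] interval: {text!r}")
    start_year = int(match.group(1)[:4])
    end_year = int(match.group(2)[:4])
    if start_year > end_year:
        raise ValueError(f"interval ends before it starts: {text!r}")
    return start_year, end_year


def overlaps(a, b):
    """True if two (start_year, end_year) ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


second_century = parse_interval("0101/0200")
imperial_elites = parse_interval("0001/0300")
print(overlaps(second_century, imperial_elites))
```

A check like this only confirms that the value is well formed; whether the range really overlaps a person’s lifetime remains an editorial judgement for the contributing project.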
Hello,
How would you record a time interval in RDF? As far as I know, time should have an XSD type, like xsd:date:
gnd:dateOfBirth "1815-07-07"^^xsd:date
or xsd:gYearMonth if the day is missing:
gnd:dateOfBirth "1815-07"^^xsd:gYearMonth ;
But I am not sure there is an XSD type for time intervals. However, time intervals could be expressed with OWL Time.
Thanks,
Yes, this is why we decided to represent these intervals as ISO 8601 time intervals, both to keep it simple, and to allow the expression [date]/[date] (where [date] may be just a year, year+month or year+month+day). It’s not terribly sophisticated, but that’s kind of the point.
(Others may have more useful suggestions to your question, but I think this is the limit of what SNAP:DRGN needs to be able to encode.)
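As a rough illustration of what that might look like in practice, the Python sketch below writes the interval as a plain, untyped string literal in a single Turtle triple. The property IRI and person URI are hypothetical placeholders under example.org, not the actual SNAP ontology term, and the helper name is invented for the example.

```python
# Hypothetical property IRI, standing in for whatever the SNAP ontology defines.
DATE_PROPERTY = "http://example.org/ns#associatedDate"


def person_date_triple(person_uri, interval, prop=DATE_PROPERTY):
    """Format one Turtle triple carrying the interval as a plain string literal."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = interval.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{person_uri}> <{prop}> "{escaped}" .'


print(person_date_triple("http://example.org/person/1", "0101/0200"))
# prints: <http://example.org/person/1> <http://example.org/ns#associatedDate> "0101/0200" .
```

Leaving the literal untyped sidesteps the lack of an XSD datatype for intervals, at the cost of consumers having to parse the string themselves.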