A progress report

We now have 3238 records in the I.Sicily database, but we’re not yet online (not long now!) – why not? The major challenge throughout this stage of the project has been moving from an old, flat Access table of metadata (i.e. information about the inscriptions: bibliography, provenance, description, classification, etc.)…

Access screen shot
Screenshot of part of the Access table

….to the much richer and more flexible XML EpiDoc format.

Oxygen screen shot
Provenance information encoded in TEI-XML

There is a lot that we can add in this process: if you compare the provenance information in the table with that in the EpiDoc, the former just has place names, whereas the latter has Pleiades URIs (Uniform Resource Identifiers) for the ancient place names, Geonames URIs for the modern places, and coordinates in decimal degrees for the precise locations where known. All of this information can be added during the conversion process (thanks to the marvellous James Cummings), and while this involves manually creating tables of this information, doing it once, e.g. for each placename, outside the main table is far quicker and simpler than adding all of this for each individual record. In deciding to use Pleiades as our primary reference for ancient place names, we have taken the opportunity to edit and improve the Pleiades data for Sicily (and sometimes the data in OpenStreetMap and Geonames as well) – the benefits are cumulative all round (thanks to Valeria Vitale for doing most of this work, and Jeffrey Becker at Pleiades for continuing support!).
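To make the "do it once per placename" idea concrete, here is a minimal sketch in Python. It is not the project's actual conversion code: the field names, the sample record, and every URI and coordinate below are placeholders invented for illustration. The point is only that a small lookup table, maintained separately, can enrich every record that mentions a place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """One row of a hand-built place lookup table (hypothetical layout)."""
    pleiades_uri: str
    geonames_uri: str
    lat: Optional[float] = None  # decimal degrees, where known
    lon: Optional[float] = None


# Placeholder values only: these are NOT real Pleiades or Geonames identifiers.
PLACES = {
    "Syracusae": Place(
        pleiades_uri="https://pleiades.stoa.org/places/PLACEHOLDER",
        geonames_uri="https://www.geonames.org/PLACEHOLDER",
        lat=37.0,
        lon=15.0,
    ),
}


def enrich(records, places):
    """Return (enriched_records, unmatched_ids).

    Each record is a dict with at least "id" and "ancient_place" keys
    (assumed names). Records whose place is missing from the lookup table
    are passed through unchanged and reported for manual checking.
    """
    enriched, unmatched = [], []
    for rec in records:
        place = places.get(rec["ancient_place"].strip())
        if place is None:
            unmatched.append(rec["id"])
            enriched.append(dict(rec))
            continue
        enriched.append({
            **rec,
            "pleiades_uri": place.pleiades_uri,
            "geonames_uri": place.geonames_uri,
            "lat": place.lat,
            "lon": place.lon,
        })
    return enriched, unmatched


records = [
    {"id": "ISic 0001", "ancient_place": "Syracusae"},
    {"id": "ISic 0002", "ancient_place": "Unknown place"},
]
enriched, unmatched = enrich(records, PLACES)
```

Correcting a place then means editing a single row of the lookup table and re-running the conversion, rather than touching every record that mentions it.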

The same thing can be done (and we are doing so) for many of the other types of information. The EAGLE project has generated a number of online vocabularies for many of the classifications used in epigraphy (the problem of course being that every epigrapher uses these slightly differently, or with slightly different words – and different languages – so one of the major contributions of these vocabularies has been an attempt to align and translate terminology). During the conversion process we are incorporating references to the URIs for Inscription Type (e.g. honorific), Object Type (e.g. altar), Material (e.g. limestone), and Execution Technique (e.g. engraved).

In all these cases, one benefit of taking the time to do this now is that it ensures that we clean up and normalise our own data. In the long term, the holy grail is that adding in all of these references to the XML opens the door to Linked Open Data, connecting the information which we’re putting online to other related datasets and resources (for an easy example of this in action, have a look at the page for Syracuse in Pleiades, and then look at all the ‘Related content from Pelagios’ in the frame on the right side of the page: in due course, you could expect to see I.Sicily content referenced here too).

A further key part of this process is making sure that all of our records are clearly and uniquely identifiable. Internally we can do this without difficulty, and every record (i.e. every inscription) has its unique I.Sicily number (ISic 0000). We will in turn maintain each of those identities as a URI: http://www.sicily.classics.ox.ac.uk/isicily/inscriptions/0000. But we want to make sure that those identities make sense to others and are recognisable, and crucially that they align to any existing identities for the inscriptions. Indeed, that was one of the original objectives behind the first database, collecting all the traditional bibliographic references and trying to align them to ensure that there was a single record for each inscription. But now there are multiple digital online identities too. For Sicilian epigraphy the key existing resources are the Epigraphic Database Roma (EDR), which has about 2000 Sicilian records (on all materials); and the PHI Greek epigraphy database [this link is to the more accessible version which does not require Java], which has entries for c. 1800 Greek inscriptions on stone from Sicily. Ideally, we want all of our records to cross-reference all of their records. Unfortunately, at this stage, there is no quick way of achieving this, and so in recent weeks I have been manually adding the EDR and PHI numbers to the I.Sicily records (a slow process, but much quicker within the simple framework of the flat Access table). One potential solution to this particular problem is the Trismegistos project, which began life working on ancient Egypt, but now aims to generate unique identifiers for all ancient papyrological and epigraphic texts. If every project references via a TM number, then they can all be aligned much more easily. We have recently exchanged data with Trismegistos and we now have TM numbers (many of them new) for about 90% of our records (huge thanks here to Mark Depauw).
In the future we hope to collaborate with Trismegistos for the recording of names and people also.
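The identifier handling described above might look roughly like this in Python. This is a sketch under stated assumptions, not the project's code. The base URI is the one given in the text; the four-digit zero padding is inferred from the "ISic 0000" pattern; the function names and all sample identifiers are hypothetical. The second function shows why a shared TM number helps: two datasets that each record it can be cross-referenced without any manual matching.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Base URI as given in the text above.
BASE_URI = "http://www.sicily.classics.ox.ac.uk/isicily/inscriptions/"


def isic_uri(number: int) -> str:
    """Build the URI for an I.Sicily record (assumes 4-digit zero padding)."""
    if not 0 <= number <= 9999:
        raise ValueError(f"ISic number out of range: {number}")
    return f"{BASE_URI}{number:04d}"


def align_by_tm(
    ours: Dict[int, Optional[str]],
    theirs: Dict[str, Optional[str]],
) -> Dict[int, List[str]]:
    """Map each ISic number to the external IDs that share its TM number.

    `ours` maps ISic number -> TM number (None where not yet assigned);
    `theirs` maps an external database ID -> TM number.
    Records with no TM number, or no counterpart, are simply left out.
    """
    by_tm = defaultdict(list)
    for ext_id, tm in theirs.items():
        if tm is not None:
            by_tm[tm].append(ext_id)
    return {
        isic: sorted(by_tm[tm])
        for isic, tm in ours.items()
        if tm is not None and tm in by_tm
    }


# Invented sample data: none of these identifiers are real.
ours = {1: "TM-A", 2: "TM-B", 3: None}
theirs = {"EXT-10": "TM-A", "EXT-11": "TM-C", "EXT-12": "TM-A"}
links = align_by_tm(ours, theirs)
```

Records that drop out of the result (here, those with no TM number or no counterpart) are the ones that would still need manual alignment.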

Finally, we are doing our best to improve the information on the current location of the inscriptions, which curiously seems to be something that epigraphers have not always been very diligent about recording. We are working closely with several of the Sicilian museums already (in particular the Paolo Orsi Archaeological Museum at Siracusa, and the Museo Civico of Catania) to improve the cataloguing of their collections, and so to be able to provide inventory numbers. As part of that process we are providing a URI for every archaeological collection (and in due course every public archaeological site/park), which will enable the proper linking of inscription and museum records, and the conversion will embed that information in the XML also.

This whole process of data enrichment and conversion is very nearly complete. When it is, we hope to put a ‘beta’ version of I.Sicily online, to enable people to start using and testing the site and to help us develop it as a resource. You may have noticed that the thing that we haven’t really talked about much so far is the texts themselves (and there will be images too). At this stage the project has deliberately concentrated on the metadata, since that is where our resources are much richer than those of the existing online datasets. In the first instance, therefore, most of the records will lack a proper marked-up EpiDoc text; but, having aligned our records with those of other online databases, users will still be able to get to a text for any inscription they are interested in at a couple of clicks. And we expect to convert and incorporate the majority of texts rapidly in the coming months. Our longer term goal is to build in a version of the Perseids platform to enable anyone to contribute texts or edits (subject to peer review and with due authorial credit) and so to build I.Sicily into a complete and very rich collaborative online corpus of Sicilian epigraphy.

Oh, and none of this would be possible without the tireless efforts of James Chartrand, at Open Sky Solutions (Canada), who is actually building all of this!

