The Journey through Metadata – Cultural Heritage Informatics Initiative

Previously…

My previous update was written in the middle of developing a metadata scheme for an anthropology department digital library. Most of that effort went towards finding appropriate metadata to describe the data being curated by the department. This largely entailed researching existing metadata schemes and considering the unique factors that come with curating archaeological heritage. It often meant deciding which entities to use.

Entities

When coming up with a metadata scheme to organize cultural heritage data, it is important to describe that data with entities that are widely recognized. Entities are terms such as “site”, “author”, or “date” which can be used to group your data in a meaningful way. Using entities with standardized definitions allows for easier comparison between different sets of data and less ambiguity when describing the data. A ceramic pot, for example, could be labeled in many ways: artifact, object, heritage asset, and so on. In this particular case, I chose the overarching term “heritage asset” to describe the many objects related to humans as they live their lives. This term came from the CARARE metadata standard, which in turn was built on other existing standards.

CARARE and other Metadata Schemes

I arrived at CARARE after considering multiple metadata schemes such as LIDO, CIDOC CRM, and DCMI. All of these schemes were built with particular goals in mind for the organization of different types of data. LIDO, for instance, was made with the intention of allowing easier aggregation of data through data mining software. CIDOC CRM, as another example, was designed for the creation of comprehensive record systems for the description of archaeological data, and it aims to facilitate communication and research among scholars using that data. I eventually decided to use CARARE because it builds on existing metadata standards like CIDOC CRM and LIDO, while its aims fit better with those of our project. It focuses on the detailed description of heritage assets, the events in which an asset is involved, the digital resources which are available, and their provenance.

The CARARE system uses four main themes, or “wrappers,” to describe its entities: Collection Information, Heritage Asset, Digital Resource, and Activity. Respectively, these describe information about the collection; information about the artifact, monument, text, or other heritage asset; the associated digital resource; and any activities, such as restoration, associated with the heritage asset. This lines up with the overarching goals of the current project, which are digitization, easier management, and eventually public outreach.

The goal of this project was not to perfectly preserve objects digitally for the purposes of facilitating research or archiving the objects being curated by the anthropology department. CIDOC CRM and LIDO are much more comprehensive than CARARE and would have been more appropriate for an archival project, that is, a project in which “perfect” description and preservation of the data was the main objective. If it is decided in the future that we want to be able to archive data in this repository, the appropriate metadata from schemes like these can be added. I am not, however, simply copying and pasting CARARE into KORA (the digital repository I am using).

A New Metadata Scheme

I used many entities from CARARE, such as “Collection Information,” “Heritage Asset,” and “Activity,” but many other entities from the scheme are irrelevant to the circumstances of this project. Additional entities from other metadata standards have also been used to supplement the new scheme. The result is a unique selection of metadata drawn from several standards, which needs to be properly defined and organized within KORA.

As a repository, KORA is organized in a way that necessitates the compartmentalization of particular chunks of information. The general organization goes from Project → Form → Page → Field. The Anthropology digital library will be the project, while the forms will describe particular sets of data within the project. These will include “wrappers” such as “Collection Information” and “Heritage Asset.” The forms will then contain fields, organized within pages, for the entry of data. These pages and fields will make up the majority of the metadata scheme. Filling these pages and fields with data will constitute a record for a particular form, which can then be associated with records from other forms. The following is a cascading list of the terms that KORA uses to organize data, with examples of the metadata I will be using at each level:

Project: Anthropology Department Digital Library

  Forms: Collection information, Heritage Asset, Text Site, Actor, Activity, ARGUS ID, Digital Exhibits

    Pages (e.g. for a Heritage Asset): Object identification, digital resource, etc.

      Fields (e.g. for object identification): Description, General Type, Heritage Asset Type, etc.
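
To make the nesting concrete, here is a minimal sketch of the same hierarchy in Python. It is only an illustration of the Project → Form → Page → Field structure described above: the class names and the outline() helper are my own assumptions for this example and are not KORA's actual API or data model. Only a few of the forms are included, and the "Digital resource" page is left without fields because they are not listed above.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes mirroring the four levels; not KORA's real API.
@dataclass
class Field:
    name: str


@dataclass
class Page:
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Form:
    name: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Project:
    name: str
    forms: List[Form] = field(default_factory=list)


# Names below are taken from the cascading list in this post.
library = Project(
    name="Anthropology Department Digital Library",
    forms=[
        Form(
            name="Heritage Asset",
            pages=[
                Page(
                    name="Object identification",
                    fields=[
                        Field("Description"),
                        Field("General Type"),
                        Field("Heritage Asset Type"),
                    ],
                ),
                Page(name="Digital resource"),
            ],
        ),
        Form(name="Collection Information"),
        Form(name="Actor"),
    ],
)


def outline(project: Project) -> List[str]:
    """Return the hierarchy as indented lines, one level per two spaces."""
    lines = [project.name]
    for form in project.forms:
        lines.append("  " + form.name)
        for page in form.pages:
            lines.append("    " + page.name)
            for f in page.fields:
                lines.append("      " + f.name)
    return lines


print("\n".join(outline(library)))
```

Each level simply holds a list of the level beneath it, which is why a field can only ever be reached through its page and form.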

So, let's pretend I want to digitize a ceramic sherd. Entering information about this artifact into the Heritage Asset form should produce one Heritage Asset record. This record can then be associated with records from other forms. For instance, records from other forms describing the Schmidt Collection (Collection ID), the site (Site), and the archaeologist (Actor) can be associated with the Heritage Asset record describing the ceramic sherd. This allows data to be compartmentalized and linked together without having to reenter the same information, such as the details of the Schmidt Collection, for every record.
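
The ceramic sherd example can be sketched the same way. Again, this is a rough illustration under my own assumptions rather than how KORA actually stores records: the Record class, the record IDs, the field names ("Name", "Role"), and the placeholder values are all hypothetical. The point is only that the Schmidt Collection is entered once and then referenced by any number of Heritage Asset records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    record_id: str                 # hypothetical ID, not a real KORA identifier
    form: str                      # the form this record belongs to
    values: Dict[str, str]         # field name -> entered value
    associations: List[str] = field(default_factory=list)  # IDs of linked records


# A simple in-memory stand-in for the repository.
records: Dict[str, Record] = {}


def add_record(record: Record) -> Record:
    records[record.record_id] = record
    return record


def associated(record: Record, form: str) -> List[Record]:
    """Return the records linked to `record` that belong to `form`."""
    return [records[rid] for rid in record.associations
            if records[rid].form == form]


# Shared records are entered once (values are placeholders).
add_record(Record("collection-1", "Collection Information",
                  {"Name": "Schmidt Collection"}))
add_record(Record("site-1", "Site", {"Name": "placeholder site"}))
add_record(Record("actor-1", "Actor", {"Role": "archaeologist"}))

# Two sherds point at the same collection record instead of repeating it.
sherd_a = add_record(Record("asset-1", "Heritage Asset",
                            {"Description": "ceramic sherd"},
                            ["collection-1", "site-1", "actor-1"]))
sherd_b = add_record(Record("asset-2", "Heritage Asset",
                            {"Description": "second ceramic sherd"},
                            ["collection-1"]))

collection = associated(sherd_a, "Collection Information")[0]
print(collection.values["Name"])
```

Because both sherds hold a reference to the same collection record, a correction to the collection's details would only need to be made in one place.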

The Near Future

My goal for the near future is to polish this final metadata scheme, enter it into KORA, and then begin entering data from the Schmidt Collection into the newly created repository.