The Importance of Fields in Database Projects – Cultural Heritage Informatics Initiative

In previous posts we discussed the importance of selecting representative fields when creating a database based on historical records. It is worth returning to this point because it matters so much when designing a functional digital database. We all know that historical sources contain disparate universes of data. Historians, in general, extract from documents what they need for their own research. This selectiveness, inherent to the historiographical craft, makes sources manageable for us. We simplify, mutilate, and make documents “legible” in order to answer our own questions. We ignore or overlook elements that we do not consider significant for our research. For instance, if we are working with different types of sources, such as inquisitorial and plantation records, and we are looking at the religious practices of Africans in the Americas, we are going to privilege the testimonies of slaves in legal trials or their ethnicities as recorded in some plantation papers. We will probably overlook the sugar mill machinery because it is not relevant to our argument. However, if we are creating a database of plantation records from Louisiana, and that database aims to be comprehensive, we would probably want to include as many fields as possible, sugar mill machinery among them. When building a historical digital database, it is crucial to think about it in the broadest possible way. A database is not just an individual enterprise serving our particular research. It is a repository for many potential types of historical inquiry.

However, as is the case for a conventional monograph, a database needs a central theme. It is essential that we are clear about our subject, because the fields need to be connected to one another around the main topic. For instance, the slaves themselves are the main protagonists of a database on runaway slaves. In a relational diagram of fields, the slaves are at the center, while owners, physical marks, date of capture, and “nation” are subfields attached to the slave, the main entity. Take now the example of the most successful digital project on the slave trade: the Trans-Atlantic Slave Trade Database (TSTDB). Its main subject is the slave ship. Every field in the database is centered on those vessels transporting forced human cargoes from Africa to the Americas. Variables such as flag, date of departure, captain, owners, number of slaves, or mutiny on board all relate to a particular ship. The TSTDB was built from diverse types of documents located around the globe. These disparate records were written in different languages, for diverse purposes, over more than three centuries, and by disparate historical actors. Many of these documents had been used before by historians to write their classic monographs, and some of those historians later collaborated to enlarge the TSTDB. Therefore, the question to ask is how it was possible to translate such diverse historical sources into a single, coherent project without losing sight of comprehensiveness.
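To make the idea of a central entity more concrete, here is a minimal sketch in Python of the runaway-slave example described above. It is only an illustration: the class names, attribute names, and sample values are our own assumptions for this post and do not come from any existing project. Every subfield is optional, because a given document may not mention it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Owner:
    """A related entity that points back to the central record (hypothetical)."""
    name: str


@dataclass
class RunawayRecord:
    """Central entity: one enslaved person named in a runaway record.

    All attribute names are illustrative assumptions, not a real schema.
    """
    name: Optional[str] = None
    nation: Optional[str] = None
    date_of_capture: Optional[str] = None  # kept as text; sources date events unevenly
    physical_marks: List[str] = field(default_factory=list)
    owner: Optional[Owner] = None


# Invented placeholder values, used only to show the shape of a record.
record = RunawayRecord(
    name="(name as written in the source)",
    nation="(nation as written in the source)",
    physical_marks=["(first mark)", "(second mark)"],
    owner=Owner(name="(owner as written in the source)"),
)

# The date of capture was not mentioned in this imaginary document.
print(record.date_of_capture)
```

Everything hangs from `RunawayRecord`; the owner, the marks, and the “nation” only make sense in relation to it, which is what the relational diagram expresses.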

First of all, the authors of the project are renowned specialists not only on the slave trade but also in quantitative studies. They simplified in order to standardize. I think this is the key to creating a manageable digital database when the universe of documents we are using is extremely heterogeneous. After careful study, and based on their years of experience, the authors of the TSTDB determined which fields were likely to show up in documents related to the Atlantic slave trade. For instance, documents usually mention information such as the ship's name, the captain, the number of captives, the date of departure or arrival, and the nationality of the vessel. The fact that the name of the vessel is sometimes missing does not make that field any less important to include. In the same way, the fact that some documents mention the color of the vessel is not a reason to include that information as an individual field. Why? Because the appearance of the ship is not something that comes up regularly in the sources. As a consequence, that feature does not deserve a field of its own. If we created a field for every detail found in the documents, the result would be an unmanageably high number of empty fields. The database would not be functional.
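One rough way to check how regularly a candidate field appears is to transcribe a small sample of documents and count how often each field is actually filled in. The sketch below is our own illustration of that idea, not the method used by the authors of the TSTDB, who relied on their expertise. The field names, the sample values, and the 50 percent threshold are all assumptions made up for the example.

```python
from collections import Counter


def field_fill_rates(records):
    """Return, for each field, the share of records in which it has a value."""
    counts = Counter()
    for record in records:
        for name, value in record.items():
            if value not in (None, ""):
                counts[name] += 1
    total = len(records)
    if total == 0:
        return {}
    return {name: counts[name] / total for name in counts}


def select_fields(records, threshold=0.5):
    """Keep only the fields filled in at least `threshold` of the sample."""
    rates = field_fill_rates(records)
    return sorted(name for name, rate in rates.items() if rate >= threshold)


# Invented sample of four transcribed documents (placeholder values only).
sample = [
    {"ship_name": "A", "captain": "X", "captives": 120, "hull_color": None},
    {"ship_name": None, "captain": "Y", "captives": 200, "hull_color": "black"},
    {"ship_name": "C", "captain": None, "captives": 90, "hull_color": None},
    {"ship_name": "D", "captain": "Z", "captives": None},
]

rates = field_fill_rates(sample)
kept = select_fields(sample)
print(rates)
print(kept)
```

In this invented sample the ship name is missing once but still appears in three documents out of four, so it stays as a field, while the hull color appears only once and is left out. A count like this can support the decision, but it does not replace the historian's judgment about which fields matter.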

The other element we have to take into account is that, during the process, we will deal with software developers and their programming languages. They need a clear project based on coherent and interrelated fields. Programmers in general, and in particular those accustomed to building databases from contemporary data, do not completely understand our initial intention of putting together a database based on fragmentary data. Take the example of a programmer who has built digital platforms for credit card companies. He or she has been building databases of customers and is used to a coherent and complete set of data. Historians, by contrast, usually have to deal with fragmentary data. Thus, programmers have to create relationships between fields that may or may not be fully populated. It also often happens that historians resist simplifying their information when it comes to formulating their digital projects. This attitude is based on epistemological principles that make sense when writing monographs but that are not entirely functional when creating a digital database. It is not a matter of gathering all the data that we think is or could be significant for potential research. We have to choose fields that appear regularly in the documents in order to standardize them, which means making a functional digital database. Our solution for exceptional or unusual data is an empty text box where we write complementary information that did not become a separate field because of its lack of representativeness. Fortunately, we did not face that issue while creating BARDSS. Our database is based on a set of information that is extremely coherent in time and space. After all, baptismal records were intended from the beginning to be a sort of legible and coherent collection of data on population. In the next post we will show some documents and how we extracted the information from them and transformed it into a relational diagram.
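For readers who want to see what fields that may or may not be populated, plus a box for complementary information, can look like in practice, here is a small sketch using Python's built-in sqlite3 module. It is not the schema of BARDSS or of the TSTDB; the table name, the column names, and the sample row are hypothetical and exist only to illustrate the two ideas.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Every descriptive column allows NULL, because any of them may be missing
# from a given document. The notes column is the free-text box for details
# that are too rare to deserve a field of their own.
conn.execute(
    """
    CREATE TABLE voyages (
        id             INTEGER PRIMARY KEY,
        ship_name      TEXT,
        captain        TEXT,
        captives       INTEGER,
        departure_date TEXT,
        notes          TEXT
    )
    """
)

# Invented record: the source names no ship and gives no departure date,
# but it mentions an unusual detail that goes into the notes.
conn.execute(
    "INSERT INTO voyages (ship_name, captain, captives, notes) VALUES (?, ?, ?, ?)",
    (None, "Captain X", 150, "Source describes the hull as painted black."),
)
conn.commit()

row = conn.execute(
    "SELECT ship_name, captain, departure_date, notes FROM voyages"
).fetchone()
missing_names = conn.execute(
    "SELECT COUNT(*) FROM voyages WHERE ship_name IS NULL"
).fetchone()[0]

print(row)
print(missing_names)
```

The record is still valid even though two of its fields are empty, and the rare detail is preserved in the notes without adding a column that would stay empty for almost every other record.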