The William Brumfield Russian Architecture Digital Collection: From Database to Semantic Web
By Michael Biggins and Theo Gerontakos
This article was originally posted in the March NewsNet.
For the past 16 years, the University of Washington (UW) has collaborated with Prof. William Craft Brumfield (Tulane U) to preserve and provide online access to part of his vast archive documenting significant architectural objects of Russia – a collection of photos and slides spanning the work of nearly six decades. The joint project has culminated most recently in the public release in 2017 of UW’s William Brumfield Russian Architecture Digital Collection.
The UW-Brumfield collaboration has been driven by a complex of shared concerns. Of these, creating access to a major information resource for teaching and research on an underrepresented architectural tradition, as well as the collection’s long-term preservation for future use, have been the most fundamental. At various stages we have naturally been drawn to emerging technologies and standards that have offered ways of enhancing the final product: for instance, by employing locally developed image management software (CONTENTdm, originally developed at UW) to serve images and metadata; by adapting then newly released national metadata standards (the so-called VRA Core developed by the Visual Resources Association, as well as VRA’s manual for Cataloging Cultural Objects) to describe the collection’s non-mainstream subject matter; by geo-referencing each building in the collection to allow for an optional GIS-based graphic interface; and by overcoming the limitations of the essentially flat information structures of existing image databases in order to present objects within the context of their complex hierarchical relationships to larger or subordinate objects (e.g., a detail of a fresco within a church within a monastery).
As with so many endeavors supporting our area of study, however, we have also felt we were working to fill a void in the existing array of available information resources for the study of Russia. For example, the Artstor image database, licensed by many academic libraries, is widely considered a definitive image resource for the study and teaching of art history. But, as of this writing, it contains a scant 35 images of Russian church iconostases, and not even five images of Russian church frescoes, as compared to its more than 9,000 images of frescoes in Italy. Even after accounting for the difference in relative worldwide impact of these two traditions, this disparity in their documentation is out of all reasonable proportion and, unfortunately, all too common.
Work on the Brumfield Collection began at UW in 2002 with the help of a generous grant from the Gladys Kriebel Delmas Foundation to develop a pilot project featuring some 1,200 digitized images of Russian architectural objects representing a wide geographical, chronological, typological, and stylistic range. This allowed the project team to index objects and images across a matrix of categories and test the ability of existing metadata standards and thesauri to provide the granularity and cultural specificity of description that the subject matter required. The completed pilot project presented a striking, early example of a relatively small-scale database of consistently indexed, georeferenced records for a category of information (Russian architecture) that utterly lacked that kind of systematic online access at the time. It also underscored some of the limitations of data organization and presentation that were inherent in the technology we were using.
In its next phase, under the sponsorship of a three-year NEH Digital Humanities grant, the UW development team, in consultation with Prof. Brumfield, sought to scale the resource up to encompass some 30,000 images representing over 8,000 separate buildings, or “works.” It was at this point that we began using XML as our markup standard, largely because of its capacity for accommodating hierarchical relationships among individual works (buildings) and their parent works (architectural complexes) or constituent parts (art works, named side chapels, etc.). Work records were created to describe each complete architectural or artistic entity, including its standard name, along with variant forms of the name, the type of structure, date(s) of construction, the name(s) of the architect, builder, artist and patron or sponsor (if any), and – in many cases – a free-text description of the history of the building’s construction and significance. Linked to the work records are image records corresponding to each photograph of the work, or of its parts and details contained in the database. In addition to describing the photograph itself (including the date taken, film format, photographer’s name) each image record amply describes the specific architectural features depicted in the corresponding image – e.g. windows, gables, lintels, shaters (шатер/шатры), zakomaries (закомара), kokoshniks, or any of hundreds of other generic or culturally specific details. In accordance with best practice for achieving consistent indexing, we applied metadata using the controlled vocabularies, thesauri and name authority files stipulated by the VRA Core.
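To make the idea of hierarchical work records concrete, here is a minimal Python sketch of how nested XML can express a fresco within a church within a monastery. The element names, attributes, and sample works are hypothetical and chosen only for illustration; they are not the project's actual schema, nor the VRA Core XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only,
# not the project's actual schema or the VRA Core format.
SAMPLE = """
<work id="w1" name="Example Monastery" type="monastery">
  <work id="w2" name="Example Church" type="church">
    <work id="w3" name="Example Fresco" type="fresco"/>
  </work>
  <work id="w4" name="Example Bell Tower" type="bell tower"/>
</work>
""".strip()


def walk(work, ancestors=()):
    """Yield (id, path) pairs, where path lists work names from the root down."""
    path = ancestors + (work.get("name"),)
    yield work.get("id"), path
    # Only direct children are visited here; recursion handles deeper levels.
    for child in work.findall("work"):
        yield from walk(child, path)


root = ET.fromstring(SAMPLE)
paths = dict(walk(root))

# Print each work together with the chain of parent works that contains it.
for work_id, path in paths.items():
    print(work_id, " > ".join(path))
```

Because each work is nested inside its parent, the context of any detail can be recovered by walking the tree, which a flat table of image records cannot express as directly.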
In the project’s most recent phase, the project XML metadata has been converted to “linked data” for publication on the semantic web. Currently some English-language linked data about Russian architectural objects is freely available on the web. For example, DBpedia has published some useful data, such as http://dbpedia.org/page/Alexander_Nevsky_Lavra. Unfortunately, though, this sort of data about Russian architecture tends to be scarce. In the interest of expanding the availability of such data, UW is now in the process of making its dataset from the Brumfield Digital Archive freely available online. A sample of this dataset (formatted as HTML to facilitate reading, with some explanatory text added), also describing Saint Alexander Nevsky Lavra, can be seen at http://faculty.washington.edu/tgis/ld/sampleData/sampleData.html. For a look at UW’s complete Russian architecture dataset (still in development, but viewable as it develops) see https://github.com/russianArchitecture-uwLibraries/brumfield or http://faculty.washington.edu/tgis/ld/brumfield/ (both sites contain the same data).
Although these collections may appear to some to be a mere tangle of data, they have many possible uses. For example, some part or all of the data can be downloaded to provide resource descriptions locally, saving many hours of descriptive work. The data could be integrated with a dataset in a local database, greatly increasing the amount of data collected about any single entity. The data can be selected (often by a machine), harvested, and used for constructing web annotations (for example, as described in the document “Embedding Web Annotations in HTML” at http://www.w3.org/TR/annotation-html/). There are countless possible uses for a freely available dataset.
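As a rough illustration of the integration scenario, the following Python sketch merges harvested descriptions into a local set of records keyed by a shared identifier. The identifiers, field names, and values are hypothetical placeholders, and the in-memory dictionaries stand in for whatever local database an institution actually uses.

```python
def merge_descriptions(local, harvested):
    """Combine two {identifier: {field: set_of_values}} mappings.

    Values for the same identifier and field are unioned, so nothing
    from either source is discarded.
    """
    merged = {}
    for source in (local, harvested):
        for identifier, fields in source.items():
            target = merged.setdefault(identifier, {})
            for field, values in fields.items():
                target.setdefault(field, set()).update(values)
    return merged


# Hypothetical records: example.org identifiers are placeholders only.
local = {
    "https://example.org/work/1": {"name": {"Example Church"}},
}
harvested = {
    "https://example.org/work/1": {
        "name": {"Example Church"},
        "worktype": {"church"},
    },
    "https://example.org/work/2": {"name": {"Example Bell Tower"}},
}

merged = merge_descriptions(local, harvested)
for identifier, fields in sorted(merged.items()):
    print(identifier, {field: sorted(values) for field, values in fields.items()})
```

The merge works only because both sources refer to the same entity by the same identifier, which is why stable identifiers matter for reuse.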
One of the distinguishing characteristics of this data is that it is linked data. This has several meanings: the data is structured using the data model for the semantic web, the Resource Description Framework (RDF); also the data is “linked” to other datasets; specifically, when an entity in the Russian architecture dataset matches the same entity in another organization’s dataset, additional data is created in the Russian architecture dataset that states that relation. Creating such links is central to the broad movement toward linked open data, where the goal is a web of data with a common data model, interlinked, and freely available over the Internet.
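The two senses of "linked data" described here can be sketched in a few lines of Python. RDF represents data as subject-predicate-object statements (triples), and a link to another dataset is simply one more triple. In this sketch the local URI is a hypothetical placeholder, the DBpedia resource URI is assumed from DBpedia's usual page-to-resource naming, and the choice of owl:sameAs as the linking property is an assumption; the article does not say which property the project uses.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical local identifier; not one of the project's real URIs.
LOCAL = "https://example.org/work/alexander-nevsky-lavra"
# Assumed resource URI corresponding to the DBpedia page cited in the article.
EXTERNAL = "http://dbpedia.org/resource/Alexander_Nevsky_Lavra"


class Literal(str):
    """Marks an object value as a plain literal rather than a URI."""


# Each statement is a (subject, predicate, object) triple.
triples = {
    (LOCAL, RDFS_LABEL, Literal("Saint Alexander Nevsky Lavra")),
    (LOCAL, OWL_SAME_AS, EXTERNAL),  # the "link" to another dataset
}


def to_ntriples_line(subject, predicate, obj):
    """Serialize one triple as an N-Triples-style line (no escaping handled)."""
    rendered = f'"{obj}"' if isinstance(obj, Literal) else f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


def linked_resources(graph, subject):
    """Return the external URIs asserted to be the same entity as subject."""
    return sorted(o for s, p, o in graph if s == subject and p == OWL_SAME_AS)


for triple in sorted(triples):
    print(to_ntriples_line(*triple))
print(linked_resources(triples, LOCAL))
```

A production dataset would normally be built and serialized with an RDF library rather than by hand; the point here is only that a link between datasets is itself a piece of data.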
By publishing its data on Russian architecture, UW is contributing over a million assertions that can serve as the basis for many millions more. Because we give our resources persistent names on the World Wide Web (using HTTP identifiers), others can use these identifiers to refer unambiguously to the same resources and build additional assertions. This dataset provides unchanging identities for resources related to Russian architecture, and others can use those identities in a worldwide collaborative effort to produce and consume data on Russian architecture.
One should note that this data is not a database for viewing images with a user-friendly display (although it could be used for that purpose). It is a data collection that can be referenced, harvested, downloaded, and reused for any purpose. As the data provider, UW supplies the data in a highly structured format optimized for machine processing, and any user is then free to create a new use for it.
The dataset includes descriptions of the following entities:
Works: describes over 8,000 Russian sites; it includes names for buildings, historical information, the type of building or site, and references to places and people associated with the site or building.
Photos: describes photographs taken by William Brumfield; it includes view information, terms for architectural details pictured, and references to the buildings pictured. All photo descriptions include a hyperlink that, when followed, displays the photographs described in the William Brumfield Russian Architecture Digital Collection.
Agents: describes people and corporate bodies associated with a site or photograph. It includes architects, photographers, builders, etc., their names, their era, and links to descriptions of the same person in other datasets.
Three additional datasets are intended to supplement more detailed datasets on the web:
Places: lists the locations of the sites pictured in Professor Brumfield’s photographs;
Subjects: primarily lists the architectural details visible in each photograph;
Worktypes: lists types of buildings featured in the “Works” dataset.
One final word of caution: the datasets are currently under development. They can be viewed as they develop; however, formally incorporating the data into your own datasets is not yet advisable, because the data, including the URIs, will continue to change. The datasets will be complete sometime in late 2018.
1. Principal members of the development team included James D. West and Michael Biggins (project coordinators), Aylin Llona (computer support librarian), Mary Giles (metadata technician) and Theodore Gerontakos (metadata librarian).
Michael Biggins is Slavic, Baltic and East European studies librarian at the University of Washington, Seattle, and an affiliate professor with UW’s Department of Slavic Languages and Literatures (firstname.lastname@example.org)
Theo Gerontakos is the principal metadata librarian at the University of Washington, Seattle (email@example.com)