You never know what a tweet will lead to … one from Tanya Zanish-Belcher about her experience at CurateGear led to this great guest post by Tanya and two of her colleagues. I’m pleased to be able to share this overview with you, and I know that all three authors would love to answer any questions you may have. If you know of other resources that might be of interest to other archivists, please share them in the comments.
By Tanya Zanish-Belcher (Director, Special Collections: email@example.com); Rebecca Petersen (Access Archivist: firstname.lastname@example.org); and Chelcie Rowell (Digital Initiatives Librarian: email@example.com), Z. Smith Reynolds Library, Wake Forest University
Recently, the three of us had the opportunity to attend CurateGear 2014, hosted by the University of North Carolina at Chapel Hill. Held for the third year, this one-day event offers presentations by a variety of technical experts along with opportunities to participate in demos of various products. Most participants discussed digital tools of great interest not only for digital collections and the digital humanities, but for archives and electronic records as well. The presentations themselves were valuable, and it was also a great opportunity to hear about other programs we may consider implementing.
I was most interested to hear Reagan Moore, who talked about iRODS (integrated Rule-Oriented Data System), an open source data grid for organizing and managing large collections of data. Basically, when collections are submitted, the user can set up default rules and procedures that do any number of things, including validating content, creating audit trails, and even extracting metadata; the system is also interoperable with both Fedora and DSpace! iRODS is currently used by many in the research community, such as the Southern California Earthquake Center (SCEC) and NASA’s Center for Computational Sciences. While we are not handling large data sets at this point, I could see us providing this service in the future (in light of NSF and NIH requirements).
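To make the rule-on-ingest idea concrete, here is a minimal Python sketch of the pattern iRODS supports. This is an illustration of the concept only, not actual iRODS rule language; the rule names, object structure, and `ingest` function are invented for the example.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical ingest rules, loosely modeled on the idea of iRODS
# policies: each rule fires automatically when an object is submitted.

def validate_checksum(obj):
    """Record a fixity checksum so later audits can detect corruption."""
    obj["checksum"] = hashlib.sha256(obj["data"]).hexdigest()

def audit_trail(obj):
    """Append a timestamped event to the object's audit log."""
    obj.setdefault("audit", []).append(
        {"event": "ingested", "at": datetime.now(timezone.utc).isoformat()}
    )

def extract_metadata(obj):
    """Pull simple technical metadata (here, just size in bytes)."""
    obj["metadata"] = {"size_bytes": len(obj["data"])}

INGEST_RULES = [validate_checksum, audit_trail, extract_metadata]

def ingest(name, data):
    obj = {"name": name, "data": data}
    for rule in INGEST_RULES:  # fire every default rule on submission
        rule(obj)
    return obj

record = ingest("field-notes.txt", b"seismic readings, 2014-01-08")
print(record["metadata"]["size_bytes"], record["checksum"][:8])
```

In a real data grid the rules run server-side across the whole collection; the point here is simply that validation, auditing, and metadata extraction happen automatically rather than as manual steps.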
MetaArchive is a co-op of university libraries and independent research libraries that work together to maintain a dark archive of their digital content. This effort mitigates risk to digital content due to disasters both major and minor, such as hurricanes, floods, fires, human error, and storage media failures.
Each MetaArchive member institution contributes a secure, closed-access preservation server to the MetaArchive LOCKSS (Lots of Copies Keep Stuff Safe) network. After an institution ingests content to its own preservation server, six other preservation servers in the MetaArchive LOCKSS network replicate that content. Servers are assigned to content in order to maximize geographic distribution. After ingest, the servers continue to revisit the contributing repository, and whenever they detect changes or additions, they ingest the altered content as a version, which is stored alongside the original. Similarly, if deletions are detected, the deletions are noted without deleting original versions. After ingest, the seven servers check in with each other periodically in order to perform fixity checks and verify that all seven copies remain identical. If a mismatch is identified (due to bit rot or other degradation), the servers reach consensus about which copy is authoritative, and they repair the mismatch. The repair is treated as a version and stored alongside the original.
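The fixity-check-and-repair cycle described above can be sketched in a few lines of Python. This is a toy illustration of the polling idea, not the real LOCKSS protocol; the function names and the seven-replica setup are assumptions for the example.

```python
import hashlib
from collections import Counter

def fingerprint(data):
    return hashlib.sha256(data).hexdigest()

def audit_and_repair(copies):
    """Compare checksums across all replicas; if one has drifted
    (e.g. bit rot), restore it from the majority-consensus copy.
    Returns the indices of the replicas that were repaired."""
    digests = [fingerprint(c) for c in copies]
    consensus, _votes = Counter(digests).most_common(1)[0]
    good = next(c for c, d in zip(copies, digests) if d == consensus)
    repaired = []
    for i, d in enumerate(digests):
        if d != consensus:
            copies[i] = good        # repair from an authoritative peer
            repaired.append(i)
    return repaired

# Seven replicas; replica 3 suffers a corrupted byte.
content = b"finding aid, version 1"
replicas = [content] * 7
replicas[3] = b"finding aid, versiom 1"
print(audit_and_repair(replicas))   # -> [3]
```

With seven copies, a single corrupted replica is always outvoted six to one, which is why the damaged copy, not the healthy ones, gets repaired.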
In fact, this support for versioning is one of the advantages of MetaArchive’s preservation strategy. As a MetaArchive report titled A Guide to Distributed Digital Preservation notes, “LOCKSS is able to preserve more than just a snapshot of a collection at a moment in time—it can preserve the changing form of that collection over time.” Additionally, the co-op model offers economies of scale. Lastly, the knowledge community of MetaArchive may be attractive as an alternative to preservation-as-a-service vendors such as DuraCloud and Preservica.
TRAC review tool (Chelcie):
TRAC refers to Trustworthy Repositories Audit and Certification (TRAC): Criteria and Checklist, now ISO 16363. Essentially, TRAC is a method for demonstrating the reliability and readiness of an institution to assume long-term preservation responsibilities for a repository of digital content. There are 85 criteria on the checklist, and they fall into three categories:
- Organizational Infrastructure – e.g. mission statement, succession plans, professional development, financial stability
- Digital Object Management – e.g. metadata templates, persistent unique identifiers, registries of formats ingested, preservation planning
- Technologies, Technical Infrastructure, and Security – e.g. detecting bit corruption, migration processes, off-site backup
While TRAC is designed for repositories to become certified as trustworthy, many institutions simply use it as a self-assessment tool. Developed by Nancy McGovern, the Head of Curation and Preservation Services at MIT Libraries, the TRAC review tool enables the assessor to provide evidence of how well a repository meets a TRAC criterion and rate its compliance on a five-point scale:
- 4 = fully compliant – the repository can demonstrate that it has comprehensively addressed the requirement
- 3 = mostly compliant – the repository can demonstrate that it has mostly addressed the requirement and is working toward full compliance
- 2 = half compliant – the repository has partially addressed the requirement and has significant work remaining to fully address the requirement
- 1 = slightly compliant – the repository has something in place, but has a lot of work to do in addressing the requirement
- 0 = non-compliant or not started – the repository has not yet addressed the requirement or has not started the review of the requirement
Simply a Drupal instance with a page for each TRAC criterion, the TRAC review tool does not make determining an institution’s level of compliance any easier; the people working on the assessment must still exercise their judgment. What the tool does is document the assessment process and keep track of the evidence that supports assertions of compliance. Furthermore, because knowledge of whether a repository meets all 85 criteria isn’t the purview of any one person, another benefit of the TRAC review tool is that it enables the lead assessor to assign certain criteria to other people (such as administrators or tech support), making the whole process of assessing repository activities more transparent across an organization.
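The 0–4 scale above lends itself to a simple self-assessment summary. The sketch below is a hypothetical example; the criterion IDs and scores are invented, and this is not part of the TRAC review tool itself.

```python
# A hypothetical self-assessment ledger using the 0-4 TRAC scale.
# Criterion IDs and ratings are invented for illustration.
scores = {
    "Organizational Infrastructure": {"A1.1": 4, "A1.2": 3, "A3.1": 2},
    "Digital Object Management":     {"B1.1": 3, "B2.1": 1, "B4.2": 0},
    "Technologies & Security":       {"C1.1": 4, "C1.2": 2, "C3.4": 3},
}

def summarize(scores):
    """Average the 0-4 ratings per category and flag unaddressed criteria."""
    report = {}
    for category, criteria in scores.items():
        avg = sum(criteria.values()) / len(criteria)
        gaps = [cid for cid, s in criteria.items() if s == 0]
        report[category] = {"average": round(avg, 2), "not_started": gaps}
    return report

for cat, row in summarize(scores).items():
    print(f"{cat}: avg {row['average']}, not started: {row['not_started']}")
```

A summary like this makes it easy to see at a glance which of the three TRAC categories needs the most work and which criteria have not been started at all.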
At CurateGear 2014, I had the opportunity to listen in on Kam Woods’ Demo and Discussion of the BitCurator Project. I found this to be an extremely fascinating session. From a non-technologist standpoint, these are some of the takeaways that I found most interesting:
What is it? “The BitCurator project is an effort to build, test, and analyze systems and software for incorporating digital forensics methods into the workflows of a variety of collecting institutions.” What was most important to me, as an archivist, is the fact that they consider these two factors: “incorporation into the workflow of archives/library ingest and collection management environments, and provision of public access to the data.” So really, what is it? BitCurator allows archival institutions such as ours to manage their born-digital content and perform tasks such as creating disk images of floppies, disks, hard drives, etc., and generating metadata. It also allows users to “make sense of materials and understand their context, while also preventing inadvertent disclosure of sensitive data” by maintaining order and flagging information such as Social Security numbers, credit card numbers, etc.
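To give a feel for the sensitive-data side of this, here is a toy Python scan for personally identifiable information in extracted text, in the spirit of the forensic scanning BitCurator bundles. The patterns are deliberately simplified illustrations, not production-grade detectors, and `scan_for_pii` is an invented name, not a BitCurator API.

```python
import re

# Simplified illustrative patterns; real forensic scanners use far
# more robust detection (context, checksums, many more formats).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def scan_for_pii(text):
    """Return each match with its type and character offset, so an
    archivist can review hits before providing public access."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.group()))
    return sorted(hits, key=lambda h: h[1])

sample = "Donor note: SSN 123-45-6789, card 4111 1111 1111 1111."
for label, offset, value in scan_for_pii(sample):
    print(label, offset, value)
```

The key design point, echoed in the quotation above, is that the tool surfaces candidate disclosures for human review rather than silently publishing or silently deleting them.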
This tool will be invaluable to archives and other institutions as we all accept more and more born-digital collections into our holdings. I highly recommend reading the white paper “From Bitstreams to Heritage: Putting Digital Forensics into Practice in Collecting Institutions” to learn more about the project.
At the end of the day, Dr. Cal Lee gave concluding remarks and noted the increasing number of intersections between all these programs. When he asked the audience for comments and recommendations for the future, I suggested some kind of assessment of these tools for decision-makers: What do they all do? How do they interact? How much do they cost? How much expertise do they require to operate? Basically, it all comes down to choice, but there is a great need for education before making these important decisions, which can impact your program or library for years to come. For me, the particular problem is being in a new position, where many of these decisions were made years before my arrival.
This discussion and a brief reference led me to Preserving Digital Objects with Restricted Resources (POWRR). Funded by IMLS, among other agencies, this project provides a number of helpful resources, including just what I wanted: a tool grid. It evaluates all the major digital tools against a set of criteria covering ingest, processing, access, storage, maintenance, and cost.
Here are some other resources which may be of interest:
Indiana University recently hosted a day-long workshop in the digital humanities, planned and organized by students. Similar in concept to CurateGear, the event gave participants flash drives containing a variety of programs they were encouraged to explore, which you can find here: Digital Sandbox: Building a community of digital humanists
Finally, the Chronicle of Higher Education recently featured an article focused on the importance of digital stewardship. In so many ways, digital and electronic objects and records (and their formats) require a high level of time, attention, and technical expertise. All in all, we are reminded, yet again, that all archivists are digital archivists.