Katy Rawdon, Coordinator of Technical Services
Special Collections Research Center, Temple University
On December 3, I had the opportunity to attend the Society of American Archivists workshop, “Digital Forensics for Archivists” (SAA workshop #1365). The one-day workshop was graciously hosted by the American Philosophical Society, and is part of the Digital Archives Specialist (DAS) Curriculum and Certificate Program.
What is digital forensics, you may ask? Good question. It is the identification, recovery, and preservation of digital information. The origins of digital forensics are in law enforcement and the legal profession: in order to build a case of evidence against a suspect, digital records in a wide variety of formats must frequently be obtained, transferred, and protected, and their authenticity ensured – not so different from archival goals in working with digital collections, particularly those in obsolete formats. From an archivist’s perspective, digital forensics begins to answer the ever-popular question of how to deal with those boxes of old, outdated floppy disks stored in the back of the vault.
Our instructor for the day was Christopher (Cal) Lee, Associate Professor at the School of Information and Library Science at the University of North Carolina, Chapel Hill. He is a well-known expert on digital collections, and the fact that he teaches this class was a strong motivating factor in my decision to attend. In addition to his extensive knowledge of the subject of digital forensics, he is an engaging, entertaining, and clear teacher.
The pre-class readings were “Digital Forensics and Born-Digital Content in Cultural Heritage Collections,” by Matthew G. Kirschenbaum, et al., and “Extending Digital Repository Architectures to Support Disk Image Preservation and Access,” by Kam Woods, et al. Both were useful and gave a preview of the content of the class. Attendees were expected to provide their own laptops, and prior to the workshop, each attendee had to download a number of software applications. Anyone who plans to attend this class in the future, but who is perhaps less technologically savvy than they would wish, may want to arrange for assistance in preparing their laptop before the class.
The workshop began with an introduction to the nature of digital media, and to the history and current application of digital forensics. In the area of law enforcement, there is already significant expertise in the technology of recovering files and information to provide legal evidence against criminal suspects. Forensic work in law enforcement, of course, is time-sensitive and case-based, while archivists are more focused on permanent preservation and ongoing access. These differences mean that best practices and recommended technology for archivists may differ from those of law enforcement officials; however, many current archival digital forensics practices do come straight out of law enforcement.
The class included discussion of hardware that can be used to set up a digital forensics station in an archival setting. The Forensic Recovery of Evidence Device (FRED) is a single-workstation solution able to acquire data from most hard drives and other storage media. While pricey, it provides a high-end starting point for building a station. Stanford University Libraries and Academic Information Resources (SULAIR) has a semi-customized FRED that includes older disk drives than would normally be required in a law enforcement environment, since much of what archivists acquire is stored on outdated media. Another option is to piece together a workstation from an existing computer and purchased or repurposed older drives.
One issue that I had not previously considered is the necessity of processing digital records while avoiding any accidental alteration of data or embedded metadata. Simply opening a file to view its contents frequently alters its metadata. A write blocker – a device that allows read commands while blocking write commands – is a necessary piece of equipment (included in a FRED, but also available separately) and prevents accidental alteration. In order to verify that data has not been altered, we learned about checksums. A checksum is a short string of characters generated by running a file (a stream of data) through an algorithm; it can be compared with a previously recorded checksum of the same file to check for alterations.
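For readers who would like to see the checksum idea in concrete terms, here is a minimal sketch in Python. It is not taken from the workshop materials; the function name `file_checksum`, the choice of SHA-256, and the sample file are my own assumptions for illustration. It records a checksum for a file, then shows that the value stays the same while the file is untouched and changes once a single character is added.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex checksum of a file, reading it in chunks.

    The function name and the SHA-256 default are illustrative choices.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Create a small stand-in for a donated file (hypothetical content).
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "letter.txt")
with open(sample_path, "wb") as handle:
    handle.write(b"Dear colleague, the minutes are attached.")

# Record the checksum at the time of acquisition.
recorded = file_checksum(sample_path)

# Later: recompute and compare. An untouched file gives the same value.
still_intact = file_checksum(sample_path) == recorded

# Simulate an accidental alteration by appending one character.
with open(sample_path, "ab") as handle:
    handle.write(b"!")
intact_after_edit = file_checksum(sample_path) == recorded

print("Unchanged file matches:", still_intact)
print("Altered file matches:", intact_after_edit)
```

Note that a checksum of this kind covers only the bytes inside the file. File system metadata such as "last accessed" dates lives outside the file's contents, which is one reason a write blocker is still needed.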
Methods of acquiring data from legacy media and formats were discussed, particularly the process of disk imaging. A disk image is an exact bit-level copy of a disk, including all data, metadata, and file structure: a “master copy” of a disk. By creating a disk image of data on obsolete storage media and/or in obsolete software, an archivist then has the ability to read, assess, preserve, and make available the contents without threat of damage to the original – either through their own work or through physical corruption of the media over time. The software applications EnCase and FTK (the latter more often used by archives and libraries) were discussed, and we completed several assignments using FTK. We were taken through exercises that allowed us to generate and verify checksums, create disk images, and examine their contents.
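To illustrate what "bit-level copy" means in practice, the following Python sketch copies a source byte for byte into an image file, computes a checksum during the copy, and then verifies the finished image against it. This is only a toy model under stated assumptions: an ordinary file of random bytes stands in for a floppy disk, the function names `create_image` and `verify_image` are hypothetical, and it is not how FTK or EnCase work internally. Real acquisition should go through a write blocker and a dedicated imaging tool.

```python
import hashlib
import os
import tempfile

FLOPPY_BYTES = 1474560  # capacity of a 1.44 MB floppy: 2880 sectors x 512 bytes


def create_image(source_path, image_path, chunk_size=1024 * 1024):
    """Copy the source byte for byte and return the SHA-256 of what was read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            image.write(chunk)
    return digest.hexdigest()


def verify_image(image_path, expected_hex, chunk_size=1024 * 1024):
    """Re-read the image and confirm it still matches the acquisition checksum."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# An ordinary file of random bytes stands in for the disk (an assumption
# made so the example runs anywhere; a real source would be a device).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "floppy_source.bin")
image_path = os.path.join(workdir, "floppy.img")
with open(source_path, "wb") as handle:
    handle.write(os.urandom(FLOPPY_BYTES))

acquisition_checksum = create_image(source_path, image_path)
image_verified = verify_image(image_path, acquisition_checksum)

print("Image size in bytes:", os.path.getsize(image_path))
print("Image verified:", image_verified)
```

Because the checksum is recorded at the moment of acquisition, any later copy of the image can be checked against it, so all further work can be done on copies rather than on the original media.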
Ethical issues in digital forensics were discussed in brief, and clearly could be expanded into a workshop of their own. In this class, we touched on issues related to retrieving supposedly deleted files and cached information from digital collections, uncovering “hidden” data, and retrieving (and potentially making accessible) donors’ online “habits,” such as browsing history.
Resources for further exploration were provided, including an extensive bibliography in the workshop handouts. Online resources discussed in class included the site for BitCurator, a Mellon-funded project to create and analyze digital forensics systems for collecting institutions; Forensics Wiki, a site written primarily from a law enforcement perspective; and the AIMS (Born Digital Collections: An Inter-Institutional Model for Stewardship) project. A number of archival institutions are actively engaged in digital forensics, and a recent master’s thesis by Martin J. Gengenbach – “‘The Way We Do It Here’: Mapping Digital Forensics Workflows in Collecting Institutions” – examines various existing processes.
Digital forensics is a massive subject, and one day is hardly sufficient to cover everything. However, given the time limitations, I felt that this class did an excellent job in providing background, required technological knowledge, and practical suggestions for hardware and software in order to begin “dealing with those floppy disks in the back.” The class covers neither description and cataloging of digital records, nor providing research access, but SAA offers separate workshops on both of those topics. For those who are interested, “Digital Forensics for Archivists” will be given in March in Worcester, Massachusetts, as part of the New England Archivists spring meeting.