Content from Setup


Last updated on 2024-05-13

Here’s how to set up BitCurator.

Instructions


For this lab, the main goal is to get started with BitCurator. To do this, you will need to install BitCurator on your local computer, and you will also need a small removable storage device, such as a USB drive or thumb drive. (Your removable storage device should be modest in capacity, to keep the resulting disk image files small.)

The overall steps in this task are as follows:

  1. Install BitCurator locally (on your laptop, or a desktop that you use and have access to). This requires installing VirtualBox (virtualization software from Oracle), obtaining an “image” of BitCurator to import into VirtualBox, and then starting up the BitCurator environment on your computer.
  2. Launch BitCurator in the VirtualBox environment, and spend some time getting familiar with the tools in the BitCurator environment.
  3. Acquire a disk image of some sample of digital materials (from a USB or small storage device) using GuyMager, a disk imaging tool that is part of the BitCurator environment.

For reference, these are the tools you will be focused on:

  • BitCurator - as of January 2023, the current version is 4.4.1
  • GuyMager (note that this is already part of the above, so there is no need to install it separately)

Installation Tips


Sample Data


Content from Digital Forensics for Archives


Last updated on 2024-05-02

Overview

Questions

  • Why might archivists, librarians, and other cultural heritage workers want to use digital forensics techniques and tools?
  • What are some known use cases of digital forensics among archivists, librarians, or others?
  • What is BitCurator? What is the BitCurator environment? Is it the same as “digital forensics”?

Objectives

  • Become familiar with digital forensics techniques and their application in cultural heritage and digital curation
  • Identify and understand various types of magnetic disk removable media, which might be encountered in collections
  • Describe and recommend tools and techniques for extracting content from legacy media, including use of write blockers and creation of disk images
  • Understand various types of metadata that can be generated for born-digital content extracted from legacy media
  • Become familiar with BitCurator and its toolset

Digital Forensics


Digital forensics refers to a suite of activities and tools used to preserve the original context of digital materials (e.g., the system timestamps and OS structure) and to extract content at the bitstream level, even from damaged or deleted digital content.

Archivists + Digital Forensics: Why


What are some use cases for digital forensics with legacy born-digital materials?

Discussion

Why are you interested in digital forensics for archives and other cultural heritage collections?

What potential uses for these tools are you considering in your context?

Enter BitCurator Environment (BCE)


To address these needs, a group of archivists and researchers developed the BitCurator Environment, or BCE. The BCE is a suite of open-source digital forensics software that is particularly useful to archivists in tracking creation metadata, structure, and file identification, and in documenting provenance. It even includes built-in write blockers and other tools to preserve original order and chain of custody. The BitCurator tools are grouped within an Ubuntu-based Linux environment that can be run in a virtual machine or installed directly as the main OS of a workstation; together, this is all known as the BCE. We will discuss the BCE more in the next episode.

Resources


There are many resources that explain how to use the BCE and other digital forensics tools. Given that this lesson focuses on BCE, most of the resources are geared toward this software environment, but the list also includes a few more general resources:

Key Points

  • Digital forensics encompasses a range of activities that aim to extract and preserve contextual information about digital content on devices like laptops, servers, and external drives, and even on legacy media like floppy disks and USB drives
  • Digital forensics tools and techniques can help digital preservation work, particularly in maintaining information about original order, provenance, and chain of custody for digital objects
  • Digital preservation workers, particularly archivists, have used digital forensics techniques and tools to record information about, process, and preserve digital content, and particularly to address content stored on legacy digital devices

Content from Getting Started with BitCurator


Last updated on 2024-04-04

Overview

Questions

  • How do I install and use digital forensics tools that may be useful for digital curation activities?
  • What is a disk image and how can I create one?
  • What tools may be used to acquire born-digital materials from removable storage media (and other locations), which ensure the integrity of the data, create useful information about the source and the resulting materials, and can help to preserve the context of the original materials?
  • What sorts of digital media are most well suited to this sort of activity? Are there some that are not?

Objectives

  • Test and evaluate tools for use in the identification, transfer, and preservation of born-digital materials.
  • Install and become familiar with the tools in the BitCurator environment.
  • Identify appropriate tools for acquiring born-digital content from removable media and scan for potentially sensitive information stored in that media.
  • Use the Guymager disk imaging software to acquire the contents of a storage device and its associated metadata.

Activities


Getting around in BitCurator

Getting around - tasks

Getting around: Answers will vary, depending on what you choose to look at. At minimum, you should explore the “Applications” menu at the top of the screen, use the right-click menu to view file information and checksums, and look around for other interesting things.

Key Points

  • BitCurator is a helpful way to bundle together and run many digital forensics tools that are appropriate to digital curation, that is, tools that assist in creating trustworthy digital copies, provenance information, contextual data, and chain of custody information.
  • You can use Guymager to make disk images.
  • BitCurator is preconfigured so that you can use Guymager alongside other tools that document your transfer and copying processes.

Content from Disk Imaging


Last updated on 2024-04-24

Overview

Questions

  • What is a disk image?
  • When should you create a disk image of media?

Objectives

  • Use the Guymager disk imaging software to acquire the contents of a storage device and its associated metadata.
  • Learn to evaluate when to image a disk based on individualized criteria.

What is a disk image?


A disk image is a bit-perfect copy of all the bits on a particular physical device; in other words, a complete bitstream (as defined by the physical limits of a storage device).

  • You may have seen .dmg or .iso files; these are disk images (of a thumb drive, CD, or diskette, for example)
  • We will work with “forensic images,” specifically the “Expert Witness” format (aka .E01 or EWF), which captures the complete bitstream of a physical drive and does not allow any modifications
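To make “bit-perfect” concrete, here is a minimal sketch, assuming a Linux shell with GNU coreutils. It images a small stand-in file (the hypothetical `fake_device.bin`, in place of a real device such as a USB drive) with `dd` and compares hashes. Imaging real hardware needs elevated privileges and, ideally, a write blocker; in this lesson Guymager takes care of acquisition.

```shell
#!/usr/bin/env bash
# Sketch: a disk image is a byte-for-byte copy of its source.
# Assumption: fake_device.bin is a hypothetical stand-in for a real device
# (e.g. a USB drive); no root access or write blocker is needed for it.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# A 1,474,560-byte "device": the capacity of a 1.44 MB 3.5-inch floppy.
head -c 1474560 /dev/urandom > fake_device.bin

# Copy every 512-byte sector, in order, into a raw (dd-style) image.
dd if=fake_device.bin of=image.raw bs=512 status=none

# A bit-perfect image has exactly the same hash as its source.
src_hash=$(sha256sum fake_device.bin | cut -d' ' -f1)
img_hash=$(sha256sum image.raw | cut -d' ' -f1)
echo "source: $src_hash"
echo "image:  $img_hash"
if [ "$src_hash" = "$img_hash" ]; then
  echo "bit-perfect copy"
else
  echo "copies differ"
fi
```

Changing even a single bit in either file would change its hash, which is why a hash recorded at acquisition time is useful for later fixity checks.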

To image or not to image?


Considerations for disk imaging

There is no right or wrong answer to whether or not you should image a disk!

  • Do you need a full disk image, or would extracting selected files and/or chunks of content be enough?
  • Collection considerations:
    • What is your collecting purpose?
    • What is the role of the device(s)?
    • What are your storage concerns?
  • Device considerations:
    • Does the device have an operating system (which means many redundant and proprietary files)?
    • If it’s a storage device, it may contain deleted or unintended files (these are captured by forensic imaging approaches)
    • What is on the device? Sometimes the device contents will determine if a disk image is required, such as executable files or other software.
    • What are the preservation needs?

Creating a disk image


For this walkthrough we will be using 3.5-inch floppy disks. Similar concepts apply to other storage media, but the exact steps may differ.

Using Guymager

The following instructions are modified from the BitCurator Quick Start Guide.

Mounting the device is not required to create an image of it. If you wish to mount the device, click the Files icon in the dock and select the name of the volume on the device to mount it. If you are not using a hardware write blocker and the USB device read-only policy is not enabled, your device is now mounted and writable.

Click the Applications menu in the top left of the screen, navigate to the Imaging and Recovery submenu, and click Guymager. Guymager requires elevated privileges to access physical devices, so you will be prompted for your password. Once Guymager has loaded, the main interface appears as in the screenshot below. In this example, the 3.5-inch floppy disk drive is selected.

Screenshot of the 3.5-inch floppy drive selected in the Guymager interface

Next, right-click on the selected device (in this example, a 3.5-inch USB floppy drive listed as MITSUMI_USB_FDD) and select Acquire Image from the context menu.

Screenshot of the Acquire Image action highlighted

A new dialog will appear. This disk image will be acquired using the Expert Witness Format (the second option at the top). By default, Guymager splits EWF images into 2048 MiB segments. If you do not wish to split the image, set the Split size to something very large (2 EiB, for example).
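As a rough guide to how many segment files (numbered .E01, .E02, and so on) to expect, the sketch below does the arithmetic in shell. It assumes no compression and ignores EWF’s own header and table overhead, so treat the result as an estimate; the drive sizes are hypothetical examples.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many EWF segment files (.E01, .E02, ...) to expect.
# Assumptions: no compression, and EWF header/table overhead is ignored, so
# this is a rough estimate; drive sizes below are hypothetical examples.
set -euo pipefail

# Ceiling division: segments of $2 bytes needed to hold $1 bytes.
segments_needed() {
  local image_bytes=$1 split_bytes=$2
  echo $(( (image_bytes + split_bytes - 1) / split_bytes ))
}

split_size=$(( 2048 * 1024 * 1024 ))   # Guymager's default split: 2048 MiB

floppy=1474560                          # 1.44 MB floppy disk
usb=16000000000                         # "16 GB" USB drive (decimal gigabytes)

echo "floppy:    $(segments_needed "$floppy" "$split_size") segment(s)"
echo "16 GB USB: $(segments_needed "$usb" "$split_size") segment(s)"
```

A floppy fits in one segment, while a 16 GB drive would need about eight; all segment files of an image must be kept together for the image to be readable.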

The five metadata fields starting with Batch number are optional, but can be useful for tracking and metadata purposes. Under Destination select the image directory you would like the disk image to be saved to. In this case, we have simply chosen to write the image to a folder on the Desktop. Finally, provide a name for the image. Then click Start.

Screenshot of the Acquire Image dialog box with completed metadata

In the main window, the State column for the device will change to Running. When the acquisition finishes, you will see a Finished - Verified & Ok message in the State column.

Some disk image formats you may see

  • RAW and Split RAW (RAW stored across multiple files)
  • Advanced Forensics Format (AFF) [no longer recommended]
  • EnCase Evidence File (.E01)
  • ISO (for CD-ROM)
  • IMG (floppy or sometimes CD-ROM)
RAW format (dd)
  • Copies of the raw media data. Often split into smaller chunks to make them more manageable and so that the resulting images can fit onto size-limited file systems and media such as FAT or DVD/CD-ROM.
  • Advantages:
    • Very simple; simple tools can manipulate the image.
    • Image can be easily split for storage and transport on removable media
    • Output can be piped to other applications for immediate processing
  • Disadvantages:
    • Can be very large (no compression). Zipped raw images cannot be operated on directly with regular tools, because those tools cannot efficiently perform arbitrary seeks in compressed data.
    • Often too large to store on FAT formatted media
    • No metadata other than file names, no hashes.
    • No checksumming on files – not robust
    • Missing segments (for example, from a scratched CD/DVD) are sometimes written as zeros.
    • Overwritten data (unrecoverable – no checksums on small blocks in file).
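The first two raw-format advantages (easy splitting and piping) can be seen in a few lines. This is a minimal sketch, assuming GNU coreutils; `image.raw` is a synthetic stand-in rather than a real acquisition. Because a raw image carries no embedded hash, the sketch records one separately.

```shell
#!/usr/bin/env bash
# Sketch: raw images split and rejoin trivially, and can be piped to tools.
# Assumption: image.raw is a synthetic ~3 MB stand-in, not a real acquisition.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
head -c 3000000 /dev/urandom > image.raw

# Raw images carry no embedded hash, so record one yourself at acquisition.
original_hash=$(md5sum image.raw | cut -d' ' -f1)

# Split into 1,000,000-byte pieces: image.raw.000, image.raw.001, ... (GNU split)
split -b 1000000 -d -a 3 image.raw image.raw.

# Rejoin by concatenating in order, piping straight into a hash tool so the
# joined image never has to be written back to disk.
joined_hash=$(cat image.raw.[0-9][0-9][0-9] | md5sum | cut -d' ' -f1)

ls image.raw.*
echo "original: $original_hash"
echo "joined:   $joined_hash"
```

The same pipe could feed any tool that reads a stream, which is what “output can be piped to other applications” means in practice.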
Expert Witness Format (EnCase)
  • Evidence file consists (in order) of: Acquisition information, Data Block, CRC (cyclic redundancy check), acquisition hash (MD5)
  • Can be split for storage, transport
  • CRC computed for every 32K block; balance between integrity and speed, also makes it very difficult to tamper with the evidence file (1 in 4 billion chance of collision)
  • Cannot be manipulated with simple (open-source UNIX) tools
  • Previously limited to 2GB size
  • Largely proprietary
  • Has been reverse engineered by Joachim Metz in libewf (used in open-source tools that read EWF)
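The sketch below illustrates the idea behind per-block checks: a check value for every 32 KiB block lets a tool pinpoint which block is damaged, whereas one whole-file hash only says that something changed. It rests on a loudly stated assumption: POSIX `cksum`, a 32-bit CRC, stands in for whatever per-chunk check a real EWF file stores, and nothing here follows the EWF on-disk format. A 32-bit check value has 2^32 (about 4.3 billion) possible values, which is where the “1 in 4 billion” figure above comes from.

```shell
#!/usr/bin/env bash
# Sketch: per-block checksums localize damage.
# Assumption: POSIX cksum (a 32-bit CRC) stands in for EWF's per-chunk check;
# this is NOT the EWF file format.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

block=32768                                        # 32 KiB
head -c $(( block * 4 )) /dev/zero > data.bin      # 4 blocks of test data

# Print one CRC per 32 KiB block of the given file.
block_crcs() {
  local file=$1 n i
  n=$(( ( $(wc -c < "$file") + block - 1 ) / block ))
  for (( i = 0; i < n; i++ )); do
    dd if="$file" bs="$block" skip="$i" count=1 status=none | cksum | cut -d' ' -f1
  done
}

block_crcs data.bin > before.txt

# Simulate damage: overwrite one byte inside block 2 (counting from 0).
printf 'X' | dd of=data.bin bs=1 seek=$(( block * 2 + 100 )) conv=notrunc status=none

block_crcs data.bin > after.txt

# Report the zero-based numbers of blocks whose CRC no longer matches.
damaged=$(paste before.txt after.txt | awk '$1 != $2 { print NR - 1 }')
echo "damaged block(s): $damaged"
```

Only block 2 is reported, so a reader would know exactly which 32 KiB of the image to distrust.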
ISO (.img) for CD-ROM, DVD
  • Similar to raw, but can’t contain
    • multiple tracks
    • audio or video tracks
  • Doesn’t contain control headers or error correction fields (raw can include these)
  • Filesystem usually will be either ISO 9660 (CD-ROM) or UDF (DVDs)
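To see where an ISO 9660 filesystem announces itself, here is a minimal sketch, assuming GNU coreutils: ISO 9660 reserves the first 16 logical sectors (16 × 2048 = 32768 bytes), and the first volume descriptor that follows carries the identifier “CD001” in its bytes 1 to 5. The demo files are hypothetical. A UDF-only DVD image may use different identifiers at that position, so a missing signature does not by itself mean the image is unreadable.

```shell
#!/usr/bin/env bash
# Sketch: test an image file for an ISO 9660 signature.
# ISO 9660 leaves the first 16 logical sectors (16 x 2048 = 32768 bytes) as a
# system area; the first volume descriptor starts at byte 32768, and bytes
# 1-5 of that descriptor hold the identifier "CD001".
set -euo pipefail

has_iso9660_signature() {
  local img=$1 sig
  # Read 5 bytes at offset 32769; drop NUL bytes so blank media compare cleanly.
  sig=$(dd if="$img" bs=1 skip=32769 count=5 status=none | tr -d '\000')
  [ "$sig" = "CD001" ]
}

# Hypothetical demo files: one with a type byte (1 = primary volume
# descriptor) plus "CD001" at the right offset, and one that is all zeros.
workdir=$(mktemp -d)
{ head -c 32768 /dev/zero; printf '\001CD001'; } > "$workdir/demo.iso"
head -c 40000 /dev/zero > "$workdir/blank.img"

for f in "$workdir/demo.iso" "$workdir/blank.img"; do
  if has_iso9660_signature "$f"; then
    echo "$(basename "$f"): ISO 9660 signature found"
  else
    echo "$(basename "$f"): no ISO 9660 signature"
  fi
done
```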

Accessing disk images


  • Virtualization and emulation
  • Mounting the original filesystem
  • Accessing (but not mounting) disk images using forensics software
  • For end user access:
    • Remote, dynamic access to disk image contents (via server, virtual environment)
    • Cross-drive analysis

Mount disk image: Using BitCurator Mounter

The following instructions are modified from the BitCurator Quick Start Guide.

In the file manager, right-click on any of the disk images you have created, select Scripts, and then select Disk Image Mount. This script is a wrapper around libewf and some mounting tools that attempts to automatically mount any file systems it identifies. If it finds one, the file system will appear as a mountable device in the list on the left.

Screenshot of selecting Disk Image Mount from the file manager

Note: This mount is read-only. You cannot alter the content of a filesystem mounted from an E01 file (modifying, adding new files, or deleting) from this desktop interface.

Once you have finished examining the content, click the eject indicator next to the filesystem name in the file dialog. You will get a prompt for your user password in order to complete this step.

Activities


Challenge

Split into two groups.

Group 1: Create your own disk images from the supplied 3.5-inch floppy disks.

Group 2: Mount one of the sample disk images available in the GitHub repo. What information do you see? Is there anything that sticks out to you?

Once you’ve completed your group’s activity, switch and do the other group’s activity!

Key Points

  • Digital forensic approaches can offer useful tools to digital curators in working with legacy removable media
  • Important concepts include thinking beyond the file level and disk imaging
  • The BitCurator environment offers a bundle of tools that are useful to digital curators

Content from Reporting


Last updated on 2024-05-13

Overview

Questions

  • What tools are available in the BCE for analyzing disk images or directories of data transferred from legacy media?
  • How do you use them?
    • Specifically, how can librarians and archivists capture basic system characteristics and metadata?
    • How can they generate reports to help them triage and organize files for digital archiving processes?
    • How can they scan for potentially sensitive information to help them make decisions about access?

Objectives

  • Gain basic experience with:
    • Brunnhilde, a reporting tool for directories and disk images;
    • Bulk Extractor and Bulk Reviewer, which scan for credit card numbers, emails, etc.; and
    • fiwalk, to print filesystem statistics
  • Learn more about reporting functionality in the BitCurator Environment, in general, and where to learn more.

Reporting in BitCurator is essentially a method of generating technical and preservation metadata about a disk image or directory of data.

At a high level, you will be using these tools to build a workflow that pieces together:

  • a “map” of the disk image, which records relationships, integrity (checksums), names, timestamps, etc. (this is in DFXML);
  • a summary of the file types, duplicates, and other relationship information;
  • tools for assessing Personally Identifiable Information (PII) and sensitive content; and
  • summaries of sensitive content, if discovered.

Note: If you haven’t yet created a disk image and don’t otherwise have a directory of data to work with, you can use Bentley Code4Lib Samples or download sample data from BitCurator’s GitHub site and work with that: bcc-dfa-sample-data.

One possible structure to group content and metadata (the one we’ll be using for this workshop):

c4l24_bicuratorintro_group0X_image0XX/              <-- parent directory (sample name)
│
├── reports/                                        <-- subdirectory for detailed metadata (use mkdir)
│   ├── beout/                                      <-- bulk extractor reports (generated by bulk_extractor)
│   ├── brunn_output/                               <-- brunnhilde reports (generated by brunnhilde.py)
│   └── mappedfeatures/                             <-- sensitive info (generated by identify_filenames.py)
│
├── c4l24_bicuratorintro_group0X_image0XX_dfxml.xml <-- DFXML (E01 “map” generated by fiwalk)
├── c4l24_bicuratorintro_group0X_image0XX.E01       <-- disk image (generated by Guymager)
└── c4l24_bicuratorintro_group0X_image0XX.info      <-- disk image metadata (from Guymager)
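If you want to script the “use mkdir” step, here is a minimal sketch for setting up the layout shown above. The base path, sample name, and Guymager output folder are hypothetical; only reports/ is made by hand, because the tree marks beout/, brunn_output/, and mappedfeatures/ as generated by the tools themselves.

```shell
#!/usr/bin/env bash
# Sketch: create the per-sample layout before running the reporting tools.
# Assumptions: the base path, sample name, and Guymager output folder are
# hypothetical. Only reports/ is made by hand; beout/, brunn_output/, and
# mappedfeatures/ are left for the tools to generate.
set -euo pipefail

setup_sample_dir() {
  local base=$1 name=$2 src=${3:-}
  local dir="$base/$name" f
  mkdir -p "$dir/reports"
  if [ -n "$src" ]; then
    # Move the image segments (.E01, .E02, ...) and Guymager's .info file.
    for f in "$src/$name".E[0-9][0-9] "$src/$name.info"; do
      if [ -e "$f" ]; then mv "$f" "$dir/"; fi
    done
  fi
  echo "$dir"
}

# Demo with empty placeholder files standing in for Guymager output.
workdir=$(mktemp -d)
mkdir -p "$workdir/guymager_out"
touch "$workdir/guymager_out/c4l24_bicuratorintro_group01_image001.E01" \
      "$workdir/guymager_out/c4l24_bicuratorintro_group01_image001.info"
sample=$(setup_sample_dir "$workdir" c4l24_bicuratorintro_group01_image001 \
  "$workdir/guymager_out")
find "$sample" | sort
```

If Guymager’s Destination was already set to the sample directory, call `setup_sample_dir` with just the first two arguments.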

First Things First


Today we’ll be using a number of command line tools in the BCE, including:

  • fiwalk
  • brunnhilde.py
  • bulk_extractor
  • identify_filenames.py

All of these are “pre-loaded” in the BCE, and a simple way to get usage instructions for any of them is to type its name in the terminal and press Enter. For example, typing brunnhilde.py prints a short usage summary, and brunnhilde.py -h or brunnhilde.py --help prints the full help text. This is standard for CLI tools, but we hope it helps illustrate that what we’re doing today is only the “tip of the iceberg” for any of these individual tools, and for the BCE in general.

Brunnhilde Usage

Reporting


BitCurator includes a variety of tools to analyze and report on disk images and the filesystems they contain.

Map Your Image AKA How to Create DFXML (with fiwalk)

Your first goal is to create a Digital Forensics XML (DFXML) “map” of the disk image. DFXML is used to automate digital forensics processing; it includes all the filesystem data and checksums for integrity, and it explains the relationships among the elements of the disk image. We’ll do this using fiwalk, a program that processes a disk image using the SleuthKit library (a library and collection of command line tools for investigating disk images of various file systems) and outputs its results as DFXML. This map will be used later by other tools.

Tool: fiwalk

To run: Use fiwalk in the terminal.

Command syntax:

fiwalk -f -X <output filename_dfxml.xml> <input image file.E01> 

This command tells the terminal to run fiwalk, run the “file” command on each file that it finds (-f), write the results to an XML file with the specified filename (-X <output filename_dfxml.xml>), and analyze the specified source (the disk image).
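Because the DFXML records a checksum for each file, you can pull those out into a simple inventory. The sketch below is a minimal, hedged example using `awk`: it assumes each `<filename>` and `<hashdigest type='md5'>` element sits on its own line inside a `<fileobject>`, as fiwalk typically writes them, and it does not decode XML entities such as `&amp;`; a real XML parser would be more robust. The sample input is made up.

```shell
#!/usr/bin/env bash
# Sketch: turn fiwalk's DFXML into an md5sum-style manifest ("<md5>  <path>").
# Assumptions: each <filename> and <hashdigest> element is on its own line;
# XML entities are not decoded; a real XML parser would be more robust.
set -euo pipefail

dfxml_md5_manifest() {
  awk '
    /<fileobject[ >]/ { name = ""; md5 = "" }
    /<filename>/ {
      line = $0
      sub(/.*<filename>/, "", line); sub(/<\/filename>.*/, "", line)
      name = line
    }
    /<hashdigest type=.md5.>/ {
      line = $0
      sub(/.*<hashdigest[^>]*>/, "", line); sub(/<\/hashdigest>.*/, "", line)
      md5 = tolower(line)
    }
    # Directories and other entries without an MD5 are skipped.
    /<\/fileobject>/ { if (name != "" && md5 != "") print md5 "  " name }
  ' "$1"
}

# Made-up sample input shaped like fiwalk output (not from a real image).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<dfxml version='1.0'>
  <fileobject>
    <filename>DOCS</filename>
    <name_type>d</name_type>
  </fileobject>
  <fileobject>
    <filename>DOCS/EMPTY.TXT</filename>
    <filesize>0</filesize>
    <hashdigest type='md5'>d41d8cd98f00b204e9800998ecf8427e</hashdigest>
  </fileobject>
</dfxml>
EOF

dfxml_md5_manifest "$sample"
```

Saved to a file (say, a hypothetical `manifest.md5`), the output is in the format `md5sum -c` accepts, so it could be used to check the fixity of files exported from the image with the same relative paths.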

Generate File Summaries and Reports AKA How to Run brunnhilde to Report on the Disk Image

Your next goal is to create a summary of file types, duplicates, and any hard-to-identify files using Brunnhilde. Brunnhilde runs Siegfried, a signature-based file format identification tool, against a specified directory or disk image, loads the results into a sqlite3 database, and queries the database to generate reports that aid in assessment: triage, arrangement, and description of digital archives. The program also checks for viruses unless told otherwise, and can optionally run bulk_extractor against the given source.

Tool: brunnhilde

To run: Use brunnhilde in the terminal.

Command syntax:

brunnhilde.py -d -b --tsk_fstype fat --tsk_imgtype ewf <image input file.E01> <output destination/reports/brunn_output> 

This command tells the terminal to run brunnhilde, treat the input as a disk image (-d), generate a bulk extractor report (-b), analyze the disk image as a FAT filesystem (--tsk_fstype fat), and read the disk image as an Expert Witness Format file (--tsk_imgtype ewf). Finally, the command gives the location of the source disk image (<image input file.E01>) and the destination for reports (<output destination/reports/brunn_output>).

brunnhilde Output

Outputs include:

  • report.html: Includes some provenance information on the scan itself, aggregate statistics for the material as a whole (number of files, begin and end dates, number of unique vs. duplicate files, etc.), and detailed reports on content found (file formats, file format versions, MIME types, last modified dates by year, unidentified files, Siegfried warnings/errors, duplicate files, and, optionally, Social Security Numbers found by bulk_extractor).
  • csv_reports folder: Contains CSV results queried from database on file formats, file format versions, MIME types, last modified dates by year, unidentified files, Siegfried warnings and errors, and duplicate files.
  • siegfried.csv: Full CSV output from Siegfried

Identify Sensitive Information AKA How to Identify Features (with bulk_extractor)

Your next goal is to create reports that identify potentially sensitive information, like SSNs, emails, etc. To do this, we’ll use Bulk Extractor, which rapidly scans any kind of input (disk images, files, directories of files, etc) and extracts structured information such as email addresses, credit card numbers, JPEGs and JSON snippets without parsing the file system or file system structures.

Tool: bulk_extractor

To run: Use bulk_extractor in the terminal AND/OR use Bulk Reviewer.

Command syntax:

bulk_extractor -o <output destination/reports/beout> <input target disk image file.E01>  

This command tells the terminal to run the bulk_extractor tool, write its reports to the specified directory (-o <output destination/reports/beout>), and analyze the specified target file (<input target disk image file.E01>).
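bulk_extractor writes one feature file per feature type (for example, email.txt in the -o directory), and it also writes ready-made histogram files such as email_histogram.txt. To show how the feature files are structured, here is a minimal sketch that counts distinct features; it assumes the layout of tab-separated offset, feature, and context columns, with comment lines starting with “#”. The sample rows are made up.

```shell
#!/usr/bin/env bash
# Sketch: count how often each distinct feature appears in a bulk_extractor
# feature file. Assumed layout: tab-separated offset, feature, context, with
# comment lines starting with "#". The sample rows are made up.
set -euo pipefail

feature_counts() {
  # Skip comment lines, keep the feature column, then count and rank.
  awk -F'\t' '!/^#/ && NF >= 2 { print $2 }' "$1" | sort | uniq -c | sort -rn
}

sample=$(mktemp)
{
  printf '%s\n' '# comment lines like this are skipped'
  printf '%s\t%s\t%s\n' 1024 alice@example.com 'From: alice@example.com'
  printf '%s\t%s\t%s\n' 2048 bob@example.org 'To: bob@example.org'
  printf '%s\t%s\t%s\n' 4096 alice@example.com 'Cc: alice@example.com'
} > "$sample"

feature_counts "$sample"
```

On real output you would point it at a file such as <image directory>/reports/beout/email.txt; the offset column is what identify_filenames.py later maps back to files.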

bulk_extractor Output

Note: Bulk Reviewer is a GUI alternative: an Electron desktop application that scans directories and disk images for personally identifiable information (PII) and other sensitive information using bulk_extractor, and that aids in the identification, review, and removal of sensitive files. To use it, click Applications (top left) > Forensics and Reporting > bulk-reviewer. Click “Scan new directory or disk image.” Select the “Type” (“Directory” or “Image”), enter a “Name” for the report, “Browse” to the directory or disk image, select any “Options,” and then click “Start Scan.” When the scan finishes, you can view the report and save or export the results.

Bulk Reviewer Interface

The desktop application then enables users to:

  • Review features found by type and by file in a user-friendly dashboard that supports annotation and dismissing features as false positives
  • Generate CSV reports of features found
  • Export sets of files
    • Cleared: Files free of PII
    • Private: Files with PII that should be restricted or run through redaction software

Note: If the directories or disk images you’re working with do not produce any “hits,” try the “terry-work-usb-2009-12-11.E01” disk image in the sample data from BitCurator’s GitHub site, which produces a number of them, including social security numbers, phone numbers, and email addresses.

Summarize Sensitive Information Reports AKA How to Summarize Identified Features (with identify_filenames.py)

Your final goal is to summarize the reports on sensitive information, show the main types of features, and note which files contain them. To do this, we’ll use identify_filenames.py, which reads bulk_extractor output and uses the DFXML to map the hits discovered earlier to the files on the disk image (rather than to byte offsets).

Tool: identify_filenames.py

To run: Use identify_filenames.py in the terminal.

Command syntax:

identify_filenames.py --all --image_filename <input disk image.E01> --xmlfile <DFXML of the image_dfxml.xml> <bulk extractor reports location/reports/brunn_output/bulk_extractor> <destination for summary report/reports/mappedfeatures>

This command tells the terminal to run the identify_filenames.py script, look at all of the feature files (--all), use the specified source image (--image_filename <input disk image.E01>) and DFXML file (--xmlfile <DFXML of the image_dfxml.xml>), read the bulk extractor output in the given location (here, the one brunnhilde created in <image directory>/reports/brunn_output/bulk_extractor), and write the analysis to the given destination (<image directory>/reports/mappedfeatures).
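To see the whole workflow at once, here is a minimal sketch that chains the four commands for one sample directory laid out like the tree above. The commands and flags are copied from this lesson as given; `--tsk_fstype fat` only suits FAT-formatted media, and the helper names and path are hypothetical. Setting `DRY_RUN=1` prints the commands instead of running them, which is a safe way to check paths first.

```shell
#!/usr/bin/env bash
# Sketch: chain the lesson's four reporting commands for one sample directory
# laid out like the tree above (<dir>/<name>.E01, <dir>/reports/...).
# Assumptions: commands and flags are copied from this lesson; the
# "--tsk_fstype fat" value only suits FAT-formatted media. Set DRY_RUN=1 to
# print the commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

report_sample() {
  local dir=$1 name image dfxml reports
  name=$(basename "$dir")
  image="$dir/$name.E01"
  dfxml="$dir/${name}_dfxml.xml"
  reports="$dir/reports"

  run mkdir -p "$reports"
  run fiwalk -f -X "$dfxml" "$image"                    # 1. DFXML map
  run brunnhilde.py -d -b --tsk_fstype fat --tsk_imgtype ewf \
    "$image" "$reports/brunn_output"                    # 2. file summaries
  run bulk_extractor -o "$reports/beout" "$image"       # 3. features
  run identify_filenames.py --all --image_filename "$image" \
    --xmlfile "$dfxml" \
    "$reports/brunn_output/bulk_extractor" "$reports/mappedfeatures"  # 4. map hits to files
}

# Hypothetical path; prints the five commands without running anything.
DRY_RUN=1 report_sample /path/to/c4l24_bicuratorintro_group01_image001
```

Running the separate bulk_extractor step is optional here, since brunnhilde’s -b flag already produces the bulk extractor output that identify_filenames.py reads.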

So What?


What is the utility of creating all these reports? Reports create technical and preservation metadata about directories or disk images that can accompany them into the future and aid later appraisal and processing for preservation and access.

Key Points

  • Some reports may be needed for contextualizing and using the disk images in other programs (DFXML).
  • Some reports may be more for risk management and analyzing PII.
  • Some may be more for preservation planning (file types).
  • Some may be for general description (dates of creation, titles/names of files, users, or other topical information).

How you interpret any of these reports depends on the report and on what you want to get out of it. Some reports, like the bulk_extractor reports, are easier to read through. The DFXML, while “harder” to read, gives you all the checksums and a listing of what’s on a disk image, which is useful for checking fixity and also for deciding whether you want to extract files from the disk image.

Additional resources