What DIT bash scripts got right (and where they broke)

Before dedicated copy tools existed, DITs built their own ingest pipelines in bash. The scripts defined the spec. They also defined the failure modes.

In 2007, the RED ONE started shipping, recording to CF card media. No ingest software existed for it. The camera assistant handed the card to someone with a MacBook Pro and no job title, and the footage needed to get onto a drive, then a second drive, with proof that all copies were identical. If the copies were bad and the card had already been reformatted for the next roll, the day’s shoot was unrecoverable.

The solution was Terminal.

rsync and a prayer

rsync was the default, not because it was the best option, but because it shipped with macOS and handled recursive copies with checksums. The command was simple enough to memorise:

rsync -avP --checksum /Volumes/RED_CARD/ /Volumes/RAID_A/DAY_01/

That was the entire ingest pipeline. One line.

Verification was manual. Run md5 or shasum against both source and destination, diff the output in a second Terminal window. If the hashes matched, label the drive, case it, move on to the next card.
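In practice that meant something like the following, reusing the volume paths from the rsync example (shasum defaults to SHA-1; the exact hash tool varied by operator):

(cd /Volumes/RED_CARD      && find . -type f -exec shasum {} + | sort) > /tmp/src.txt
(cd /Volumes/RAID_A/DAY_01 && find . -type f -exec shasum {} + | sort) > /tmp/dst.txt
diff /tmp/src.txt /tmp/dst.txt    # no output means every hash matched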

This was the complete data management workflow on a significant number of productions through the late 2000s. It was slow and manual, but it was correct.

Then the scripts got longer

Scripts grew incrementally. The same five commands ran for every card, so they went into a .sh file. Then a timestamp in the log. Then a free space check before copy. Then audio notifications on completion and failure.

#!/bin/bash
# offload.sh — v11 (don't ask about v4 through v8)

SRC="$1"
DST="$2"
LOG="$DST/copy_log_$(date +%Y%m%d_%H%M%S).txt"

echo "=== OFFLOAD START ===" >> "$LOG"
echo "Source: $SRC" >> "$LOG"
echo "Destination: $DST" >> "$LOG"
echo "Started: $(date)" >> "$LOG"

# Snapshot free space on the destination before the copy starts
df -h "$DST" >> "$LOG"

# Copy, mirroring rsync's progress output into the log as it scrolls past
rsync -avP --checksum "$SRC/" "$DST/" 2>&1 | tee -a "$LOG"

echo "=== VERIFICATION ===" >> "$LOG"
# Hash both trees independently; subshells keep each cd from leaking.
# The destination find skips the log itself, which lives inside DST.
(cd "$SRC" && find . -type f -exec md5 {} \; | sort > /tmp/src_hashes.txt)
(cd "$DST" && find . -type f ! -name 'copy_log_*.txt' -exec md5 {} \; | sort > /tmp/dst_hashes.txt)

diff /tmp/src_hashes.txt /tmp/dst_hashes.txt >> "$LOG" 2>&1

if [ $? -eq 0 ]; then
    echo "VERIFIED — all checksums match" >> "$LOG"
    afplay /System/Library/Sounds/Glass.aiff
else
    echo "*** MISMATCH DETECTED ***" >> "$LOG"
    # Three alerts on purpose: one chime is easy to sleep through
    afplay /System/Library/Sounds/Sosumi.aiff
    afplay /System/Library/Sounds/Sosumi.aiff
    afplay /System/Library/Sounds/Sosumi.aiff
fi

echo "Finished: $(date)" >> "$LOG"

The triple error sound is a real pattern from production scripts. It exists because a single Glass.aiff is easy to sleep through, and a silent copy failure means lost time. Repeated alerts solved a real failure mode: the operator not noticing.

What the scripts knew

Collected together, DIT bash scripts from 2006 to 2012 form an accidental specification for on-set data management.

Every script had the same structure. Copy the files. Hash the copies. Compare the hashes. Log everything. Report the result. Independent practitioners converged on the five-phase copy pipeline without coordinating. Some scripts added RAID health checks, free space calculations, automatic folder structures keyed to camera roll numbers, or email alerts over hotel WiFi. The core was always the same five operations.
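Stripped to a skeleton, with placeholder paths and any of the period's hash tools, the converged structure fits in a dozen lines. This is a sketch, not any single production's script:

#!/bin/bash
# The five converged phases: copy, hash, compare, log, report
SRC="$1"
DST="$2"
LOG="/tmp/offload_$(date +%Y%m%d_%H%M%S).log"        # logging runs throughout

rsync -a "$SRC/" "$DST/" >> "$LOG" 2>&1              # 1. copy the files

(cd "$SRC" && find . -type f -exec md5 {} \; | sort) > /tmp/src.txt    # 2. hash both sides
(cd "$DST" && find . -type f -exec md5 {} \; | sort) > /tmp/dst.txt

if diff /tmp/src.txt /tmp/dst.txt >> "$LOG" 2>&1; then    # 3. compare
    echo "VERIFIED" | tee -a "$LOG"                       # 5. report
else
    echo "MISMATCH" | tee -a "$LOG"
fi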

The convergence is the interesting part. No spec existed. The requirements emerged from the work: take data off a temporary medium, put it somewhere safer, prove the copy is correct.

Why some DITs never stopped

Some experienced DITs still use scripts despite knowing every graphical tool on the market. They have evaluated them, tested them, recommended them to others. Then they go back to their own scripts.

The reason is auditability.

A self-written script has no ambiguity about what it does. It runs rsync with --checksum, forcing a full content comparison instead of the default size-and-modification-time shortcut. It hashes source and destination independently rather than trusting a copy tool’s exit code. The log format is explicit because the operator chose every field. The failure modes are known because the operator wrote the error handling (or didn’t, and learned from the result).

A graphical tool is a black box. When a producer asks “are you sure that copy is good?”, the answer “the app showed a green checkmark” means trusting someone else’s definition of “good.” “Checksum verified” sounds like an answer, but it hides the questions that matter: which hash, computed over what, and when. “I hashed source and destination independently with SHA-256 and diffed the output, here’s the log” is a different class of answer.

This is not about preference. It is about the difference between observable and opaque verification.

What the scripts couldn’t do

The scripts had hard limits, and those limits tracked production scale.

A single bash script handles four cards from one camera. It does not handle eight cameras shooting to two card slots each, dual-system audio on a Sound Devices recorder, and six more cards arriving from second unit. That scenario requires parallel copies with backpressure, queue management, priority ordering, progress tracking across multiple destinations, and handoff of verified media to editorial before the last card finishes copying.

All of that can be built in bash. People have. The result is a few thousand lines of shell that only the author understands, that breaks when an OS update or a Homebrew migration changes which md5 binary is on the PATH, and that has no error handling for failure modes that haven’t occurred yet.
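The usual first move toward parallelism was to fan out the existing script with background jobs. A sketch, with hypothetical card volumes:

# Fan out with background jobs, collect with wait
./offload.sh /Volumes/CARD_A /Volumes/RAID_A/DAY_01 &
./offload.sh /Volumes/CARD_B /Volumes/RAID_A/DAY_01 &
./offload.sh /Volumes/CARD_C /Volumes/RAID_B/DAY_01 &
wait    # exits 0 no matter how many of the copies failed

# All three rsyncs now interleave progress output on one terminal,
# and two logs started in the same second collide on one filename.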

The specific failure modes:

  • No concurrency model. Bash scripts run sequentially. Parallelising with background jobs and wait, as in the sketch above, creates race conditions in log output and error handling.
  • No state persistence. If the script crashes mid-copy, there is no record of which files completed. The entire offload restarts from scratch.
  • Platform fragility. md5 vs md5sum, BSD find vs GNU find, Homebrew path changes across macOS versions. Scripts written on one machine often fail on another; see the shim sketched after this list.
  • Single operator dependency. The script works for the person who wrote it. A replacement DIT on day two of a shoot cannot debug someone else’s bash.
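A typical patch for the third bullet is a small wrapper that detects which MD5 binary the machine actually has. The hash_file name here is hypothetical:

hash_file() {
    # Prefer GNU md5sum (Linux, Homebrew coreutils); fall back to BSD md5
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'
    else
        md5 -q "$1"
    fi
}

Each shim like this buys portability at the price of one more block only its author understands, which is the fourth bullet in action.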

Dedicated tools earned their place here. Not by doing something scripts couldn’t do in principle, but by handling concurrency, state recovery, and cross-platform consistency at a level that single-author shell scripts cannot sustain.

The implicit spec

DIT scripts contain an implicit product specification. Every line traces to a concrete incident or requirement.

No product manager spec’d triple-Sosumi error alerts. No UX researcher determined that the copy log should include a df -h disk space snapshot at the start. No feature request asked for copy resumption after a power failure killed the truck’s inverter. These features exist because a specific production incident made them necessary.

The useful measure for any copy tool is whether it works under production conditions: hour fourteen, six drives, the AD asking when media will be ready for editorial. Bash scripts met this test by construction. No splash screen, no onboarding flow, no update notification, no analytics ping. Instant launch, single function, clear result.

The scripts remain the clearest specification for what data management software needs to do. Not what it could do. What it needs to do.