What actually happens when you press 'copy'
Between the first byte read and the final checksum, five distinct phases run. Skipping any of them is how data gets lost.
You drag a folder onto a drive. A progress bar fills up. The OS tells you the copy is complete. You eject, label, and ship.
That workflow is fine for moving vacation photos. It is not fine for moving camera originals that cannot be recreated. The difference is not about the files themselves. It is about what “complete” means.
A robust copy pipeline has five phases. Each one exists because a specific, documented failure mode demanded it. When you understand what each phase does and what breaks without it, you stop trusting progress bars and start trusting evidence.
Phase 1: Enumerate
Before a single byte moves, the pipeline walks the source and builds a manifest of every file and directory it intends to copy. File paths, sizes, timestamps, permissions. The full tree, recorded before anything changes.
This sounds trivial. It is not.
The failure it prevents: partial copies that look complete. If you start copying files as you discover them, you have no way to know when you are done. A camera card with 400 clips and a nested folder structure can be partially traversed if the card is pulled early, if a directory is locked by another process, or if a filesystem error makes a subdirectory invisible. Without a complete enumeration up front, you cannot compare what you copied against what existed. You land with 398 files, the copy tool reports success, and two clips are silently missing.
Enumeration also catches problems before the slow part starts. A destination with insufficient free space, a source with unreadable directories, filename conflicts on the target volume. All of these are cheaper to discover before you start moving terabytes.
The output of this phase is a source manifest: a list of everything that should exist on the other side when the job is done.
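A minimal Python sketch of what this phase produces. The names (ManifestEntry, enumerate_source, check_capacity) are illustrative, not from any particular tool; the key details are that directory errors fail loudly instead of being swallowed, and that cheap checks like free space run before any bytes move.

```python
import os
import shutil
from dataclasses import dataclass

@dataclass
class ManifestEntry:
    path: str    # path relative to the source root
    size: int    # bytes
    mtime: float # modification time, epoch seconds

def _raise(err: OSError) -> None:
    # os.walk swallows directory errors by default; re-raise so an
    # unreadable or locked subdirectory fails the job instead of
    # silently disappearing from the manifest
    raise err

def enumerate_source(root: str) -> list[ManifestEntry]:
    """Record every file before a single byte moves."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=_raise):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries.append(
                ManifestEntry(os.path.relpath(full, root), st.st_size, st.st_mtime)
            )
    return entries

def check_capacity(entries: list[ManifestEntry], dest_root: str) -> None:
    """Fail before the slow part starts if the destination is too small."""
    needed = sum(e.size for e in entries)
    free = shutil.disk_usage(dest_root).free
    if needed > free:
        raise RuntimeError(f"need {needed} bytes free on {dest_root}, have {free}")
```

When the job finishes, the copied files can be compared against this list: 398 files on the destination against 400 in the manifest is an error, not a success.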
Phase 2: Hash
With the source manifest in hand, the pipeline reads every source file and computes a hash: xxHash when speed matters, MD5 or SHA-256 when a cryptographic digest is required, or all three depending on the requirements. The hash is a fingerprint of the file’s exact contents. Change one bit and the hash changes completely.
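Computing the hash is a full sequential read in fixed-size chunks, so memory use stays flat no matter how large the clip is. A sketch using Python's standard hashlib (xxHash needs a third-party package, so MD5/SHA-256 stand in here; the chunk size is an illustrative choice):

```python
import hashlib

def hash_file(path: str, algorithm: str = "sha256",
              chunk_size: int = 4 * 1024 * 1024) -> str:
    """Read the whole file sequentially and return its hex digest.
    Large chunks keep the source read fast and sequential; constant
    memory regardless of file size."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```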
The failure it prevents: copying corruption faithfully. If you only hash after copying, you can prove the destination matches what you sent. But you cannot prove the source was healthy when you read it. Camera firmware bugs, failing flash controllers, and degrading media all produce files that are internally corrupt but still readable. A source hash taken before the copy gives you a reference point. If the source file is already damaged, you have evidence of when the damage existed. Without it, you discover the corruption downstream and have no way to determine whether it happened during the copy or was present on the card.
Source hashing is also the foundation of deduplication. On multi-day shoots, the same LUT files, sidecar XMLs, and camera metadata appear on every card. Hashing the source lets the pipeline recognise duplicates without byte-comparing files across volumes.
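Once every source file has a hash, duplicate detection is a grouping problem rather than a comparison problem. A sketch (the input is a hypothetical path-to-digest mapping produced by the hashing phase):

```python
def find_duplicates(hashes: dict[str, str]) -> dict[str, list[str]]:
    """Group file paths by content hash. Any group with more than one
    path is a set of byte-identical duplicates, found without a single
    cross-volume byte comparison."""
    by_hash: dict[str, list[str]] = {}
    for path, digest in hashes.items():
        by_hash.setdefault(digest, []).append(path)
    return {d: paths for d, paths in by_hash.items() if len(paths) > 1}
```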
This phase is the slowest part of the pipeline for large jobs because it requires a full sequential read of every source byte. It is also the most valuable. The source hash is the single most important piece of evidence in the entire chain.
Phase 3: Write
Now the bytes move. The pipeline reads each source file and writes it to the destination. This is the phase that your operating system’s copy dialog shows you. It is also the phase that most people confuse with the entire process.
The failure it prevents: nothing, on its own. The write phase by itself only guarantees that your software asked the operating system to place bytes on the destination. It does not guarantee those bytes arrived correctly. It does not guarantee every file was included. It does not guarantee the destination is readable.
This is not a criticism. Moving data is necessary. But the write phase is just transportation. Treating it as the whole pipeline is like treating the flight as the whole logistics chain: you still need to verify the cargo manifest at the other end.
Modern pipelines optimise the write phase aggressively. Parallel writes to multiple destinations from a single source read. Adaptive buffer sizing based on drive throughput. Write ordering that keeps the source drive’s read head sequential even when multiple destinations have different performance characteristics. All of this matters for speed. None of it matters for correctness without the phases on either side.
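The single-read, multiple-write pattern can be sketched in a few lines. This is a simplified serial version (real tools overlap the writes with worker threads or async I/O; fan_out_copy is an illustrative name):

```python
def fan_out_copy(src: str, destinations: list[str],
                 chunk_size: int = 4 * 1024 * 1024) -> None:
    """Read the source once, sequentially, and write each chunk to every
    destination. The source drive sees a single sequential read even
    when several destinations are being fed from it."""
    outs = [open(d, "wb") for d in destinations]
    try:
        with open(src, "rb") as fin:
            while chunk := fin.read(chunk_size):
                for out in outs:
                    out.write(chunk)
    finally:
        for out in outs:
            out.close()
```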
One detail that trips people up: operating systems use write caches. When your OS reports that a file write is complete, it often means the bytes have been accepted into a memory buffer, not that they have been committed to the physical media. Ejecting a drive immediately after a copy finishes can interrupt the cache flush. The OS said “done” while bytes were still in flight. This is not a theoretical risk. It is one of the most common causes of data loss on set.
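A careful copy forces the cache flush itself rather than trusting the eject to do it. In Python that means an explicit flush plus os.fsync before reporting the file done (a sketch; fsync semantics vary slightly by OS and filesystem, and some setups also fsync the containing directory):

```python
import os

def copy_with_flush(src: str, dst: str,
                    chunk_size: int = 4 * 1024 * 1024) -> None:
    """Copy src to dst and force the write cache to the physical media
    before returning, so "done" means committed, not merely buffered."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            fout.write(chunk)
        fout.flush()             # push Python's userspace buffer to the OS
        os.fsync(fout.fileno())  # ask the OS to commit its cache to the drive
```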
Phase 4: Verify
The pipeline reads every file back from the destination and hashes it. Then it compares the destination hash against the source hash from Phase 2. A match means the bytes on the destination are identical to the bytes that were on the source at the time of hashing. A mismatch means something changed during transit.
The failure it prevents: silent write errors. Hard drives, SSDs, USB controllers, Thunderbolt cables, bus adapters, and RAID controllers can all introduce errors during writes. These errors are rare per-byte, but when you are writing hundreds of gigabytes, “rare per-byte” becomes “routine per-job.” A single bit flip in a compressed video frame can destroy the frame or cause a decoder crash. A single bit flip in a metadata header can make an entire file unreadable.
Verification catches all of these. It is the only phase that proves the destination is healthy. Not “probably fine,” not “no errors were reported.” Actually, independently verified.
There is a subtle distinction here that matters. Some tools hash during the write: they compute a checksum from the bytes as they flow through memory on their way to the destination. This is better than nothing. But it proves what your software sent, not what the drive received. A true verification requires an independent read-back from the physical media. The drive has to seek back to the beginning, read the file again, and produce a hash that matches. That second read is the evidence.
The cost is time. Verification roughly doubles the total duration of a copy job because it requires reading the entire dataset from the destination. On a tight schedule, this is the phase people are tempted to skip. It is also the phase whose absence causes the most damage.
Phase 5: Manifest
With source hashes, destination hashes, and verification results in hand, the pipeline writes a manifest. This is a structured record of what was copied, when, from where, to where, and whether each file passed verification. Formats vary. ASC MHL is the film industry standard. Some pipelines write CSV logs, XML reports, or database entries.
The failure it prevents: unverifiable claims. Without a manifest, you are asking people to trust your word that the copy was verified. Three months from now, when a file is missing or a frame is corrupt, nobody will remember whether this particular drive was verified at copy time. The manifest is the receipt. It is timestamped, it contains the hashes, and it can be independently checked against the files on the drive.
Manifests also enable the concept of a chain of custody. When a drive ships from set to a post facility, the facility can verify the manifest against the files they received. If everything matches, they know the drive was not altered in transit. If something mismatches, they know exactly which files are suspect. Without a manifest, the only option is to call the DIT and ask “did you check everything?” That is not a workflow. That is hoping.
A good manifest also records what was not copied. Files that were skipped, directories that failed enumeration, and hashes that did not match on the first attempt. This negative evidence is just as important as the positive. Knowing that a job completed with zero errors is different from not knowing whether there were errors.
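ASC MHL is the standard for this record; as a minimal stand-in, here is a CSV sketch that captures the essentials, including rows for files that failed (field names and status values are illustrative, not the MHL schema):

```python
import csv
import datetime

def write_manifest(path: str, records: list[dict]) -> None:
    """Write a minimal CSV manifest: one row per file, failures included.
    Each record is a dict with "path", "size", "hash", and "status"
    (e.g. "verified", "mismatch", or "skipped")."""
    fieldnames = ["path", "size", "hash", "status", "verified_at"]
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for rec in records:
            # every row carries the timestamp so the receipt is datable
            writer.writerow({**rec, "verified_at": stamp})
```

A receiving facility can re-hash the files on the drive and diff against this file; any row whose hash no longer matches pinpoints exactly which files are suspect.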
Why your OS skips three of these
When you drag a folder in Finder or Explorer, here is what actually runs: enumerate (partially, as it discovers files), write. That is it. No source hash. No read-back verification. No manifest.
The OS copies files correctly almost all of the time. For documents, downloads, and application data, that is fine. The failure rate is low enough that it does not justify the time cost of hashing and verifying every file.
Camera originals are different. They cannot be re-created. They represent days of work by dozens of people. And they are large enough that “rare per-byte” errors become statistically likely over the course of a job. The risk profile is completely different, and it demands a process that matches.
The cost of skipping
Each phase has a time cost. Enumeration is fast, usually seconds. Source hashing is slow, proportional to the data volume. Writing is slow, limited by the slowest drive in the chain. Verification is slow, another full read pass. Manifest generation is fast, usually seconds.
A full five-phase pipeline takes roughly twice as long as a simple copy. On a 2TB card, that might mean 40 minutes instead of 20. That feels expensive in the moment, sitting in a truck at midnight waiting for a progress bar.
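The arithmetic behind those estimates is straightforward: three passes over the data, two reads and one write. A back-of-envelope sketch (throughput figures passed in are illustrative assumptions, not measurements; pipelines that fold the source hash into the same read that feeds the write shave off one pass, which is how "roughly twice as long" is reached rather than three times):

```python
def estimate_job_minutes(data_gb: float, read_gbps: float,
                         write_gbps: float) -> dict[str, float]:
    """Rough per-phase duration in minutes. Enumeration and manifest
    generation are treated as negligible (seconds, per the text)."""
    source_hash = data_gb / read_gbps / 60   # full sequential read of the source
    write = data_gb / write_gbps / 60        # limited by the slowest drive
    verify = data_gb / read_gbps / 60        # full read-back of the destination
    return {"hash": source_hash, "write": write, "verify": verify,
            "total": source_hash + write + verify}
```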
But the alternative is not “20 minutes and it works.” The alternative is “20 minutes and you hope it works.” The next time someone discovers it did not work, the cost is a reshoot, a missed delivery, or footage that simply no longer exists.
The five phases exist because people learned this the hard way. Every phase was added in response to a real loss, often discovered by DITs writing bash scripts at 3am. Enumeration was added after partial copies were shipped as complete. Source hashing was added after corrupt originals were faithfully duplicated to three drives. Verification was added after silent write errors destroyed footage that tested fine at the source. Manifests were added after facilities received drives with no evidence and no recourse.
“Copy complete” is not a statement about your data. It is a statement about one phase of a five-phase process. The question is not whether the copy finished. The question is whether you have evidence that it worked.