A clear, forensic explanation of MD5, SHA-1, SHA-256, and PhotoDNA: how they’re used in CSAM investigations, when they’re reliable, and where independent expert review tests their limits.
| Question | Short Answer |
|---|---|
| What does a hash value prove? | That a file is bit-for-bit identical to another (cryptographic) or visually similar (perceptual). |
| Common cryptographic hashes? | MD5, SHA-1, SHA-256. |
| What is PhotoDNA? | A perceptual hash designed by Microsoft / Hany Farid for matching visually similar images. |
| Can SHA-256 collide? | Practically no; no practical collision is known. |
| Can MD5 collide? | Yes; known practical collisions exist. |
| Can hash matches be wrong? | Rare for cryptographic hashes; possible for perceptual hashes; reference-set errors are a separate risk. |
| Does a hash match prove knowing possession? | No; it identifies a file, not the user’s intent. |
A mathematical function that converts an input (file) into a fixed-length output (the hash value or “digest”).
128-bit cryptographic hash; fast but cryptographically broken (collisions known). Still used for de-duplication and quick comparison.
160-bit cryptographic hash; deprecated for security uses (collisions demonstrated) but still widely used in CSAM hash sets.
256-bit cryptographic hash from the SHA-2 family; no known practical collisions; the modern standard.
A perceptual hashing technology that matches images even after resizing, color shifts, or minor edits. Maintained by Microsoft and used by most major providers.
A curated collection of hash values (typically of known CSAM, contraband, or known-good files) used as reference.
When a file is uploaded to a major provider (Google, Microsoft, Meta, Apple), the provider computes a hash of the file and compares it against a hash list, often NCMEC’s and the provider’s own. A match triggers an automated CyberTipline report. Investigators later seize devices, compute hashes on each file, and compare them to the reference set again.
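The comparison step described above can be sketched as a simple set lookup. This is a minimal illustration, not any provider's or agency's actual pipeline: the function names, the in-memory `files` dictionary, and the one-entry `reference_hashes` set are all hypothetical stand-ins, and SHA-256 is assumed as the list's algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def match_against_reference(files: dict[str, bytes], reference_hashes: set[str]) -> list[str]:
    """Return the names of files whose SHA-256 appears in the reference set."""
    return sorted(
        name for name, data in files.items()
        if sha256_hex(data) in reference_hashes
    )


# Hypothetical reference set: in practice this would be loaded from a hash list.
reference_hashes = {sha256_hex(b"known-file-contents")}

# Hypothetical file contents standing in for files on a device or in an upload.
files = {
    "a.bin": b"known-file-contents",
    "b.bin": b"unrelated contents",
}

matches = match_against_reference(files, reference_hashes)
print(matches)
```

A lookup like this only says that a file's bytes equal the bytes of something already on the list. It says nothing about how the list entry was classified or how the file reached the device.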
These functions produce a wildly different output for any change in the input. A single bit flip in a file produces a completely different SHA-256. As a result, cryptographic hash matches identify exact-file duplicates with extremely high confidence. SHA-256 has no known practical collision. MD5 has known collisions, but they require deliberate engineering; random collisions are vanishingly rare in real evidence.
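The single-bit-flip behavior is easy to check with Python's standard `hashlib` module. The sketch below uses an arbitrary sample byte string (an assumption for illustration, not real evidence), flips one bit, and counts how many of the 256 digest bits differ.

```python
import hashlib

# Arbitrary sample input, purely illustrative.
original = b"example evidence file contents"

# Flip the lowest bit of the first byte to produce a one-bit-different input.
modified = bytearray(original)
modified[0] ^= 0x01
modified = bytes(modified)

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(modified).digest()

# Count the differing bits between the two 256-bit digests.
differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

print(d1.hex())
print(d2.hex())
print(f"{differing_bits} of {len(d1) * 8} digest bits differ")
```

For a well-behaved hash, roughly half of the output bits are expected to change, which is why a near-miss between two cryptographic digests carries no meaning: the files either match exactly or they do not.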
Perceptual hashes are designed to match images that look the same even after resizing, recompression, or color shifts. That flexibility is also their weakness: false positives are more plausible, particularly with adversarial inputs or low-information images. PhotoDNA’s thresholds, training, and update history are not all public, which is itself a forensic concern.
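To make the tolerance and the false-positive risk concrete, here is a minimal sketch of a difference hash (dHash), one of the simple perceptual hashes. It is not PhotoDNA, whose internals are not fully public. The sketch assumes the image has already been reduced to an 8-row by 9-column grayscale grid, and the `THRESHOLD` value is a hypothetical cutoff chosen only for illustration.

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair (1 if left > right)."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 8x9 grayscale grid (values 0..199) standing in for a downscaled image.
base = [[(r * 37 + c * 53) % 200 for c in range(9)] for r in range(8)]

# Uniform brightness shift: every pixel comparison is unchanged.
brighter = [[p + 20 for p in row] for row in base]

# Minor edit: a single pixel changed.
edited = [row[:] for row in base]
edited[0][0] = 255

# Tonal inversion: every pixel comparison is reversed.
inverted = [[255 - p for p in row] for row in base]

THRESHOLD = 5  # hypothetical match cutoff, in differing bits

for label, img in [("brighter", brighter), ("edited", edited), ("inverted", inverted)]:
    distance = hamming(dhash(base), dhash(img))
    print(label, distance, "match" if distance <= THRESHOLD else "no match")
```

The match decision depends entirely on the distance threshold. Loosening it catches more edited copies and also admits more unrelated images, which is why the threshold actually used in a given case is worth asking about.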
A defense forensic expert independently re-runs hash matching, verifies the reference set, and audits image counts before the prosecution’s numbers harden into a sentencing position.
A hash match is identification, not proof of intent. The prosecution still has to prove knowing possession or receipt.
They do not. No practical SHA-256 collision is known. The hash space is astronomically large.
No. PhotoDNA is a perceptual hash and works very differently: it matches visually similar images rather than bit-for-bit duplicates.
A cryptographic hash match tells you only that the file is bit-identical to a previously tagged file. Errors in the reference set, mis-tagging, or perceptual-hash false positives are all possible and have to be tested.
| Algorithm | Type | Output Size | Collision Status | Common Use |
|---|---|---|---|---|
| MD5 | Cryptographic | 128-bit | Practical collisions known | De-duplication, legacy hash lists |
| SHA-1 | Cryptographic | 160-bit | Collisions demonstrated | Many CSAM hash sets historically |
| SHA-256 | Cryptographic | 256-bit | No practical collision | Modern forensic standard |
| PhotoDNA | Perceptual | 144 bytes (vector) | False positives possible | Provider-side CSAM detection |
| pHash / dHash | Perceptual | Variable | False positives possible | Visual similarity matching |
Our digital forensic examiners and court-qualified expert witnesses support criminal defense attorneys nationwide on CSAM and child exploitation matters. A typical defense forensic engagement includes:
Elite Digital Forensics is an independent digital forensics firm providing computer, mobile, and cloud forensic analysis, expert witness testimony, and defense-aligned forensic review for criminal defense attorneys, civil litigators, and individuals nationwide. Our examiners include former law enforcement forensic examiners and court-qualified expert witnesses. We do not provide legal advice and do not represent clients in court; we provide the independent forensic record that counsel uses to defend the case.
Theoretically yes; practically no. No SHA-256 collision has been demonstrated. For any two given distinct files, the probability of a random collision is approximately 1 in 2^256.
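Across a whole collection rather than a single pair, the standard birthday approximation gives a rough sense of scale. The figures below assume the hash behaves like an ideal random function and use an illustrative collection of one billion distinct files; they describe accidental collisions only, not deliberately engineered ones.

```latex
P(\text{at least one collision}) \approx \frac{n(n-1)}{2^{b+1}}
\quad \text{for } n \text{ distinct files and a } b\text{-bit hash}

n = 10^{9},\ b = 256 \ (\text{SHA-256}): \quad
P \approx \frac{10^{18}}{2^{257}} \approx 2^{-197}

n = 10^{9},\ b = 128 \ (\text{MD5}): \quad
P \approx \frac{10^{18}}{2^{129}} \approx 2^{-69}
```

Even for 128-bit MD5 the accidental-collision probability is negligible at this scale. The practical concern with MD5 is engineered collisions, not random ones.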
MD5 collisions can be engineered with modest computing resources. Random collisions in real evidence are essentially nonexistent, but the broken status of MD5 is one reason modern forensic tooling reports multiple hashes.
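Reporting several hashes per file costs little, because all of them can be computed in a single pass over the data. The sketch below shows one way to do that with `hashlib`; the function name `multi_hash`, the default algorithm list, and the chunk size are illustrative choices, not a description of any particular forensic tool.

```python
import hashlib
import io


def multi_hash(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read a binary stream once and return a hex digest for each named algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while chunk := stream.read(chunk_size):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# In-memory stream standing in for an opened file, e.g. open(path, "rb").
digests = multi_hash(io.BytesIO(b"abc"))
for name, value in digests.items():
    print(f"{name}: {value}")
```

If a file matches a reference entry on MD5 but not on SHA-256, that disagreement is itself a finding worth documenting.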
Yes. Perceptual hashes are designed to be tolerant of edits, which makes them more likely to match visually similar but non-identical images. Whether this matters in a given case depends on the threshold used and whether the underlying file is available for review.
NCMEC maintains hash lists of files previously identified as apparent CSAM by analysts. Providers query against them; law enforcement uses them as a reference set. The methodology, update cadence, and false-positive review process are not all public, which is itself a forensic and policy issue.
No. It identifies a file. The prosecution still has to prove that a particular user knowingly possessed it. Attribution, file lifecycle, and user activity are separate forensic questions.
Often, yes. Duplicates, cache, thumbnails, and re-encoded copies can artificially inflate counts that drive sentencing enhancements. A defense forensic re-count is frequently meaningful.
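A first-pass re-count can be sketched by grouping files on their SHA-256 so that byte-identical duplicates collapse into one entry. The paths and contents below are hypothetical placeholders, and the function name `group_by_hash` is an illustrative choice.

```python
import hashlib
from collections import defaultdict


def group_by_hash(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file paths by the SHA-256 of their contents."""
    groups = defaultdict(list)
    for path, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)
    return dict(groups)


# Hypothetical paths and placeholder contents standing in for files on a forensic image.
files = {
    "Pictures/img1.jpg": b"image-A",
    "Backup/img1_copy.jpg": b"image-A",
    "Cache/f_0001": b"image-A",
    "Pictures/img2.jpg": b"image-B",
}

groups = group_by_hash(files)
total_files = len(files)
unique_files = len(groups)
print(f"{total_files} files, {unique_files} unique by SHA-256")
for digest, paths in groups.items():
    print(digest[:12], paths)
```

Exact-hash grouping only merges byte-identical copies. Thumbnails and re-encoded versions have different bytes and different hashes, so identifying them still requires examining file paths, metadata, and origin.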
Yes, provided the forensic image is produced in discovery. The expert computes hashes with documented tooling, compares to the reference set, and reports discrepancies.
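The independent re-check can be pictured as recomputing each hash and comparing it with the value in the prosecution's report. The sketch below is a simplified illustration under stated assumptions: files are represented as an in-memory dictionary, the report is a mapping from path to claimed SHA-256, and the function name and discrepancy labels are hypothetical.

```python
import hashlib


def verify_reported(files: dict[str, bytes], reported: dict[str, str]) -> list[tuple[str, str]]:
    """Recompute SHA-256 for each reported path and list any discrepancies."""
    discrepancies = []
    for path, claimed in sorted(reported.items()):
        data = files.get(path)
        if data is None:
            discrepancies.append((path, "missing from image"))
            continue
        actual = hashlib.sha256(data).hexdigest()
        if actual != claimed.lower():
            discrepancies.append((path, "hash mismatch"))
    return discrepancies


# Hypothetical image contents and a hypothetical report with two problems in it.
files = {"a.jpg": b"one", "b.jpg": b"two"}
reported = {
    "a.jpg": hashlib.sha256(b"one").hexdigest(),
    "b.jpg": hashlib.sha256(b"TWO").hexdigest(),  # does not match the file on the image
    "c.jpg": hashlib.sha256(b"three").hexdigest(),  # path not present on the image
}

discrepancies = verify_reported(files, reported)
for path, reason in discrepancies:
    print(path, reason)
```

Real engagements would also record the tool versions and the hash of the forensic image itself, so that the re-computation can be reproduced by another examiner.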
Many courts treat cache and thumbnail files differently from user-saved files for purposes of knowing possession. A defense expert documents the file’s lifecycle and origin so this distinction is on the record.
Confidential consultation. Work-product protected when retained through defense counsel. Federal and state cases nationwide.