Why You Should Redact Documents Instead of Destroying Them
How de-identification preserves institutional knowledge while keeping you POPIA-compliant.
A hospital archives 50,000 patient files over a decade. A law firm accumulates 10 years of case files. A financial services company stores thousands of loan applications. Then someone asks the question every compliance officer dreads: what do we do with all this data?
POPIA's answer seems simple. Section 14 says you must destroy or delete personal information, or de-identify it, as soon as reasonably practicable once you're no longer authorised to keep it. Most organisations hear "destroy" and reach for the shredder. The filing cabinets get emptied. The databases get purged. The problem goes away.
But so does everything else. The research potential. The training material. The institutional memory. The audit evidence. The competitive intelligence buried in years of operational data.
There's a third option most organisations overlook, and it's right there in Section 14: de-identify. POPIA treats destruction and de-identification as equally valid responses to the retention problem. One eliminates the data entirely. The other eliminates only the personal information — preserving everything else.
This post explains why that distinction matters, and why redaction is almost always the better choice.
What POPIA Actually Says About Retention
Let's start with the law itself. Section 14(1) establishes the core rule: records of personal information must not be retained any longer than is necessary for achieving the purpose for which they were collected. Once the purpose is fulfilled and no other law requires you to keep the records, the clock starts ticking.
Section 14(4) then gives you three options: destroy the record, delete it, or de-identify it. The Act doesn't prefer one over the other. All three are equally compliant responses.
This is the part most organisations miss. De-identification — properly removing personal information so the data subject can no longer be identified — takes the record outside POPIA's scope entirely. Section 1 defines "de-identify" as deleting any information that identifies the data subject, can be used to identify them, or can be linked to other information that identifies them. Once that's done, the remaining data is no longer "personal information" under the Act. You can keep it indefinitely.
The critical point: de-identification is not a compromise or a workaround. It's an explicitly authorised compliance mechanism built into the legislation itself. When POPIA's drafters wrote "de-identify" alongside "destroy" and "delete," they recognised that data has value beyond the personal information it contains.
What You Lose When You Destroy
Destruction sounds clean. Decisive. Compliant. But consider what actually disappears when you shred a decade of operational records.
Medical research loses its foundation. In May 2025, researchers at UCL and King's College London announced they were training an AI model called Foresight on de-identified NHS records from 57 million patients — the largest dataset of its kind. The model is designed to predict healthcare needs and identify early intervention opportunities that could save lives. None of this would be possible if those hospitals had destroyed their records instead of de-identifying them.
Earlier that year, in January 2025, health data company Truveta launched its Truveta Genome Project, aiming to create a database of genetic information from 10 million patients paired with their health records, all de-identified. The goal is to enable precision medicine research at a scale previously impossible. The raw material for this research exists only because hospitals preserved their data in de-identified form rather than destroying it.
South African healthcare faces the same choice. With POPIA's restrictions on processing special personal information under Section 26, medical researchers can't simply access patient records for secondary research. But properly de-identified records? Those fall outside POPIA entirely. A hospital that redacts its archived patient files creates a permanent research resource. A hospital that destroys them creates nothing.
Legal knowledge disappears. Law firms spend years building expertise through their case files — the strategies that worked, the arguments that didn't, the patterns in opposing counsel's tactics, the judicial tendencies across different courts. When those files are destroyed after matters close, the knowledge walks out the door with the senior partner who handled them. Junior associates learn from scratch instead of from experience.
Redacted case files preserve everything a firm needs for training, precedent research, and knowledge management — the legal reasoning, the procedural history, the strategic decisions — while removing the client names, witness details, and confidential terms that make the files sensitive. The institutional knowledge stays. The privacy risk doesn't.
Audit trails break. Financial services firms, accounting practices, and regulated industries need to demonstrate historical compliance. When auditors or regulators ask how you handled a particular type of transaction five years ago, "we destroyed those records" is not a satisfying answer — even if the destruction was itself POPIA-compliant.
De-identified records preserve the substance of what happened — the transaction types, the amounts, the processes followed, the outcomes — without the personal details that triggered the retention obligation. You can demonstrate your historical practices without exposing anyone's personal information.
AI and machine learning stall. South African insurers building fraud detection models need historical claims data. Banks developing credit scoring algorithms need past application records. Healthcare providers training diagnostic tools need patient histories. Every one of these use cases requires large volumes of real-world data — and every one of them can work with de-identified data. None of them can work with destroyed data.
A global market has grown up around de-identified health data, with companies set up specifically to buy de-identified patient data from hospitals and sell it to AI developers and researchers. Whatever your views on that model, the underlying principle is clear: de-identified data has enormous value. Destroyed data has none.
The False Binary: Keep Everything vs Destroy Everything
Most organisations operate at one extreme or the other. They either keep everything — accepting the POPIA compliance risk of sitting on vast stores of personal information they're no longer authorised to retain — or they destroy everything periodically, losing institutional knowledge in the name of compliance.
Both approaches fail.
The "keep everything" approach violates Section 14 and exposes the organisation to enforcement action. The Information Regulator's increasing activity — 2,374 data breaches reported in the 2024/25 financial year, a 40% year-on-year increase — means the odds of being held accountable are rising. And when breaches happen, every unnecessary record you retained becomes additional exposure. If you didn't need to keep 50,000 unredacted files and they get compromised, the Regulator will rightly ask why they still existed.
The "destroy everything" approach is compliant but self-destructive. You eliminate the privacy risk, but you also eliminate the data's utility. And here's the thing POPIA recognises: that utility often serves the public interest. Medical research improves healthcare outcomes. Legal knowledge bases improve access to justice. Financial pattern analysis improves fraud detection. POPIA's drafters understood this. That's why de-identification is right there in Section 14, as an equal alternative to destruction.
Redaction is the middle ground. It eliminates the personal information that creates the compliance obligation while preserving the substantive content that creates value. It's the only approach that serves both privacy and utility.
Where Redacted Data Creates Value
Here are concrete scenarios where redacted data — data that would otherwise be destroyed — continues to serve the organisation and the public.
Training and onboarding. New employees learn faster from real-world examples than from hypothetical scenarios. A redacted medical case file teaches a junior doctor more than a textbook description. A redacted contract dispute teaches a trainee attorney more than a moot court exercise. A redacted fraud investigation teaches a new analyst more than a PowerPoint presentation. The realism matters. The personal information doesn't.
Quality assurance and process improvement. When a company wants to review how it handled a particular type of matter — medical cases, legal disputes, insurance claims, HR investigations — it needs access to those historical files. Redacted versions allow quality reviews, process audits, and outcome analysis without the privacy implications of accessing unredacted originals. You can identify patterns, spot errors, and improve procedures using the same data that drove the original outcomes.
Benchmarking and reporting. Industry bodies, regulators, and internal leadership teams need aggregate data to understand trends. How many claims of a particular type were filed? What were the average resolution times? Which procedures had the highest complication rates? All of these questions can be answered from redacted data. None of them require knowing whose claim, whose resolution, or whose procedure it was.
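As a rough illustration of the point above, the sketch below answers two of those benchmarking questions from records that carry no names, ID numbers, or contact details. The record layout and field names (claim_type, days_to_resolve) are assumptions made for the example, not a prescribed schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical redacted claim records: only the substantive fields remain.
redacted_claims = [
    {"claim_type": "motor", "days_to_resolve": 12},
    {"claim_type": "motor", "days_to_resolve": 18},
    {"claim_type": "household", "days_to_resolve": 30},
]


def summarise(claims):
    """Return the claim count and average resolution time per claim type."""
    days_by_type = defaultdict(list)
    for claim in claims:
        days_by_type[claim["claim_type"]].append(claim["days_to_resolve"])
    return {
        claim_type: {"count": len(days), "avg_days": mean(days)}
        for claim_type, days in days_by_type.items()
    }


summary = summarise(redacted_claims)
```

One caution when reporting aggregates like these: a category containing only one or two records can still point to an individual, so very small groups are usually merged or suppressed before publication.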
Academic and policy research. Universities, think tanks, and government bodies regularly seek access to private-sector data for research purposes. Providing unredacted data raises POPIA concerns around purpose limitation and consent. Providing redacted data raises none — because de-identified data falls outside the Act. Organisations that maintain redacted archives can contribute to research without the compliance overhead of negotiating data-sharing agreements that satisfy Section 15's further processing limitations.
Regulatory defence. If the Information Regulator investigates your data handling practices, redacted historical records demonstrate your processes without exposing current personal information. You can show the Regulator how you handled data, what procedures you followed, and what controls were in place — all using records that no longer contain the personal information that would otherwise require fresh processing justification.
How to Do It Right
Not all de-identification is equal. POPIA requires that personal information be de-identified in a way that prevents the data subject from being identified — directly or indirectly, by the organisation or by anyone else. This means visual-only redaction (drawing black boxes over text in a PDF viewer) doesn't qualify. The underlying data must actually be removed.
This is why proper redaction tooling matters. When SureDox redacts a document, it doesn't just cover the text — it removes the underlying content from the PDF. The redacted output passes a verification check confirming that no personal information survives in the file's text layer, metadata, or hidden content. A full audit log records exactly what was detected, what was redacted, and what verification was performed — the kind of documentation the Information Regulator expects when you claim data has been de-identified.
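To make the idea of a verification check concrete, here is a minimal sketch of a residual scan. It assumes the text layer and metadata of the redacted file have already been extracted to a plain string by a separate tool, and it looks for only three identifier shapes. The patterns and function names are illustrative assumptions, not a description of how SureDox or any other product performs verification, and a clean result from a scan this narrow would not by itself establish de-identification.

```python
import re

# Illustrative patterns only; a real check would cover many more identifier types.
PATTERNS = {
    # Any standalone run of exactly 13 digits (the length of a South African ID number).
    "sa_id_number": re.compile(r"(?<!\d)\d{13}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Ten-digit local numbers starting with 0, or the same number written with +27.
    "phone": re.compile(r"(?<!\d)(?:\+27|0)\d{9}(?!\d)"),
}


def find_residual_identifiers(text):
    """Return {pattern_name: [matches]} for every pattern that still matches."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def is_clean(text):
    """True when none of the illustrative patterns match the extracted text."""
    return not find_residual_identifiers(text)
```

Running find_residual_identifiers on the text extracted from a visually masked PDF is a quick way to see the problem described above: the black box hides the words on screen, but the scan still finds them in the text layer.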
The audit log also serves as your Section 14 compliance evidence. It demonstrates that you identified the personal information in the record, applied de-identification in a manner that prevents reconstruction (the standard Section 14(5) sets for destruction and deletion), and retained only the de-identified version. If the Regulator asks why you still have records from five years ago, the answer is simple: they've been de-identified and no longer contain personal information.
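What such an audit entry might contain can be sketched in a few lines. The field names below are hypothetical and are not taken from any product's actual log format. The one design choice worth noting is that the entry records the types and counts of what was detected, never the detected values themselves, so the log does not become a new store of personal information.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_entry(redacted_bytes, detected_counts, verification_passed):
    """Build one audit-log entry (hypothetical field names)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        # The hash ties the entry to the exact redacted file that was retained.
        "redacted_sha256": hashlib.sha256(redacted_bytes).hexdigest(),
        # Counts per identifier type only; the identifiers themselves are not logged.
        "detected": detected_counts,
        "verification_passed": verification_passed,
    }


entry = build_audit_entry(
    b"redacted file contents",
    {"email": 2, "sa_id_number": 1},
    True,
)
log_line = json.dumps(entry, sort_keys=True)
```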
A Practical Framework
If your organisation is approaching a data retention review — and under POPIA, you should be — consider this framework before defaulting to destruction:
First, identify the records you're no longer authorised to retain in their original form. These are the records where the original collection purpose has been fulfilled and no other law (the Companies Act's seven-year requirement, the Basic Conditions of Employment Act's three-year post-termination requirement, or sector-specific regulations) requires you to keep them.
Second, for each category, ask: does this data have secondary value? Would it be useful for training, research, quality assurance, benchmarking, regulatory defence, or institutional knowledge? If yes, redact rather than destroy.
Third, apply proper redaction using tools that perform true content removal — not visual masking — and that produce verification evidence. Retain the redacted versions and the audit logs. Securely destroy the unredacted originals.
Fourth, document the process. Your data retention policy should explicitly address de-identification as a retention mechanism, referencing Section 14(4) and your organisation's criteria for when redaction is preferred over destruction.
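The first two steps of the framework amount to a small decision rule, sketched below. The class, field names, and category examples are hypothetical, and the inputs (whether the purpose is fulfilled, when any statutory retention period ends, whether the data has secondary value) are judgments your organisation would need to make for each category before any code could apply them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RecordCategory:
    name: str
    purpose_fulfilled: bool
    legal_retention_until: Optional[date]  # None when no other law requires retention
    has_secondary_value: bool


def retention_action(category, today):
    """Return 'retain', 'redact', or 'destroy' for one record category."""
    # Step one: are we still authorised to keep the original?
    if not category.purpose_fulfilled:
        return "retain"
    if category.legal_retention_until is not None and today <= category.legal_retention_until:
        return "retain"
    # Step two: if the data has secondary value, redact rather than destroy.
    if category.has_secondary_value:
        return "redact"
    return "destroy"


today = date(2025, 6, 1)
closed_cases = RecordCategory("closed case files", True, date(2020, 1, 1), True)
old_leads = RecordCategory("stale marketing leads", True, None, False)
actions = {c.name: retention_action(c, today) for c in (closed_cases, old_leads)}
```

Here the closed case files come out as "redact" and the stale marketing leads as "destroy". Writing the rule down this way also gives you the criteria your retention policy needs to reference under the fourth step.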
The Bottom Line
POPIA doesn't force you to choose between compliance and institutional knowledge. The Act itself provides a mechanism — de-identification — that satisfies the retention obligation while preserving the data's substantive value.
Every record you destroy is knowledge your organisation can never recover. Every record you redact properly is knowledge your organisation keeps indefinitely, without the personal information that made it a liability.
The question isn't whether you can afford to redact your archives. It's whether you can afford not to.
SureDox is built by Boone and Boo (Pty) Ltd, operating as a POPIA Operator that processes personal information on behalf of clients. For questions about our compliance approach, see our Privacy Policy and Terms of Service.
References & Further Reading
- Section 14 — Retention and Restriction of Records — POPIA's data retention requirements, including the de-identification option.
- Section 1 — Definitions — POPIA's definition of "de-identify" and "personal information."
- Section 26 — Special Personal Information — Restrictions on processing health, genetic, and other sensitive data.
- Section 15 — Further Processing Limitation — Restrictions on using personal information for secondary purposes.
- UCL: AI Model Trained on De-Identified Data from 57 Million People — The Foresight project using de-identified NHS records (May 2025).
- STAT News: The Companies Paying Hospitals for Health Data to Train AI — The de-identified health data market and Truveta Genome Project (January 2025).
- ITWeb: InfoReg Exposes POPIA Violators as Data Breaches Mount — 2,374 breaches reported in 2024/25, with 40% year-on-year increase.
- Financial Institutions Legal Snapshot: Retention of Records in South Africa — Summary of retention requirements across South African legislation.
- Mondaq: The Destruction or Deletion of Personal Information in the POPIA Era — Practical guidance on POPIA-compliant data destruction and de-identification.
- Glacier Insights: POPIA Explained — Part 11 (Retention of Records) — Detailed walkthrough of Section 14 requirements.
- SAICA: Guide on the Retention of Records (June 2023) — Comprehensive retention period reference across South African legislation.