ISO 19005-4:2020 (Document Management - Electronic document file format for long-term preservation)

What is ISO 19005-4:2020?

ISO 19005-4:2020 is the fourth part of the ISO 19005 series, which defines the use of the Portable Document Format (PDF) for the long-term preservation of electronic documents. Commonly known as PDF/A-4, it is an international standard that specifies a profile of PDF (based on ISO 32000-2, or PDF 2.0) designed to ensure that a document can be reproduced with the exact same appearance years or decades into the future, independent of the tools, systems, and software used to create it.

The Core Philosophy: “Preservation by Output”

PDF/A’s fundamental principle is to encapsulate all information necessary for faithful visual reproduction within the file itself. This eliminates dependencies on external resources like fonts, color profiles, or multimedia content that may become unavailable or obsolete.

Key Conceptual Pillars of PDF/A-4:

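  • Self-Containment: All fonts must be embedded, color information must be device-independent, and rendering must not depend on external resources (such as linked fonts, images, or multimedia streams) that might later disappear. Ordinary hyperlinks are allowed, because the page renders the same whether or not the link target still exists.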
  • Self-Documentation: The file includes metadata (using the XMP standard) that describes the document, its origin, and its technical characteristics.
  • Device Independence: The document’s appearance should not rely on specific hardware, operating systems, or application software.
  • Unambiguous Rendering: The standard prohibits features that could make rendering unpredictable or block access to the content, such as JavaScript, encryption, and LZW compression (originally excluded because of patent concerns).

Major Evolution from PDF/A-3: Why PDF/A-4 Matters

PDF/A-4 is a significant leap forward, designed for the modern digital world:

  • Based on PDF 2.0: It leverages the modern, more robust, and feature-rich foundation of ISO 32000-2.
  • Simplified Conformance: It drops the “a,” “b,” and “u” conformance levels used in earlier parts. PDF/A-4 defines a single base conformance plus two specialized profiles: PDF/A-4f, for files that carry embedded files, and PDF/A-4e, for engineering documents with 3D and similar content. Logical structure (tags) for accessibility is no longer a separate conformance level; it is treated as a recommendation that an organization can make mandatory through its own policy.
  • Embodiment of “All-Information” Principle: The PDF/A-4f profile explicitly allows embedding files of any other format (e.g., source CAD files, original spreadsheets, XML data) within the PDF container. This acknowledges that sometimes preserving the look is not enough; you must also preserve the source data. This is a formalization and refinement of the approach introduced in PDF/A-3.
  • Support for Modern Content: It supports transparency effects, layers (Optional Content Groups), and device-independent color spaces such as Lab and ICC-based color. Earlier parts of the standard already allowed these in some form; PDF/A-4 carries them forward on a PDF 2.0 foundation, making it suitable for preserving complex digital artworks and technical drawings.

Roadmap: The Strategic Path to PDF/A-4 Adoption

This roadmap is designed for memory institutions (archives, libraries), government bodies, and corporations with legal or regulatory data retention needs.

Phase 1: Discovery & Scoping (Months 1-2)

  • Objective: Understand the scope of your preservation challenge and how PDF/A-4 addresses it.
  • Activities:
    • Document Inventory: Catalog the types of documents you need to preserve (e.g., scanned paper, born-digital reports, technical drawings, office documents).
    • Stakeholder Engagement: Involve records managers, IT, legal/compliance, and archivists.
    • Regulatory Alignment: Identify any legal or internal policies that mandate specific preservation formats or durations.
    • Training: Educate key personnel on the benefits of PDF/A-4 over previous versions and other formats.

Phase 2: Requirements & Gap Analysis (Months 2-4)

  • Objective: Define your specific conformance requirements and assess your current capabilities.
  • Activities:
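    • Conformance Decision: Decide which PDF/A-4 profile applies (base PDF/A-4, PDF/A-4f for files with embedded attachments, or PDF/A-4e for engineering content) and whether tagged, accessible output is required. For public sector or public-facing documents, tagging is often mandatory under accessibility rules.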
    • Tool Assessment: Audit your current software (scanning, office suites, PDF creators) for their ability to generate valid PDF/A-4 files. Many tools claim “PDF/A” support but may not be fully compliant.
    • Workflow Analysis: Map your current document creation and ingestion processes to identify where PDF/A-4 conversion and validation should occur.

Phase 3: Policy & Planning (Month 4)

  • Objective: Formalize your strategy into a concrete policy and project plan.
  • Activities:
    • Develop a File Format Policy: Officially designate PDF/A-4 as the required format for long-term preservation of certain document classes.
    • Create an Implementation Plan: Define timelines, resource allocation (budget, personnel), and success metrics.
    • Vendor Selection: If needed, select and procure PDF/A-4 compliant creation, validation, and management tools.

Phase 4: Implementation & Workflow Integration (Months 5-12)

  • Objective: Integrate PDF/A-4 creation and validation into your document lifecycle.
  • Activities:
    • Configure Scanners & Software: Set up scanning software to output directly to valid PDF/A-4.
    • Implement Conversion Tools: Deploy batch conversion tools for legacy documents and born-digital files (e.g., converting MS Word to PDF/A-4 upon record declaration).
    • Integrate Validation: Embed automated PDF/A-4 validation checks into your document ingestion workflow in your Electronic Document and Records Management System (EDRMS) or Digital Preservation System.

Phase 5: Validation & Quality Assurance (Ongoing)

  • Objective: Ensure all preserved files are valid PDF/A-4.
  • Activities:
    • Use a Reference Validator: Implement a tool such as veraPDF (an industry-supported, open-source validator) to check every file upon ingestion.
    • Sampling & Auditing: Periodically sample and re-validate files in the repository to ensure their integrity over time.
    • Address Failures: Establish a process for correcting or quarantining non-compliant files.
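The ingestion check and the quarantine step above can be sketched as a small gate. This is a minimal illustration rather than a prescribed implementation: the function name `ingest_gate`, the folder layout, and the `is_valid` callback are all hypothetical. The callback stands in for whatever validator the repository actually uses (for example, a wrapper around veraPDF).

```python
import shutil
from pathlib import Path
from typing import Callable


def ingest_gate(
    incoming: Path,
    accepted: Path,
    quarantine: Path,
    is_valid: Callable[[Path], bool],
) -> dict[str, list[str]]:
    """Move each PDF in `incoming` to `accepted` or `quarantine`.

    `is_valid` is a hypothetical callback that returns True only when
    the external validator reports the file as compliant.
    """
    accepted.mkdir(parents=True, exist_ok=True)
    quarantine.mkdir(parents=True, exist_ok=True)
    report: dict[str, list[str]] = {"accepted": [], "quarantined": []}

    # sorted() materializes the listing before any file is moved.
    for pdf in sorted(incoming.glob("*.pdf")):
        try:
            ok = is_valid(pdf)
        except Exception:
            # A validator crash counts as a failure, never as a pass.
            ok = False
        target = accepted if ok else quarantine
        shutil.move(str(pdf), str(target / pdf.name))
        report["accepted" if ok else "quarantined"].append(pdf.name)
    return report
```

Keeping the validator behind a callback means the gate itself does not depend on any particular tool, so the validator can be swapped or upgraded without touching the workflow logic.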

Phase 6: Long-Term Preservation & Review (Ongoing)

  • Objective: Maintain the accessibility and integrity of the PDF/A-4 collection.
  • Activities:
    • Bit Preservation: Ensure robust storage, backups, and data integrity checks.
    • Monitor the Standard: Watch for new versions of the PDF/A standard or the emergence of new preservation formats.
    • Format Migration Planning: Plan for a future where PDF/A itself may need to be migrated to a new format, though the long-term stability of PDF makes this a distant concern.
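The data integrity checks mentioned under Bit Preservation are commonly done by recording a checksum for each file and re-computing it later. The sketch below assumes SHA-256 and an in-memory manifest keyed by relative path; the function names `build_manifest` and `audit` are illustrative, and a real repository would persist the manifest separately from the files it protects.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each PDF under `root` (by relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*.pdf"))
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare current files against a previously stored manifest."""
    missing: list[str] = []
    changed: list[str] = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            missing.append(rel)
        elif sha256_of(path) != expected:
            changed.append(rel)
    return {"missing": missing, "changed": changed}
```

Running `audit` on a schedule, or on a random sample of the manifest, gives an early signal of silent corruption or accidental deletion.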

Process: The Technical Workflow for Creating a Valid PDF/A-4 File

This process outlines the steps from a source document to a preserved PDF/A-4 file.

  1. Source Document Preparation:
  • For Born-Digital Files (e.g., MS Word, Excel):
    • Ensure the document uses standard, embeddable fonts.
    • Apply semantic styles (Heading 1, Heading 2, etc.) to enable high-quality tagging.
    • Add alternative text to all images for accessibility (critical for tagged, accessible output).
  • For Scanned Paper:
    • Use high-quality, uncompressed TIFF scans as the source.
    • Perform Optical Character Recognition (OCR) to create a text layer. Where accessibility is required, this text layer should also be tagged.
  2. PDF/A-4 Generation:
  • Conversion: Use a compliant PDF generator (e.g., a dedicated PDF/A driver, Adobe Acrobat PDFMaker, or a server-based tool such as Ghostscript, provided the installed version actually supports PDF/A-4 output).
  • Key Technical Actions Performed by the Generator:
    • Embeds all referenced fonts.
    • Makes color device-independent, for example by converting to ICC-based color spaces or by declaring an output intent with an embedded ICC profile (such as sRGB).
    • Where tagging is required: Creates a robust logical structure tree (tags) based on the document’s semantics.
    • Removes or flattens prohibited content, such as JavaScript, encryption, and multimedia not allowed by the chosen profile.
    • Adds XMP Metadata: Populates the standard XMP metadata fields with information about the document.
  3. (Optional) Embedding of Source Files:
  • If following the “all-information” principle, the original source files (e.g., the .docx or .dwg file) can be attached and embedded within the PDF container, which calls for the PDF/A-4f profile. Each attachment should carry a file specification that describes it, including its relationship to the document.
  4. Validation:
  • The newly created PDF/A-4 file is run through a validator like veraPDF.
  • The validator checks the file against a PDF/A-4 validation profile (a machine-readable ruleset) to ensure:
    • All required metadata is present.
    • No prohibited features are used.
    • All fonts are properly embedded and described.
    • The file structure is sound and conforms to PDF 2.0.
  5. Ingestion into the Preservation System:
  • The validated PDF/A-4 file, along with its checksum and technical metadata extracted during validation, is ingested into the digital repository.
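Because PDF/A-4 builds on PDF 2.0, a cheap pre-check before full validation is to read the version from the file header. The sketch below assumes the header appears at byte zero in the form `%PDF-2.n`; the function names are hypothetical, and passing this check says nothing about PDF/A-4 compliance. It only filters out files that a validator would reject anyway.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches a header such as b"%PDF-2.0" at the very start of the file.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(path: Path) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the PDF header, or None if absent."""
    with path.open("rb") as handle:
        head = handle.read(16)
    match = _HEADER.match(head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_pdf2_candidate(path: Path) -> bool:
    """True when the header declares PDF 2.x; not a compliance check."""
    version = pdf_header_version(path)
    return version is not None and version[0] == 2
```

Files that fail this check can be routed straight to conversion or quarantine without spending validator time on them.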

Methodology: A Framework for Sustainable Implementation

  1. The “Trust but Verify” Principle:
  • Method: Never assume a software tool produces valid PDF/A-4. Automated, independent validation is non-negotiable. Integrate veraPDF or a similar validator directly into your ingestion workflow to create a quality control gate.
  2. “Normalize upon Ingest” Strategy:
  • Method: Convert diverse file formats into a single, standardized preservation format (PDF/A-4) as they are declared as records and ingested into your long-term repository. This simplifies future management and ensures consistency.
  3. Risk-Based Triage:
  • Method: Not all documents require the same level of preservation.
    • Untagged (Visual Only): Use for simple, image-based documents where visual fidelity is the only concern (e.g., scanned historical letters).
    • Tagged (Visual + Accessible): Use for all new, born-digital documents, especially those for public dissemination or subject to accessibility requirements (e.g., WCAG, Section 508).
    • “All-Information” with Embedding (PDF/A-4f): Reserve for high-value, complex documents where the source data is as important as the rendered output (e.g., engineering designs, scientific datasets).
  4. Metadata-Driven Preservation:
  • Method: Treat the XMP metadata as a first-class citizen. Enforce policies for populating key fields like dc:title, dc:creator, and pdfaid:part during the creation process. This metadata is crucial for discovery, management, and proving authenticity in the future.
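A policy on key metadata fields can be enforced with a small check on the XMP packet. The sketch below assumes the packet has already been extracted from the PDF as an XML string (extraction itself is out of scope here) and uses only the Python standard library. The function names and the sample packet are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}


def read_key_fields(xmp_xml: str) -> dict:
    """Extract dc:title, dc:creator and pdfaid:part from an XMP packet."""
    root = ET.fromstring(xmp_xml.strip())
    fields = {"title": None, "creators": [], "pdfa_part": None}
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        title = desc.find("dc:title/rdf:Alt/rdf:li", NS)
        if title is not None and title.text:
            fields["title"] = title.text.strip()
        for item in desc.findall("dc:creator/rdf:Seq/rdf:li", NS):
            if item.text:
                fields["creators"].append(item.text.strip())
        # pdfaid:part may be written as an attribute or a child element.
        part = desc.get(f"{{{NS['pdfaid']}}}part")
        if part is None:
            elem = desc.find("pdfaid:part", NS)
            part = elem.text if elem is not None else None
        if part:
            fields["pdfa_part"] = part.strip()
    return fields


def missing_fields(fields: dict) -> list:
    """List the policy-required fields that are empty."""
    return [name for name, value in fields.items() if not value]


SAMPLE = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
        pdfaid:part="4">
      <dc:title><rdf:Alt>
        <rdf:li xml:lang="x-default">Annual Report</rdf:li>
      </rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>Records Office</rdf:li></rdf:Seq></dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""
```

Calling `missing_fields(read_key_fields(SAMPLE))` returns an empty list for the sample above; a non-empty result names the fields that the creation workflow failed to populate.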