PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking (as opposed to font embedding) and encryption. The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.
Standard
PDF is a standard for encoding documents in an "as printed" form that is portable between systems. However, the suitability of a PDF file for archival preservation depends on the options chosen when the PDF is created: most notably, whether to embed the fonts needed to render the document; whether to use encryption; and whether to preserve additional information from the original document beyond what is needed to print it.
PDF/A was originally a new joint activity between the Association for Suppliers of Printing, Publishing and Converting Technologies (NPES) and the Association for Information and Image Management (AIIM) to develop an international standard that defines the use of the Portable Document Format (PDF) for archiving documents. The goal was to address the growing need to archive documents electronically in a way that ensures their content is preserved over a long period of time and that the documents can be retrieved and rendered with consistent and predictable results in the future. This need exists in many areas of government and industry around the world, including legal systems, libraries, newspapers, and regulated industries.
Description
The PDF/A standard does not define an archiving strategy or the goals of an archiving system. It identifies a "profile" for electronic documents that ensures the documents can be reproduced in exactly the same way using various software in years to come. A key element of this reproducibility is the requirement that PDF/A documents be 100% self-contained. All the information needed to display the document in the same way is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not allowed to rely on information from external sources (e.g. font programs and data streams), but may include annotations (such as hypertext links) that link to external documents.
Other key elements of PDF/A conformance include:
- Audio and video content is prohibited.
- JavaScript and the launch of executable files are prohibited.
- All fonts must be embedded and must also be legally embeddable for unlimited, universal rendering. This also applies to standard PostScript fonts such as Times or Helvetica.
- Color spaces are specified in a device-independent manner.
- Encryption is prohibited.
- Use of standards-based metadata is required.
- External content references are prohibited.
- LZW compression is prohibited due to intellectual property constraints. JPEG 2000 image compression is not allowed in PDF/A-1 (which is based on PDF 1.4), because it was first introduced in PDF 1.5. JPEG 2000 compression is allowed in PDF/A-2 and PDF/A-3.
- Transparent objects and layers (Optional Content Groups) are prohibited in PDF/A-1, but are allowed in PDF/A-2.
- Provisions for digital signatures in accordance with the PAdES standard (PDF Advanced Electronic Signatures) are supported in PDF/A-2.
- Embedded files are prohibited in PDF/A-1, but PDF/A-2 allows embedding of PDF/A files, facilitating the archiving of sets of PDF/A documents in a single file. PDF/A-3 allows embedding of any file format, such as XML, CAD, and others, into PDF/A documents.
- XML Forms Architecture (XFA) forms are prohibited in PDF/A. (XFA form data can be preserved in a PDF/A-2 file by moving it from the XFA key to the name tree that is itself the value of the XFAResources key of the Names dictionary of the document catalog dictionary.)
- Interactive PDF form fields must have an appearance dictionary associated with the field's data. The appearance dictionary must be used when rendering the field.
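The version-dependent rules in the list above can be summarized as a small lookup table. The following is only an illustrative sketch, not a validator: the feature names are hypothetical labels invented for this example, and it assumes that features allowed in PDF/A-2 remain allowed in PDF/A-3.

```python
# Hypothetical feature labels; the rules mirror the list above.
# Features that become allowed starting with a given part of the standard.
ALLOWED_FROM_PART = {
    "jpeg2000": 2,
    "transparency": 2,
    "layers": 2,
    "embedded_pdfa_files": 2,
    "embedded_arbitrary_files": 3,
}

# Features the list describes as prohibited without a version exception.
ALWAYS_PROHIBITED = {
    "audio", "video", "javascript", "encryption",
    "lzw", "xfa", "external_content", "unembedded_fonts",
}


def violations(features, part):
    """Return the sorted feature labels that are not allowed in PDF/A-<part>.

    Labels that appear in neither table are ignored by this sketch.
    """
    found = []
    for feature in sorted(features):
        if feature in ALWAYS_PROHIBITED:
            found.append(feature)
        elif feature in ALLOWED_FROM_PART and part < ALLOWED_FROM_PART[feature]:
            found.append(feature)
    return found


print(violations({"jpeg2000", "encryption"}, 1))  # ['encryption', 'jpeg2000']
print(violations({"jpeg2000", "transparency"}, 2))  # []
```

A real conformance check needs to inspect the file itself; a table like this only records which rule applies to which part.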
Conformance levels and versions
PDF/A-1
Part 1 of the standard was first published on September 28, 2005, and specifies two levels of conformance for PDF files:
- PDF/A-1b - Level B (basic) conformance
- PDF/A-1a - Level A (accessible) conformance
Level B conformance requires only that the standards necessary for reliable reproduction of a document's visual appearance be followed, while Level A conformance includes all Level B requirements plus features intended to improve a document's accessibility.
Additional Level A requirements:
- Language specification
- Hierarchical document structure
- Text spans and descriptive text for images and symbols
- Character mapping to Unicode
Level A conformance is intended to improve the accessibility of conforming files for physically impaired users by allowing assistive software, such as screen readers, to extract and interpret a file's contents more precisely. A later standard, PDF/UA, was developed to address what were considered shortcomings of PDF/A, replacing many of its general guidelines with more detailed technical specifications.
PDF/A-2
Part 2 of the standard, published on June 20, 2011, addresses some of the new features added with versions 1.5, 1.6 and 1.7 of the PDF Reference. PDF/A-1 files will not necessarily conform to PDF/A-2, and PDF/A-2 files will not necessarily conform to PDF/A-1.
Part 2 of the PDF/A Standard is based on PDF 1.7 (ISO 32000-1), rather than PDF 1.4 and offers a number of new features:
- JPEG 2000 image compression
- support for transparency effects and layers
- embedding of OpenType fonts
- provisions for digital signatures in accordance with the PDF Advanced Electronic Signatures (PAdES) standard
- the option of embedding PDF/A files to facilitate archiving a set of documents in a single file.
Part 2 defines three conformance levels. PDF/A-2a and PDF/A-2b correspond to conformance levels a and b in PDF/A-1. The new conformance level, PDF/A-2u, represents Level B conformance (PDF/A-2b) with the additional requirement that all text in the document have a Unicode mapping.
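As a small illustration of the naming scheme described so far, the sketch below splits a designation such as "PDF/A-2u" into its part number and conformance level and checks the combination against the levels named above. The function name and table are hypothetical, and the table deliberately covers only Parts 1 and 2; any other part is reported as unknown.

```python
import re

# Conformance levels per part, as described above for Parts 1 and 2 only.
LEVELS_BY_PART = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
}


def parse_designation(name):
    """Return (part, level) for a designation like 'PDF/A-2u'.

    Returns None if the string is malformed or the combination
    is not in the table above.
    """
    match = re.fullmatch(r"PDF/A-(\d)([a-z])", name.strip())
    if match is None:
        return None
    part, level = int(match.group(1)), match.group(2)
    if level not in LEVELS_BY_PART.get(part, set()):
        return None
    return part, level


print(parse_designation("PDF/A-2u"))  # (2, 'u')
print(parse_designation("PDF/A-1u"))  # None: level u was introduced in Part 2
```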
PDF/A-3
Part 3 of the standard, published on October 15, 2012, differs from PDF/A-2 in only one respect: it allows embedding of arbitrary file formats (such as XML, CSV, CAD, word-processing documents, spreadsheet documents, and others) into PDF/A conforming documents.
PDF/A-4
Part 4 of the standard, based on PDF 2.0, is expected to be published in 2018.
Identification
PDF/A documents can be identified as such through the PDF/A-specific metadata located at the "http://www.aiim.org/pdfa/ns/id/" namespace. This metadata represents a claim of conformity; it alone does not guarantee conformity:
- A PDF document can be PDF/A-compliant in every respect except for its lack of PDF/A metadata. This may happen, for example, with documents created before the PDF/A standard was defined, by authors aware of the features that present long-term preservation problems.
- A PDF document can be identified as PDF/A but may incorrectly contain PDF features that are not allowed in PDF/A; therefore, documents claiming to be PDF/A-compliant should be tested for PDF/A compliance.
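A minimal sketch of reading that claim, assuming the XMP metadata packet has already been extracted from the PDF as an XML string. The property names `part` and `conformance` in the namespace quoted above are not spelled out in this article and should be checked against the standard; the function name and sample packet are hypothetical. The result is only the file's claim, not evidence of conformance.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_claim(xmp_xml):
    """Return (part, conformance) claimed in an XMP packet, or None.

    Handles both the element form (<pdfaid:part>1</pdfaid:part>)
    and the attribute form (pdfaid:part="1" on rdf:Description).
    """
    root = ET.fromstring(xmp_xml)
    values = {}
    for elem in root.iter():
        for key in ("part", "conformance"):
            qname = "{%s}%s" % (PDFAID_NS, key)
            if elem.tag == qname and elem.text:
                values[key] = elem.text.strip()
            elif qname in elem.attrib:
                values[key] = elem.attrib[qname]
    if "part" not in values:
        return None  # no PDF/A claim present
    return values["part"], values.get("conformance")


# Hypothetical sample packet for illustration.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_claim(SAMPLE))  # ('1', 'B')
```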
Validation
Isartor Test Suite
Industry collaboration in the original PDF/A Competence Center led to the development of the Isartor Test Suite in 2007 and 2008. The test suite consists of 204 PDF files deliberately constructed to systematically fail each of the requirements for PDF/A-1b conformance, allowing developers to test their software's ability to validate against the standard's most basic conformance level. By mid-2009, the test suite had already made an appreciable difference in the general quality of PDF/A validation software.
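The idea behind such a negative test corpus can be sketched as a small harness: every file in the corpus is expected to fail validation, so any file the validator accepts points to a gap in the validator. The names below (`false_passes`, the toy validator, the file names) are hypothetical and do not reflect the actual Isartor file layout.

```python
from typing import Callable, Iterable, List


def false_passes(validate: Callable[[str], bool],
                 failing_files: Iterable[str]) -> List[str]:
    """Return the files a validator wrongly accepts.

    `validate` returns True when it considers a file conformant;
    every file in `failing_files` is known to be non-conformant.
    """
    return [path for path in failing_files if validate(path)]


# Toy validator for illustration: it misses one kind of defect.
def toy_validator(path: str) -> bool:
    return "missing-font" in path


corpus = ["encrypted.pdf", "missing-font.pdf", "lzw-stream.pdf"]
print(false_passes(toy_validator, corpus))  # ['missing-font.pdf']
```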
veraPDF
The veraPDF consortium, led by the Open Preservation Foundation and the PDF Association, was created in response to the EU Commission's PREFORMA challenge to develop an open-source validator for the PDF/A format. The PDF Association launched a PDF Validation Technical Working Group in November 2014 to articulate a plan for developing an industry-supported PDF/A validator.
The veraPDF consortium subsequently won phase 2 of the PREFORMA contract in April 2015. Development continued throughout 2016, with phase 2 completed on schedule in December 2016. The phase 3 testing and acceptance period concluded in July 2017. veraPDF now covers all parts (1, 2 and 3) and conformance levels (a, b, u) of PDF/A.
veraPDF is available for installation on Windows, macOS, or Linux, using either a PDFBox-based or "Greenfield" PDF parser.
PDF/A viewers
The PDF/A specification also states some requirements for a conforming PDF/A viewer, which must:
- ignore any data not described by the PDF and PDF/A standards;
- ignore any linearization information provided by the file;
- use only the embedded fonts (rather than any locally available, substituted, or simulated fonts);
- render only using the embedded color profiles;
- ensure that form fields do not change the rendered presentation, irrespective of form data;
- ensure that annotations are rendered consistently.
When encountering a file that claims conformance with PDF/A, some PDF viewers default to a special "PDF/A viewing mode" to fulfill the conforming reader requirements. To take one example, Adobe Acrobat and Adobe Reader 9 display an alert to notify the user that PDF/A viewing mode has been activated. Some PDF viewers allow users to disable the PDF/A viewing mode or to remove the PDF/A information from a file.
Drawbacks
A PDF/A document must embed all fonts in use; therefore, PDF/A files are often larger than equivalent PDF files that do not embed their fonts.
Use of transparency is prohibited in PDF/A-1. Most PDF creation tools that support PDF/A compliance, such as the PDF export in OpenOffice.org or the PDF export tool in Microsoft Office 2007, will also make any transparent images in a given document non-transparent. This restriction was removed in PDF/A-2.
Some archivists have raised concerns that PDF/A-3, which allows arbitrary files to be embedded in PDF/A documents, may result in circumvention of institutional memory procedures and restrictions on archived formats.
The PDF Association addressed various misconceptions about PDF/A in its publication "PDF/A in a Nutshell 2.0".
Converting a PDF document (up to version 1.4) to PDF/A-2 usually works as expected, except for problems with glyphs. According to the PDF Association, problems can occur before or during the generation of the PDF: a PDF/A file can be formally correct and still contain incorrect glyphs, and only a careful visual check can uncover this. Because the generation problem also affects the Unicode mapping, it becomes apparent when a visual check is performed on the extracted text. In PDF/A, text and font usage is specified precisely enough to ensure that it cannot be rendered incorrectly, but a viewer or printer that does not offer complete support for the encoding system can still cause problems. This means that a document that is fully compliant with the standard will be correct internally, while the system used to view or print it may produce undesired results.
Documents produced by optical character recognition (OCR) conversion into PDF/A-2 or PDF/A-3 do not support the notdefglyph flag. Therefore, this type of conversion can result in unrendered content.
See also
- Digital dark age
- PDF/X - another part of the PDF standard, this time optimized for print production
Further reading
- PDF/A in a Nutshell 2.0 - published by the PDF Association (2013)
- PDF/A 101: An Introduction - presentation from the First International PDF/A Conference (2008)
- White Paper: PDF/A - The Basics - from PDF Tools AG (2009)
- Format description for PDF/A-1 - at digitalpreservation.gov
External links
- PDF Association
- PDF/A Competence Center
- veraPDF - PDF/A validation software
Source of the article: Wikipedia