For many years, libraries and archives have used the JPEG and TIFF coding standards to store images and make them available in electronic form. Decades of research in image compression, a subfield of signal processing, have yielded advances through the use of wavelet transforms (as opposed to JPEG, which uses the discrete cosine transform, and the various competing compression schemes used with TIFF), and some institutions have adopted products based on proprietary wavelet compression implementations such as SID. In the 1990s, under the auspices of the International Organization for Standardization (ISO) and the standards section of the International Telecommunication Union, the Joint Photographic Experts Group worked to create a new imaging standard using wavelet compression. The work of the committee reached a milestone in December 2000 with the ratification of Part 1 of the JPEG 2000 standard.
As JPEG 2000 is embraced by specialized vertical markets (such as medical imaging and national defense intelligence gathering) and appears in the consumer digital camera and scanner markets, it has the potential to revolutionize common practices in libraries and archives. In addition to achieving higher compression ratios with reduced or no loss of image data, JPEG 2000 was designed to embed the technical and descriptive metadata associated with images, metadata that has become crucial to the long-term usability of the image file as a digital artifact. With funding from the Gladys Krieble Delmas Foundation and the Connecticut State Library, the University of Connecticut convened the Symposium on the Adoption of JPEG 2000 by Archives and Libraries on November 4-5, 2004. Its purpose was to begin the process of understanding, coordinating, and accelerating the implementation of the standard by providing a forum for delegates to outline the efforts required to achieve wide-scale adoption.
JPEG 2000, named for the Joint Photographic Experts Group and the adoption year of the first part of the standard, was conceived as an effort to offer significant improvements over the first JPEG standard. In particular, the JPEG committee sought to create an open standard that provided better efficiency in image compression (including the option of "lossless" compression), the ability to bundle metadata with images in the same physical file, and storage of multiple resolutions of the same image in one file. The committee's intention is for JPEG 2000 to replace JPEG as the prevalent standard for storing digital images across many industries and research fields. Although members of the library and archive communities were not explicitly involved with the creation of the new standard, the characteristics of JPEG 2000 files are of great interest to those professions. In fact, JPEG 2000 is not only an evolutionary progression of file formats but also a revolutionary step that can change best practices in ways that will sustain the use of digital imaging for access, preservation, and interchange.
Open Standard. An "open standard" is one that is free for anyone to read, understand, and implement with no royalties or fees. TIFF is an open standard, although the optional compression algorithm it uses is not. Another file format used extensively in the map and satellite photography fields, SID, offers wavelet compression but in a patented form. The JPEG 2000 standard, as with all open standards, encourages the creation of multiple products for processing these files in a competitive marketplace with assurance that adherence to the standard will result in interoperability. The open nature of the standard also ensures that software can be written to process the files long after a commercial advantage has been realized.
Replacement for TIFF, JPEG and SID. Current best practices for the storage of digital images in the library and archive communities use the TIFF, JPEG, and SID file formats. The older JPEG format reduces the amount of space required to store images through a form of compression that sacrifices image detail, so it is mainly used as an access format. TIFF, used widely as a preservation format, provides options for compression that do not affect image fidelity, but few have gained wide acceptance and those that have are based on patented algorithms. SID is a proprietary wavelet compression format heavily used in map, satellite, and large-format imagery. By comparison, the compression scheme used by JPEG 2000 is free of license and royalty restrictions and provides for "lossless" compression of image data. In addition, JPEG 2000 allows for multiple resolutions of images to be contained in the same file; the thumbnail, access version, and preservation version of an image can be stored in the file and retrieved using standards-compliant software.
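The multiple resolutions described above come from the dyadic wavelet decomposition: each decomposition level yields a version of the image with half the width and height of the previous one. The following is a minimal sketch of the sizes a decoder could offer, assuming the image is anchored at the origin of the reference grid and that the number of decomposition levels (an encoder setting) is known. The function name and the example dimensions are illustrative only.

```python
def resolution_sizes(width, height, levels):
    """Return (width, height) for each resolution, largest first.

    Assumption: image origin at (0, 0), so each resolution is the
    full size divided by a power of two, rounded up.
    """
    sizes = []
    for k in range(levels + 1):
        scale = 1 << k  # 2**k
        # -(-a // b) is ceiling division for positive integers
        sizes.append((-(-width // scale), -(-height // scale)))
    return sizes


if __name__ == "__main__":
    # Hypothetical 4000 x 3000 preservation master with 5 decomposition levels
    for w, h in resolution_sizes(4000, 3000, 5):
        print(w, "x", h)
```

With five levels, one file offers six resolutions, so a thumbnail and an access copy can be decoded from the same codestream as the full-size master.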
Bundling of Metadata with Image. The growing abundance of preservation-quality image files and their associated metadata files is bringing into sharp relief the need to effectively manage these resources over the long term. To date, effort has been focused on building complex software systems that bind the metadata with the appropriate image file. JPEG 2000 introduces the concept of metadata bundles within the file format itself, permanently associating the metadata with the image in one digital object.
Catalyst for Advancement of Imaging Practice. As an image file format, TIFF had its origins in the 1980s in desktop publishing and related industries. The practice and terminology from those fields carried forward to the library and archive communities as the format was adopted as standard practice. In the intervening quarter century, imaging technology evolved with a field of science called signal processing that, at its core, represents images as mathematical algorithms. In doing so, the imaging community became concerned with the introduction of "noise" in the image "signal" from the hardware, software, and the process for capturing the image (e.g., misalignment of lenses, proper lighting of objects, and adequate construction of sensors). In order to get the most accurate reproduction of the original object in the signal, noise must be reduced. To make the greatest use of JPEG 2000 as an image format, the professionals in the library and archive communities must advance their knowledge and understanding of digital image capture with concepts and language from the signal processing field.
Drawing heavily from Everett Rogers's "Diffusion of Innovations," the symposium organizers intentionally drew together policy makers and practitioners, individuals from a variety of backgrounds, and delegates all along the spectrum of innovators/early-adopters/early-majority/late-majority/laggards. Rogers suggests that each member of a community follows a five-step process in making the decision to adopt an innovation: knowledge, persuasion, decision, implementation, and confirmation.
The Symposium on the Adoption of JPEG 2000 was arranged in an arc to take delegates from little assumed knowledge of the standard, through an awareness of how it is used now and how it could be used, to a point where dialog can occur on stewarding the critical aspects of the standard for long-term accessibility and preservation of digital objects. In the microcosm of the two-day event, the symposium organizers sought to move delegates through the first three steps of Rogers's model in the three parts of the symposium. (It is recognized, of course, that it will take longer than two days for members of the delegation who were not previously familiar with JPEG 2000 to fully commit to the second and third stages.)
Robert Buckley of Xerox Corporation, a member of the JPEG 2000 committee, carried out the first third of the arc in a three-hour tutorial on the fundamentals of JPEG 2000. This background provided the foundation of understanding that was used in subsequent discussions of the practical considerations of adopting and applying a JPEG 2000 practice in libraries and archives. At the time of writing, it is anticipated that video and slides from the presentation will be available on the j2kArcLib.info website.
The second part of the arc provided a forum for vendors and practitioners to share their experiences using the JPEG 2000 standard. Presenters addressed how use of the standard changes common imaging practices, how the standard is being adopted by developers of software systems, and the issues surrounding the adoption of the standard in the library and archive communities. In preparing remarks, speakers were asked to address these questions:
The presenters were:
At the conclusion of the first day, which covered the first two of the three parts of the arc, the delegation had arrived at roughly the same level of understanding of the complexity of the standard and the unique needs of its various formats.
What logically follows is the implementation of JPEG 2000 into the practices of libraries and archives by spreading the knowledge to others and reaching consensus on questions that ensure interoperability. On the final day of the symposium, the delegation engaged in a dialog that identified key tasks and issues to be resolved which are required to smooth the adoption of the standard. The tasks and issues identified at the symposium constitute the remainder of this report.
There were a number of participants for whom the symposium was the first exposure to JPEG 2000. As a result, there are many points for follow-up that involve the spread of information to members of the library and archive communities (the Knowledge and Persuasion steps in Rogers's model). The symposium organizers are committed to providing the platform for further conversation through the j2kArcLib.info website. Content and services that would be used by those seeking to learn and form an opinion about JPEG 2000 are: a list of vendors and open source projects providing equipment and software to implement a JPEG 2000 practice; a list of libraries and archives in the process of planning, developing, or using JPEG 2000; a mailing list through which questions from potential implementers could be answered by practitioners and vendors; opportunities to view and use JPEG 2000 images; and access to JPEG 2000 tools that allow individuals to test software in a real-time environment.
There is also a desire for the creation of case studies describing projects, architectures, and associated workflows. Information sought through case studies includes the rationale for choosing JPEG 2000 and the successes or problems that can be reported. The documents should also describe issues of risk (e.g., TIFF versus JPEG 2000 image standards) and the impact of introducing JPEG 2000 into situations where existing standards or software are already in use (TIFF, SID, etc.).
Delegates described mechanisms, in addition to the mailing list, that would be useful in supporting decision-making efforts: expansion and maintenance of the bibliography as a comprehensive source of published articles on the use of the standard in libraries, archives, and related fields; establishment of a weblog that can be indexed and archived to allow continuous conversation on a variety of topics; and development of conference proposals for forums provided by the Society of American Archivists, the Association of Research Libraries, the Coalition for Networked Information, and others.
This group of users also focused on the types of information that can persuade potential implementers to make the decision to use JPEG 2000. This list included acceptance of the standard by professional bodies (e.g., ALA, SAA, ARL), use of the standard by community leaders (e.g., Library of Congress, NARA, ARL-member institutions), inclusion of JPEG 2000 in documents describing imaging best practices, and the establishment of JPEG 2000 as a required element in contracts to system vendors.
With a very flexible standard such as JPEG 2000, the particulars in the application of the standard will be widely divergent until common threads are found around which derivative standards and best practices are developed. In order to leverage the resources of vendors creating tools for other markets, it is incumbent on the libraries and archives to review the various parts of the JPEG 2000 specification to see how close the ratified standard comes to meeting the needs of users. Where derivative standardization is required, a rigorous means (such as creation of a new "part" through the JPEG 2000 standards committee or a stand-alone process offered by NISO) must be identified and followed by the community to ensure interoperability.
Another side effect of the flexible standard is the complexity of decisions that can be made in the course of applying it in practice. For a file format that is capable of storing several versions of images of varying quality and size as well as layers of images representing different spectral planes or other variables, the setting of compression and file structure parameters can be daunting for even the most experienced imaging practitioner. In order to make the standard accessible to a wide variety of institutions, it would be incumbent upon experienced practitioners to create "profiles" of parameters for various source media types and derivative uses. These profiles, though perhaps not as precise as a standard, would form the basis of best practices for the professions.
The capability to include metadata boxes in the file is an issue that requires unique attention from our communities. We must determine if there is a need for the creation of standards or best practices regarding the inclusion of metadata in JPEG 2000 files and decide how prescriptive they should be. For instance: Should there be a minimal set of descriptive metadata associated with the object? To what extent does DIG-35, an image metadata standard from the digital photography community already in JPEG 2000, cover our technical and/or content metadata requirements? To what extent can PREMIS, the emerging preservation/digital provenance metadata standard, be integrated into JPEG 2000? Is it possible to embed full metadata, pointers to metadata available through the network, or both in a file? It was also recognized that when an image is sent to an end user, it might not be desirable to send all of the metadata to the user.
With the wide range of permutations of image codestream settings and inclusion of metadata boxes, it may become important to create interoperability testbeds, stores of images from a variety of compression tools, and a validation service for file formats and embedded metadata boxes. Software authors can use these services to ensure systems are compliant with standards, and institutions seeking examples of JPEG 2000 files can use the images for experimentation and testing of new systems.
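To make the idea of metadata boxes concrete, the sketch below walks the top-level boxes of a JP2-style byte stream and collects the payloads of any XML boxes. It relies on the box layout of the JP2 file format: a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning "to the end of the file." The helper names (`iter_boxes`, `xml_payloads`, `make_box`) and the sample bytes are hypothetical, and the sample is only a box sequence for illustration, not a complete or valid JP2 file.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in a JP2-style stream."""
    pos, end = 0, len(data)
    while pos + 8 <= end:
        lbox, tbox = struct.unpack_from(">I4s", data, pos)
        header = 8
        if lbox == 1:
            # Extended length stored in the next 8 bytes
            if pos + 16 > end:
                raise ValueError("truncated extended-length box")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif lbox == 0:
            size = end - pos  # box runs to the end of the data
        else:
            size = lbox
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield tbox.decode("latin-1"), data[pos + header:pos + size]
        pos += size


def xml_payloads(data):
    """Return the raw payload of every top-level XML box ('xml ')."""
    return [payload for box_type, payload in iter_boxes(data)
            if box_type == "xml "]


def make_box(box_type, payload):
    """Build a box with a plain 4-byte length (illustration helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Made-up sample: signature box, one XML box, and a stub codestream box
SAMPLE = (make_box(b"jP  ", b"\r\n\x87\n")
          + make_box(b"xml ", b"<title>sample</title>")
          + make_box(b"jp2c", b"\xff\x4f"))

if __name__ == "__main__":
    for box_type, payload in iter_boxes(SAMPLE):
        print(repr(box_type), len(payload))
    print(xml_payloads(SAMPLE))
```

The walker does not descend into superboxes, so metadata nested inside other boxes would need a recursive variant. A delivery service could use the same loop to omit selected boxes before sending a file to an end user.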
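A validation service of the kind described would involve far more than can be shown here, but its first step might look like the sketch below: checking the leading bytes to tell a JP2-family file from a raw codestream. The constants reflect the 12-byte signature box that opens JP2-family files and the SOC and SIZ markers that open a codestream; the function name and return labels are hypothetical.

```python
# 12-byte signature box: length 12, type 'jP  ', content 0D 0A 87 0A
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# A raw codestream starts with the SOC marker followed by the SIZ marker
SOC_SIZ = bytes.fromhex("ff4fff51")


def sniff_jpeg2000(head):
    """Classify data by its leading bytes; a first check, not a validation."""
    if head.startswith(JP2_SIGNATURE):
        return "jp2-family"
    if head.startswith(SOC_SIZ):
        return "codestream"
    return "unknown"


if __name__ == "__main__":
    print(sniff_jpeg2000(JP2_SIGNATURE + b"more bytes"))
    print(sniff_jpeg2000(SOC_SIZ + b"more bytes"))
    print(sniff_jpeg2000(b"II*\x00"))  # a TIFF header, for contrast
```

The signature is shared by the file formats in the JP2 family, so a fuller check would go on to read the file type box and then verify the codestream and metadata boxes against whatever profile the community agrees on.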
One very important issue is to enumerate the requirements to certify JPEG 2000 as a format for the widest possible community uses. Because JPEG 2000 may be placed on a par with TIFF or supplant it, a comparison of the standard with current best practice in terms of preservation, image characteristics, technical metadata, and other details must begin and be resolved to the satisfaction of practitioners and policy makers. Is the promise of a single file that can be used as a preservation master and access mechanism achievable? Is it desirable to have a preservation master serving restorative and transformative (added value) uses?
To some imaging practitioners in archives and libraries, the concept of "compression" seems to be synonymous with the "lossy compression" in standards such as GIF (reduction of the color palette) and JPEG (discarding information that the human eye cannot see), and claims of "lossless compression" are viewed with skepticism. JPEG 2000 offers a truly lossless compression algorithm through which a decompressed image is identical to the source; this needs to be demonstrated to the satisfaction of library and archive imaging practitioners.
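One way to see why the lossless claim holds is to look at the reversible 5/3 integer wavelet filter that JPEG 2000 Part 1 uses on its lossless path. Every step adds an integer computed from neighboring samples, and the inverse subtracts exactly the same integer, so no rounding error can accumulate. The sketch below is a single-level, one-dimensional illustration with a simplified mirror treatment of the signal edges; it is not a codec, and the function names and sample values are made up.

```python
def _predict(even, i):
    """Average of the two even neighbors of odd sample i (mirrored at the edge)."""
    right = even[i + 1] if i + 1 < len(even) else even[i]
    return (even[i] + right) // 2


def _update(detail, i):
    """Correction for even sample i from its neighboring detail values."""
    if not detail:
        return 0
    left = detail[i - 1] if i > 0 else detail[0]
    right = detail[i] if i < len(detail) else detail[-1]
    return (left + right + 2) // 4


def forward_53(samples):
    """One level of the integer 5/3 lifting transform: (lowpass, highpass)."""
    even, odd = samples[0::2], samples[1::2]
    detail = [o - _predict(even, i) for i, o in enumerate(odd)]
    smooth = [e + _update(detail, i) for i, e in enumerate(even)]
    return smooth, detail


def inverse_53(smooth, detail):
    """Undo forward_53 exactly by reversing the lifting steps."""
    even = [s - _update(detail, i) for i, s in enumerate(smooth)]
    odd = [d + _predict(even, i) for i, d in enumerate(detail)]
    out = [0] * (len(even) + len(odd))
    out[0::2] = even
    out[1::2] = odd
    return out


if __name__ == "__main__":
    row = [10, 12, 15, 200, 199, 0, 255, 3]  # made-up 8-bit pixel values
    low, high = forward_53(row)
    print(inverse_53(low, high) == row)
```

A practical demonstration for practitioners would go one step further and compare the decoded pixels of a real file against the source image, for example by comparing checksums of the two pixel arrays.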
For those considering migration from TIFF-based systems and/or other wavelet compression formats to JPEG 2000, there is a series of policy and technical questions to be resolved. Can JPEG 2000 offer cheaper and/or more sophisticated tools to aid in the access, preservation, and interchange of images? How well do TIFF header data and metadata from other image formats map to JPEG 2000 header information? What is the impact on workflow of merging images and metadata in one package? In support of those learning about and making decisions concerning the application of JPEG 2000, there is also a need for ongoing analysis and reporting of the available hardware and software tools to compare effectiveness and interoperability.
In the brief time the delegation was together, several questions about the standard arose that need to be addressed. Given the wide variety of metadata schemes, a mechanism is required for discovering the type of schema/DTD of the data in the XML box and for locating the schema itself. A delegate also questioned whether the JPEG 2000 standard specified a requirement and/or a validation mechanism for data in the XML box. JPIP, a part of the JPEG 2000 standard regarding efficient interactive communication of JPEG 2000 image codestream and metadata elements, holds promise as a way to serve users on slow connections and non-traditional devices such as cell phones. In order to maintain a desired context or permissions information, can JPIP be configured to always transmit selected XML/UUID boxes?
A great deal of interest was generated surrounding two presentations of proposed systems to annotate images. In one case, the presenter described a distributed system where images are sent to specialists, the specialists' annotations are encoded in JPEG 2000 metadata boxes, and the image is sent back to a central server for compilation and reconciliation. In the second, the presenter showed a prototype of an annotation tool that communicates in real time with a server to store and index annotations. In both cases, a need was described for a common annotation framework that will aid in the interoperability of JPEG 2000-encoded/annotated images from each project.
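Until a discovery mechanism is agreed upon, one possible heuristic is to inspect the XML payload itself: the namespace of the root element and any `xsi:schemaLocation` attribute often indicate which scheme is in use and where its schema can be found. The sketch below takes this approach with Python's standard XML parser. The function name, the `example.org` namespace, and the sample record are hypothetical, and nothing here is defined by the JPEG 2000 standard.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_payload):
    """Report the root element, its namespace, and declared schema locations."""
    root = ET.fromstring(xml_payload)
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = None, root.tag
    # xsi:schemaLocation holds whitespace-separated namespace/URL pairs
    parts = (root.get("{%s}schemaLocation" % XSI) or "").split()
    locations = dict(zip(parts[0::2], parts[1::2]))
    return {"root": local, "namespace": namespace,
            "schema_locations": locations}


# Made-up payload of the kind an XML box might carry
SAMPLE_XML = (
    b'<rec:record xmlns:rec="http://example.org/ns/record" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    b'xsi:schemaLocation="http://example.org/ns/record '
    b'http://example.org/schemas/record.xsd">'
    b'<rec:title>sample</rec:title></rec:record>'
)

if __name__ == "__main__":
    print(schema_hints(SAMPLE_XML))
```

Payloads without a namespace or schema location yield no hint at all, which illustrates why delegates asked for an explicit mechanism instead of leaving discovery to inspection.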
At present, most browser-based JPEG 2000 viewers must be licensed from a software vendor and are not freely distributable. In addition, these general-purpose viewers lack features, such as the display of metadata boxes, that are critical to the application of the format in the library and archival communities. What is required for broader adoption is a user tool that has these characteristics: web distributable, browser compliant, and broadly available; a means to see the entire image and parts of an image with acceptable performance over narrow-band connections; manipulation functionality such as pan, zoom, rotate, invert, and mirror; a means to put the image in its context with metadata that is either textual or in other media and can be made visible or suppressed; ability to dynamically retrieve the contextual information; meets identified image quality requirements; and includes transformative tools (i.e., the ability to put the file into a form that is required by a select user community).
In fact, what may be required is a tool architecture that allows for the creation of viewers that meet the differing needs of user communities and source images beyond the baseline characteristics identified above. For instance, as was discussed by the symposium delegation, the needs of a biological taxonomist annotating portions of an image are different from those of a geologist working with georeferenced, multi-spectral satellite imagery, which are certainly different from those of a middle school student viewing images of cultural heritage objects. One idea expressed was the creation of a series of cross-platform Java classes that could be combined in a framework to meet the variety of needs from various user communities.
The needs of software agents should be addressed and accounted for in the development of tools and practices. In other words, one cannot assume it is always a human that is consuming the image and metadata. Thought should be given to what happens as an object moves around the network and is processed by machines. In particular, there is a clear need to expose embedded metadata such that OAI or other harvesting agents can retrieve the data in JPEG 2000 file formats.
In addition to browsers and software agents, the library and archive communities have unique needs related to the anticipated use of metadata boxes in the JPEG 2000 file formats. Encoding and compressing software will need new options to efficiently accept a variety of metadata formats from a variety of sources as the image file is being built. Practitioners will need file editors that can not only manipulate the image codestream but also offer sophisticated viewing and updating tools for the embedded metadata.
As the technology is being explored, the practitioners in the library and archive communities must engage not only with vendors of systems targeted to those communities but also with software developers of JPEG 2000 toolkits for whom these communities are just one of many vertical markets adopting the standard. The vendors in the symposium delegation represented the latter group, and they clearly expressed a desire to understand what libraries and archives do from a technology perspective. To that end, practitioners need to engage in a dialog with the vendors on the creation of use cases and software requirements. Vendors must also be willing to explore the capabilities of the standard with a community that has only recently begun contemplating the implications of a JPEG 2000 practice. At the same time, it must be recognized that the standard specification cannot drive user needs; the tools created must address the needs of the professions and the end users.
The participants of the symposium generally viewed JPEG 2000 as a favorable development for access, preservation, and interchange of digital imagery within the library and archive communities. Within the delegation, some were already experimenting with JPEG 2000 as an access format and others as a preservation format. Some were creating new services that were not possible with the prior formats, and others stated that their research in the standard had just begun. All agreed that more work needed to be done to validate the use of the standard as an access, preservation, and interchange format, and acknowledged the long learning curve ahead for the professions in adopting the standard. The symposium organizers and the delegation are committed to moving the effort forward.
The symposium organizers are grateful for the financial support of the Gladys Krieble Delmas Foundation and the Connecticut State Library. Rob Buckley of Xerox and Ron Murray from the Library of Congress provided invaluable insight in preparing the structure of the symposium, identifying key delegates, and playing an active role in the meeting. The organizers also thank the invited speakers for sharing their knowledge and the delegation for attending and participating in the symposium.