Template for common text ISO/UIT-T - IPSJ/ITSCJ · Web viewA descriptive term for DCT-based encoding and decoding processes in which additional capabilities are added to the baseline

ISO/IEC JTC 1/SC 29/WG1 N 6626 17 January 2014

ISO/IEC JTC 1/SC 29/WG 1(& ITU-T SG16)

Coding of Still Pictures

JBIG JPEGJoint Bi-level Image Joint

Photographic Experts Group Experts Group

TITLE: Text of ISO/IEC WD1 18477-2

SOURCE: JPEG HDR Ad Hoc Group Thomas Richter, Germany Walt Husak, USA Ajit Ninan, USA Arkady Ten, USA, Pavel Korshunov, Switzerland Touradj Ebrahimi, Switzerland Alessandro Artusi, UK Massimiliano Agostinelli, UK

PROJECT: ISO/IEC 18477-2 Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images Extensions for High-Dynamic Range Images

STATUS: WD1

REQUESTEDACTION: For WD1 ballot

DISTRIBUTION: For WG1 review

INTERNATIONAL ORGANISATION FOR STANDARDISATION

INTERNATIONAL ELECTROTECHNICAL COMMISSION

Information technology –

Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images

Extensions for High Dynamic Range Images

Draft ITU-T Recommendation | International Standard

CONTENTSPage

ITU-T Rec. T.xxxx (200x E) i

ForewordThis Recommendation | International Standard is an extension of 18477-1, a compression system for continuous tone digital still images which is backwards compatible with ITU.T Rec.81 | ISO/IEC 10918-1. That is, legacy applications conforming to ITU.T Rec. 81 | ISO/IEC 10918-1 will be able to reconstruct streams generated by an encoder conforming to this Recommendation | International Standard, though will possibly not be able to reconstruct such streams in full dynamic range, full quality or other features defined in this Recommendation| International Standard.

The aim of this standard is to provide a migration path for legacy applications to support, potentially in a limited way, lossless coding and coding of high-dynamic range images. Existing tools depending on the existing standards will continue to work, but will only be able to reconstruct a lossy and/or a low-dynamic range version of the image contained in the codestream. This Recommendation | International Standard specifies a coded file format, referred to as JPEG XT, which is designed primarily for storage and interchange of continuous-tone photographic content.

IntroductionThis Recommendation | International standard specifies a coded codestream format for storage of continous-tone high and low dynamic range photographic content. JPEG XT part 2 is a scalable image coding system supporting multiple component images in floating point. It is by itself an extension of the coding tools defined in 18477-1; the codestream is composed in such a way that legacy applications conforming to ITU.T 81 | ISO/IEC 10918-1 are able to reconstruct a lower quality, low dynamic range, eight bits per sample version of the image.

Today, the most widely used digital photography format, a minimal implementation of JPEG (specified in ITU-T Recommendation T.81 | ISO/IEC 10918-1), uses a bit depth of 8; each of the three channels that together compose an image pixel is represented by 8 bits, providing 256 representable values per channel. For more demanding applications, it is not uncommon to use a bit depth of 16, providing 65 536 representable values to describe each channel within a pixel, resulting on over 2.8 x 1014 representable color values. In some less common scenarios, even greater bit depths are used.

Most common photo and image formats use an 8-bit or 16-bit unsigned integer value to represent some function of the intensity of each color channel. While it might be theoretically possible to agree on one method for assigning specific numerical values to real world colors, doing so is not practical. Since any specific device has its own limited range for color reproduction, the device’s range may be a small portion of the agreed-upon universal color range. As a result, such an approach is an extremely inefficient use of the available numerical values, especially when using only 8 bits (or 256 unique values) per channel. To represent pixel values as efficiently as possible, devices use a numeric encoding optimized for their own range of possible colors or gamut.

JPEG XT is primarily designed to provide coded data containing high dynamic range and wide color gamut content while simultaneously providing eight bits per pixel low dynamic range images using tools defined in 18477-1. The goal is to provide a backwards compatible coding spefication that allows legacy applications and existing toolchains to continue to operate on codestreams conforming to this Recommendation|International Standard.

JPEG XT has been designed to be backwards compatible to legacy applications while at the same time having a small coding complexity; JPEG XT uses, whenever possible, functional blocks of ITU.T 81|ISO/IEC 10918-1 to extend the functionality of the legacy JPEG Coding System. It is optimized for storage and transmission of high dynamic range and wide color gamut 32bit float images while also enabling low-complexity encoder and decoder implementations.

ii ITU-T Rec. T.xxxx (200x E)

INTERNATIONAL STANDARDISO/IEC 29199-2 : 200x (E)ITU-T Rec. T.xxxx (200x E)

ITU-T RECOMMENDATION

INFORMATION TECHNOLOGY – INFORMATION TECHNOLOGY – JPEG HDR IMAGE CODING SYSTEM:

CODING OF HIGH DYNAMIC RANGE IMAGES

1 ScopeThis Recommendation | International standard specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content.

2 Normative referencesThe following Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid ITU-T Recommendations.

2.1 Identical Recommendations | International StandardsITU.T XXX

2.2 Paired Recommendations | International Standards equivalent in technical content

ISO/IEC 18477-1: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images,

Core Coding System Specification

2.3 Additional referencesITU-T Rec. T.81 | ISO/IEC 10918-1: Information Technology – Digital Compression and Coding of Continuous Tone Still Images – Requirements and Guidelines ITU-T Rec. T.86 | ISO/IEC 10918-4: Information technology -- Digital compression and coding of continuous-tone still images: Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour spaces, APPn markers, SPIFF compression types, and Registration AuthoritiesITU-T Rec. T.871 | ISO/IEC 10918-5: Information technology -- Digital compression and coding of continuous-tone still images: JPEG File Interchange FormatITU-T Rec. T.801 | ISO/IEC 15444-1: Information technology – JPEG 2000 Image Coding SystemIEC 60559 Binary floating-point arithmetic for microprocessor systemsIEC 61966-2-1sRGB Colour management – Default RGB colour space – sRGB ISO/IEC 10646-1Annex D Information Technology – Universal Multiple-Octet Coded Character Set (UCS) – part 1:Architecture and Basic Multilingual Plane

3 Definitions, Abbreviations and Symbols

3.1 DefinitionsFor the purposes of this Recommendation | International Standard, the following definitions apply.

ITU-T Rec. T.xxxx (200x E) 1

AC coefficient: Any DCT coefficient for which the frequency is not zero in at least one dimension.

binary decision: Choice between two alternatives.

bit stream: Partially encoded or decoded sequence of bits comprising an entropy-coded segment.

block: An 8 8 array of samples or an 8 8 array of DCT coefficient values of one component.

box: a structured collection of data describing the image or the image decoding process embedded into one or multiple APP 11 marker segments. See Annex for the definition of boxes.

byte: A group of 8 bits.

coder: An embodiment of a coding process.

coding: Encoding or decoding.

coding model: A procedure used to convert input data into symbols to be coded.

(coding) process: A general term for referring to an encoding process, a decoding process, or both.

compression: Reduction in the number of bits used to represent source image data.

component: A two-dimensional array of samples having the same designation in the output or display device. An image typically consists of several components, e.g. red, green and blue.

continuous-tone image: An image whose components have more than one bit per sample.

data unit: An 8 8 block of samples of one component in DCT-based processes; a sample in lossless processes.

DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.

(DCT) coefficient: The amplitude of a specific cosine basis function – may refer to an original DCT coefficient, to a quantized DCT coefficient, or to a dequantized DCT coefficient.

decoder: An embodiment of a decoding process.

decoding process: A process which takes as its input compressed image data and outputs a continuous-tone image.

dequantization: The inverse procedure to quantization by which the decoder recovers a representation of the DCT coefficients.

discrete cosine transform; DCT: Either the forward discrete cosine transform or the inverse discrete cosine transform.

downsampling: A procedure by which the spatial resolution of a component is reduced.

encoder: An embodiment of an encoding process.

encoding process: A process which takes as its input a continuous-tone image and outputs compressed image data.

entropy-coded (data) segment: An independently decodable sequence of entropy encoded bytes of compressed image data.

entropy decoder: An embodiment of an entropy decoding procedure.

entropy decoding: A lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the entropy encoder.

entropy encoder: An embodiment of an entropy encoding procedure.

entropy encoding: A lossless procedure which converts a sequence of input symbols into a sequence of bits such that the average number of bits per symbol approaches the entropy of the input symbols.

extended (DCT-based) process: A descriptive term for DCT-based encoding and decoding processes in which additional capabilities are added to the baseline sequential process.

forward discrete cosine transform; IFDCT: A mathematical transformation using cosine basis functions which converts a block of samples into a corresponding block of original DCT coefficients.

frequency: A two-dimensional index into the two-dimensional array of DCT coefficients.

grayscale image: A continuous-tone image that has only one component.

high dynamic range: An image or image data comprised of more than eight bits per sample.

Huffman decoder: An embodiment of a Huffman decoding procedure.

Huffman decoding: An entropy decoding procedure which recovers the symbol from each variable length code produced by the Huffman encoder.

Huffman encoder: An embodiment of a Huffman encoding procedure.

Huffman encoding: An entropy encoding procedure which assigns a variable length code to each input symbol.

2 ITU-T Rec. T.xxxx (200x E)

Joint Photographic Experts Group; JPEG: The informal name of the committee which created this Specification. The “joint” comes from the ITU-T and ISO/IEC collaboration.

legacy decoder: An embodiment of a decoding process conforming to ITU.T Rec.T.81|ISO/IEC 10918-1, confined to the lossy DCT process and the baseline, sequential or progressive modes, decoding at most four components to eight bits per component.

lossless: A descriptive term for encoding and decoding processes and procedures in which the output of the decoding procedure(s) is identical to the input to the encoding procedure(s).

lossless coding: The mode of operation which refers to any one of the coding processes defined in this Specification in which all of the procedures are lossless (see Annex H).

lossy: A descriptive term for encoding and decoding processes which are not lossless.

low-dynamic range: An image or image data comprised of data with no more than eight bits per sample.

marker: A two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and hexadecimal FE.

marker segment: A marker together with its associated set of parameters.

minimum coded unit; MCU: The smallest group of data units that is coded.

pixel: A collection of sample values in the spatial image domain having all the same sample coordinates, e.g. a pixel may consist of three samples describing its red, green and blue value.

point transform: Scaling of a sample or DCT coefficient by a factor.

precision: Number of bits allocated to a particular sample or DCT coefficient.

procedure: A set of steps which accomplishes one of the tasks which comprise an encoding or decoding process.

quantization value: An integer value used in the quantization procedure.

quantize: The act of performing the quantization procedure for a DCT coefficient.

residual scan: An additional pass over the image data invisible to legacy decoders which provides additive and/or multiplicative correction data of the legacy scans to allow reproduction of high-dynamic range or wide color gamut data.

refinement scan: An additional pass over the image data invisible to legacy decoders which provides additional least significant bits to extend the precision of the DCT transformed coefficients.

sample: One element in the two-dimensional image array which comprises a component.

sample grid: A common coordinate system for all samples of an image. The samples at the top left edge of the image have the coordinates (0,0), the first coordinate increases towards the right, the second towards the bottom.

scan: A single pass through the data for one or more of the components in an image.

scan header: A marker segment that contains a start-of-scan marker and associated scan parameters that are coded at the beginning of a scan.

table specification data: The coded representation from which the tables used in the encoder and decoder are generated and their destinations specified.

(uniform) quantization: The procedure by which DCT coefficients are linearly scaled in order to achieve compression.

upsampling: A procedure by which the spatial resolution of a component is increased.

vertical sampling factor: The relative number of vertical data units of a particular component with respect to the number of vertical data units in the other components in the frame.

zero byte: The 0x00 byte.

zig-zag sequence: A specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to highest.

3.2 Symbols

X Width of the sample grid in positionsY Height of the sample grid in positionsNf Number of components in an imagesi,x Subsampling factor of component i in horizontal directionsi,y Subsamplng factor of component i in vertical directionHi Subsampling indicator of component i in the frame headerVi Subsampling indicator of component i in the frame headervx,y Sample value at the sample grid position x,y


h Additional number of DCT coefficients bits represented by refinement scans, 8+h is the number of non-fractional bits (i.e. bits in front of the "binary dot") of the output of the inverse DCT process.

Rb Additional bits in the HDR image. 8+Rb is the sample precision of the reconstructed HDR image.

3.3 AbbreviationsFor the purposes of this Recommendation | International Standard, the following abbreviations apply.

ASCII American Standard Code for Information Interchange LSB Least Significant BitMSB Most Significant BitHDR High Dynamic RangeLDR Low Dynamic RangeTMO Tone Mapping OperatorDCT Discrete Cosine Transformation

4 Conventions

4.1 Conformance languageThis Recommendation | International Standard consists of normative and informative text.

Normative text is that text which expresses mandatory requirements. The word "shall" is used to express mandatory requirements strictly to be followed in order to conform to this Specification and from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory requirements.

Informative text is text that is potentially helpful to the user, but not indispensable and can be removed, changed or added editorially without affecting interoperability. All text in this Recommendation | International Standard is normative, with the following exceptions: the Introduction, any parts of the text that are explicitly labelled as "informative", and statements appearing with the preamble "NOTE" and behaviour described using the word "should". The word "should" is used to describe behaviour that is encouraged but is not required for conformance to this Specification.

The keywords "may" and "need not" indicate a course of action that is permissible in a conforming implementation.

The keyword "reserved" indicates a provision that is not specified at this time, shall not be used, and may be specified in the future. The keyword "forbidden" indicates "reserved" and in addition indicates that the provision will never be specified in the future.

4.2 OperatorsNOTE – Many of the operators used in this Recommendation | International Standard are similar to those used in the C programming language.

4.2.1 Arithmetic operators+ Addition Subtraction (as a binary operator) or negation (as a unary prefix operator)

* Multiplication

/ Division without truncation or rounding.

smod x smod a is the unique value y between -(a-1)/2 and (a-1)/2

for which y+Na = x with a suitable integer N.

umod x mod a is the unique value y between 0 and a-1

for which y+Na = y with a suitable integer N.


4.2.2 Logical operators|| Logical OR

&& Logical AND

! Logical NOT x {A, B} is defined as (x == A || x == B) x {A, B} is defined as (x != A && x != B)

4.2.3 Relational operators> Greater than

>= Greater than or equal to

< Less than

<= Less than or equal to

== Equal to

!= Not equal to

4.2.4 Precedence order of operatorsOperators are listed below in descending order of precedence. If several operators appear in the same line, they have equal precedence. When several operators of equal precedence appear at the same level in an expression, evaluation proceeds according to the associativity of the operator either from right to left or from left to right.

Operators Type of operation Associativity

(), [ ], . Expression Left to Right Unary negation

*, / Multiplication Left to Right

umod, smod Modulo (remainder) Left to Right

+, Addition and Subtraction Left to Right

< , >, <=, >= Relational Left to Right

4.2.5 Mathematical functionsx Ceil of x. Returns the smallest integer that is greater than or equal to x.

x Floor of x. Returns the largest integer that is lesser than or equal to x.

|x| Absolute value, is –x for x < 0, otherwise x.

sign(x) Sign of x, zero if x is zero, +1 if x is positive, -1 if x is negative.

clamp(x,min,max) Clamps x to the range [min,max]: returns min if x < min, max if x > max or otherwise x.

power(x,a) Raises the value of x to the power of a. x is a non-negative real number, a is a real number. Power(x,a) is equal to exp(a×log(x)) where exp is the exponential function and log() the natural logarithm. If x is zero and a is positive, power(x,a) is defined to be zero.


5 General

The purpose of this clause is to give an informative overview of the elements specified in this Specification. Another purpose is to introduce many of the terms which are defined in clause 3. These terms are printed in italics upon first usage in this clause.

There are three elements specified in this Specification:

a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source image data and encoder specifications, and by means of a specified set of procedures generates as output a codestream.

b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by means of a specified set of procedures generates as output digital reconstructed image data.

c) The codestream is a compressed image data representation which includes all necessary data to allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data might be required that define the interpretation of the sample data, such as color space or the spatial dimensions of the samples.

5.1 High Level Overview on 18477-2This Recommendation|International Standard allows lossy and lossless coding of high and extended dynamic range of photographic images in a way that is backwards compatible to ITU T.81|ISO/IEC 10918-1. Decoders compliant to the latter standard will be able to parse codestreams conforming to this Recommendation|International standard correctly, albeit in less precision, with a limited dynamic range, and potential loss in sample bit precision.

This Recommendation|International Standard includes multiple tools to reach the above functionality, defined in Annexes A through I. A short overview on these coding tools will be given in this section.

The high-level syntax of a 18477-2 codestream is identical to that defined in 18477-1, which is a subset of the syntax defined in ITU Rec. T.81|ISO/IEC 10918-1. Marker definitions and the syntax of the markers defined in the above recommendation remain in force and unchanged. However, this Recommendation|International standard defines the APP11 marker, reserved in the legacy recommendation|standard for encoding additional syntax elements. Legacy decoders will skip and ignore such marker elements, and hence will only be able to decode the image encoded by the legacy syntax elements. This part of the 18477-2 codestream will be denoted the legacy codestream in the following.

This Recommendation|International standard extends the legacy standard by two mechanisms, both of which use the APP11 marker to hide the extended syntax from legacy applications. Annex B defines a binary format for additional data that uses a synax element called boxes. Boxes are, as introduced above, encoded in one or multiple APP11 markers by a mechanism defined in Annex C. A box may either include additional metadata required to decode the complete codestream to full precision, full dynamic range or without loss, or may contain entropy coded image data itself. The box syntax and the box types are also defined in Annex C. Annex B defines a text-based format that is capable to encode a subset of the information available in the boxes; it is only applicable to the Baseline Profile. The text based and the box based format shall not be mixed within the same codestream.

High dynamic range coding of floating point samples is supported by residual coding. This coding method represents the high-dynamic range image by two codestreams, the legacy codestream and the residual codestream.The former represents a represents a low-dynamic range representation of the original image that is visible to legacy applications, the latter a residual image that is merged with the low-dynamic range legacy image in the spatial domain to obtain the high-dynamic range reconstructed output. The merging process first generates a precursor image of the high-dynamic range image by transforming the legacy codestream to a linear colorspace. The precision of the precursor image is then extended to full precision by the residual image which is decoded from the residual codestream packaged in the residual codestream box defined in Annex C or the residual codestream segment defined in Annex B

The residual codestream is the entropy coded representation of the residual image. This Recommendation|International standard defines multiple coding methods for encoding the residual image: The DCT baseline Huffman, extended Huffman or progressive coding method defined in ITU Rec. T.81|ISO/IEC 10918-1.

5.2 High level functional overview of decoding process


Figure 1: Overview on the decoding process

The decoding process is best described by the above Figure 1. The process begins with the legacy decoder block which reconstructs the base image. This image is then optionally Chroma upsampledfollowed by the Inverse Decorrelation block . The output of this transformation is a low-dynamic range image with eight bits per sample in a RGB-type color space. This upper path is the backward compatible part.

The low-dynamic range components are further mapped by the Base Mapping and Color Space Conversion block to a floating point image which is called the precursor image. The Precursor image is optionally converted to HDR color space and Luminance can be calculated. The noise level maybe used to avoid division by zero and to reduce the compression artifacts which maybe amplified in following blocks.

The residual decoder path uses the residual data that is embedded in the codestream in the APP 11 markers. This data is reconstructed and then optionally upsampled. It is then processed by the Residual Mapping and Inverse Decorrelation block. This block maps the residual data to floating point domain which is optionally Inversely Decorrelated. This mapping may use the luminance computed by the Base Mapping and Color Space Conversion block.

The mapped Residual data and the precursor image are processed by the HDR Reconstruction block to produce the final reconstructed HDR Image.

5.3 Profiles

The Profiles define the implementation of a particular technology within the functional blocks of Figure 1. The profiles are described in Annex A.

5.4 Encoder requirements

There is no requirement in this Specification that any encoder shall support all profiles. An encoder is only required to meet the compliance tests and to generate the codestream according to the syntax and to limit the coding parameters to those valid within the profile it conforms to. Profiles are defined in the Annex of this Recommendation|International Standard.

5.5 Decoder requirements

A decoding process converts compressed image data to reconstructed image data. It shall follow the decoding operation specified in the Recommendation | International Standard and ISO/IEC 18477-1 for the Residual and Base Image


codestreams. The Decoder shall parse the codestream syntax to extract the Parameters, the Residual data and the Base Image. The parameters shall be used to merge the Residual data and Base Image into the reconstructed HDR Image.

In order to comply with this Specification, a decoder

a) shall convert a codestream conforming to this Recommendation | International Standard without considering the Residual Data into to a low dynamic range image.

b) may additionally, convert a conforming codestream including the Residual data according to the codestream syntax, into a high dynamic range continuous tone image.

c) shall implement at least all the functional blocks of the JPEG XT decoding process defined in the profile it claims to be conforming to.


Annex AProfiles

(This Annex forms a normative and integral part of this Recommendation | International Standard.)

A.1 Profile AThe Profile A relies on a layered approach to extend JPEG’s capabilities. The proposal decomposes an HDR image into a base layer and an HDR ratio residual layer. The base layer is a tone mapped image tone mapped from the original floating point HDR with either a local or global tonemapper. This codestream will be the backwards compatible part and will be accessible by all legacy decoders. The Ratio Residual layer contains HDR quantized log luminance ratio and the chrominance residual difference, this data is put together and represented as a single ratio Residual Image. To extract this data the HDR decoder uses a T.81 10918-1 Decoder as well.

Since the Residual Data is hidden in the APP11 markers the Legacy decoders will skip over this residual image and only access the base image codes stream and will thereby be backwards compatible. JPEG XT compliant decoders will combine the two as described below to recover the HDR image.

Figure A-1 illustrates the functionality of a Profile A decoder.

Figure A-1: High level overview of the decoding process of a Profile A compliant decoder

Profile A compliant codestreams shall use the codestream syntax defined in Annex B.

As shown in the block diagram, both the Base Image and Residual Data of a Baseline Profile compliant codestream shall be encoded using the syntax and semantics of ISO/IEC 10918-1, where both are constraint to the 8 bit baseline Huffman, extended Huffman or progressive coding process.

In Figure A-1 the upper path which comprise of blocks B1,B2,B3 will be the standard flow of a legacy decoder and outputs a backwards compatible LDR image in sRGB space typically. This decoder uses the decorrelation transformation according to Annex D, which is either a YcbCr to RGB conversion or an identity transformation, dependening on the presence of the Color Decorrelation Control marker specified in ISO/IEC 18477-1. This base image data is then mapped into linear HDR space and processed by the color space conversion operation in B4. This block converts the LDR image into the color space of the original HDR image, and it also maps the image to floating point value and called Linear Pre RGB2, it can also be referred to as LP_RGB2. A noise floor value specified in the parameter codestream is added to the luminance component of the LP_RGB2 to avoid divide by 0 and to avoid amplifying any noise that could occur due to operations downstream from this block for small values. This noise floor offset luminance value can be referred to as lum below.


Block B3 implements the decorrelation transformation defined by table D-1 and converts upsampled samples from YcbCr space to RGB space if appropriate. Details on this conversion process are specified in subclause D.1.

If the number of components of the legacy image Nf equals three, denote the reconstructed sample values from block B3 by inputR, inputG and inputB, respectively. If the number of components of the legacy image is one, the reconstructed sample value is denoted by inputY. The Block B4 then performs the following functions:

// basemapping RGB1 If (Nf == 3) {

R1 = inputR;G1 =inputG; B1=inputB,

} else if (Nf == 1) {R1 = inputY;G1= inputY;B1 = inputY;

}// basemapping RGB1

dgR1 = basemappingLUT[R1];

dgG1 = basemappingLUT[G1];

dgB1 = basemappingLUT[B1];

The basemappingLUT is loaded with the passed bga parameter in header specified in subclause B.1 if present. Otherwise, basemappingLUT shall be initialized by the inverse sRGB color transform function specified in IEC 61966-2-1:1999.

// color space conversion from RGB2 to HDR

if(header_param_csb present) {

csb[0:8] = header_param_csb[0:8];

} else {

( csb[0] = 1; csb[1] = 0; csb[2] = 0;

csb[3] = 0; csb[4] = 1; csb[5] = 0;

csb[6] = 0; csb[7] = 0; csb[8] = 1;)

}

LP_R2= csb[0]*dgR1 + csb[1]*dgG1 + csb[2]*dgB1;

LP_G2= csb[3]*dgR1 + csb[4]*dgG1 + csb[5]*dgB1;

LP_B2= csb[6]*dgR1 + csb[7]*dgG1 + csb[8]*dgB1;

// calculate luminance and add noise floorif (header_param_cst_present) {

lum = (cst[3]*LP_R2 + cst[4]*LP_G2 + cst[5]*LP_B2) + header_param_nf ;

} else {

lum = (0.299*LP_R2 + 0.587*LP_G2 + 0.114*LP_B2) + header_param_nf ;}basemappingLUT specified in subclause B.1 is a 256 entry table loaded by a default sRGB table which is an inverse linear and power function of 2.4. If it is in an alternate color space such as AdobeRGB, the look up table will be sent in the header. This basemapping is typically inverse gamma table but could also be an inverse gamma and an inverse tonemapping function combined. The Color space conversion defaults to identity when the base image and the HDR target color space are the same. If they are different, this matrix is sent in the header as csb parameters. These matrix parameters represent the product of


the base color space primaries and the inverse of the HDR target color space matrix. The target color space for the HDR can be found in the header as the cst parameters.

In Figure A-1, the lower path starting from B5 begins with the Residual data of the high-dynamic range image, and is represented by the ISO/IEC 10918-1 codestream format. This codestream is embedded in the APP11 marker as a Residual Data segment described in subclause B.2. After being decoded by the T.81 10918-1 Decoder the chroma upsampling step is performed by B6 to bring all components to full resolution i.e. 4:4:4. Chroma upsampling is identical to the upsampling process defined in Annex A of 18477-1.

The Residual Ratio data is then separated by B7 into floating point Linear Ratio Luminance values and the Linear Residual color difference value. The incoming residual luminance values are inverse quantized according to the parameters recorded in the header segment defined in subclause B.1. They are either provided by an explicit lookup table in the parameter segment described by the riq parameter in the codestream. If this riq table is not present then using the min and max, referred to as ln1, ln0 in the parameters segment, an inverse log map is calculated. Similarly, the incoming chroma residual sample values are inverse quantized according to the minimum & maximum parameters stored in the parameter segment of the codestream as cb0,cb1 and cr0,cr1. For the following, rY is the first, and rCb and rCr are the output of block B6, the chroma-upsampled image reconstructed from the residual codestream. If the residual codestream contains only one component, rCb and rCr shall be set to zero.

The following describes the function of block B7:// rY,rCb,rCr are the inputs from the decoded residual stream

// dequant and Linearize Luminance Ratio

If (header_paramater_riq present) {

Dqlut[] = header_parameter_riq;

LR_lum = dqlut[rY];

} else {

val = rY * (header_paramater_ln1 – header_paramater_ln0) + header_paramater_ln0;

LR_lum = exp(val);

}

// dequant and Linearize Chroma Residue

val = rCb * (header_paramater_cb1 - header_paramater_cb0) + header_paramater_cb0;

LR_Cb = val * lum;

val = rCr * (header_paramater_cr1 - header_paramater_cr0) + header_paramater_cr0;

LR_Cr = val * lum;

The Chroma values are then processed by B8, the YCbCr to RGB2 block and will convert the Linear Dequantized YCbCr to a Linear Residue RGB2 in the HDR color space, alternatively referred to as LR_RGB2.

// YCbCr to RGB2

if(header_param_csr present) {

csr[0:8] = header_param_csr[0:8];

} else {

( csr[0] = 1; csr[1] = 0; csr[2] = 1.402;

csr[3] = 1; csr[4] = -0.34414; csr[5] = -0.71414;

csr[6] = 1; csr[7] = 1.772; csr[8] = 1;)}

LR_R2= csr[1]*LR_Cb + csr[2]*LR_Cr;

LR_G2= csr[4]*LR_Cb + csr[5]*LR_Cr;


LR_B2 = csr[7]*LR_Cb + csr[8]*LR_Cr;

Finally the blocks B9 and B10 constructs an HDR image by first adding the Linear Pre RGB2 to the Linear Residue RGB2 in B9 and then multiplying the result by the Linear Ratio Luminance in B10.

Blocks 9 and 10 have the following function:

hdrR2 = LP_R2 + LR_R2;

hdrG2 = LP_G2 + LR_G2;

hdrB2 = LP_B2 + LR_B2;

/* Mult */

hdrR = LR_lum * hdrR2;

hdrG = LR_lum * hdrG2;

hdrB = LR_lum * hdrB2;

If Nf equals three, he final 32bit float output values shall be hdrR,hdrG,hdrB and represents the final HDR image. Otherwise, if Nf equals one, the final 32bit float output value shall be hdrG.

A.2 Profile B

The Profile B relies on a layered approach to extend JPEG’s capabilities. The proposal decomposes an HDR image into a base layer and an HDR residual layer. The base is either an exposed and clamped or tone mapped version from the original floating point HDR. This codestream will be the backwards compatible part and will be accessible by all legacy decoders. The HDR Residual layer contains the fractional part of the tonemapped LDR image divided by the original HDR image carried out channel by channel in RGB space, and converted to YCrCb. The residual data is also encoded with a Rec. T.81 | ISO/IEC 10918-1 encoder using only the baseline or extended Huffman or progressive 8 bit modes, and the Number of Components signalled in the residual frame header shall be identical to the number of components in the legacy image.

Since the Residual Data is hidden in the APP11 markers, Legacy decoders will skip over this residual image and only access the base image codesstream and will thereby be backwards compatible. JPEG XT compliant decoders will combine the two as described below to recover the HDR image.The details of this merging process are outlined below.

Figure B-2 illustrates the functionality of a Profile B decoder.


Figure A-2: High level overview of the decoding process of a Profile B compliant decoder

Profile B compliant codestreams shall use the codestream syntax defined in Annex C.

In Figure B-2 the upper path which comprise of blocks B1,B2,B3 will be the standard flow of a legacy decoder and outputs a backwards compatible LDR image in sRGB space typically. The RGB data is next processed by block B4 where a mapping to float is performed followed by an Inverse Gamma. This block maps the base image into linear floating point space and outputs Linear Pre RGB values.

Block B4 using the header parameters ldr_gama to perform the following mapping. If the number of components Nf of the legacy image equals three, denote the reconstructed upsampled sample values from the legacy codestream by R,G and B. Otherwise, if Nf equals one, denote the single reconstructed sample value by Y. The number of components of the residual image shall be equal to the number of components signalled in the legacy image.

// input to B4 is the base image RGB// Dequantise RGB to float If (Nf == 1) {

linbase_Y = Y / 255.0;} else if (Nf == 3) {

linbase_R = R/255.0;linbase_G = G/255.0;linbase_B = B/255.0;

}

// inverse gamma of linbase_RGBIf ( Nf == 1) {

LP_Y = power(linbase_Y, header_param_ldr_gamma);} else if (Nf == 3) {

LP_R= power(linbase_R, header_param_ldr_gamma);LP_G= power(linbase_G, header_param_ldr_gamma);LP_B = power(linbase_B, header_param_ldr_gamma);

}

In Figure B-2 the lower path starting from B5 begins with the Residual data of the high-dynamic range image, and is represented by the Rec. T.81 | ISO/IEC 10918-1 codestream format. This codestream is embedded in the APP 11 marker as a Residual Data segment described in Annex B-XX. After being decoded by the Rec. T.81 10918-1 Decoder the chroma upsampling step is performed by B6 to bring all components to full resolution i.e. 4:4:4. Chroma upsampling is identical to the upsampling process defined in Annex M.

Block B7 then performs a YCbCr to RGB in 8bits by applying the L transformation as selected by Table D-1 specified in Annex D . Its output consists of either one value rc_Y if Nf == 1 or three values rc_R,rc_G,rc_B if Nf == 3.

The block B8 maps to floating point the output of block B7.

If (Nf == 3) {rbase_R = rc_R/255.0;rbase_G = rc_G/255.0;rbase_B = rc_B/255.0;

} else if (Nf == 1) {rbase_Y = rc_Y / 255.0;

}

This is followed by an inverse gamma function in B9 as described below:

If (Nf == 3) {

rcig_R = power(rbase_R, header_param_hdr_gamma);


rcig_G = power(rbase_G, header_param_hdr_gamma);

rcig_B = power(rbase_B, header_param_hdr_gamma);

} else if (Nf == 1) {

rcig_Y = power(rbase_Y, header_param_hdr_gamma);

}

The purpose of block B10 is to re-expand the values to recover the original residual image.

If (Nf == 3) {

LR_R = rcig_R * header_param_maxR;

LR_G = rcig_G * header_param_maxG;

LR_B = rcif_B * header_param_maxB;

} else {

LR_Y = rcig_Y * header_param_maxR;

}

The next few blocks B11, B12 are the HDR reconstruction blocks. Both B11 and B12 are Division functions; the first division takes the Linear Pre RGB values from B4 and divides it by the Linear Residual RGB giving the original HDR in floating point. A small value ε is added to the denominator to avoid a division by zero. The final division is an inverse exposure to reconstruct the values to the original exposure. The blocks are described below:

If (Nf == 3) {

preExp_R = LP_R / (LR_R + ε);

preExp_G = LP_G/(LR_G + ε);

preExp_B = LP_B/(LR_B + ε);

} else if (Nf == 1) {

preExp_Y = LP_Y/(LR_Y + ε);

}

The value of ε shall be taken from the header parameter epsilon.

If (Nf == 3) {

HDR_R = preExp_R / header_param_expval;

HDR_G = preExp_G / header_param_expval;

HDR_B = preExp_B / header_param_expval;

} else if (Nf == 1) {

HDR_Y = preExp_Y / header_param_expval;

}

The header_param_expval is always greater than zero. This results in the final HDR floating point output.


A.3 Profile CThe profile C relies on a layered approach to extend JPEG’s capabilities. The encoder decomposes an HDR image into a base layer, which consists of a tone-mapped version of the HDR image, and an HDR residual layer. In addition to the residual layer, the codestream includes a description of an approximate inverse tone mapping operation that allows the decoder to reconstruct from the LDR image an approximate HDR image; the errors of this approximation process are corrected by the residual codestream included in an APP11 marker. Both the description of the tone mapping and the residual image are included in APP11 markers which will be invisible to legacy decoders. Such decoders will thus only see the tone mapped LDR image. While the base image complies to ISO/IEC 18477-1 and thus supports only the 8 bit extended or baseline Huffman or progressive modes, the residual image may optionally be encoded in the 12-bit Huffman or progressive modes.

Figure A-3 illustrates the functionality of a Profile C compliant decoder:

Figure A-3: High level overview of the decoding process of a Profile C compliant decoder

This subclause specifies the reconstruction process of a high-dynamic range image from a LDR image and a residual image decoders shall follow. This process consists of the following steps, see also Figure 1:

In steps B1 and B1a, reconstruct the legacy data from legacy codestream and the refinement codestream if a Refinement Data box is present. This process is specified in Annex H.

In step B1 and B1a, apply the Inverse Quantization and Inverse Discrete Cosine Transformation as in ITU-T Rec.10918-1.

In step B2, the upsampling process specified in Annex A of ISO/IEC 18477-1 shall be followed to generate samples for all positions on the sample grid.

In step B3, the L-Transformation as defined in Annex D shall be applied to inversely decorrelate the image components. Table D-1 defines which transformation to pick. The output of this block consists of either one or three samples per grid point Oi, depending on the number of components in the legacy codestream.

In step B4, a point transformation shall be applied to each of the output components O i. This process is selected according to the tdi parameters of the Lcod parameter set of the Merging Specification box, implementing the L i

Luts of Figure 1 and following the specifications of Annex E. The parameters Lcod.td i define which Table Lookup box or which Parameteric Curve box to use. The output of this process are the predicted high dynamic range samples Ji. As above, i=1..Nf.

Also in step B4, the C-transformation is applied to the input values J i resulting in the output pixel values Hi. The transformation is selected by the Ccod.Xt parameter of the Merging Specification Box, which selects one of the transformations defined in Annex D. If Nf equals one, no transformation is performed.

In steps B5 and B5a, the residual image shall be reconstructed from the data contained in the Residual Codestream box and the Residual Refinement Box. The codestream contained in this box follows the specifications defined in Rec. ITU.T 81|ISO/IEC 10918-1. If a Residual Refinement Box is present, the precision of the samples of the residual codestream shall be extended by refinement coding as specified in Annex I. The number of components of the residual image shall be equal to the number of components signalled in the legacy image.


In steps B5 and B5a, apply Inverse Quantization and Inverse Discrete Cosine Transformation as in specified in Annex J. In step B6, apply a point transformation that is defined by the Qcod parameter set of the Merging Specification Box. The Qcod.tdi parameter defines which Table Lookup Box or which Parametric Curve Box to apply. The output of this operation are one or three residual samples denoted by Pi.

In step B7, residual data is upsampled to the common sample grid following the specification of Annex A of 18477-1 M.

The outputs of the residual decoding process are one or three integer sample values Pi per sample grid point. In step B8, inverse color decorrelation shall be applied to the P i data by means of the residual transformation

(R-Transformation) as specified in Annex D. The R-Transformation shall be picked according to Table D-3. The outputs of this process are the inversely decorrelated prediction errors Qi.

In step B9, the high dynamic range output F i, i.e. the output of the decoding process, is reconstructed from H i, the predicted high dynamic range signal, and the inversely decorrelation prediction errors Qi. For this, compute

Fi := Hi + Qi

In step B10, if the Ot.Oc parameter of the Merging Specification box is one, the F i sample values are first rounded to integers and then a linear interpolation of an exponential function converts the rounded integers to floating values. This transformation is specified in Annex K. Otherwise, F i are the final output of the decoding process.


Annex BCodestream Syntax For Profile A


The codestream syntax for profile A defines parameters and codestream data required for HDR image reconstruction. The syntax of profile A uses the ITU-T recommendation T81/ISO Standard 10918-1 and 10918-6 APP 11 marker section. The APP11 marker is broken into a parameter data segment and a residual data segment. The parameter segment has two type of segments, an ASCII based parameter type and a Binary parameter type.

Figure B-1 identifies the data that is contained in the APP11 header segment:

Figure B-1: Profile A Header Definition

B.1 Parameter ASCII SegmentThe parameter data segment is defined as type 1; it carries the following parameters encoded as ASCII encoded text as payload data in the exact order listed:

Parameter DescriptionMandatory/Optional

<ln0=%e> log-r minimum Mandatory

<ln1=%e> log-r maximum Mandatory

<s2n=%e> Sample to nits Optional

<cb0=%e> Cb minmum Mandatory

<cb1=%e> Cb maximum Mandatory

<cr0=%e> Cr minimum Mandatory

<cr1=%e> Cr maximum Mandatory

<nf=%e> Noise floor normalized from 0 to 1.0 Mandatory

<riq %e %e … %e>

256-point LUT to decode ratio image Optional

<swb %e %e %e>

White balance multiplier in camera sensor RGB spaceThis is additional metadata a decoder may take advantage of to improve the visual appearance of an image in the presence of clipping.Facilitate highlight rendering for clipped or almost

Optional


Parameter DescriptionMandatory/Optional

clipped rednering. See also spr tag.

<bga %e %e …%e>

256-point LUT to inv map base image Optional

<spr %e %e … %e>

Color space conversion 3x3 matrix that converts data from camera sensor color space to target hdr colorspace. The tag is used in combination with the swb tag. This is additional metadata a decoder may take advantage of to improve the visual appearance of an image in the presence of clipping.

Optional

<csb %e %e … %e>

Color space conversion 3x3 matrix for base layer (9 coefficients)

Optional

<csr %e %e … %e>

Color space conversion 3x3 matrix for residual layer (9 coefficients)

Optional

<cst %e %e … %e>

Color space 3x3 matrix for the target image (9 coefficients)

Optional

<tmt %s > Tonemapper type and ParametersThis is a string that identifies the tonemapper that was used to generate the LDR image. This Recommendation|International Standard does not enforce a specific encoding and it serves only for informational purposes.

Optional

<enc %s> Encryption params as segment index, exchange, segment ID , key string; if multiple segments are encoded there can be repeated enc parameters

Optional

<sdsc %s> Segment index, segment image height, width, xy start location if there are multiple segments embedded sdsc will be repeated

Optional

<ckb= %i> Fletcher 16 Checksum of base layer codestream based on all bytes in base layer codestream. The computation of the checksum is specified in Annex E.

Mandatory

Figure B-2: Profile A Parameter Data Segment

Each parameter is coded as an ASCII string and enclosed in angle brackets (< >). For single-value parameters, no extra space should be inserted between the paired angle brackets.

If riq is present, it is used for decoding the ratio image. In that case, ln0 and ln1 are ignored.

If the swb and spr parameters are present, they can be applied to camera raw data for white balance modification.

For the csb and csr parameters the color conversion matrix coefficients are stored in row major order, i.e. the linear transformation

( )a11 a12 a13

a21 a22 a23

a31 a32 a33

acting on a column vector of a red-green-blue triple is stored in the order a11,a12,a13, a21,a22,a23, a31,a32,a33.

The purpose of the swb and sbr tags is to preserve the color of the clipped highlights during color-rebalancing. They are applied as follows to the HDR-RGB values Ri,Gi,Bi :

First, invert the 3×3 matrix formed by the spr tags in the parameter segment. The output of this matrix is denoted by Mi,j..

Second, apply the matrix Mk,l to the the HDR RGB vector as follows:


Rs = M1,1 ×Ri + M1,2 ×Gi+M1,3×Bi

Gs = M2,1 ×Ri + M2,2 ×Gi+M2,3×Bi

Bs = M3,1 ×Ri + M3,2 ×Gi+M3,3×Bi

Third, the output RGB values of the process are scaled and clamped:

Rt = clamp(Rs×Wro/Wri,0,1)

Gt = clamp(Gs×Wgo/Wgi,0,1)

Bt = clamp(Bs×Wbo/Wbi,0,1)

Where W?,o is a customer-provided output white-balance adaption matrix and W?,iI the recorded input white balance parameters recorded in the swb tag in the parameter segment.

In a last step, the RGB values are converted back with the 3×3 matrix N built from the coefficients in the spr tag in the parameter segment:

Ro = N1,1 ×Rt + N1,2 ×Gt+N1,3×Bt

Go = N2,1 ×Rt + N2,2 ×Gt+N2,3×Bt

Bo = N3,1 ×Rt + N3,2 ×Gt+N3,3×Bt

In Table L-1, %e is a decimal ASCII representation of a floating point value formatted as ASCII text in the format ([-]d.d... e[+/-]d...). Here d indicates a decimal digit, [ ] indicates an optional character and [/] two alternative choices and ... indicates one or more of the preceding syntax elements; every other character stands for itself. The exponent always contains at least two digits; if the value is zero, the exponent is 00.

In Table L-1, %s is a UCS encoded (ISO/IEC 10646-1 Annex D, utf-8) string representation without the “>”. %i is an ASCII representation of an unsigned integer.

B.2 Residual Codestream SegmentThe HDR data segment is defined as type 2. Its payload data is the JPEG compressed ratio residual image. If the data size exceeds 64 kilobytes, this segment can be repeated and the image divided into multiple type 2 segments with index and header. The content of the HDR data segment is otherwise identical to the Residual Data Box defined in Annex B. If more than one Residual Codestream Marker segment is present, the payload data of all segments is to be concatenated by increasing order of the segment index parameter in the residual codestream segment. The concatenated data then forms a valid 10918-1 codestream representing the residual image.


Annex C Codestream Syntax – Box Based Format

(This annex forms an integral part of this Recommendation | International Standard)

This Annex extends the compressed bitstream syntax of Annex B of 18477-1 by introducing additional markers and marker segments carrying refinement and residual data, and coding parameters that control the decoding process. This Annex describes the Box Based format, one out of two possible representations of residual image, refinement data and metadata, it is used by profiles B and C.

C.1 Marker assignments

The following additional markers are defined in this Recommendation|International Standard:

Code Assignment Symbol Description Defined in0xFFEB APP11 JPEG XT Marker This Recommendation

| International Standard

Table C-1: Additional Markers and Marker Segments

C.2 Codestream SyntaxThe high-level syntax of 18477-2 codestream shall follow the syntax specified in 18477-1, which is a subset of Rec. ITU.T 81|ISO/IEC 10918-1. Specifically, since JPEG XT boxes are represented by APP11 marker segments, 18477-1 conforming implementations will ignore them. JPEG XT boxes include additional coding parameter and/or entropy coded data using a coding process defined in ITU-T.81|ISO/IEC 10918-1 or new coding processes defined in Annex I in this Recommendation|International Standard. Since APP11 markers have a limited capacity of 216-1 bytes, data may be partitioned over several segments and shall be merged by a process specified in subclause B.3 before entropy decoding is applied.

NOTE – Note that by the above paragraph, byte stuffing and padding as defined in ITU-T.81|ISO/IEC 10918-1 also applies to entropy coded data contained in APP11 markers. Note further that due to the segmentation of entropy coded data into application markers, it may happen that the last byte of an APP11 marker segment is 0xff, and that the corresponding "stuffed" zero byte is part of a subsequent application marker segment. This does not cause a problem for legacy decoders since they are required to skip over unknown application marker segments in first place, without interpreting their content.

C.3 JPEG XT BoxesJPEG XT structures any additional data that remains invisible to legacy decoders in JPEG XT boxes. A box is a generic data container that has both a type, and a body that carries its actual payload. The type is a four-byte identifier that allows decoders to identify its purpose and the structure of its content.

Boxes are embedded into the codestream format by encapsulating them into one or several JPEG XT Extension marker segments. Sinces boxes can grow large in size, a single box may extend over multiple JPEG XT Extension marker segments, and decoders may have to merge multiple marker segments before they can attempt to decode the box content.

The JPEG XT Extension marker segment consists of the APP11 marker that is reserved for this Recommendation | International Standard, the size of the marker segment in bytes (not including the marker), a common identifier, a box enumerator field, a sequence number field, the box length, the box type and the actual box payload data. The box length field can be extended by a Box Length Extension field that allows box sizes beyond 232-1 bytes. Figure C-1 depicts the high-level syntax of a JPEG XT Marker segment.

0xFFEB Le CI

Common Identifier

En

Enum-eratior

Z

Sequence Number

LBox

Box Length

TBox

Box Type

XLBox

Box Length

Payload Data


Extension

(optional)

Figure C-1: Organization of the JPEG Extensions Marker Segment

If the size of the box payload is less than 232-8 bytes, then all fields except the XLBox field, that is: Le, CI, En, Z, LBox and TBox shall be present in the JPEG XT Extension marker segment representing this box, regardless of whether the marker segments starts this box, or continues a box started by a former JPEG XT Marker segment.

If the size of the box payload is larger than 232 bytes, the XLBox field is required additionally, and shall also be present in all JPEG XT Extension marker segments of the same box type and same enumerator.

The meaning of the fields of the JPEG XT Marker segment is as follows:

The Le field is the size of the marker segment, not including the marker. It measures the size from the Le field up to the end of the marker segment.

NOTE – Sinces boxes may extend over several marker segments, the Le field is typically not related to the Box Length field and care must be taken not to confuse the two. If a box extends over multiple JPEG XT Extension marker segments, the Le field measures the total size of each individual marker segment and may differ from segment to segment, whereas the Box Length field remains identical in all segments that contribute to the same box.

The Common Identifier is a 16 bit field that allows decoders to identify an APP11 marker segment as a JPEG XT Extension marker segment. Its value shall be 0x4A50.

The Box Type is a 32-bit field that specifies the type of the payload data, and thus its syntax. Box types are specified in the following subclauses of Annex B. Since ITU|ISO/IEC may add additional box types that define additional meta-information on the image later, decoders shall disregard box types they do not understand.

The Enumerator is a 16-bit field that disambuigates between JPEG XT marker segments of box identical type, but differing content. Encoders shall concatenate the payload data of those JPEG XT marker segments whose Enumerator and Type Identifier fields are identical in the order of increasing sequence numbers.

NOTE – A codestream containing multiple boxes of the same box type uses the enumerator field to instruct the decoder which JPEG XT Extension marker segments to merge into one box. Refinement coding makes use of this process: The entropy coded data of each refinement scan is placed into its individual box, using the enumerator field to disambiguate the scans.

The Sequence Number is a 32-bit field that specifies the order in which payload data shall be merged. Concatenation proceeds in the order of increasing Sequence Number.

The Box Length is a four byte field that, if it is one, is extended by an additional eight byte field that specifies the total size of the concatenated playload data. The box length field measures the size of the payload data of all JPEG Extensions markers of the same box type and enumerator combined, plus the size of a single copy of the Box Type, plus the size of a single copy of the Box Length, plus the length of a single copy of the Box Length Extender if present. The box length does not include the size of the sequence number, the enumerator, the common identifier, the marker length or the marker.

NOTE – A box having a payload data of 32 bytes will, by this, have a box length of 32+4+4 = 40. If this box is split evenly over two JPEG XT marker segments, each marker segment will have a Le value of 2+2+2+4+(4+4+16) = 50.

NOTE – An encoder creating four refinements over legacy data would place the first refinement scan into a Refinement Data box with En=0, the second into a box with En=1, and so on. However, because entropy coded data of individual refinement scans may still require more than 64K for storage, each of the Refinement Data boxes of the individual scans may be broken up into several APP 11 marker segements. The first of the segments contributing to the same refinement scan will have Z=0, the second Z=1, the third Z=2 and so on.

The payload data carries the contents of the box. Its syntax is specified along with the corresponding box types in this Annex.

Profiles defined in Annex M may add additional constraints in how payload data may be broken up into individual JPEG XT marker segments.

Parameter Size (bits) Value Meaning

APP11 16 0xFFEB Identifies all JPEG XT Marker Segments.

Le 16 8..65535 Length of the marker segment,


including the size itself, all parameters, and the size of the payload data contained in this marker segment alone. Does not include the marker itself.

CI 16 0x4A50

(ASCII encoding of "JP")

The special value 0x4A50 (ASCII: 'J' 'P') allows readers to distinguish the JPEG Extensions marker segment from other uses of the APP11

marker. Readers shall ignore APP11 markers for the purpose of decoding JPEG extensions if this value does not match.

En 16 1..65535 Disambuigates payload data of the same box type and defines which payload data is to be concatenated. Only payload data whose box type and enumerator is identical shall be concatenated.

The value zero is reserved for ITU|ISO purposes.

Z 32 1..232-1 Sequence number defining the order in which the payload data shall be concatenated. Concatenation shall proceed in order of increasing Z values.

The value zero is reserved for ITU|ISO purposes.

LBox 32 1 or 8..232-1 Box length. This is the total length of the concatenated payload data, including a single copy of the LBox and Tbox field, and a single copy of the XLBox field, if present.

The values zero and two to seven are reserved for ITU|ISO purposes and shall not be used.

TBox 32 0..232-1 Box type. The box type defines the syntax of the concatenated payload data. Also, the box type and the enumerator specify which payload data to merge.

XLBox 0 or 64 16..264-1 If the LBox field is one, this field contains the size of the concatenated payload data plus the box overhead instead.

Otherwise, this field is missing.


The values zero to fifteen are reserved for ITU|ISO purposes.

Payload Data Varies Varies The syntax of the concatenated payload data is defined in subclauses B, G and H.

Table C-2: JPEG Extensions Marker Parameters and Sizes

C.4 Residual Data BoxThis box encapsulates entropy coded segments from residual coding. Data contained in this box defines residual data that extends the LDR image encoded in the legacy JPEG stream to a HDR image. The algorithm for decoding entropy coded residual data is defined in Annex G.

The type of this box shall be 0x52455349, ASCII encoding of "RESI". There shall be at most one stream of residual data after concatenating all JPEG extension marker payload data belonging to this box type.

NOTE – Unlike refinement coding, residual coding merges all data into one single box.

The structure of this box is defined in Figure C-2, the parameters and sizes in Table C-3.

Entropy Coded Data

Figure C-2: Organization of the Residual Data Box


Data Varies Varies Entropy coded data segment of variable lengths.

Table C-3: Residual Data Box, Parameters and Sizes

C.5 Residual Refinement Data BoxThis box encapsulates entropy coded data segments created by a single refinement scan over residual image data. If the number of additional residual refinement bits, the Rr-Parameter of the Merging Specification Box (see subclause B.X), is non-zero, residual refinement data encapsulated in Residual Refinement Data Boxes shall be present.

Data contained in ths box extends the precision of the residual coefficients by additional least significant bits. The concatenated payload data of all Residual Refinement Data Boxes forms the input to the refinement decoding process specified in Annex H.

The type of this box shall be 0x5246494e, ASCII encoding of "RFIN". The structure of the payload data of this box is defined in Figure C-3, the parameters and sizes in Table C-4.

Entropy Coded Data

Figure C-3: Organization of the Residual Refinement Data Box




Table C-4: Residual Refinement Data Box, Parameters and Sizes

C.6 Residual Refinement Data BoxThis box encapsulates entropy coded data segments created by a single refinement scan over legacy image data. If the number of additional legacy refinement bits, the Rh-Parameter of the Merging Specification Box (see subclause C.10), is non-zero, refinement data encapsulated in Residual Refinement Data Boxes shall be present.

Data contained in ths box extends the precision of the residual coefficients by additional least significant bits. The concatenated payload data of all Refinement Data Boxes forms the input to the refinement decoding process specified in Annex H.

The type of this box shall be 0x46494e45, ASCII encoding of "FINE". The structure of the payload data of this box is defined in Figure C-4, the parameters and sizes in Table C-5.

Entropy Coded Data

Figure C-4: Organization of the Residual Refinement Data Box



Table C-5: Residual Refinement Data Box, Parameters and Sizes

C.7 Table Lookup BoxThis box defines a lookup process at the decoder and may be used to implement the point transformation as used in the base or range mapping operations which are part of the merging process for combining the low-dynamic range and residual image information to reconstruct a high-dynamic range image. An additional method to implement the above operations is by a parametric description, defined in subclause C.8. The merging process of LDR and residual image is defined in Annex A, the output range and scaling of data is specified in Annex G.

The codestream shall contain exactly one Table Lookup Box or Parameteric Curve Box for each occurring value of td i in the Lcod, Scod, Rcod or rcod parameter fields of the Merging Specification box of subclause C.10.

The type of this box shall be 0x544f4e45, ASCII encoding of "TONE".

The box organization is defined in Figure C-5, the parameters and sizes in Table C-6.

M Rt Dk

Figure C-6: Organization of the Table Lookup Box

Parameter Size (in Bits) Value Meaning

M 4 0..15 Table destination. Up to 16 curves can be defined. A

table is identified by matching the value of tdi in the Merging Specification box with the value of M

here.

Rt 4 0..8 Specifies the output precision of the table

entries.


Table entries shall be integers in the range

[0,28+Rt-1]. Rt shall fit to the application of the table,

valid values are defined in Annex D and F.

Dk 16 0..65535 Table contents. Each entry is a 16 bit unsigned integer

All numbers shall be represented in big-endian

format.

The number of table entries can be derived from the box length. The size of the table shall be a power

of two that fits to the application of the table. See Annex D and F for details.

Figure C-7: Table Lookup Box, Parameters and Sizes

The value of Rt defines the output range of the table lookup process in bits. The value of R t must match the number to the right of the corresponding table function within figure F-1.

The length of the table depends on the input precision required for the application of the table. The number of table entries shall be 2l where the value of l shall match the number indicated to the left of the corresponding table application in figure F-1.

NOTE – Consider for example a Table Lookup box that shall be deployed as L1 point transformatoin. Figure F-1 of Annex F indicates now an input scale of 8+h for L-tables, and an output scale of 8+R b. A Table Lookup box suitable for this application hence has 2 8+Rh entries and Rt

equals 8+Rb.


C.8 Parametric Cuve Box

This box defines a decoder mapping process in the form of a parametrized curve which might offer a more efficient coding than a look-up table. This parametric curve may be used to implement the point transformations of the base and range mapping which are part of the merging process for combining the low-dynamic range and residual image information to reconstruct a high-dynamic range image. An additional method to define an inverse tone mapping curve is by an explicit lookup table, as defined by the Table Lookup box of subclause C.7. The process of inverse mapping is described in more detail in Annex E.

The codestream shall contain exactly one Table Lookup box or Parameteric Curve Box for each td i used in the Merging Specification box.

The box type of this box shall be 0x43555256, the ASCII encoding of "CURV".

The box organization is defined in Figure C-7, the parameters and sizes in Table C-8.


M t P1 P2 P3 P4

Figure C-7: Organization of the Parametric Inverse Tone Mapping Box

Parameter Size (in Bits) Value Meaning

M 4 0..15 Table destination. Up to 16 curves can be defined. A table is identified by matching the value of tdi in the Merging Specification box with the value of M here.

t 4 0..15 Curve type. Curve types are defined in Table B-11.

P1 32 Depends on t Curve parameter 1, encoded as big-endian IEC 60559 single precision floating point value





Table C-8: Contents of the parameteric Inverse Tone Mapping Box

The tone mapping curve to be applied is defined by the t parameter. Depending on t, the parameters P1 through P4 further specify the curve. Note that all parameters are always present, regardless of the actual curve type. They are, however, eventually ignored. Table B-11 specifies the available parametric curve types.

Parametric curves are applied in four steps to map an input value to an output value:

First, the input value x is normalized to the input scale of the channel indicated as bit count in Figure G-1 of Annex G by dividing x by a suitable scale factor. The result of this process is a scaled input value x.

Second, the parameter t is used to select one out of several curve types. The value of x and the curve parameters P1 through P4 compute then from the scaled input x a scaled output f(x).

Third, the output f(x) is again scaled by a suitable factor to match the output range of the channel.

The scale factors are defined in Annex G, the input and output scaling process in Annex E.


Curve Type, Value of t Inverse tone mapping to be performed Remarks

0 f(x) = 0 The zero tone mapping, the output is always set to zero.

P1 through P4 shall be ignored.

1 f(x) = 1 The constant-one tone mapping curve. Note that output scaling applies and the actual effective output value thus depends on the output scale factor.

P1 through P4 are ignored.

2 f(x) = x Identity transformation, input data is passed through, though scaling of input and output applies.

3 Reserved for ITU|ISO purposes

4 f(x) = ﴾(x+P3)/(1+P3)﴿P2 for x > P1 and

f(x) = ﴾(P1+P3)/(1+P3)﴿P2×x/P1 for x ≤P1

This is an inverse gamma mapping with parameters P1

through P3.

For P1=0.04045, P2=2.4, P3=0.055 and P4=1, this is the sRGB nonlinearity.

5 f(x) = x×(P2-P1)+P1 Linear ramp with start value P1

and end value P2. P2 shall be larger than P1. P3 and P4 are ignored.

6..15 Reserved for ITU|ISO purposes.

Table C-9: Inverse Tone Mapping Operations

C.9 HDR Parameter Specification BoxThis box defines the process and parameters for merging low dynamic range image with the residual image to reconstruct the high dynamic range image in profile B. Exactly one HDR Parameter Specification box shall be present whenever a residual image is included in the codestream, and the codestream complies to profile B. The details of the merging process of residual/refinement data with LDR data is defined in the description of profile B in Annex A.

The type of this box shall be 0x4850524D, ASCII encoding of "HPRM".

Figure C-8 describes the organization of this box segment, Table C-10 the parameters and parameter sizes.

bin_se hdr_ ldr_ max_ max_ max_ Expval tm_ img_ img_ chksum epsilon


g_type gamma gamma R G B type type size

Figure C-8: Organization of the HDR Parameter Specification Box

Parameter Size (in bits) Value Description

Bin_seg_type 32 1 This field shall have the value 1. All other values are reserved for ISO/IEC purposes.

hdr_gamma 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

It encodes the gamma value of the HDR residual image.

hdr_gamma power function value

ldr_gamma 32 ≥1.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be greater or equal than one.

This field encodes the gamma value of the LDR image.

max_R 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

This parameter encodes the maximum value of the residual HDR the red channel.

max_G 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

This parameter encodes the maximum value of the residual HDR the green channel.

max_B 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

This parameter encodes the maximum value of the residual HDR the blue channel.

Expval 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

This parameter describes a scaling factor computed on the luminance value of the input HDR image.

tm_type 8 0 or 1 This value is reserved for ISO/IEC purposes and shall have either the value zero or one.

img_type 8 0 Describes the encoding of the residual image. The value of zero indicates a JPEG codestream. All


Parameter Size (in bits) Value Description

other values are reserved for ISO/IEC purposes.

img_size 32 >0 This 32 bit integer encodes the size of the residual codestream and hence the size of the payload data of the Residual Codestream Box.

chksum 32 depends This 32 bit integer value encodes the checksum over the LDR image according to Annex E.

epsilon 32 >0.0 This field is encoded as a 32 bit IEC 60559 floating point number. It shall always be positive.

It encodes the normalization parameter of the LDR/residual division operation.

Table C-10: Layout of the HDR Parameter Specification Box

C.10 Merging Specification BoxThis box defines the process and parameters for merging low dynamic range image with the residual image to reconstruct the high dynamic range image in profile C. Exactly one Merging Specification box shall be present whenever a residual image or a refinement scan is included in the codestream, and the codestream complies to profile C. The details of the merging process of residual/refinement data with LDR data is defined in the description of profile C in Annex A.

The type of this box shall be 0x53504543, ASCII encoding of "SPEC".


Lcod Rcod Ccod Scod Qcod Acod Lspec Rspec Aspec Ot Rb

Figure C-9: Organization of the Merging Specification Box

Parameter Size (in bits) Value Meaning

Lcod 40 See Table C-12 Defines the nonlinear point transformations and the L-Transformation of Block

B4 of profile C

Rcod 40 See Table C-12 Defines the R-Transformation of Block

B8 of profile C

Ccod 40 See Table C-12 Defines the C-Transformation, i.e. the

linear color transformation fo block B4 of profile C

Scod 40 See Table C-12 Reserved for ISO/IEC purposes


Qcod 40 See Table C-12 Defines the range mapping of the residual image of block B7 of profile C

Acod 40 See Table C-12 Reserved for ISO/IEC purposes

Lspec 16 See Table C-14 Specifies the DCT processes and number of refinement scans for the

legacy codestream.

Rspec 16 See Table C-14 Specifies the DCT processes, number of

refinement scans for the residual codestream.

If no residual codestream is included, this field shall be

ignored.

Aspec 16 See Table C-14 Reserved for ISO/IEC purposes

Ot 8 See Table C-15 Defines the output transformation.

Rb 4 0..7 Number of additional bits available for high dynamic

range data. The bit precision of the

reconstructed high dynamic range image shall be computed as 8+Rb.

Table C-11: Merging Specification Box Contents

Point transformations and linear transformations in the legacy and residual codestream are defined by the Lcod and Rcod fields. Note that the point transformation in the residual codestream consists only of a multiplication or a division. The linear color transformation is defined within Ccod, it neither uses any point transformations. The range mapping of the residual data is defined by Qcod. The fields Scod and Acod are reserved for ITU/ISO purposes and shall be filled with the values as indicated in Table C-14.

Within Lcod, the tdi parameters define the lookup tables to use by a table destination index tdi is given that matches the M value of the corresponding Table Lookup Box to apply. The tsi field allows to bypass a table-based point transformation and replaces it by linear scaling.

.


ts0 4 1,2 Within Lcod, specifies whether the point

transformation is bypassed or not. Within Qcod,

specifies whether the range mapping is explicitly

specified or consists only of a scaling.

If the value is two, the decoder shall employ a


point transformation. If the value is one, a linear

scaling shall be used (see Annex D and F). All other

values are reserved for ISO/IEC purposes.

This field shall be one if not used within Lcod or

Qcod.

td0 4 0..15 Selects the Table Lookup Box or Parametric Curve Box to use to implement

the nonlinear point transformation for

component zero. The value of tdi shall match the M

parameter of the corresponding box.

This value shall be ignored if ts0 is unequal two.

ts1 4 1,2 Within Lcod, point transformation disable flag

for component 1.

Within Qcod, disables the point transformation within

the range mapping for component 1 and replaces

it by scaling.

Shall be one if used outside of Lcod or Qcod.

td1 4 0..15 Selects the Table Lookup Box or Parametric Curve

Box for component 1. Ignored if ts1 is unequal

two.

ts2 4 1,2 Within Lcod and Qcod, nonlinear point

transformation disable flag for component 2.

Shall be one if used outside of Lcod or Qcod.

td2 4 0..15 Selects the Table Lookup Box or Parametric Curve

Box for component 2. Ignored if ts2 is unequal

two.

ts3 4 1 Reserved for ITU|ISO use.

td3 4 0 Reserved for ITU|ISO use.

Rs 4 0 Reserved for ITU|ISO use.


Xt 4 0..15 Within Lcod, selects the inverse decorrelation

transformation for the L transformatoin. See Table

B-15 for the possible values.

Within Rcod, selects the residual decorrelation

transformation. See Table B-xx for possible choices.

Within Ccod, selects the linear color transformation. See Table B-xx for possible

choices.

Within Qcod, this field shall be one.

Within Scod, this field shall be zero.

Within Acod, this field shall be one.

Table C-12: Lcod,Scod,Ccod, Acod, Rcod and Qcod parameter fields

The Lcod, Rcod and Ccod parameters also specify type of the inverse component decorrelation transformation by the Xt parameter. The Xt parameter of the QCod and ACod parameter sets shall be set to one. The Xt parameter of the Scod parameter field shall be set to the reserved value zero. The Xt parameter of the Rcod parameter set shall be two.

Xt can either select a free-form transformation, the YcbCr transformation specificed in Rec. ITU.T 871|10918-5, the identity transformation or the zero transformation. The free form transformation is defined by the Linear Transformation Box defined in subclause B.9. The selection of the L transformation depends on the value of the cc parameter of the Component Decorrelation Control marker specified in Annex B of 18477-1. The transformation selection procedure as well as all legimitate combinations are defined in tables C-2 through C-5 of Annex C.

The encoding of the Xt parameter is given by the following table:

Value Transformation to be used

0 Reserved for ISO/IEC purposes.

1 The identity transformation shall be used. If employed for the C transformation as Ccod.Xt, the output samples are given by the input sample values divided by 16, but no further linear transformation is applied. If applied for the R transformation as Rcod.Xt, output shall be identical to the input without any scaling and without any linear transformation.

2 The YcbCr to RGB transformation following the specifications of 10918-5 shall be used. This transformation is denoted as ICT in this Recommendation|International Standard.




5..15 The free form transformation defined by the Linear Transformation Box whose M value matches the value of Xt.

Table C-13: Interpretation of the Xt parameter fields of the Merging Specification Box

The Lspec, Rspec fields specify how many refinement scans are included in the corresponding codestream, and how to interpret the samples reconstructed from the codestream. The Lspec field defines refinement scans for the legacy codestream, the Rspec field residual codestream. The Aspec field is reserved for ISO purposes.

Parameter Size (in bits) Values Meaning

Stype 2 0,1 Reserved for ISO/IEC purposes. Shall be zero in Lspec and Rspec, and one

in Aspec.

Ns 2 0 Reserved for ISO/IEC purposes.

dct 4 0 Reserved for ISO/IEC purposes.

h 4 0..8 Specifies the number of refinement scans for the

corresponding codestream.

Within Lspec, h defines Rh.

Within Rspec, h defines Rr.

Within Aspec, h shall be zero.

Rs 4 0 Reserved for ITU/ISO purposes.

Table C-14: Interpretation of the Lspec,Rspec and Aspec parameter sets of the Merging Specification Box.

The Ot field of the merging parameter box specifies the type of output processing that is applied before the final reconstructed sample values are generated from the combination of the legacy and residual sample values. This field is defined by the following table:


Lf 1 0 Reserved for ISO/IEC purposes.

Oc 1 0..1 If 1, output conversion as indicated in Annex K shall apply to the output of the

merging process. Otherwise, the output from the merging process is the

final output.

Ce 1 1 Reserved for ISO/IEC


purposes.

Re 1 0 Reserved for ITU/ISO purposes.

Table C-15: Interpretation of the Ot parameter set of the Merging Specification Box.

Table C-16 lists the output process that is applied depending on the Ce and Oc fields. The notation ψexp(x) indicates the application of the output conversion on the sample value x as specified in Annex K.

Oc field Ce flag Sample value generation Comments

0 1 clamp(Hi + Qi,0,2Rb+8-1) clamping to range.

1 1 ψexp (clamp(Hi + Qi,0,2Rb+8-1)) clamping to range, conversion to float.

All other values Reserved for ISO/IEC purposes.

Table C-16: Interpretation of the Ot parameter set of the Merging Specification Box.

C.11 Linear Transformation BoxThis box defines a free-form table based linear transformation that can be used as L, R or C transformation. This box defines nine parameters which specify the entries in a 3×3 matrix:

( )a11 a12 a13

a21 a22 a23

a31 a32 a33

Up to eleven free form transformations can be defined; its entries can be either integer or floating point values.

The type of this box shall be 0x4D545258, ASCII encoding of "MTRX".


M t a11 a12 a13 a21 a22 a23 a31 a32 a33

Figure C-10: Organization of the Linear Transformation Box


M 4 5..15 Table Destination. Up to eleven free form linear transformations can be

defined. The value in the Xt parameter of the

Merging Specification Box selects the linear

transformation by matching its M value.


t 4 13 Reserved for ISO/IEC purposes.

a11 16 or 32 -32768 to 32767 First parameter of the 3×3 transformation.

a12 16 or 32 -32768 to 32767 Second parameter

a13 through a33 All other parameters in the same format.

Table C-17: Linear Transformation Box Contents

NOTE – Due to scaling that is applied to the L nonlinearity transformations in the integer case, the integer coefficients of the transformation matrix for t=13 can also be understood as fix point numbers which are preshifted by 13 bits.

C.12 Legacy Data Checksum BoxThis box keeps a checksum over the legacy codestream content, i.e. all data except those contained in application markers (APP00 through APP15) using the algorithm defined in Annex F. Decoders may use this data to test for the integrity of the legacy codestream, and thus to test whether the legacy image data still fits to the residual data.

NOTE – If a decoder detects that the checkum over the received data differs from the checksum recorded in this box, it may either abort decoding, or may only decode the legacy image and reject the residual stream or may attempt a full decoding at its disretion.

The type of this box shall be 0x4C43484B, ASCII encoding of "LCHK".


Chk

Figure C-11: Organization of the Legacy Data Checksum Box


Chk 32 Varies Checksum over the legacy part of the codestream, to be computed by the algorithm

defined in Annex F.

Table C-18: Refinement Data Box, Parameters and Sizes


Annex DMulti-Component Decorrelation


In this Annex and all of its subclauses, the flow charts and tables are normative only in the sense that they are defining an output that alternative implementations shall duplicate.

This Annex defines the Multiple Component Decorrelation Transformations available as legacy, residual and color transformations of the decoding process in profiles A, B and C as outlined in Annex A. The legacy transformation is identical to the Multiple Component Decorrelation Transformation of ISO/IEC 18477-1, the Residual and Color transformation are new to this Recommendation|International Standard, and are only applicable in profile C.

A Multiple Component Decorrelation Transformations take one or three input components, and generate from them one or three output components. A task dependent level shift by a value of 2Rs ensures that chroma components are correctly centered and representable by unsigned values. The value of Rs is defined in Table G-3 of subclause G.3.

DecorrelationTransformation

Level Shift

Figure D-1: Input and output of a decorrelation transformation: Input components, output components and the DC level shift.

Table D-1 specifies the decorrelation transformation applicable to the samples reconstructed in the legacy codestream:

Nf

Number of Components

Component Decorrelation Marker present and cc = 0

L-Transformation as represented in block B3 in Annex A. (for all profiles).

1 ignored Inverse Identity Transformation

3 yes Inverse Identity Transformation

3 no Inverse ICT

All other values Reserved for ITU/ISO Purposes

Table D-1: Selection of the Legacy-Transformation

Table D- selects the Residual transformation for profile C that decorrelates the residual data:

Nf

Number of Components Rcod.Xt

R-Transformation as represented by block B8 of profile C specified in

Annex A.

1 1 Inverse Identity Transformation


3 1 Inverse Identity Transformation

3 2 Inverse ICT


Table D-2: Selection of the R-Transformation

Table D-3 selects the C-Transformation for profile C that transforms the precursor image into the target colorspace:

Nf

Number of Components

Ccod.Xt C-Transformation, as represented by block B4 in profile C, see Annex A.

1 1 Scaled Inverse Identity Transformation

3 1 Scaled Inverse Identity Transformation

3 5..15 Free form transformation defined by the Linear Transformation Box

whose M value matches Rcod.Xt.


Table D-3: Selection of the C-Transformation

D.1 Irreversible inverse multi component transformation (ICT) (normative)This transformation is it is identical to the ICT in ISO/IEC 18477-1 for 8-bit LDR data and can be understood as an inverse decorrelation from YCbCr to ITU 601 RGB.

In the following, set I0, I1 and I2 be the input sample values reconstructed from the upsampled components 0, 1 and 2, and let L0,L1 and L2 be the output sample value. Let Rs be the number of level shift scale bits defined by subclause F.3. Then:

L0 = I0 + 1.40200×(I2 − 2Rs)

L1 = I0 − 0.7141362859×(I2 − 2Rs) − 0.3441362861×(I1 − 2Rs)

L2 = I0 + 1.772×(I1 − 2Rs)

D.2 Free-form multi-component transformation (normative)This transformation offers a generic linear transformation that can be applied as Color Transformation. It is applied in this form only in profile C.Let I0,I1 and I2 be the inputs to this transformation and O0, O1 and O2 its output.

O0 = (a11×I0 + a12×I1 + a13×I2 + 216)/217

O1 = (a21×I0 + a22×I1 + a23×I2 + 216)/217

O2 = (a31×I0 + a32×I1 + a33×I2 + 216)/217

NOTE – Unlike the ICT, the free form transformation does intentionally not include an offset shift for I1 and I2..


D.3 Inverse identity transformation (normative)The inverse identity transformation maps its input values to the output without applying an inverse linear decorrelation step. Denote the input value of the identity transformation with I 0 and the output value by L0. The inverse identity transformation then sets

L0 = I0

If more than one component has to be transformed, the above equation(s) apply identically to all components.

D.4 Inverse scaled identity transformation (normative)The scaled inverse identity transformation maps its input values to the output without applying an inverse linear decorrelation step, but applies a scaling step that corrects for the scaling due to the IFDCT.

L0 = (213×I0+216)/217

If more than one component has to be transformed, the above equation(s) apply identically to all components.

D.5 Irreversible forward multi component transformation (ICT) (informative)This transformation is used to decorrelate the low-dynamic range versions of the image before encoding the components by a legacy 10918-1 conforming encoder. It is identical to the forward ICT defined in ISO/IEC 18477-1. Let L 0 to L2 be the input pixel values, and Rs the number of level shift scale bits, then the output I0 to I2 shall be computed as follows:

I0 = 2.9900 × L0 + 0.58700 × L1 + 0.11400×L2

I1 = -0.1687358916 × L1 − 0.3312641084 × L1 + 0.5×L2 + 2Rs

I2 = 0.5 L2 − 0.4186875892 × L1 − 0.08131241085×L2+ 2Rs

D.6 Forward identity transformation (normative)This transformation is used to generate the input of the legacy coding process when the number of components in the frame is one, or the component decorrelation transformation is disabled. If L0 is the input value, then this transformation sets the output value I0 to

I0 = L0

If more than one component is to be transformed, then the above applies to all three components identically.

D.7 Scaled forward identity transformation (informative)This transformation is used to generate the input of the legacy coding process when the number of components in the frame is one, or the component decorrelation transformation is disabled and the FFDCT is used as DCT process. It corrects for the input scaling of the fixed point DCT by pre-shifting the input accordingly. If L0 is the input value, then this transformation sets the output value I0 to

I0 = L0×16

If more than one component is to be transformed, then the above applies to all three components identically.


Annex EPoint Transformation

This Annex forms a normative and integral part of this Recommendation | International Standard.)


E.1 Point Transform (normative)This subclause defines the implementation of the point transformations used by the decoding process. These processes, either driven by an explicit table or a parametric curveor a linear scaling, take an input sample value and apply a, in general, nonlinear scalar transformation to generate an output value. The transformation is applied in the spatial domain and is independent of the position of the sample value within the image.

The purpose of the Li curves is to provide a prediction of the HDR image based on the legacy codestream and R i provide means to adapt the HDR residual data to the output. The decoder table lookup process proceeds in the following steps:

First, the input value v is clamped to the range [0,2Rw-1]. Rw is defined by Table G-4 in Annex G.

Further processing depends on the tsi flag of the Lcod, Qcod or Rcod parameter set of the Merging Specification Box and whether the nonlinear process is implemented by either an Table Lookup box, or a Parametric Curve box.

o In case inverse tone mapping is specified by a Parametric Curve box (see subclause B.7) or ts i is one, the clamped value v is scaled to range [0,1] by a division::

x = v / (2Rw – 1)

o Second, the parametric curve specified by the Parametric Curve box is applied, generating from the input value x an output value f(x). If no parametric curve box is present, or tsi is one, f shall be the identity, i.e. f(x) = x.

o Third output scaling applies. Table G-5 specifies the output scale bits R t. The value of x is multiplied by (2Rt+8 -1).

If the tsi field is two, a Table Lookup box is used to specify the point transformation. For that, the clamped value v is first rounded to integer and this integer is used directly as index into the table. The number of table enties shall match to the number of possible inputs, which is 2Rw-1. Output values are integers in the range of 0 to 2Rt+8-1. The value of Rt is recorded in the inverse tone mapping box and shall match the value specified in Table G-5 in Annex G.


Annex FChecksum Computation

This Annex forms a normative and integral part of this Recommendation | International Standard.)


This annex specifies how to compute a checksum over legacy data to ensure integrity of the data, and to verify whether the data encoded by the legacy codestream is in sync with the residual data recorded in the APP 11 marker segments or has been edited or modified. A non-matching checksum is an indicator that the legacy image has been changed. Decoders should check for the correctness of the checksum and may, if this checksum does not match, either abort coding or display only the LDR version of the image or may perform any other recovery at their discretion.

F.1 Data Covered by the Checksum (normative)

The checksum algorithm, specified in subclause F.2 is to be computed over the following data:

The checksum starts with and includes the first SOF marker of the legacy codestream, starting with the marker code itself, and proceeds until the last byte of the entropy coded segments of the legacy codestream. However, the following markers in the Tables/misc section upfront the frame header are not to be included in the checksum computation:

Any APPn marker, the marker indiciator, their length and their contents is to be skipped for the purpose of the checksum computation.

Any COM marker, the marker indicator, its length and its contents is to be skipped.

The EOI marker shall not be included in the computation of the checksum.

NOTE – By this definition, RSTn (restart markers) shall be included in the computation of the checksum.

F.2 Computation of the Checksum (normative)The checksum computation is initialized by resetting two cumulative checksum variables C0,C1 to zero:

C0=0

C1=0

For each byte b included in the checksum, the following operations shall be performed:

C0=(C0 + b) umod 255

C1=(C1 + C0) umod 255

The final output value of the checksum algorithm is then computed by

chksum = C1×256+C0


Annex GInput and Output Scaling of the Functional Blocks



This Annex defines the nominal range for the various functional blocks in a ISO/IEC 18477-2 compliant decoder in profile C. If an integer only implementation is intended, the scales also define the required bit depths for the input and output channels of a functional block.

Figure G-1 lists the input and output scales in powers of two allocated to the functional blocks of a JPEG XT decoder. The values of Rr and Rd are defined in the following. The values of Rh and Rb are parameters of the Merging Specification Box defined in subclause B.8 as Lspec.h and Rspec.h.

LegacyDecoder

++

+

Refine-mentScan

Up-sampling L-Trafo

LegacyDecoder

Up-sampling

scalescale

scaleR-Trafo

++

+

L1-NLTL2-NLT

L3-NLT

OutputConversion

InverseQNT

8

Rh

8+Rh 8+Rh8+Rb+4

8+Rb

8/12 8/12 + Rr 8+Rb+Rd 8+Rb

8+Rh+Rd

C-TrafoInverseDCT

InverseDCT

++

+

Refine-mentScan

8/12 + Rr

InverseQNT

Rr

Figure G-1: Input and output scales in powers of two of the functional blocks of the decoder

G.1 Decorrelation Transformation Level Shift BitsThe inverse decorrelation transformations defined in Annex D require a level shift to shift the chroma components back into range. The level shift is defined in terms of the level shift bits Rs, specified in Table G-1.

Application of the Multiple Component Decorrelation Transformation as

Level Shift Bits Rs

L (Decorrelation transformation in the legacy codestream) 8+Rh-1

R (Block B8 of profile C) 8+Rb-1

C (Block B4 of profile C) None of the valid choices for the C transformation uses a level shift, hence Rs does not matter.

Table G-1: Level Shift Bits Rs


G.2 Pointwise Nonlinearity Input Scaling and Input Clamping RangeThe point transformation as specified by subclause D applied in profile C requires an input clamping to the range [0,2 Rw-1] before the table lookup or parametric curve mapping is applied. The number of input clamp bits is defined in Table F-4 and depends on the table destination of the curve.

Nonlinearity Excess input clamp bits Rw

L (Block B4 of profile C) 8+Rh

(Rh is defined in subclause C.10)

R (Block B7 of profile C) 8+Rb

(Rb is defined in subclause C.10)Table G-2: Inverse Tone Mapping Input Clamp Bits Rw

G.3 Pointwise Nonlinearity Output Scale RangeAfter applying the pointwise nonlinearity transformation in profile C by the L,R transformation, the output is scaled by a factor (2Rt+8-1). The number of output scale bits Rt is defined in Table G-3.

Point Transformation Output Range Bits Rt

L (Block B3 of profile C) Rb+4

R (Block B7 of profile C) Rb

Table G-3: Ouput Scale Bits Rt of the Nonlinear Pixel Transformation

G.4 DCT DC Level ShiftThe DCT operations defined in Annex J require an additional shift of the DC value to transform it from a non-negative value into a signal that is symmetric around zero. This level shift depends on whether the DCT is applied on the legacy codestream or on the residual codestream, and on the codestream coding options. The required DC level shift is defined in Table G-4. More details on the DCT level shift operation are found in Annex J.

DCT in which codestream DCT DC level shift

Legacy codestream 28+ Rh-1

Residual codestream 28+ Rr-1

Table G-4: DCT Level Shift Sdc


Annex HEntropy Coding of Residual Data



This Annex describes the syntax of the data encapsulated in the Residual Data Box, also called the Residual Codestream in the following. For profiles B and C, the Residual Data Box is formed by concatenating payload data of one or several JPEG XT marker segments by the mechanisms specified in Annex C. For profiles A, the residual codestream is contained in type 2 residual segments defined in Annex B.

NOTE – Unlike refinement coding, the residual codestream for profiles B and C is contained in a single box which may, however, spread over several JPEG XT Extension marker segments.

The residual codestream consists of a frame header, followed by at least one scan over the residual data. entropy decoding proceeds by one of the processes defined in ITU.T Rec.T.81|ISO/IEC 10918-1.

The dimensions of the residual image and the legacy image shall always be identical, while the number of components and component precisions may differ from the values in the frame header of the legacy codestream for some profiles. For details on the allowed deviations, see the profile description in Annex A. The subsampling parameters Hi and Vi may also differ between residual and legacy codestream..


Annex IEntropy Coding of Refinement Data



This Annex defines the procedures for decoding and encoding of images using an extended bitdepths in the DCT domain. If this coding method is used, the entropy coded data visible to legacy decoders describes an image with a precision of eight bits per sample, which is extended by a refinement coding procedure described in this Annex to up to 16 bits. Refinement coding can be combined with other coding methods, for example the residual coding method defined in Annex G, then allowing even higher bit depths of up to 16 bits per sample. Refinement coding provides a simple and efficient technique to extend the bit depths of the coded samples in a backwards compatible way even without requiring a residual codestream.

Refinement coding is signaled by a non-zero value of the h parameter in the Rspec or Lspec parameters sets of the Merging Specifications box; the value of h describes the additional bit precision becoming available due to the refinement scan described in this Annex. It may take values from zero – refinement scan disabled – to eight,.

Refinement of which data Box Type Used Number h of Refinement Scans specified in

Scan Type

Legacy Refinement Data Box Lspec.h Refinement (modified successive approximation scan)

Residual Residual Refinement Data Box

Rspec.h Refinement(modified successive approximation scan)

Table H.1- Refinement Scan Types and Encapsulating Box Types

If the refinement scan is enabled by setting h ≠ 0 in Rspec or Lspec, the quantization and inverse quantization process defined in ITU.T Rec. T-81| ISO/IEC 10918-1 have to be modified to include the additional bits, and the entropy coding and decoding procedures includes additional scans over the image to represent such bits. The entropy coded data of each scan is located in a separate Refinement Data Box or Residual Refinement Data Box, as specified in subclause B.4, which remains invisible to legacy decoders.

The DCT transformation and multi-component decorrelation transformation remain unchanged except that the DC level shift has to be modified to include the additional bits. That is, while subclauses F.1.1.3 and F.2.1.5 of Rec. ITU.T T.81|ISO/IEC 10918-1 specify a level shift of 27 for a sample precision of eight bit, the level shift is replaced by a shift of S dc

if the refinement scans are included in the decoding process. Details on the level shift of the DCT process are specified in full detail in Annex J, the value of Sdc is defined in subclause G.4. A decoder that chooses to ignore refinement scans will continue to operate with the level shift as indicated in the legacy standard and will be able to reconstruct a visible image, but will not take advantage of the enhanced bit precision. NOTE – Additional care must be taken to avoid overflow conditions in the DCT as the range of the input and output data is extended.

H.1 Modifications of the inverse quantization process for refinement decoding (normative)

The regular dequantization process in the absence of refinement scans multiplies the entropy decoded DCT coefficient value Si,j at block position i,j with the quantization matrix entry qi,j to get the DCT coefficient Ci,j. In the presence of a residual scan, h additional bits Ti,j are decoded by the refinement scan that shall be included as h least significant bits in the reconstruction of the DCT coefficient Ci,j as follows:

Ci,j = (2h×Si,j + Ti,j) × qi,j

NOTE – While the decoding process will ensure that T0,0 will always be positive or zero, all other Ti,j may be positive, negative or zero.


NOTE – An equivalent implementation strategy for the above inverse quantization procedure is to decode the data from the legacy scans represented by the entropy coded data following the SOS markers as if the point transformation parameter of the legacy scans would be set to Al+h (or 0+h) instead of Al (or zero). Al is the point transformation parameter of the scan header, see subclause A.4 and B.2.3 of ITU.T Rec.T.81| ISO/IEC 10918-1. Data decoded by the refinement scan uses an unmodified decoding procedure and is then automatically placed in the least significant bits of Si,j.

H.2 Modifications of the quantization process for residual coding (informative)In the presence of refinement scans, the quantization procedure generates from the DCT coefficients C i,j two data sets, the legacy quantized data Si,j

which will be encoded by the entropy coding methods provided by ITU.T Rec.T-81|ISO/IEC 10918-1, and the refinement data Ti,j which undergoes refinement coding. Si,j and Ti,j are computed from the DCT coefficients Ci,j and the quantization matrix qi,j as follows:

Xi,j = sign(Ci,j)×|Ci,j|/qi,j + 0.5

S0,0 = X0,0/2h T0,0 = X0,0 – 2h×S0,0

Si,j = sign(Xi,j) × |Xi,j|/2h Ti,j = Xi,j – 2h×Si,j for (i,j) ≠ (0,0)

NOTE – The division and rounding algorithm for computing S i,j generates a "fat zero" for AC coefficients, i.e. a dead zone of twice the size of a regular quantization bucket. DC coding does not follow this convention to avoid drift errors when encoding. The rounding conventions picked here are intentionally identical to the rounding procedure used when encoding data with the successive approximation of the progressive scan defined in ITU.T Rec.T.81|ISO/IEC 10918-1, and this is, in fact, one possible implementation strategy for refinement coding: The legacy codestream is created by an encoder as if the successive approximation parameter would be Al+h (or h) instead of Al (or zero), the remaining bits are then encoded by the refinement scan defined in this Annex.

H.3 Decoding process for entropy coded refinement data (normative)This subclause defines the decoding procedure for the refinement data T i,j that extends the precision of the DCT coefficients by h bits. Refinement data is encapsulated in Refinement Data Boxes, which are embedded in the legacy codestream by means of the mechanisms defined in Annex B. Each Refinement Data Box contains exactly one scan over the data. A JPEG XT image using refinements will thus contain multiple Refinement Data Boxes.

The contents of the Refinement Data box consists of marker segments that define the Huffman tables for refinement data coding, followed by a scan header and entropy coded data. The syntax of the contents of the Refinement Data box follows ITU.T Rec.T.81|ISO/IEC 10918-1, except that neither a frame header nor DQT marker segments shall be present. Restart markers, if indicated in the legacy codestream, shall however be included, following the syntax defined in the legacy standard, and a DHT marker segment defining the Huffman code table required to decode the data shall be present as well. The scan included in this stream is decoded by a modification of the successive approximation decoding algorithm using Huffman decoding defined in Annex G of ITU.T Rec.T.81|ISO/IEC 10918-1, regardless of the frame type indicated in the legacy codestream (i.e. regardless of whether the frame type is baseline, extended sequential or progressive).

[Tables/misc]

Scan header

ECS0 [RST0] … ECSlast [RSTlast]

Figure H-1: Syntax of the Contents of the Refinement Data Continuous Codestream

The entropy coding algorithm used for refinement decoding is identical to the successive approximation decoding defined in Annex G of ITU.T Rec.T.81|ISO/IEC 10918-1 except that the scan coefficients ZZ'(K) have to be initialized before starting the first refinement scan by the entropy decoded data of the legacy codestream shifted by 2h bits, i.e.

ZZ'(K) = 2h × ZZ(K)

where ZZ(k) is the value of the legacy decoded data at the k-th position of the zig-zag scan defined in Figure 5 of ITU.T Rec.T.81|ISO/IEC 10918-1. The refinement data Ti,j after decoding is then defined by the output ZZ'(K) of all successive approximation scans included in the JPEG Extensions refinement markers:

T(i,j)(k) = ZZ'(K) - 2h × ZZ(K)

NOTE – An alternative implementation strategy would first decode the legacy portion of the codestream using either the baseline, sequential or progressive decoding procedure, but with a point transformation/successive approximation parameter of Al+h (or h) for each scan included in the legacy codestream. The refinement scan extends these scans on the same scan buffer by using a successive approximation scan on the same scan buffer while using the successive approximation parameter exactly as inidicated in the scan headers included in the JPEG Extensions Refinement marker. Note further that the refinement scan always uses a successive approximation scan pattern, regardless of the frame type of the legacy codestream.


H.4 Encoding of Refinement Data (informative)This Annex specifies an encoding process of refinement data that is compatible to the decoding process defined in the above subclause. In the first step, the scan pattern ZZ'(K) is initialized with the quantized data encoded in the legacy codestream, Si,j, plus the refinement data Ti,j:

ZZ'(K) = 2h × ZZ(K) + T(i,j)(k)

where ZZ(K) is the legacy data at the k-th position in the zig-zag scan represented by the legacy codestream. Then, one or several successive approximation scans following the specifications in Annex G of ITU.T Rec.T.81|ISO/IEC 10918-1 encode the h least significant bits of the data, i.e. the Al parameter of the scans generated by the refinement scans will vary between h-1 and zero.

The resulting entropy coded data of each scan, together with the scan headers defining the scan pattern and the DHT marker segments defining the Huffman tables, but without any frame header or DQT markers, form the contents of the Refinement Data box. The output of each scan, together with the scan header and the DHT marker segments are encapsulated in such a box, and the box is split into one or multiple JPEG XT Extension marker segments by the mechanism described in subclause B.3.

NOTE – An alternative implementation would first use a legacy scan process with a point transformation of Al+h (or h) bits to encode the LDR data, and would then run a successive approximation scan on the same data buffer with an unmodified parameter of Al. Note further that the refinement encoding procedure always uses the successive approximation scan of the progressive coding mode, regardless of the frame type of the legacy codestream.


Annex JDiscrete Cosine Transformation



This Annex defines the inverse DCT transformation. The floating point IDCT is identical to the DCT process specified in ITU.T Rec.T-81|ISO/IEC 10918-1, and its full specification is not repeated here. It has to be slightly modified for refinement coding.

.

The DCT operation requires a transformation-dependent prescaling of its input data to align it to the requirements of the color decorrelation transformation in its corresponding decoding path. The DCT operating in the legacy codestream requires a prescaling by 2Rd dependent on the L-Transformation, the DCT in the residual path a prescaling by 2Rd

depending on the R-Transformation. The values of Rd are specified in Table F-1.

J.1 Overview on the Inverse DCT (IDCT) Transformation (normative)The Inverse DCT Transformation (IDCT) takes integer samples Q i,j in the form of an 8×8 block, and generates from that integer samples Yi,j also organized in an 8×8 block. The input samples Qi,j are integers generated by the dequantization procedure described in ITU.T Rec.T.81|ISO/IEC 10918-1, the output samples Yi,j are integer values as well. The DC level shift specified in subclauses A.3.., F.1.1.3 and F.2.1.5 of Rec. ITU.T 81 | ISO/IEC 10918-1 are included in the DCT algorithm specified by this Annex, thus forwards and inverse level shift need not to be performed if the algorithms shown here are followed. Modifications for refinement coding are also included.

DCT(all versions)

Level Shift

64 64

Figure 2: IDCT inputs and outputs

The steps of the algorithms are as follows:

Level-shift the DC coefficient Q0,0 and scale all coefficients to the precision of the decorrelation transformation; this step implements the inverse DC level shift of subclauses A.3.1 and F.2.1.5 of Rec. ITU.T Rec.T.81|ISO/IEC 10918-1 as part of the DCT transformation. For that, set:

X0,0 = Q0,0 + Sdc×23.

Xi,j = Qi,j for (i,j)≠(0,0)

Run an inverse DCT process on the 8×8 block as specified in subclause A.3.3 of ISO/IEC 10918-1.

J.2 Forward Floating Point DCT (informative)The forwards DCT process is compatible to the DCT process and level shifting described in subclause A.3.3 of Rec. ITU.T 81|ISO/IEC 10918-1. However, the following modifications apply to the process defined in this Recommendation|International Standard:

The DC Level Shift shall be Sdc instead of 2P-1, i.e. the level shift are scaled by the decorrelation scale factor


Annex KOutput Transformation



This Annex defines the process that computes the final output of the image reconstruction process given the sample values Fi computed by the merging process specified in subclause E.1.

The output of the reconstruction process is computed from the ouput of the merging process F i (see subclause E.1) and the flags Ot.Lf, Ot.Ce and Ot.Oc, all of them defined by the Merging Specification Box (see subclause B.8).

Lf flag Ce flag Oc flag Reconstructed value Comments

0 1 0 round(clamp(Fi,0,2Rb+8-1)) round() is an implementation

defined rounding to integer values

0 1 1 Ψexp(round(clamp(Fi,0,2Rb+8-1))) The function Ψexp is specified in

subclause K.1

all other combinations Reserved for ITU|ISO/IEC purposes

Table K-1: Output Transformation Depending on the Lf, Ce and Oc flags

K.1 Half-Exponential Output transformation (normative)

This annex defines an optional output transformation denoted by Ψexp that may help to improve the compression performance for floating point sample values. This output transformation is effective whenever the Oc parameter of the Merging Specification box is one, see subclause B.8 for details on this box. The Ψexp map implements a half-exponential map that converts its integer input to a floating point output. This map has an exact inverse Ψlog that allows also lossless coding of floating point samples.

The Ψexp map is defined by the following steps: Round the input value x to the nearest integer value The resulting integer value shall be represented in a 16-bit binary representation. The resulting bit-pattern shall be re-interpreted as an IEC 60559 (IEEE 754) floating point number. This floating

point number is the final output of the output transformation.

NOTE – It can be seen that this re-interpretation of bit-patterns on non-negative integers is equivalent to a piecewise linear approximation of the exponential function f(x) (2y-15) with y=x×2-10. Negative numbers never appear as the output of this process due to the output clamping.

Annex LReferences


Documents

Template for common text ISO/UIT-T - IPSJ/ITSCJ · Web viewA descriptive term for DCT-based encoding and decoding processes in which additional capabilities are added to the baseline