****************************

* ISO 14496-1 Media Format *

****************************

- values use big endian (network) byte order

- general terms: integer = signed value

- general values: byte/char/octet = 8-bit value; short/word = 16-bit value; long = 32-bit value

- fixed point values: value made up of an integer for the whole part and an unsigned value for the fractional part

- binary values: base-2 long unsigned values (digits 0 and 1)

- octal values: base-8 long unsigned values (digits 0 through 7)

- decimal values: base-10 long unsigned values (digits 0 through 9)

- hexadecimal (hex) values: base-16 long unsigned values (digits 0 to 9 and A to F)

- box offsets: values relative to boxes only and are used to skip to the next box

- sample chunk/block offsets: values relative to the start of the file

- UUID: a hexadecimal Universal Unique Identifier that is 128 bits in length
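
The value conventions above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the 16.16 split for long fixed point values and the 8.8 split for short fixed point values are assumptions, since the notes do not state where the binary point sits.

```python
import struct


def read_u16(buf, pos=0):
    """Big-endian unsigned 16-bit value (short/word)."""
    return struct.unpack_from(">H", buf, pos)[0]


def read_u32(buf, pos=0):
    """Big-endian unsigned 32-bit value (long)."""
    return struct.unpack_from(">I", buf, pos)[0]


def fixed_16_16(raw):
    """Assumed 16.16 layout: signed 16-bit whole part, unsigned 16-bit fraction."""
    whole = raw >> 16
    if whole >= 0x8000:          # two's complement sign of the whole part
        whole -= 0x10000
    return whole + (raw & 0xFFFF) / 65536.0


def fixed_8_8(raw):
    """Assumed 8.8 layout: signed 8-bit whole part, unsigned 8-bit fraction."""
    whole = raw >> 8
    if whole >= 0x80:
        whole -= 0x100
    return whole + (raw & 0xFF) / 256.0
```

With these assumptions a stored long of 0x00010000 reads as 1.0 and a stored short of 0x0100 reads as 1.0, which matches the "normal = 1.0" values quoted for playback speed and volume.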

FILE INFO

Suffixes = ".mp4", ".m4a"; Mac OS Type = "mpg4"; Mac OS Creator = "TVOD";

MIME = "video/mp4" and "audio/mp4"

Standard single fork binary file. A resource fork is used only on HFS/HFS+ volumes, where it stores Mac specific file info and QuickTime movie previews and can also hold a QuickTime version of the file's header. That header copy is only valid if the file is transcoded to the QuickTime format, as other storage media may not use or support multiple file forks.

Unknown boxes can be safely skipped over, and most boxes can be in any order. Most lowercase long ASCII text strings used for box names/types were pre-defined by Apple, and any others are reserved for future use by Apple and the ISO. Custom boxes are discouraged; use only ISO defined ones.

Box type strings can be either standard length (4 byte) atom type strings or a 16 byte (128-bit) UUID. UUIDs are appended following the standard type of 'uuid'. If the box offset is equal to one, then a 64-bit box offset is appended after the 4 byte box type string and before any UUID.

Wide boxes used in the 'mdat' box can be used with other box types as needed.
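
A minimal sketch of reading one box header under these rules follows. The function name and return shape are hypothetical. The sketch assumes the ISO/IEC 14496-12 field order: 32-bit size, 4-byte type, then the optional 64-bit size, then the optional 16-byte UUID.

```python
import struct


def parse_box_header(buf, pos=0):
    """Return (box_type, box_size, header_len, uuid) for the box at pos.

    A returned box_size of 0 means the box runs to the end of the file.
    """
    size, box_type = struct.unpack_from(">I4s", buf, pos)
    header_len = 8
    if size == 1:
        # Size field of 1 signals that a 64-bit size follows the type.
        size = struct.unpack_from(">Q", buf, pos + header_len)[0]
        header_len += 8
    uuid = None
    if box_type == b"uuid":
        # Extended type: 16 bytes (128 bits) after the size fields.
        uuid = bytes(buf[pos + header_len:pos + header_len + 16])
        header_len += 16
    return box_type, size, header_len, uuid
```

The payload of the box then starts at pos + header_len, and the next sibling box starts at pos + box_size.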

The term QUICKTIME denotes an unused atom/box or item from the format that this one was based upon. The terms 3GPP and APPLE denote custom additions to the format.

Even though the original ISO specification is static, Apple members have added bits from the 3GPP and iTunes versions as extensions, such as those in parts 10 and 12.

FILE IDENTIFICATION

* 8+ bytes file type box = long unsigned offset + long ASCII text string

'ftyp'

-> 4 bytes major brand = long ASCII text main type string

-> 4 bytes major brand version = long unsigned main type revision value

-> 4+ bytes compatible brands = list of long ASCII text used technology

strings

- types are ISO 14496-1 Base Media = isom ; ISO 14496-12 Base Media

= iso2

- types are ISO 14496-1 vers. 1 = mp41 ; ISO 14496-1 vers. 2 = mp42

- types are quicktime movie = 'qt  ' (two trailing spaces) ; JVT AVC = avc1

- types are 3G MP4 profile = '3gp' + ASCII value ; 3G Mobile MP4 =

mmp4

- types are Apple AAC audio w/ iTunes info = 'M4A ' ; AES encrypted

audio = 'M4P '

- types are Apple audio w/ iTunes position = 'M4B ' ; ISO 14496-12

MPEG-7 meta data = 'mp71'

- NOTE: All compatible with 'isom', vers. 1 uses no Scene Description

Tracks,

vers. 2 uses the full part one spec, M4A uses custom ISO 14496-12

info,

qt means the format complies with the original Apple spec, 3gp uses

sample

descriptions in the same style as the original Apple spec.
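
The file type box layout above (major brand, major brand version, then a list of compatible brands) can be read as in the following sketch. The function name is hypothetical, and the payload is assumed to be the box content after the 8-byte offset and 'ftyp' type string.

```python
import struct


def parse_ftyp(payload):
    """Split 'ftyp' content into (major_brand, version, compatible_brands)."""
    major, version = struct.unpack_from(">4sI", payload, 0)
    # Compatible brands are packed 4-byte ASCII strings up to the end of the box.
    brands = [
        payload[i:i + 4].decode("ascii")
        for i in range(8, len(payload) - 3, 4)
    ]
    return major.decode("ascii"), version, brands
```

A reader would typically check whether a brand it understands, such as 'isom' or 'mp42', appears either as the major brand or in the compatible list.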

FILE MEDIA DATA

Note: if any box grows in excess of 2^32 bytes (about 4.29 GB), the box size can be extended to 64 bits (up to about 18.4 EB) by setting the box size to 1 and appending a new 64 bit box size.

This is why empty 'wide' boxes may be found on either side of this box header: they reserve space for future expansion of the sample data.

By setting the box size to 0, the media data box is open ended and extends to the end of the file.

* 8+ bytes media (sample) data box = long unsigned offset + long ASCII

text string 'mdat'

-> 8 bytes larger file offset place holder box

= long unsigned offset set to 8 + long ASCII text string 'wide'

OR

-> 8 bytes wider mdat box offset = 64-bit unsigned offset

- only if mdat standard offset set to 1

-> Sample data = hex dump

- Media with multiple tracks have sample data interleaved unless

preloaded.
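
The size rules for the media data box (0 = extends to the end of the file, 1 = a 64-bit size follows) matter most when walking the top level of a file. The following is a sketch only, with a hypothetical function name, that lists top-level boxes from any seekable binary stream.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (type, start, total_size) for each top-level box in the stream."""
    end = stream.seek(0, io.SEEK_END)
    pos = 0
    while pos + 8 <= end:
        stream.seek(pos)
        size, box_type = struct.unpack(">I4s", stream.read(8))
        if size == 1:
            if pos + 16 > end:
                break                      # truncated 64-bit size field
            size = struct.unpack(">Q", stream.read(8))[0]
        elif size == 0:
            size = end - pos               # open ended: runs to end of file
        if size < 8:
            break                          # malformed size, stop walking
        yield box_type.decode("latin-1"), pos, size
        pos += size
```

Usage would be along the lines of `for name, start, size in iter_top_level_boxes(open(path, "rb"))`, where `path` is whatever file is being inspected.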

UNUSED SPACE OR DATA TO BE DELETED/REUSED WITHIN FILE

* 8+ bytes free space (current) box

= long unsigned offset + long ASCII text string 'free'

* 8+ bytes skip over (older) box

= long unsigned offset + long ASCII text string 'skip'

* 8+ bytes widen (lengthen) file box

= long unsigned offset + long ASCII text string 'wide'

EXTERNAL MPEG-7 META DATA ONLY

* 8+ bytes optional ISO/IEC 14496-12 presentation meta data box

= long unsigned offset + long ASCII text string 'meta'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

* 8+ bytes ISO/IEC 14496-12 handler reference box

= long unsigned offset + long ASCII text string 'hdlr'

- this box must be toward the start of the meta box

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 4 bytes QUICKTIME type = long ASCII text string

(eg. Media Handler = 'mhlr')

-> 4 bytes subtype/meta data type = long ASCII text string

- types are MPEG-7 XML = 'mp7t' ; MPEG-7 binary XML = 'mp7b'

- type is APPLE meta data for iTunes reader = 'mdir'

-> 4 bytes QUICKTIME manufacturer reserved = long ASCII text string

(eg. Apple = 'appl' or 0)

-> 4 bytes QUICKTIME component reserved flags = long hex flags (none

= 0)

-> 4 bytes QUICKTIME component reserved flags mask = long hex mask

(none = 0)

-> component type name ASCII string

(eg. "Meta Data Handler" - no name = zero length string)

-> 1 byte component name string end = byte padding set to zero

- note: the quicktime spec uses a Pascal string

instead of the above C string

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 XML box

= long unsigned offset + long ASCII text string 'xml '

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> MPEG-7 XML meta data = text dump

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 binary XML box

= long unsigned offset + long ASCII text string 'bxml'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> MPEG-7 encoded XML meta data = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item location box

= long unsigned offset + long ASCII text string 'iloc'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 1 nibble size of access offsets = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble size of data lengths = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble size of starting offset = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble reserved = 4 bits set to zero

-> 2 bytes number of locations = short unsigned index total

-> 2+ bytes item reference = short unsigned id

-> 2+ bytes stream data reference = short unsigned index from 'dref'

box

- if meta data item in same file set to zero

-> 1-8+ bytes starting offset = byte - dlong unsigned offset

-> 2+ bytes number of access points = short unsigned index total

-> 1-8+ bytes access offset = byte - dlong unsigned relative offset

(relative to starting offset)

-> 1-8+ bytes data length = byte - dlong unsigned length

* 8+ bytes optional ISO/IEC 14496-12 primary item box

= long unsigned offset + long ASCII text string 'pitm'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

* 8+ bytes optional ISO/IEC 14496-12 item encryption box

= long unsigned offset + long ASCII text string 'ipro'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes number of encryption boxes = short unsigned index total

* 8+ bytes ISO/IEC 14496-12 encryption scheme info box

= long unsigned offset + long ASCII text string 'sinf'

- if meta data encrypted to ISO/IEC 14496-12 standards

* 8+ bytes ISO/IEC 14496-12 original format box

= long unsigned offset + long ASCII text string 'frma'

-> 4 bytes description format = long ASCII text string

* 8+ bytes optional ISO/IEC 14496-12 IPMP info box

= long unsigned offset + long ASCII text string 'imif'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> IPMP descriptors = hex dump from IPMP part of ES Descriptor

box

* 8+ bytes optional ISO/IEC 14496-12 scheme type box

= long unsigned offset + long ASCII text string 'schm'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0 ; contains URI if flags = 0x000001)

-> 4 bytes encryption type = long ASCII text string

- types are 128-bit AES counter = 'ACM1' ; 128-bit AES

FS = 'AFS1'

- types are NULL algorithm = 'ENUL' ; 160-bit HMAC-SHA-1

= 'SHM2'

- types are RTCP = 'ANUL' ; private scheme = ' '

-> 2 bytes encryption version = short unsigned version

-> optional scheme URI string = UTF-8 text string

(eg. web site)

-> 1 byte optional scheme URI string end = byte padding set

to zero

* 8+ bytes ISO/IEC 14496-12 scheme data box

= long unsigned offset + long ASCII text string 'schi'

-> encryption related key = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item information box

= long unsigned offset + long ASCII text string 'iinf'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

-> 2 bytes encryption box array value = short unsigned index

-> item name or URL string = UTF-8 text string

-> 1 byte name or URL c string end = byte value set to zero

-> item mime type string = UTF-8 text string

-> 1 byte mime type c string end = byte value set to zero

-> optional item transfer encoding string = UTF-8 text string

-> 1 byte optional transfer encoding c string end = byte value set

to zero

FILE MEDIA HEADER

Note: the header is safer when stored at the beginning of the file or in another file fork as HFS resource type 'moov'; ID any.

The advantage of using another file fork is that the header can be lengthened without recalculating the sample offsets; otherwise a new header must be written at the end of the file.

* 8+ bytes movie (presentation) box = long unsigned offset + long ASCII

text string 'moov'

* 8+ bytes QUICKTIME movie data reference atom

= long unsigned offset + long ASCII text string 'mdra'

- if this is used no other atoms or boxes should be present at this

level

* 8+ bytes data reference atom

= long unsigned offset + long ASCII text string 'dref'

-> 4 bytes reference type name = long ASCII text string

- types are file alias = 'alis' ; resource alias = 'rsrc' ;

- types are url c string = 'url '

-> 4 bytes reference version/flags

= byte hex version (current = 0) + 24-bit hex flags

- some flags are external data = 0x000000 ; internal data =

0x000001

-> mac os file alias record structure

OR

-> mac os file alias record structure plus resource info

OR

-> url c string = ASCII text string

-> 1 byte url c string end = byte value set to zero

* 8+ bytes QUICKTIME compressed moov atom

= long unsigned offset + long ASCII text string 'cmov'

- if this is used no other atoms should be present

as this is for an entire compressed movie resource

* 8+ bytes data compression atom

= long unsigned offset + long ASCII text string 'dcom'

-> 4 bytes compression code = long ASCII text string

- compression codes are Deflate = 'zlib' ; Apple Compression

= 'adec'

* 8+ bytes compressed moov data atom

= long unsigned offset + long ASCII text string 'cmvd'

-> 4 bytes uncompressed size = long unsigned value

-> entire compressed movie 'moov' resource = hex dump

* 8+ bytes QUICKTIME reference movie record atom

= long unsigned offset + long ASCII text string 'rmra'

- if this atom is used it must come first within the movie resource

box

* 8+ bytes reference movie descriptor atom

= long unsigned offset + long ASCII text string 'rmda'

* 8+ bytes reference movie data reference atom

= long unsigned offset + long ASCII text string 'rdrf'

-> 4 bytes reference version/flags

= byte hex version (current = 0) + 24-bit hex flags

- some flags are external data = 0x000000 ; internal data

= 0x000001

-> 4 bytes reference type name = long ASCII text string (if

internal = 0)

- types are file alias = 'alis' ; resource alias = 'rsrc' ;

- types are url c string = 'url '

-> 4+ bytes reference data = long unsigned length

-> mac os file alias record structure

OR

-> mac os file alias record structure plus resource info

OR

-> url c string = ASCII text string

-> 1 byte url c string end = byte value set to zero

* 8+ bytes optional reference movie quality atom

= long unsigned offset + long ASCII text string 'rmqu'

-> 4 bytes queue position = long unsigned value from 100 to

0

* 8+ bytes optional reference movie cpu rating atom

= long unsigned offset + long ASCII text string 'rmcs'

-> 4 bytes reserved flag = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes speed rating = short unsigned value from 500 to 100

* 8+ bytes optional reference movie version check atom

= long unsigned offset + long ASCII text string 'rmvc'

-> 4 bytes flags = byte hex version + 24-bit hex flags (current

= 0)

-> 4 bytes gestalt selector = long ASCII text string

(eg. quicktime = 'qtim')

-> 4 bytes gestalt min value = long hex value

(eg. QT 3.02 mac file version = 0x03028000)

-> 4 bytes gestalt no value = long value set to zero

OR

-> 4 bytes gestalt value mask = long hex mask

-> 4 bytes gestalt value = long hex value

-> 2 bytes gestalt check type = short unsigned value

(min value = 0 or mask = 1)

* 8+ bytes optional reference movie component check atom

= long unsigned offset + long ASCII text string 'rmcd'

-> 4 bytes flags = byte hex version + 24-bit hex flags (current

= 0)

-> 8 bytes component type/subtype

= long ASCII text string + long ASCII text string

(eg. Timecode Media Handler = 'mhlrtmcd')

-> 4 bytes component manufacturer = long ASCII text string

(eg. Apple = 'appl' or 0)

-> 4 bytes component flags = long hex flags (none = 0)

-> 4 bytes component flags mask = long hex mask (none = 0)

-> 4 bytes component min version = long hex value (none = 0)

* 8+ bytes optional reference movie data rate atom

= long unsigned offset + long ASCII text string 'rmdr'

-> 4 bytes flags = byte hex version + 24-bit hex flags (current

= 0)

-> 4 bytes data rate = long integer bit rate value

- common analog modem rates are 1400; 2800; 3300; 5600

- common broadband rates are 5600; 11200; 25600; 38400;

51200; 76800; 100000

- common high end broadband rates are T1 = 150000; no

limit/LAN = 0x7FFFFFFF

* 8+ bytes optional reference movie language atom

= long unsigned offset + long ASCII text string 'rmla'

-> 4 bytes flags = byte hex version + 24-bit hex flags (current

= 0)

-> 2 bytes mac language = short unsigned language value

(english = 0)

* 8+ bytes optional reference movie alternate group atom

= long unsigned offset + long ASCII text string 'rmag'

(structure was not provided in MoviesFormat.h of the 4.1.2

win32 sdk)

-> 4 bytes flags = long value set to zero

-> 2 bytes alternate/other = short integer track id value (none

= 0)

* 8+ bytes optional initial object descriptor box

= long unsigned offset + long ASCII text string 'iods'

- NOTE: this was added in vers. 2 of spec

-> 4 bytes version/flags = 8-bit hex version + 24-bit hex flags

-> 1 byte file IOD type tag = 8-bit hex value 0x10

-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left out

-> 1 byte descriptor type length = 8-bit unsigned length

-> 2 bytes OD ID = 16-bit unsigned value

-> 1 byte OD profile level = 8-bit unsigned value

-> 1 byte scene profile level = 8-bit unsigned value

-> 1 byte audio profile level = 8-bit unsigned value

-> 1 byte video profile level = 8-bit unsigned value

-> 1 byte graphics profile level = 8-bit unsigned value

- NOTE: if level unused then set to 0xFF

-> 1 byte ES ID included descriptor type tag = 8-bit hex value 0x0E

-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left out

-> 1 byte descriptor type length = 8-bit unsigned length

-> 4 bytes Track ID = 32-bit unsigned value

- NOTE: refers to non-data system tracks

* 8+ bytes movie (presentation) header box

= long unsigned offset + long ASCII text string 'mvhd'

-> 1 byte version = 8-bit unsigned value

- if version is 1 then date and duration values are 8 bytes in length

-> 3 bytes flags = 24-bit hex flags (current = 0)

-> 4 bytes created mac UTC date

= long unsigned value in seconds since beginning 1904 to 2040

-> 4 bytes modified mac UTC date

= long unsigned value in seconds since beginning 1904 to 2040

OR

-> 8 bytes created mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 8 bytes modified mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 4 bytes time scale = long unsigned time unit per second (default

= 600)

-> 4 bytes duration = long unsigned time length (in time units)

OR

-> 8 bytes duration = 64-bit unsigned time length (in time units)

-> 4 bytes decimal user playback speed = long fixed point rate (normal

= 1.0)

-> 2 bytes decimal user volume = short fixed point level

(mute = 0.0 ; normal = 1.0 ; QUICKTIME MAX = 3.0)

-> 10 bytes reserved = 5 * short values set to zero

-> 4 bytes decimal window geometry matrix value A

= long fixed point width scale (normal = 1.0)

-> 4 bytes decimal window geometry matrix value B

= long fixed point width rotate (normal = 0.0)

-> 4 bytes decimal window geometry matrix value U

= long fixed point width angle (restricted to 0.0)

-> 4 bytes decimal window geometry matrix value C

= long fixed point height rotate (normal = 0.0)

-> 4 bytes decimal window geometry matrix value D

= long fixed point height scale (normal = 1.0)

-> 4 bytes decimal window geometry matrix value V

= long fixed point height angle (restricted to 0.0)

-> 4 bytes decimal window geometry matrix value X

= long fixed point position (left = 0.0)

-> 4 bytes decimal window geometry matrix value Y

= long fixed point position (top = 0.0)

-> 4 bytes decimal window geometry matrix value W

= long fixed point divider scale (restricted to 1.0)

-> 8 bytes QUICKTIME preview

= long unsigned start time + long unsigned time length (in time

units)

-> 4 bytes QUICKTIME still poster

= long unsigned frame time (in time units)

-> 8 bytes QUICKTIME selection time

= long unsigned start time + long unsigned time length (in time

units)

-> 4 bytes QUICKTIME current time = long unsigned frame time (in time

units)

-> 4 bytes next/new track id = long integer value (single track =

2)
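
The timing fields at the start of the movie header can be decoded as in the sketch below. It covers only the version/flags, dates, time scale and duration, and the function names are hypothetical. Dates are converted from seconds since the beginning of 1904 (UTC), and the duration is divided by the time scale to get seconds.

```python
import struct
from datetime import datetime, timedelta, timezone

MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def mac_time(seconds):
    """Convert seconds since the beginning of 1904 to a UTC datetime."""
    return MAC_EPOCH + timedelta(seconds=seconds)


def parse_mvhd_times(payload):
    """payload is the 'mvhd' content after the 8-byte box header."""
    version = payload[0]
    if version == 1:
        # Version 1: 8-byte dates and duration, 4-byte time scale.
        created, modified, timescale, duration = struct.unpack_from(">QQIQ", payload, 4)
    else:
        created, modified, timescale, duration = struct.unpack_from(">IIII", payload, 4)
    return {
        "created": mac_time(created),
        "modified": mac_time(modified),
        "timescale": timescale,
        "seconds": duration / timescale if timescale else None,
    }
```

For example, a duration of 6000 time units at the default time scale of 600 corresponds to 10 seconds.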

* 8+ bytes QUICKTIME clipping (mask) atom

= long unsigned offset + long ASCII text string 'clip'

* 8+ bytes clipping region atom

= long unsigned offset + long ASCII text string 'crgn'

-> 2 bytes region size = short unsigned box size

-> 8 bytes region boundary

= long fixed point x value + long fixed point y value

-> QuickDraw Region Data = hex dump

* 8+ bytes track (element) box = long unsigned offset + long ASCII text

string 'trak'

* 8+ bytes track (element) header box

= long unsigned offset + long ASCII text string 'tkhd'

-> 1 byte version = byte unsigned value

- if version is 1 then date and duration values are 8 bytes in

length

-> 3 bytes flags = 24-bit unsigned flags

- sum of TrackEnabled = 1 ; TrackInMovie = 2 ;

TrackInPreview = 4; TrackInPoster = 8

- MPEG-4 only defines TrackEnabled as being valid

-> 4 bytes created mac UTC date

= long unsigned value in seconds since beginning 1904 to 2040

-> 4 bytes modified mac UTC date

= long unsigned value in seconds since beginning 1904 to 2040

OR

-> 8 bytes created mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 8 bytes modified mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 4 bytes track id = long integer value (first track = 1)

-> 8 bytes reserved = 2 * long value set to zero

-> 4 bytes duration = long unsigned time length (in time units)

OR

-> 8 bytes duration = 64-bit unsigned time length (in time units)

- if duration is undefined set above bits to all ones

-> 4 bytes reserved = long value set to zero

-> 2 bytes video layer = short integer position

(middle = 0 ; negatives are in front)

-> 2 bytes QUICKTIME alternate/other = short integer track id

(none = 0)

-> 2 bytes track audio volume = short fixed point level

(mute = 0x0001 ; 100% = 1.0 ; QUICKTIME 200% max = 2.0)

-> 2 bytes reserved = short value set to zero

-> 4 bytes decimal video geometry matrix value A

= long fixed point width scale (normal = 1.0)

-> 4 bytes decimal video geometry matrix value B

= long fixed point width rotate (normal = 0.0)

-> 4 bytes decimal video geometry matrix value U

= long fixed point width angle (restricted to 0.0)

-> 4 bytes decimal video geometry matrix value C

= long fixed point height rotate (normal = 0.0)

-> 4 bytes decimal video geometry matrix value D

= long fixed point height scale (normal = 1.0)

-> 4 bytes decimal video geometry matrix value V

= long fixed point height angle (restricted to 0.0)

-> 4 bytes decimal video geometry matrix value X

= long fixed point position (left = 0.0)

-> 4 bytes decimal video geometry matrix value Y

= long fixed point position (top = 0.0)

-> 4 bytes decimal video geometry matrix value W

= long fixed point divider scale (restricted to 1.0)

-> 8 bytes decimal video frame size

= long fixed point width + long fixed point height
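
The track header flags are a sum of bit values, so they can be decoded with simple masking. The sketch below uses a hypothetical function name and takes the first 4 bytes of the 'tkhd' content (1 byte version + 3 bytes flags).

```python
# Flag values as listed for the track header box.
TRACK_FLAGS = {
    0x1: "TrackEnabled",
    0x2: "TrackInMovie",
    0x4: "TrackInPreview",
    0x8: "TrackInPoster",
}


def decode_track_flags(version_flags):
    """Return (version, [flag names]) from the 4 version/flags bytes."""
    version = version_flags[0]
    flags = int.from_bytes(version_flags[1:4], "big")
    names = [name for bit, name in TRACK_FLAGS.items() if flags & bit]
    return version, names
```

A QuickTime-authored track commonly carries the sum 0x00000F, while an MPEG-4 reader only needs to test the TrackEnabled bit.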

* 8+ bytes QUICKTIME clipping (mask) atom

= long unsigned offset + long ASCII text string 'clip'

- see moov clipping atom above

* 8+ bytes QUICKTIME matte (video overlay) atom

= long unsigned offset + long ASCII text string 'matt'

* 8+ bytes compressed matte atom

= long unsigned offset + long ASCII text string 'kmat'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> Matte Image Description Structure

(similar to Media Sample Description Table)

-> Matte Data = hex dump

* 8+ bytes optional edits (# of external tracks) box

= long unsigned offset + long ASCII text string 'edts'

- if tracks are of different start times this atom is needed to

maintain media sync.

* 8+ bytes optional edit list box

= long unsigned offset + long ASCII text string 'elst'

-> 1 byte version = byte unsigned value

- if version is 1 then duration values are 8 bytes in length

-> 3 bytes flags = 24-bit hex flags (current = 0)

-> 4 bytes number of edits = long unsigned total (default =

1)

-> 8 bytes edit time

= long unsigned time length + long integer start time (in time units)

OR

-> 16 bytes edit time

= 64-bit unsigned time length + 64-bit integer start time (in time units)

- if start time is -1, then that time length is edited out

-> 4 bytes decimal playback speed = long fixed point rate

(normal = 1.0)
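
An edit list can be read entry by entry as sketched below. The function name is hypothetical. The sketch assumes the start time is read as a signed value so that -1 can be detected, and that the playback speed is a 16.16 fixed point value.

```python
import struct


def parse_elst(payload):
    """payload is the 'elst' content after the 8-byte box header."""
    version = payload[0]
    (count,) = struct.unpack_from(">I", payload, 4)
    # Version 1 uses 64-bit length and start time; the rate is always 4 bytes.
    fmt = ">QqI" if version == 1 else ">IiI"
    step = struct.calcsize(fmt)
    edits = []
    pos = 8
    for _ in range(count):
        length, start, rate = struct.unpack_from(fmt, payload, pos)
        edits.append({
            "length": length,
            "start": start,
            "rate": rate / 65536.0,   # assumed 16.16 fixed point
            "empty": start == -1,     # -1 marks a span with no media
        })
        pos += step
    return edits
```

Each entry occupies 12 bytes in version 0 and 20 bytes in version 1, matching the 8 + 4 and 16 + 4 byte fields listed above.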

* 8+ bytes QUICKTIME preload atom

= long unsigned offset + long ASCII text string 'load'

-> 8 bytes preload time

= long unsigned start time + long unsigned time length (in

time units)

-> 4 bytes flags = long integer value

- flags are PreloadAlways = 1 or TrackEnabledPreload = 2

-> 4 bytes default hints flags = long hex data play options

- flags are KeepInBuffer = 0x00000004 ; HighQuality =

0x00000100 ;

- flags are SingleFieldPlayback = 0x00100000

- flags are DeinterlaceFields = 0x04000000

* 8+ bytes optional track references box

= long unsigned offset + long ASCII text string 'tref'

* 8+ bytes type of reference box

= long unsigned offset + long ASCII text string

-> vers. 1 box type is stream hint = 'hint'

-> vers. 2 box types are other dependency = 'dpnd' ; IPI

declarations = 'ipir'

-> vers. 2 box types are elementary stream = 'mpod' ;

-> vers. 2 box types are synchronization (video/audio) = 'sync'

-> QUICKTIME atom types are timecode = 'tmcd'; chapterlist =

'chap'

-> QUICKTIME atom types are transcript (text) = 'scpt'

-> QUICKTIME atom types are non-primary source (used in other

track) = 'ssrc'

-> 4+ bytes Track IDs = long integer track numbers (Disabled

Track ID = 0)

* 8+ bytes QUICKTIME non-primary source input map atom

= long unsigned offset + long ASCII text string 'imap'

* 8+ bytes input atom

= long unsigned offset + long ASCII text string 0x0000 + 'in'

-> 4 bytes atom ID = long integer atom reference (first ID =

1)

-> 2 bytes reserved = short value set to zero

-> 2 bytes number of internal atoms = short unsigned count

-> 4 bytes reserved = long value set to zero

* 8+ bytes input type atom

= long unsigned offset + long ASCII text string 0x0000 + 'ty'

-> 4 bytes type modifier name = long integer value

-> name values are matrix = 1 ; clip = 2 ;

-> name values are volume = 3; audio balance = 4

-> name values are graphics mode = 5; matrix object = 6

-> name values are graphics mode object = 7; image type

= 'vide'

* 8+ bytes object ID atom

= long unsigned offset + long ASCII text string 'obid'

-> 4 bytes object ID = long integer value

* 8+ bytes media (stream) box = long unsigned offset + long ASCII

text string 'mdia'

* 8+ bytes media (stream) header box

= long unsigned offset + long ASCII text string 'mdhd'

-> 1 byte version = byte unsigned value

- if version is 1 then date and duration values are 8 bytes

in length

-> 3 bytes flags = 24-bit unsigned flags (current = 0)

-> 4 bytes created mac UTC date

= long unsigned value in seconds since beginning 1904 to

2040

-> 4 bytes modified mac UTC date

= long unsigned value in seconds since beginning 1904 to

2040

OR

-> 8 bytes created mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 8 bytes modified mac UTC date

= 64-bit unsigned value in seconds since beginning 1904

-> 4 bytes time scale = long unsigned media time unit

(video = fps rate ; audio = sample per sec. rate)

-> 4 bytes duration = long unsigned media time length (in media

time units)

OR

-> 8 bytes duration = 64-bit unsigned time length (in time

units)

-> 1/8 byte ISO language padding = 1-bit value set to 0

-> 1 7/8 bytes content language = 3 * 5-bits ISO 639-2 language

code less 0x60

- example code for english = 0x15C7

-> 2 bytes QUICKTIME quality = short integer playback quality

value (normal = 0)
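
The packed content language can be checked with a short sketch. Each of the three ISO 639-2 letters has 0x60 subtracted and is stored in 5 bits, with the top bit of the 2 bytes left as zero padding. The function names are hypothetical.

```python
def pack_language(code):
    """Pack a 3-letter lowercase ISO 639-2 code into a 15-bit value."""
    value = 0
    for ch in code:
        value = (value << 5) | (ord(ch) - 0x60)
    return value


def unpack_language(value):
    """Recover the 3-letter code from the packed 2-byte value."""
    return "".join(
        chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0)
    )
```

Packing "eng" gives 0x15C7, which agrees with the example code for english given above.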

* 8+ bytes handler reference box

= long unsigned offset + long ASCII text string 'hdlr'

- this box must be toward the start of the media box

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 4 bytes QUICKTIME type = long ASCII text string

(eg. Media Handler = 'mhlr')

-> 4 bytes subtype/media type = long ASCII text string

- types are Visual Media = 'vide' ; Audio Media = 'soun' ;

Hint = "hint'

- types are Object Descriptor = 'odsm' ; Clock Reference =

'crsm'

- types are Scene Description = 'sdsm' ; MPEG-7 Stream =

'm7sm'

- types are Object Content Info = 'ocsm' ; IPMP = 'ipsm' ;

MPEG-J = 'mjsm'

-> 4 bytes QUICKTIME manufacturer reserved = long ASCII text

string

(eg. Apple = 'appl' or 0)

-> 4 bytes QUICKTIME component reserved flags = long hex flags

(none = 0)

-> 4 bytes QUICKTIME component reserved flags mask = long hex

mask (none = 0)

-> component type name ASCII string

(eg. "Media Handler" - no name = zero length string)

-> 1 byte component name string end = byte padding set to zero

- note: the quicktime spec uses a Pascal string

instead of the above C string
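A short Python sketch of splitting the handler reference fields above, assuming the name is stored as a zero-terminated C string as in this listing (a QuickTime file with a Pascal string would need the first byte treated as a length instead). `parse_hdlr` is a hypothetical helper name.

```python
import struct

def parse_hdlr(payload: bytes) -> dict:
    """Decode the bytes that follow the 8-byte 'hdlr' box header."""
    # version/flags, type, subtype, manufacturer, flags, flags mask = 24 bytes
    version_flags, qt_type, subtype, manufacturer, flags, mask = struct.unpack_from(
        ">I4s4s4sII", payload, 0
    )
    # C string: keep everything before the first zero byte
    name = payload[24:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return {
        "version_flags": version_flags,
        "type": qt_type,          # often zero in MPEG-4 files
        "subtype": subtype,       # e.g. b'vide', b'soun', b'hint'
        "manufacturer": manufacturer,
        "flags": flags,
        "flags_mask": mask,
        "name": name,
    }
```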

* 8+ bytes media (stream) information box

= long unsigned offset + long ASCII text string 'minf'

* 8+ bytes visual media (stream) info header box

= long unsigned offset + long ASCII text string 'vmhd'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

- version = 0 ; flags = 0x000001 for QUICKTIME or zero

MPEG-4

-> 2 bytes QuickDraw graphic mode = short hex type

- mode types are copy = 0x0000 ; dither copy = 0x0040 ;

straight alpha = 0x0100

- mode types are composition dither copy = 0x0103 ; blend

= 0x0020

- mode premultiplied types are white alpha = 0x0101 ; black

alpha = 0x0102

- mode color types are transparent = 0x0024; straight

alpha blend = 0x0104

- NOTE: MPEG-4 only uses copy mode and quicktime uses

dither copy by default

-> 6 bytes graphic mode color = 3 * short unsigned QuickDraw

RGB color values

OR

* 8+ bytes sound media (stream) info header box

= long unsigned offset + long ASCII text string 'smhd'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

-> 2 bytes audio balance = short fixed point value

- balance scale is left = negatives ; normal = 0.0 ; right

= positives

-> 2 bytes reserved = short value set to zero
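As a rough illustration of the balance field, the sketch below reads it as a signed 16-bit value and assumes an 8.8 fixed point split (high byte whole part, low byte fraction). That split is an assumption here, since the listing only says "short fixed point". `parse_smhd_balance` is a made-up name.

```python
import struct

def parse_smhd_balance(payload: bytes) -> float:
    """Return the audio balance from the bytes after the 'smhd' box header."""
    # skip 4 bytes of version/flags, then read a signed big endian short
    (raw,) = struct.unpack_from(">h", payload, 4)
    # assumed 8.8 fixed point: divide by 2**8 to get the decimal value
    return raw / 256.0
```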

OR

* 8+ bytes hint stream (stream) info header box

= long unsigned offset + long ASCII text string 'hmhd'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

-> 2 bytes maximum packet delivery unit = short unsigned

value

-> 2 bytes average packet delivery unit = short unsigned

value

-> 4 bytes maximum bit rate = long unsigned value

-> 4 bytes average bit rate = long unsigned value

-> 4 bytes reserved = long value set to zero

OR

* 8+ bytes mpeg-4 media (stream) header box

= long unsigned offset + long ASCII text string 'nmhd'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

* 8+ bytes QUICKTIME handler reference atom

= long unsigned offset + long ASCII text string 'hdlr'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

-> 8 bytes type/subtype = long ASCII text string + long ASCII

text string

(eg. Alias Data Handler = 'dhlralis' ; URL Data Handler

= 'dhlrurl ')

-> 4 bytes manufacturer reserved = long ASCII text string

(eg. Apple = 'appl' or 0)

-> 4 bytes component reserved flags = long hex flags (none

= 0)

-> 4 bytes component reserved flags mask = long hex mask

(none = 0)

-> 1 byte component name string length = byte unsigned length

(no name = zero length string)

-> component type name ASCII string (eg. "Data Handler")

* 8+ bytes data (locator) information box

= long unsigned offset + long ASCII text string 'dinf'

* 8+ bytes data reference box

= long unsigned offset + long ASCII text string 'dref'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of references = long unsigned total

(minimum = 1)

* 8+ bytes reference type box

= long unsigned offset + long ASCII text string

- box types are url c string = 'url ' ; urn c strings

= 'urn '

- QUICKTIME atom types are file alias = 'alis' ;

resource alias = 'rsrc'

-> 4 bytes version/flags

= byte hex version (current = 0) + 24-bit hex

flags

- some flags are external data = 0x000000 ;

internal data = 0x000001

-> url c string = ASCII text string points to external

data

-> 1 byte url c string end = byte value set to zero

OR

-> urn c string = ASCII text string points to external

data

-> 1 byte urn c string end = byte value set to zero

-> url c string = ASCII text string points to external

data

-> 1 byte url c string end = byte value set to zero

OR

-> QUICKTIME mac os file alias record structure

points to external data

OR

-> QUICKTIME mac os file alias record structure

plus resource info points to external data

OR

* 8+ bytes Data URL box

= long unsigned offset + long ASCII text string 'url '

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> url c string = ASCII text string points to external

data

-> 1 byte url c string end = byte value set to zero

OR

* 8+ bytes Data URN box = long unsigned offset + long ASCII

text string 'urn '

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

-> urn c string = ASCII text string points to external

data

-> 1 byte urn c string end = byte value set to zero

-> url c string = ASCII text string points to external

data

-> 1 byte url c string end = byte value set to zero
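A small Python sketch of reading one 'url ' reference as laid out above, assuming the 8-byte box header has been removed. When the internal data flag (0x000001) is set the media lives in the same file and the string is typically absent, so the sketch returns an empty location in that case. `parse_url_entry` is a hypothetical name.

```python
def parse_url_entry(payload: bytes) -> dict:
    """Decode the bytes that follow the 8-byte 'url ' box header."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    internal = bool(flags & 0x000001)
    # C string: keep everything before the first zero byte (may be empty)
    location = payload[4:].split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    return {"version": version, "internal": internal, "location": location}
```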

* 8+ bytes sample (framing info) table box

= long unsigned offset + long ASCII text string 'stbl'

* 8+ bytes sample (frame encoding) description box

= long unsigned offset + long ASCII text string 'stsd'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of descriptions = long unsigned total

(default = 1)

-> 4 bytes description length = long unsigned length

-> 4 bytes description visual format = long ASCII text

string 'mp4v'

- if encoded to ISO/IEC 14496-10 or 3GPP AVC standards

then use:

-> 4 bytes description visual format = long ASCII text

string 'avc1'

- if encrypted to ISO/IEC 14496-12 or 3GPP standards

then use:

-> 4 bytes description visual format = long ASCII text

string 'encv'

- if encoded to 3GPP H.263v1 standards then use:

-> 4 bytes description visual format = long ASCII text

string 's263'

-> 6 bytes reserved = 48-bit value set to zero

-> 2 bytes data reference index

= short unsigned index from 'dref' box

- there are other sample descriptions

available in the Apple QT format dev docs

-> 2 bytes QUICKTIME video encoding version = short hex

version

- default = 0 ; audio data size before decompression

= 1

-> 2 bytes QUICKTIME video encoding revision level =

short hex version

- default = 0 ; video can revise this value

-> 4 bytes QUICKTIME video encoding vendor = long ASCII

text string

- default = 0

-> 4 bytes QUICKTIME video temporal quality = long

unsigned value (0 to 1024)

-> 4 bytes QUICKTIME video spatial quality = long

unsigned value (0 to 1024)

- some quality values are lossless = 1024 ; maximum

= 1023 ; high = 768

- some quality values are normal = 512 ; low = 256 ;

minimum = 0

-> 4 bytes video frame pixel size

= short unsigned width + short unsigned height

-> 8 bytes video resolution

= long fixed point horizontal + long fixed point

vertical

- defaults to 72.0 dpi

-> 4 bytes QUICKTIME video data size = long value set

to zero

-> 2 bytes video frame count = short unsigned total (set

to 1)

-> 1 byte video encoding name string length = byte

unsigned length

-> 31 bytes video encoder name string

-> NOTE: if video encoder name string < 31 chars then

pad with zeros

-> 2 bytes video pixel depth = short unsigned bit depth

- colors are 1 (Monochrome), 2 (4), 4 (16), 8 (256)

- colors are 16 (1000s), 24 (Ms), 32 (Ms+A)

- grays are 33 (B/W), 34 (4), 36 (16), 40(256)

-> 2 bytes QUICKTIME video color table id = short integer

value

(no table = -1)

-> optional QUICKTIME color table data if above set to

0

(see color table atom below for layout)
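The fixed part of a visual description above adds up to 78 bytes after the description length and format code. A hedged Python sketch of unpacking it follows. It assumes the "long fixed point" resolution is a 16.16 value (so 72.0 dpi is 0x00480000); the names `VISUAL_ENTRY` and `parse_visual_entry` are illustrative.

```python
import struct

# 6 reserved, dref index, version, revision, vendor, temporal and spatial
# quality, width, height, h/v resolution, data size, frame count,
# 1 + 31 byte encoder name, pixel depth, color table id
VISUAL_ENTRY = struct.Struct(">6xHHH4sIIHHIIIH32sHh")

def parse_visual_entry(body: bytes) -> dict:
    """Decode a visual description, starting at the 6 reserved bytes."""
    (dref_index, version, revision, vendor, temporal_q, spatial_q,
     width, height, h_res, v_res, data_size, frame_count,
     name_field, depth, color_table_id) = VISUAL_ENTRY.unpack_from(body, 0)
    # first byte of the name field is the string length, the rest is padding
    name_len = min(name_field[0], 31)
    return {
        "data_reference_index": dref_index,
        "version": version,
        "revision": revision,
        "vendor": vendor,
        "temporal_quality": temporal_q,
        "spatial_quality": spatial_q,
        "width": width,
        "height": height,
        "h_dpi": h_res / 65536.0,   # assumed 16.16 fixed point
        "v_dpi": v_res / 65536.0,
        "frame_count": frame_count,
        "encoder_name": name_field[1:1 + name_len].decode("ascii", errors="replace"),
        "depth": depth,
        "color_table_id": color_table_id,   # -1 means no table
    }
```

Any bytes after the first 78 belong to child boxes such as 'avcC' or 'esds'.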

OR

-> 4 bytes description length = long unsigned length

-> 4 bytes description audio format = long ASCII text

string 'mp4a'

- if encrypted to ISO/IEC 14496-12 or 3GPP standards

then use:

-> 4 bytes description audio format = long ASCII text

string 'enca'

- if encoded to 3GPP GSM 6.10 AMR narrowband standards

then use:

-> 4 bytes description audio format = long ASCII text

string 'samr'

- if encoded to 3GPP GSM 6.10 AMR wideband standards

then use:

-> 4 bytes description audio format = long ASCII text

string 'sawb'

-> 6 bytes reserved = 48-bit value set to zero

-> 2 bytes data reference index

= short unsigned index from 'dref' box

-> 2 bytes QUICKTIME audio encoding version = short hex

version

- default = 0 ; audio data size before decompression

= 1

-> 2 bytes QUICKTIME audio encoding revision level

= short hex version

- default = 0 ; video can revise this value

-> 4 bytes QUICKTIME audio encoding vendor

= long ASCII text string

- default = 0

-> 2 bytes audio channels = short unsigned count

(mono = 1 ; stereo = 2)

-> 2 bytes audio sample size = short unsigned value

(8 or 16)

-> 2 bytes QUICKTIME audio compression id = short

integer value

- default = 0

-> 2 bytes QUICKTIME audio packet size = short value set

to zero

-> 4 bytes audio sample rate = long unsigned fixed point

rate
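The audio description fields above total 28 bytes after the description length and format code. The Python sketch below unpacks them, assuming the sample rate is a 16.16 fixed point value so that the whole part sits in the upper 16 bits. `AUDIO_ENTRY` and `parse_audio_entry` are hypothetical names.

```python
import struct

# 6 reserved, dref index, version, revision, vendor, channels,
# sample size, compression id, packet size, sample rate
AUDIO_ENTRY = struct.Struct(">6xHHH4sHHhHI")

def parse_audio_entry(body: bytes) -> dict:
    """Decode an audio description, starting at the 6 reserved bytes."""
    (dref_index, version, revision, vendor, channels, sample_size,
     compression_id, packet_size, rate) = AUDIO_ENTRY.unpack_from(body, 0)
    return {
        "data_reference_index": dref_index,
        "version": version,
        "revision": revision,
        "vendor": vendor,
        "channels": channels,
        "sample_size": sample_size,
        "compression_id": compression_id,
        "packet_size": packet_size,
        "sample_rate": rate / 65536.0,   # assumed 16.16 fixed point
    }
```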

OR

-> 4 bytes description length = long unsigned length

-> 4 bytes description system format = long ASCII text

string 'mp4s'

- if encrypted to ISO/IEC 14496-12 standards then use:

-> 4 bytes description system format = long ASCII text

string 'encs'

-> 6 bytes reserved = 48-bit value set to zero

-> 2 bytes data reference index

= short unsigned index from 'dref' box

* 8+ bytes ISO/IEC 14496-12/3GPP encryption scheme

info box

= long unsigned offset + long ASCII text string

'sinf'

- if stream encrypted to ISO/IEC 14496-12 standards

* 8+ bytes ISO/IEC 14496-12/3GPP/QUICKTIME

original format box

= long unsigned offset + long ASCII text string

'frma'

-> 4 bytes description format = long ASCII text

string

- formats are MPEG-4 visual = 'mp4v' ; MPEG-4

AVC = 'avc1'

- formats are MPEG-4 audio = 'mp4a' ; MPEG-4

system = 'mp4s'

- 3GPP formats are H.263 = 's263' ; AMR narrow

= 'samr'

- 3GPP format is AMR wide = 'sawb'

* 8+ bytes optional ISO/IEC 14496-12 IPMP info box

= long unsigned offset + long ASCII text string

'imif'

-> 4 bytes version/flags = byte hex version +

24-bit hex flags

(current = 0)

-> IPMP descriptors = hex dump from IPMP part of

ES Descriptor box

* 8+ bytes optional ISO/IEC 14496-12/3GPP scheme

type box

= long unsigned offset + long ASCII text string

'schm'

-> 4 bytes version/flags = byte hex version +

24-bit hex flags

(current = 0 ; contains URI if flags =

0x000001)

-> 4 bytes encryption type = long ASCII text

string

- types are 128-bit AES counter = 'ACM1' ;

128-bit AES FS = 'AFS1'

- types are NULL algorithm = 'ENUL' ; 160-bit

HMAC-SHA-1 = 'SHM2'

- types are RTCP = 'ANUL' ; private scheme = '

'

-> 2 bytes encryption version = short unsigned

version

-> optional scheme URI string = UTF-8 text string

(eg. web site)

-> 1 byte optional scheme URI string end = byte

padding set to zero

* 8+ bytes ISO/IEC 14496-12/3GPP scheme data box

= long unsigned offset + long ASCII text string

'schi'

-> encryption related key = hex dump

* 8+ bytes 3GPP H.263v1 decode config box

= long unsigned offset + long ASCII text string

'd263'

-> 4 bytes encoder vendor = long ASCII text string

-> 1 byte encoder version = 8-bit unsigned revision

-> 1 byte H.263 level = 8-bit unsigned stream level

-> 1 byte H.263 profile = 8-bit unsigned stream

profile

* 8+ bytes optional 3GPP H.263v1 bit rate box

= long unsigned offset + long ASCII text string

'bitr'

-> 4 bytes average bit rate = 32-bit unsigned

value

-> 4 bytes maximum bit rate = 32-bit unsigned

value

* 8+ bytes 3GPP GSM 6.10 AMR decode config box

= long unsigned offset + long ASCII text string

'damr'

-> 4 bytes encoder vendor = long ASCII text string

-> 1 byte encoder version = 8-bit unsigned revision

-> 2 byte packet modes = 16-bit unsigned bit mode

index

-> 1 byte number of packet mode changes = 8-bit

unsigned value

-> 1 byte samples per packet = 8-bit unsigned value

* 8+ bytes ISO/IEC 14496-10 or 3GPP AVC decode config

box

= long unsigned offset + long ASCII text string

'avcC'

-> 1 byte version = 8-bit hex version (current = 1)

-> 1 byte H.264 profile = 8-bit unsigned stream

profile

-> 1 byte H.264 compatible profiles = 8-bit hex flags

-> 1 byte H.264 level = 8-bit unsigned stream level

-> 1 1/2 nibble reserved = 6-bit unsigned value set

to 63

-> 1/2 nibble NAL length = 2-bit length byte size type

- 1 byte = 0 ; 2 bytes = 1 ; 4 bytes = 3

-> 1 byte number of SPS = 3 reserved bits set to 7 + 5-bit unsigned total

-> 2+ bytes SPS length = short unsigned length

-> + SPS NAL unit = hexdump

-> 1 byte number of PPS = 8-bit unsigned total

-> 2+ bytes PPS length = short unsigned length

-> + PPS NAL unit = hexdump
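A minimal Python sketch of walking the AVC decode config fields above. It masks the SPS count to its low 5 bits on the assumption that the top 3 bits of that byte are reserved, as in ISO/IEC 14496-15. `parse_avcc` is an illustrative name and the payload is assumed to start right after the 8-byte box header.

```python
import struct

def parse_avcc(payload: bytes) -> dict:
    """Decode the bytes that follow the 8-byte 'avcC' box header."""
    version, profile, compat, level = payload[0:4]
    # low 2 bits: 0 -> 1 byte, 1 -> 2 bytes, 3 -> 4 bytes per NAL length
    nal_length_size = (payload[4] & 0x03) + 1

    def read_units(count: int, pos: int):
        units = []
        for _ in range(count):
            (length,) = struct.unpack_from(">H", payload, pos)
            pos += 2
            units.append(payload[pos:pos + length])
            pos += length
        return units, pos

    sps_count = payload[5] & 0x1F          # top 3 bits assumed reserved
    sps, pos = read_units(sps_count, 6)
    pps_count = payload[pos]
    pps, pos = read_units(pps_count, pos + 1)
    return {
        "version": version,
        "profile": profile,
        "compatible_profiles": compat,
        "level": level,
        "nal_length_size": nal_length_size,
        "sps": sps,
        "pps": pps,
    }
```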

* 8+ bytes vers. 2 ES Descriptor box

= long unsigned offset + long ASCII text string

'esds'

- if encoded to ISO/IEC 14496-10 AVC standards then

optionally use:

= long unsigned offset + long ASCII text string

'm4ds'

-> 4 bytes version/flags = 8-bit hex version + 24-bit

hex flags

(current = 0)

-> 1 byte ES descriptor type tag = 8-bit hex value

0x03

-> 3 bytes extended descriptor type tag string

= 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left out

-> 1 byte descriptor type length = 8-bit unsigned

length

-> 2 bytes ES ID = 16-bit unsigned value

-> 1 byte stream priority = 8-bit unsigned value

- Defaults to 16 and ranges from 0 through to 31

-> 1 byte decoder config descriptor type tag =

8-bit hex value 0x04

-> 3 bytes extended descriptor type tag string

= 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left out

-> 1 byte descriptor type length = 8-bit unsigned

length

-> 1 byte object type ID = 8-bit unsigned value

- type IDs are system v1 = 1 ; system v2 =

2

- type IDs are MPEG-4 video = 32 ; MPEG-4 AVC

SPS = 33

- type IDs are MPEG-4 AVC PPS = 34 ; MPEG-4

audio = 64

- type IDs are MPEG-2 simple video = 96

- type IDs are MPEG-2 main video = 97

- type IDs are MPEG-2 SNR video = 98

- type IDs are MPEG-2 spatial video = 99

- type IDs are MPEG-2 high video = 100

- type IDs are MPEG-2 4:2:2 video = 101

- type IDs are MPEG-4 ADTS main = 102

- type IDs are MPEG-4 ADTS Low Complexity =

103

- type IDs are MPEG-4 ADTS Scalable Sampling

Rate = 104

- type IDs are MPEG-2 ADTS = 105 ; MPEG-1

video = 106

- type IDs are MPEG-1 ADTS = 107 ; JPEG video

= 108

- type IDs are private audio = 192 ; private

video = 208

- type IDs are 16-bit PCM LE audio = 224 ;

vorbis audio = 225

- type IDs are dolby v3 (AC3) audio = 226 ;

alaw audio = 227

- type IDs are mulaw audio = 228 ; G723 ADPCM

audio = 229

- type IDs are 16-bit PCM Big Endian audio

= 230

- type IDs are Y'CbCr 4:2:0 (YV12) video =

240 ; H264 video = 241

- type IDs are H263 video = 242 ; H261 video

= 243

-> 6 bits stream type = 3/4 byte hex value

- type IDs are object descript. = 1 ; clock

ref. = 2

- type IDs are scene descript. = 3 ; visual

= 4

- type IDs are audio = 5 ; MPEG-7 = 6 ; IPMP

= 7

- type IDs are OCI = 8 ; MPEG Java = 9

- type IDs are user private = 32

-> 1 bit upstream flag = 1/8 byte hex value

-> 1 bit reserved flag = 1/8 byte hex value set

to 1

-> 3 bytes buffer size = 24-bit unsigned value

-> 4 bytes maximum bit rate = 32-bit unsigned

value

-> 4 bytes average bit rate = 32-bit unsigned

value

-> 1 byte decoder specific descriptor type

tag

= 8-bit hex value 0x05

-> 3 bytes extended descriptor type tag

string

= 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left

out

-> 1 byte descriptor type length

= 8-bit unsigned length

-> ES header start codes = hex dump

-> 1 byte SL config descriptor type tag = 8-bit

hex value 0x06

-> 3 bytes extended descriptor type tag string

= 3 * 8-bit hex value

- types are Start = 0x80 ; End = 0xFE

- NOTE: the extended start tags may be left out

-> 1 byte descriptor type length = 8-bit unsigned

length

-> 1 byte SL value = 8-bit hex value set to 0x02
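The descriptor chain above can be walked with a small helper that skips the optional extended tag bytes before reading the length. The Python sketch below follows the layout as described in this listing and assumes every descriptor is shorter than 128 bytes, so the length fits in one byte. `read_tag` and `parse_esds` are hypothetical names.

```python
import struct

FILLER = (0x80, 0xFE)   # extended descriptor type tag bytes, may be left out

def read_tag(buf: bytes, pos: int):
    """Return (tag, length, position of the descriptor body)."""
    tag = buf[pos]
    pos += 1
    skipped = 0
    while skipped < 3 and buf[pos] in FILLER:
        pos += 1
        skipped += 1
    return tag, buf[pos], pos + 1

def parse_esds(payload: bytes) -> dict:
    """Decode the bytes that follow the 8-byte 'esds' box header."""
    pos = 4                                   # skip version/flags
    tag, _, pos = read_tag(payload, pos)
    if tag != 0x03:
        raise ValueError("expected ES descriptor tag 0x03")
    es_id, priority = struct.unpack_from(">HB", payload, pos)
    pos += 3
    tag, _, pos = read_tag(payload, pos)
    if tag != 0x04:
        raise ValueError("expected decoder config tag 0x04")
    object_type, packed = struct.unpack_from(">BB", payload, pos)
    buffer_size = int.from_bytes(payload[pos + 2:pos + 5], "big")
    max_rate, avg_rate = struct.unpack_from(">II", payload, pos + 5)
    pos += 13
    tag, length, pos = read_tag(payload, pos)
    specific = payload[pos:pos + length] if tag == 0x05 else b""
    return {
        "es_id": es_id,
        "priority": priority,
        "object_type": object_type,
        "stream_type": packed >> 2,           # upper 6 bits
        "upstream": (packed >> 1) & 1,
        "buffer_size": buffer_size,
        "max_bit_rate": max_rate,
        "avg_bit_rate": avg_rate,
        "decoder_specific": specific,
    }
```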

* 8+ bytes QUICKTIME video gamma atom

= long unsigned offset + long ASCII text string

'gama'

-> 4 bytes decimal level = long fixed point level

* 8+ bytes QUICKTIME video field order atom

= long unsigned offset + long ASCII text string

'fiel'

-> 2 bytes field count/order = byte integer total +

byte integer order

* 8+ bytes QUICKTIME video m-jpeg quantize table atom

= long unsigned offset + long ASCII text string

'mjqt'

-> quantization table = hex dump

* 8+ bytes QUICKTIME video m-jpeg huffman table atom

= long unsigned offset + long ASCII text string

'mjht'

-> huffman table = hex dump

* 8+ bytes time to sample (frame timing) box

= long unsigned offset + long ASCII text string 'stts'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of times = long unsigned total

-> 8+ bytes time per frame

= long unsigned frame count + long unsigned

duration

- multiple durations means variable framing rate

- single duration means fixed framing rate

- calculate framing (fps): media time scale / (average)

duration
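The frame rate calculation above can be sketched in Python as follows, assuming the media time scale comes from the media header box of the same track. `parse_stts` and `average_fps` are illustrative names.

```python
import struct

def parse_stts(payload: bytes) -> list:
    """Return (frame count, duration) pairs from the bytes after the 'stts' header."""
    (count,) = struct.unpack_from(">I", payload, 4)   # skip version/flags
    return [struct.unpack_from(">II", payload, 8 + 8 * i) for i in range(count)]

def average_fps(entries, time_scale: int) -> float:
    """Time scale divided by the average frame duration."""
    frames = sum(n for n, _ in entries)
    ticks = sum(n * d for n, d in entries)
    return time_scale * frames / ticks if ticks else 0.0
```

With a single entry the result is the fixed frame rate; with several entries it is the average over the whole stream.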

* 8+ bytes optional sync sample (key/intra frame) box

= long unsigned offset + long ASCII text string 'stss'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of key frames = long unsigned total

-> 4+ bytes key/intra frame location = long unsigned

framing time

- key/intra frame location according to

sample/framing time

* 8+ bytes sample/framing to chunk/block box

= long unsigned offset + long ASCII text string 'stsc'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of blocks = long unsigned total

-> 8+ bytes frames per block

= long unsigned first/next block + long unsigned

# of frames

-> 4+ bytes samples description id

= long unsigned description number

* 8+ bytes sample (block byte) size box

= long unsigned offset + long ASCII text string 'stsz'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes block byte size for all = 32-bit integer byte

value

(different sizes = 0)

-> 4 bytes number of block sizes = long unsigned total

-> 4+ bytes block byte sizes = long unsigned byte values

* 8+ bytes chunk/block offset box

= long unsigned offset + long ASCII text string 'stco'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of offsets = long unsigned total

-> 4+ bytes block offsets = long unsigned byte values

* 8+ bytes larger chunk/block offset box

= long unsigned offset + long ASCII text string 'co64'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 4 bytes number of offsets = long unsigned total

-> 8+ bytes larger block offsets = 64-bit unsigned byte

values
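Taken together, the 'stsc', 'stsz' and 'stco' (or 'co64') tables above give the file position of every frame. A hedged Python sketch, assuming the three tables have already been decoded into plain lists and that block numbers in 'stsc' start at 1; `sample_offsets` is a made-up name.

```python
def sample_offsets(stsc, sizes, chunk_offsets):
    """Return the file offset of each frame.

    stsc          -- list of (first block, frames per block, description id)
    sizes         -- byte size of each frame, in order
    chunk_offsets -- file offset of each block, in order
    """
    offsets = []
    sample = 0
    for i, (first_chunk, per_chunk, _desc) in enumerate(stsc):
        # an entry applies up to the block before the next entry's first block
        if i + 1 < len(stsc):
            last_chunk = stsc[i + 1][0] - 1
        else:
            last_chunk = len(chunk_offsets)
        for chunk in range(first_chunk, last_chunk + 1):
            pos = chunk_offsets[chunk - 1]
            for _ in range(per_chunk):
                if sample >= len(sizes):
                    return offsets
                offsets.append(pos)
                pos += sizes[sample]   # frames are packed back to back in a block
                sample += 1
    return offsets
```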

* 8+ bytes optional user data (any custom info) atom

= long unsigned offset + long ASCII text string 'udta'

(copyright and MPEG-7 meta data related to element tracks)

* 8+ bytes optional copyright notice box

= long unsigned offset + long ASCII text string 'cprt'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 1/8 byte ISO language padding = 1-bit value set to 0

-> 1 7/8 bytes content language = 3 * 5-bits ISO 639-2 language

code less 0x60

- example code for english = 0x15C7

-> annotation string = ASCII text string

-> 1 byte annotation c string end = byte value set to zero
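The language packing above (three 5-bit letters, each the ASCII code less 0x60) can be checked with a short Python sketch that also assembles a complete copyright box. `pack_language` and `build_cprt` are illustrative names, and the notice is assumed to be plain ASCII.

```python
import struct

def pack_language(code: str) -> int:
    """Pack a three-letter ISO 639-2 code into 15 bits (top bit stays 0)."""
    if len(code) != 3 or not all("a" <= ch <= "z" for ch in code):
        raise ValueError("expected three lowercase ASCII letters")
    a, b, c = (ord(ch) - 0x60 for ch in code)
    return (a << 10) | (b << 5) | c

def build_cprt(notice: str, language: str = "eng") -> bytes:
    """Build a 'cprt' box: header, version/flags, language, C string."""
    payload = (bytes(4)                                   # version/flags = 0
               + struct.pack(">H", pack_language(language))
               + notice.encode("ascii") + b"\x00")
    return struct.pack(">I4s", 8 + len(payload), b"cprt") + payload
```

`pack_language("eng")` gives 0x15C7, matching the example code in the listing.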

* 8+ bytes optional ISO/IEC 14496-12 element meta data box

= long unsigned offset + long ASCII text string 'meta'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

* 8+ bytes ISO/IEC 14496-12 handler reference box

= long unsigned offset + long ASCII text string 'hdlr'

- this box must be toward the start of the meta box

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags (current = 0)

-> 4 bytes QUICKTIME type = long ASCII text string

(eg. Media Handler = 'mhlr')

-> 4 bytes subtype/meta data type = long ASCII text string

- types are MPEG-7 XML = 'mp7t' ; MPEG-7 binary XML =

'mp7b'

- type is APPLE meta data iTunes reader = 'mdir'

-> 4 bytes QUICKTIME manufacturer reserved = long ASCII

text string

(eg. Apple = 'appl' or 0)

-> 4 bytes QUICKTIME component reserved flags = long hex

flags (none = 0)

-> 4 bytes QUICKTIME component reserved flags mask = long

hex mask (none = 0)

-> component type name ASCII string

(eg. "Meta Data Handler" - no name = zero length string)

-> 1 byte component name string end = byte padding set to

zero

- note: the quicktime spec uses a Pascal string

instead of the above C string

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 XML box

= long unsigned offset + long ASCII text string 'xml '

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> MPEG-7 XML meta data = text dump

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 binary XML box

= long unsigned offset + long ASCII text string 'bxml'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> MPEG-7 encoded XML meta data = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item location box

= long unsigned offset + long ASCII text string 'iloc'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 1 nibble size of access offsets = 4 bits one byte

multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset

= 8

-> 1 nibble size of data lengths = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset

= 8

-> 1 nibble size of starting offset = 4 bits one byte

multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset

= 8

-> 1 nibble reserved = 4 bits set to zero

-> 2 bytes number of locations = short unsigned index total

-> 2+ bytes item reference = short unsigned id

-> 2+ bytes stream data reference = short unsigned index

from 'dref' box

- if meta data item in same file set to zero

-> 1-8+ bytes starting offset = byte - dlong unsigned

offset

-> 2+ bytes number of access points = short unsigned index

total

-> 1-8+ bytes access offset = byte - dlong unsigned

relative offset

(relative to starting offset)

-> 1-8+ bytes data length = byte - dlong unsigned length

* 8+ bytes optional ISO/IEC 14496-12 primary item box

= long unsigned offset + long ASCII text string 'pitm'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

* 8+ bytes optional ISO/IEC 14496-12 item encryption box

= long unsigned offset + long ASCII text string 'ipro'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 2 bytes number of encryption boxes = short unsigned index

total

* 8+ bytes ISO/IEC 14496-12 encryption scheme info box

= long unsigned offset + long ASCII text string 'sinf'

- if meta data encrypted to ISO/IEC 14496-12 standards

* 8+ bytes ISO/IEC 14496-12 original format box

= long unsigned offset + long ASCII text string

'frma'

-> 4 bytes description format = long ASCII text

string

* 8+ bytes optional ISO/IEC 14496-12 IPMP info box

= long unsigned offset + long ASCII text string

'imif'

-> 4 bytes version/flags = byte hex version + 24-bit

hex flags

(current = 0)

-> IPMP descriptors = hex dump from IPMP part of ES

Descriptor box

* 8+ bytes optional ISO/IEC 14496-12 scheme type box

= long unsigned offset + long ASCII text string

'schm'

-> 4 bytes version/flags = byte hex version + 24-bit

hex flags

(current = 0 ; contains URI if flags = 0x000001)

-> 4 bytes encryption type = long ASCII text string

- types are 128-bit AES counter = 'ACM1' ;

128-bit AES FS = 'AFS1'

- types are NULL algorithm = 'ENUL' ; 160-bit

HMAC-SHA-1 = 'SHM2'

- types are RTCP = 'ANUL' ; private scheme = '

'

-> 2 bytes encryption version = short unsigned

version

-> optional scheme URI string = UTF-8 text string

(eg. web site)

-> 1 byte optional scheme URI string end = byte

padding set to zero

* 8+ bytes ISO/IEC 14496-12 scheme data box

= long unsigned offset + long ASCII text string

'schi'

-> encryption related key = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item information box

= long unsigned offset + long ASCII text string 'iinf'

-> 4 bytes version/flags = byte hex version + 24-bit hex

flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

-> 2 bytes encryption box array value = short unsigned index

-> item name or URL string = UTF-8 text string

-> 1 byte name or URL c string end = byte value set to zero

-> item mime type string = UTF-8 text string

-> 1 byte mime type c string end = byte value set to zero

-> optional item transfer encoding string = UTF-8 text

string

-> 1 byte optional transfer encoding c string end = byte

value set to zero

* 8+ bytes optional user data (any custom info) box

= long unsigned offset + long ASCII text string 'udta'

* 8+ bytes optional copyright notice box

= long unsigned offset + long ASCII text string 'cprt'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 1/8 byte ISO language padding = 1-bit value set to 0

-> 1 7/8 bytes content language = 3 * 5-bits ISO 639-2 language

code less 0x60

- example code for english = 0x15C7

-> annotation string = UTF text string

-> 1 byte annotation c string end = byte value set to zero

* 8+ bytes optional 3GPP notice box

= long unsigned offset + long ASCII text string

- box types are title = 'titl'; author = 'auth'; description =

'dscp'

- box types are performers = 'perf'; genre = 'gnre'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 1/8 byte ISO language padding = 1-bit value set to 0

-> 1 7/8 bytes content language = 3 * 5-bits ISO 639-2 language

code less 0x60

- example code for english = 0x15C7

-> annotation string = UTF text string

-> 1 byte annotation c string end = byte value set to zero

* 8+ bytes optional ISO/IEC 14496-12 presentation meta data box

= long unsigned offset + long ASCII text string 'meta'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

* 8+ bytes ISO/IEC 14496-12 handler reference box

= long unsigned offset + long ASCII text string 'hdlr'

- this box must be toward the start of the meta box

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 4 bytes QUICKTIME type = long ASCII text string

(eg. Media Handler = 'mhlr')

-> 4 bytes subtype/meta data type = long ASCII text string

- types are MPEG-7 XML = 'mp7t' ; MPEG-7 binary XML = 'mp7b'

- type is APPLE meta data for iTunes reader = 'mdir'

-> 4 bytes QUICKTIME manufacturer reserved = long ASCII text

string

(eg. Apple = 'appl' or 0)

-> 4 bytes QUICKTIME component reserved flags = long hex flags

(none = 0)

-> 4 bytes QUICKTIME component reserved flags mask = long hex

mask (none = 0)

-> component type name ASCII string

(eg. "Meta Data Handler" - no name = zero length string)

-> 1 byte component name string end = byte padding set to zero

- note: the quicktime spec uses a Pascal string

instead of the above C string

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 XML box

= long unsigned offset + long ASCII text string 'xml '

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> MPEG-7 XML meta data = text dump

* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 binary XML box

= long unsigned offset + long ASCII text string 'bxml'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> MPEG-7 encoded XML meta data = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item location box

= long unsigned offset + long ASCII text string 'iloc'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 1 nibble size of access offsets = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble size of data lengths = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble size of starting offset = 4 bits one byte multiples

- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8

-> 1 nibble reserved = 4 bits set to zero

-> 2 bytes number of locations = short unsigned index total

-> 2+ bytes item reference = short unsigned id

-> 2+ bytes stream data reference = short unsigned index from

'dref' box

- if meta data item in same file set to zero

-> 1-8+ bytes starting offset = byte - dlong unsigned offset

-> 2+ bytes number of access points = short unsigned index

total

-> 1-8+ bytes access offset = byte - dlong unsigned relative

offset

(relative to starting offset)

-> 1-8+ bytes data length = byte - dlong unsigned length

* 8+ bytes optional ISO/IEC 14496-12 primary item box

= long unsigned offset + long ASCII text string 'pitm'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

* 8+ bytes optional ISO/IEC 14496-12 item encryption box

= long unsigned offset + long ASCII text string 'ipro'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes number of encryption boxes = short unsigned index total

* 8+ bytes ISO/IEC 14496-12 encryption scheme info box

= long unsigned offset + long ASCII text string 'sinf'

- if meta data encrypted to ISO/IEC 14496-12 standards

* 8+ bytes ISO/IEC 14496-12 original format box

= long unsigned offset + long ASCII text string 'frma'

-> 4 bytes description format = long ASCII text string

* 8+ bytes optional ISO/IEC 14496-12 IPMP info box

= long unsigned offset + long ASCII text string 'imif'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> IPMP descriptors = hex dump from IPMP part of ES Descriptor box

* 8+ bytes optional ISO/IEC 14496-12 scheme type box

= long unsigned offset + long ASCII text string 'schm'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0 ; contains URI if flags = 0x000001)

-> 4 bytes encryption type = long ASCII text string

- types are 128-bit AES counter = 'ACM1' ; 128-bit AES FS = 'AFS1'

- types are NULL algorithm = 'ENUL' ; 160-bit HMAC-SHA-1 = 'SHM2'

- types are RTCP = 'ANUL' ; private scheme = ' '

-> 2 bytes encryption version = short unsigned version

-> optional scheme URI string = UTF-8 text string

(eg. web site)

-> 1 byte optional scheme URI string end = byte padding set to zero
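
A small sketch of decoding the scheme type box body as laid out above, including the optional URI that is present only when flags = 0x000001. The function name parse_schm_payload and the example URI are hypothetical; the 2-byte encryption version follows these notes and may differ in other editions of the standard.

```python
import struct

def parse_schm_payload(payload):
    """Parse a 'schm' body that starts at the version/flags field."""
    version_flags, = struct.unpack_from(">I", payload, 0)
    flags = version_flags & 0xFFFFFF
    scheme_type = payload[4:8].decode("ascii")
    # 2-byte encryption version, as described in the notes above
    scheme_version, = struct.unpack_from(">H", payload, 8)
    uri = None
    if flags & 0x000001:
        # the URI is a C string: UTF-8 text ended by a zero byte
        end = payload.index(b"\x00", 10)
        uri = payload[10:end].decode("utf-8")
    return scheme_type, scheme_version, uri

# Hypothetical body with the URI flag set.
sample_schm = struct.pack(">I4sH", 1, b"ACM1", 1) + b"http://example.com/\x00"
print(parse_schm_payload(sample_schm))
```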

* 8+ bytes ISO/IEC 14496-12 scheme data box

= long unsigned offset + long ASCII text string 'schi'

-> encryption related key = hex dump

* 8+ bytes optional ISO/IEC 14496-12 item information box

= long unsigned offset + long ASCII text string 'iinf'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current = 0)

-> 2 bytes main item reference = short unsigned id

-> 2 bytes encryption box array value = short unsigned index

-> item name or URL string = UTF-8 text string

-> 1 byte name or URL c string end = byte value set to zero

-> item mime type string = UTF-8 text string

-> 1 byte mime type c string end = byte value set to zero

-> optional item transfer encoding string = UTF-8 text string

-> 1 byte optional transfer encoding c string end = byte value set to zero
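
The item information fields above are mostly zero-terminated C strings, which can be read one after another. A sketch, with hypothetical names (read_cstring, parse_item_info_payload) and made-up sample values; it follows the flat field order listed in these notes.

```python
import struct

def read_cstring(buf, pos):
    """Return (text, next_pos) for the UTF-8 C string starting at pos."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1

def parse_item_info_payload(payload):
    """Parse an item information body that starts at the version/flags field."""
    # 2 bytes item reference + 2 bytes encryption box array value
    item_id, protection_index = struct.unpack_from(">HH", payload, 4)
    pos = 8
    name, pos = read_cstring(payload, pos)
    mime_type, pos = read_cstring(payload, pos)
    encoding = None
    if pos < len(payload):
        # the transfer encoding string is optional
        encoding, pos = read_cstring(payload, pos)
    return {"item_id": item_id, "protection_index": protection_index,
            "name": name, "mime_type": mime_type, "encoding": encoding}

# Hypothetical body without the optional transfer encoding string.
sample_info = (bytes(4) + struct.pack(">HH", 1, 0)
               + b"cover.jpg\x00" + b"image/jpeg\x00")
print(parse_item_info_payload(sample_info))
```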

* 8+ bytes optional APPLE item list box

= long unsigned offset + long ASCII text string 'ilst'

* 8+ bytes optional APPLE annotation box

= long unsigned offset + 0xA9 + 24-bit ASCII text string

- box types are full name = 'nam' ; comment = 'cmt' ; content created year = 'day'

- box types are artist = 'ART' ; track = 'trk' ; album = 'alb' ; composer = 'com'

- box types are composer = 'wrt' ; encoder = 'too'

OR

= long unsigned offset + 32-bit ASCII text string

- box types are genre = 'gnre' ; CD set number = 'disk' ; track number = 'trkn'

- box types are beats per minute = 'tmpo' ; compilation = 'cpil'

- box types are cover art = 'covr' ; itunes specific info = '----'

* 8+ bytes APPLE item data box

= long unsigned offset + long ASCII text string 'data'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current version = 0 ; contains text flag = 0x000001 ;

contains data flag = 0x000000 ; for tmpo/cpil flag = 0x000015 ;

contains image data flag = 0x00000D)

-> 4 bytes reserved = 32-bit value set to zero

-> annotation text or data values = text or hex dump

(NOTE: Genre is either text or a 16-bit short ID3 value, and most other non-text data are short unsigned values, with the exception of compilation, which is a byte flag)
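
A sketch of interpreting the item data box body by its flags, as described above. The function name decode_data_payload is hypothetical, the text is assumed to be UTF-8 (the notes only say "text"), and unknown or binary flag values are returned as raw bytes.

```python
def decode_data_payload(payload):
    """Decode a 'data' body that starts at the version/flags field."""
    flags = int.from_bytes(payload[1:4], "big")
    value = payload[8:]  # skip version/flags and the 4 reserved bytes
    if flags == 0x000001:
        # contains text (UTF-8 is an assumption of this sketch)
        return value.decode("utf-8")
    if flags == 0x000015:
        # tmpo/cpil: short unsigned value, or a single byte flag for cpil
        return int.from_bytes(value, "big")
    # 0x000000 (data), 0x00000D (image data) and anything else: raw bytes
    return value

# Hypothetical bodies for a title, a tempo and a compilation flag.
title = bytes([0, 0, 0, 0x01]) + bytes(4) + b"Title"
tempo = bytes([0, 0, 0, 0x15]) + bytes(4) + (120).to_bytes(2, "big")
compilation = bytes([0, 0, 0, 0x15]) + bytes(4) + b"\x01"
print(decode_data_payload(title), decode_data_payload(tempo),
      decode_data_payload(compilation))
```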

* 8+ bytes optional APPLE additional info box

= long unsigned offset + long ASCII text string

- box types are Java style app name = 'mean' ; item name = 'name'

-> 4 bytes version/flags = byte hex version + 24-bit hex flags

(current version = 0 ; current flags = 0x000000)

-> string text = ASCII text dump

-> 4 bytes compatibility udta end = long value set to zero