
Media Foundation




Media Foundation: Essential Concepts

If you are new to digital media, this topic introduces some concepts that you will need to

understand before writing a Media Foundation application.

Streams Compression Media Containers Formats Related topics

Streams

A stream is a sequence of media data with a uniform type. The most common types are audio and

video, but a stream can contain almost any kind of data, including text, script commands, and still

images. The term stream in this documentation does not imply delivery over a network. A media

file intended for local playback also contains streams.

Usually, a media file contains either a single audio stream, or exactly one video stream and one

audio stream. However, a media file might contain several streams of the same type. For example,

a video file might contain audio streams in several different languages. At run time, the application

would select which stream to use.

Compression

Compression refers to any process that reduces the size of a data stream by removing redundant

information. Compression algorithms fall into two broad categories:

Lossless compression. Using a lossless algorithm, the reconstructed data is identical to the original.

Lossy compression. Using a lossy algorithm, the reconstructed data is an approximation of the original, but is not an exact match.

In most other domains, lossy compression is not acceptable. (Imagine getting back an

"approximation" of a spreadsheet!) But lossy compression schemes are well-suited to audio and

video, for a couple of reasons.

The first reason has to do with the physics of human perception. When we listen to a complex

sound, like a music recording, some of the information contained in that sound is not perceptible to

the ear. With the help of signal processing theory, it is possible to analyze and separate the

frequencies that cannot be perceived. These frequencies can be removed with no perceptual

effect. Although the reconstructed audio will not match the original exactly, it will sound the same

to the listener. Similar principles apply to video.

Second, some degradation in sound or image quality may be acceptable, depending on the

intended purpose. In telephony, for example, audio is often highly compressed. The result is good

enough for a phone conversation—but you wouldn't want to listen to a symphony orchestra over a

telephone.

Compression is also called encoding, and a device that encodes is called an encoder. The reverse

process is decoding, and the device is naturally called a decoder. The general term for both

encoders and decoders is codec. Codecs can be implemented in hardware or software.

Compression technology has changed rapidly since the advent of digital media, and a large

number of compression schemes are in use today. This fact is one of the main challenges for digital

media programming.

Media Containers

It is rare to store a raw audio or video stream as a computer file, or to send one directly over the network. For one thing, it would be impossible to decode such a stream without knowing in advance which codec to use. Therefore, media files usually contain at least some of the following elements:

File headers that describe the number of streams, the format of each stream, and so on.
An index that enables random access to the content.
Metadata that describes the content (for example, the artist or title).
Packet headers, to enable network transmission or random access.

This documentation uses the term container to describe the entire package of streams, headers,

indexes, metadata, and so forth. The reason for using the term container rather than file is that

some container formats are designed for live broadcast. An application could generate the

container in real time, never storing it to a file.

An early example of a media container is the AVI file format. Other examples include MP4 and

Advanced Systems Format (ASF). Containers can be identified by file name extension (for example,

.mp4) or by MIME type.

The following diagram shows a typical structure for a media container. The diagram does not

represent any specific format; the details of each format vary widely.

Notice that the structure shown in the diagram is hierarchical, with header information appearing

at the start of the container. This structure is typical of many (but not all) container formats. Also

notice that the data section contains interleaved audio and video packets. This type of interleaving

is common in media containers.

The term multiplexing refers to the process of packetizing the audio and video streams and

interleaving the packets into the container. The reverse process, reassembling the streams from

the packetized data, is called demultiplexing.

Formats

In digital media, the term format is ambiguous. A format can refer to the type of encoding, such as

H.264 video, or the container, such as MP4. This distinction is often confusing for ordinary users.

The names given to media formats do not always help. For example, MP3 refers both to an

encoding format (MPEG-1 Audio Layer 3) and a file format.

The distinction is important, however, because reading a media file actually involves two stages:


1. First, the container must be parsed. In most cases, the number of streams and the format of each stream cannot be known until this step is complete.

2. Next, if the streams are compressed, they must be decoded using the appropriate decoders.

This fact leads quite naturally to a software design where separate components are used to parse

containers and decode streams. Further, this approach lends itself to a plug-in model, so that third

parties can provide their own parsers and codecs. On Windows, the Component Object Model

(COM) provides a standard way to separate an API from its implementation, which is a requirement

for any plug-in model. For this reason (among others), Media Foundation uses COM interfaces.

The following diagram shows the components used to read a media file:

Writing a media file also requires two steps:

1. Encoding the uncompressed audio/video data.
2. Putting the compressed data into a particular container format.

The following diagram shows the components used to write a media file:

Media Foundation Architecture

This section describes the general design of Microsoft Media Foundation. For information about

using Media Foundation for specific programming tasks, see Media Foundation Programming Guide.

In this section

Overview of the Media Foundation Architecture: Gives a high-level overview of the Media Foundation architecture.

Media Foundation Primitives: Describes some basic interfaces that are used throughout Media Foundation. Almost all Media Foundation applications will use these interfaces.

Media Foundation Platform APIs: Describes core Media Foundation functions, such as asynchronous callbacks and work queues. Some applications might use platform-level interfaces. Also, custom plug-ins, such as media sources and MFTs, use these interfaces.

Media Foundation Pipeline: The Media Foundation pipeline layer consists of media sources, MFTs, and media sinks. Most applications do not call methods directly on the pipeline layer. Instead, applications use one of the higher layers, such as the Media Session or the Source Reader and Sink Writer.

Media Session: The Media Session manages data flow in the Media Foundation pipeline.

Source Reader: The Source Reader enables an application to get data from a media source, without the application needing to call the media source APIs directly. The Source Reader can also perform decoding of compressed streams.

Protected Media Path: The protected media path (PMP) provides a protected environment for playing premium video content. It is not necessary to use the PMP when writing a Media Foundation application.

Overview of the Media Foundation Architecture

This topic describes the general design of Microsoft Media Foundation. For information about using

Media Foundation for specific programming tasks, see Media Foundation Programming Guide.

The following diagram shows a high-level view of the Media Foundation architecture.

Media Foundation provides two distinct programming models. The first model, shown on the left side of the diagram, uses an end-to-end pipeline for media data. The application initializes the pipeline (for example, by providing the URL of a file to play) and then calls methods to control streaming. In the second model, shown on the right side of the diagram, the application either pulls data from a source, or pushes it to a destination (or both). This model is particularly useful if you need to process the data, because the application has direct access to the data stream.

Primitives and Platform

Starting from the bottom of the diagram, the primitives are helper objects used throughout the

Media Foundation API:

Attributes are a generic way to store information inside an object, as a list of key/value pairs.
Media Types describe the format of a media data stream.
Media Buffers hold chunks of media data, such as video frames and audio samples, and are used to transport data between objects.
Media Samples are containers for media buffers. They also contain metadata about the buffers, such as time stamps.

The Media Foundation Platform APIs provide some core functionality that is used by the Media

Foundation pipeline, such as asynchronous callbacks and work queues. Certain applications might

need to call these APIs directly; also, you will need them if you implement a custom source,

transform, or sink for Media Foundation.

Media Pipeline

The media pipeline contains three types of object that generate or process media data:

Media Sources  introduce data into the pipeline. A media source might get data from a local file, such as a video file; from a network stream; or from a hardware capture device.

Media Foundation Transforms  (MFTs) process data from a stream. Encoders and decoders are implemented as MFTs.

Media Sinks  consume the data; for example, by showing video on the display, playing audio, or writing the data to a media file.

Third parties can implement their own custom sources, sinks, and MFTs; for example, to support

new media file formats.

The Media Session controls the flow of data through the pipeline, and handles tasks such as quality

control, audio/video synchronization, and responding to format changes.

Source Reader and Sink Writer

The Source Reader and Sink Writer provide an alternative way to use the basic Media Foundation

components (media sources, transforms, and media sinks). The source reader hosts a media

source and zero or more decoders, while the sink writer hosts a media sink and zero or more

encoders. You can use the source reader to get compressed or uncompressed data from a media

source, and use the sink writer to encode data and send the data to a media sink.

Note  The source reader and sink writer are available in Windows 7.

This programming model gives the application more control over the flow of data, and also gives

the application direct access to the data from the source.
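For illustration, here is a minimal sketch that uses the Source Reader to read one sample from the first video stream of a file. The file name is a placeholder, error handling is reduced to HRESULT checks, and the sample is released without being processed.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

HRESULT ReadOneSample()
{
    HRESULT hr = MFStartup(MF_VERSION);
    if (FAILED(hr)) { return hr; }

    IMFSourceReader *pReader = NULL;
    hr = MFCreateSourceReaderFromURL(L"example.mp4", NULL, &pReader);

    if (SUCCEEDED(hr))
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample *pSample = NULL;

        // Synchronously read the next sample from the first video stream.
        hr = pReader->ReadSample(
            (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
            0, &streamIndex, &flags, &timestamp, &pSample);

        if (pSample) { pSample->Release(); }
        pReader->Release();
    }

    MFShutdown();
    return hr;
}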

Media Foundation Primitives

Media Foundation defines several basic object types that are used throughout the Media

Foundation APIs.

Attributes: Attributes and properties are key/value pairs stored on an object.

Media Types: A media type describes the format of a digital media stream.

Media Buffers: A media buffer manages a block of memory, so that it can be shared between objects.

Media Samples: A media sample is an object that contains a list of media buffers.

Attributes in Media Foundation

An attribute is a key/value pair, where the key is a GUID and the value is a PROPVARIANT.

Attributes are used throughout Microsoft Media Foundation to configure objects, describe media

formats, query object properties, and for other purposes.

This topic contains the following sections.

About Attributes Serializing Attributes Implementing IMFAttributes Related topics

About Attributes

An attribute is a key/value pair, where the key is a GUID and the value is a PROPVARIANT.

Attribute values are restricted to the following data types:

Unsigned 32-bit integer (UINT32).
Unsigned 64-bit integer (UINT64).
64-bit floating-point number.
GUID.
Null-terminated wide-character string.
Byte array.
IUnknown pointer.

These types are defined in the MF_ATTRIBUTE_TYPE enumeration. To set or retrieve attribute

values, use the IMFAttributes interface. This interface contains type-safe methods to get and set

values by data type. For example, to set a 32-bit integer, call IMFAttributes::SetUINT32.

Attribute keys are unique within an object. If you set two different values with the same key, the

second value overwrites the first.
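For example, the following sketch sets the same UINT32 attribute twice and reads it back; only the second value remains. MY_UINT32_ATTRIBUTE is a hypothetical key, not a real Media Foundation attribute.

extern const GUID MY_UINT32_ATTRIBUTE;  // Hypothetical attribute key.

HRESULT ShowOverwrite()
{
    IMFAttributes *pAttributes = NULL;
    HRESULT hr = MFCreateAttributes(&pAttributes, 1);

    if (SUCCEEDED(hr)) { hr = pAttributes->SetUINT32(MY_UINT32_ATTRIBUTE, 100); }
    if (SUCCEEDED(hr)) { hr = pAttributes->SetUINT32(MY_UINT32_ATTRIBUTE, 200); }

    UINT32 value = 0;
    if (SUCCEEDED(hr)) { hr = pAttributes->GetUINT32(MY_UINT32_ATTRIBUTE, &value); }
    // value is now 200; the first value was overwritten.

    if (pAttributes) { pAttributes->Release(); }
    return hr;
}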

Several Media Foundation interfaces inherit the IMFAttributes interface. Objects that expose this

interface have optional or mandatory attributes that the application should set on the object, or

have attributes that the application can retrieve. Also, some methods and functions take

an IMFAttributes pointer as a parameter, which enables the application to set configuration

information. The application must create an attribute store to hold the configuration attributes. To

create an empty attribute store, call MFCreateAttributes.


The following code shows two functions. The first creates a new attribute store and sets a

hypothetical attribute named MY_ATTRIBUTE with a string value. The second function retrieves the

value of this attribute.

extern const GUID MY_ATTRIBUTE;

HRESULT ShowCreateAttributeStore(IMFAttributes **ppAttributes)
{
    IMFAttributes *pAttributes = NULL;
    const UINT32 cElements = 10;  // Starting size.

    // Create the empty attribute store.
    HRESULT hr = MFCreateAttributes(&pAttributes, cElements);

    // Set the MY_ATTRIBUTE attribute with a string value.
    if (SUCCEEDED(hr))
    {
        hr = pAttributes->SetString(
            MY_ATTRIBUTE,
            L"This is a string value"
            );
    }

    // Return the IMFAttributes pointer to the caller.
    if (SUCCEEDED(hr))
    {
        *ppAttributes = pAttributes;
        (*ppAttributes)->AddRef();
    }

    SAFE_RELEASE(pAttributes);

    return hr;
}

HRESULT ShowGetAttributes()
{
    IMFAttributes *pAttributes = NULL;
    WCHAR *pwszValue = NULL;
    UINT32 cchLength = 0;

    // Create the attribute store.
    HRESULT hr = ShowCreateAttributeStore(&pAttributes);

    // Get the attribute.
    if (SUCCEEDED(hr))
    {
        hr = pAttributes->GetAllocatedString(
            MY_ATTRIBUTE,
            &pwszValue,
            &cchLength
            );
    }

    CoTaskMemFree(pwszValue);
    SAFE_RELEASE(pAttributes);

    return hr;
}


For a complete list of Media Foundation attributes, see Media Foundation Attributes. The expected

data type for each attribute is documented there.

Serializing Attributes

Media Foundation has two functions for serializing attribute stores. One writes the attributes to a

byte array, the other writes them to a stream that supports the IStream interface. Each function

has a corresponding function that loads the data.

Operation: Byte Array / IStream

Save: MFGetAttributesAsBlob / MFSerializeAttributesToStream

Load: MFInitAttributesFromBlob / MFDeserializeAttributesFromStream

To write the contents of an attribute store into a byte array, call MFGetAttributesAsBlob.

Attributes with IUnknown pointer values are ignored. To load the attributes back into an attribute

store, call MFInitAttributesFromBlob.

To write an attribute store to a stream, call MFSerializeAttributesToStream. This function can

marshal IUnknown pointer values. The caller must provide a stream object that implements

the IStream interface. To load an attribute store from a stream,

call MFDeserializeAttributesFromStream.
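As a rough sketch, the following function copies the contents of one attribute store into another by way of a byte array, using the blob functions named above. Both pointers are assumed to be valid attribute stores; attributes with IUnknown values are not carried over.

HRESULT CopyAttributesViaBlob(IMFAttributes *pSource, IMFAttributes *pDest)
{
    UINT32 cbSize = 0;
    HRESULT hr = MFGetAttributesAsBlobSize(pSource, &cbSize);
    if (FAILED(hr)) { return hr; }

    UINT8 *pBuffer = (UINT8*)CoTaskMemAlloc(cbSize);
    if (pBuffer == NULL) { return E_OUTOFMEMORY; }

    // Serialize into the byte array, then load the array into the destination store.
    hr = MFGetAttributesAsBlob(pSource, pBuffer, cbSize);
    if (SUCCEEDED(hr)) { hr = MFInitAttributesFromBlob(pDest, pBuffer, cbSize); }

    CoTaskMemFree(pBuffer);
    return hr;
}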

Implementing IMFAttributes

Media Foundation provides a stock implementation of IMFAttributes, which is obtained by calling

the MFCreateAttributes function. In most situations, you should use this implementation, and not

provide your own custom implementation.

There is one situation when you might need to implement the IMFAttributes interface: If you

implement a second interface that inherits IMFAttributes. In that case, you must provide

implementations for the IMFAttributes methods inherited by the second interface.

In this situation, it is recommended to wrap the existing Media Foundation implementation

of IMFAttributes. The following code shows a class template that holds an IMFAttributes pointer

and wraps every IMFAttributes method, except for the IUnknown methods.

#include <assert.h>

// Helper class to implement IMFAttributes.
//
// This is an abstract class; the derived class must implement the IUnknown
// methods. This class is a wrapper for the standard attribute store provided
// in Media Foundation.
//
// Template parameter:
//     The interface you are implementing, either IMFAttributes or an interface
//     that inherits IMFAttributes, such as IMFActivate.

template <class IFACE=IMFAttributes>
class CBaseAttributes : public IFACE
{
protected:
    IMFAttributes *m_pAttributes;

    // This version of the constructor does not initialize the
    // attribute store. The derived class must call Initialize() in
    // its own constructor.
    CBaseAttributes() : m_pAttributes(NULL)
    {
    }

    // This version of the constructor initializes the attribute
    // store, but the derived class must pass an HRESULT parameter
    // to the constructor.
    CBaseAttributes(HRESULT& hr, UINT32 cInitialSize = 0) : m_pAttributes(NULL)
    {
        hr = Initialize(cInitialSize);
    }

    // The next version of the constructor uses a caller-provided
    // implementation of IMFAttributes.
    // (Sometimes you want to delegate IMFAttributes calls to some
    // other object that implements IMFAttributes, rather than using
    // MFCreateAttributes.)
    CBaseAttributes(HRESULT& hr, IUnknown *pUnk) : m_pAttributes(NULL)
    {
        hr = Initialize(pUnk);
    }

    virtual ~CBaseAttributes()
    {
        if (m_pAttributes)
        {
            m_pAttributes->Release();
        }
    }

    // Initializes the object by creating the standard Media Foundation attribute store.
    HRESULT Initialize(UINT32 cInitialSize = 0)
    {
        if (m_pAttributes == NULL)
        {
            return MFCreateAttributes(&m_pAttributes, cInitialSize);
        }
        return S_OK;
    }

    // Initializes this object from a caller-provided attribute store.
    // pUnk: Pointer to an object that exposes IMFAttributes.
    HRESULT Initialize(IUnknown *pUnk)
    {
        if (m_pAttributes)
        {
            m_pAttributes->Release();
            m_pAttributes = NULL;
        }
        return pUnk->QueryInterface(IID_PPV_ARGS(&m_pAttributes));
    }

public:

// IMFAttributes methods

STDMETHODIMP GetItem(REFGUID guidKey, PROPVARIANT* pValue) { assert(m_pAttributes); return m_pAttributes->GetItem(guidKey, pValue); }

STDMETHODIMP GetItemType(REFGUID guidKey, MF_ATTRIBUTE_TYPE* pType) { assert(m_pAttributes); return m_pAttributes->GetItemType(guidKey, pType); }

STDMETHODIMP CompareItem(REFGUID guidKey, REFPROPVARIANT Value, BOOL* pbResult) { assert(m_pAttributes); return m_pAttributes->CompareItem(guidKey, Value, pbResult); }

STDMETHODIMP Compare( IMFAttributes* pTheirs, MF_ATTRIBUTES_MATCH_TYPE MatchType, BOOL* pbResult ) { assert(m_pAttributes); return m_pAttributes->Compare(pTheirs, MatchType, pbResult); }

STDMETHODIMP GetUINT32(REFGUID guidKey, UINT32* punValue) { assert(m_pAttributes); return m_pAttributes->GetUINT32(guidKey, punValue); }

STDMETHODIMP GetUINT64(REFGUID guidKey, UINT64* punValue) { assert(m_pAttributes); return m_pAttributes->GetUINT64(guidKey, punValue); }

    STDMETHODIMP GetDouble(REFGUID guidKey, double* pfValue) { assert(m_pAttributes); return m_pAttributes->GetDouble(guidKey, pfValue); }

STDMETHODIMP GetGUID(REFGUID guidKey, GUID* pguidValue) { assert(m_pAttributes); return m_pAttributes->GetGUID(guidKey, pguidValue); }

STDMETHODIMP GetStringLength(REFGUID guidKey, UINT32* pcchLength) { assert(m_pAttributes); return m_pAttributes->GetStringLength(guidKey, pcchLength); }

STDMETHODIMP GetString(REFGUID guidKey, LPWSTR pwszValue, UINT32 cchBufSize, UINT32* pcchLength) { assert(m_pAttributes); return m_pAttributes->GetString(guidKey, pwszValue, cchBufSize, pcchLength); }

STDMETHODIMP GetAllocatedString(REFGUID guidKey, LPWSTR* ppwszValue, UINT32* pcchLength) { assert(m_pAttributes); return m_pAttributes->GetAllocatedString(guidKey, ppwszValue, pcchLength); }

STDMETHODIMP GetBlobSize(REFGUID guidKey, UINT32* pcbBlobSize) { assert(m_pAttributes); return m_pAttributes->GetBlobSize(guidKey, pcbBlobSize); }

STDMETHODIMP GetBlob(REFGUID guidKey, UINT8* pBuf, UINT32 cbBufSize, UINT32* pcbBlobSize) { assert(m_pAttributes); return m_pAttributes->GetBlob(guidKey, pBuf, cbBufSize, pcbBlobSize); }

STDMETHODIMP GetAllocatedBlob(REFGUID guidKey, UINT8** ppBuf, UINT32* pcbSize) { assert(m_pAttributes); return m_pAttributes->GetAllocatedBlob(guidKey, ppBuf, pcbSize); }

STDMETHODIMP GetUnknown(REFGUID guidKey, REFIID riid, LPVOID* ppv) { assert(m_pAttributes); return m_pAttributes->GetUnknown(guidKey, riid, ppv); }

    STDMETHODIMP SetItem(REFGUID guidKey, REFPROPVARIANT Value) { assert(m_pAttributes); return m_pAttributes->SetItem(guidKey, Value); }

STDMETHODIMP DeleteItem(REFGUID guidKey) { assert(m_pAttributes); return m_pAttributes->DeleteItem(guidKey); }

STDMETHODIMP DeleteAllItems() { assert(m_pAttributes); return m_pAttributes->DeleteAllItems(); }

STDMETHODIMP SetUINT32(REFGUID guidKey, UINT32 unValue) { assert(m_pAttributes); return m_pAttributes->SetUINT32(guidKey, unValue); }

STDMETHODIMP SetUINT64(REFGUID guidKey,UINT64 unValue) { assert(m_pAttributes); return m_pAttributes->SetUINT64(guidKey, unValue); }

STDMETHODIMP SetDouble(REFGUID guidKey, double fValue) { assert(m_pAttributes); return m_pAttributes->SetDouble(guidKey, fValue); }

STDMETHODIMP SetGUID(REFGUID guidKey, REFGUID guidValue) { assert(m_pAttributes); return m_pAttributes->SetGUID(guidKey, guidValue); }

STDMETHODIMP SetString(REFGUID guidKey, LPCWSTR wszValue) { assert(m_pAttributes); return m_pAttributes->SetString(guidKey, wszValue); }

STDMETHODIMP SetBlob(REFGUID guidKey, const UINT8* pBuf, UINT32 cbBufSize) { assert(m_pAttributes); return m_pAttributes->SetBlob(guidKey, pBuf, cbBufSize); }

STDMETHODIMP SetUnknown(REFGUID guidKey, IUnknown* pUnknown) { assert(m_pAttributes); return m_pAttributes->SetUnknown(guidKey, pUnknown); }


STDMETHODIMP LockStore() { assert(m_pAttributes); return m_pAttributes->LockStore(); }

STDMETHODIMP UnlockStore() { assert(m_pAttributes); return m_pAttributes->UnlockStore(); }

STDMETHODIMP GetCount(UINT32* pcItems) { assert(m_pAttributes); return m_pAttributes->GetCount(pcItems); }

STDMETHODIMP GetItemByIndex(UINT32 unIndex, GUID* pguidKey, PROPVARIANT* pValue) { assert(m_pAttributes); return m_pAttributes->GetItemByIndex(unIndex, pguidKey, pValue); }

STDMETHODIMP CopyAllItems(IMFAttributes* pDest) { assert(m_pAttributes); return m_pAttributes->CopyAllItems(pDest); }

    // Helper functions

    // dwOptions: Flags from MF_ATTRIBUTE_SERIALIZE_OPTIONS.
    HRESULT SerializeToStream(DWORD dwOptions, IStream* pStm) { assert(m_pAttributes); return MFSerializeAttributesToStream(m_pAttributes, dwOptions, pStm); }

    HRESULT DeserializeFromStream(DWORD dwOptions, IStream* pStm) { assert(m_pAttributes); return MFDeserializeAttributesFromStream(m_pAttributes, dwOptions, pStm); }

    // SerializeToBlob: Stores the attributes in a byte array.
    //
    // ppBuffer: Receives a pointer to the byte array.
    // pcbSize:  Receives the size of the byte array.
    //
    // The caller must free the array using CoTaskMemFree.
    HRESULT SerializeToBlob(UINT8 **ppBuffer, UINT32 *pcbSize)
    {
        assert(m_pAttributes);

        if (ppBuffer == NULL) { return E_POINTER; }
        if (pcbSize == NULL) { return E_POINTER; }

        *ppBuffer = NULL;
        *pcbSize = 0;

        UINT32 cbSize = 0;
        BYTE *pBuffer = NULL;

        HRESULT hr = MFGetAttributesAsBlobSize(m_pAttributes, &cbSize);
        if (FAILED(hr)) { return hr; }

        pBuffer = (BYTE*)CoTaskMemAlloc(cbSize);
        if (pBuffer == NULL) { return E_OUTOFMEMORY; }

        hr = MFGetAttributesAsBlob(m_pAttributes, pBuffer, cbSize);

        if (SUCCEEDED(hr))
        {
            *ppBuffer = pBuffer;
            *pcbSize = cbSize;
        }
        else
        {
            CoTaskMemFree(pBuffer);
        }
        return hr;
    }

    HRESULT DeserializeFromBlob(const UINT8* pBuffer, UINT cbSize) { assert(m_pAttributes); return MFInitAttributesFromBlob(m_pAttributes, pBuffer, cbSize); }

    HRESULT GetRatio(REFGUID guidKey, UINT32* pnNumerator, UINT32* punDenominator) { assert(m_pAttributes); return MFGetAttributeRatio(m_pAttributes, guidKey, pnNumerator, punDenominator); }

    HRESULT SetRatio(REFGUID guidKey, UINT32 unNumerator, UINT32 unDenominator) { assert(m_pAttributes); return MFSetAttributeRatio(m_pAttributes, guidKey, unNumerator, unDenominator); }

    // Gets an attribute whose value represents the size of something (e.g., a video frame).
    HRESULT GetSize(REFGUID guidKey, UINT32* punWidth, UINT32* punHeight) { assert(m_pAttributes); return MFGetAttributeSize(m_pAttributes, guidKey, punWidth, punHeight); }

    // Sets an attribute whose value represents the size of something (e.g., a video frame).
    HRESULT SetSize(REFGUID guidKey, UINT32 unWidth, UINT32 unHeight) { assert(m_pAttributes); return MFSetAttributeSize(m_pAttributes, guidKey, unWidth, unHeight); }
};

The following code shows how to derive a class from this template:

#include <shlwapi.h>

class MyObject : public CBaseAttributes<>
{
    MyObject() : m_nRefCount(1) { }
    ~MyObject() { }

    long m_nRefCount;

public:

    // IUnknown
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
    {
        static const QITAB qit[] =
        {
            QITABENT(MyObject, IMFAttributes),
            { 0 },
        };
        return QISearch(this, qit, riid, ppv);
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return InterlockedIncrement(&m_nRefCount);
    }

    STDMETHODIMP_(ULONG) Release()
    {
        ULONG uCount = InterlockedDecrement(&m_nRefCount);
        if (uCount == 0)
        {
            delete this;
        }
        return uCount;
    }

    // Static function to create an instance of the object.
    static HRESULT CreateInstance(MyObject **ppObject)
    {
        MyObject *pObject = new MyObject();
        if (pObject == NULL) { return E_OUTOFMEMORY; }

        // Initialize the attribute store.
        HRESULT hr = pObject->Initialize();
        if (FAILED(hr))
        {
            delete pObject;
            return hr;
        }

        *ppObject = pObject;
        (*ppObject)->AddRef();

        return S_OK;
    }
};

You must call CBaseAttributes::Initialize to create the attribute store. In the previous

example, that is done inside a static creation function.

The template argument is an interface type, which defaults to IMFAttributes. If your object

implements an interface that inherits IMFAttributes, such as IMFActivate, set the template

argument equal to the name of the derived interface.
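For example, a declaration-only sketch of a class that uses IMFActivate as the template argument might look like the following. The class name is hypothetical and the method bodies are omitted; the IMFAttributes methods are inherited from CBaseAttributes.

class CMyActivate : public CBaseAttributes<IMFActivate>
{
public:
    // IUnknown (implementations omitted in this sketch).
    STDMETHODIMP QueryInterface(REFIID riid, void **ppv);
    STDMETHODIMP_(ULONG) AddRef();
    STDMETHODIMP_(ULONG) Release();

    // IMFActivate (implementations omitted in this sketch).
    STDMETHODIMP ActivateObject(REFIID riid, void **ppv);
    STDMETHODIMP ShutdownObject();
    STDMETHODIMP DetachObject();
};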

Media Types

A media type is a way to describe the format of a media stream. In Media Foundation, media types

are represented by the IMFMediaType interface. Applications use media types to discover the

format of a media file or media stream. Objects in the Media Foundation pipeline use media types

to negotiate the formats they will deliver or receive.

This section contains the following topics.

About Media Types: General overview of media types in Media Foundation.

Media Type GUIDs: Lists the defined GUIDs for major types and subtypes.

Audio Media Types: How to create media types for audio formats.

Video Media Types: How to create media types for video formats.

Complete and Partial Media Types: Describes the difference between complete media types and partial media types.

Media Type Conversions: How to convert between Media Foundation media types and older format structures.

Media Type Helper Functions: A list of functions that manipulate or get information from a media type.

Media Type Debugging Code: Example code that shows how to view a media type while debugging.

About Media Types 

A media type describes the format of a media stream. In Microsoft Media Foundation, media types

are represented by the IMFMediaType interface. This interface inherits

the IMFAttributes interface. The details of a media type are specified as attributes.

To create a new media type, call the MFCreateMediaType function. This function returns a

pointer to the IMFMediaType interface. The media type initially has no attributes. To set the

details of the format, set the relevant attributes.

For a list of media type attributes, see Media Type Attributes.

Major Types and Subtypes

Two important pieces of information for any media type are the major type and the subtype.

The major type is a GUID that defines the overall category of the data in a media stream. Major types include video and audio. To specify the major type, set the MF_MT_MAJOR_TYPE attribute. The IMFMediaType::GetMajorType method returns the value of this attribute.

The subtype further defines the format. For example, within the video major type, there are subtypes for RGB-24, RGB-32, YUY2, and so forth. Within audio, there are PCM audio, IEEE floating-point audio, and others. The subtype provides more information than the major type, but it does not define everything about the format. For example, video subtypes do not define the image size or the frame rate. To specify the subtype, set the MF_MT_SUBTYPE attribute.

All media types should have a major type GUID and a subtype GUID. For a list of major type and

subtype GUIDs, see Media Type GUIDs.
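As a brief sketch, the following function creates a media type and sets the major type, subtype, and frame size attributes to describe 640 x 480 uncompressed RGB-32 video. The frame size is an arbitrary example value.

HRESULT CreateRgb32VideoType(IMFMediaType **ppType)
{
    IMFMediaType *pType = NULL;
    HRESULT hr = MFCreateMediaType(&pType);

    // The new media type is empty; the format is described entirely by attributes.
    if (SUCCEEDED(hr)) { hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video); }
    if (SUCCEEDED(hr)) { hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32); }
    if (SUCCEEDED(hr)) { hr = MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 640, 480); }

    if (SUCCEEDED(hr))
    {
        *ppType = pType;
        (*ppType)->AddRef();
    }
    if (pType) { pType->Release(); }
    return hr;
}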


Why Attributes?

Attributes have several advantages over the format structures that have been used in previous

technologies such as DirectShow and the Windows Media Format SDK.

It is easier to represent "don't know" or "don't care" values. For example, if you are writing a video transform, you might know in advance which RGB and YUV formats the transform supports, but not the dimensions of the video frame, until you get them from the video source. Similarly, you might not care about certain details, such as the video primaries. With a format structure, every member must be filled with some value. As a result, it has become common to use zero to indicate an unknown or default value. This practice can cause errors if another component treats zero as a legitimate value. With attributes, you simply omit the attributes that are unknown or not relevant to your component.

As requirements have changed over time, format structures were extended by adding additional data at the end of the structure. For example, WAVEFORMATEXTENSIBLE extends the WAVEFORMATEX structure. This practice is prone to error, because components must cast structure pointers to other structure types. Attributes can be extended safely.

Mutually incompatible format structures have been defined. For example, DirectShow defines the VIDEOINFOHEADER and VIDEOINFOHEADER2 structures. Attributes are set independently of each other, so this problem does not arise.

Major Media Types

In a media type, the major type describes the overall category of the data, such as audio or video.

The subtype, if present, further refines the major type. For example, if the major type is video, the

subtype might be 32-bit RGB video. Subtypes also distinguish encoded formats, such as H.264

video, from uncompressed formats.

Major type and subtype are identified by GUIDs and stored in the following attributes:

MF_MT_MAJOR_TYPE: Major type.

MF_MT_SUBTYPE: Subtype.

The following major types are defined.

MFMediaType_Audio: Audio. (Subtypes: Audio Subtype GUIDs)

MFMediaType_Binary: Binary stream. (Subtypes: None)

MFMediaType_FileTransfer: A stream that contains data files. (Subtypes: None)

MFMediaType_HTML: HTML stream. (Subtypes: None)

MFMediaType_Image: Still image stream. (Subtypes: WIC GUIDs and CLSIDs)

MFMediaType_Protected: Protected media. (The subtype specifies the content protection scheme.)

MFMediaType_SAMI: Synchronized Accessible Media Interchange (SAMI) captions. (Subtypes: None)

MFMediaType_Script: Script stream. (Subtypes: None)

MFMediaType_Video: Video. (Subtypes: Video Subtype GUIDs)

Third-party components can define new major types and new subtypes.

Audio Media Types

This section describes how to create and manipulate media types that describe audio data.

Audio Subtype GUIDs: Contains a list of audio subtype GUIDs.

Uncompressed Audio Media Types: How to create a media type that describes an uncompressed audio format.

AAC Media Types: Describes how to specify the format of an Advanced Audio Coding (AAC) stream.

Audio Subtype GUIDs


The following audio subtype GUIDs are defined. To specify the subtype, set

the MF_MT_SUBTYPE attribute on the media type. Except where noted, these constants are

defined in the header file mfapi.h.

When these subtypes are used, set the MF_MT_MAJOR_TYPE attribute to MFMediaType_Audio.

MEDIASUBTYPE_RAW_AAC1: Advanced Audio Coding (AAC). This subtype is used for AAC contained in an AVI file with an audio format tag equal to 0x00FF. For more information, see AAC Decoder. Defined in wmcodecdsp.h. Format tag: WAVE_FORMAT_RAW_AAC1 (0x00FF)

MFAudioFormat_AAC: Advanced Audio Coding (AAC). Note: Equivalent to MEDIASUBTYPE_MPEG_HEAAC, defined in wmcodecdsp.h. The stream can contain raw AAC data or AAC data in an Audio Data Transport Stream (ADTS) stream. For more information, see AAC Decoder and MPEG-4 File Source. Format tag: WAVE_FORMAT_MPEG_HEAAC (0x1610)

MFAudioFormat_ADTS: Not used. Format tag: WAVE_FORMAT_MPEG_ADTS_AAC (0x1600)

MFAudioFormat_Dolby_AC3_SPDIF: Dolby AC-3 audio over Sony/Philips Digital Interface (S/PDIF). This GUID value is identical to the following subtypes: KSDATAFORMAT_SUBTYPE_IEC61937_DOLBY_DIGITAL, defined in ksmedia.h, and MEDIASUBTYPE_DOLBY_AC3_SPDIF, defined in uuids.h. Format tag: WAVE_FORMAT_DOLBY_AC3_SPDIF (0x0092)

MFAudioFormat_DRM: Encrypted audio data used with secure audio path. Format tag: WAVE_FORMAT_DRM (0x0009)

MFAudioFormat_DTS: Digital Theater Systems (DTS) audio. Format tag: WAVE_FORMAT_DTS (0x0008)

MFAudioFormat_Float: Uncompressed IEEE floating-point audio. Format tag: WAVE_FORMAT_IEEE_FLOAT (0x0003)

MFAudioFormat_MP3: MPEG Audio Layer-3 (MP3). Format tag: WAVE_FORMAT_MPEGLAYER3 (0x0055)

MFAudioFormat_MPEG: MPEG-1 audio payload. Format tag: WAVE_FORMAT_MPEG (0x0050)

MFAudioFormat_MSP1: Windows Media Audio 9 Voice codec. Format tag: WAVE_FORMAT_WMAVOICE9 (0x000A)

MFAudioFormat_PCM: Uncompressed PCM audio. Format tag: WAVE_FORMAT_PCM (1)

MFAudioFormat_WMASPDIF: Windows Media Audio 9 Professional codec over S/PDIF. Format tag: WAVE_FORMAT_WMASPDIF (0x0164)

MFAudioFormat_WMAudio_Lossless: Windows Media Audio 9 Lossless codec or Windows Media Audio 9.1 codec. Format tag: WAVE_FORMAT_WMAUDIO_LOSSLESS (0x0163)

MFAudioFormat_WMAudioV8: Windows Media Audio 8 codec, Windows Media Audio 9 codec, or Windows Media Audio 9.1 codec. Format tag: WAVE_FORMAT_WMAUDIO2 (0x0161)

MFAudioFormat_WMAudioV9: Windows Media Audio 9 Professional codec or Windows Media Audio 9.1 Professional codec. Format tag: WAVE_FORMAT_WMAUDIO3 (0x0162)

The format tags listed with each subtype are used in the WAVEFORMATEX structure,

and are defined in the header file mmreg.h.

Given an audio format tag, you can create an audio subtype GUID as follows:

1. Start with the value MFAudioFormat_Base, which is defined in mfapi.h.
2. Replace the first DWORD of this GUID with the format tag.

You can use the DEFINE_MEDIATYPE_GUID macro to define a new GUID constant that follows this

pattern.
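For example, the following sketch defines a subtype GUID for a hypothetical format tag. The tag value and the constant names are placeholders; the commented line shows how an existing constant follows the same pattern.

#include <mfapi.h>

#define WAVE_FORMAT_HYPOTHETICAL 0x1234  // Hypothetical format tag.

// Creates MFAudioFormat_Hypothetical by replacing the first DWORD
// of MFAudioFormat_Base with the format tag.
DEFINE_MEDIATYPE_GUID(MFAudioFormat_Hypothetical, WAVE_FORMAT_HYPOTHETICAL);

// The existing constants follow the same pattern, for example:
// DEFINE_MEDIATYPE_GUID(MFAudioFormat_MP3, WAVE_FORMAT_MPEGLAYER3);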


DXVA

About DXVA 2.0

DirectX Video Acceleration (DXVA) is an API and a corresponding DDI for using hardware

acceleration to speed up video processing. Software codecs and software video processors can use

DXVA to offload certain CPU-intensive operations to the GPU. For example, a software decoder can

offload the inverse discrete cosine transform (iDCT) to the GPU.

In DXVA, some decoding operations are implemented by the graphics hardware driver. This set of

functionality is termed the accelerator. Other decoding operations are implemented by user-mode

application software, called the host decoder or software decoder. (The terms host

decoder and software decoder are equivalent.) Processing performed by the accelerator is

called off-host processing. Typically the accelerator uses the GPU to speed up some operations.

Whenever the accelerator performs a decoding operation, the host decoder must convey to the

accelerator buffers containing the information needed to perform the operation.

The DXVA 2 API requires Windows Vista or later. The DXVA 1 API is still supported in Windows Vista

for backward compatibility. An emulation layer is provided that converts between either version of

the API and the opposite version of the DDI:

If the graphics driver conforms to the Windows Display Driver Model (WDDM), DXVA 1 API calls are converted to DXVA 2 DDI calls.

If the graphics driver uses the older Windows XP Display Driver Model (XPDM), DXVA 2 API calls are converted to DXVA 1 DDI calls.

The following table shows the operating system requirements and the supported video renderers

for each version of the DXVA API.

DXVA 1: Requires Windows 2000 or later. Video renderer support: Overlay Mixer, VMR-7, VMR-9 (DirectShow only).

DXVA 2: Requires Windows Vista. Video renderer support: EVR (DirectShow and Media Foundation).

In DXVA 1, the software decoder must access the API through the video renderer. There is no way

to use the DXVA 1 API without calling into the video renderer. This limitation has been removed

with DXVA 2. Using DXVA 2, the host decoder (or any application) can access the API directly,

through the IDirectXVideoDecoderService interface.

The DXVA 1 documentation describes the decoding structures used for the following video

standards:

ITU-T Rec. H.261
ITU-T Rec. H.263
MPEG-1 video
MPEG-2 Main Profile video

The following specifications define DXVA extensions for other video standards:


DXVA Specification for H.264/AVC Decoding
DXVA Specification for H.264/MPEG-4 AVC Multiview Video Coding (MVC), Including the Stereo High Profile
DXVA Specification for MPEG-1 VLD and Combined MPEG-1/MPEG-2 VLD Video Decoding
DXVA Specification for Off-Host VLD Mode for MPEG-4 Part 2 Video Decoding
DXVA Specification for Windows Media Video® v8, v9 and vA Decoding (Including SMPTE 421M "VC-1")

DXVA 1 and DXVA 2 use the same data structures for decoding. However, the procedure for

configuring the decoding session has changed. DXVA 1 uses a "probe and lock" mechanism,

wherein the host decoder can test various configurations before setting the desired configuration

on the accelerator. In DXVA 2, the accelerator returns a list of supported configurations and the

host decoder selects one from the list. Details are given in the following sections:

Supporting DXVA 2.0 in DirectShow
Supporting DXVA 2.0 in Media Foundation

Direct3D Device Manager

The Microsoft Direct3D device manager enables two or more objects to share the same Microsoft

Direct3D 9 device. One object acts as the owner of the Direct3D 9 device. To share the device, the

owner of the device creates the Direct3D device manager. Other objects can obtain a pointer to

the device manager from the device owner, then use the device manager to get a pointer to the

Direct3D device. Any object that uses the device holds an exclusive lock, which prevents other

objects from using the device at the same time.

Note  The Direct3D Device Manager supports Direct3D 9 devices only. It does not support DXGI

devices.

To create the Direct3D device manager, call DXVA2CreateDirect3DDeviceManager9. This

function returns a pointer to the device manager's IDirect3DDeviceManager9 interface, along

with a reset token. The reset token enables the owner of the Direct3D device to set (and reset) the

device on the device manager. To initialize the device manager,

call IDirect3DDeviceManager9::ResetDevice. Pass in a pointer to the Direct3D device, along

with the reset token.

The following code shows how to create and initialize the device manager.

HRESULT CreateD3DDeviceManager( IDirect3DDevice9 *pDevice, UINT *pReset, IDirect3DDeviceManager9 **ppManager ){ UINT resetToken = 0;

IDirect3DDeviceManager9 *pD3DManager = NULL;

HRESULT hr = DXVA2CreateDirect3DDeviceManager9(&resetToken, &pD3DManager);

if (FAILED(hr)) { goto done; }

hr = pD3DManager->ResetDevice(pDevice, resetToken);

if (FAILED(hr)) { goto done; }

*ppManager = pD3DManager; (*ppManager)->AddRef();

*pReset = resetToken;

done: SafeRelease(&pD3DManager); return hr;}

The device owner must provide a way for other objects to get a pointer to

the IDirect3DDeviceManager9 interface. The standard mechanism is to implement

the IMFGetService interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.

To share the device among several objects, each object (including the owner of the device) must

access the device through the device manager, as follows:

1. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the device.

2. To use the device, call IDirect3DDeviceManager9::LockDevice and pass in the device handle. The method returns a pointer to the IDirect3DDevice9 interface. The method can be called in a blocking mode or a non-blocking mode, depending on the value of the fBlock parameter.

3. When you are done using the device, call IDirect3DDeviceManager9::UnlockDevice. This method makes the device available to other objects.

4. Before exiting, call IDirect3DDeviceManager9::CloseDeviceHandle to close the device handle.

You should hold the device lock only while using the device, because holding the device lock

prevents other objects from using the device.

The owner of the device can switch to another device at any time by calling ResetDevice,

typically because the original device was lost. Device loss can occur for various reasons, including

changes in the monitor resolution, power management actions, locking and unlocking the

computer, and so forth. For more information, see the Direct3D documentation.

The ResetDevice method invalidates any device handles that were opened previously. When a

device handle is invalid, the LockDevice method returns DXVA2_E_NEW_VIDEO_DEVICE. If this

occurs, close the handle and call OpenDeviceHandle again to obtain a new device handle, as

shown in the following code.

The following example shows how to open a device handle and lock the device.

HRESULT LockDevice(
    IDirect3DDeviceManager9 *pDeviceManager,
    BOOL fBlock,
    IDirect3DDevice9 **ppDevice, // Receives a pointer to the device.
    HANDLE *pHandle              // Receives a device handle.
    )
{
    *pHandle = NULL;
    *ppDevice = NULL;

    HANDLE hDevice = 0;

    HRESULT hr = pDeviceManager->OpenDeviceHandle(&hDevice);

    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->LockDevice(hDevice, ppDevice, fBlock);
    }

    if (hr == DXVA2_E_NEW_VIDEO_DEVICE)
    {
        // Invalid device handle. Try to open a new device handle.
        hr = pDeviceManager->CloseDeviceHandle(hDevice);

        if (SUCCEEDED(hr))
        {
            hr = pDeviceManager->OpenDeviceHandle(&hDevice);
        }

        // Try to lock the device again.
        if (SUCCEEDED(hr))
        {
            hr = pDeviceManager->LockDevice(hDevice, ppDevice, TRUE);
        }
    }

    if (SUCCEEDED(hr))
    {
        *pHandle = hDevice;
    }
    return hr;
}

Supporting DXVA 2.0 in DirectShow

This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a DirectShow decoder

filter. Specifically, it describes the communication between the decoder and the video renderer.

This topic does not describe how to implement DXVA decoding.

Prerequisites Migration Notes Finding a Decoder Configuration Notifying the Video Renderer Allocating Uncompressed Buffers Decoding Related topics

Prerequisites

This topic assumes that you are familiar with writing DirectShow filters. For more information, see

the topic Writing DirectShow Filters in the DirectShow SDK documentation. The code examples in

this topic assume that the decoder filter is derived from the CTransformFilter class, with the

following class definition:

class CDecoder : public CTransformFilter
{
public:
    static CUnknown* WINAPI CreateInstance(IUnknown *pUnk, HRESULT *pHr);

    HRESULT CompleteConnect(PIN_DIRECTION direction, IPin *pPin);

    HRESULT InitAllocator(IMemAllocator **ppAlloc);

    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);

    // TODO: The implementations of these methods depend on the specific decoder.
    HRESULT CheckInputType(const CMediaType *mtIn);
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    HRESULT GetMediaType(int, CMediaType *);

private:
    CDecoder(HRESULT *pHr);
    ~CDecoder();

    CBasePin * GetPin(int n);

    HRESULT ConfigureDXVA2(IPin *pPin);
    HRESULT SetEVRForDXVA2(IPin *pPin);

    HRESULT FindDecoderConfiguration(
        /* [in] */  IDirectXVideoDecoderService *pDecoderService,
        /* [in] */  const GUID& guidDecoder,
        /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
        /* [out] */ BOOL *pbFoundDXVA2Configuration
        );

private:
    IDirectXVideoDecoderService *m_pDecoderService;

    DXVA2_ConfigPictureDecode m_DecoderConfig;
    GUID   m_DecoderGuid;
    HANDLE m_hDevice;

    FOURCC m_fccOutputFormat;
};

In the remainder of this topic, the term decoder refers to the decoder filter, which receives

compressed video and outputs uncompressed video. The term decoder device refers to a hardware

video accelerator implemented by the graphics driver.

Here are the basic steps that a decoder filter must perform to support DXVA 2.0:

1. Negotiate a media type.
2. Find a DXVA decoder configuration.
3. Notify the video renderer that the decoder is using DXVA decoding.
4. Provide a custom allocator that allocates Direct3D surfaces.

These steps are described in more detail in the remainder of this topic.

Migration Notes

If you are migrating from DXVA 1.0, you should be aware of some significant differences between

the two versions:

DXVA 2.0 does not use the IAMVideoAccelerator and IAMVideoAcceleratorNotify interfaces, because the decoder can access the DXVA 2.0 APIs directly through the IDirectXVideoDecoder interface.


During media type negotiation, the decoder does not use a video acceleration GUID as the subtype. Instead, the subtype is just the uncompressed video format (such as NV12), as with software decoding.

The procedure for configuring the accelerator has changed. In DXVA 1.0, the decoder calls Execute with a DXVA_ConfigPictureDecode structure to configure the accelerator. In DXVA 2.0, the decoder uses the IDirectXVideoDecoderService interface, as described in the next section.

The decoder allocates the uncompressed buffers. The video renderer no longer allocates them.

Instead of calling IAMVideoAccelerator::DisplayFrame to display the decoded frame, the decoder delivers the frame to the renderer by calling IMemInputPin::Receive, as with software decoding.

The decoder is no longer responsible for checking when data buffers are safe for updates. Therefore DXVA 2.0 does not have any method equivalent to IAMVideoAccelerator::QueryRenderStatus.

Subpicture blending is done by the video renderer, using the DXVA 2.0 video processor APIs. Decoders that provide subpictures (for example, DVD decoders) should send subpicture data on a separate output pin.

For decoding operations, DXVA 2.0 uses the same data structures as DXVA 1.0.

The enhanced video renderer (EVR) filter supports DXVA 2.0. The Video Mixing Renderer filters

(VMR-7 and VMR-9) support DXVA 1.0 only.

Finding a Decoder Configuration

After the decoder negotiates the output media type, it must find a compatible configuration for the

DXVA decoder device. You can perform this step inside the output

pin's CBaseOutputPin::CompleteConnect method. This step ensures that the graphics driver

supports the capabilities needed by the decoder, before the decoder commits to using DXVA.

To find a configuration for the decoder device, do the following:

1. Query the renderer's input pin for the IMFGetService interface.

2. Call IMFGetService::GetService to get a pointer to the IDirect3DDeviceManager9 interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.

3. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.

4. Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.

5. Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.

6. Loop through the array of decoder GUIDs to find the ones that the decoder filter supports. For example, for an MPEG-2 decoder, you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.

7. When you find a candidate decoder device GUID, pass the GUID to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.

8. Loop through the render target formats and look for one that matches your output format. Typically, a decoder device supports a single render target format. The decoder filter should connect to the renderer using this subtype. In the first call to CompleteConnect, the decoder can determine the render target format and then return this format as a preferred output type.

9. Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device.

10. Assuming that the previous steps are successful, store the Direct3D device handle, the decoder device GUID, and the configuration structure. The filter will use this information to create the decoder device.


The following code shows how to find a decoder configuration.

HRESULT CDecoder::ConfigureDXVA2(IPin *pPin)
{
    UINT cDecoderGuids = 0;
    BOOL bFoundDXVA2Configuration = FALSE;
    GUID guidDecoder = GUID_NULL;

    DXVA2_ConfigPictureDecode config;
    ZeroMemory(&config, sizeof(config));

    // Variables that follow must be cleaned up at the end.
    IMFGetService *pGetService = NULL;
    IDirect3DDeviceManager9 *pDeviceManager = NULL;
    IDirectXVideoDecoderService *pDecoderService = NULL;

    GUID *pDecoderGuids = NULL; // size = cDecoderGuids
    HANDLE hDevice = INVALID_HANDLE_VALUE;

    // Query the pin for IMFGetService.
    HRESULT hr = pPin->QueryInterface(IID_PPV_ARGS(&pGetService));

    // Get the Direct3D device manager.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE,
            IID_PPV_ARGS(&pDeviceManager)
            );
    }

    // Open a new device handle.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    }

    // Get the video decoder service.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->GetVideoService(hDevice, IID_PPV_ARGS(&pDecoderService));
    }

    // Get the decoder GUIDs.
    if (SUCCEEDED(hr))
    {
        hr = pDecoderService->GetDecoderDeviceGuids(&cDecoderGuids, &pDecoderGuids);
    }

    if (SUCCEEDED(hr))
    {
        // Look for the decoder GUIDs we want.
        for (UINT iGuid = 0; iGuid < cDecoderGuids; iGuid++)
        {
            // Do we support this mode?
            if (!IsSupportedDecoderMode(pDecoderGuids[iGuid]))
            {
                continue;
            }

            // Find a configuration that we support.
            hr = FindDecoderConfiguration(pDecoderService, pDecoderGuids[iGuid],
                &config, &bFoundDXVA2Configuration);
            if (FAILED(hr))
            {
                break;
            }

            if (bFoundDXVA2Configuration)
            {
                // Found a good configuration. Save the GUID and exit the loop.
                guidDecoder = pDecoderGuids[iGuid];
                break;
            }
        }
    }

    if (!bFoundDXVA2Configuration)
    {
        hr = E_FAIL; // Unable to find a configuration.
    }

    if (SUCCEEDED(hr))
    {
        // Store the things we will need later.
        SafeRelease(&m_pDecoderService);
        m_pDecoderService = pDecoderService;
        m_pDecoderService->AddRef();

        m_DecoderConfig = config;
        m_DecoderGuid = guidDecoder;
        m_hDevice = hDevice;
    }

    if (FAILED(hr))
    {
        if (hDevice != INVALID_HANDLE_VALUE)
        {
            pDeviceManager->CloseDeviceHandle(hDevice);
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pDeviceManager);
    SafeRelease(&pDecoderService);
    return hr;
}

HRESULT CDecoder::FindDecoderConfiguration(
    /* [in] */  IDirectXVideoDecoderService *pDecoderService,
    /* [in] */  const GUID& guidDecoder,
    /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
    /* [out] */ BOOL *pbFoundDXVA2Configuration
    )
{
    HRESULT hr = S_OK;
    UINT cFormats = 0;
    UINT cConfigurations = 0;

    D3DFORMAT *pFormats = NULL;                // size = cFormats
    DXVA2_ConfigPictureDecode *pConfig = NULL; // size = cConfigurations

    // Find the valid render target formats for this decoder GUID.
    hr = pDecoderService->GetDecoderRenderTargets(guidDecoder, &cFormats, &pFormats);

    if (SUCCEEDED(hr))
    {
        // Look for a format that matches our output format.
        for (UINT iFormat = 0; iFormat < cFormats; iFormat++)
        {
            if (pFormats[iFormat] != (D3DFORMAT)m_fccOutputFormat)
            {
                continue;
            }

            // Fill in the video description. Set the width, height, format,
            // and frame rate.
            DXVA2_VideoDesc videoDesc = {0};

            FillInVideoDescription(&videoDesc); // Private helper function.
            videoDesc.Format = pFormats[iFormat];

            // Get the available configurations.
            hr = pDecoderService->GetDecoderConfigurations(
                guidDecoder,
                &videoDesc,
                NULL, // Reserved.
                &cConfigurations,
                &pConfig
                );

            if (FAILED(hr))
            {
                break;
            }

            // Find a supported configuration.
            for (UINT iConfig = 0; iConfig < cConfigurations; iConfig++)
            {
                if (IsSupportedDecoderConfig(pConfig[iConfig]))
                {
                    // This configuration is good.
                    *pbFoundDXVA2Configuration = TRUE;
                    *pSelectedConfig = pConfig[iConfig];
                    break;
                }
            }

            CoTaskMemFree(pConfig);
            break;

        } // End of formats loop.
    }

    CoTaskMemFree(pFormats);

    // Note: It is possible to return S_OK without finding a configuration.
    return hr;
}

Because this example is generic, some of the logic has been placed in helper functions that would

need to be implemented by the decoder. The following code shows the declarations for these

functions:

// Returns TRUE if the decoder supports a given decoding mode.
BOOL IsSupportedDecoderMode(const GUID& mode);

// Returns TRUE if the decoder supports a given decoding configuration.
BOOL IsSupportedDecoderConfig(const DXVA2_ConfigPictureDecode& config);

// Fills in a DXVA2_VideoDesc structure based on the input format.
void FillInVideoDescription(DXVA2_VideoDesc *pDesc);

Notifying the Video Renderer
If the decoder finds a decoder configuration, the next step is to notify the video renderer that the

decoder will use hardware acceleration. You can perform this step inside

the CompleteConnect method. This step must occur before the allocator is selected, because it

affects how the allocator is selected.

1. Query the renderer's input pin for the IMFGetService interface.
2. Call IMFGetService::GetService to get a pointer to the IDirectXVideoMemoryConfiguration interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3. Call IDirectXVideoMemoryConfiguration::GetAvailableSurfaceTypeByIndex in a loop, incrementing the dwTypeIndex variable from zero. Stop when the method returns the value DXVA2_SurfaceType_DecoderRenderTarget in the pdwType parameter. This step ensures that the video renderer supports hardware-accelerated decoding. This step will always succeed for the EVR filter.
4. If the previous step succeeded, call IDirectXVideoMemoryConfiguration::SetSurfaceType with the value DXVA2_SurfaceType_DecoderRenderTarget. Calling SetSurfaceType with this value puts the video renderer into DXVA mode. When the video renderer is in this mode, the decoder must provide its own allocator.

The following code shows how to notify the video renderer.

HRESULT CDecoder::SetEVRForDXVA2(IPin *pPin)
{
    HRESULT hr = S_OK;

    IMFGetService                    *pGetService = NULL;
    IDirectXVideoMemoryConfiguration *pVideoConfig = NULL;

    // Query the pin for IMFGetService.
    hr = pPin->QueryInterface(__uuidof(IMFGetService), (void**)&pGetService);

    // Get the IDirectXVideoMemoryConfiguration interface.
    if (SUCCEEDED(hr)) {
        hr = pGetService->GetService(MR_VIDEO_ACCELERATION_SERVICE,
                                     IID_PPV_ARGS(&pVideoConfig));
    }

    // Notify the EVR.
    if (SUCCEEDED(hr)) {
        DXVA2_SurfaceType surfaceType;

        for (DWORD iTypeIndex = 0; ; iTypeIndex++) {
            hr = pVideoConfig->GetAvailableSurfaceTypeByIndex(iTypeIndex, &surfaceType);
            if (FAILED(hr)) { break; }

            if (surfaceType == DXVA2_SurfaceType_DecoderRenderTarget) {
                hr = pVideoConfig->SetSurfaceType(DXVA2_SurfaceType_DecoderRenderTarget);
                break;
            }
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pVideoConfig);

    return hr;
}

If the decoder finds a valid configuration and successfully notifies the video renderer, the decoder

can use DXVA for decoding. The decoder must implement a custom allocator for its output pin, as

described in the next section.

Allocating Uncompressed Buffers
In DXVA 2.0, the decoder is responsible for allocating Direct3D surfaces to use as uncompressed

video buffers. Therefore, the decoder must implement a custom allocator that will create the

surfaces. The media samples provided by this allocator will hold pointers to the Direct3D surfaces.

The EVR retrieves a pointer to the surface by calling IMFGetService::GetService on the media

sample. The service identifier is MR_BUFFER_SERVICE.

To provide the custom allocator, perform the following steps:


1. Define a class for the media samples. This class can derive from the CMediaSample class. Inside this class, do the following:
   Store a pointer to the Direct3D surface.
   Implement the IMFGetService interface. In the GetService method, if the service GUID is MR_BUFFER_SERVICE, query the Direct3D surface for the requested interface. Otherwise, GetService can return MF_E_UNSUPPORTED_SERVICE.
   Override the CMediaSample::GetPointer method to return E_NOTIMPL.
2. Define a class for the allocator. The allocator can derive from the CBaseAllocator class. Inside this class, do the following:
   Override the CBaseAllocator::Alloc method. Inside this method, call IDirectXVideoAccelerationService::CreateSurface to create the surfaces. (The IDirectXVideoDecoderService interface inherits this method from IDirectXVideoAccelerationService.)
   Override the CBaseAllocator::Free method to release the surfaces.
3. In your filter's output pin, override the CBaseOutputPin::InitAllocator method. Inside this method, create an instance of your custom allocator.
4. In your filter, implement the CTransformFilter::DecideBufferSize method. The pProperties parameter indicates the number of surfaces that the EVR requires. Add to this value the number of surfaces that your decoder requires, and call IMemAllocator::SetProperties on the allocator. (A sketch of this step follows the list.)
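The following is a minimal sketch of step 4 only. The class name CDecoderFilter and the member m_cDecoderSurfaces are hypothetical placeholders introduced for this illustration; the exact surface count and buffer size policy depend on the decoder.

// Sketch only: add the decoder's own surface requirement to the EVR's request
// and commit the totals to the custom allocator.
HRESULT CDecoderFilter::DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProperties)
{
    // pProperties->cBuffers arrives set to the number of surfaces the EVR requires.
    // Add the surfaces that the decoder itself needs (hypothetical member).
    pProperties->cBuffers += m_cDecoderSurfaces;
    if (pProperties->cbBuffer == 0) {
        pProperties->cbBuffer = 1;  // Nominal size; the samples wrap Direct3D surfaces.
    }

    ALLOCATOR_PROPERTIES actual;
    HRESULT hr = pAlloc->SetProperties(pProperties, &actual);
    if (FAILED(hr)) { return hr; }

    // Fail if the allocator could not provide enough samples.
    return (actual.cBuffers < pProperties->cBuffers) ? E_FAIL : S_OK;
}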

The following code shows how to implement the media sample class:

class CDecoderSample : public CMediaSample, public IMFGetService
{
    friend class CDecoderAllocator;

public:
    CDecoderSample(CDecoderAllocator *pAlloc, HRESULT *phr)
        : CMediaSample(NAME("DecoderSample"), (CBaseAllocator*)pAlloc, phr, NULL, 0),
          m_pSurface(NULL),
          m_dwSurfaceId(0)
    {
    }

    // Note: CMediaSample does not derive from CUnknown, so we cannot use the
    // DECLARE_IUNKNOWN macro that is used by most of the filter classes.

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        CheckPointer(ppv, E_POINTER);

        if (riid == IID_IMFGetService) {
            *ppv = static_cast<IMFGetService*>(this);
            AddRef();
            return S_OK;
        }
        else {
            return CMediaSample::QueryInterface(riid, ppv);
        }
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return CMediaSample::AddRef();
    }

    STDMETHODIMP_(ULONG) Release()
    {
        // Return a temporary variable for thread safety.
        ULONG cRef = CMediaSample::Release();
        return cRef;
    }

    // IMFGetService::GetService
    STDMETHODIMP GetService(REFGUID guidService, REFIID riid, LPVOID *ppv)
    {
        if (guidService != MR_BUFFER_SERVICE) {
            return MF_E_UNSUPPORTED_SERVICE;
        }
        else if (m_pSurface == NULL) {
            return E_NOINTERFACE;
        }
        else {
            return m_pSurface->QueryInterface(riid, ppv);
        }
    }

    // Override GetPointer because this class does not manage a system memory buffer.
    // The EVR uses the MR_BUFFER_SERVICE service to get the Direct3D surface.
    STDMETHODIMP GetPointer(BYTE **ppBuffer)
    {
        return E_NOTIMPL;
    }

private:
    // Sets the pointer to the Direct3D surface.
    void SetSurface(DWORD surfaceId, IDirect3DSurface9 *pSurf)
    {
        SafeRelease(&m_pSurface);

        m_pSurface = pSurf;
        if (m_pSurface) {
            m_pSurface->AddRef();
        }

        m_dwSurfaceId = surfaceId;
    }

    IDirect3DSurface9 *m_pSurface;
    DWORD m_dwSurfaceId;
};

The following code shows how to implement the Alloc method on the allocator.

HRESULT CDecoderAllocator::Alloc()
{
    CAutoLock lock(this);

    HRESULT hr = S_OK;

    if (m_pDXVA2Service == NULL) { return E_UNEXPECTED; }

    hr = CBaseAllocator::Alloc();

    // If the requirements have not changed, do not reallocate.
    if (hr == S_FALSE) { return S_OK; }

    if (SUCCEEDED(hr)) {
        // Free the old resources.
        Free();

        // Allocate a new array of pointers.
        m_ppRTSurfaceArray = new (std::nothrow) IDirect3DSurface9*[m_lCount];
        if (m_ppRTSurfaceArray == NULL) {
            hr = E_OUTOFMEMORY;
        }
        else {
            ZeroMemory(m_ppRTSurfaceArray, sizeof(IDirect3DSurface9*) * m_lCount);
        }
    }

    // Allocate the surfaces.
    if (SUCCEEDED(hr)) {
        hr = m_pDXVA2Service->CreateSurface(
            m_dwWidth,
            m_dwHeight,
            m_lCount - 1,
            (D3DFORMAT)m_dwFormat,
            D3DPOOL_DEFAULT,
            0,
            DXVA2_VideoDecoderRenderTarget,
            m_ppRTSurfaceArray,
            NULL
            );
    }

    if (SUCCEEDED(hr)) {
        for (m_lAllocated = 0; m_lAllocated < m_lCount; m_lAllocated++) {
            CDecoderSample *pSample = new (std::nothrow) CDecoderSample(this, &hr);

            if (pSample == NULL) {
                hr = E_OUTOFMEMORY;
                break;
            }
            if (FAILED(hr)) { break; }

            // Assign the Direct3D surface pointer and the index.
            pSample->SetSurface(m_lAllocated, m_ppRTSurfaceArray[m_lAllocated]);

            // Add to the sample list.
            m_lFree.Add(pSample);
        }
    }

    if (SUCCEEDED(hr)) { m_bChanged = FALSE; }
    return hr;
}

Here is the code for the Free method:

void CDecoderAllocator::Free()
{
    CMediaSample *pSample = NULL;

    do {
        pSample = m_lFree.RemoveHead();
        if (pSample) {
            delete pSample;
        }
    } while (pSample);

    if (m_ppRTSurfaceArray) {
        for (long i = 0; i < m_lAllocated; i++) {
            SafeRelease(&m_ppRTSurfaceArray[i]);
        }

        delete [] m_ppRTSurfaceArray;
    }
    m_lAllocated = 0;
}

For more information about implementing custom allocators, see the topic Providing a Custom

Allocator in the DirectShow SDK documentation.


Decoding
To create the decoder device, call IDirectXVideoDecoderService::CreateVideoDecoder. The method returns a pointer to the IDirectXVideoDecoder interface of the decoder device.

On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the

device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the

following:

1. Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2. Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3. Open a new device handle.
4. Negotiate a new decoder configuration, as described in the section Finding a Decoder Configuration.
5. Create a new decoder device.

Assuming that the device handle is valid, the decoding process works as follows:

1. Call IDirectXVideoDecoder::BeginFrame.
2. Do the following one or more times:
   1. Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
   2. Fill the buffer.
   3. Call IDirectXVideoDecoder::ReleaseBuffer.
3. Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.

DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations. For the original set

of DXVA profiles (for H.261, H.263, and MPEG-2), these data structures are described in the DXVA

1.0 specification.

Within each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only

once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite

the data.
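As a rough illustration of this sequence, the fragment below sketches one frame. The variables pDecoder and pSurface, and the picture-parameters filling step, are assumptions made for this example and are not part of the original sample; the buffer types and structure contents that must actually be filled depend on the codec.

// Minimal sketch of one BeginFrame/GetBuffer/Execute cycle (error handling trimmed).
// pDecoder is an IDirectXVideoDecoder*, pSurface is the uncompressed target surface.
HRESULT hr = pDecoder->BeginFrame(pSurface, NULL);

void *pBuffer = NULL;
UINT  cbBuffer = 0;

// Repeat GetBuffer/ReleaseBuffer for each DXVA buffer type the codec needs,
// but at most once per buffer type within this BeginFrame/Execute pair.
hr = pDecoder->GetBuffer(DXVA2_PictureParametersBufferType, &pBuffer, &cbBuffer);
// ... copy the codec-specific picture parameters into pBuffer ...
hr = pDecoder->ReleaseBuffer(DXVA2_PictureParametersBufferType);

// Submit the buffers for this frame. In real code, DXVA2_DecodeExecuteParams
// lists the DXVA2_DecodeBufferDesc entries for the buffers filled above.
DXVA2_DecodeExecuteParams execParams = {0};
hr = pDecoder->Execute(&execParams);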

After calling Execute, call IMemInputPin::Receive to deliver the frame to the video renderer, as

with software decoding. The Receive method is asynchronous; after it returns, the decoder can

continue decoding the next frame. The display driver prevents any decoding commands from

overwriting the buffer while the buffer is in use. The decoder should not reuse a surface to decode

another frame until the renderer has released the sample. When the renderer releases the sample,

the allocator puts the sample back into its pool of available samples. To get the next available

sample, call CBaseOutputPin::GetDeliveryBuffer, which in turn

calls IMemAllocator::GetBuffer. For more information, see the topic Overview of Data Flow in

DirectShow in the DirectShow documentation.

Supporting DXVA 2.0 in Media Foundation

This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a Media Foundation

transform (MFT) using Microsoft Direct3D 9. Specifically, it describes the communication between

the decoder and the video renderer, which is mediated by the topology loader. This topic does not

describe how to implement DXVA decoding.

In the remainder of this topic, the term decoder refers to the decoder MFT, which receives

compressed video and outputs uncompressed video. The term decoder device refers to a hardware

video accelerator implemented by the graphics driver.

Here are the basic steps that a decoder must perform to support DXVA 2.0 in Media Foundation:

1. Open a handle to the Direct3D 9 device.
2. Find a DXVA decoder configuration.
3. Allocate uncompressed buffers.


4. Decode frames.

These steps are described in more detail in the remainder of this topic.

Opening a Direct3D Device Handle
The MFT uses the Microsoft Direct3D device manager to get a handle to the Direct3D 9 device. To

open the device handle, perform the following steps:

1. Expose the MF_SA_D3D_AWARE attribute with the value TRUE. The topology loader queries this attribute by calling IMFTransform::GetAttributes. Setting the attribute to TRUE notifies the topology loader that the MFT supports DXVA.

2. When format negotiation begins, the topology loader calls IMFTransform::ProcessMessage with the MFT_MESSAGE_SET_D3D_MANAGER message. The ulParam parameter is an IUnknown pointer to the video renderer's Direct3D device manager. Query this pointer for the IDirect3DDeviceManager9 interface.

3. Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.

4. Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.

5. Cache the pointers and the device handle.
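A minimal sketch of steps 2 through 5 is shown below, written as a helper called from the MFT's ProcessMessage handler. The member names m_pDeviceManager, m_pDecoderService, and m_hDevice are hypothetical; the sketch also shows the release path used when the topology loader later sends the message with a NULL pointer (see Fallback to Software Decoding).

// Sketch: handle MFT_MESSAGE_SET_D3D_MANAGER in IMFTransform::ProcessMessage.
HRESULT CDecoderMFT::OnSetD3DManager(ULONG_PTR ulParam)
{
    // Release any previous device manager, decoder service, and device handle.
    if (m_pDeviceManager && m_hDevice != INVALID_HANDLE_VALUE) {
        m_pDeviceManager->CloseDeviceHandle(m_hDevice);
        m_hDevice = INVALID_HANDLE_VALUE;
    }
    SafeRelease(&m_pDecoderService);
    SafeRelease(&m_pDeviceManager);

    if (ulParam == 0) {
        return S_OK; // NULL means: stop using DXVA and fall back to software decoding.
    }

    IUnknown *pUnk = reinterpret_cast<IUnknown*>(ulParam);

    // Step 2: Get the renderer's Direct3D device manager.
    HRESULT hr = pUnk->QueryInterface(IID_PPV_ARGS(&m_pDeviceManager));

    // Step 3: Open a handle to the renderer's Direct3D device.
    if (SUCCEEDED(hr)) {
        hr = m_pDeviceManager->OpenDeviceHandle(&m_hDevice);
    }

    // Step 4: Get the video decoder service for that device.
    if (SUCCEEDED(hr)) {
        hr = m_pDeviceManager->GetVideoService(m_hDevice, IID_PPV_ARGS(&m_pDecoderService));
    }

    // Step 5: The pointers and the device handle are cached in the members above.
    return hr;
}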

Finding a Decoder Configuration
The MFT must find a compatible configuration for the DXVA decoder device. Perform the following steps inside the IMFTransform::SetInputType method, after validating the input type:

1. Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.

2. Loop through the array of decoder GUIDs to find the ones that the decoder supports. For example, for an MPEG-2 decoder, you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.
3. When you find a candidate decoder device GUID, pass the GUID to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.
4. Loop through the render target formats and look for a format supported by the decoder.
5. Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed output format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device. Look for a configuration that the decoder supports.

6. Store the render target format and configuration.

In the IMFTransform::GetOutputAvailableType method, return an uncompressed video format,

based on the proposed render target format.

In the IMFTransform::SetOutputType method, check the media type against the render target

format.
Fallback to Software Decoding

If the MFT cannot find a DXVA configuration (for example, if the graphics driver does not have the

right capabilities), it should return the error code MF_E_UNSUPPORTED_D3D_TYPE from

the SetInputType and SetOutputType methods. The topology loader will respond by sending

the MFT_MESSAGE_SET_D3D_MANAGER message with the value NULL for

the ulParam parameter. The MFT should release its pointer to

the IDirect3DDeviceManager9 interface. The topology loader will then renegotiate the media

type, and the MFT can use software decoding.

Allocating Uncompressed Buffers
In DXVA 2.0, the decoder is responsible for allocating Direct3D surfaces to use as uncompressed

video buffers. The decoder should allocate 3 surfaces for the EVR to use for deinterlacing. This


number is fixed, because Media Foundation does not provide a way for the EVR to specify how

many surfaces the graphics driver requires for deinterlacing. Three surfaces should be sufficient for

any driver.

In the IMFTransform::GetOutputStreamInfo method, set

the MFT_OUTPUT_STREAM_PROVIDES_SAMPLES flag in

the MFT_OUTPUT_STREAM_INFO structure. This flag notifies the Media Session that the MFT

allocates its own output samples.
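A minimal sketch of a GetOutputStreamInfo implementation that sets this flag might look like the following. The additional flags shown are common for uncompressed video output but are illustrative assumptions, not requirements stated by the original text.

// Sketch: advertise that this MFT allocates its own output samples.
HRESULT CDecoderMFT::GetOutputStreamInfo(DWORD dwOutputStreamID, MFT_OUTPUT_STREAM_INFO *pStreamInfo)
{
    if (pStreamInfo == NULL) { return E_POINTER; }
    if (dwOutputStreamID != 0) { return MF_E_INVALIDSTREAMNUMBER; }

    pStreamInfo->dwFlags =
        MFT_OUTPUT_STREAM_WHOLE_SAMPLES |
        MFT_OUTPUT_STREAM_FIXED_SAMPLE_SIZE |
        MFT_OUTPUT_STREAM_PROVIDES_SAMPLES;   // The MFT provides its own samples.

    pStreamInfo->cbSize = 0;       // Not used when the MFT provides the samples.
    pStreamInfo->cbAlignment = 0;

    return S_OK;
}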

To create the surfaces, call IDirectXVideoAccelerationService::CreateSurface.

(The IDirectXVideoDecoderService interface inherits this method

from IDirectXVideoAccelerationService.) You can do this in SetInputType, after finding the

render target format.

For each surface, call MFCreateVideoSampleFromSurface to create a media sample to hold the

surface. The method returns a pointer to the IMFSample interface.
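The fragment below sketches these two calls. The members m_pDecoderService, m_RenderTargetFormat, m_dwWidth, m_dwHeight, and m_pSamples, along with the NUM_DECODER_SURFACES count, are placeholders introduced for this illustration.

// Sketch: create the Direct3D surfaces and wrap each one in an IMFSample.
const UINT NUM_DECODER_SURFACES = 8;   // Placeholder: the decoder's surfaces plus 3 for the EVR.

IDirect3DSurface9 *pSurfaces[NUM_DECODER_SURFACES] = { NULL };

HRESULT hr = m_pDecoderService->CreateSurface(
    m_dwWidth, m_dwHeight,
    NUM_DECODER_SURFACES - 1,          // Number of back buffers; total created = this value + 1.
    m_RenderTargetFormat,              // Render target format found in SetInputType.
    D3DPOOL_DEFAULT,
    0,                                 // Usage.
    DXVA2_VideoDecoderRenderTarget,
    pSurfaces,
    NULL
    );

if (SUCCEEDED(hr)) {
    for (UINT i = 0; i < NUM_DECODER_SURFACES && SUCCEEDED(hr); i++) {
        IMFSample *pSample = NULL;
        // Wrap the surface in a media sample that ProcessOutput will hand out.
        hr = MFCreateVideoSampleFromSurface(pSurfaces[i], &pSample);
        if (SUCCEEDED(hr)) {
            m_pSamples[i] = pSample;   // Hypothetical member array holding the sample pool.
        }
    }
}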

Decoding
To create the decoder device, call IDirectXVideoDecoderService::CreateVideoDecoder. The method returns a pointer to the IDirectXVideoDecoder interface of the decoder device.

Decoding should occur inside the IMFTransform::ProcessOutput method. On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the following:

1. Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2. Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3. Open a new device handle.
4. Negotiate a new decoder configuration, as described in "Finding a Decoder Configuration" earlier on this page.
5. Create a new decoder device.

Assuming that the device handle is valid, the decoding process works as follows:

1. Get an available surface that is not currently in use. (Initially all of the surfaces are available.)

2. Query the media sample for the IMFTrackedSample interface.
3. Call IMFTrackedSample::SetAllocator and provide a pointer to the IMFAsyncCallback interface, implemented by the decoder. When the video renderer releases the sample, the decoder's callback will be invoked.
4. Call IDirectXVideoDecoder::BeginFrame.
5. Do the following one or more times:
   1. Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
   2. Fill the buffer.
   3. Call IDirectXVideoDecoder::ReleaseBuffer.
   4. Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.

DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations. For the original set

of DXVA profiles (for H.261, H.263, and MPEG-2), these data structures are described in the DXVA

1.0 specification.

Within each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only

once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite

the data.

Use the callback from the SetAllocator method (step 3) to keep track of which samples are

currently available and which are in use.
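One way to do this, sketched below under assumed names, is to keep a per-sample "in use" flag and clear it from the IMFAsyncCallback::Invoke implementation registered with SetAllocator. The callback class, member names, and helper are illustrative, not part of the original documentation.

// Sketch: mark a sample as busy before delivering it, and free it again when the
// renderer releases it. m_SampleFree is a hypothetical per-sample flag array.
HRESULT CDecoderMFT::TrackSample(DWORD index, IMFSample *pSample)
{
    IMFTrackedSample *pTracked = NULL;

    HRESULT hr = pSample->QueryInterface(IID_PPV_ARGS(&pTracked));
    if (SUCCEEDED(hr)) {
        m_SampleFree[index] = FALSE;                           // Sample is now in use.
        hr = pTracked->SetAllocator(&m_SampleCallback, NULL);  // m_SampleCallback implements IMFAsyncCallback.
        pTracked->Release();
    }
    return hr;
}

// IMFAsyncCallback::Invoke is called when the renderer releases the sample.
STDMETHODIMP CSampleCallback::Invoke(IMFAsyncResult *pResult)
{
    // Recover which sample was released and mark it available again
    // (for example, by looking it up through the result's state object).
    m_pMFT->OnSampleReleased(pResult);   // Hypothetical helper on the decoder MFT.
    return S_OK;
}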

Related topics
DirectX Video Acceleration 2.0
Media Foundation Transforms


DXVA Video Processing
DXVA video processing encapsulates the functions of the graphics hardware that are devoted to processing uncompressed video images. Video processing services include deinterlacing and video mixing.

This topic contains the following sections:

Overview
Creating a Video Processing Device
   Get the IDirectXVideoProcessorService Pointer
   Enumerate the Video Processing Devices
   Enumerate Render-Target Formats
   Query the Device Capabilities
   Create the Device
Video Process Blit
   Blit Parameters
   Input Samples
Image Composition
   Example 1: Letterboxing
   Example 2: Stretching Substream Images
   Example 3: Mismatched Stream Heights
   Example 4: Target Rectangle Smaller Than Destination Surface
   Example 5: Source Rectangles
   Example 6: Intersecting Destination Rectangles
   Example 7: Stretching and Cropping Video
Input Sample Order
   Example 1
   Example 2
   Example 3
   Example 4

Related topics

Overview
Graphics hardware can use the graphics processing unit (GPU) to process uncompressed video images. A video processing device is a software component that encapsulates these functions.

Applications can use a video processing device to perform functions such as:

Deinterlacing and inverse telecine
Mixing video substreams onto the main video image
Color adjustment (ProcAmp) and image filtering
Image scaling
Color-space conversion
Alpha blending

The following diagram shows the stages in the video processing pipeline. The diagram is not meant

to show an actual implementation. For example, the graphics driver might combine several stages

into a single operation. All of these operations can be performed in a single call to the video

processing device. Some stages shown here, such as noise and detail filtering, might not be

supported by the driver.


The input to the video processing pipeline always includes a primary video stream, which contains

the main image data. The primary video stream determines the frame rate for the output video.

Each frame of the output video is calculated relative to the input data from the primary video

stream. Pixels in the primary stream are always opaque, with no per-pixel alpha data. The primary

video stream can be progressive or interlaced.

Optionally, the video processing pipeline can receive up to 15 video substreams. A substream

contains auxiliary image data, such as closed captions or DVD subpictures. These images are

displayed over the primary video stream, and are generally not meant to be shown by themselves.

Substream pictures can contain per-pixel alpha data, and are always progressive frames. The video

processing device alpha-blends the substream images with the current deinterlaced frame from

the primary video stream.

In the remainder of this topic, the term picture is used for the input data to a video processing

device. A picture might consist of a progressive frame, a single field, or two interleaved fields. The

output is always a deinterlaced frame.

A video driver can implement more than one video processing device, to provide different sets of

video processing capabilities. Devices are identified by GUID. The following GUIDs are predefined:

DXVA2_VideoProcBobDevice. This device performs bob deinterlacing.
DXVA2_VideoProcProgressiveDevice. This device is used if the video contains only progressive frames, with no interlaced frames. (Some video content contains a mix of progressive and interlaced frames. The progressive device cannot be used for this kind of "mixed" video content, because a deinterlacing step is required for the interlaced frames.)

Every graphics driver that supports DXVA video processing must implement at least these two

devices. The graphics driver may also provide other devices, which are identified by driver-specific

GUIDs. For example, a driver might implement a proprietary deinterlacing algorithm that produces

better quality output than bob deinterlacing. Some deinterlacing algorithms may require forward or

backward reference pictures from the primary stream. If so, the caller must provide these pictures

to the driver in the correct sequence, as described later in this section.

A reference software device is also provided. The software device is optimized for quality rather

than speed, and may not be adequate for real-time video processing. The reference software

device uses the GUID value DXVA2_VideoProcSoftwareDevice.

Creating a Video Processing Device
Before using DXVA video processing, the application must create a video processing device. Here

is a brief outline of the steps, which are explained in greater detail in the remainder of this section:

1. Get a pointer to the IDirectXVideoProcessorService interface.


2. Create a description of the video format for the primary video stream. Use this description to get a list of the video processing devices that support the video format. Devices are identified by GUID.

3. For a particular device, get a list of render-target formats supported by the device. The formats are returned as a list of D3DFORMAT values. If you plan to mix substreams, get a list of the supported substream formats as well.
4. Query the capabilities of each device.
5. Create the video processing device.

Sometimes you can omit some of these steps. For example, instead of getting the list of render-

target formats, you could simply try creating the video processing device with your preferred

format, and see if it succeeds. A common format such as D3DFMT_X8R8G8B8 is likely to succeed.

The remainder of this section describes these steps in detail.
Get the IDirectXVideoProcessorService Pointer

The IDirectXVideoProcessorService interface is obtained from the Direct3D device. There are

two ways to get a pointer to this interface:

From a Direct3D device.
From the Direct3D Device Manager.

If you have a pointer to a Direct3D device, you can get

an IDirectXVideoProcessorService pointer by calling the DXVA2CreateVideoService function. Pass in a pointer to the device's IDirect3DDevice9 interface, and specify IID_IDirectXVideoProcessorService for the riid parameter, as shown in the following

code:

// Create the DXVA-2 Video Processor service.
hr = DXVA2CreateVideoService(g_pD3DD9, IID_PPV_ARGS(&g_pDXVAVPS));

In some cases, one object creates the Direct3D device and then shares it with other objects through the Direct3D Device Manager. In this situation, you can call IDirect3DDeviceManager9::GetVideoService on the device manager to get the IDirectXVideoProcessorService pointer, as shown in the following code:

HRESULT GetVideoProcessorService(
    IDirect3DDeviceManager9 *pDeviceManager,
    IDirectXVideoProcessorService **ppVPService
    )
{
    *ppVPService = NULL;

    HANDLE hDevice;

    HRESULT hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    if (SUCCEEDED(hr)) {
        // Get the video processor service.
        HRESULT hr2 = pDeviceManager->GetVideoService(hDevice, IID_PPV_ARGS(ppVPService));

        // Close the device handle.
        hr = pDeviceManager->CloseDeviceHandle(hDevice);

        if (FAILED(hr2)) {
            hr = hr2;
        }
    }

    if (FAILED(hr)) {
        SafeRelease(ppVPService);
    }

    return hr;
}

Enumerate the Video Processing Devices

To get a list of video processing devices, fill in a DXVA2_VideoDesc structure with the format of

the primary video stream, and pass this structure to

the IDirectXVideoProcessorService::GetVideoProcessorDeviceGuids method. The method

returns an array of GUIDs, one for each video processing device that can be used with this video

format.

Consider an application that renders a video stream in YUY2 format, using the BT.709 definition of

YUV color, with a frame rate of 29.97 frames per second. Assume that the video content consists

entirely of progressive frames. The following code fragment shows how to fill in the format

description and get the device GUIDs:

// Initialize the video descriptor.

g_VideoDesc.SampleWidth                         = VIDEO_MAIN_WIDTH;
g_VideoDesc.SampleHeight                        = VIDEO_MAIN_HEIGHT;
g_VideoDesc.SampleFormat.VideoChromaSubsampling = DXVA2_VideoChromaSubsampling_MPEG2;
g_VideoDesc.SampleFormat.NominalRange           = DXVA2_NominalRange_16_235;
g_VideoDesc.SampleFormat.VideoTransferMatrix    = EX_COLOR_INFO[g_ExColorInfo][0];
g_VideoDesc.SampleFormat.VideoLighting          = DXVA2_VideoLighting_dim;
g_VideoDesc.SampleFormat.VideoPrimaries         = DXVA2_VideoPrimaries_BT709;
g_VideoDesc.SampleFormat.VideoTransferFunction  = DXVA2_VideoTransFunc_709;
g_VideoDesc.SampleFormat.SampleFormat           = DXVA2_SampleProgressiveFrame;
g_VideoDesc.Format                              = VIDEO_MAIN_FORMAT;
g_VideoDesc.InputSampleFreq.Numerator           = VIDEO_FPS;
g_VideoDesc.InputSampleFreq.Denominator         = 1;
g_VideoDesc.OutputFrameFreq.Numerator           = VIDEO_FPS;
g_VideoDesc.OutputFrameFreq.Denominator         = 1;

// Query the video processor GUID.

UINT count;
GUID* guids = NULL;


hr = g_pDXVAVPS->GetVideoProcessorDeviceGuids(&g_VideoDesc, &count, &guids);

The code for this example is taken from the DXVA2_VideoProc SDK sample.

The guids array in this example is allocated by the GetVideoProcessorDeviceGuids method, so

the application must free the array by calling CoTaskMemFree. The remaining steps can be

performed using any of the device GUIDs returned by this method.
Enumerate Render-Target Formats

To get the list of render-target formats supported by the device, pass the device GUID and

the DXVA2_VideoDesc structure to

the IDirectXVideoProcessorService::GetVideoProcessorRenderTargets method, as shown in

the following code:

// Query the supported render-target formats.

UINT i, count; D3DFORMAT* formats = NULL;

HRESULT hr = g_pDXVAVPS->GetVideoProcessorRenderTargets( guid, &g_VideoDesc, &count, &formats);

if (FAILED(hr)) { DBGMSG((L"GetVideoProcessorRenderTargets failed: 0x%x.\n", hr)); return FALSE; }

for (i = 0; i < count; i++) { if (formats[i] == VIDEO_RENDER_TARGET_FORMAT) { break; } }

CoTaskMemFree(formats);

if (i >= count) { DBGMSG((L"The device does not support the render-target format.\n")); return FALSE; }

The method returns an array of D3DFORMAT values. In this example, where the input type is

YUY2, a typical list of formats might be D3DFMT_X8R8G8B8 (32-bit RGB) and D3DFMT_YUY2 (the

input format). However, the exact list will depend on the driver.

The list of available formats for the substreams can vary depending on the render-target format

and the input format. To get the list of substream formats, pass the device GUID, the format

structure, and the render-target format to

the IDirectXVideoProcessorService::GetVideoProcessorSubStreamFormats method, as

shown in the following code:


// Query the supported substream formats.

formats = NULL;

hr = g_pDXVAVPS->GetVideoProcessorSubStreamFormats( guid, &g_VideoDesc, VIDEO_RENDER_TARGET_FORMAT, &count, &formats);

if (FAILED(hr)) { DBGMSG((L"GetVideoProcessorSubStreamFormats failed: 0x%x.\n", hr)); return FALSE; }

for (i = 0; i < count; i++) { if (formats[i] == VIDEO_SUB_FORMAT) { break; } }

CoTaskMemFree(formats);

if (i >= count) { DBGMSG((L"The device does not support the substream format.\n")); return FALSE; }

This method returns another array of D3DFORMAT values. Typical substream formats are AYUV

and AI44.
Query the Device Capabilities

To get the capabilities of a particular device, pass the device GUID, the format structure, and a

render-target format to the IDirectXVideoProcessorService::GetVideoProcessorCaps method.

The method fills in a DXVA2_VideoProcessorCaps structure with the device capabilities.

// Query video processor capabilities.

hr = g_pDXVAVPS->GetVideoProcessorCaps( guid, &g_VideoDesc, VIDEO_RENDER_TARGET_FORMAT, &g_VPCaps);

if (FAILED(hr)) { DBGMSG((L"GetVideoProcessorCaps failed: 0x%x.\n", hr)); return FALSE; }

Create the Device

To create the video processing device,

call IDirectXVideoProcessorService::CreateVideoProcessor. The input to this method is the

device GUID, the format description, the render-target format, and the maximum number of

substreams that you plan to mix. The method returns a pointer to

the IDirectXVideoProcessor interface, which represents the video processing device.


// Finally create a video processor device.

hr = g_pDXVAVPS->CreateVideoProcessor( guid, &g_VideoDesc, VIDEO_RENDER_TARGET_FORMAT, SUB_STREAM_COUNT, &g_pDXVAVPD );

Video Process Blit
The main video processing operation is the video processing blit. (A blit is any operation that

combines two or more bitmaps into a single bitmap. A video processing blit combines input

pictures to create an output frame.) To perform a video processing blit,

call IDirectXVideoProcessor::VideoProcessBlt. This method passes a set of video samples to

the video processing device. In response, the video processing device processes the input pictures

and generates one output frame. Processing can include deinterlacing, color-space conversion, and

substream mixing. The output is written to a destination surface provided by the caller.

The VideoProcessBlt method takes the following parameters:

pRT points to an IDirect3DSurface9 render target surface that will receive the processed video frame.

pBltParams points to a DXVA2_VideoProcessBltParams structure that specifies the parameters for the blit.

pSamples is the address of an array of DXVA2_VideoSample structures. These structures contain the input samples for the blit.

NumSamples gives the size of the pSamples array.
The Reserved parameter is reserved and should be set to NULL.

In the pSamples array, the caller must provide the following input samples:

The current picture from the primary video stream.
Forward and backward reference pictures, if required by the deinterlacing algorithm.
Zero or more substream pictures, up to a maximum of 15 substreams.
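As a rough sketch, a call with a single progressive input picture and no substreams might look like the following. The variable names, the timestamps, and the background color chosen here are assumptions made for the example and are assumed to be declared elsewhere.

// Sketch: one progressive picture, no substreams, no reference pictures.
DXVA2_VideoSample sample = {0};
sample.Start       = rtStart;                        // Presentation time of the current frame.
sample.End         = rtStart + rtFrameDuration;
sample.SampleFormat.SampleFormat = DXVA2_SampleProgressiveFrame;
sample.SrcSurface  = pSrcSurface;                    // Input Direct3D surface.
sample.SrcRect     = sourceRect;                     // Region of the source picture to use.
sample.DstRect     = destRect;                       // Where it lands on the destination surface.
sample.PlanarAlpha = DXVA2FloatToFixed(1.0f);        // Fully opaque.

DXVA2_VideoProcessBltParams blt = {0};
blt.TargetFrame = rtStart;                           // Must match the primary picture's start time.
blt.TargetRect  = targetRect;                        // Output region on the destination surface.
blt.BackgroundColor.Y     = 0x1000;                  // Opaque black in 16-bit YCbCr (assumed values).
blt.BackgroundColor.Cb    = 0x8000;
blt.BackgroundColor.Cr    = 0x8000;
blt.BackgroundColor.Alpha = 0xFFFF;
blt.Alpha = DXVA2FloatToFixed(1.0f);                 // Planar alpha for the whole composed frame.

HRESULT hr = pVideoProcessor->VideoProcessBlt(pRenderTarget, &blt, &sample, 1, NULL);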

The driver expects this array to be in a particular order, as described in Input Sample Order.
Blit Parameters

The DXVA2_VideoProcessBltParams structure contains general parameters for the blit. The

most important parameters are stored in the following members of the structure:

TargetFrame is the presentation time of the output frame. For progressive content, this

time must equal the start time for the current frame from the primary video stream. This

time is specified in the Start member of the DXVA2_VideoSample structure for that input

sample.

For interlaced content, a frame with two interleaved fields produces two deinterlaced output

frames. On the first output frame, the presentation time must equal the start time of the

current picture in the primary video stream, just like progressive content. On the second

output frame, the start time must equal the midpoint between the start time of the current

picture in the primary video stream and the start time of the next picture in the stream. For

example, if the input video is 25 frames per second (50 fields per second), the output frames

will have the time stamps shown in the following table. Time stamps are shown in units of

100 nanoseconds.


Input picture    TargetFrame (1)    TargetFrame (2)
0                0                  200000
400000           400000             600000
800000           800000             1000000
1200000          1200000            1400000

 

If interlaced content consists of single fields rather than interleaved fields, the output times

always match the input times, as with progressive content.

TargetRect defines a rectangular region within the destination surface. The blit will write the output to this region. Specifically, every pixel inside TargetRect will be modified, and no pixels outside of TargetRect will be modified. The target rectangle defines the bounding rectangle for all of the input video streams. Placement of individual streams within that rectangle is controlled through the pSamples parameter of IDirectXVideoProcessor::VideoProcessBlt.

BackgroundColor gives the color of the background wherever no video image appears. For example, when a 16 x 9 video image is displayed within a 4 x 3 area (letterboxing), the letterboxed regions are displayed with the background color. The background color applies only within the target rectangle (TargetRect). Any pixels outside of TargetRect are not modified.

DestFormat describes the color space for the output video—for example, whether ITU-R BT.709 or BT.601 color is used. This information can affect how the image is displayed. For more information, see Extended Color Information.

Other parameters are described on the reference page for

the DXVA2_VideoProcessBltParams structure.
Input Samples

The pSamples parameter of IDirectXVideoProcessor::VideoProcessBlt points to an array

of DXVA2_VideoSample structures. Each of these structures contains information about one

input sample and a pointer to the Direct3D surface that contains the sample. Each sample is one of

the following:

The current picture from the primary stream.
A forward or backward reference picture from the primary stream, used for deinterlacing.
A substream picture.

The exact order in which the samples must appear in the array is described later, in the

section Input Sample Order.

Up to 15 substream pictures can be provided, although most video applications need only one

substream, at the most. The number of substreams can change with each call

to VideoProcessBlt. Substream pictures are indicated by setting

the SampleFormat.SampleFormat member of the DXVA2_VideoSample structure equal to


DXVA2_SampleSubStream. For the primary video stream, this member describes the interlacing of

the input video. For more information, see the DXVA2_SampleFormat enumeration.

For the primary video stream, the Start and End members of the DXVA2_VideoSample structure

give the start and end times of the input sample. For substream pictures, set these values to zero,

because the presentation time is always calculated from the primary stream. The application is

responsible for tracking when each substream picture should be presented and submitting it

to VideoProcessBlt at the proper time.

Two rectangles define how the source video is positioned for each stream:

The SrcRect member of the DXVA2_VideoSample structure specifies the source rectangle, a rectangular region of the source picture that will appear in the composited output frame. To crop the picture, set this to a value smaller than the frame size. Otherwise, set it equal to the frame size.

The DstRect member of the same structure specifies the destination rectangle, a rectangular region of the destination surface where the video frame will appear.

The driver blits pixels from the source rectangle into the destination rectangle. The two rectangles

can have different sizes or aspect ratios; the driver will scale the image as needed. Moreover, each

input stream can use a different scaling factor. In fact, scaling might be necessary to produce the

correct aspect ratio in the output frame. The driver does not take the source's pixel aspect ratio

into account, so if the source image uses non-square pixels, it is up to the application to calculate

the correct destination rectangle.

The preferred substream formats are AYUV and AI44. The latter is a palettized format with 16

colors. Palette entries are specified in the Pal member of the DXVA2_VideoSample structure. (If

your source video format is originally expressed as a Media Foundation media type, the palette

entries are stored in the MF_MT_PALETTE attribute.) For non-palettized formats, clear this array to

zero.

Image Composition
Every blit operation is defined by the following three rectangles:

The target rectangle (TargetRect) defines the region within the destination surface where the output will appear. The output image is clipped to this rectangle.

The destination rectangle for each stream (DstRect) defines where the input stream appears in the composited image.

The source rectangle for each stream (SrcRect) defines which part of the source image appears.

The target and destination rectangles are specified relative to the destination surface. The source

rectangle is specified relative to the source image. All rectangles are specified in pixels.

The video processing device alpha blends the input pictures, using any of the following sources of

alpha data:


Per-pixel alpha data from substreams.
A planar alpha value for each video stream, specified in the PlanarAlpha member of the DXVA2_VideoSample structure.
The planar alpha value of the composited image, specified in the Alpha member of the DXVA2_VideoProcessBltParams structure. This value is used to blend the entire composited image with the background color.

This section gives a series of examples that show how the video processing device creates the

output image.
Example 1: Letterboxing

This example shows how to letterbox the source image, by setting the destination rectangle to be

smaller than the target rectangle. The primary video stream in this example is a 720 × 480 image,

and is meant to be displayed at a 16:9 aspect ratio. The destination surface is 640 × 480 pixels

(4:3 aspect ratio). To achieve the correct aspect ratio, the destination rectangle must be 640 ×

360. For simplicity, this example does not include a substream. The following diagram shows the

source and destination rectangles.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 640, 480 }
Primary video:
   Source rectangle: { 0, 0, 720, 480 }
   Destination rectangle: { 0, 60, 640, 420 }

The driver will deinterlace the video, shrink the deinterlaced frame to 640 × 360, and blit the

frame into the destination rectangle. The target rectangle is larger than the destination rectangle,

so the driver will use the background color to fill the horizontal bars above and below the frame.
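The destination rectangle in this example can be derived directly from the two aspect ratios; a small sketch of that arithmetic (not part of the original example) is shown here.

// Sketch: compute a letterbox destination rectangle for 16:9 content
// inside a 640 x 480 (4:3) target rectangle.
const LONG targetWidth  = 640;
const LONG targetHeight = 480;

// Height that preserves a 16:9 display aspect ratio at the target width.
LONG scaledHeight = targetWidth * 9 / 16;              // 640 * 9 / 16 = 360
LONG offsetY      = (targetHeight - scaledHeight) / 2; // (480 - 360) / 2 = 60

RECT dstRect;
dstRect.left   = 0;
dstRect.top    = offsetY;                // 60
dstRect.right  = targetWidth;            // 640
dstRect.bottom = offsetY + scaledHeight; // 420  -> { 0, 60, 640, 420 }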

The background color is specified in the DXVA2_VideoProcessBltParams structure.
Example 2: Stretching Substream Images

Substream pictures can extend beyond the primary video picture. In DVD video, for example, the

primary video stream can have a 4:3 aspect ratio while the substream is 16:9. In this example,

both video streams have the same source dimensions (720 × 480), but the substream is intended

to be shown at a 16:9 aspect ratio. To achieve this aspect ratio, the substream image is stretched

horizontally. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:


Target rectangle: { 0, 0, 854, 480 }
Primary video:
   Source rectangle: { 0, 0, 720, 480 }
   Destination rectangle: { 107, 0, 747, 480 }
Substream:
   Source rectangle: { 0, 0, 720, 480 }
   Destination rectangle: { 0, 0, 854, 480 }

These values preserve the image height and scale both images horizontally. In the regions where

both images appear, they are alpha blended. Where the substream picture extends beyond the primary video, the substream is alpha blended with the background color. This alpha blending accounts for the altered colors in the right-hand side of the diagram.
Example 3: Mismatched Stream Heights

In the previous example, the substream and the primary stream are the same height. Streams can

also have mismatched heights, as shown in this example. Areas within the target rectangle where

no video appears are drawn using the background color—black in this example. The source and

destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 150, 85 }
Primary video:
   Source rectangle: { 0, 0, 150, 50 }
   Destination rectangle: { 0, 17, 150, 67 }
Substream:
   Source rectangle: { 0, 0, 100, 85 }
   Destination rectangle: { 25, 0, 125, 85 }

Example 4: Target Rectangle Smaller Than Destination Surface

This example shows a case where the target rectangle is smaller than the destination surface.

The preceding diagram shows the following rectangles:

Destination surface: { 0, 0, 300, 200 }
Target rectangle: { 0, 0, 150, 85 }
Primary video:
   Source rectangle: { 0, 0, 150, 50 }
   Destination rectangle: { 0, 17, 150, 67 }
Substream:
   Source rectangle: { 0, 0, 100, 85 }
   Destination rectangle: { 25, 0, 125, 85 }

Pixels outside of the target rectangle are not modified, so the background color appears only within

the target rectangle. The dotted area indicates portions of the destination surface that are not

affected by the blit.
Example 5: Source Rectangles

If you specify a source rectangle that is smaller than the source picture, the driver will blit just that

portion of the picture. In this example, the source rectangles specify the lower-right quadrant of

the primary video stream and the lower-left quadrant of the substream (indicated by hash marks in

the diagram). The destination rectangles are the same sizes as the source rectangles, so the video

is not stretched. The source and destination rectangles are shown in the following diagram.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 576 }
Primary video:
   Source surface size: { 0, 0, 720, 480 }
   Source rectangle: { 360, 240, 720, 480 }
   Destination rectangle: { 0, 0, 360, 240 }
Substream:
   Source surface size: { 0, 0, 640, 576 }
   Source rectangle: { 0, 288, 320, 576 }
   Destination rectangle: { 400, 0, 720, 288 }

Example 6: Intersecting Destination Rectangles

This example is similar to the previous one, but the destination rectangles intersect. The surface

dimensions are the same as in the previous example, but the source and destination rectangles are

not. Again, the video is cropped but not stretched. The source and destination rectangles are

shown in the following diagram.


The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 576 }
Primary video:
   Source surface size: { 0, 0, 720, 480 }
   Source rectangle: { 260, 92, 720, 480 }
   Destination rectangle: { 0, 0, 460, 388 }
Substream:
   Source surface size: { 0, 0, 640, 576 }
   Source rectangle: { 0, 0, 460, 388 }
   Destination rectangle: { 260, 188, 720, 576 }

Example 7: Stretching and Cropping Video

In this example, the video is stretched as well as cropped. A 180 × 120 region from each stream is

stretched to cover a 360 × 240 area in the destination rectangle.

The preceding diagram shows the following rectangles:

Target rectangle: { 0, 0, 720, 480 }
Primary video:
   Source surface size: { 0, 0, 360, 240 }
   Source rectangle: { 180, 120, 360, 240 }
   Destination rectangle: { 0, 0, 360, 240 }
Substream:
   Source surface size: { 0, 0, 360, 240 }
   Source rectangle: { 0, 0, 180, 120 }
   Destination rectangle: { 360, 240, 720, 480 }

Input Sample Order


The pSamples parameter of the VideoProcessBlt method is a pointer to an array of input

samples. Samples from the primary video stream appear first, followed by substream pictures in Z-

order. Samples must be placed into the array in the following order:

Samples for the primary video stream appear first in the array, in temporal order. Depending on the deinterlace mode, the driver may require one or more reference samples from the primary video stream. The NumForwardRefSamples and NumBackwardRefSamples members of the DXVA2_VideoProcessorCaps structure specify how many forward and backward reference samples are needed. The caller must provide these reference samples even if the video content is progressive and does not require deinterlacing. (This can occur when progressive frames are given to a deinterlacing device, for example when the source contains a mix of both interlaced and progressive frames.)

After the samples for the primary video stream, the array can contain up to 15 substream samples, arranged in Z-order, from bottom to top. Substreams are always progressive and do not require reference pictures.

At any time, the primary video stream can switch between interlaced and progressive content, and

the number of substreams can change.

The SampleFormat.SampleFormat member of the DXVA2_VideoSample structure indicates

the type of picture. For substream pictures, set this value to DXVA2_SampleSubStream. For

progressive pictures, the value is DXVA2_SampleProgressiveFrame. For interlaced pictures, the

value depends on the field layout.

If the driver requires forward and backward reference samples, the full number of samples might

not be available at the start of the video sequence. In that case, include entries for them in

the pSamples array, but mark the missing samples as having type DXVA2_SampleUnknown.

The Start and End members of the DXVA2_VideoSample structure give the temporal location of

each sample. These values are used only for samples from the primary video stream. For

substream pictures, set both members to zero.
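The following sketch shows how an application might lay out the pSamples array for one backward reference, the current picture, one forward reference, and a single substream. The helper variables and surface pointers are assumptions made for the illustration and are assumed to be declared elsewhere.

// Sketch: pSamples layout for 1 backward ref + current + 1 forward ref + 1 substream.
// SrcRect and DstRect are omitted for brevity; set them as described under Input Samples.
DXVA2_VideoSample samples[4] = { { 0 } };

// [0] Backward reference picture (earlier in time).
samples[0].Start = rtCurrent - rtFrameDuration;
samples[0].End   = rtCurrent;
samples[0].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst; // Or the actual field layout.
samples[0].SrcSurface = pBackwardRefSurface;

// [1] Current picture from the primary stream.
samples[1].Start = rtCurrent;
samples[1].End   = rtCurrent + rtFrameDuration;
samples[1].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst;
samples[1].SrcSurface = pCurrentSurface;

// [2] Forward reference picture (later in time). Use DXVA2_SampleUnknown here
// if the reference is not yet available at the start of the sequence.
samples[2].Start = rtCurrent + rtFrameDuration;
samples[2].End   = rtCurrent + 2 * rtFrameDuration;
samples[2].SampleFormat.SampleFormat = DXVA2_SampleFieldInterleavedEvenFirst;
samples[2].SrcSurface = pForwardRefSurface;

// [3] Substream picture: always progressive, Start and End set to zero.
samples[3].SampleFormat.SampleFormat = DXVA2_SampleSubStream;
samples[3].SrcSurface = pSubStreamSurface;

HRESULT hr = pVideoProcessor->VideoProcessBlt(pRenderTarget, &blt, samples, 4, NULL);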

The following examples may help to clarify these requirements.
Example 1

The simplest case occurs when there are no substreams and the deinterlacing algorithm does not

require reference samples (NumForwardRefSamples and NumBackwardRefSamples are both

zero). Bob deinterlacing is an example of such an algorithm. In this case, the pSamples array

should contain a single input surface, as shown in the following table.

Index          Surface type          Temporal location
pSamples[0]    Interlaced picture    T

 

The time value T is assumed to be the start time of the current video frame.
Example 2

In this example, the application mixes two substreams with the primary stream. The deinterlacing

algorithm does not require reference samples. The following table shows how these samples are

arranged in the pSamples array.

Index          Surface type          Temporal location    Z-order
pSamples[0]    Interlaced picture    T                    0
pSamples[1]    Substream             0                    1
pSamples[2]    Substream             0                    2

Example 3

Now suppose that the deinterlacing algorithm requires one backward reference sample and one

forward reference sample. In addition, two substream pictures are provided, for a total of five

surfaces. The correct ordering is shown in the following table.

Index          Surface type                      Temporal location    Z-order
pSamples[0]    Interlaced picture (reference)    T −1                 Not applicable
pSamples[1]    Interlaced picture                T                    0
pSamples[2]    Interlaced picture (reference)    T +1                 Not applicable
pSamples[3]    Substream                         0                    1
pSamples[4]    Substream                         0                    2

 

The time T −1 is the start time of the frame before the current frame, and T +1 is the start time of

the following frame.

If the video stream switches to progressive content, using the same deinterlacing mode, the

application must provide the same number of samples, as shown in the following table.

Index          Surface type                       Temporal location    Z-order
pSamples[0]    Progressive picture (reference)    T −1                 Not applicable
pSamples[1]    Progressive picture                T                    0
pSamples[2]    Progressive picture (reference)    T +1                 Not applicable
pSamples[3]    Substream                          0                    1
pSamples[4]    Substream                          0                    2

Example 4

At the start of a video sequence, forward and backward reference samples might not be available.

When this happens, entries for the missing samples are included in the pSamples array, with

sample type DXVA2_SampleUnknown.

Assuming that the deinterlacing mode needs one forward and one backward reference sample, the

first three calls to VideoProcessBlt would have the sequences of inputs shown in the following

three tables.

Index          Surface type                      Temporal location
pSamples[0]    Unknown                           0
pSamples[1]    Unknown                           0
pSamples[2]    Interlaced picture (reference)    T +1

Index          Surface type                      Temporal location
pSamples[0]    Unknown                           0
pSamples[1]    Interlaced picture                T
pSamples[2]    Interlaced picture (reference)    T +1

Index          Surface type                      Temporal location
pSamples[0]    Interlaced picture                T −1
pSamples[1]    Interlaced picture                T
pSamples[2]    Interlaced picture (reference)    T +1

 

DXVA-HD
Microsoft DirectX Video Acceleration High Definition (DXVA-HD) is an API for hardware-accelerated

video processing. DXVA-HD uses the GPU to perform functions such as deinterlacing, compositing,

and color-space conversion.

DXVA-HD is similar to DXVA Video Processing (DXVA-VP), but offers enhanced features and a

simpler processing model. By providing a more flexible composition model, DXVA-HD is designed to

support the next generation of HD optical formats and broadcast standards.

The DXVA-HD API requires either a WDDM display driver that supports the DXVA-HD device driver

interface (DDI), or a plug-in software processor.

Improvements over DXVA-VP
Related topics

Improvements over DXVA-VP
DXVA-HD expands the set of features provided by DXVA-VP. Enhancements include:

RGB and YUV mixing. Any stream can be either RGB or YUV. There is no longer a distinction between the primary stream and the substreams.

Deinterlacing of multiple streams. Any stream can be either progressive or interlaced. Moreover, the cadence and frame rate can vary from one input stream to the next.

RGB background colors. Previously, only YUV background colors were supported.
Luma keying. When luma keying is enabled, luma values that fall within a designated range become transparent.
Dynamic switching between deinterlace modes.

DXVA-HD also defines some advanced features that drivers can support. However, applications

should not assume that all drivers will support these features. The advanced features include:


Inverse telecine (for example, 60i to 24p).
Frame-rate conversion (for example, 24p to 120p).
Alpha-fill modes.
Noise reduction and edge enhancement filtering.
Anamorphic non-linear scaling.
Extended YCbCr (xvYCC).

This section contains the following topics.

Creating a DXVA-HD Video Processor
Checking Supported DXVA-HD Formats
Creating DXVA-HD Video Surfaces
Setting DXVA-HD States
Performing the DXVA-HD Blit

Creating a DXVA-HD Video Processor
Microsoft DirectX Video Acceleration High Definition (DXVA-HD) uses two primary interfaces:

IDXVAHD_Device. Represents the DXVA-HD device. Use this interface to query the device capabilities and create the video processor.

IDXVAHD_VideoProcessor. Represents a set of video processing capabilities. Use this interface to perform the video processing blit.

In the code that follows, the following global variables are assumed:

IDirect3D9Ex            *g_pD3D = NULL;
IDirect3DDevice9Ex      *g_pD3DDevice = NULL;  // Direct3D device.
IDXVAHD_Device          *g_pDXVAHD = NULL;     // DXVA-HD device.
IDXVAHD_VideoProcessor  *g_pDXVAVP = NULL;     // DXVA-HD video processor.
IDirect3DSurface9       *g_pSurface = NULL;    // Video surface.

const D3DFORMAT RENDER_TARGET_FORMAT = D3DFMT_X8R8G8B8;
const D3DFORMAT VIDEO_FORMAT         = D3DFMT_X8R8G8B8;

const UINT VIDEO_FPS    = 60;
const UINT VIDEO_WIDTH  = 640;
const UINT VIDEO_HEIGHT = 480;

To create a DXVA-HD video processor:

1. Fill in a DXVAHD_CONTENT_DESC structure with a description of the video content. The driver uses this information as a hint to optimize the capabilities of the video processor. The structure does not contain a complete format description.

   DXVAHD_RATIONAL fps = { VIDEO_FPS, 1 };

   DXVAHD_CONTENT_DESC desc;

   desc.InputFrameFormat = DXVAHD_FRAME_FORMAT_PROGRESSIVE;
   desc.InputFrameRate   = fps;
   desc.InputWidth       = VIDEO_WIDTH;
   desc.InputHeight      = VIDEO_HEIGHT;
   desc.OutputFrameRate  = fps;
   desc.OutputWidth      = VIDEO_WIDTH;
   desc.OutputHeight     = VIDEO_HEIGHT;

2. Call DXVAHD_CreateDevice to create the DXVA-HD device. This function returns a pointer to the IDXVAHD_Device interface.

   hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc,
       DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL, NULL, &pDXVAHD);

3. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps. This method fills in a DXVAHD_VPDEVCAPS structure with the device capabilities. If you require specific video processing features, such as luma keying or image filtering, check their availability by using this structure.

   DXVAHD_VPDEVCAPS caps;

   hr = pDXVAHD->GetVideoProcessorDeviceCaps(&caps);

4. Check whether the DXVA-HD device supports the input video formats that you require. The section Checking Supported Input Formats describes this step in more detail.
5. Check whether the DXVA-HD device supports the output format that you require. The section Checking Supported Output Formats describes this step in more detail.
6. Allocate an array of DXVAHD_VPCAPS structures. The number of array elements that must be allocated is given by the VideoProcessorCount member of the DXVAHD_VPDEVCAPS structure, obtained in step 3.

   // Create the array of video processor caps.
   DXVAHD_VPCAPS *pVPCaps =
       new (std::nothrow) DXVAHD_VPCAPS[ caps.VideoProcessorCount ];

   if (pVPCaps == NULL)
   {
       return E_OUTOFMEMORY;
   }

7. Each DXVAHD_VPCAPS structure represents a distinct video processor. Fill the array by calling IDXVAHD_Device::GetVideoProcessorCaps; you can then loop through the array to discover the capabilities of each video processor. The structure includes information about the deinterlacing, telecine, and frame-rate conversion capabilities of the video processor.

   HRESULT hr = pDXVAHD->GetVideoProcessorCaps(
       caps.VideoProcessorCount, pVPCaps);

8. Select a video processor to create. The VPGuid member of the DXVAHD_VPCAPS structure contains a GUID that uniquely identifies the video processor. Pass this GUID to the IDXVAHD_Device::CreateVideoProcessor method. The method returns an IDXVAHD_VideoProcessor pointer.
9. Optionally, call IDXVAHD_Device::CreateVideoSurface to create an array of input video surfaces. For more information, see Creating Video Surfaces.


The following code example shows the complete sequence of steps:

// Initializes the DXVA-HD video processor.

// NOTE: The following example makes some simplifying assumptions://// 1. There is a single input stream.// 2. The input frame rate matches the output frame rate.// 3. No advanced DXVA-HD features are needed, such as luma keying or IVTC.// 4. The application uses a single input video surface.

HRESULT InitializeDXVAHD(){ if (g_pD3DDevice == NULL) { return E_FAIL; }

HRESULT hr = S_OK;

IDXVAHD_Device *pDXVAHD = NULL; IDXVAHD_VideoProcessor *pDXVAVP = NULL; IDirect3DSurface9 *pSurf = NULL;

DXVAHD_RATIONAL fps = { VIDEO_FPS, 1 };

DXVAHD_CONTENT_DESC desc;

desc.InputFrameFormat = DXVAHD_FRAME_FORMAT_PROGRESSIVE; desc.InputFrameRate = fps; desc.InputWidth = VIDEO_WIDTH; desc.InputHeight = VIDEO_HEIGHT; desc.OutputFrameRate = fps; desc.OutputWidth = VIDEO_WIDTH; desc.OutputHeight = VIDEO_HEIGHT;

#ifdef USE_SOFTWARE_PLUGIN HMODULE hSWPlugin = LoadLibrary(L"C:\\dxvahdsw.dll");

PDXVAHDSW_Plugin pSWPlugin = (PDXVAHDSW_Plugin)GetProcAddress(hSWPlugin, "DXVAHDSW_Plugin");

hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc,DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL, pSWPlugin, &pDXVAHD);#else hr = DXVAHD_CreateDevice(g_pD3DDevice, &desc, DXVAHD_DEVICE_USAGE_PLAYBACK_NORMAL, NULL, &pDXVAHD);#endif if (FAILED(hr)) { goto done; }

DXVAHD_VPDEVCAPS caps;

hr = pDXVAHD->GetVideoProcessorDeviceCaps(&caps);


    if (FAILED(hr))
    {
        goto done;
    }

    // Check whether the device supports the input and output formats.

    hr = CheckInputFormatSupport(pDXVAHD, caps, VIDEO_FORMAT);
    if (FAILED(hr))
    {
        goto done;
    }

    hr = CheckOutputFormatSupport(pDXVAHD, caps, RENDER_TARGET_FORMAT);
    if (FAILED(hr))
    {
        goto done;
    }

    // Create the VP device.
    hr = CreateVPDevice(pDXVAHD, caps, &pDXVAVP);
    if (FAILED(hr))
    {
        goto done;
    }

    // Create the video surface for the primary video stream.
    hr = pDXVAHD->CreateVideoSurface(
        VIDEO_WIDTH,
        VIDEO_HEIGHT,
        VIDEO_FORMAT,
        caps.InputPool,
        0,      // Usage
        DXVAHD_SURFACE_TYPE_VIDEO_INPUT,
        1,      // Number of surfaces to create
        &pSurf, // Array of surface pointers
        NULL
        );

    if (FAILED(hr))
    {
        goto done;
    }

    g_pDXVAHD = pDXVAHD;
    g_pDXVAHD->AddRef();

    g_pDXVAVP = pDXVAVP;
    g_pDXVAVP->AddRef();

    g_pSurface = pSurf;
    g_pSurface->AddRef();

done:
    SafeRelease(&pDXVAHD);
    SafeRelease(&pDXVAVP);
    SafeRelease(&pSurf);
    return hr;


}

The CreateVPDevice function shown in this example creates the video processor (steps 6–8):

// Creates a DXVA-HD video processor.

HRESULT CreateVPDevice(
    IDXVAHD_Device *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    IDXVAHD_VideoProcessor **ppDXVAVP
    )
{
    // Create the array of video processor caps.
    DXVAHD_VPCAPS *pVPCaps =
        new (std::nothrow) DXVAHD_VPCAPS[ caps.VideoProcessorCount ];

    if (pVPCaps == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorCaps(
        caps.VideoProcessorCount, pVPCaps);

    // At this point, an application could loop through the array and examine
    // the capabilities. For purposes of this example, however, we simply
    // create the first video processor in the list.

    if (SUCCEEDED(hr))
    {
        // The VPGuid member contains the GUID that identifies the video
        // processor.

        hr = pDXVAHD->CreateVideoProcessor(&pVPCaps[0].VPGuid, ppDXVAVP);
    }

    delete [] pVPCaps;
    return hr;
}

Checking Supported DXVA-HD Formats

Checking Supported Input Formats

To get a list of the input formats that the Microsoft DirectX Video Acceleration High Definition (DXVA-HD) device supports, do the following:

1. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps to get the device capabilities.
2. Check the InputFormatCount member of the DXVAHD_VPDEVCAPS structure. This member gives the number of supported input formats.
3. Allocate an array of D3DFORMAT values, of size InputFormatCount.
4. Pass this array to the IDXVAHD_Device::GetVideoProcessorInputFormats method. The method fills the array with a list of input formats.

The following code shows these steps:


// Checks whether a DXVA-HD device supports a specified input format.

HRESULT CheckInputFormatSupport(
    IDXVAHD_Device *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    D3DFORMAT d3dformat
    )
{
    D3DFORMAT *pFormats = new (std::nothrow) D3DFORMAT[ caps.InputFormatCount ];
    if (pFormats == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorInputFormats(
        caps.InputFormatCount, pFormats);

    if (FAILED(hr))
    {
        goto done;
    }

    UINT index;
    for (index = 0; index < caps.InputFormatCount; index++)
    {
        if (pFormats[index] == d3dformat)
        {
            break;
        }
    }
    if (index == caps.InputFormatCount)
    {
        hr = E_FAIL;
    }

done:
    delete [] pFormats;
    return hr;
}

Checking Supported Output Formats

To get a list of the output formats that the DXVA-HD device supports, do the following:

1. Call IDXVAHD_Device::GetVideoProcessorDeviceCaps to get the device capabilities.
2. Check the OutputFormatCount member of the DXVAHD_VPDEVCAPS structure. This member gives the number of supported output formats.
3. Allocate an array of D3DFORMAT values, of size OutputFormatCount.
4. Pass this array to the IDXVAHD_Device::GetVideoProcessorOutputFormats method. The method fills the array with a list of output formats.

The following code shows these steps:

// Checks whether a DXVA-HD device supports a specified output format.


HRESULT CheckOutputFormatSupport(
    IDXVAHD_Device *pDXVAHD,
    const DXVAHD_VPDEVCAPS& caps,
    D3DFORMAT d3dformat
    )
{
    D3DFORMAT *pFormats = new (std::nothrow) D3DFORMAT[ caps.OutputFormatCount ];
    if (pFormats == NULL)
    {
        return E_OUTOFMEMORY;
    }

    HRESULT hr = pDXVAHD->GetVideoProcessorOutputFormats(
        caps.OutputFormatCount, pFormats);

    if (FAILED(hr))
    {
        goto done;
    }

    UINT index;
    for (index = 0; index < caps.OutputFormatCount; index++)
    {
        if (pFormats[index] == d3dformat)
        {
            break;
        }
    }
    if (index == caps.OutputFormatCount)
    {
        hr = E_FAIL;
    }

done:
    delete [] pFormats;
    return hr;
}

Creating DXVA-HD Video Surfaces

The application must create one or more Direct3D surfaces to use for the input frames. These must be allocated in the memory pool specified by the InputPool member of the DXVAHD_VPDEVCAPS structure. The following surface types can be used:

A video surface created by calling IDXVAHD_Device::CreateVideoSurface and specifying the DXVAHD_SURFACE_TYPE_VIDEO_INPUT or DXVAHD_SURFACE_TYPE_VIDEO_INPUT_PRIVATE surface type. This surface type is equivalent to an off-screen plain surface.

A decoder render-target surface, created by calling IDirectXVideoAccelerationService::CreateSurface and specifying the DXVA2_VideoDecoderRenderTarget surface type. This surface type is used for DXVA decoding. (A sketch of this option follows the CreateVideoSurface example below.)

An off-screen plain surface.

The following code shows how to allocate a video surface, using CreateVideoSurface:


// Create the video surface for the primary video stream.
hr = pDXVAHD->CreateVideoSurface(
    VIDEO_WIDTH,
    VIDEO_HEIGHT,
    VIDEO_FORMAT,
    caps.InputPool,
    0,      // Usage
    DXVAHD_SURFACE_TYPE_VIDEO_INPUT,
    1,      // Number of surfaces to create
    &pSurf, // Array of surface pointers
    NULL
    );
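The second option in the list above, a decoder render-target surface, is not shown in the original example. The following is a minimal sketch of that path; it reuses the g_pD3DDevice, VIDEO_WIDTH, VIDEO_HEIGHT, and VIDEO_FORMAT placeholders from the earlier examples, and in a real application the surface would normally come from the DXVA decoder configuration rather than being created in isolation.

// Sketch only: create a decoder render-target surface with the DXVA2 service.
IDirectXVideoAccelerationService *pService = NULL;
IDirect3DSurface9 *pDecoderSurf = NULL;

HRESULT hr = DXVA2CreateVideoService(
    g_pD3DDevice,
    IID_IDirectXVideoAccelerationService,
    (void**)&pService
    );

if (SUCCEEDED(hr))
{
    hr = pService->CreateSurface(
        VIDEO_WIDTH,
        VIDEO_HEIGHT,
        0,                  // No back buffers
        VIDEO_FORMAT,
        D3DPOOL_DEFAULT,    // Assumption; match the pool your decoder requires
        0,                  // Usage
        DXVA2_VideoDecoderRenderTarget,
        &pDecoderSurf,
        NULL
        );
}

SafeRelease(&pService);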

Setting DXVA-HD States

During video processing, the Microsoft DirectX Video Acceleration High Definition (DXVA-HD) device maintains a persistent state from one frame to the next. Each state has a documented default. After you configure the device, set any states that you wish to change from their defaults. Before you process each frame, update any states that should change.

Note  This design differs from DXVA-VP. In DXVA-VP, the application must specify all of the VP parameters with each frame.

Device states fall into two categories:

Stream states apply to each input stream separately. You can apply different settings to each stream.

Blit states apply globally to the entire video processing blit.

The following stream states are defined.

Stream State Description
DXVAHD_STREAM_STATE_D3DFORMAT Input video format.
DXVAHD_STREAM_STATE_FRAME_FORMAT Interlacing.
DXVAHD_STREAM_STATE_INPUT_COLOR_SPACE Input color space. This state specifies the RGB color range and the YCbCr transfer matrix for the input stream.
DXVAHD_STREAM_STATE_OUTPUT_RATE Output frame rate. This state controls frame-rate conversion.
DXVAHD_STREAM_STATE_SOURCE_RECT Source rectangle.
DXVAHD_STREAM_STATE_DESTINATION_RECT Destination rectangle.
DXVAHD_STREAM_STATE_ALPHA Planar alpha.
DXVAHD_STREAM_STATE_PALETTE Color palette. This state applies only to palettized input formats.
DXVAHD_STREAM_STATE_LUMA_KEY Luma key.
DXVAHD_STREAM_STATE_ASPECT_RATIO Pixel aspect ratio.
DXVAHD_STREAM_STATE_FILTER_Xxxx Image filter settings. The driver can support brightness, contrast, and other image filters.

The following blit states are defined:

Blit State Description
DXVAHD_BLT_STATE_TARGET_RECT Target rectangle.
DXVAHD_BLT_STATE_BACKGROUND_COLOR Background color.
DXVAHD_BLT_STATE_OUTPUT_COLOR_SPACE Output color space.
DXVAHD_BLT_STATE_ALPHA_FILL Alpha fill mode.
DXVAHD_BLT_STATE_CONSTRICTION Constriction. This state controls whether the device downsamples the output.

To set a stream state, call the IDXVAHD_VideoProcessor::SetVideoProcessStreamState method. To set a blit state, call the IDXVAHD_VideoProcessor::SetVideoProcessBltState method. In both methods, an enumeration value specifies the state to set, and the state data is passed in a state-specific data structure, which the application casts to a void* pointer.

The following code example sets the input format and destination rectangle for stream 0, and sets the background color to black.


HRESULT SetDXVAHDStates(HWND hwnd, D3DFORMAT inputFormat)
{
    // Set the initial stream states.

    // Set the format of the input stream.

    DXVAHD_STREAM_STATE_D3DFORMAT_DATA d3dformat = { inputFormat };

    HRESULT hr = g_pDXVAVP->SetVideoProcessStreamState(
        0,  // Stream index
        DXVAHD_STREAM_STATE_D3DFORMAT,
        sizeof(d3dformat),
        &d3dformat
        );

    if (SUCCEEDED(hr))
    {
        // For this example, the input stream contains progressive frames.

        DXVAHD_STREAM_STATE_FRAME_FORMAT_DATA frame_format = { DXVAHD_FRAME_FORMAT_PROGRESSIVE };

        hr = g_pDXVAVP->SetVideoProcessStreamState(
            0,  // Stream index
            DXVAHD_STREAM_STATE_FRAME_FORMAT,
            sizeof(frame_format),
            &frame_format
            );
    }

    if (SUCCEEDED(hr))
    {
        // Compute the letterbox area.

        RECT rcDest;
        GetClientRect(hwnd, &rcDest);

        RECT rcSrc;
        SetRect(&rcSrc, 0, 0, VIDEO_WIDTH, VIDEO_HEIGHT);

        rcDest = LetterBoxRect(rcSrc, rcDest);

        // Set the destination rectangle, so the frame is displayed within the
        // letterbox area. Otherwise, the frame is stretched to cover the
        // entire surface.

        DXVAHD_STREAM_STATE_DESTINATION_RECT_DATA DstRect = { TRUE, rcDest };

        hr = g_pDXVAVP->SetVideoProcessStreamState(
            0,  // Stream index
            DXVAHD_STREAM_STATE_DESTINATION_RECT,
            sizeof(DstRect),
            &DstRect
            );
    }


    if (SUCCEEDED(hr))
    {
        DXVAHD_COLOR_RGBA rgbBackground = { 0.0f, 0.0f, 0.0f, 1.0f }; // RGBA

        DXVAHD_BLT_STATE_BACKGROUND_COLOR_DATA background = { FALSE, rgbBackground };

        hr = g_pDXVAVP->SetVideoProcessBltState(
            DXVAHD_BLT_STATE_BACKGROUND_COLOR,
            sizeof(background),
            &background
            );
    }

    return hr;
}
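The example above does not exercise the DXVAHD_STREAM_STATE_FILTER_Xxxx states listed in the stream-state table. The following sketch, which is not part of the original sample, shows one way to enable the brightness filter on stream 0; it assumes the DXVAHD_VPDEVCAPS structure (caps) obtained during initialization is still available and simply uses the driver's reported default level.

// Sketch only: enable the brightness filter if the device supports it.
if (caps.FilterCaps & DXVAHD_FILTER_CAPS_BRIGHTNESS)
{
    DXVAHD_FILTER_RANGE_DATA range;

    hr = g_pDXVAHD->GetVideoProcessorFilterRange(DXVAHD_FILTER_BRIGHTNESS, &range);

    if (SUCCEEDED(hr))
    {
        DXVAHD_STREAM_STATE_FILTER_DATA filter = { TRUE, range.Default };

        hr = g_pDXVAVP->SetVideoProcessStreamState(
            0,  // Stream index
            DXVAHD_STREAM_STATE_FILTER_BRIGHTNESS,
            sizeof(filter),
            &filter
            );
    }
}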

Performing the DXVA-HD Blit

To process a frame, call the IDXVAHD_VideoProcessor::VideoProcessBltHD method, which performs the video processing blit to the render-target surface. The following code shows this step:

BOOL ProcessVideoFrame(HWND hwnd, UINT frameNumber)
{
    if (!g_pD3D || !g_pDXVAVP)
    {
        return FALSE;
    }

    RECT client;
    GetClientRect(hwnd, &client);

    if (IsRectEmpty(&client))
    {
        return TRUE;
    }

    // Check the current status of the D3D9 device.
    HRESULT hr = TestCooperativeLevel();

    switch (hr)
    {
    case D3D_OK:
        break;

    case D3DERR_DEVICELOST:
        return TRUE;

    case D3DERR_DEVICENOTRESET:
        return FALSE;

    default:
        return FALSE;
    }

    IDirect3DSurface9 *pRT = NULL; // Render target

    DXVAHD_STREAM_DATA stream_data = { 0 };


    // Get the render-target surface.
    hr = g_pD3DDevice->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &pRT);
    if (FAILED(hr))
    {
        goto done;
    }

    // Initialize the stream data structure for the primary video stream.

    stream_data.Enable = TRUE;
    stream_data.OutputIndex = 0;
    stream_data.InputFrameOrField = 0;
    stream_data.pInputSurface = g_pSurface;

    // Perform the blit.
    hr = g_pDXVAVP->VideoProcessBltHD(
        pRT,
        frameNumber,
        1,
        &stream_data
        );

    if (FAILED(hr))
    {
        goto done;
    }

    // Present the frame.
    hr = g_pD3DDevice->Present(NULL, NULL, NULL, NULL);

done:
    SafeRelease(&pRT);
    return SUCCEEDED(hr);
}
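The ProcessVideoFrame example calls a TestCooperativeLevel helper that is not defined in this topic. A minimal sketch, assuming the helper only wraps the Direct3D device method of the same name, might look like this:

// Sketch only: forward the device-status check to the D3D9 device.
HRESULT TestCooperativeLevel()
{
    if (g_pD3DDevice == NULL)
    {
        return E_FAIL;
    }

    return g_pD3DDevice->TestCooperativeLevel();
}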