SMAA: Enhanced Subpixel Morphological Anti-Aliasing
Jorge Jimenez¹, Jose I. Echevarria¹, Tiago Sousa², Diego Gutierrez¹
¹ Universidad de Zaragoza  ² Crytek





Hi,

Thanks for the presentation!

My talk is about real-time postprocessing antialiasing.

For more than a decade, supersampling and multisampling have been the gold-standard solutions in real-time applications and video games.

In the last year, a myriad of techniques have appeared that exploit the morphological appearance of aliasing to smartly blur the image, which is much faster but still doesn't have the quality of the previous techniques.

In this talk I will present SMAA, or Subpixel Morphological Antialiasing, and show how to leverage the advantages of classical supersampling solutions in an improved version of morphological antialiasing.

The Team

Jorge Jimenez (Universidad de Zaragoza)
Jose I. Echevarria (Universidad de Zaragoza)
Tiago Sousa (Crytek)
Diego Gutierrez (Universidad de Zaragoza)

But, first of all, here are all the team members who made this technique possible.

Practical Morphological Anti-Aliasing (Jimenez's MLAA)
In GPU Pro 2: Advanced Rendering Techniques

This work is the evolution of Practical Morphological Antialiasing, also known as Jimenez's MLAA, which enabled the CPU-based MLAA algorithm developed by Intel to run efficiently on a GPU.

I'd like to give a quick review of the core ideas behind this technique before continuing with the new one.

And to avoid any confusion, while I'm talking about our older MLAA approach, this icon


will be on the top right corner of the screen.

Key Ideas
- Translate MLAA to use simple textures
- Use pre-computed textures:
  - Avoid dynamic branching
  - Avoid calculating areas on the fly
- Leverage bilinear filtering to the limit
- Share calculations between pixels (pixels share edges!)
- Mask operations by using the stencil buffer

So, let's begin with the key, high-level ideas of our technique. And for this, let's forget about this boring slide and begin with what the original CPU-based approach does, and how we replace each component with a more GPU-friendly form.

MLAA High-Level Algorithm

So, say we want to antialias this image.

For this we have to figure out the blue line, which represents the revectorization of this pixel pattern.

Using this revectorization, we fill the areas under the line with the opposite color at each pixel. So, on the left we fill with black, and on the right, we fill with white.

This is then translated to gray levels, which approximate the real shape.

But, ok, let's rewind. The first step is detecting where the edges are, which are the lines marked in green.

And now, we have to search for the line ends to the left and to the right, and obtain the crossing edges at each side of the line.

With the distances and crossing edges at hand, we have enough information to calculate the areas under the revectorized line.
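To make the area step concrete, here is a minimal Python sketch, not the paper's exact formulation, of the coverage for one common pattern: a "Z"-shaped step where the revectorized line falls from +0.5 at the left line end to -0.5 at the right one. `d_left` and `d_right` are the distances found by the searches; the sign of the result indicates whether to blend with the neighbor above or below.

```python
def revectorized_area(d_left, d_right):
    """Signed coverage of the current pixel under a revectorized line
    falling from +0.5 (left line end) to -0.5 (right line end).
    Positive: blend toward the opposite pixel above; negative: below."""
    length = d_left + d_right + 1          # total line length in pixels
    # Average height of the line over this pixel's span [d_left, d_left + 1].
    return 0.5 - (d_left + 0.5) / length

# For a 2-pixel step, the pixel nearer the left end gets a 0.25 blend:
print(revectorized_area(0, 1))   # 0.25
print(revectorized_area(1, 0))   # -0.25
```

The real algorithm distinguishes 16 such patterns depending on which crossing edges are active; this sketch only covers the "Z" case.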

Easy, isn't it?

Problems
- Searching for line ends is slow
- Fetching crossing edges is costly
- Revectorization is branchy: 2^4 = 16 cases!
- Area calculation is not cheap
- Up to 4 lines per pixel!

But the beauty doesn't come without problems:

Fetching edges for the searches and for the crossing edges is slow, as it requires lots of memory bandwidth.

And even with the crossing edges at hand, the revectorization is not trivial, given the high number of possible patterns.

Furthermore, we have to repeat these calculations up to four times per pixel, one per boundary, which introduces a huge performance penalty.

So, how can we solve these problems without reducing the quality of MLAA?

Solutions
- Searching for line ends is slow; fetching crossing edges is costly
- Solution: introduce bilinear filtering to postprocessing antialiasing. This allows fetching multiple values in a single access!

To improve the searches for line ends and the fetching of crossing edges, we introduce bilinear filtering to accelerate postprocessing antialiasing. This allows fetching multiple values in a single access.

Solutions
- Calculating the revectorization is neither easy nor fast; accurate area calculation is not cheap
- Solution: avoid branchy code by using a precomputed texture

(distances, crossing edges) → area

Then, to ease the revectorization and area calculation, we built a texture which takes as input the distances and crossing edges, and outputs the area under the line.

This transforms the whole branchy code to discern between the sixteen cases, plus the area calculation, into a single texture access.

Solutions
- Up to 4 lines can pass through a pixel
- Solution: stupid observation; pixels share edges, don't repeat calculations!

While it's true that four lines can pass through a pixel, they are shared with the neighbors. So, instead of searching for the four lines, we search just for the top and left lines. Then, we calculate the corresponding areas and store them in a temporary buffer. This allows sharing this information with the neighbors, at the cost of introducing another pass.

New Problem!
- We now require three full-screen passes
- Solution: use the stencil buffer!

But now we have a new problem: we require three full-screen passes. However, the solution is easy:

[click]

Mask the pixels that need processing in the first pass using the stencil buffer.

Workflow

Original Image → Edges Texture → Blending Weights Texture → Antialiased Image

Ok, so we have finished with the high-level idea. Now, let's dive into the details of our implementation. Here you have the big picture of our technique, which consists of three passes.

In the first pass, we perform edge detection, obtaining the edges texture.

In the second pass, we process each edge, calculating the revectorizations and obtaining the corresponding areas.

And in the third and final pass, we blend each pixel with its four-neighborhood, using the areas from the second pass.

I'm going to skip the edge detection step and go directly to the second one, which has a more interesting implementation.

Blending Weights Calculation (2nd Pass)
It consists of three parts:

- Searching for the distances d_left and d_right to the ends of the current line
- Fetching the crossing edges e_left and e_right
- Calculating the coverage a of this pixel, using d and e

So, in this second pass we want to calculate the areas under the revectorized line. We have to search for the ends of the line, fetch the crossing edges, and then use this information to calculate the coverage area of this pixel.

Searching for Distances
Done by exploiting bilinear filtering.

Searching for line ends is a memory-intensive task. So, to improve bandwidth usage, we leverage the fact that bilinear filtering is free on most platforms.

In this image you have the edge marked in blue, with the colors of the dots representing the values of the edge buffer. So, starting from here [click], we are going to jump two pixels at a time, sampling just between them.

This rhombus represents the first fetch. Bilinear filtering returns one, so there is an edge in both pixels.

The same here.

But in this fetch we obtain 0.5, which means that one of the edges is not active, and thus the search has finished.

By using bilinear filtering in this way, we are able to accelerate searches by a factor of two.

Once we have the distances to the ends of the line, we use them to obtain the crossing edges. A naïve approach for fetching them would imply querying the four edges.

Fetching Crossing Edges
Again, done by exploiting bilinear filtering.

Instead, a more efficient approach is to use bilinear filtering to fetch two edges at a time, similar to the distance search.

But now there is a little problem. When the returned value is 0.5, are we dealing with this case [click], or with this other one [click]?

Solution: offset the coordinates!

The solution is to offset the query by 0.25, allowing us to distinguish them, as the returned value is different in each case: 0.0, 0.25, 0.75 or 1.0.

Calculating the Coverage
We use a pre-computed texture to avoid:

- Dynamic branching
- Expensive area calculations

A key contribution in the coverage calculation is that, instead of handling sixteen different patterns in the shader, we precompute them in a 4D texture [click], which avoids branching code.

Neighborhood Blending (3rd Pass)
We blend with the neighborhood using the areas calculated in the previous pass.

c_new = (1 - a) · c_old + a · c_opp

Finally, we want to blend pixels on edges using the areas calculated in the previous pass. For example, for the case of the pixel c_old, we got the area a from the edge on the bottom. The blending required [click] is similar to 1D bilinear filtering.

So, we again leverage bilinear filtering, using it to implement the blending equation.
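The equivalence with 1D bilinear filtering can be checked with a toy sketch: placing a single sample a fraction `a` of the way toward the opposite pixel makes the filtering hardware compute the blend for free.

```python
def bilinear_1d(tex, x):
    """Sample a 1D texture with linear filtering; x = 0 is the center
    of texel 0, x = 1 the center of texel 1."""
    i, f = int(x), x - int(x)
    return (1.0 - f) * tex[i] + f * tex[i + 1]

c_old, c_opp, a = 0.2, 0.8, 0.25
manual = (1.0 - a) * c_old + a * c_opp       # c_new = (1 - a) c_old + a c_opp
hardware = bilinear_1d([c_old, c_opp], a)    # one offset bilinear fetch
print(manual == hardware)                    # the two are identical
```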

Practical Morphological Anti-Aliasing (Jimenez's MLAA), in GPU Pro 2: Advanced Rendering Techniques
Filtering Approaches for Real-Time Anti-Aliasing, SIGGRAPH 2011 Course

Ok, so, up to this point we have covered our previous morphological approach. For more details, I refer you to the GPU Pro 2 book or the SIGGRAPH 2011 course.

Notice the icon on the top right corner is now gone,

which means38SMAAEnhanced Subpixel Morphological Antialiasing

that Im going to dive into the details of what we called

Subpixel Morphological Antialiasing.39

MLAA Pipeline

I'll quickly review the original pipeline, and then explain how we improved each of these pieces.

First we have the input image. Second, the predefined patterns of the original algorithm. Third, we have the detected edges. Then, the calculated coverage areas. And the final blended image.

Revamped Pipeline

Our algorithm improves the whole pipeline, extending (b) and (c) for sharp geometric features and diagonals handling. Local contrast adaptation removes spurious edges in (d). Extended pattern detection and accurate searches improve accuracy in (e). And finally, we handle additional samples in (f) for accurate subpixel features and temporal supersampling.

Local Contrast Adaptation
Take neighborhood lumas into account.

The first feature we introduced is what we call local contrast adaptation.

A big drawback of MLAA approaches is that they usually consider edges binary: they are either on or off. Sometimes this can be an issue, because the actual strength of an edge is important.

For example, in this image we can see there are gradients in the silhouette of this object, which confuse MLAA's search scheme, as it will see crossing edges that make it stop earlier than desired.

However, perceptually we ignore these crossing edges, because the contrast with the white background is much, much higher. So, what we did is mimic our visual system and ignore edges which have low contrast with respect to their neighbors.

Notice how, in this case, the dominant edges in green should mask the edges in red. Not taking the local contrast into account obviously leads to artifacts.

To solve them, we check if the edge contrast is higher than the maximum edge contrast in the neighborhood.
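As a sketch of that check (the threshold factor here is illustrative, not the shipped constant), an edge survives only if its contrast delta is not dominated by a much stronger edge nearby:

```python
def adapt_edge(delta, neighbor_deltas, factor=2.0):
    """Keep an edge only if its luma delta is at least half as strong
    as the strongest neighboring edge (factor=2.0 is illustrative)."""
    return factor * delta >= max(neighbor_deltas, default=0.0)

print(adapt_edge(0.30, [0.20, 0.25]))  # True: locally dominant edge, kept
print(adapt_edge(0.05, [0.90, 0.10]))  # False: masked by a stronger edge
```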

For example, for the left edge [click], in orange, we compare it with the neighbors marked in blue.

Sharp Geometric Features
Improves pattern handling: sharp geometric features detection (good for text!)

We also extended the patterns handled by the algorithm. The first new pattern handled is sharp geometric features, which allows us to avoid the general roundness introduced by MLAA, and also enables better text handling.

For this image, MLAA would reconstruct the blue line, which rounds the left corner. But what we really want is to preserve the corner as-is.

Unfortunately, sampling one-pixel-sized crossing edges doesn't allow us to discern between corners [click] and the staircase steps of a line [click].

The good news is that the solution is very simple: just sample longer crossing edges, and modify the resulting areas in the case of corners.

Diagonals
Improves pattern handling.

The second new pattern recognized is diagonals. Notice the absence of jaggies in the SMAA S2x case, which more closely resembles the SSAA 16x supersampled reference.

For this, we perform an additional diagonal search, and apply different areas if a diagonal pattern is found. Notice how our approach produces a perfectly straight line in the case of diagonal lines.

The result of this search is, as in the orthogonal-line case, the distance to the ends of the line and the crossing edges, which are used to look up a texture similar to the orthogonal-line case.
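A minimal sketch of the extra diagonal walk, with a hypothetical `edge_at(x, y)` standing in for the edge-texture fetch (a real implementation would again pack fetches into bilinear accesses):

```python
def diagonal_distance(edge_at, x, y, max_steps=16):
    """Step along the down-right diagonal while edges stay active,
    returning the run length; edge_at(x, y) -> True/False models
    the edge-texture lookup."""
    d = 0
    while d < max_steps and edge_at(x + d + 1, y + d + 1):
        d += 1
    return d

active = {(1, 1), (2, 2), (3, 3)}   # a 3-pixel diagonal edge run
print(diagonal_distance(lambda x, y: (x, y) in active, 0, 0))  # 3
```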

Accurate Searches

Our simplified search scheme, while very fast, sometimes introduced artifacts in the form of dithering. This is due to the fact that, as we sample in the middle between pixels [click], the last step is ambiguous.

Furthermore, a crossing edge in the middle of a line won't stop the searches, which is not the desirable behavior.

So, again, this happens because in our original approach we cannot distinguish the individual values of each edge.

To solve this issue, we offset the sample position in such a way that we can fetch two edges and two crossing edges with a single texture access. I refer you to the paper for the details about how we decode the single bilinear-filtered value into the four original values.
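The decoding principle can be illustrated in Python: a bilinear sample placed 25% of the way between two texels returns 0.75·e0 + 0.25·e1, and each of the four possible values maps back to exactly one pair of edges. (The paper's actual scheme recovers four values per fetch; this sketch shows the two-value case.)

```python
def fetch_offset(e0, e1):
    """Bilinear sample taken 25% of the way from texel 0 to texel 1."""
    return 0.75 * e0 + 0.25 * e1

def decode(v):
    """Recover both binary edge values from the single filtered value
    (v is one of 0.0, 0.25, 0.75, 1.0)."""
    return (1 if v >= 0.75 else 0, 1 if v in (0.25, 1.0) else 0)

# Every edge pair survives the round trip through one filtered value:
for e0 in (0, 1):
    for e1 in (0, 1):
        assert decode(fetch_offset(e0, e1)) == (e0, e1)
print("all four edge pairs decode uniquely")
```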

Subpixel Rendering
- MLAA: great gradients!
- MSAA: great subpixel features!
- Temporal AA: good subpixel features!

And now to probably the most important part of this work: subpixel rendering. Some months ago we were thinking: morphological antialiasing is good [click], temporal antialiasing is also a good thing [click], and multisampling is very good [click]. So, why not combine them into a single technique?

But, well, it's not as easy as it sounds.

If you try to apply morphological antialiasing after resolving (MLAA on an MSAA 2x post-resolve image, against an MLAA reference), it won't be able to generate proper gradients, given the edges are now smoother and harder to detect.

And if you try to apply it before resolving (MLAA on the MSAA 8x samples, pre-resolve), it's an improvement over plain MLAA, but it is still not that good, as there is too much blur.

By offsetting the revectorizations to match the subpixel positions, we managed to obtain much better results. Notice how our technique, on the left, allows for a more accurate area calculation when compared with a naïve application of MLAA on each subsample, on the right.

Subpixel Rendering
- Better fallbacks: everything is processed
- Zones with low contrast are antialiased by multi/supersampling

Being composed of three components, when one of them fails, the other two serve as a fallback. In this example, we can see how low-contrast areas, ignored by MLAA, are handled by our multisampling component.

This movie shows the subpixel rendering capabilities of SMAA in a synthetic environment. Notice the smoother temporal results when compared with pure postprocessing techniques like MLAA or FXAA. Note the smoother subpixel motion, with fewer wave-like distortions. And finally, see how we better handle shading aliasing.

Temporal Supersampling

One of the key components of our technique is temporal supersampling. A naïve approach is to simply average the current frame with the previous one. But unfortunately, this results in very noticeable ghosting.

By using classical temporal reprojection, we managed to remove the blurriness to a certain degree.

But it was not until we applied a novel velocity weighting that we managed to almost completely preserve the sharpness of the image.

Velocity Weighting

The idea behind velocity weighting is to blend the current frame c_c with the previous frame c_p using a variable blend factor which depends on the velocity difference between the current and previous frames.
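A scalar sketch of the idea, with an illustrative falloff constant `k` (the paper derives its own weighting; this only shows its shape): when velocities match, the result is a plain 50/50 temporal average, and as the velocity difference grows, the previous frame's contribution fades away to avoid ghosting.

```python
def velocity_weighted_blend(c_current, c_previous,
                            v_current, v_previous, k=4.0):
    """Blend current and previous frames with a weight that shrinks
    as the per-pixel velocity difference grows (k is illustrative)."""
    delta = abs(v_current - v_previous)
    w = 0.5 * max(0.0, 1.0 - k * delta)   # w = 0.5 means a full average
    return (1.0 - w) * c_current + w * c_previous

print(velocity_weighted_blend(0.0, 1.0, 0.0, 0.0))  # 0.5: static pixel, averaged
print(velocity_weighted_blend(0.0, 1.0, 0.0, 2.0))  # 0.0: history rejected
```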

In this movie, we can see how a naïve approach results in ghosting. Adding temporal reprojection improves the situation. And see how velocity weighting manages to better preserve the sharpness of the image.

So, these have been the improvements we introduced. And we encourage you to download the paper for exhaustive details, including performance and feature comparisons with a selection of recent postprocessing antialiasing techniques.

What's Under the Hood
- The fast and accurate distance searches
- The local contrast awareness
- Specific pattern tuning
- Calculating the four possible lines that can cross a pixel, and smartly averaging them

We believe the key component of our technique is the heuristics we use for determining the best pattern revectorization for a pixel, and for sticking with it over time.

SMAA: Subpixel Morphological Antialiasing
Modular:
- SMAA 1x: improved pattern handling
- SMAA T2x: 1x + temporal supersampling
- SMAA S2x: 1x + spatial multisampling
- SMAA 4x: 1x + 2x temporal supersampling + 2x spatial multisampling

Performance of SMAA 4x (on a GTX 580): 1.06 [email protected] ms@720p, both not taking into account the 2x render overhead.

The technique is modular, so you can turn features on selectively, easily adapting it to the available budget. And in fact, most of the improvements just add a few lines to our previous MLAA shader.

And finally, the highest-quality profile runs in 1 millisecond for a 720p buffer.

Visit our project page: http://www.iryoku.com/smaa
Github page: https://github.com/iryoku/smaa/

Thanks to: Stephen Hill, Jean-Francois St-Amour, Johan Andersson, Alex Fry, Naty Hoffman, Carsten Wenzel, Nick Kasyan, Pierre-Yves Donzallaz, Michael Kopietz.

So, thank you for your attention, and do not forget to visit us :-D