Optimizing Performance in Qt Applications 11/16/09

Optimizing Performance in Qt-Based Applications


DESCRIPTION

Performance is a key component of usability and crucial for the user experience, especially in today's modern user interfaces where graphical elements are being animated and transitioned. Bringing Qt Everywhere means a significant need for speed across desktop and embedded platforms. This presentation will give you a brief overview of performance improvements done in Qt, and will be highly interactive with hands-on sessions on how to squeeze every last drop of performance out of your Qt application. Presentation by Bjørn Erik Nilsen held during Qt Developer Days 2009. http://qt.nokia.com/whatsnew


Page 1: Optimizing Performance in Qt-Based Applications

Optimizing Performance in Qt Applications

11/16/09

Page 2: Optimizing Performance in Qt-Based Applications

Introduction

• Bjørn Erik Nilsen

– Software Engineer / Qt Widget Team

– The architect behind Alien Widgets

– Rewrote the Backing Store for Qt 4.5

– One of the guys implementing WidgetsOnGraphicsView

– Author of QMdiArea/QMdiSubWindow

– Author of QGraphicsEffect/QGraphicsEffectSource


Page 3: Optimizing Performance in Qt-Based Applications

Agenda

• Why Performance Matters

• Performance Improvements in Qt 4.6

• How You Can Improve Performance


Page 4: Optimizing Performance in Qt-Based Applications

Why Performance Matters

• Attractive to users

• Looks more professional

• Helps you get things done more efficiently

• Keeps the flow


Page 5: Optimizing Performance in Qt-Based Applications

Why Performance Matters

• An example explains more than a thousand words


Page 6: Optimizing Performance in Qt-Based Applications

Why Performance Matters

• Performance is more important than ever before

– Dynamic user interfaces

• Qt Everywhere

– Desktop

– Embedded platforms with limited hardware

• We cannot just buy better hardware anymore

• Clock speed vs. number of cores

Page 7: Optimizing Performance in Qt-Based Applications

Why Performance Matters

• Not all applications can take advantage of multiple cores

• And some will actually run slower:

– Each core in the processor is slower

– Most applications are not programmed to be multi-threaded

• Multi-core crisis?

Page 8: Optimizing Performance in Qt-Based Applications

Agenda

• Why Performance Matters

• Performance Improvements in Qt 4.6

• How You Can Improve Performance


Page 9: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• We continuously strive to optimize the performance

– QWidget painting performance, for example:

– Qt 4.6 is no exception!

Page 10: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

[Diagram of Qt modules: QtCore, QtGui, QtNetwork, QtScript, QtSvg, QtWebKit, QtOpenGL, QtOpenVG]

Page 11: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• Graphics View

– New update mechanism

– New painting algorithm

– New scene indexing

– Reduced QTransform/QVariant/floating point overhead

• QPixmapCache

– Extended with an int-based API

QtGui

Page 12: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• Item Views

– Item selection

– Drag 'n' drop

– QTableView and QHeaderView

• QTransform

– fromTranslate/fromScale

– mapRect for projective transforms

• QRegion

– No longer a GDI object on Windows

QtGui

Page 13: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• QObject

– Destruction

– Connect and disconnect

– Signal emission

• QVariant

– Construction from float and pointers

• QIODevice

– Less (re)allocations in readAll()

QtCore

Page 14: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• QNetworkAccessManager

– HTTP back-end

• QHttpNetworkConnectionChannel

– Pipelining HTTP requests (off by default)

• QHttpNetworkConnection

– Increased the number of concurrent connections

• QLocalSocket

– New Windows implementation

– Major performance improvements

QtNetwork

Page 15: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• QtScript now uses JavaScriptCore as the back-end!

– Still the same API, but with JSC performance

QtScript

Page 16: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• New OpenGL 2.x paint engine

• General improvements

– Clipping

– Text drawing

QtOpenGL

Page 17: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• New OpenVG paint engine

– Uses Khronos EGL API

– Configure Qt with “-openvg”

• Support for hardware-accelerated 2D vector graphics on:

– Embedded, mobile and consumer electronic devices

– Desktop

• More info: http://labs.trolltech.com/blogs

QtOpenVG

New module!

Page 18: Optimizing Performance in Qt-Based Applications

Performance Improvements in Qt 4.6

• Improved support for DirectFB

– Enabling hardware graphics acceleration on embedded platforms

• Maemo Harmattan optimizations

Embedded

Page 19: Optimizing Performance in Qt-Based Applications

Agenda

• Why Performance Matters

• Performance Improvements in Qt 4.6

• How You Can Improve Performance


Page 20: Optimizing Performance in Qt-Based Applications

How You Can Improve Performance

• Theory of Constraints (TOC) by Eliyahu M. Goldratt

• The theory is based on the idea that in any complex system, there is usually one aspect of that system that limits its ability to achieve its goal or optimal functioning. To achieve any significant improvement of the system, the constraint must be identified and resolved.

• Applications will perform as fast as their bottlenecks

Page 21: Optimizing Performance in Qt-Based Applications

Theory of Constraints

• Define a goal:

– For example: This application must run at 30 FPS

• Then:

1) Identify the constraint

2) Decide how to exploit the constraint

3) Improve

4) If goal not reached, go back to 1)

5) Done

Page 22: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• The number one and most important task

• Make sure you have plausible data

• Don't randomly start looking for slow code paths!

– An O(n²) algorithm isn't necessarily bad

– Don't spend time on making it O(n log n) just for fun

• Don't spend time on optimizing bubble sort

Page 23: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that is where the bottleneck is” -- Rob Pike

Page 24: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• The right approach for identifying hot spots:

– Any profiler suitable for your platform

  • Shark (Mac OS X)

  • Valgrind (X11)

  • Visual Studio Profiler (Windows)

  • Embedded Trace Macrocell (ETM) (ARM devices)

• NB! Always profile in release mode

Page 25: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• Run application: “valgrind --tool=callgrind ./application”

• This will collect data and information about the program

• Data saved to file: callgrind.out.<pid>

• Beware:

– I/O costs won't show up

– Cache misses won't show up either unless cache simulation is enabled (--simulate-cache=yes)

• The next step is to analyze the data/profile

• Example

Page 26: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• Profiling a section of code (run with “--instr-atstart=no”):

#include <valgrind/callgrind.h>

int myFunction()
{
    // Start collecting data here instead of at program start
    CALLGRIND_START_INSTRUMENTATION;
    int number = 10;
    ...
    // Stop collecting and write the profile for this section
    CALLGRIND_STOP_INSTRUMENTATION;
    CALLGRIND_DUMP_STATS;

    return number;
}

Page 27: Optimizing Performance in Qt-Based Applications

Identifying hot spots (1)

• When a hot-spot is identified:

– Look at the code and ask yourself: Is this the right algorithm for this task?

• Once the best algorithm is selected, you can exploit the constraint

Page 28: Optimizing Performance in Qt-Based Applications

How to exploit the constraint (2)

• Optimize

– Design level

– Source code level

– Compile level

• Optimization trade-offs:

– Memory consumption, cache misses

– Code clarity and conciseness

Page 29: Optimizing Performance in Qt-Based Applications

How to exploit the constraint (2)

• “Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius – and a lot of courage – to move in the opposite direction.” --Einstein

Page 30: Optimizing Performance in Qt-Based Applications

How to exploit the constraint (2)

• Wouldn't it be great to have a cross-platform tool to measure performance?

Page 31: Optimizing Performance in Qt-Based Applications

QTestLib

• Say hello to QBENCHMARK

• Extension to the QTestLib framework

• Cross-platform

• Straightforward: QBENCHMARK { <code here> }

• Code will then be measured based on

– Walltime (default)

– CPU tick counter (-tickcounter)

– Valgrind/Callgrind (-callgrind)

– Event counter (-eventcounter)
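To make "walltime" concrete, here is a minimal plain C++ sketch of a wall-time measurement. It is not how QTestLib is implemented and it uses no Qt API; `meanWallTimeNs` is a hypothetical helper, and unlike QBENCHMARK the caller has to pick the iteration count.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` `iterations` times and returns the mean
// wall time per iteration in nanoseconds.
template <typename Body>
std::int64_t meanWallTimeNs(Body body, int iterations)
{
    using Clock = std::chrono::steady_clock;   // monotonic, suited for intervals
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        body();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start);
    return elapsed.count() / iterations;
}
```

Averaging over many iterations smooths out timer resolution, which is one reason to measure repeated runs rather than a single call.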

Page 32: Optimizing Performance in Qt-Based Applications

QTestLib

• Let's create a benchmark

• Run with ./mytest -xml -o results.xml

• git clone git://gitorious.org/qt-labs/qtestlib-tools.git

• Visualize with

– Graph (generatereport results.xml)

– BMCompare (bmcompare results1.xml results2.xml)

• Now that we have a tool, it is easier to measure and decide which algorithm to use

Page 33: Optimizing Performance in Qt-Based Applications

How to exploit the constraint (2)

• General tricks:

– Caching

– Delay a computation until the result is required

– Reduce computation in tight loops

– Compiler optimizations

• Optimization Techniques for Qt:

– Choose the right container

– Use implicit data sharing efficiently

– Discover the magic flags
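The first two general tricks, caching and delaying a computation until its result is required, can be combined in one small class. The sketch below is plain C++17 with no Qt dependency; `Samples` and its members are hypothetical names chosen for illustration.

```cpp
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical example class: the sum is computed lazily and then cached.
class Samples
{
public:
    void append(int value)
    {
        m_values.push_back(value);
        m_cachedSum.reset();   // any modification invalidates the cache
    }

    // Delayed computation: nothing is summed until the result is requested.
    long long sum() const
    {
        if (!m_cachedSum) {
            ++m_computations;
            m_cachedSum = std::accumulate(m_values.begin(), m_values.end(), 0LL);
        }
        return *m_cachedSum;
    }

    // How often the sum was actually recomputed (for demonstration only).
    int computations() const { return m_computations; }

private:
    std::vector<int> m_values;
    mutable std::optional<long long> m_cachedSum;
    mutable int m_computations = 0;
};
```

Calling `sum()` repeatedly between modifications costs one computation; the trade-off is the extra memory for the cached value and the need to invalidate it on every write.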

Page 34: Optimizing Performance in Qt-Based Applications

Implicit data sharing in Qt

• Maximize resource usage and minimize copying

[Diagram: Object 0, Object 1, Object 2 and Object 3 all point to the same ObjectData; Objects 1 to 3 are shallow copies]

Object obj0; // Creates ObjectData

// Copies (share the same data)
Object obj1 = obj0, obj2 = obj0, obj3 = obj0;

Page 35: Optimizing Performance in Qt-Based Applications

Implicit data sharing in Qt

• Data is only copied if someone modifies it:

[Diagram: the shallow copies still share one ObjectData; an object that is modified gets its own ObjectData through a deep copy]
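The copy-on-write behaviour described on this slide can be sketched in plain C++. This is a simplified model and not Qt's implementation: `SharedText` is a hypothetical class, and it relies on `std::shared_ptr::use_count()`, which is only adequate for single-threaded illustration (Qt uses its own atomic reference counting).

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class, not a Qt type: a string wrapper with copy-on-write.
class SharedText
{
public:
    explicit SharedText(std::string text)
        : d(std::make_shared<std::string>(std::move(text))) {}

    // Const access never copies.
    const std::string &text() const { return *d; }

    // A modifying function has to detach before it writes.
    void append(const std::string &suffix)
    {
        detach();
        d->append(suffix);
    }

    // True if both objects still point at the same data block.
    bool sharesDataWith(const SharedText &other) const { return d == other.d; }

private:
    void detach()
    {
        if (d.use_count() > 1)                       // data is shared
            d = std::make_shared<std::string>(*d);   // deep copy
    }

    std::shared_ptr<std::string> d;
};
```

Copying a `SharedText` only copies the pointer; the deep copy happens the first time one of the sharing objects is modified, which is why calling non-const functions on a shared object is the thing to watch for.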

Page 36: Optimizing Performance in Qt-Based Applications

Implicit data sharing in Qt

• How to avoid deep-copy:

– Only use const operators and functions if possible

– Be careful with the foreach keyword

• For classes that are not implicitly shared:

– Always pass them around as const references

– Passing const references is a good habit in any case

• Examples

Page 37: Optimizing Performance in Qt-Based Applications

Implicit data sharing in Qt

Original: T *readOnly = list[index];

Optimized: T *readOnly = list.at(index);

Original: QList<T>::iterator i; i = list.begin();

Optimized: QList<T>::const_iterator i; i = list.constBegin();

Original: foreach (QString s, strings)

Optimized: foreach (const QString &s, strings)

Original: void foo(QTransform t);

Optimized: void foo(const QTransform &t);

NB! QTransform is not implicitly shared!

Page 38: Optimizing Performance in Qt-Based Applications

Implicit data sharing in Qt

• See the “Implicitly Shared Classes” documentation for a complete list of implicitly shared classes in Qt

• http://doc.trolltech.com/4.6-snapshot/shared.html

• Note: All Qt containers are implicitly shared

Page 39: Optimizing Performance in Qt-Based Applications

Qt Containers

[Diagram: Qt containers: QList, QLinkedList, QVector, QStack, QQueue, QSet, QMap, QMultiMap, QHash, QMultiHash]

Page 40: Optimizing Performance in Qt-Based Applications

Qt Containers

[Diagram: Associative Containers: QMap, QMultiMap, QHash, QMultiHash]

Page 41: Optimizing Performance in Qt-Based Applications

Qt Containers

[Diagram: Sequential Containers: QList, QLinkedList, QVector, QStack, QQueue; QSet is also shown]

Page 42: Optimizing Performance in Qt-Based Applications

Qt Containers

[Diagram: Sequential Containers (QList, QLinkedList, QVector, QStack, QQueue) with a “vs” label]

Page 43: Optimizing Performance in Qt-Based Applications

QVector<T>

• Items are stored contiguously in memory

• One block of memory is allocated:

[Diagram: QVector<T> holds a pointer d to a QVectorTypedData<T>, one block laid out as: ref (QBasicAtomicInt), alloc (int), size (int), flags (uint), array[0] ... array[alloc - 1] (T)]

Page 44: Optimizing Performance in Qt-Based Applications

QVector<T>

• Reserves space at the end

• Growth strategy depends on the type T

– Movable types: realloc by increments of 4096 bytes

– Non-movable types: 50% increments

• What is a movable type?

– Primitive types: bool, int, char, enums, pointers, …

– Plain Old Data (POD) with no constructor/destructor

– Basically everything that can be moved around in memory using memcpy() or memmove()

– Good article: http://www.ddj.com/cpp/184401508

Page 45: Optimizing Performance in Qt-Based Applications

Movable types

• User-defined classes are treated as non-movable by default

• Oh no!

• Have no fear, Q_DECLARE_TYPEINFO is here

• You can tell Qt that your class is a:

– Q_PRIMITIVE_TYPE: POD with no constr./destr.

– Q_MOVABLE_TYPE: has constr./destr., but can be moved in memory using memcpy()/memmove()

Page 46: Optimizing Performance in Qt-Based Applications

Movable types (Q_PRIMITIVE_TYPE)

struct Point2D
{
    int x;
    int y;
};

Q_DECLARE_TYPEINFO(Point2D, Q_PRIMITIVE_TYPE);

Page 47: Optimizing Performance in Qt-Based Applications

Movable types (Q_MOVABLE_TYPE)

class Point2D
{
public:
    Point2D() { data = new int[2]; }
    Point2D(const Point2D &other) { … }
    ~Point2D() { delete [] data; }

    Point2D &operator=(const Point2D &other) { … }

    int x() const { return data[0]; }
    int y() const { return data[1]; }

private:
    int *data;
};

Q_DECLARE_TYPEINFO(Point2D, Q_MOVABLE_TYPE);

Page 48: Optimizing Performance in Qt-Based Applications

QVector<T>

• Insertion in the middle:

– Movable type: memmove()

– Non-movable type: operator=()

[Diagram: inserting an item at index 1 of an array holding items 0 to 6; the items from index 1 onward are shifted one slot towards the end]
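The memmove() path for movable types can be sketched without Qt. `insertAt` below is a hypothetical helper, not a Qt function, and it assumes the caller has already made sure the array has room for one more item. `std::is_trivially_copyable` is used here as the plain C++ check that byte-wise moving is safe; it is stricter than what Q_MOVABLE_TYPE promises.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Point2D
{
    int x;
    int y;
};

static_assert(std::is_trivially_copyable<Point2D>::value,
              "Point2D must be safe to move with memmove()");

// Hypothetical helper: inserts `value` at `index` into an array that holds
// `count` items and has capacity for at least count + 1.
void insertAt(Point2D *items, std::size_t count, std::size_t index, Point2D value)
{
    // One block move shifts the whole tail; no per-item assignment is needed.
    std::memmove(items + index + 1, items + index,
                 (count - index) * sizeof(Point2D));
    items[index] = value;
}
```

For a non-movable type the tail would have to be shifted item by item with operator=(), which is the cost difference the slide points at.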

Page 49: Optimizing Performance in Qt-Based Applications

QList<T>

• Two representations

• Array of pointers to items on the heap (general case)

[Diagram: QList<T> holds a pointer d to a QListData::Data laid out as: ref (QBasicAtomicInt), alloc (int), begin (int), end (int), flags (uint), array[0] ... array[alloc - 1] (void *); each array entry points to a T on the heap]

Page 50: Optimizing Performance in Qt-Based Applications

QList<T>

• Special case: T is movable and sizeof(T) <= sizeof(void *)

• Items are stored directly (same as QVector)

[Diagram: QList<T> holds a pointer d to a QListData::Data laid out as: ref (QBasicAtomicInt), alloc (int), begin (int), end (int), flags (uint), array[0] ... array[alloc - 1]; each array entry holds a T directly]

Page 51: Optimizing Performance in Qt-Based Applications

QList<T>

• Reserves space at the beginning and at the end

• Benefits of reserving space at the beginning

– Prepending an item usually takes constant time

– Removing the first item usually takes constant time

– Faster insertion

Page 52: Optimizing Performance in Qt-Based Applications

QVector<T> vs. QList<T>

• QList expands to less code in the executable

• For most purposes, QList is the right class to use

• If all you do is append(), use QVector

– Use reserve() if you know the size in advance

– Also consider QVarLengthArray or plain C array

• When T is movable and sizeof(T) <= sizeof(void *)

– Almost no difference, except that QList provides faster insertions/removals in the first half of the list

• (Constant time insertions in the middle: Use QLinkedList)

Page 53: Optimizing Performance in Qt-Based Applications

General Qt Container Advice

• Avoid deep copies, e.g.:

– Use at() rather than operator[]

– constData()/constBegin()/constEnd()

– Basically: limit usage of non-const functions

• When you know the size in advance:

– Use reserve()

• Let Qt know whether your class is movable or not

– Q_DECLARE_TYPEINFO

• Choose the right container for the right circumstance
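The effect of reserve() can be observed with std::vector, used here as a stand-in for QVector. `countReallocations` is a hypothetical helper that detects reallocations by watching the capacity change; the exact count without reserve() depends on the standard library's growth strategy.

```cpp
#include <cstddef>
#include <vector>

// Appends `count` integers and returns how many times the capacity changed,
// i.e. how many reallocations the appends triggered.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<int> values;
    if (reserveFirst)
        values.reserve(count);   // one allocation up front

    std::size_t reallocations = 0;
    std::size_t lastCapacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }
    return reallocations;
}
```

With `reserveFirst` set, the loop never reallocates; without it, the vector grows several times and copies or moves its items on each growth.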

Page 54: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

• Prefer QPixmap over QImage (if possible)

– QPixmap is accelerated

– QPixmap caches information about the pixels

• Avoid QPixmap/QImage::setAlphaChannel()

– Use QPainter::setCompositionMode instead

• Avoid QPixmap/QImage::transformed()

– Use QPainter::setWorldTransform instead

• If you know for sure that the image has alpha:

– Qt::NoOpaqueDetection (QPixmap::fromImage)

Page 55: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

Original (NB! Image is 32 bit):

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        QRgb pixel = image.pixel(x, y);
        …
    }
}

Page 56: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

Optimized:

int width = image.width();
int height = image.height();

for (int y = 0; y < height; ++y) {
    QRgb *line = reinterpret_cast<QRgb *>(image.scanLine(y));
    for (int x = 0; x < width; ++x) {
        QRgb pixel = line[x];
        …
    }
}

Page 57: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

Even more optimized:

// Valid because each scanline of a 32-bit image is exactly width * 4 bytes
int numPixels = image.width() * image.height();
QRgb *pixels = reinterpret_cast<QRgb *>(image.bits());

for (int i = 0; i < numPixels; ++i) {
    QRgb pixel = pixels[i];
    …
}

Page 58: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

Original:

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    ...
}

Optimized:

void MyWidget::paintEvent(...)
{
    QPainter painter(this);
    painter.fillRect(rect(), Qt::red);
}

int main(int argc, char **argv)
{
    ...
    MyWidget widget;
    // The widget paints all its pixels itself, so Qt can skip the background
    widget.setAttribute(Qt::WA_OpaquePaintEvent);
    ...
}

Page 59: Optimizing Performance in Qt-Based Applications

General Painting Optimizations

Original:

painter.drawLine(line1);
painter.drawLine(line2);
painter.drawLine(line3);

Optimized:

QLine lines[3];
...
painter.drawLines(lines, 3);

Original:

painter.drawPoint(point1);
painter.drawPoint(point2);
painter.drawPoint(point3);

Optimized:

QPoint points[3];
...
painter.drawPoints(points, 3);

Original:

QString key("abcd");
QPixmapCache::insert(key, pm);
QPixmapCache::find(key, pm);

Optimized:

QPixmapCache::Key key;
key = QPixmapCache::insert(pm);
QPixmapCache::find(key, &pm);

Page 60: Optimizing Performance in Qt-Based Applications

Other Optimizations

Original:

const QString s = s1 + s2 + s3;

Optimized:

#include <QStringBuilder>
...
const QString s = s1 % s2 % s3;

#define QT_USE_FAST_CONCATENATION
#define QT_USE_FAST_OPERATOR_PLUS

Original:

QTransform xform = a.inverted();
xform *= b.inverted();

Optimized:

QTransform xform = b;
xform *= a;
xform = xform.inverted();

Original:

foreach (const QString &s, slist) {
    if (s.size() < 5)
        continue;
    const QString m = s.mid(2, 3);
    if (m == magicString)
        doMagicStuff();
}

Optimized:

foreach (const QString &s, slist) {
    if (s.size() < 5)
        continue;
    QStringRef m(&s, 2, 3);
    if (m == magicString)
        doMagicStuff();
}
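The QStringRef idea, comparing a slice of a string without allocating a temporary copy, has a close analogue in standard C++17. The sketch below uses `std::string_view` in place of QStringRef; `countMagic` is a hypothetical function that mirrors the loop on this slide.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Counts strings whose three characters starting at index 2 equal `magic`,
// without allocating a temporary substring for each comparison.
int countMagic(const std::vector<std::string> &strings, std::string_view magic)
{
    int matches = 0;
    for (const std::string &s : strings) {
        if (s.size() < 5)
            continue;
        // A view into s: no copy, unlike s.substr(2, 3).
        const std::string_view middle = std::string_view(s).substr(2, 3);
        if (middle == magic)
            ++matches;
    }
    return matches;
}
```

As with QStringRef, the view is only valid while the string it refers to is alive and unmodified.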

Page 61: Optimizing Performance in Qt-Based Applications

Other Optimizations

Original: qFuzzyCompare(opacity + 1, 1)

Optimized: qFuzzyIsNull(opacity)

Original: int nRects = qregion.rects().size();

Optimized: int nRects = qregion.numRects();

Original: if (expensive() && cheap())

Optimized: if (cheap() && expensive())

Original:

#button1 { background:red }
#button2 { background:red }

Optimized:

#button1,#button2 { background:red }

Original:

*[readOnly="1"] { color:blue }

Optimized:

/* Only QLineEdit can possibly be read-only in my application */
QLineEdit[readOnly="1"] { color:blue }
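The ordering of `cheap()` and `expensive()` matters because `&&` evaluates left to right and stops at the first false operand. The sketch below makes that visible with call counters; `Checks` and its predicates are hypothetical stand-ins.

```cpp
// Hypothetical stand-ins for a cheap and an expensive predicate.
struct Checks
{
    int cheapCalls = 0;
    int expensiveCalls = 0;

    bool cheap(int value) { ++cheapCalls; return value > 0; }
    bool expensive(int value) { ++expensiveCalls; return value % 2 == 0; }

    // && evaluates left to right and stops at the first false operand.
    bool cheapFirst(int value) { return cheap(value) && expensive(value); }
    bool expensiveFirst(int value) { return expensive(value) && cheap(value); }
};
```

When the cheap test fails, `cheapFirst` never calls the expensive one, whereas `expensiveFirst` always pays for it. The reordering is only valid when neither predicate has side effects the other depends on.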

Page 62: Optimizing Performance in Qt-Based Applications

Graphics View Optimizations

• Viewport update modes

• Scene index

– BSP tree index

– No index

• Avoid QGraphicsScene::changed signal

• QGraphicsScene::setSceneRect

• Cache modes

– Device coordinates

– Item coordinates

• OpenGL viewport

Page 63: Optimizing Performance in Qt-Based Applications

Platform Specific Optimizations

• Link time optimization LTCG (Windows only)

– Approx. 10%-15% speedup

– Configure Qt with “-ltcg”

• Don't use explicit double arithmetic

– qreal is float on embedded (QWS)

– 100 / 2.54 → 100 / qreal(2.54)

• It's time to take advantage of what we have learned

• Let's do some real optimizations!
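Returning to the qreal advice on this slide: a bare literal such as 2.54 has type double, so it silently promotes the whole expression to double arithmetic. The sketch below checks this at compile time in plain C++; `Real` is a hypothetical alias standing in for qreal on a build where qreal is float, and `inchesFromCentimeters` is an illustrative function.

```cpp
#include <type_traits>

// Assumption for this sketch: a float alias standing in for qreal on
// embedded builds.
using Real = float;

// 2.54 is a double literal, so the whole division is done in double.
static_assert(std::is_same<decltype(100 / 2.54), double>::value,
              "a plain literal promotes the expression to double");

// Converting the literal first keeps the arithmetic in single precision.
static_assert(std::is_same<decltype(100 / Real(2.54)), float>::value,
              "Real(2.54) keeps the expression in float");

Real inchesFromCentimeters(Real centimeters)
{
    return centimeters / Real(2.54);
}
```

On hardware without fast double support, keeping such expressions in float avoids software-emulated or slower double operations.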

Page 64: Optimizing Performance in Qt-Based Applications
Page 65: Optimizing Performance in Qt-Based Applications

Theory of Constraints

• Define a goal:

– For example: This application must run at 30 FPS

• Then:

1) Identify the constraint

2) Decide how to exploit the constraint

3) Improve

4) If goal not reached, go back to 1)

5) Done

Page 66: Optimizing Performance in Qt-Based Applications

Agenda

• Why Performance Matters

• Performance Improvements in Qt 4.6

• How You Can Improve Performance


Page 67: Optimizing Performance in Qt-Based Applications

Questions?