
Optimizing for the Modern Web on the ARM Architecture

1

Optimizing for the Modern Web

on the ARM Architecture

Evens Pan

Strategic Software Alliance

ARM

2

Question…

What is the most popular programming language?

Source: American art and American art collections;

essays on artistic subjects by the best art writers.

Volume 2. Boston, E.W. Walker & Co. 1889

Author Walter Montgomery

3

The 3 pillars of the modern Web

4 Billion Web Pages in the World today

Or is that 4 Billion Web Apps?

Creative Commons Attribution-Share Alike 3.0

Unported license. Author: Matthias Kabel

4

From Browser to OS Platform

Web Apps are now:

Offline first

Out of browser

Rich and immersive

>100,000 lines of JavaScript

Providing access to device peripherals

5

What Web App performance really means

Benchmarking is irrelevant!*

Neither Google nor Mozilla cares that much

What matters is end-user experience

Dropped frames are the currency of performance

https://wiki.mozilla.org/Project_Eideticker

http://jankfree.org/

* This is a systems integration perspective that assumes you have already done your best with the components
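To make "dropped frames" concrete, here is a minimal Python sketch of how they might be counted from a list of frame presentation timestamps. The function name `count_dropped_frames`, the 60 Hz refresh rate, and the sample timestamps are illustrative assumptions; this is not how Eideticker or any browser tool is implemented.

```python
def count_dropped_frames(timestamps_ms, refresh_hz=60.0):
    """Count display refreshes that passed without a new frame.

    timestamps_ms: times (in ms) at which frames were actually presented.
    refresh_hz: assumed display refresh rate (hypothetical default of 60 Hz).
    """
    budget_ms = 1000.0 / refresh_hz  # time available per frame
    dropped = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        # How many refresh intervals elapsed between two presented frames?
        intervals = round((cur - prev) / budget_ms)
        # One interval is the ideal; every extra one is a dropped frame.
        dropped += max(0, intervals - 1)
    return dropped


# Made-up trace: the gap between 33.3 ms and 66.7 ms spans two intervals.
sample = [0.0, 16.7, 33.3, 66.7, 83.3]
print(count_dropped_frames(sample))  # prints 1
```

A count like this tracks what the user actually sees, which is the slide's point: a faster benchmark score that still misses the per-frame budget does not improve the experience.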

6

Building the new Smartphone OSes

云OS

7

JavaScript: The Assembly Language for the Web

Java → Google GWT Compiler → JavaScript (Google Web Toolkit)

C/C++ → LLVM → JavaScript (Mozilla Emscripten)

8

JavaScript Improvements

Single Page Apps (SPA) exceed 100,000 lines of JavaScript

Google Web Apps spend 50-70% of their time in the V8 JavaScript engine

ARM working on Google V8 JavaScript Engine for 3 years

2010: Cortex-A9 was 35% slower than Atom on V8 benchmark*

2013: Cortex-A9 is 25% faster than Atom on V8 benchmark*

2012-2013: JavaScript performance improved 24% on desktop and 57% on mobile+

Cortex-A15 optimizations to V8 made this possible

ARM team now has commit rights to the V8 JavaScript codebase

* Clock-for-clock

+ Google I/O 2013

9

Profiling JavaScript HTML5 Execution

ARM created an extension for Mozilla and WebKit browsers

Developers can see hotspot analysis while specific JavaScript is executing

You can zero in on key areas to optimize in your browser engine for Web Apps

You can find bottlenecks in specific Web Apps

10

Firefox Mobile OS: A True Web-based Platform

Firefox Mobile OS uses Android Kernel

ARM has integrated Streamline

Full profiling of Firefox Mobile OS from Web Apps to Kernel

12/17/2013 10

Platform stack (top to bottom):

User Interface & Apps

Standard APIs (JavaScript): Settings, Bluetooth, NFC, Telephony, Contacts, Camera, SMS, Location, Audio

Mozilla Gecko Web Engine

OS Kernel (e.g., Android Linux, etc.): Android Kernel & Device Driver Framework

Device Hardware

Improved performance from 5 fps to 25 fps

11

Portable Native Client (PNaCl)

Compiles C/C++ code to LLVM bitcode

Some restrictions on constructs

Bitcode is translated to native code on the device

Runs in browser sandbox on device

Better than 80% of native performance

PNaCl will hit the stable channel with Chrome 31 in a few weeks

LLVM for Native Web Apps

[Diagram: C/C++ Code → PNaCl Cross Compiler (NaCl SDK) → pexe portable executable (LLVM bitcode) → Internet → HTML5 Browser → LLVM Backend Translator → ARMv7 CPU (VFP, NEON)]

12

Optimizing WebRTC via VP9

Already supported by 1 Billion Browsers worldwide

DTLS-encrypted connection by default

Video, Voice and arbitrary reliable/unreliable low latency data

Peer-to-Peer as well as Peer-to-Server

Google Hangouts will use WebRTC in the future

WebRTC uses the VP9 video codec in the Chrome Browser

Linaro has optimized the VP9 decoder using NEON technology

Improved performance in some paths by up to 20%

http://www.webmproject.org/code/contribute

“WebRTC is a new front in the long war for an open and unencumbered Web.” Brendan Eich, Mozilla CTO

13

Performance Ping Pong Continues

JavaScript

Graphics

14

Improving Graphics - Optimizing the other 30%-50%

University of Szeged WebKit nullport, aka GL2D port (research)

Replacing everything below the Graphics Context API with a new OpenGL ES 2.0 library

GL2D Skia

15

Font rendering – a multi-core prototype

A prototype test shows the relationship between the performance increase and the number of glyph queries

A modification in chromium/skia also shows that we can get about a 40% performance increase on a pure CJK text webpage on the first load

Most glyphs can be found in the cache for European languages

Most glyphs can be found in the cache if CJK text is always displayed at the same size

[Chart: normalized time (0 to 1.4) versus number of glyph queries (1 to 13); series: single 24px, dual 24px]
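The cache behaviour described on this slide can be illustrated with a small Python sketch. It assumes a glyph cache keyed by (character, pixel size) with least-recently-used eviction; the class `GlyphCache`, the helper `hit_rate`, the capacities, and the sample texts are all hypothetical and are not taken from Skia or Chromium.

```python
from collections import OrderedDict


class GlyphCache:
    """Toy LRU cache keyed by (character, pixel size)."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, char, px):
        key = (char, px)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            # Placeholder for the expensive rasterization step.
            self.entries[key] = f"bitmap:{char}@{px}"
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[key]


def hit_rate(text, sizes, capacity=256):
    """Render `text` once per size and return the fraction of cache hits."""
    cache = GlyphCache(capacity)
    for px in sizes:
        for ch in text:
            cache.get(ch, px)
    return cache.hits / (cache.hits + cache.misses)


# Latin text reuses a few dozen glyphs, so almost every query is a hit.
latin = "the quick brown fox jumps over the lazy dog " * 20
# 500 distinct CJK ideographs: every query on first load is a miss.
cjk = "".join(chr(0x4E00 + i) for i in range(500))

print(hit_rate(latin, [24]))
print(hit_rate(cjk, [24]))
print(hit_rate(cjk * 2, [24], capacity=1024))
```

In this toy model the first load of a pure CJK page misses on every glyph, which is the case where spreading rasterization across cores has the most room to help. Repeating the same text at the same size turns the second pass into hits.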

16

Path filling – a multi-core CPU patch for scanline filling

Case 1: Composite 4 complex polygons

Original: 2.22 ms

Patched (2 threads): 2.01 ms

Improvement: 10%

Case 2: Composite 4 polygons (large size)

Original: 5.6 ms

Patched (2 threads): 4.0 ms

Improvement: 40%
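The percentages above appear to be speedups (original time divided by patched time, minus one) rather than reductions in running time. The Python sketch below works through both readings for the two cases. It also shows one possible way to divide scanlines into contiguous bands, one per thread. The function names and the band-splitting scheme are assumptions for illustration; the slides do not describe how the actual patch partitions the work.

```python
def speedup_percent(original_ms, patched_ms):
    """Throughput gain: how much more work fits in the same time."""
    return (original_ms / patched_ms - 1.0) * 100.0


def time_saved_percent(original_ms, patched_ms):
    """Share of the original running time that was removed."""
    return (1.0 - patched_ms / original_ms) * 100.0


def split_scanlines(height, workers):
    """Divide rows 0..height-1 into contiguous half-open bands, one per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        bands.append((start, end))
        start = end
    return bands


for name, original, patched in [("Case 1", 2.22, 2.01), ("Case 2", 5.6, 4.0)]:
    print(f"{name}: speedup {speedup_percent(original, patched):.1f}%, "
          f"time saved {time_saved_percent(original, patched):.1f}%")

print(split_scanlines(480, 2))
```

Under the speedup reading the timings give about 10.4% and 40.0%, matching the slide. Under the time-saved reading they give about 9.5% and 28.6%. The larger polygons benefit more, presumably because the fixed cost of coordinating two threads is spread over more scanline work.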

[Figure: the composite image shown as the sum of its four polygons]

17

New Beginnings of Multicore Browsers

2010-2012: WebKit

ARM & Szeged found limited improvement

Codebase large

Limited SMP ability

2013: Google announces <blink>

WebKit fork to underlie Chrome/Chromium

Focus on improvements for modern SoCs

2009-2011 Gecko

Electrolysis project to split into threads

Improved stability

Limited performance improvement

2013: Mozilla announces Servo

New experimental browser for modern SoCs

Designed for multicore

Built with new language Rust

Rust compiler uses LLVM

18

Performance Critical Ingredient Technologies

The 3 pillars of the modern browser

Creative Commons Attribution-Share Alike 3.0

Unported license. Author: Matthias Kabel

Skia

Optimization, Stabilization, Improvement

19

JIT Tooling for ARM Architecture v8

20

VIXL A64 dynamic code generation toolkit

Macro Assembler

Instruction generation with helpful macro assembler

Functions for abstracting, e.g., immediate generation

Disassembler

Disassembles everything supported by the assembler

Simulator:

High-speed AArch64 processor simulation on 64-bit platforms

Supports all instructions generated by the assembler

Debugger

Supports stepping, register and memory examination, breakpoints

Test suite

Functionality and disassembly tests for all supported instructions

21

VIXL Embedded in Virtual Machine

[Diagram labels: PC; Virtual Machine; JIT built for x86*; ARM64 Runtime Assembler; Disassembler; Debug; ARM64 ISA Simulator]

22

Where to Use VIXL

JITs: JavaScript, Java, Python, other scripting languages

Dynamic code generation of optimized routines

Testing:

Random Instruction Stream (RIS) testing

Toolchain testing

ISA experimentation: try out features of the new A64 ISA

Benefits

A simple, fast, tested API

Integrated suite, ready to use on a new JIT project

Supported by ARM

Liberal 3-clause BSD license

23

Conclusion

The Web has become an important software platform and ARM understands this

The extensive R&D effort by ARM is delivering higher browser performance

More contributions and collaboration from ARM partners, please

Try this at home - it’s all Open Source

24

Thank You

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.