Q4.11: NEON Intrinsics

View
1.531
Download
13
Category

Technology

Tags:

Preview:

DESCRIPTION

Resource: Q4.11 Name: NEON Intrinsics Date: 28-11-2011 Speaker: Michael Hope

Citation preview

Michael Hope, Toolchain

bzr branch lp:~michaelh1/+junk/intrinsics-demo

NEON Intrinsics

What's NEON?

●Ch 19 'Introducting NEON' http://infocenter.arm.com/help/topic/com.arm.doc.den0013a/

http://infocenter.arm.com/help/topic/com.arm.doc.den0013a/

SIMD is...

Same instruction, many values

Anything involving signals is great for SIMD

Normalisation

● Easier to read and write● Easier (better?) register allocation● Compiler knows how to schedule● ABI neutral

Advantages

Works across compilers

> gcc-mcpu=cortex-a9 -mfpu=neon -O3 -c test.c

> armcc --cpu Cortex-A9 --c99 -O3 -c test.c

> clang -mcpu=cortex-a9 -mfpu=neon -O3 -c test.c

Tune for the architecture

-mtune=cortex-a9

-mtune=cortex-a8

-mtune=cortex-a5

SMS, unrolling, profiling?

Writing

Environment

#include <arm_neon.h>

gcc -march=armv7-a -mfpu=neon

Data types

<type>x<lanes>_t (uint8x4_t)

<type>x<lanes>x<# registers>_t (int16x2x4_t)

Some Instructions

Add

uint16x4_t vadd_u16 (

uint16x4_t left,

uint16x4_t right

)

Multiply

uint64x2_t vmlal_u32

(uint64x2_t,

uint32x2_t, uint32x2_t)

int32x4_t vqdmlal_s16

(int32x4_t,

int16x4_t, int16x4_t)

Strided load

uint8x8x2_t vld2_u8

(const uint8_t *)

Form of expected instruction(s):

vld2.8 {d0, d1}, [r0]

Documentation

GCC

http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

ARM

http://infocenter.arm.com/help/topic/com.arm.doc.den0013a

Blog posts

Search for “Coding with NEON” on

http://blogs.arm.com

http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

http://infocenter.arm.com/help/topic/com.arm.doc.den0013a

http://blogs.arm.com/

Writing

Colour space conversion

Y = 0.2126 R + 0.7152 G + 0.0722 B

HD television (ITU BT.709)

Versions

Nils Pipenbrinckhttp://hilbert-space.de/?p=22

http://hilbert-space.de/?p=22

Performance

Plain C

48.481 s

Assembly

8.727 s (5.55 x faster)

Intrinsics

8.728 s (5.55 x faster)

Bigger Routines

“libpixelflinger: Add ARM NEON optimized scanline_t32cb16”

http://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance

Hand-written

2.831 s

Intrinsics

2.637 s (7.4 % faster)

http://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance

Recommended

Neon Intrinsics Case Study: Optimizing PNG Performance in

Documents

Q4.11: Introduction to eMMC

Technology

Camera Parameters: Extrinsics and Intrinsics

Documents

Direct Intrinsics: Learning Albedo-Shading Decomposition ...mmaire/papers/pdf/direct_ii_iccv2015.pdf · rect intrinsics, is to learn a convolutional neural network (CNN) that directly

Documents

Intrinsics, Metadata, and Attributes: The story continues! 2016 … · 2019-10-30 · Intrinsics, Metadata, and Attributes: The story continues! 2016 LLVM Developers’ Meeting

Documents

NeOn-projectneon-project.org/web-content/images/Publications/neon... · 2010. 8. 11. · NeOn-project.org NeOn: Lifecycle Support for Networked Ontologies Integrated Project (IST-2005-027595)

Documents

Intel(R) C++ Compiler Intrinsics Referenceengelen/courses/HPC-adv/intref_cls.pdfIntel(R) C++ Intrinsics Reference ... Pentium® III Processor Supported Supported Not Supported Not

Documents

Temporary Intrinsics and Presentism, with …fas-philosophy.rutgers.edu/zimmerman/tempintrinsics...Temporary Intrinsics and Presentism, with Postscript (2005)* Dean Zimmerman [Forthcoming

Documents

Neon Hoarding & Signages by Radiance Neon, Ahmedabad

Business

Manual Neon 2502A Neon Camera - Unidata · Manual – Neon 2502A Neon Camera Unidata Manual - 2502A Neon Camera Module Issue 4.0.docx Page 5 4.0 TAKING A PHOTO There are two ways

Documents

Neon Controls Controls SERIES 44 Neon Controlsneoncontrols.com/downloads/Neon-Controls-Series-44.pdfNeon Controls Neon Controls Neon Controls 6.2 Ø 2.10 9.2 8.0 2.25 3.75 4.84 6.75

Documents

Learning Non-Lambertian Object Intrinsics across ShapeNet Categories · 2016-12-28 · Learning Non-Lambertian Object Intrinsics across ShapeNet Categories Jian Shi Institute of Software,

Documents

Q4.11: Roadmap Overview

Technology

Q4.11: Keynote

Technology

Temporary Intrinsics and Presentism, with Postscript (2005)

Documents

X Toolkit Intrinsics – C Language Interface - X Window System

Documents

NEON Series Quick Start Guide Rev. 2.00 NEON Series Quad ...top-fa.com/adlink/vision system/smart camera/manual/NEON-1021... · NEON Series Quad Core x86 Smart Camera NEON-1040/1020/1021

Documents

SIMD Intrinsics on Managed Language Runtimes · SIMD intrinsics. Second, we provide the developer with the means to use the SIMD eDSL to develop application logic, which automatically

Documents

Data-Parallel Execution using SIMD Instructions · Data-Parallel Execution using SIMD Instructions Intrinsics intrinsics provide an interface to SIMD instructions without writing

Documents

C6000 Host Intrinsics - Texas Instrumentsprocessors.wiki.ti.com/images/7/75/Host_intrinsics_overview.pdf · Given the Host Intrinsics, what C/C++ programming and host platforms can

Documents