Performance Considerations for macOS/iOS Development
in the “New Frontier”
presented to Seattle Xcoders (Redmond) on 2017-03-23 by Jeff Szuhay
1Friday, March 24, 17
Synopsis
• Basic model for computing has not changed since the 1950s
• Revolution in performance
• continuous small, evolutionary changes in PC hardware
• driven by low-energy demands–the new frontier
• all devices with a battery (and servers, too)
Synopsis:
While the basic model for computing has not changed since the 1950s, small, evolutionary changes in PC hardware, driven largely by low-energy requirements (the new frontier), have begun to revolutionize the overall performance characteristics of all mobile computing devices, from iPhones to MacBook Pros.
Focus
1. The performance equation & how it has changed
• a model for computer systems
2. Using a RAM disk for Xcode
• what it is & why even do it today
• various configurations
NOTE: I was going to include Threading and Grand Central Dispatch. I ran out of time. My apologies; this will be a future talk.
This talk, a mix of both theory and practice, will focus on 3 areas:
1. how various Apple innovations are dramatically changing the performance equation
2. using a RAM disk for Xcode: a) why even do it today b) various configurations
3. threading in Cocoa with examples in Swift and Objective-C: a) why b) when c) how
If there is time, some discussion about how these changes have corollary effects for software application design.
Your Humble Narrator
• Comp Sci degree in 1988 from Univ. of Pgh. (no, not CMU); focus on operating systems & DBMS
• Real-life experience:
• optimized production systems
• cognitive research application & gaming
• medical imaging application
• scalability testing
• QuarterTil2.com (graphics optimization)
My background:
* BS in CompSci, Univ of Pgh, 1988
* focused on systems architecture, OS design, and DBMS in college.
* was responsible for tuning a very large healthcare system (actually 4 of them).
  - turned a 28-hour _nightly_ capitation process into 1.5 hours.
  - returned 10% of a heavily I/O-bound, bus-saturated system through better use of system processes.
* worked on various scientific research applications where performance was critical.
  * PST: cognitive research application
  * Nomos: medical imaging
  * NTFS scalability testing at MS
  * graphical optimization at QuarterTil2.com
* relation to gaming
Our Intuition Misleads Us
• Humans are terrible at estimations of
• orders of magnitude
• large sizes
• multiple dimensions
• A simple test (estimate 1/1,000 of room)
• Reminder:
• always measure performance,
• be wary of assumptions & intuition
1. The room, from end to end, equals 1 million units.
2. As I move from one side to the other, raise your hand when I get to 1/1,000 of it.
   * this will be our “lizard brain” working.
3. Walk through a more reasoned analysis of 1/1,000th.
   * this will be our “Cro-Magnon” brain working.
Not only does our intuition mislead us, but so do the marketing departments of large hardware vendors.
Orders of Magnitude
Unit    Multiple   Range
Tera    10^12      1,000,000,000,000.0
Giga    10^9       1,000,000,000.0
Mega    10^6       1,000,000.0
Kilo    10^3       1,000.0
One     10^0       1.0
Milli   10^-3      0.001
Micro   10^-6      0.000,001
Nano    10^-9      0.000,000,001
Pico    10^-12     0.000,000,000,001
Some History
• military need: the battleship
• mainframes: 1 computer, 1,000s of users
• mini-computers: 1 computer, multiple users (Unix)
• micro-computers: 1 computer, 1 user
• everything else (where we are today)
• devices everywhere
Some history
* battleship: 2 tons shot 11 miles, while running with waves and wind, accurate to within 10 yards
* mainframes: 1 computer, 1,000s of users.
* mini-computers (Xserve): 1 computer, multiple users.
* micro-computers (Mac Pro, iMac, Mini): 1 computer, 1 user.
* everything else (where we are today): MacBooks, MBP, iPads, iPods, watches.
Left out:
* Client/server computing: mainframe/server connected to 1,000s of users
* server compute farms - RenderMan, distributed Xcode, medical imaging apps.
* server storage farms - web servers, cloud storage
* Massively parallel mainframes:
  * very large compute problems (high-order polynomial, O(n^x) where x > 3, or NP-hard)
  * global weather
  * finite element analysis
  * travelling salesman problem
The “New Frontier”
• Lowest possible energy consumption
• Highest possible system performance
• storage
• graphics
• compute
• Every compute environment benefits from low power with or without a battery
• E.g. heat reduction (servers: cooling, costs)
90% of server farm cost is energy consumption.
60% of that is cooling (hot CPUs, in particular). The rest is actual computer consumption.
Craig S. Hunter
• Worked at NASA’s Jet Propulsion Labs; now an ISV
• 1990s Compiler Wars: published compiler performance results for jet engine simulations.
• various systems, most available compilers
• regardless of speed of resulting executable, incorrect results were discarded
• embarrassed many compiler vendors (Intel)
We will revisit Mr. Hunter later.
General Computer System Model
[Diagram: clock, CPU, and RAM connected to keyboard, storage, video, and network devices]
The general system model simplified:
* clock
* CPU
* bus
  - contention (like TCP/IP)
  - throughput
* memory (volatile)
* I/O devices
  - inputs: keyboard, mouse, etc.
  - output: video, audio, printers, etc.
  - backing store (permanent): hard disks, SSD, tape, permanent memory.
  - network
Left out for simplicity:
* DMAs
* back-side memory
* CPU caching schemes (L1, L2, L3)
* GPU and GPU cache
“Ticks” Between Devices
[Diagram: the general system model on a 1 GHz CPU, annotated with approximate ticks per access]
Clock: 1
CPU: ~5
Bus: 5-20
RAM: ~1,000
Storage: 4,000,000
Video: 16,700,000
Network: 125,000,000
Keyboard: 200,000,000
A system with:
1) a 1 GHz system clock = 1 tick
2) 1 GHz processor = ~5 ticks/op
3) 500 MHz bus = ~5-20 ticks/access
4) ~100 MHz memory = consider round trip
5) Hard drive (7,500 rpm, 4 ms access) = ~4,000,000 ticks/byte
6) Video (60 Hz refresh rate = 16.7 ms) = ~16,700,000 ticks/blit
7) Network (latency ~1/8 sec) = ~125,000,000 ticks/bit
8) Human (60 wpm = 5 char/sec) = ~200,000,000 ticks/char
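As a rough check on these numbers, each tick count is just the device latency multiplied by the clock rate. The sketch below assumes the slide's 1 GHz clock (1 tick = 1 ns); the function name `ticks_for_us` and the variable `CLOCK_MHZ` are made up for illustration.

```shell
#!/bin/bash
# Clock rate in MHz; 1000 MHz models the 1 GHz system clock (1 tick = 1 ns).
CLOCK_MHZ=1000

# Convert a latency in microseconds to clock ticks at CLOCK_MHZ.
ticks_for_us() {
    local latency_us=$1
    echo $(( latency_us * CLOCK_MHZ ))
}

ticks_for_us 4000      # hard drive, 4 ms access      -> prints 4000000
ticks_for_us 16700     # video, 16.7 ms per refresh   -> prints 16700000
ticks_for_us 125000    # network, 1/8 sec latency     -> prints 125000000
ticks_for_us 200000    # human, 5 char/sec            -> prints 200000000
```

Changing `CLOCK_MHZ` shows how the gap widens as the clock gets faster while device latencies stay put.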
Consider Time From Disk To CPU
[Timeline diagram: disk access, then disk to bus to RAM, then RAM to bus to CPU]
Like our earlier examination of 1/1,000 of the room, nearly all the time is taken in waiting for disk access to complete.
Consider:
1) what is the overall effect of speeding up the CPU?
2) what is the overall effect of speeding up the bus?
3) what is the overall effect of speeding up memory?
4) what is the overall effect of speeding up disk/SSD access?
Consider our general system and "ticks" for 1 byte (or block) to be moved around:
* 1990 Mac SE/30
* 2000 PowerBook
* 2016 MacBook Pro
Points to make:
--> speed up which part? longest/slowest or shortest/fastest.
--> from tuned to out-of-balance to tuned again
--> can't avoid the bus... or can we? (CPU, GPU, memory)
--> boundedness
    1. compute/CPU bound
    2. I/O bound
    3. memory bound
    4. bus bound
--> here's where the new frontier lies:
    rarely CPU bound
    rarely memory bound
    mostly I/O bound
    sometimes bus bound
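One way to work through the "speed up which part?" question is to add up the ticks for moving one byte from disk to CPU and then halve one component at a time. This is a minimal sketch using the approximate tick counts from the earlier slide (disk 4,000,000, bus 20, RAM 1,000, CPU 5); the variable and function names are hypothetical, and a real transfer crosses the bus more than once.

```shell
#!/bin/bash
# Approximate ticks per stage, taken from the "Ticks Between Devices" slide.
DISK=4000000
BUS=20
RAM=1000
CPU=5

# Sum the ticks spent in each stage of one disk-to-CPU transfer.
total_ticks() {
    echo $(( $1 + $2 + $3 + $4 ))
}

baseline=$(total_ticks "$DISK" "$BUS" "$RAM" "$CPU")
# Integer division: halving 5 CPU ticks gives 2.
cpu_2x=$(total_ticks "$DISK" "$BUS" "$RAM" $(( CPU / 2 )))
disk_2x=$(total_ticks $(( DISK / 2 )) "$BUS" "$RAM" "$CPU")

echo "baseline:       $baseline ticks"
echo "CPU 2x faster:  $cpu_2x ticks"
echo "disk 2x faster: $disk_2x ticks"
```

Doubling CPU speed saves 3 ticks out of roughly 4 million, while doubling disk speed nearly halves the total, which is the I/O-bound case in miniature.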
Some Recent Observations: Cold Boot Times–From Bong To Login
• System boot is mostly I/O
• 2007 MacBook Pro 17” with hard drive: 100 sec.
• 2007 MacBook Pro 17” with SSD: 25 sec.
• 2010 MacBook Air 11.6”: 20 sec.
• 2012 MacBook Air 11.6”: 10 sec.
Craig Hunter Revisited
• MacBook Pro 2016 – “tricked out” machine
• CPU & memory constrained for most of his simulations
• Ran and compared to previous machines
• his applications CPU + memory-bound
• multi-core
• overall speeds.
Some 2016 MacBook Pro Benchmarks and Notes
http://hrtapps.com/blogs/20161118/
The Compilation “Problem”
• Xcode no different than any other compiler
• My experience with PDP-8s
• Mostly I/O bound:
• lots of reads/rereads
• lots of writes (very expensive)
• sometimes memory bound, but not often
RAM Disk for Xcode
• Use fast but volatile memory as a backing store–it appears to the system as a hard drive
• In the past:
• used to compress hard drives (why?)
• could speed up selected applications
• Today, even with SSDs:
• offers significant speed up of i/o
• prevents “bit flip fatigue” of SSDs ⬅︎ now the main reason to use
More recent SSDs do not have this problem nearly as severely as 1st and 2nd gen SSDs.
Still an issue, especially in compilation.
RAM Disk for Xcode
• Basic 2-step approach
1. configure RAM disk via script
2. configure Xcode to use RAM disk
Script for Creating a RAM Disk
#!/bin/bash
RAMDISK="ramdisk"
SIZE=1024   # size in MB for ramdisk
# ram:// takes a count of 512-byte sectors: 1 MB = 2048 sectors
diskutil erasevolume HFS+ "$RAMDISK" \
    $(hdiutil attach -nomount ram://$((SIZE * 2048)))
chmod +x ramdisk.sh, then run ./ramdisk.sh
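The `ram://` device size is given in 512-byte sectors, which is why the script multiplies the size in MB by 2048. Below is a small sketch of that arithmetic, plus a guarded teardown step. `mb_to_sectors` and `eject_ramdisk` are hypothetical helper names, and the teardown assumes the volume is mounted at /Volumes/ramdisk.

```shell
#!/bin/bash
# Convert a size in MB to the number of 512-byte sectors that ram:// expects.
mb_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Detach the RAM disk, which discards its contents (hypothetical helper).
# Only acts on macOS, where hdiutil is available, and only if the volume exists.
eject_ramdisk() {
    local mount_point=${1:-/Volumes/ramdisk}
    if command -v hdiutil >/dev/null 2>&1 && [ -d "$mount_point" ]; then
        hdiutil detach "$mount_point"
    else
        echo "nothing to detach at $mount_point"
    fi
}

mb_to_sectors 1024   # prints 2097152
mb_to_sectors 500    # prints 1024000
```

Calling `eject_ramdisk` when finished returns the memory to the system; anything still on the volume is lost.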
Configure Xcode to Use RAM Disk: 1 of 2
Xcode -> Preferences... -> Locations -> Locations Tab
Configure Xcode to Use RAM Disk: 2 of 2
Xcode -> Preferences... -> Locations -> Locations Tab -> Advanced button
RAM Disk Considerations
• Capacity: make sure you have enough RAM to spare. On systems with 8 or more GB of RAM, you should have plenty. Use caution on systems with less RAM.
• Sizing: size your RAM disk appropriately—after all, it is committing your RAM to a specific use which does not shrink or expand. I've found that for my development needs, even 500 MB is adequate. You'll just have to experiment and observe its use for your needs.
• Volatility: be certain you save anything you need stored there before you log off or shut down. Hence, only use for DerivedData where everything can be re-created.
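To pick a size by observation rather than guesswork, one option is to measure what the build products currently occupy. This is a minimal sketch, assuming Xcode's default DerivedData location of ~/Library/Developer/Xcode/DerivedData; the function names and the 500 MB figure are for illustration only.

```shell
#!/bin/bash
# Report the size of a directory in whole MB, rounded up.
dir_size_mb() {
    du -sk "$1" | awk '{ printf "%d\n", ($1 + 1023) / 1024 }'
}

# Succeed if the directory's contents would fit in a RAM disk of the given size.
fits_in_ramdisk() {
    local dir=$1 ramdisk_mb=$2
    [ "$(dir_size_mb "$dir")" -le "$ramdisk_mb" ]
}

DERIVED_DATA="$HOME/Library/Developer/Xcode/DerivedData"   # assumed default path
if [ -d "$DERIVED_DATA" ]; then
    echo "DerivedData uses $(dir_size_mb "$DERIVED_DATA") MB"
    if fits_in_ramdisk "$DERIVED_DATA" 500; then
        echo "fits in a 500 MB RAM disk"
    else
        echo "needs more than 500 MB"
    fi
fi
```

Running this after a clean build of the largest project gives a reasonable lower bound for the RAM disk size.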
Using “top” in Terminal
Processes: 210 total, 2 running, 4 stuck, 204 sleeping, 778 threads
Load Avg: 3.94, 2.18, 1.73  CPU usage: 33.49% user, 54.24% sys, 12.26% idle
SharedLibs: 17M resident, 15M data, 0B linkedit.
MemRegions: 42928 total, 1179M resident, 59M private, 415M shared.
PhysMem: 3732M used (996M wired), 363M unused.
VM: 505G vsize, 1066M framework vsize, 226346(0) swapins, 330284(0) swapouts.
Networks: packets: 2636452/3368M in, 1189198/132M out.
Disks: 1230574/17G read, 635571/16G written.
This shows a 4 GB system with around 1 GB “wired” (non-purgeable) and 363 MB unused. A 500 MB RAM disk would work here but might slow down other apps, since more of their memory will be swapped to disk.
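The same check can be scripted: pull the unused figure out of the PhysMem line and compare it with the planned RAM disk size. This is a rough sketch, assuming the line format shown on the slide with values in M (other versions of top may print different units); the function name `unused_mb` is hypothetical.

```shell
#!/bin/bash
# Extract the "unused" value (in MB) from a top PhysMem line.
unused_mb() {
    echo "$1" | sed -n 's/.* \([0-9][0-9]*\)M unused.*/\1/p'
}

# Sample line copied from the slide; ramdisk_mb is the planned RAM disk size.
line="PhysMem: 3732M used (996M wired), 363M unused."
ramdisk_mb=500

free_mb=$(unused_mb "$line")
echo "unused: ${free_mb} MB, RAM disk: ${ramdisk_mb} MB"
if [ "$free_mb" -lt "$ramdisk_mb" ]; then
    echo "short by $(( ramdisk_mb - free_mb )) MB; expect swapping"
fi
```

On a live macOS system the line could come from `top -l 1 | grep PhysMem` instead of the hard-coded sample.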
Reference
• Cache In Your Pocket: how to set up a RAM disk for Xcode.
  http://www.blinddogsoftware.com/goodies/#CacheInYourPocket
Q & A
Thanks.