Evolution of Personal Computing by
Microprocessors and SoCs
For Credit Seminar: EEC7203 (Internal Assessment)
Submitted To
Dr. T. Shanmuganantham
Associate Professor,
Department of Electronics Engineering
Azmath Moosa
Reg No: 13304006 M. Tech 1st Yr
Department of Electronics Engineering, School of Engg & Tech,
Pondicherry University
Page | i
Abstract
Throughout history, new and improved technologies have transformed the human
experience. In the 20th century, the pace of change sped up radically as we entered the
computing age. For nearly 40 years, the microprocessor, driven by the innovations of companies
like Intel, has continuously created new possibilities in the lives of people around the world.
In this paper, I hope to capture the evolution of this amazing device that has raised computing
to a whole new level and made it relevant in all fields: engineering, research, medicine,
academia, business, manufacturing and beyond. I will highlight the significant strides
made in each generation of processors and the remarkable ways in which engineers overcame
seemingly insurmountable challenges and continued to push the evolution to where it is today.
Table of Contents
1. Abstract
2. Table of Contents
3. List of Figures
4. Introduction
5. x86 and birth of the PC
6. The Pentium
7. Pipelined Design
8. The Pentium 4
9. The Core Microarchitecture
10. Tick-Tock Cadence
11. The Nehalem Microarchitecture
12. The SandyBridge Microarchitecture
13. The Haswell Microarchitecture
14. Performance Comparison
15. Shift in Computing Trends
16. Advanced RISC Machines
17. System on Chip (SoC)
18. Conclusion
19. References
List of Figures
Figure 1: 4004 Layout
Figure 2: Pentium Chip
Figure 3: Pentium CPU based PC architecture
Figure 4: Pentium 2 logo
Figure 5: Pentium 3 logo
Figure 6: Pentium 4 HT technology illustration
Figure 7: NetBurst architecture feature presentation at Intel Developer Forum
Figure 8: The NetBurst Pipeline
Figure 9: The Core architecture feature presentation at Intel Developer Forum
Figure 10: The Core architecture pipeline
Figure 11: Macro fusion explained at IDF
Figure 12: Power Management capabilities of Core architecture
Figure 13: Intel's new tick-tock strategy revealed at IDF
Figure 14: Nehalem pipeline backend
Figure 15: Nehalem pipeline frontend
Figure 16: Improved Loop Stream Detector
Figure 17: Nehalem CPU based PC architecture
Figure 18: Sandybridge architecture overview at IDF
Figure 19: Sandybridge pipeline frontend
Figure 20: Sandybridge pipeline backend
Figure 21: Video transcoding capabilities of Nehalem
Figure 22: Typical planar transistor
Figure 23: FinFET Tri-Gate transistor
Figure 24: FinFET Delay vs Power
Figure 25: SEM photograph of fabricated FinFET trigate transistors
Figure 26: Haswell pipeline frontend
Figure 27: Haswell pipeline backend
Figure 28: Performance comparisons of 5 generations of Intel processors
Figure 29: Market share of personal computing devices
Figure 30: A smartphone SoC; Qualcomm's OMAP
Figure 31: A SoC for tablet; Nvidia TEGRA
Introduction
In 1968, Intel was founded with the aim of manufacturing memory devices. Its first
product was a Schottky TTL bipolar SRAM memory chip. A Japanese company, Nippon
Calculating Machine Corporation, approached Intel to design 12 custom chips for its new
calculator. Intel's engineers instead suggested a family of just four chips, including one that
could be programmed for use in a variety of products. The resulting set of four chips was known
as the MCS-4. It included a central processing unit (CPU) chip—the 4004—as well as a supporting
read-only memory (ROM) chip for the custom application programs, a random-access
memory (RAM) chip for processing data, and a shift-register chip for the input/output (I/O)
port. The MCS-4 was a "building block" that engineers could purchase and then customize with
software to perform different functions in a wide variety of electronic devices.
And thus the microprocessor industry was born. The 4004, introduced in 1971, had 2,300
pMOS transistors at 10 um and was clocked at 740 kHz. Its four bus pins were multiplexed for
both address and data, allowing a 16-pin package. The very next year, the 8008 was introduced:
an 8-bit processor clocked at 500 kHz with 3,500 pMOS transistors at the same 10 um. At 0.05
MIPS (millions of instructions per second), it was actually slower overall than the 4004's 0.07.
In 1974, the 8080 was launched with roughly ten times the performance of the 8008, built on a
different transistor technology: 4,500 NMOS transistors of size 6 um. It was clocked at 2
MHz with a whopping 0.29 MIPS. Finally, in March 1976 came the 8085, clocked at 3 MHz and
built with yet another newer transistor technology—depletion-load NMOS transistors of size
3 um. It was capable of 0.37 MIPS. The 8085 was a popular device of its time and is still
used in universities across the globe to introduce students to microprocessors.
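The MIPS and clock figures above can be combined into instructions per clock (IPC = MIPS × 10^6 / clock in Hz), which clarifies why the 8008 was slower overall. A short Python sketch (illustrative only, using the figures quoted above) makes the arithmetic explicit:

```python
# Back-of-the-envelope IPC (average instructions retired per clock) from the
# MIPS and clock figures quoted above: IPC = (MIPS * 1e6) / clock_hz.
chips = {
    "4004": (0.07, 740e3),
    "8008": (0.05, 500e3),
    "8080": (0.29, 2e6),
    "8085": (0.37, 3e6),
}

def ipc(mips: float, clock_hz: float) -> float:
    return mips * 1e6 / clock_hz

for name, (mips, clock_hz) in chips.items():
    print(f"{name}: ~{ipc(mips, clock_hz):.3f} instructions per clock")
```

Note that by this measure the 8008's per-clock efficiency was actually a shade higher than the 4004's; it was the lower 500 kHz clock that made it slower overall.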
Figure 1: 4004 Layout
x86 and birth of the PC
The 8086, a 16-bit processor, made its debut in 1978. It introduced new techniques
such as memory segmentation, to extend addressing capacity, and pipelining, to speed up
execution. It was designed to be compatible with 8085 assembly mnemonics. It had 29,000
transistors of 3 um channel length and was clocked at 5, 8 and 10 MHz, with a full 0.75 MIPS
at the maximum clock. It was the father of what is now known as the x86 architecture, which
eventually turned out to be Intel's most successful line of processors and still powers many
computing devices today. Introduced soon after was the processor that powered the first
PC—the 8088. Clocked at 5-8 MHz with 0.33-0.66 MIPS, it was an 8086 with an external 8-bit
bus.
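The segmentation scheme the 8086 introduced can be made concrete with a short sketch: a 16-bit segment register is shifted left four bits and added to a 16-bit offset, producing a 20-bit physical address and hence the 1 MB range. (Illustrative Python, not anything from the original paper.)

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode addressing: 16-bit segment shifted left 4 bits,
    plus a 16-bit offset, giving a 20-bit (1 MB) physical address."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF  # wraps at the 1 MB boundary

# Different segment:offset pairs can alias the same physical byte.
print(hex(physical_address(0xF000, 0xFFF0)))  # 0xffff0, the 8086 reset vector
print(hex(physical_address(0xFFFF, 0x0010)))  # wraps around to 0x00000
```

The design choice here—overlapping segments every 16 bytes—let 16-bit registers span a megabyte at the cost of aliased addresses.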
In 1981, a revolution seized the computer industry stirred by the IBM PC. By the late
'70s, personal computers were available from many vendors, such as Tandy, Commodore, TI
and Apple. Computers from different vendors were not compatible. Each vendor had their own
architecture, their own operating system, their own bus interface, and their own software.
Backed by IBM's marketing might and name recognition, the IBM PC quickly captured the
bulk of the market. Other vendors either left the PC market (TI), pursued niche markets
(Commodore, Apple) or abandoned their own architecture in favor of IBM's (Tandy). With a
market share approaching 90%, the PC became a de-facto standard. Software houses wrote
operating systems (Microsoft DOS, Digital Research DOS), spreadsheets (Lotus 123), word
processors (WordPerfect, WordStar) and compilers (Microsoft C, Borland C) that ran on the
PC. Hardware vendors built disk drives, printers and data acquisition systems that connected
to the PC's external bus. Although IBM initially captured the PC market, it subsequently lost
it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM
was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell,
ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market.
Besides low prices and high performance, the clone vendors provided one other very important
thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the
manufacturer had to be able to guarantee that it would run all of the customer's existing PC
software, and work with all of the customer's existing peripheral hardware. The only way to do
this was to design the clone to be identical to the original IBM PC at the register level. Thus,
the standard that the IBM PC defined became graven in stone as dozens of clone vendors
shipped millions of machines that conformed to it in every detail. This standardization has been
an important factor in the low cost and wide availability of PC systems.
The 8086 and 80186/88 were limited to addressing 1 MB of memory, so the PC was also
limited to this range. The limit was raised to 16 MB by the 80286, released in 1982. It
had a maximum clock of 16 MHz at more than 2 MIPS, with 134,000 transistors at 1.5 um. The
processors and the PC up to this point were all 16-bit. The 80386 range of processors, released
in 1985, were the first 32-bit processors to be used in the PC. The first of these had 275,000
transistors at 1 um and was clocked at 33 MHz with 5.1 MIPS. It could physically address
4 GB of memory, and far more virtually. Over the next few years, Intel modified the architecture and provided some
improvements in terms of memory addressing range and clock speed. The 80486 range of
processors, released in 1989, brought significant advancements in computing capability with a
whopping 41 MIPS for a processor clocked at 50 MHz with 1.2 million transistors at 0.8 um
or 800 nm. It introduced cache memory, a new technique to speed up RAM reads and writes.
The cache was integrated onto the CPU die and was referred to as level 1 or L1 cache (as opposed to the
L2 cache located on the motherboard). As with the previous series, Intel slightly modified
the architecture and released higher clocked versions over the next few years.
The Pentium
The Intel Pentium microprocessor was introduced in
1993. Its microarchitecture, dubbed P5, was Intel's fifth-generation design and its first
superscalar x86 microarchitecture. A superscalar architecture is one in which multiple
execution units or functional units (such as adders, shifters and multipliers) are provided
and operate in parallel. As a direct extension of the 80486 architecture, it included dual
integer pipelines, a faster floating-point unit, a wider data bus, separate code and data
caches, and features that further reduced address-calculation latency. In 1996, the Pentium
with MMX Technology (often simply referred to as the Pentium MMX) was introduced, with the
same basic microarchitecture complemented by the MMX instruction set, larger caches, and some
other enhancements. The Pentium was based on 0.8 um process technology, used 3.1 million
transistors, and was clocked at 60 MHz with 100 MIPS. It could address a full 4 GB of RAM
without any operating-system-based virtualization.
Figure 2: Pentium Chip
The next microarchitecture was the P6, or the Pentium Pro, released in 1995. It had an
integrated L2 cache. One major change Intel brought to the PC architecture was the Front Side
Bus (FSB), which managed the CPU's communications with the RAM and other IO. The RAM and
graphics card were high-speed peripherals and were interfaced through the Northbridge; other
IO devices, like the keyboard and speakers, were interfaced through the Southbridge.
The Pentium II followed soon after, in 1997. It had MMX, improved 16-bit performance and
double the L2 cache. The Pentium II had 7.5 million transistors, starting with 0.35 um
process technology, though later revisions used 0.25 um transistors.
The Pentium III followed in 1999 with 9.5 million 0.25 um transistors and a new instruction
set, SSE (Streaming SIMD Extensions), which assisted DSP and graphics processing. Intel was
able to push the clock speed higher and higher with the Pentium III, with some variants
clocked as high as 1 GHz.
Pipelined Design
At a high level the goal of a CPU is to grab instructions from memory and execute those
instructions. All of the tricks and improvements we see from one generation to the next just
help to accomplish that goal faster.
Figure 3: Pentium CPU based PC architecture
Figure 4: Pentium 2 logo
Figure 5: Pentium 3 logo
The assembly-line analogy for a pipelined microprocessor is overused, but that's because it is
quite accurate. Rather than working on one instruction at a time, modern processors
feature an assembly line of steps that breaks up the grab/execute process to allow for higher
throughput.
The basic pipeline is as follows: fetch, decode, execute, and commit to memory. One would
first fetch the next instruction from memory (there's a counter and pointer that tells the CPU
where to find the next instruction). One would then decode that instruction into an internally
understood format (this is key to enabling backwards compatibility). Next one would execute
the instruction (this stage, like most here, is split up into fetching data needed by the instruction
among other things). Finally one would commit the results of that instruction to memory and
start the process over again. Modern CPU pipelines feature many more stages than those
outlined above.
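The throughput gain from this assembly-line arrangement is easy to quantify. Assuming an idealised pipeline with no stalls (a simplification; real pipelines suffer hazards), a quick Python sketch:

```python
# Idealised cycle counts for n instructions on an s-stage datapath.
def cycles_unpipelined(n: int, s: int) -> int:
    # Each instruction passes through all s stages before the next starts.
    return n * s

def cycles_pipelined(n: int, s: int) -> int:
    # s cycles to fill the pipe, then one instruction completes per cycle.
    return s + (n - 1)

n, s = 1000, 4  # the four basic stages: fetch, decode, execute, commit
print(cycles_unpipelined(n, s))  # 4000 cycles
print(cycles_pipelined(n, s))    # 1003 cycles, roughly a 4x speedup
```

The speedup approaches the stage count s for long instruction streams, which is why deeper pipelines were attractive—until hazards and mispredictions eroded the ideal.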
Pipelines are divided into two halves: the frontend and the backend. The front end is
responsible for fetching and decoding instructions, while the back end deals with executing
them. The division between the two halves of the CPU pipeline also separates the part of the
pipeline that must execute in order from the part that can execute out of order. Instructions
have to be fetched and completed in program order (you can't click Print until you click File
first), but they can be executed in any order, so long as the result is correct.
Many instructions are either dependent on one another (e.g. C=A+B followed by E=C+D) or
they need data that's not immediately available and has to be fetched from main memory (a
process that can take hundreds of cycles, or an eternity in the eyes of the processor). Being able
to reorder instructions before they're executed allows the processor to keep doing work rather
than just sitting around waiting.
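The benefit of reordering around the C=A+B / E=C+D dependency above can be sketched with a toy issue loop. (Illustrative Python; the greedy policy and names are my own simplification, not a description of any real scheduler.)

```python
# Minimal sketch of out-of-order issue: each step, issue any instruction
# whose source operands are available, rather than stalling in program
# order behind a long-latency load.
def schedule(instructions, ready):
    """instructions: list of (dest, src1, src2) register names.
    ready: set of values already available. Returns the issue order."""
    pending, order = list(instructions), []
    while pending:
        for ins in pending:
            dest, a, b = ins
            if a in ready and b in ready:  # operands available: issue now
                order.append(dest)
                ready.add(dest)            # result forwarded to later uses
                pending.remove(ins)
                break
        else:
            break  # nothing can issue; a real core would wait on memory here
    return order

# C=A+B then E=C+D are dependent; F=G+H is independent of both.
prog = [("C", "A", "B"), ("E", "C", "D"), ("F", "G", "H")]
print(schedule(prog, {"B", "D", "G", "H"}))       # A still loading: only F issues
print(schedule(prog, {"A", "B", "D", "G", "H"}))  # A arrived: C, E, F all issue
```

While A is still in flight from memory, the independent F=G+H proceeds instead of the core sitting idle—exactly the payoff described above.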
This document aims to highlight changes to the x86 pipeline with each generation of
processors.
The Pentium 4
The NetBurst microarchitecture debuted with the Pentium 4. This line of processors launched
in 2000, clocked at 1.4 GHz, with 42 million transistors at a 0.18 um process size and the
SSE2 instruction set. The early variants were codenamed Willamette (up to 2.0 GHz), with
later ones named Northwood (up to 3.0 GHz) and Prescott.
The diagram in Figure 7 is from Intel's feature presentation of the NetBurst architecture.
Willamette was an early variant with SSE2, the Rapid Execution Engine (in which the ALUs
operate at twice the core clock frequency) and the Instruction Trace Cache (which cached
decoded instructions for faster loop execution). HT (Hyper-Threading) Technology prevents
CPU cycles from being wasted by assigning the processor to execute one thread or application
while another waits for data to arrive from RAM. This essentially behaves like a
dual-processor system.
The NetBurst pipeline was 20 stages long. As illustrated in Figure 8, the BTB
(Branch Target Buffer) determines the address of the next micro-op in the trace cache (TC
Nxt IP). Micro-ops are then fetched from the trace cache (TC Fetch) and transferred
(Drive) to the RAT (register alias table). Next, the necessary resources, such as load
queues and store buffers, are allocated (Alloc), and the logical registers are renamed
(Rename). Micro-ops wait in the Queue until space frees up in the Schedulers, where their
dependencies are resolved and they are dispatched to the register files of the corresponding
execution units. There, each micro-op is executed and the flags are calculated. For a jump
instruction, the real branch address and the predicted one are compared (Branch Check), after
which the new address is recorded in the BTB (Drive).
Figure 7: NetBurst architecture feature presentation at Intel Developer Forum
Figure 6: Pentium 4 HT technology illustration
Northwood and Prescott were later variants with certain enhancements, as illustrated in the
diagram above; processor-specific details are unnecessary here.
The next major advancement was the 64-bit NetBurst, released in 2005. The Prescott line-up
continued, with maximum clock speeds of 3.8 GHz and transistor sizes of 0.09 um. It had 2 MB
of cache and EIST (Enhanced Intel SpeedStep Technology), which allowed dynamic processor
clock-speed scaling through software. EIST was particularly useful for mobile processors, as
a lot of power was conserved when running at low clock speeds. The NetBurst family continued
to grow with the Pentium D (dual-core processors with HT disabled) and the Pentium Extreme
Edition (dual-core with HT enabled).
The Core Microarchitecture
The high power consumption and heat density, the resulting inability to effectively
increase clock speed, and other shortcomings such as the inefficient pipeline were the primary
reasons why Intel abandoned the NetBurst microarchitecture and switched to a completely
different architectural design, one that delivered high efficiency through a short pipeline
rather than high clock speeds.
Intel’s solution was the Core microarchitecture released in 2006. The first of these
were sold under the brand name of “Core 2” with duo and quad variants (dual and quad CPUs).
Figure 8: The NetBurst Pipeline
Merom was for mobile computing, Conroe was for desktop systems, and Woodcrest was for servers
and workstations. While architecturally identical, the three processor lines differed in the
socket used, bus speed, and power consumption. The diagram below illustrates the Conroe
architecture.
The 14-stage pipeline of the Core architecture was a trade-off between long and short
pipeline designs. The architectural highlights of this generation are given below.
Wide Dynamic Execution referred to two things. First, the ability of the processor to fetch,
dispatch, execute and retire four instructions simultaneously. Second, a technique called
macro fusion, in which two x86 instructions can be combined
into a single micro-op to increase performance.
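Macro fusion can be sketched as a decode-time peephole: a compare followed by a conditional jump is emitted as one fused micro-op. (A toy model in Python; the fusible pairs and string formats are illustrative assumptions, not Intel's actual decoder rules.)

```python
# Toy decoder with macro fusion: adjacent compare + conditional-jump pairs
# are emitted as a single fused micro-op instead of two.
FUSIBLE_FIRST = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jg"}

def decode(instructions):
    micro_ops, i = [], 0
    while i < len(instructions):
        op = instructions[i].split()[0]
        nxt = instructions[i + 1].split()[0] if i + 1 < len(instructions) else None
        if op in FUSIBLE_FIRST and nxt in COND_JUMPS:
            micro_ops.append(instructions[i] + " + " + instructions[i + 1])
            i += 2  # two x86 instructions become one micro-op
        else:
            micro_ops.append(instructions[i])
            i += 1
    return micro_ops

stream = ["mov eax, [mem]", "cmp eax, 0", "jne loop", "add ebx, eax"]
print(decode(stream))  # 4 instructions decode into 3 micro-ops
```

Because compare-and-branch pairs close nearly every loop, fusing them raises the effective decode and retire width on common code.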
Figure 9: The Core architecture feature presentation at Intel Developer Forum
Figure 10: The Core architecture pipeline
In previous generations, the ALU typically broke a 128-bit SSE instruction into two 64-bit
halves, which resulted in two micro-ops and thus two execution clock cycles. In this
generation, Intel extended the execution width of the ALU and the load/store units to 128
bits, allowing a full 128-bit SSE operation—four single-precision or two double-precision
elements—to be processed per cycle. The feature was called Advanced Digital Media Boost
because it applied to the SSE instructions used by multimedia transcoding applications.
Intel Advanced Smart Cache referred to a unified L2 cache (2 MB or 4 MB) shared by the two
processing cores. Caching became more effective because data was no longer stored twice in
separate L2 caches (no replication), and the system bus was freed from RAM read/write traffic
because the cores could share data directly through the cache.
The Smart Memory Access feature referred to the inclusion of prefetchers. A prefetcher
speculatively brings data into a higher-level unit, aiming to provide data that is likely to
be requested soon, which reduces memory access latency and increases efficiency. The memory
prefetchers constantly watch memory access patterns, trying to predict whether there is
something they could move from RAM into the L2 cache—just in case that data is requested next.
Intelligent Power Capability was a culmination of many techniques. The 65 nm process provided
a good basis for efficient ICs. Clock gating and sleep transistors made sure that unneeded
units, down to single transistors, remained shut down. Enhanced SpeedStep still reduced the
clock speed when the system was idle or under low load, and could also control each core
separately.
Figure 11: Macro fusion explained at IDF
Figure 12: Power Management capabilities of Core architecture
Some further features were available, such as the Execute Disable Bit, by which an operating
system with support for the bit may mark certain areas of memory as non-executable; the
processor will then refuse to execute any code residing in those areas. The general
technique, known as executable space protection, is used to prevent certain types of
malicious software from taking over a computer by inserting their code into another program's
data storage area and running it from there; this is known as a buffer overflow attack. It is
also to be noted that HyperThreading was removed.
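The prefetchers of Smart Memory Access can be sketched as a stride detector: watch the gap between successive addresses and, once it repeats, speculatively predict the next address. (A toy single-stream model in Python; real prefetchers track many independent streams and far richer patterns.)

```python
# Toy stride prefetcher: detect a repeating stride between successive
# accesses and predict the next address to fetch speculatively.
class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.stride = None

    def access(self, addr):
        """Record an access; return a predicted prefetch address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride and stride != 0:
                prediction = addr + stride  # confirmed pattern: run ahead
            self.stride = stride
        self.last_addr = addr
        return prediction

pf = StridePrefetcher()
print([pf.access(a) for a in (100, 164, 228, 292)])  # [None, None, 292, 356]
```

After two accesses 64 bytes apart, the detector commits to the pattern and requests data ahead of the program—the "just in case" behaviour described above.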
Tick-Tock Cadence
In 2007, Intel adopted a "Tick-Tock" model in which every microarchitectural change is
followed by a die shrink of the process technology. Every "tick" is a shrink of the previous
microarchitecture's process technology, and every "tock" is a new microarchitecture. A tick
or a tock is expected every 12 to 18 months.
In 2007, the Core microarchitecture underwent a "tick" to the 45 nm process; the resulting
processors were codenamed Penryn. Process shrinks bring down energy consumption and improve
power efficiency.
The Nehalem Microarchitecture
The next tock arrived in 2008 with the Nehalem microarchitecture. The transistor count in
this generation was nearing the billion mark, with around 700 million transistors in the i7.
The pipeline frontend and backend are illustrated below.
Figure 13: Intel's new tick tock strategy revealed at IDF
The new changes to the pipeline in this generation were as follows:
Loop Stream Detector – detects loops and caches their micro-ops so that the same instructions
need not be fetched from the cache and decoded again and again.
Improved Branch Predictor – fetches branch targets ahead of execution based on an improved
prediction algorithm.
SSE 4.2 – new instructions helpful for database operations and DNA sequencing were
introduced.
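Nehalem's actual prediction algorithm is proprietary; as a baseline, the sketch below shows the textbook two-bit saturating counter that such predictors refine, illustrating why a steady loop branch mispredicts only once per loop exit. (Illustrative Python, not Intel's design.)

```python
# Two-bit saturating-counter branch predictor (textbook baseline):
# states 0-1 predict not-taken, states 2-3 predict taken.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly predicting "taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Saturate at 0 and 3 so one odd outcome can't flip a strong bias.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken 8 times, not taken once at the exit, then taken again.
outcomes = [True] * 8 + [False] + [True] * 8
bp, hits = TwoBitPredictor(), 0
for taken in outcomes:
    hits += bp.predict() == taken
    bp.update(taken)
print(f"{hits}/{len(outcomes)} predictions correct")  # 16/17: one miss per exit
```

The hysteresis of the second bit is the key design choice: a single loop exit drops the counter to "weakly taken" instead of flipping the prediction, so the next iteration of the loop is still predicted correctly.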
Other changes to the architecture were:
HyperThreading – HT was reintroduced.
Turbo Boost – the processor can intelligently raise or lower its own clock speed to match
application requirements and thus dynamically manage power. Unlike EIST, no OS intervention
is required.
Figure 15: Nehalem pipeline frontend
Figure 14: Nehalem pipeline backend
Figure 16: Improved Loop Stream Detector
Figure 17: Nehalem CPU based PC architecture
QPI – QuickPath Interconnect was the new system bus, replacing the FSB; Intel had moved
the memory controller onto the CPU die.
L3 Cache – shared between all four cores.
The next tick came in 2010, codenamed Westmere, with the process shrinking to 32 nm.
The SandyBridge Microarchitecture
The next tock came in 2011 with the SandyBridge microarchitecture, also marketed as the
2nd generation of Core i3, i5 and i7 processors. With SandyBridge, Intel surpassed the
one-billion-transistor mark. The architectural improvements in this generation can be
summarised in the diagram below.
Changes to the pipeline were as follows:
A Micro-op Cache – when SB's fetch hardware grabs a new instruction, it first checks
whether the instruction is in the micro-op cache; if it is, the cache services the rest
of the pipeline and the front end is powered down. The decode hardware is a very
complex part of the x86 pipeline, and turning it off saves a significant amount of power.
Figure 18: Sandybridge architecture overview at IDF
Redesigned Branch Prediction Unit – SB caches twice as many branch targets as Nehalem, with
more effective and longer storage of branch history.
Physical Register File – a physical register file keeps micro-op operands in the register
file; as a micro-op travels down the OoO (out-of-order execution) engine it carries only
pointers to its operands, not the data itself. This significantly reduces the power of the
OoO hardware (moving large amounts of data around a chip eats power), and it also reduces
die area further down the pipe. The die savings are translated into a larger out-of-order
window.
AVX Instruction Set – Advanced Vector Extensions are a group of instructions suited to
floating-point-intensive calculations in multimedia, scientific and financial applications.
SB features 256-bit operands for this instruction set.
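The micro-op cache's payoff on hot loops can be sketched with a toy frontend. (Hypothetical structure in Python; the mapping and the "power down" bookkeeping are illustrative, not SandyBridge's real design.)

```python
# Toy frontend with a micro-op cache: a hit skips (powers down) the decoder.
class Frontend:
    def __init__(self):
        self.uop_cache = {}
        self.decodes = 0  # times the power-hungry decoder actually ran

    def fetch(self, address, raw_instruction):
        if address in self.uop_cache:
            return self.uop_cache[address]  # hit: decoder stays powered down
        self.decodes += 1                   # miss: decode and fill the cache
        uops = ("uop:" + raw_instruction,)  # stand-in for real decoded output
        self.uop_cache[address] = uops
        return uops

fe = Frontend()
loop = [(0, "add eax, 1"), (1, "cmp eax, 100"), (2, "jl 0")]
for _ in range(100):        # the hot loop body is fetched 300 times...
    for addr, insn in loop:
        fe.fetch(addr, insn)
print(fe.decodes)           # ...but decoded only 3 times
```

On loop-dominated code almost every fetch is a hit, which is why powering down the complex x86 decoders saves so much energy.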
Other changes to the architecture were:
Ring On-Die Interconnect – with Nehalem/Westmere, all cores, whether two, four or six of
them, had their own private path to the last-level (L3) cache: roughly 1,000 wires per core.
The problem with this approach is that it does not scale well as more agents need access to
the L3 cache. Sandy Bridge adds a GPU and a video transcoding engine on-die that share the
L3 cache; rather than laying out another 2,000 wires to the L3 cache, Intel introduced a
ring bus.
Figure 19: Sandybridge pipeline frontend
Figure 20: Sandybridge pipeline backend
On-Die GPU and QuickSync - The Sandy Bridge GPU is on-die built out of the same
32nm transistors as the CPU cores. It gets equal access to the L3 cache. The GPU is
on its own power island and clock domain. The GPU can be powered down or clocked
up independently of the CPU. Graphics turbo is available on both desktop and mobile
parts. QuickSync is a hardware acceleration technology for video transcoding.
Rendering videos will be faster and more efficient.
Multimedia Transcoding – media processing in SB is composed of two major components:
video decode and video encode. The entire video decode pipeline is now handled by
fixed-function units, in contrast to Intel's previous design, which used the EU array
for some decode stages. Processor power is cut in half during HD video playback.
More Aggressive Turbo Boost
The next tick came in 2012 with the IvyBridge microarchitecture. The die was shrunk to a
22 nm process, and the parts were marketed as the 3rd generation of Core i3, i5 and i7
processors. Intel used the FinFET tri-gate transistor structure for the first time.
Comparisons of the new structure released by Intel are provided below.
As the diagram shows, a FinFET, or 3D tri-gate transistor (as Intel calls it), allows more
control over the channel by maximising the gate area. This means a high ON current and an
extremely low leakage current, which translates directly into lower operating voltages and
lower TDPs, and hence allows higher clock frequencies. Comparisons of delay and operating
voltage between the two structures are shown to the right.
Figure 21: Video transcoding capabilities of Nehalem
Figure 22: Typical planar transistor Figure 23: FinFET Tri-Gate transistor
A scanning electron microscope image of the fabricated transistors is shown to the right. A
single transistor consists of multiple fins, as parallel conduction paths maximise current
flow.
The Haswell Microarchitecture
Ivy Bridge was followed by the next tock in 2013, the Haswell microarchitecture, currently
marketed as the 4th generation of Core i3, i5 and i7 processors.
Changes to the pipeline were as follows:
Wider Execution Engine – Haswell adds two more execution ports, one for integer math and
branches (port 6) and one for store-address calculation (port 7). The extra ALU and port
either improve performance for integer-heavy code or allow integer work to continue while
FP math occupies ports 0 and 1.
AVX2 and FMA - The other major addition to the execution engine is support for
Intel's AVX2 instructions, including FMA (Fused Multiply-Add). Ports 0 & 1 now
include newly designed 256-bit FMA units. As each FMA operation is effectively two
floating point operations, these two units double the peak floating point throughput of
Haswell compared to Sandy/Ivy Bridge.
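The "doubled peak throughput" claim follows from simple lane arithmetic: a 256-bit port holds eight single-precision lanes, and an FMA counts as two FLOPs per lane. A sketch of the calculation, modelling Sandy/Ivy Bridge's separate add and multiply ports as two one-FLOP-per-lane ports (a simplification):

```python
# Peak single-precision FLOPs per core per cycle = ports * lanes * FLOPs/op.
def peak_flops_per_cycle(ports: int, vector_bits: int, element_bits: int,
                         flops_per_op: int) -> int:
    lanes = vector_bits // element_bits  # SIMD lanes per port
    return ports * lanes * flops_per_op

# Haswell: two 256-bit FMA ports, 2 FLOPs per fused multiply-add.
haswell = peak_flops_per_cycle(ports=2, vector_bits=256, element_bits=32,
                               flops_per_op=2)
# Sandy/Ivy Bridge: one 256-bit add port plus one 256-bit multiply port.
sandy = peak_flops_per_cycle(ports=2, vector_bits=256, element_bits=32,
                             flops_per_op=1)
print(haswell, sandy)  # 32 vs 16 FLOPs per cycle: peak throughput doubles
```

Reaching that peak requires code dominated by multiply-accumulate patterns (a = b*c + d), which is why the gain shows up most in dense linear algebra and media kernels.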
Figure 24: FinFET Delay vs Power Figure 25: SEM photograph of fabricated FinFET trigate transistors
The architectural improvements in this
generation can be summarised as follows:
Improved L3 Cache – The cache
bandwidth has been increased and is
now also capable of clocking itself separately from the Cores.
GPU and QuickSync – Notable performance improvements have been made to the on-
die GPU. QuickSync is a hardware acceleration technology for Multimedia
transcoding. Haswell improves on image quality and adds support for certain codecs
such as SVC, Motion JPEG and MPEG2.
Performance Comparisons
Before concluding this document, it is worth comparing the performance of these processors.
The following graphs showcase the performance improvements of Intel processors over five
generations, from Conroe all the way up to Haswell. The processor naming convention is
illustrated to the right.
Figure 26: Haswell pipeline frontend
Figure 27: Haswell pipeline backend
Intel is about half a century old. From the 4004 to the current 4th generation of i7, i5
and i3 processors, a lot has changed in the electronics industry. But this is not the end. This
evolution will continue. Intel’s next Tick will be Broadwell scheduled for this year utilizing
14nm transistor technology.
Figure 28: Performance comparisons of 5 generations of Intel processors
Shift in Computing Trends
With its powerful x86 architecture and excellent business strategy, Intel has managed to
dominate the PC market for almost its entire existence. Now, however, market analysts have
noticed a significant shift in computing trends: more and more customers are losing interest
in the PC and moving towards mobile computing platforms. The chart below (courtesy: Gartner)
highlights this shift.
Figure 29: Market share of personal computing devices, 2012-2014: PC (desktop and notebook),
ultramobile, tablet and smartphone (normalised by 4).
PC sales are evidently beginning to drop, while the era of tablets and smartphones is
beginning. A common mistake many industry giants make is underestimating such shifts, and
they end up losing everything. It happened to IBM, which lost the PC market, and Intel will
be no exception unless it is careful.
Advanced RISC Machines
The battle for the mainstream processor market has been fought between two main
protagonists, Intel and AMD, while semiconductor manufacturers like Sun and IBM
traditionally concentrated on the more specialist Unix server and workstation markets.
Unnoticed by many, another company has risen to a point of dominance, with sales of chips
based on its technology far surpassing those of Intel and AMD combined. That pioneering
company is ARM Holdings, and while it's not a name that's on everyone's lips in the same way that the
'big two' are, indications suggest that this company will continue to go from strength to
strength.
Early 8-bit microprocessors like the Intel 8080 or the Motorola 6800 had only a few
simple instructions. They didn't even have an instruction to multiply two integer numbers, for
example, so this had to be done using long software routines involving multiple shifts and
additions. Working on the belief that hardware was fast but software was slow, subsequent
microprocessor development involved providing processors with more instructions to carry out
ever more complicated functions. Called the CISC (complex instruction set computer)
approach, this was the philosophy that Intel adopted and that, more or less, is still
followed by today's latest Core i7 processors.
In the early 1980s a radically different philosophy called RISC (reduced instruction set
computer) was conceived. According to this model of computing, processors would have only
a few simple instructions but, as a result of this simplicity, those instructions would be super-
fast, most of them executing in a single clock cycle. So while much more of the work would
have to be done in the software, an overall gain in performance would be achievable. ARM
was established on this philosophy.
Semiconductor companies usually design their chips and either fabricate them at their own
facilities (like Intel) or outsource fabrication to a foundry such as TSMC. ARM, however,
designs processors but neither manufactures silicon chips nor markets ARM-branded hardware.
Instead it sells, or more accurately licenses, intellectual property (IP), which allows other
semiconductor companies to manufacture ARM-based hardware. Designs are supplied as a circuit
description, from which the manufacturer creates a physical design to meet the needs of its
own manufacturing processes. The description is provided in a hardware description language,
which gives a textual definition of how the building blocks connect together, at
register-transfer level (RTL).
System on Chip (SoC)
A processor is the large component that forms the heart of the PC. A core, on the other
hand, is the heart of a microprocessor that semiconductor manufacturers can build into their
own custom chip designs. That customised chip will often be much more than what most people
would think of as a processor, and could provide a significant proportion of the functionality
required in a particular device. Referred to as a system on chip (SoC) design, this type of chip
minimises the number of components, which, in turn, keeps down both the cost and the size of
the circuit board, both of which are essential for high volume portable products such as
smartphones.
ARM powered SoCs are included in games consoles, personal media players, set-top
boxes, internet radios, home automation systems, GPS receivers, ebook readers, TVs, DVD
and Blu-ray players, digital cameras and home media servers. Cheaper, less powerful chips
are found in home products, including toys, cordless phones and even coffee makers. They're
even used in cars to drive dashboard displays, anti-lock braking, airbags and other safety-related systems, and for engine management. Healthcare has also been a major growth area over the last five years, with products ranging from remote patient monitoring systems to
medical imaging scanners. ARM devices are used extensively in hard disk and solid state
drives. They also crop up in wireless keyboards, and are used as the driving force behind
printers and networking devices like wireless router/access points.
Modern SoCs also come with advanced (DirectX-9 equivalent) graphics capabilities
that can surpass game consoles like the Nintendo Wii. Imagination Technologies, once known in the PC world for its "PowerVR" graphics cards, licenses its graphics processor designs to many SoC makers, including Samsung, Apple and many more. Others
like Qualcomm or NVIDIA design their own graphics architectures. Qualcomm markets its products under the Snapdragon series, NVIDIA under the Tegra brand, and other companies such as Apple market theirs as the A series. HTC, LG, Nokia and other smartphone manufacturers do not design their own SoCs but use those mentioned above.
Figure 30: A smartphone SoC; Qualcomm's Snapdragon
Finally, SoCs come with a myriad of smaller co-processors that are critical to overall
system performance. The video encoding and decoding hardware powers the video
functionality of smartphones. The image processor ensures that photos are processed properly
and saved quickly and the audio processor frees the CPU(s) from having to work on audio
signals. Together, all those components, and their associated drivers and software, define the
overall performance of a system.
Figure 31: A tablet SoC; NVIDIA Tegra
Conclusion
Computers have truly revolutionized our world and have changed the way we work,
communicate and entertain ourselves. Fuelled by constant innovation in chip design and transistor technology, this evolution shows no sign of slowing. In recent years, there has been a tremendous shift in computing trends, with mobile computers such as tablets and smartphones becoming increasingly popular, driven in part by falling prices.
While computing began with the microprocessor, it is now headed towards a scheme in which the microprocessor is a smaller subset of a larger system, one that integrates graphics, memory, modem and video-transcoding co-processors on a single chip. The SoC era has begun…