Università degli Studi di Milano-Bicocca
Dipartimento di Informatica, Sistemistica e Comunicazione
Corso di laurea in Informatica
Persistent-memory awareness in
operating systems
Relatore: Flavio De Paoli
Correlatore: Leonardo Mariani
Relazione della prova finale di: Diego Ambrosini
Matricola 031852
Anno Accademico 2013-2014
Persistent-memory awareness in operating systems.
Copyright© 2015 Diego Ambrosini. Some rights reserved.
http://creativecommons.org/licenses/by/4.0/
This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0), except for the following items:
- Figure 1.3: courtesy of TDK Corporation, see [76]. © TDK Corporation, all rights reserved.
- Figure 1.4: courtesy of Fujitsu Ltd, see [71]. © Fujitsu Ltd, all rights reserved.
- Figure 1.5: courtesy of Everspin Technologies Inc, see [66]. © 2013 Everspin Technologies Inc, all rights reserved.
- Figure 1.6:
on the left, Figure 1.6a, courtesy of American Chemical Society, see [120, Page 241, Figure 1]. © 2010 American Chemical Society, all rights reserved.
on the right, Figure 1.6b, © Ovonyx Inc, all rights reserved.
- Figure 1.7: courtesy of John Wiley & Sons Inc, see [137, page 2632, Figure 1]. © 2009 John Wiley & Sons Inc, all rights reserved.
- Figure 1.8: see [52, Page 6946, Fig. 1], licensed under CC-BY agreement. © 2013 Owner Societies, some rights reserved.
- Figure 2.4: see [60, page 3, Figure 1], licensed under CC-BY agreement. © 2014 Oikawa, some rights reserved.
- Figure A.1: courtesy of McGraw-Hill Education, see [57, page 247, chapter 5, Figure 5.5]. © 2001 The McGraw-Hill Companies, all rights reserved.
Abstract
Persistence, in relation to computer memories, refers to the ability to retain data over time, without the need of any power supply. Until now, persistence has always been supplied by hard disks or Flash memory drives. Persistence has so far been conceived as a slow service, while volatility has been associated with speed, as happens in DRAM and SRAM. Such a dichotomy represents a bottleneck that is hard to bypass.
The panorama of memory devices is changing: new memory technologies are currently being developed, and are expected to be ready for commercialization in the coming years. These new technologies will offer features that represent a major qualitative change: these memories will be fast and persistent.
This work aims to understand how these new technologies will integrate into operating systems, and to what extent they have the potential to change their current design. To gain this understanding, I have pursued the following goals throughout the work:
- to analyze the economic and technological causes that are triggering these qualitative changes;
- to present the new technologies, along with a classification and a description of their features;
- to analyze the effects that these technologies might have on models that are currently used in the design of operating systems;
- to present and summarize both the opportunities and the potential issues that operating system designers will have to manage in order to use such new memory devices conveniently;
- to analyze the proposals found in the scientific literature to exploit these new technologies.
Following the structure of the title, the first chapter focuses mainly on memory devices, whereas the second chapter is centered on operating systems.
The first chapter initially tries to grasp the causes of the expected technological change, beginning with economic observations.
Subsequently, the chapter contains some considerations about how different but complementary aspects of this economic relation are urging the semiconductor industry to find new memory technologies able to satisfy the increasing demand for features and performance.
Afterwards, the focus shifts to current technologies and their features. After a brief summary of each specific technology, a short description follows of the issues shared by all current charge-based technologies.
Then, the reader finds a presentation of each of the new memory technologies, in the order of the ITRS taxonomy for memory devices: first those in a prototypical development stage (MRAM, FeRAM, PCRAM), followed by those in an emerging development stage (Ferroelectric RAM, ReRAM, Mott memory, macromolecular and molecular memory).
The second chapter aims, in its first part, to understand the extent to which current foundational models (the Von Neumann model and the memory hierarchy) are affected by the new technologies. As long as the computational model (fetch-decode-execute) does not change, the validity of the Von Neumann model seems to hold. Conversely, as far as the memory hierarchy is concerned, the changes might be extensive: two new layers should be added near DRAM. After these considerations, some additional observations are made about how persistence is just a technological property, and how a specific model would be necessary to make explicit how an operating system uses it.
Afterwards, there is a description of the use of non-volatile memory technologies such as Phase Change RAM inside fast SSDs. Even if this approach is quite traditional, the scientific literature explains how faster devices would require a deep restructuring of the I/O stack. Such a restructuring is required because the current I/O stack has been developed with a focus on functionality, not efficiency; fast devices would instead require high efficiency.
The second chapter then presents the most appealing use of persistent memories: as storage class memory, either as a replacement for common DRAM or in tandem with DRAM on the same memory bus. This approach is per se more complex, and under the umbrella of SCM there are many viable modes of use. Firstly, some preliminary observations common to all the approaches are made. Then, two simpler approaches are presented (no-change and Whole System Persistence). Finally, the approaches that aim to develop a persistent-memory aware operating system are introduced: most of them use the file system paradigm to exploit persistence in main memory. The work proceeds by first presenting some peculiarities of the current I/O path used in the Linux operating system, remarking how caching has already moved persistence into main memory; afterwards, some further considerations about consistency are made. Those observations are then used to understand the main differences between standard I/O and memory operations. After a brief presentation of some incomplete approaches proposed by the author, a framework to classify the thoroughness of the different approaches follows.
The work continues by reporting the efforts of the Linux community and then introduces each specific approach found in the literature: Quill, BPFS, PRAMFS, PMFS, SCMFS. Concluding the part about file systems, some remarks are made about integration, a means to use both file system services and memory services from the same persistent memory.
Finally, persistent-memory awareness in user applications is presented, along with a brief introduction of the two main proposals coming from two academic research groups.
Abstract (italiano)
Il concetto di "persistenza", relativamente alle tecnologie di memoria, si riferisce alla capacità di mantenere i dati anche senza la necessità di alcuna alimentazione elettrica. Sino a oggi, essa è stata prerogativa esclusiva dei dispositivi di memorizzazione lenti, quali ad esempio gli hard disk e le memorie Flash. La persistenza è sempre stata immaginata come una funzionalità intrinsecamente lenta, mentre la volatilità, caratteristica tipica delle memorie DRAM e SRAM, è sempre stata associata alla loro velocità. Tale dicotomia è tuttora un limite difficile da aggirare.
Il panorama delle memorie tuttavia sta subendo dei cambiamenti strutturali: nuove tecnologie sono in corso di sviluppo e l'industria dei semiconduttori ha in programma di cominciarne la commercializzazione nei prossimi anni. Questi nuovi dispositivi avranno delle caratteristiche che rappresenteranno un rilevante cambio qualitativo rispetto alle attuali tecnologie: la più significativa differenza è che queste memorie saranno veloci e persistenti.
Il presente studio intende proporre un'analisi di come tali nuove tecnologie potranno integrarsi nei sistemi operativi, e di quale entità potranno essere le ricadute sulla progettazione degli stessi. Vengono perciò affrontate:
- un'analisi delle cause economiche e tecnologiche di questi cambiamenti;
- una presentazione di ciascuna delle nuove tecnologie, assieme a una loro classificazione e a una breve valutazione delle loro caratteristiche;
- un'analisi degli effetti che queste nuove memorie possono avere sui principali modelli usati per lo sviluppo dei sistemi operativi;
- una panoramica sulle opportunità e sulle problematiche potenziali che gli sviluppatori dei sistemi operativi dovrebbero tenere in considerazione per sfruttare al meglio tali tecnologie;
- una rassegna delle varie proposte presenti in letteratura per usare al meglio le nuove memorie persistenti.
In stretta connessione al titolo, la prima parte del lavoro è incentrata principalmente sui nuovi dispositivi di memoria persistente, mentre nella sua seconda parte si focalizza sui sistemi operativi.
Nel primo capitolo si approfondiscono le cause di questi cambiamenti tecnologici, a partire da alcune considerazioni di natura economica; proseguendo, viene mostrato come le differenti (seppur complementari) necessità dei produttori di semiconduttori e dei loro consumatori stiano progressivamente spingendo la ricerca verso nuove tecnologie di memoria capaci di soddisfare le sempre crescenti richieste di prestazioni e funzionalità.
L'attenzione viene poi spostata sulle attuali memorie e sulle loro caratteristiche. Dopo una breve descrizione di ogni tecnologia, viene svolta una breve analisi di alcuni dei problemi comuni a tutte le tecnologie di memoria basate sulla carica elettrica.
Vengono quindi presentate le nuove memorie, seguendo l'ordine proposto dalla tassonomia della ITRS (International Technology Roadmap for Semiconductors): prima sono descritti quei dispositivi la cui produzione è già cominciata ma il cui grado di maturità del prodotto è ancora iniziale (MRAM, FeRAM, PCRAM), mentre successivamente vengono mostrati quei dispositivi il cui stato di sviluppo è ancora alle prime fasi (Ferroelectric RAM, ReRAM, Mott Memory, memorie macromolecolari e molecolari).
Il secondo capitolo, nella sua prima parte, affronta le tematiche sull'eventualità che tali memorie possano intaccare la validità di alcuni modelli fondamentali per lo sviluppo dei sistemi operativi, quali il modello della macchina di Von Neumann e la gerarchia di memoria. Viene sottolineato come la validità del modello di Von Neumann resti immutata. Tuttavia, si evidenzia come tali memorie apportino delle modifiche importanti alla attuale gerarchia di memoria, la quale vedrebbe l'aggiunta di due nuovi livelli sotto quello relativo alle memorie DRAM. Dopo queste valutazioni, ne sono proposte di ulteriori rispetto al concetto stesso di persistenza: viene sottolineato come esso sia sostanzialmente una proprietà di alcune tecnologie, e di come sia necessaria una "modellizzazione" che espliciti come il sistema operativo intenda utilizzarla.
Successivamente, si fornisce una descrizione di come le memorie persistenti (ad esempio le PCRAM) possano essere impiegate per costruire degli SSD più veloci. Sebbene un tale approccio sia piuttosto "conservativo", viene evidenziato come una simile soluzione richieda una profonda modifica dei meccanismi usati per effettuare le operazioni di I/O. L'attuale gestione degli I/O è infatti concentrata sull'offerta di numerose funzionalità, mentre la sua efficienza non è stata curata nel tempo: questo nuovo tipo di memorie tuttavia richiede un'alta efficienza del software.
Il secondo capitolo procede presentando la modalità d'uso più interessante delle nuove memorie persistenti: nel bus di memoria, in sostituzione alle comuni DRAM, oppure al loro fianco. Un simile approccio ha un grado di complessità superiore, e può essere declinato in molte differenti modalità d'uso.
Vengono svolte alcune osservazioni preliminari, comuni a tutti gli approcci; successivamente vengono presentati quelli più semplici (nessun-cambio e Whole System Persistence). In seguito sono introdotti quelli che prevedono la modifica dei sistemi operativi per realizzare una reale "consapevolezza" d'uso delle memorie persistenti: la maggior parte di essi sfrutta il paradigma del file system per ottenere tale scopo. Vengono presentati alcuni dettagli dell'attuale gestione dell'I/O in ambiente Linux, sottolineando come tramite il caching la persistenza si è già spostata dai dispositivi lenti alla memoria principale; vengono fatti poi ulteriori approfondimenti circa la consistenza dei dati. Queste osservazioni quindi sono usate per comprendere le principali differenze tra le operazioni di I/O e quelle di memoria e per approcciare effettivamente le modifiche al sistema operativo. Dopo una breve presentazione di alcune soluzioni non consolidate, sono successivamente introdotti degli elementi valutativi per comprendere l'efficacia e la profondità di quelle analizzate nel seguito.
Il lavoro continua nella presentazione delle proposte documentate sia dalla comunità degli sviluppatori Linux, sia dalla letteratura scientifica: Quill, BPFS, PRAMFS, PMFS, SCMFS. Concludendo la parte riguardante i sistemi operativi, sono fatte delle osservazioni sul concetto di integrazione, ovvero di un metodo per permettere l'uso condiviso delle memorie persistenti da parte del kernel e da parte del file system.
Infine, il lavoro si conclude toccando l'argomento della "consapevolezza" della persistenza nelle applicazioni, per valutare la proposta di permettere alle applicazioni un uso diretto delle memorie persistenti.
Contents
Copyright notes ii
Abstract iii
Abstract (italiano) vi
Contents ix
List of Figures xi
List of Tables xii
Glossary xiii
Introduction 1
1 Technology 3
1.1 Generic issues . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 An economical view . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 A technological view . . . . . . . . . . . . . . . . . . . . . 14
1.2 Technology – the present . . . . . . . . . . . . . . . . . . . . . 19
1.2.1 Mechanical devices . . . . . . . . . . . . . . . . . . . . . . 19
1.2.2 Charge-based devices . . . . . . . . . . . . . . . . . . . . . 21
1.2.3 Limits of charge-based devices . . . . . . . . . . . . . . . . 26
1.3 Technology – the future . . . . . . . . . . . . . . . . . . . . . . 27
1.3.1 Prototypical . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.2 Emerging . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.3.3 From the memory cell to memories . . . . . . . . . . . . . . . 40
2 Operating Systems 42
2.1 Reference models . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1.1 The Von Neumann machine . . . . . . . . . . . . . . . . . . 43
2.1.2 The "memory" and the memory hierarchy . . . . . . . . . . . . . 45
2.1.3 A dynamic view in time . . . . . . . . . . . . . . . . . . . . 53
2.1.4 Viable architectures . . . . . . . . . . . . . . . . . . . . . 54
2.2 Fast SSDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.2.1 Preliminary design choices . . . . . . . . . . . . . . . . . . 57
2.2.2 Impact of software I/O stack . . . . . . . . . . . . . . . . . 60
2.3 Storage Class Memory: operating systems . . . . . . . . . . . . . . 66
2.3.1 Preliminary observations . . . . . . . . . . . . . . . . . . . 67
2.3.2 No changes into operating system . . . . . . . . . . . . . . . 71
2.3.3 Whole System Persistence . . . . . . . . . . . . . . . . . . . 72
2.3.4 Persistence awareness in the operating system . . . . . . . . . 74
2.3.5 Adapting current file systems . . . . . . . . . . . . . . . . . 83
2.3.6 Persistent-memory file systems . . . . . . . . . . . . . . . . 85
2.3.7 Further steps . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4 Storage Class Memory and applications . . . . . . . . . . . . . . . . 94
Conclusions 100
A Asides 105
A.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
A.2 Physics and Semiconductors . . . . . . . . . . . . . . . . . . . . 106
A.3 Operating systems . . . . . . . . . . . . . . . . . . . . . . . . 111
B Tables 113
Bibliography 118
Acknowledgments 135
List of Figures
1.1 The ubiquitous memory hierarchy . . . . . . . . . . . . . . . . . 15
1.2 ITRS Memory Taxonomy . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 The Flash memory cell . . . . . . . . . . . . . . . . . . . . . . 22
1.4 Ferroelectric crystal bistable behavior . . . . . . . . . . . . . . 29
1.5 Magnetic Tunnel Junction . . . . . . . . . . . . . . . . . . . . . 30
1.6 Phase Change memory cell . . . . . . . . . . . . . . . . . . . . . 32
1.7 RRAM memory cell . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8 ElectroChemical Metallization switching process . . . . . . . . . . 36
2.1 The Von Neumann model . . . . . . . . . . . . . . . . . . . . . . 44
2.2 The memory hierarchy with hints . . . . . . . . . . . . . . . . . . 47
2.3 A new memory hierarchy . . . . . . . . . . . . . . . . . . . . . . 55
2.4 The Linux I/O path . . . . . . . . . . . . . . . . . . . . . . . . 76
A.1 Field effect transistor . . . . . . . . . . . . . . . . . . . . . . 111
List of Tables
B.1 Top 10 technology trends for 2015 . . . . . . . . . . . . . . . . 113
B.2 Performance comparison between memories . . . . . . . . . . . . . 114
B.3 4K transfer times with PCM and other memories . . . . . . . . . . 115
B.4 Bus latency comparison . . . . . . . . . . . . . . . . . . . . . . 116
B.5 HDD speed vs bus theoretical speed . . . . . . . . . . . . . . . . 116
B.6 Persistence awareness through file systems . . . . . . . . . . . . 117
Glossary
ACID Atomicity, Consistency, Isolation, Durability
ATA AT Attachment (I/O interface standard)
BE Bottom Electrode
BIOS Basic Input Output System
BTRFS B-Tree File System
CB Conductive Bridge
CBRAM Conductive Bridge RAM
DAX Direct Access and XIP
DDR Double Data Rate (DRAM technology)
ECC Error Correcting Code
FLOPS Floating Point Operations Per Second
FTL Flash Translation Layer
FUSE File system in USEr space
GPU Graphics Processing Unit
HMC Hybrid Memory Cube
HRS High Resistance Status
L1 Level 1 cache
L2 Level 2 cache
L3 Level 3 cache
LRS Low Resistance Status
LVM Logical Volume Manager
MFMIS Metal-Ferroelectric-Metal-Insulator-Semiconductor
MIM Metal-Insulator-Metal
MTD Memory Technology Device (Linux subsystem)
NVM Non-Volatile Memory
PATA Parallel ATA
PCI Peripheral Component Interconnect (I/O interface standard)
PMC Programmable Metallization Cell
RAMFS RAM File System
RS Resistive Switching
SATA Serial ATA
STT-MRAM Spin-Transfer Torque Magnetic RAM
TCM ThermoChemical Mechanism
TE Top Electrode
TER Tunnel Electro-Resistance
TLB Translation Look-aside Buffer
TMPFS TEMPorary File System
VFS Virtual File System
WAFL Write Anywhere File Layout (file system)
XFS X File System
XIP eXecute-In-Place
ZFS Z File System
Introduction
Computers are programmed to execute tasks by manipulating data: they should have a means for retrieving such data, manipulating it, and finally storing it back when tasks are completed. Similarly to our human experience, the devices used in computers to store and retrieve data are called memories.
Although memory naturaliter refers to the ability to remember data over time, computer science distinguishes between persistent and volatile memories, depending on the length of that "time": in fact, some memory devices are defined as volatile, whereas others are defined as persistent. The former class of devices cannot store any data permanently between power-off events, and hence that data gets lost when power is off. Conversely, memories belonging to the latter class feature the ability to "remember" the stored data over time, regardless of the power status: in these memory devices data "persists" in time¹.
Persistence and volatility are just properties of each specific memory technology, and have always been present in computing devices: for example, punched cards in the 50s offered persistence. Today, it is supplied by hard disks and SSDs. Conversely, volatility has always been present in the main memory. This scenario has remained almost unchanged throughout the years: since volatile memories were fast and expensive, whereas persistent ones were slow and cheap, persistence has always been relegated to I/O devices.
This work finds its raison d'être in the fact that the semiconductor industry is preparing to produce and sell new memory technologies whose features are much different from those seen until now. Over time, this particular industry has continued to enhance its offer, producing memories gradually better from one generation to the next: it extensively took advantage of the benefits offered by technology scaling, reaching a continuous increase both in memory densities² and in performance. Despite these enhancements, however, the fundamental features of memories have remained almost the same throughout the years: changes were almost always quantitative. The new technologies promise instead to be both fast and persistent, giving engineers the choice to use them in I/O devices or on the memory bus: in either case, such memories would effectively represent a major qualitative change. Hereafter, the term "persistent memory" refers to those technologies that are able to offer persistence in main memory. Usually, such memories are also referred to as non-volatile memories (NVM). The first chapter of this work will focus on the specificities of these new devices.
¹ Actually, the evaluating parameter to certify a memory as persistent is the ability to store data correctly for at least ten years.
Operating systems are developed to use computers conveniently and effectively, and are carefully designed to make the best use of each feature offered by the hardware. Volatility of the main memory is probably one of the most important assumptions that have always influenced the development of operating systems. Since these new technologies could move persistence to main memory, scientists and researchers are trying to understand which aspects of operating systems would need to be modified in order to adapt conveniently to such a major change. These modifications would then lead to persistence-aware operating systems: the efforts currently made by the scientific community to manage this transition are the subject of the second chapter.
² Density, when referring to memory devices, is the quantity of bits per area achieved by a given technology. Without changing the area consumed, a better density means higher capacity.
Chapter 1
Technology
Prior to analyzing computing models and operating system issues, it is useful to present the technological changes that are expected: this first part aims to describe the memory technologies used in current computing systems and to present those that we could use in the future.
1.1 Generic issues
Here follow some generic considerations about the economic and technical aspects that can be useful to gain a better understanding of the peculiarities of both the current memory technologies and those that are currently competing to become the technologies used tomorrow. The purpose of these observations is to share with the reader a sort of framework that helps to evaluate the causes that are leading to technological changes in memory devices and to understand the expectations that are placed on them.
1.1.1 An economical view
Throughout the world, each economic activity produces and distributes goods and services that are then sold to consumers (supply side). In turn, human needs, driving the demand for the goods produced by businesses, are the engine of each economic activity (demand side).
The same relation exists in the semiconductor market: computing is indeed realized by the semiconductor industry, in turn embodied by a myriad of firms that compete to survive, earn money and reach a leading position in the market (supply side). Consumers then buy semiconductor devices that satisfy their needs (demand side).
With the intent of gaining a better insight into the reasons that are triggering a qualitative change in the memory device panorama, I will focus briefly on some aspects of both the supply and the demand side of this tight relation.
Supply side – the need to pursue Moore's exponential prediction
When talking about trends in the semiconductor industry, the most frequently cited one is usually expressed as Moore's law (i.e. the number of transistors in integrated circuits grows exponentially). I will follow this tradition, believing that its use still offers a useful insight into the computer industry itself. To be exact, it has to be said that the semiconductor industry is trying to update Moore's law with a so-called "More than Moore" forecast [5]. Nonetheless, the "More than Moore" approach is still exponential, albeit in an equivalent form.
Moore's law, asserting the exponential growth of computational power, is an economic conjecture, not strictly an economic "law". Moore rooted his thoughts in his experience in the semiconductor industry, and in the fundamental principle according to which each economic activity has the primary goal of maximizing its profitability. He observed how, in the semiconductor industry, every two years the maximum profitability point coincided with both:
- a doubling of the number of transistors in integrated circuits;
- a corresponding fall in the price of each transistor (cost per function).
This double result, which started from the inception of the semiconductor industry, has continued to occur until today. The former permitted the exponential increase of computational power, whereas the latter transformed computational resources from a rare commodity into a widespread consumer good.
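As an illustration, the two observations can be condensed into a simple doubling law (a sketch of the usual formulation, assuming a two-year doubling period; the symbols are mine, not taken from Moore's papers): if $N(t)$ is the number of transistors per chip in year $t$ and $C(t)$ the cost of producing that chip, then
$$N(t) \approx N_0 \cdot 2^{(t - t_0)/2}, \qquad \text{cost per function} \approx \frac{C(t)}{N(t)},$$
so that even a roughly constant chip cost implies an exponentially falling cost per transistor.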
From a business perspective, throughout the years the pursuit of Moore's trend has guaranteed to the computer industry high revenues and the much desired maximization of profits. This has been possible because Moore founded his thoughts precisely on the profitability of the semiconductor industry [107]. Quite roughly: industry must optimize profitability, thus industry must follow Moore's law, as long as this result is achievable.
This point is much discussed by analysts: achievability is a great concern. A classic question is whether this exponential trend can continue in the future: despite the use of the word "law", this exponential growth is not a guarantee; it is instead the result of continuous research efforts, technological advancements, accumulated know-how, and so on. Until now, a series of technologies have guaranteed to the computer industry the achievability of the exponential growth, but the question of whether in the future some other new technologies will permit the same pace is currently open.
These concerns are not new: throughout the years, this question has been raised many times. Despite the concerns, Moore and many other analysts afterwards have been right in forecasting the continuation of the trend until now. Even if Moore himself admits that "no exponential is forever", he explains that the computer industry is trying to delay its end forever [54]. As a consequence, throughout the years, the lifecycle of different technologies has been carefully managed to permit this continuous delay. As depicted in reports from the International Technology Roadmap for Semiconductors (ITRS, see section A.1), the semiconductor industry currently has the firm expectation of still being able to delay the end of exponential growth for many years: it expects that the "equivalent Moore" trend will hold at least up to an impressive 2025 [83].
Through the years, "achievability" and "delay" have been made possible by two complementary strategies:
- lengthening as much as possible the life of current technologies (at least as long as production is profitable);
- promoting research efforts to design, develop and at last produce next-generation technologies, in order to be ready to step to a better technology when current ones become obsolete or less profitable.
These two goals are the two alternative approaches that inspire each research effort made in universities, research laboratories, and industries. Since the 50s, for example, the former strategy has permitted an incredible reduction in the size of transistors¹, which today is of just a few nanometers. The latter strategy, surely more challenging, usually produces new technologies: as an example, hard disks during the 50s and Flash memories during the 80s were the offspring of this strategy.
¹ i.e. scaling: passing from one technology node to a smaller one, reducing the feature size.
Demand side – a selection among current trends and needs
As the supply of the industry must comply with the needs of the customers in order to be sold, a brief analysis of the current needs of semiconductor customers is valuable. Often, these needs profoundly influence the supply, forcing producers to adapt quickly to new challenges.
As depicted in table B.1, current technology trends focus principally on the following areas:
Computing everywhere: the price of electronic devices is continuously decreasing (following the cost per function), thus facilitating their spread. Without lingering too much on the important role of computing in almost every human activity, the use of computing resources is spreading further: today portable devices like smartphones and tablets are fully fledged computers, just like laptops or traditional workstations. Wearable devices are just another step in the direction of "computing everywhere".
Another aspect of the same trend is the spread of "smart" logic into a plethora of simpler devices: washing machines, alarm systems, thermostats, home automation systems, TVs, and many other appliances used every day by millions of people. Even in much smaller devices computing is being offered as a standard feature: smart cards, networks of sensors, RFID devices, micro-devices, all have some degree of computational power.
One of the causes of such a widespread use of computing resources lies in a simple observation: if millions of transistors build up a CPU having both a relevant computational power and a relevant price, a few transistors can be used to build far simpler electronic devices that still have (reduced) computational power at a much lower price. This in turn permits industries to select the needed balance between cost and functionality of their products.
Internet of Anything: this point follows from the preceding one, as people need connectivity along with computing: without it, in a world where information and interaction are almost in "real time", computing resources would be useless. So, just as computing resources are spreading across an infinite number of devices, the same devices are increasingly becoming able to connect to various networks. Focusing on the Internet, analysts expect an exponential growth of the devices using it: from the smallest devices to the biggest data centers, connectivity to the Internet is fundamental. As a consequence of this ceaseless growth, both the network traffic and the overall amount of data produced by each device increase.
Data center growth continues: just as the falling cost per function facilitates and urges the spread of cheap and ubiquitous devices, the same cost reduction permits the concentration of bigger masses of computing power: this is the case of data centers, such as the ones currently built and used by companies like Microsoft, Google, Amazon and Facebook to provide cloud services to their customers.
Relating to data centers, analysts are expecting both:
- growth in the size of big data centers, reflecting the increased use of data center services (XaaS² patterns, social networking, hybrid cloud patterns, and so on);
- growth in the number of big data centers, as businesses are increasingly using co-location [68].
Some scientific, academic, and government institutions are trying to build exascale-level supercomputers [12, 67, 70] in order to be able to solve huge computing tasks (i.e. simulation and modeling). An exascale supercomputer would be able to reach at least one exa-FLOPS of performance. Such efforts too go in the same direction of more computing power and more storage volume in data centers.
Finally, "Big Data" too falls into this category of trends and is somehow related both to bigger data centers and to exascale computing. The Computer Desktop Encyclopedia refers to it as "the massive amounts of data collected over time that are difficult to analyze and handle using common database management tools. The data are analyzed for marketing trends in business as well as in the fields of manufacturing, medicine and science. The types of data include business transactions, e-mail messages, photos, surveillance videos, activity logs and unstructured text from blogs and social media, as well as the huge amounts of data that can be collected from sensors of all varieties" [65]. "Massive amounts of data" need huge databases and huge data storage, such as those found only in big data centers. Similarly, the trends about predictive analytics and context-rich systems, taken from table B.1, fall into this category too.
² Anything (X) as a service.
Security and safety concerns increase: this is both a current trend in technology and a consequence of the points just described. Since:
- computing devices are spreading;
- connectivity capabilities are spreading in every area where computing devices are used;
- the fields of application of such computing devices are increasing, touching very sensitive ones, such as those related to human health and medical science;
- each computing device generates data, and the total amount of data generated each year is drastically increasing;
- the use of connected services is increasing, with the effect of placing a huge volume of data "on the cloud",
then the efforts to protect such devices and their data will be significant, as will those to engineer the safest ways to use them.
The ones described above are some of the major technology trends currently noticed by analysts. Anyway, since each trend is ultimately a specific pattern of use of bare hardware resources, those trends must translate into more specific requests placed on the semiconductor industry: in the end, the semiconductor industry produces transistors and integrated circuits.
Given the trends just seen, the requests currently focus on these areas:
Speed: people are demanding an ever increasing amount of information retrieved in real time. The use of web search engines, social networks and cloud computing platforms is extensive: users expect their queries to be answered extremely fast. In order to live up to these expectations, information technology businesses need technologies allowing extreme speed. People also expect to use fast personal devices (laptops, tablets, smartphones, and so on).
Computational power and parallelism: people demand computational power, not only speed. The tasks performed by modern smartphones increase continuously both in number and in complexity: hardware must have enough computational power to fulfill every request. Moreover, people expect many tasks to be executed concurrently, hence further increasing the demand for computing performance. Data centers are no different in this respect: they are asked to serve many concurrent requests of continuously increasing complexity.
Power efficiency: while until a few years ago this matter was not of primary importance, nowadays it is indeed pivotal. Power efficiency is fundamental both in the domain of small devices and in that of the biggest installations. While it is simple to understand the need for power efficiency in a smartphone or (infinitely more) in a modern pacemaker, this issue also arises in relation to data centers. As an example, one of the Google data centers is adjacent to a power plant and its use of electrical power totals the huge figure of 75 MW [29]. Moreover, it is reported that data centers account for 1.3% of all electricity use worldwide and for 2% in the U.S. As data centers are expected to increase both in number and in size, this issue further increases in its significance.
Expectations on memories
As this work is focused on memories, it is important to remark how each of these requests directly influences the features that memories should have in order to satisfy the aforementioned needs. I have referred until now to those needs mainly as something related to computing devices, treated as a whole. However, while the effective use of each device has its own field of employment, each of them performs its specific job by executing computations on some data: devices differ according to the properties of the data upon which computations are made (i.e. the data managed by a phone is voice, whereas the data managed by an Internet router are IP packets). Data is indeed the object of each computation. While theoretically the simplest devices (for example, sensors) could just manipulate simple data and transmit it without the need to store and retrieve it, memory is nonetheless found in almost all computing devices.
Since most devices must retrieve and store data in order to effectively perform a computation, the speed of each retrieval and each store represents an upper limit on the total speed of a computation. This observation evidences how strict the relation between computations and memory is (memory taken as a whole, for the time being, without making any distinction): as the market asks for speed, this request naturally reflects on memories.
The same happens with the other requests: both those related to computational power and those related to power efficiency naturally reflect on memories.
A further fundamental feature considered when evaluating memories is density: since their purpose is the retention of data through time, a critical aspect is how much data can be contained in each memory chip. Given the technology trends just presented, this issue is expected to increase in centrality: increasingly complex computations need to manage increasing quantities of data. The pervasive and increasing presence of computing devices also generates incredibly high amounts of data, increasing the mass of data that potentially should be stored somewhere. Bigger and bigger data centers, along with exascale supercomputing, need to manage, store and retrieve huge amounts of data too. "Big data" intrinsically refers to the need to manage a huge and ever increasing amount of data. All these requirements urge the semiconductor industry to develop denser memories to manage this growth.
The need for persistence
A further remark should pertain to persistence: as persistent memories represent the triggering reason for this work, the question naturally arises of whether either the semiconductor industry or the market is demanding persistent memories.
Persistent-memory devices would be just perfect as a storage medium. Until now, storage has always deeply suffered from the classical dichotomy of fast-volatile and slow-persistent: storage has always been slow. Moreover, computers, sooner or later, must use a storage layer to save permanently the data that they manipulate. At this exact point, as computing devices need to use storage, computations pay a high price in latency: storage slowness represents an upper limit on the performance of storage-related operations.
Referring to hard disks, while their capacities have increased more than 10,000-fold since the 80s, seek time and rotational latency have improved only by a factor of two, and this trend is expected to continue [112, 115]. So, as the need for data storage increases, the problems related to storage speed are expected to increase too. Fast and persistent memories would thus give the opportunity to overcome these limitations, permitting storage to become, at last, extremely fast: this achievement would represent a major innovation in many computing areas.
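To give a rough sense of the gap being discussed (indicative orders of magnitude, not figures taken from this work): a DRAM access is on the order of $100\,\text{ns}$, while a hard disk access is on the order of $10\,\text{ms}$, so
$$\frac{t_{\text{HDD}}}{t_{\text{DRAM}}} \approx \frac{10 \times 10^{-3}\,\text{s}}{100 \times 10^{-9}\,\text{s}} = 10^{5},$$
i.e. roughly five orders of magnitude separate main memory from persistent storage; this is precisely the distance that fast persistent memories would shrink.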
As an example, a likely area of use among many would be data centers, where caching servers are extensively used to speed up data retrieval from storage. Persistent memory would represent at least a big simplification opportunity, as the caching server would become unnecessary: in turn, this simplification would represent an important opportunity for savings in cycles and energy and, ultimately, in money.
From an engineering standpoint, moreover, persistent memories would give engineers a useful element of choice, whereas today there is no choice: either speed or persistence. Consequently, they would have the opportunity to build devices better suited to the high and frequently changing needs of the market.
A last observation pertains to reality: as a matter of fact, most of the technologies that are currently under extensive research are persistent.
Memories – demand and supply
While the semiconductor industry is currently producing high volumes of DRAM and Flash memories, as well as high volumes of hard disks, these industries are nonetheless steadily preparing a transition towards new technologies: the current ones suffer from limitations that are already challenging. Analysts and the semiconductor industry itself expect that in the next years those limitations will become overwhelming, eventually frustrating both the supply and the demand side of the economic relation:
The supply: current memory technologies have always benefited from technology scaling: it suffices to think how long technologies such as hard disks (1956) and DRAM (1968) have been on the market. However, current memory technologies are reaching a point where further scaling is no longer easily achievable: while the reasons will be unfolded subsequently, the fact is that the semiconductor industry expects that such technologies (DRAM, Flash, hard disks) will become increasingly difficult to produce and to enhance, thus losing appeal.
The demand: current memory technologies are either fast and not dense, or dense and not fast. This fact, although frustrating, has always been accepted as a matter of fact. However, since the market is now demanding both speed and density, current technologies are becoming increasingly unfit to fulfill such requirements. Moreover, even an increase in speed alone or in density alone is becoming increasingly hard to reach. The issue of power efficiency is only loosely addressed by current technologies: unfortunately, each of them is power hungry. As an example, DRAM currently accounts for about 30% of the total power consumption in a data center [30].
Summarizing, since current technologies are expected to reach their limits soon, the semiconductor industry is currently searching for new memory technologies that would allow both the continuation of the exponential growth and of the scaling trend for a long time. Those technologies, which will subsequently be referred to as prototypical and emerging, have the potential to succeed in this fundamental goal. Among them, some will prevail against others; some maybe will never be produced, whereas others will eventually succeed and become mainstream. In the case of a successful technology, anyway, that technology will have assured both the maximum achievable profitability to the producing businesses (as a mix of right timing, ease of production, low costs, minimum cost per function, good know-how, etc.) and the fulfillment of the requests coming from memory consumers, be they individuals or organizations.
1.1.2 A technological view
Leaving the economic considerations behind, this section introduces the taxonomies used to present current and future technologies, as well as the technical parameters used to describe the peculiarities of each specific technology.
From a hypothetical perfect memory to the memory hierarchy
As happens in any scientific field, a given resource is evaluated by measuring the score of some evaluating parameters; thus, a perfect resource would maximize the score of each feature as if the features were free variables. In the real world, however, most of the time some of the features are not free variables but depend on each other: improvements in some of the variables often come at the cost of some other variable. Since no perfect resource exists, the same happens with memories; if, however, a perfect memory existed, it would maximize each of the following features:
- quantity;
- speed;
- low cost;
- low power consumption;
- data retention;
- write endurance.
In real memories, however, some of these features (especially quantity, speed and cost) are mutually dependent, and one is usually increased at the cost of another: fast memories are expensive and slow memories are cheaper, and the quantity depends on the compromise between speed and cost. The reason for this correlation lies in the fact that technologies that focus on speed consume a bigger chip area than those that focus on quantity. In turn, this influences data density per area: slower technologies achieve better data density than faster ones, and thus a better cost per bit.
A modern computer (as well as a modern data center) can run and solve the same class of problems as a Turing machine [36]. In Turing machines memory exists in the form of an infinitely long tape containing infinite cells. Even if memory in that model is very simple, it is indeed infinite. Somehow relating to that model, also in modern computing systems, no matter their size, memory (taken as a whole, not in the sense of the "memory" concept of the Von Neumann model) can be thought of as infinite. For example, disks, tapes and DVDs can be indefinitely added, switched and removed, effectively, although indirectly, creating an infinite memory. The memory hierarchy, as presented in figure 1.1, visually represents the consequence of the issues just presented: needing an infinite memory and having a limited amount of money, computers are necessarily engineered to use a few fast memories and a lot of slower but cheaper memories, with the intent of maximizing both performance and capacity while minimizing costs.
Figure 1.1: The ubiquitous memory hierarchy
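The reasoning behind this arrangement can be made concrete with the textbook effective access time estimate (a standard formula, not one taken from this thesis): if a fraction $h$ of the accesses is served by the fast level, then
$$T_{\text{eff}} = h \cdot T_{\text{fast}} + (1-h) \cdot T_{\text{slow}},$$
so that, as long as the hit ratio $h$ stays high, the hierarchy behaves almost as if it were entirely made of the fast (and expensive) memory, while most of the capacity is provided by the slow (and cheap) levels.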
Current memory technologies will be presented subsequently following the structure of figure 1.1, starting from the base of the pyramid and gradually reaching the top. The new ones, instead, will be presented without referring to the memory hierarchy pyramid: the taxonomy drafted by the ITRS will be used instead, as represented in figure 1.2, believing that it is more helpful for classifying the technologies that will soon be presented.
Figure 1.2: ITRS Memory Taxonomy
The memory cell and its performance parameters
Electronic technologies that do not use mechanical moving parts are commonly defined as solid-state technologies. In the context of solid-state memory technologies a key concept is that of the "memory cell". A memory cell is the smallest functional unit of a memory used to access, store and change an information bit, encoded as a zero or a one. Each memory cell contains:
- The storage medium and its switching mechanics: where the bit is encoded and the mechanism that executes the switch between 0 and 1.
- The access mechanism: the mechanism that selects the correct memory cell.
This concept could also be used, although less usefully, in the case of mechanical technologies: in that context, the definition of memory cell refers to the storage medium only, since in mechanical technologies both the switching and the access mechanics are usually shared among the whole set of memory cells.
Referring thus to solid-state memory technologies, each technology has a specific memory cell, with specific performance. The parameters usually measured to compare a specific technology with the others (a compact representation is sketched after this list) are:
- feature size (length F, in µm or nm);
- cell area (measured in F²);
- read latency (time: µs or ns);
- write latency (time: µs or ns);
- write endurance (scalar: maximum write cycles);
- data retention (time);
- write voltage (V);
- read voltage (V);
- write energy (pJ/bit or fJ/bit);
- production process (CMOS, SOI, others);
- configuration (3-terminal or 2-terminal);
- scalability.
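As a reading aid, these parameters can be grouped into a simple record; the sketch below is purely illustrative (the field names and units are mine, not those of the ITRS tables):

/* Illustrative grouping of the cell-level parameters listed above.
   Field names and units are assumptions made for this sketch. */
enum terminal_config { THREE_TERMINAL, TWO_TERMINAL };

struct memory_cell_params {
    double feature_size_nm;      /* F, in nanometers                      */
    double cell_area_f2;         /* cell area, in units of F^2            */
    double read_latency_ns;      /* read latency                          */
    double write_latency_ns;     /* write latency                         */
    double write_endurance;      /* maximum write cycles (e.g. 1e5)       */
    double retention_years;      /* data retention                        */
    double write_voltage_v;      /* write voltage                         */
    double read_voltage_v;       /* read voltage                          */
    double write_energy_fj_bit;  /* write energy per bit                  */
    enum terminal_config config; /* 3-terminal (e.g. Flash) or 2-terminal */
};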
Inside the memory cell
Despite the fact that each specific memory technology has its own memory cell details, different technologies can share similar approaches in some engineering aspects: the following insights could be useful to the reader in order to better follow the terms and the descriptions of each specific technology given subsequently.
The storage unit and the write/read logic: the storage unit is responsible for the effective data storage and retention. Each technology defines, along with the storage unit, the mechanics (and the logic) to write and to read the data. There are two fundamental methods used to store data in solid-state memory technologies:
- As 3-terminal storage units: this approach uses modified Field Effect Transistors (FETs) to store data³. Data is stored by modifying (raising or lowering) the threshold voltage of the transistor, thus influencing the current passage between the source and drain electrodes while the potential difference applied to the gate electrode is kept fixed. In such devices, reading is usually performed by sensing the current flow at the drain electrode while applying a potential difference to both the source and the control gate: depending on the value of the threshold voltage previously set (and thus on the value of the bit stored), current flow is permitted or prevented.
- As 2-terminal storage units: technologies using this approach usually build each storage unit as a stack of one or more different materials enclosed between two electrodes (terminals). Storage units built this way usually have a resistive approach: data is read by sensing whether a probe current passes (if SLC) or how much current passes (if MLC) between the two electrodes. The writing mechanics, however, depends on the specific technology: some technologies must execute the write process using additional logic (e.g. standard MRAM, see section 1.3.1), whereas some newer ones feature the writing mechanics directly embedded into the storage units (e.g. all ReRAM technologies, see section 1.3.2). In particular, this last class of newer devices presents a configuration similar to that of fundamental electric devices like resistors, capacitors and inductors, all featuring the 2-terminal approach, and such devices are sometimes referred to as "memristors" (see section A.2).
Sometimes, a specific physical property can be used to implement devices in both the 2-terminal and the 3-terminal configurations, as is the case of memory cells built using ferroelectric properties (FeFETs: 3-terminal; FTJs: 2-terminal).
³ FETs have a source, a drain and a gate electrode; see section A.2.
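To illustrate the resistive read idea for 2-terminal cells, here is a toy sketch in C (my own simplification, not a real sensing circuit or driver): the sensed probe current is compared against one threshold in the SLC case, or against several current windows in the MLC case. All threshold values are invented for illustration.

/* Toy model of reading a 2-terminal resistive storage unit: the sensed
   current distinguishes the high-resistance state (HRS) from the
   low-resistance state (LRS). Thresholds are illustrative only. */
#define SLC_THRESHOLD_UA 10.0              /* probe current threshold, microamps */

/* SLC read: LRS (high current) -> 1, HRS (low current) -> 0. */
int read_slc_cell(double sensed_current_ua)
{
    return sensed_current_ua > SLC_THRESHOLD_UA ? 1 : 0;
}

/* MLC read: several current windows map to a 2-bit symbol (0..3). */
int read_mlc_cell(double sensed_current_ua)
{
    const double thresholds_ua[] = { 5.0, 15.0, 30.0 };  /* illustrative */
    int level = 0;
    for (int i = 0; i < 3; i++)
        if (sensed_current_ua > thresholds_ua[i])
            level = i + 1;
    return level;                          /* two bits of information per cell */
}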
The access control: a memory uses memory cells as building blocks, but there must be a way to select single memory cells in order to execute a read or a write. Usually, two alternative approaches can be used:
- Active matrix: a transistor is used to access the storage unit. These technologies are usually referred to as 1T-XX technologies, where 1T stands for "1 transistor" (the controlling transistor) and the "XX" part depends on the specific technology. This is the case for DRAM, a 1T-1C technology (C stands for capacitor);
- Passive matrix (crossbar): the storage unit is accessed almost directly, with at most the single indirection of a non-linear element, used to avoid half-select problems.
Destructive vs non-destructive reads: some technologies suffer from the destruction of the data contained in a memory cell when a read is performed: these read operations are called "destructive reads". Such technologies usually have additional logic that re-writes the same data after the read operation, to prevent data loss. Obviously this is an issue that engineers would avoid where possible, thus preferring those technologies whose read operations are non-destructive. Needless to say, writes are intrinsically always destructive.
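From the controller's point of view, a destructive read therefore behaves like a read followed by a restoring write. The following self-contained C sketch is only a schematic model of that logic (the "device" is just an array standing in for the cell array; none of this is a real controller API):

/* Schematic model of a destructive-read technology: the value must be
   written back after every read to preserve it. */
#include <stdio.h>

static int cells[16];                      /* toy cell array */

static int cell_read_raw(unsigned addr)    /* destructive read */
{
    int value = cells[addr];
    cells[addr] = 0;                       /* reading destroys the content */
    return value;
}

static void cell_write(unsigned addr, int value)
{
    cells[addr] = value;
}

/* Read as exposed to the rest of the memory: read, then restore. */
static int read_with_restore(unsigned addr)
{
    int value = cell_read_raw(addr);
    cell_write(addr, value);               /* write-back hides the destruction */
    return value;
}

int main(void)
{
    cell_write(3, 1);
    printf("first read:  %d\n", read_with_restore(3));
    printf("second read: %d\n", read_with_restore(3)); /* still 1 */
    return 0;
}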
1.2 Technology – the present
The memory technologies that are currently mainstream and can be used by engineers in computing devices will now be presented. For each technology, the reader finds a brief explanation of its specificities, as well as a short discussion of the claims of the scientific community about its limitations.
1.2.1 Mechanical devices
Magnetic tapes, CD-ROM, DVD, Blu-ray Disc
Even if not very interesting in the context of this work, it is worthwhile to mention these devices, as they represent the lower part of the memory hierarchy. They are specifically engineered to store data permanently through I/O operations at the lowest possible cost. At this level of the memory hierarchy, performance is not as important as storage capacity.
Hard disks
Data is stored persistently in a magnetic layer applied on the surface of one or more rotating discs. This technology is similar, mutatis mutandis, to the one used in vinyl music discs: a moving head follows concentric rings (tracks) to read the stored data. Data is retrieved (or written) by the head sensing the magnetic field coded into the magnetic layer. Data transfers are I/O bound, data is stored and accessed in blocks, and reading is not destructive.
Even if the first hard disks appeared in 1956 (IBM RAMAC [79]), this technology is still alive and vital: it offers high storage density, a long data retention period, and a low price per bit. Hard disks are often equipped with some amount of cache memory, needed to raise their performance. This same effort to improve the performance of hard disks has led to "hybrid hard disks", i.e. fast performing hard disks with a Flash cache [42]. These products attempt to approach Flash-like performance at the price of a common hard disk.
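Because every access involves moving the head and waiting for the platter to rotate, the access time of a hard disk can be roughly decomposed as follows (a standard back-of-the-envelope estimate with indicative figures, not measurements from this work):
$$T_{\text{access}} \approx T_{\text{seek}} + \frac{T_{\text{rotation}}}{2} + \frac{B}{R}.$$
With, say, $T_{\text{seek}} \approx 4\,\text{ms}$, a 7200 rpm spindle ($T_{\text{rotation}}/2 \approx 4.2\,\text{ms}$) and a 4 KiB block transferred at $R \approx 150\,\text{MB/s}$ (about $0.03\,\text{ms}$), the mechanical terms dominate and the total stays in the millisecond range, regardless of how fast the interface is.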
The memory hierarchy pyramid describes at a glance both the advantages of hard disks and their most noticeable shortcoming: high density at the cost of slow speed. Despite the age of this technology, scaling still seems viable, as densities per square inch keep increasing: while current ones are about one TB per platter, newer technologies promise to go even higher [97, 88]. Unfortunately, hard disks have always suffered from slow transfer rates and very high (millisecond) latencies; moreover, these undesirable aspects cannot be bypassed without workarounds: the mechanical nature of the hard disk is an intrinsic limit (the head has to physically reach the position at which reads and writes are executed). The mechanical nature of HDDs has other drawbacks: the rotation of the platters, the movement of the head over the disks, and the read or write process can all be sources of failure, due either to physical breakage or to external causes such as vibrations and accidental falls [133]. As for power efficiency, hard disks consume a high amount of electrical power: consumption can range from around 1.5 W to around 9 W each⁴. Consequently, referring to the needs previously described, hard disks do not seem to comply with those high claims: while they seem just perfect for long-term and high-volume data retention, they lose appeal when high throughput, low latencies and power efficiency are needed.
⁴ Consumption of, respectively, a consumer low-power HDD [109] and an enterprise HDD [105].
1.2.2 Charge-based devices
This class of devices, all solid-state, uses the electric charge of electrons to store bits in the memory cells: hence their name. Each description summarizes the technological aspects that enable bit storage and the switching mechanics, along with a short summary of the issues specific to each technology. The issues common to the whole class of devices are instead treated afterwards.
NAND and NOR Flash memories
This technology is based on the ability of building �enhanced� �eld e�ect transistors
to achieve the desired behavior, as it happens in EPROMs and in EEPROMs [50,
pp. 9-4]. Memory cells using this technology feature a persistent storage achieved
through a 3-terminal con�guration: data is stored modulating the threshold volt-
age of the �enhanced FET� transistor. These transistors are similar to standard
FETs, except for the fact that there are two gates instead of just one (control
gate and �oating gate). The control gate (the upper one) is the same as in FET
technology, whereas the �oating gate (the lower one) acts as an electron vessel:
being made of a conductive material, it can contain ��oating� electrons thanks to
the insulator layer that encloses it (oxide). The threshold value (and hence the
contained data) is modi�ed ��lling� or �emptying� the �oating gate with electrons5.
In a SLC con�guration, a memory cell with high threshold voltage would be in a
�programmed� status (0, non conducting, vessel full); vice versa, if the threshold
was low, it would be in an "erased" status (1, conducting, vessel empty). Reading
is performed as previously described for 3-terminal memory cells, and the read
process is not destructive, as it involves neither the erase nor the program operation.
Flash technology is configured as a "1T technology", as each memory cell has exactly
one transistor.
4Consumption of, respectively, a consumer low-power HDD [109] and an enterprise HDD [105].
5The "filling" is performed through Fowler-Nordheim tunneling or hot-carrier injection, whereas "emptying" is usually achieved through Fowler-Nordheim tunneling. See [50].
Figure 1.3: The Flash memory cell. © TDK Corporation
Depending on how the Flash memory cells are linked together, either NAND Flash
or NOR Flash is produced. Whichever the case, before being programmed each cell
must be in the erased state. In both configurations, the erase operation is slow
(milliseconds), expensive (high power) and performed on groups of many bytes,
called the "erase size". Reading is not a destructive operation; NOR Flash can be
either I/O bound or memory bound, whereas NAND is only I/O bound.
Research efforts in this technology are extensive, even though it can be considered
mature: the use of a modified field-effect transistor as a memory device dates back
to papers published in 1967 [138]. Those studies later led to the development of
EPROM and EEPROM, whose main principles are further exploited in Flash technology,
conceived through Dr. Masuoka's research in the early 1980s [49, 124]: NOR Flash
was officially presented in 1984, NAND in 1987 [98].
Flash memories can be used to build a large number of memory devices: when
used to build SSDs, performance is clearly better than that of hard disks. The
latency of Flash memory cells lies between tens and hundreds of microseconds:
Flash SSDs in particular offer a significant speedup over common hard disks,
usually providing better latencies and higher throughput. Their power-efficiency
improvement over hard disks is yet to be verified: while commercial documentation
confidently claims a clear power benefit over HDDs, the actual figures found in
datasheets are less clear6. Flash memories, being solid-state, have no moving
mechanical parts, which avoids mechanical failures. Flash memories certainly
represent a first step in the direction demanded by customers. However, Flash
memories are far from perfect. Common issues are:
Cell wearing, low endurance: currently in the order of 10^4-10^5 write cycles.
This problem is rooted both in the write/erase process and in the materials used
to build the Flash transistor: both erasing and programming are achieved
using high energies in order to force electrons through the thin oxide
insulating layer (tunnel oxide). This process gradually damages the oxide
layer, eventually causing the loss of its insulating properties: since the
floating gate is made of a conductive material, the damage causes the loss
of the contained electrons [61, 13]. In order to guarantee a long life to
devices employing Flash memories, wear-leveling strategies must be adopted
to distribute writes and erases across the whole set of cells and to avoid
concentrating these operations on a few of them.
Low reliability: the NAND Flash configuration is the most used in SSDs, mainly
because of its better achievable density. This, however, has a cost in reliability:
NAND Flash devices suffer from both read disturbs and program disturbs when
reading and writing; moreover, these devices leave the factory without the
guarantee that all cells are in optimal status. For these reasons, ECC functions
are needed when using Flash devices, especially in the NAND configuration. Such
functions usually increase the complexity of either the hardware or the software,
thus they have a cost [102].
Complex writing mechanism: Flash memories follow a rather involved mechanics:
in order to be programmed, each cell must be in the erased state. Moreover,
erases are expensive and erase sizes are considerable (8K to 32K): this forces
each SSD to keep large reserves of erased blocks in order to speed up writes.
In addition, since NAND Flash is I/O bound, transfers are made at least in
whole blocks, which increases the inefficiency when only small amounts of
data change.
6For example, when comparing a consumer HDD and a consumer SSD, both in the 2.5-inch form factor, the SSD's advantage is apparent when idle or in standby mode, whereas in read or write mode the SSD can consume more energy (∼3 W vs ∼1.75 W, see [96, 109]).
All these issues are usually managed in software layers (either in the operating
system or in the SSD firmware) called Flash Translation Layers (FTLs), whose job is
to hide such problems from the computer and to efficiently manage the needed wear
leveling, error correction and, in general, failure avoidance. However, while these
layers do succeed in simulating a standard hard disk, all this complexity is
expensive and can easily be a source of performance loss.
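The following sketch gives a rough idea of the bookkeeping an FTL performs; it is a toy model under strong simplifying assumptions (one page per erase block, no ECC, no garbage collection), not the design of any real FTL. Writes are always redirected to the least-worn erased block, which combines the erase-before-program constraint with a basic form of wear leveling.

    #include <stdint.h>
    #include <string.h>

    #define PAGES      1024        /* logical pages exposed to the host           */
    #define BLOCKS     1280        /* physical blocks, including spare ones       */
    #define PAGE_SIZE  4096

    static uint8_t  flash[BLOCKS][PAGE_SIZE];
    static int      l2p[PAGES];    /* logical -> physical mapping                 */
    static int      erased[BLOCKS];/* 1 if the block is erased and writable       */
    static unsigned wear[BLOCKS];  /* erase counter, used for wear leveling       */

    void ftl_init(void)
    {
        for (int p = 0; p < PAGES; p++)  l2p[p] = -1;
        for (int b = 0; b < BLOCKS; b++) erased[b] = 1;
    }

    /* Pick the erased block with the lowest erase count (wear leveling). */
    static int least_worn_erased(void)
    {
        int best = -1;
        for (int b = 0; b < BLOCKS; b++)
            if (erased[b] && (best < 0 || wear[b] < wear[best]))
                best = b;
        return best;                /* assumes a spare erased block always exists */
    }

    /* Out-of-place update: a rewrite never reprograms the old block in place. */
    void ftl_write(int lpage, const uint8_t *data)
    {
        int new_blk = least_worn_erased();
        memcpy(flash[new_blk], data, PAGE_SIZE);
        erased[new_blk] = 0;

        int old_blk = l2p[lpage];
        if (old_blk >= 0) {         /* erase the stale copy: slow, and it wears the cell */
            wear[old_blk]++;
            erased[old_blk] = 1;
        }
        l2p[lpage] = new_blk;
    }

Real FTLs add error correction, bad-block management and background garbage collection on top of this mapping, which is precisely the complexity the paragraph above refers to.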
Dynamic RAM
DRAM technology is the main technology currently used by computing systems
(from the smaller smartphone to the biggest supercomputer) to implement what
Von Neumann called the �memory� in his well-known model. Dynamic RAM
memory cell currently consists in one control transistor and one capacitor (1T-1C
technology). These cells are then organized in grids. When the line is opened
through the transistor, charge is free to exit the capacitor if the capacitor is
charged (meaning that its value is 1; the value is 0 if the capacitor is empty).
Due to this design, the reading operation is destructive: the loss of data is
nonetheless avoided with a re-write, at the cost of some additional complexity,
whenever the cell was charged. Again, due to this design, the memory is volatile
because capacitors discharge quickly: the memory cells need to be refreshed to
retain their data, i.e. the capacitors have to be recharged periodically (typically
every cell must be refreshed within a window of about 64 ms): when the computer is
off, all data is lost. Data read and write operations are fast (memory latency is
in the nanosecond range, i.e. one or two orders of magnitude slower than the CPU)
and each byte is directly addressed by the processor.
Surprisingly, a memory composed of capacitors that needed to be continuously
refreshed was present in a machine called �Aquarius� built in Bletchley Park (UK)
during World War II [45]. However, the 1T-1C design dates back to 1968 when Dr.
Robert Dennard registered US patent no. 3,387,286 [24] and improved previous
design that required more components.
The limits of DRAM are to be ascribed principally to its low density and high
energy cost, as explained before. The scientific literature notes that as density
increases, the total refresh time of each chip also increases, causing high
overheads in both latency and bandwidth [37]. Another issue related to DRAM is
that its growth in density cannot keep pace with that of CPU cores: while in
recent years CPU core counts doubled every two years, DRAM density doubled only
every three years, so the memory-per-core ratio keeps shrinking [55]. Even if the
speed of DRAM is high, its latency remains a bottleneck for the even higher speed
of the CPU: latency improvements over time have been minimal (only 20% in ten
years), and this slow improvement trend is expected to continue in the future.
Just when the demand for speed is so pressing, this technology seems to have
difficulties sustaining the performance demanded by current processors.
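A back-of-the-envelope sketch of the refresh overhead mentioned above, using assumed figures (8192 rows per bank, 50 ns per row refresh, the 64 ms retention window); real values depend on the DRAM generation, but the calculation shows why higher density, i.e. more rows, translates directly into more time spent refreshing.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative figures only; actual values vary per DRAM generation. */
        double rows_per_bank = 8192.0;  /* rows to refresh within the window        */
        double trfc_ns       = 50.0;    /* time one row refresh keeps the bank busy */
        double window_ms     = 64.0;    /* retention window                         */

        double busy_ms = rows_per_bank * trfc_ns / 1e6;
        printf("refresh keeps the bank busy %.2f ms out of %.0f ms (%.2f%%)\n",
               busy_ms, window_ms, 100.0 * busy_ms / window_ms);
        /* ~0.41 ms out of 64 ms, i.e. ~0.6%; doubling the rows doubles the overhead. */
        return 0;
    }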
SRAM
Static RAM is the fastest and most expensive memory used outside the CPU cores:
SRAM is usually located on the same die as the CPU, in any case as near as
possible to its cores, and it mostly serves as the hardware data and instruction
cache. It is also common to find SRAM in other caches, such as those of hard
disks, routers and other devices. These memories are volatile, although their
design does not require refreshing as DRAM does. The classical design is 6T7;
this is a major issue since it increases the cost of a single memory cell and
limits density and scalability.
CPU Registers
CPU registers represent the highest level of the memory hierarchy. These mem-
ories are completely integrated into the CPU core, are used at full speed and
are set aside from the standard addressing space: registers are directly accessed
by name. Usually registers are used to store temporary data between load and
store operations. Since they are fully integrated into CPU cores, their number
is very limited: every additional register means less space for computational
functions. Information stored in registers is lost when power is off.
7A SRAM cell is built from six transistors. See [72].
1.2.3 Limits of charge-based devices
Besides the specific limits of each technology just described, charge-based
technologies share some common issues: the most important problem currently faced
by researchers and engineers is the technical concern about scaling: since feature
sizes have now reached 28 nm in DRAM and 16 nm in NAND Flash cells, researchers
wonder for how long even smaller sizes will remain achievable [94, 108].
In fact, at such small sizes:
- memory cells are very close to each other: the risk of cell-to-cell interference
is high;
- the total number of electrons that can effectively be stored either in a capacitor
or in a floating gate is small: if the electrons are too few, current technologies
cannot sense their level correctly. Moreover, in the case of capacitors, a small
capacitor means a small charge and this, in turn, means higher refresh rates;
- each functional element of the memory cell is very small and very thin, thus
the risk of electron leakage is higher.
The semiconductor industry is currently trying to extend the life of these
technologies for as long as possible. Such efforts have many aspects in common
across charge-based memories; the most used approaches are:
- Better materials: this approach permits obtaining better properties while
keeping the cell design unchanged. This is, for example, the case of high-k8
materials and production processes. A similar approach is used in Flash
memories, where the floating-gate conductor is replaced with an insulator
(different from the oxide) able to "trap" electrons inside it: this particular
approach provides better resilience to tunnel-oxide damage, as well as better
isolation against cell-to-cell interference.
8"k" is the dielectric constant. High values permit better insulation and electrical properties [101].
- 3D-Restructuring: this approach is indeed a major modi�cation to the
structure of each memory cell, even if the functional logic of each technology
does not change. 3D-stacked semiconductors are the object of extensive
research e�orts, as this particular approach would permit an important life-
cycle lengthening of both DRAM and Flash technologies. The main ad-
vantage of 3D structures is that memory cells can be stacked up vertically.
Vertical stacking permits both an optimized use of the chip area (density
optimization) and it allows higher distances between memory cells (in order
to avoid cell-to-cell interference) [108]. Currently, Samsung is already pro-
ducing SSDs using 3D vertically-stacked Flash memory cells that use charge
trap technology instead of the classic �oating gate [96]. Referring to DRAM,
3D-restructuring is currently applied into a prototypical technology called
Hybrid Memory Cube [78]. The promises of this particular technology are
high: speed and performances near to those of the CPU, high power e�ciency
and a much better density.
Despite these extensive efforts to delay the retirement of charge-based
technologies, the semiconductor industry nonetheless firmly expects that, sooner
or later, these technologies will become too difficult to produce and enhance.
Following this expectation, the technologies presented next are the object of
substantial research efforts, which aim to finally obtain products that can both
successfully replace current technologies and fully meet the ever higher
expectations of the market in the years to come.
1.3 Technology – the future
The next paragraphs present the technologies that will compete to become the
ones used in the next-generation memory hierarchy. Before delving into the
details of each specific technology, it is worth noting that most of them share
some common features:
- Charge-based approach is being dismissed: instead, the resistive ap-
proach is preferred. Technologies like DRAM and Flash memory use �elec-
trons containers� (capacitors or any material able to retain electrons) to
encode the information bit into the memory cell. However, this approach
has the disadvantages just shown. Research is thus preferring the resistive
approach: the information bit is encoded as a property of a specific switching
material: high resistance or low resistance status. There is therefore no need
to store electrical charge: electrons are used just to check the memory cell
status. This approach permits better performances and better scalability.
- Persistence: each of these new technologies is not volatile; they are able to
remember data when power is o�, as it happens for SSDs or HDDs. Some
of them still cannot guarantee a long retention time, but these are prob-
lems related with the early engineering and development stages: these new
memories are engineered to be persistent.
- These technologies use the word "RAM" in their name: this is indeed a
clue that these technologies are both approaching the speed of RAM
(sub-microsecond speeds) and seem well suited to be implemented as byte
addressable memories, as happens with common DRAM.
- Density is expected to be higher than that of DRAM: some of these
technologies, especially those in prototypical stages of development, still have
problems achieving such a goal, as their cell area in square features is too
high. Generally, however, these new technologies promise a better density
(featuring the very reduced area of 4F^2, see table B.2).
- Endurance is limited: most of these technologies have a limited endurance
with respect to that of current DRAM, but better than Flash, as the most
wearable among them features at least 10^9 write cycles.
- R/W asymmetry: most of these new technologies feature different timings
between read (faster) and write (slower) operations, as happens with Flash
memories.
1.3.1 Prototypical
Technologies in a prototypical development stage are already being commercially
produced even though the technology is not mature. Production volumes are low
and, as a consequence, prices are high. Prototypical products are often used in
niche markets, and usually suffer from the fact that some of their evaluation
features have still not reached the target levels. The research efforts undertaken
to obtain better performance, higher densities and, more generally, a product
ready to be produced in high volumes are usually extensive.
Ferroelectric RAM (still not resistive)
FeRAM, or FRAM, uses ferroelectricity (see section A.2) to store the information
bit into a ferroelectric capacitor, able to remember its bistable polarization status
in time. The capacitor acts as a dipole, whose polarization is changed under
the e�ect of an electrical �eld. One of the two polarization status is logically
associated to a �0� value, whereas the other is given a logical meaning of a �1�.
As a consequence of this switching mechanism, this memory still does not use the
resistive approach. This technology uses the active matrix configuration to access
the capacitor through a transistor: it is a one transistor - one ferroelectric
capacitor (1T-1FC) architecture.
Figure 1.4: Ferroelectric crystal bistable behavior. © Fujitsu Ltd
Similarly to the mechanism used in DRAMs, reading is destructive: to my knowledge,
the literature never describes the read process in detail, beyond remarking on its
destructiveness.
Current issues related to this technology concern scalability and the production
process in use: as noted in table B.2, feature sizes are still at the 180 nm node,
and the cell area of 22F^2 is so large that scaling becomes problematic. Another
problem is cell wearing, since repeated switching degrades the cell's performance
over time (dipole relaxation).
Currently this memory is used in some embedded computing devices.
MRAM and STT-MRAM
This technology, also referred to as Magnetic RAM, uses memory cells built in a
one transistor - one magnetic tunnel junction (1T-1MTJ) architecture (see
section A.2). The MTJ element acts as a resistive switch (like the ones shown
subsequently in ReRAM, PCRAM and FTJ RAM), encoding a bit as a different
resistance status.
Figure 1.5: Magnetic Tunnel Junction. © Everspin Technologies Inc.
The technology used to program MTJ elements differs between the classic MRAM
design and the STT-MRAM design. The classic design uses magnetic fields induced
by currents passing through nanowires [66]. STT-MRAMs instead use the Spin
Transfer Torque effect to change the magnetic polarization of the free layer [4,
119].
The memory cell is read by sensing the resistivity of the MTJ element, detecting
whether current passes: the read operation is thus not destructive, since the
status of the free layer does not change upon reading. Even if some of the ideas
used in MRAM date back to those developed for magnetic core memory in the 40s and
50s [86], MRAM is based on newer research in physics, undertaken from the 1960s
onwards, about what is now called spintronics [63, 139], culminating in 1988 with
the discovery of Giant Magneto-Resistance by Albert Fert and Peter Grünberg [100].
The first patent on MRAM technology dates to 1994 (IBM - US patent no. 5,343,422).
Phase Change RAM (PCRAM � PCM)
During the 50s, Dr. B. T. Kolomiets conducted his research work on chalcogenide
glasses, verifying their ability to change from an amorphous state to a crystalline
state under the e�ect of heating [43]. The crystalline state presents low electrical
resistance and high light re�ectivity, while the amorphous state presents the op-
posite features: high electrical resistance and low light re�ectivity. The ability to
change state as just depicted was subsequently referred as �phase change�. Phase
Change RAM indeed exploits the resistivity di�erence between the two states as-
sumed by a phase change material (the most commonly used compound is GeSbTe,
or GST) to encode the logic "0" and "1" levels. Each memory cell in PCRAM consists
of a chalcogenide layer enclosed between two electrodes. The top electrode is
directly attached to the chalcogenide layer, whereas the bottom electrode is linked
to the phase change layer via a small conductive interconnect surrounded by an
insulator. The conductive interconnect is responsible for switching the chalcogenide
status by means of the Joule effect, functioning as a heater. Each memory cell is
programmed using either a fast pulse of high current to "melt" the crystalline
status into the amorphous one, or a longer pulse of lower current to induce the
growth of crystals, modulating the use of the heater. Cell reading is performed
using a low probe current and sensing its flow, which depends on the cell's
resistivity. As a consequence of this design, reading is not destructive. Phase
Change RAM represents a very promising, yet prototypical, technology. The first
articles about electrical switching of phase change materials date to 1968 [113],
and a first prototype of a phase change memory was presented by Intel in 1970 [94].
However, since it was expensive and power hungry, this technology languished during
the 70s and 80s, until the current phase change design was developed, exploiting
the research efforts made during the 80s and 90s on optical storage: phase change
materials are the key to the writability and re-writability of CDs and DVDs [131].
Even if phase change technology still has to be improved to reach the expected
performance and effective profitability, working prototypes have already been
produced by Micron, Samsung and IBM; phase change memories are also used in some
very high performing PCI Express SSDs [1].
(a) PCM switching mechanics© Am. Chem. Soc.
(b) PCM �mushroom� type© Ovonyx Inc
Figure 1.6: Phase Change memory cell
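To make the programming mechanics more concrete, the fragment below sketches how a controller might drive a PCM cell; the pulse values are invented for illustration and apply_pulse stands in for the analog drive circuitry, so this is only a hedged model of the behavior described above, not the interface of any real device.

    /* Hypothetical pulse parameters, for illustration only: real devices use
     * current and timing values tuned to the specific GST compound.           */
    struct pcm_pulse {
        double current_ma;   /* current driven through the heater              */
        double duration_ns;  /* pulse length                                   */
    };

    /* RESET: a short, high-current pulse melts the GST and quenches it into
     * the amorphous (high resistance) state.                                  */
    static const struct pcm_pulse RESET_PULSE = { 0.60,  50.0 };

    /* SET: a longer, lower-current pulse anneals the GST back to the
     * crystalline (low resistance) state.                                     */
    static const struct pcm_pulse SET_PULSE   = { 0.30, 300.0 };

    /* Stand-in for the analog drive circuitry (not a real API). */
    extern void apply_pulse(int cell, const struct pcm_pulse *p);

    void pcm_write(int cell, int bit)
    {
        /* bit 1 -> crystalline (SET), bit 0 -> amorphous (RESET). */
        apply_pulse(cell, bit ? &SET_PULSE : &RESET_PULSE);
    }

    /* Reading uses a low probe current that cannot alter the phase: the sensed
     * resistance distinguishes the two states, so reads are not destructive.  */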
1.3.2 Emerging
Besides PCRAM, MRAM and FeRAM, new emerging technologies are currently being
developed, the best known among them being ReRAM. These emerging technologies
(briefly described below) represent a real jump in sophistication with respect to
the ones just explained. Obviously, knowledge and know-how are cumulative, and new
technologies are developed only thanks to the efforts made in earlier times;
nonetheless, the quality and depth of the scientific and technical knowledge needed
to reach the goal of commercial (and profitable) products using the technologies
described below is impressive: chemical mastery, quantum physics, nano-ionics,
materials science, nano and sub-nano scale production processes, and so on. Except
for ReRAM, these technologies are still embryonic.
Ferroelectric Memory
ITRS inserts into this emerging memory category two di�erent technologies, both
based on ferroelectricity: Ferroelectric FET technology and Ferroelectric polariza-
tion ReRAM (Ferroelectric Tunnel Junction technology � FTJ).
Ferroelectric FET or FeFETs resemble standard MOSFETs, but use a ferroelectric
layer, instead of an oxide layer, between the gate electrode and the
silicon surface [27]. One polarization state permits current passage between
source and drain electrodes, whereas the other does not. A memory based
on FeFETs would have memory cells very similar to those of Flash technol-
ogy since it would also be a 1T technology. Writing would be achieved by
changing the polarity applied on the control gate, whereas reading would be
sensed probing the current passage between the source and the drain. The
reading process therefore would be not destructive.
Real FeFETs usually employ an insulating layer between the ferroelectric
and the semiconductor in order to achieve better performances. This need
has led to various con�gurations, as MFIS or MFMIS9.
The first attempts to develop FeFET technology were made during the late 50s, and
a first patent using this approach was issued in 195710. However, more than
50 years after that first patent, this technology has proved to be decidedly
non-trivial, and it still suffers from some unaddressed problems, the biggest of
which is data retention loss. New approaches to this technology are
investigating new organic ferroelectric materials.
If improved, this technology would be interesting when applied to a DRAM-
like memory: such a solution would be much more scalable, since it would
not need a capacitor, hence reducing the minimum feature size.
Ferroelectric Polarization ReRAM - FTJ ReRAM uses a memory configuration
similar to that commonly called ReRAM, but the memory cell uses FTJ
technology to encode information bits persistently and to permit
non-destructive reads (in contrast to classic FeRAM technology, see
section A.2). As stated in the 2013 ITRS report on emerging memory devices,
"Although earlier attempts11 date back to 1970, the first demonstrations of
TER came in 2009" [84]. This memory technology is thus at a very early stage
of development, and the literature reflects this. No industry player is yet
producing such a technology, even as prototypes.
9Respectively, Metal-Ferroelectric-Insulator-Semiconductor and Metal-Ferroelectric-Metal-Insulator-Semiconductor.
10US patents 2,791,758, 2,791,759, 2,791,760 and 2,791,761.
11To model tunnel electro-resistance (TER), the mechanism used in FTJs (ed.)
Resistive RAM (ReRAM) � Redox memories
The "resistive" term suggests that this technology, as in the case of MRAM, uses
resistivity to encode data into memory cells: each cell has an RS (Resistive
Switching) element, responsible for the actual storage of data. This element, in
an SLC (single level cell) approach, encodes a "zero" as a non-conducting state
(high resistance status - HRS - RESET status), whereas a "one" is encoded as a
conducting state (low resistance - LRS - SET status). Each memory cell acts as a
building block of bigger memories, usually arranged in grids, as happens for
DRAM.
Figure 1.7: RRAM cell. © John Wiley & Sons
Resistive RAM (or Redox memory, as clas-
si�ed by ITRS) indeed is a generic term rep-
resenting a series of di�erent strategies adopted
to induce the resistance switching of the RS ele-
ment by means of chemical reactions (nanoionic
reduction-oxidation e�ects) [137].
Whatever the speci�c switching process,
each RS element is generally built as a
capacitor-like MIM (Metal-Insulator-Metal)
structure, composed of an insulating or resis-
tive material �I� (usually a thin �lm oxide) sandwiched between two (possibly
di�erent) electron conductors �M�. �M�s are sometimes referred to as �top elec-
trode� (TE) and �bottom electrode� (BE). These RS elements can be electrically
switched between at least two di�erent resistance states, usually after an initial
electroforming cycle, which is required to activate the switching property [145].
Resistive RAM represents a class of very promising, yet emerging, technologies
for the next-generation memory. Expectations are very high, even when compared to
the prototypical technologies: high endurance, long retention, extreme speed and
reliability, very low power consumption, and high scalability with relative ease
of production. Moreover, scientists and researchers claim that a high degree of
further improvement is achievable.
Resistive RAM technologies are tightly linked to the research efforts made on
thin film oxides, especially during the 60s and 70s. The first patent on a memory
technology using a "memory array of cells containing bistable switchable
resistors" dates back to 197312. However, as pointed out in Kim's paper [40],
that technology languished until the late 90s and 2000s, when a newer approach
was proposed and pursued. The current scientific literature about these
technologies well describes the emerging nature of ReRAM: most papers focus on
the "core" research side. Recurring topics are the need to model correctly the
atomic behavior of the MIM compound during the resistive switch, the need to
understand thoroughly the interactions between the materials used in the RS
element, and the need to develop better laboratory tools and techniques to analyze
the resistive switch mechanics precisely. On the other hand, introductory papers
are scarce, and few papers are dedicated to implementors and computer scientists.
Nonetheless, most of the big electronics industry players are currently developing
prototypical memories based on redox memory cells, and some startup companies hold
new intellectual property based on ReRAM projects. Even if the profitability and
the related commercialization of these products seem to be quite far off, reliable
samples have already been produced using current production processes [77].
Before presenting each speci�c switching mechanism, it could be useful to spec-
ify the meaning of the following terms as they are used frequently in subsequent
descriptions:
- �lamentary / non-�lamentary: in this context, �lamentary means that
the change in resistivity into the RS element is achieved creating a �lamen-
tary conductive link between the two conductors (�M�), i.e. the passage of
current is not uniform into the RS element and the most part of the resistive
material continues to act as an insulator. Conversely, non-�lamentary mech-
anisms achieve the resistive switch uniformly into the resistive material (�I�),
i.e. the current passes through the whole volume of the resistive material
[84, section 4.1.2.1];
- unipolar / bipolar: indicates whether the specific mechanism uses one fixed
polarity between the two electrodes, or whether the polarity is inverted
between them in order to switch the cell status. In the unipolar case the
current has to be somehow modulated in order to produce the status switch. In
the bipolar case the mechanism is simpler: a polarity inversion causes the
switch between states [84, section 4.1.2.1].
There are four different approaches to redox memories: each one uses a specific
combination of the alternative features just presented.
ElectroChemical Metallization: mechanism (ECM), sometimes referred as Elec-
trochemical Metallization Bridge, �Conductive Bridge� (CB) or �Programmable
Metallization Cell� (PMC). This technology uses the �lamentary bipolar ap-
proach: one of the two electrodes is electrochemically active, whereas the
other electrode is electrochemically inert. The �I� material is a solid elec-
trolyte, allowing the movement of charged ions towards the electrodes. The
change in resistance depends on the creation of a conductive path between
the electrodes, under the e�ect of an electric �eld among them.
Figure 1.8: ElectroChemical Metallization switching process© 2013 Owner Societies
The following reactions take place under the effect of an electric field with a
(sufficiently) positive potential applied to the active electrode (RESET to SET
transition):
- Oxidation: the material of the active electrode loses electrons and
disperses its ions (M+, cations) into the solid electrolyte
(M → M^z+ + z e^-);
- Migration: the positively charged ions move towards the low potential
electrode under the effect of the high electric field;
- Reduction and electrocrystallization: on the surface of the inert
electrode the reduction process takes place, where electrons from the
electrode react with the arriving ions, forming a filament of the same
metal as the active electrode, growing preferentially in the direction of
the active electrode (M^z+ + z e^- → M).
The memory cell thus retains its SET status until a sufficient voltage of opposite
polarity causes the opposite reactions, leading back to the RESET status.
Such an approach to memory production is currently pursued by NEC (Nanobridge
technology), Crossbar (PMC) and Infineon (CBRAM).
Metal Oxide - Bipolar filamentary: the Valence Change Mechanism (VCM), like ECM,
uses ion movement to achieve the resistive switch, in this case the movement of
oxygen anions. It relies on defects in crystal structures (usually oxygen
vacancies), which are positively charged, and on the ability of anions to move
through such holes in the "I" element.
Referring to the redox feature, in this context reduction refers to the act of
recreating the original crystalline structure by filling a vacancy (usually
acquiring oxygen anions), whereas oxidation refers to the creation of a vacancy
(usually losing oxygen anions). Reduction and oxidation have the effect of
changing the atomic valence of the atoms in the crystal structure where the
change happens, hence the name "Valence Change".
The resistive switch is induced under the effect of an electric field: with one
polarity, it creates a conductive tunnel of accumulated vacancies, whereas the
opposite polarity has the effect of restoring the anions to their place.
Currently, Panasonic and Toshiba are developing ReRAM memories in their
laboratories, and samples have already been demonstrated [84].
Metal Oxide - Unipolar filamentary: the ThermoChemical Mechanism (TCM) represents
another approach used to create a filamentary conductive link between the two
electrodes of the MIM compound. This approach is somewhat similar to the one used
in Phase Change RAM: instead of a different polarity of the electric field, as
happens in VCM and ECM, in TCM a modulation of current and voltage (not of pulse
time, as in PCM) is used to induce the SET-RESET and RESET-SET transitions while
maintaining a fixed polarity.
The "I" layer does not completely prevent the passage of current: when in the
RESET status, the current flow experiences high resistivity; conversely, in the
SET status, the resistivity is low. To obtain a RESET-SET switch, a limited
current under a high-potential electric field is used: the limited current induces
the Joule effect which, in turn, triggers a redox process (similar to that of VCM)
that creates a filamentary breakdown of the oxide ("I"), leading to a conduction
channel between the electrodes and to an immediate decrease in resistivity.
Conversely, to obtain a SET-RESET switch, a high current with low voltage is used:
this current has the effect of "breaking" the conductive link, as if it were a
traditional household fuse. For this reason, TCM is also referred to as a
fuse-antifuse mechanism.
This approach still seems far from maturity: ITRS does not report any big
electronics firm pursuing this technology.
Metal Oxide - Bipolar non-filamentary: the last class of redox-based approaches
uses a non-filamentary strategy, sometimes also referred to as interfacial
switching. The resistance switch is triggered by field-driven redistributions of
oxygen vacancies close to the insulator-metal junctions.
ITRS refers to this technology as the least mature among the ReRAM approaches.
Mott memory
Researchers are investigating the feasibility of memory cells using the Mott
transition effect as the resistance-switching mechanism. Such memory cells could
be configured either as modified FETs (as is the case for FeFETs) or as MIM
compounds (as happens in ReRAMs). Relying on ITRS documentation and on clues found
on the web, as well as on the scientific literature, research in this technology
appears to be at a very early stage and there is no information about produced
prototypes. Research efforts are still concentrated on the chemical and physical
properties of Mott insulators.
Carbon Memory
Some researchers have proposed carbon as a new material with which to build
resistive, non-volatile memory cells. The investigated configurations include
both 2-terminal and 3-terminal memory cells. In this approach, memory cells would
exploit some of the physical and electrical features of carbon allotropes
(diamond, graphite and fullerene), especially those of graphite (graphene and
carbon nanotubes are the most common examples). Some approaches would use the
transition between a diamond-like state (insulating) and a graphite-like state
(conducting) as the switching mechanism. Others would use local modifications in
carbon nanotubes to induce a resistance switch; others again would place an
insulating diamond-like carbon between conductors, as in the case of
electrochemical metallization, to electrically induce a conductive graphite-like
filament. Research on carbon allotropes is more mature in this field than that
behind other emerging memories (such as Mott memory, for example): starting in
1859, when the English chemist Benjamin Collins Brodie discovered the atomic
weight of graphite, knowledge about this material has grown throughout the last
century.
Carbon Memory is another memory technology at an embryonic state of development.
Macromolecular Memory
Macromolecular technologies focus, as in the case of Redox memories, on
Metal-Insulator-Metal compounds. The material between the two electrodes is a
polymer layer, which must exhibit a resistive switching ability. The term
"macromolecular" is however quite general, as ITRS reports that many polymers are
currently being investigated, and some have shown different behaviors that could
be used to build new memory technologies: some have ferroelectric features,
whereas others feature the formation of metallic filaments. However, the status
of these research efforts is embryonic.
Molecular Memory
Molecular memory technologies represent another research field that is still at a
very early stage. Such a technology would be based on single molecules or on small
clusters of them. These molecules would be used in resistive switching elements to
store the information bit. As in Redox memories, current would be used to switch
the resistance of the molecule, and reading would not be destructive. The promises
of this technology are very high, since theoretically each memory cell could reach
the dimensions of a single molecule: the first studies on molecular memory report
exceptional power efficiency and high switching speed. ITRS admits however that
many research efforts are still needed in order to gain an adequate understanding
of this technology.
Other memories
Besides those officially included in the ITRS taxonomy, other technologies are
being investigated by researchers, laboratories and industry. Among these other
technologies, Racetrack [114], Millipede [134] and Nanocrystal [18] memories are
worth mentioning here.
1.3.3 From the memory cell to memories
Memory cells, such as those that have been presented so far, are the building
blocks of actual memories. Each memory cell can contain at least one single bit of
information, and MLC technologies permit the encoding of more bits, usually two or
three. Memory cells are then assembled together to provide bytes, cache lines,
pages, and so on.
Since byte or block addressability depends on the way in which cells are linked
together, as is the case for Flash cells, the engineers' decisions about this
topic are pivotal, as they influence the way in which those technologies will be
used: block addressable memories fit natively into I/O devices, whereas byte
addressable ones are well suited to be used both on the memory bus and in I/O
devices. Regarding the technologies presented so far as prototypical and emerging,
it seems that engineers expect to connect the cells in such a way as to allow byte
addressability. In fact, as subsequently explained, one of the most widely held
expectations about these memories is that they will be attached to the memory bus,
which requires byte addressability.
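The practical difference between the two access granularities can be sketched as follows; block_read, block_write and the 4 KiB sector size are hypothetical stand-ins for a generic block driver, and the persistent-memory path deliberately omits the cache flushing and ordering issues discussed later.

    #include <stdint.h>
    #include <string.h>

    #define SECTOR 4096

    /* Hypothetical driver hooks standing in for a block device. */
    extern void block_read (uint64_t lba, void *buf);
    extern void block_write(uint64_t lba, const void *buf);

    /* Block-addressable path: 8 bytes changed, 8192 bytes moved. */
    void update_counter_block(uint64_t lba, size_t offset, uint64_t value)
    {
        uint8_t buf[SECTOR];
        block_read(lba, buf);                       /* read-modify-write cycle  */
        memcpy(buf + offset, &value, sizeof value);
        block_write(lba, buf);
    }

    /* Byte-addressable path: the persistent region is mapped and the update
     * is an ordinary store (durability ordering is omitted in this sketch). */
    void update_counter_pm(uint64_t *mapped_counter, uint64_t value)
    {
        *mapped_counter = value;
    }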
One last remark about these technologies should concern their effective
performance. Figures about performance are not always easy to find. Moreover,
semiconductor producers are sometimes reluctant to give extensive information
about their products. Indeed, they may consider it counterproductive to disclose
such data, as the figures could reveal issues that they prefer to hide and
eventually manage using software layers (through firmware or FTLs, for example).
Despite these remarks, table B.2 reports some figures about current, prototypical
and emerging technologies that can be used for a first comparison.
Following those figures, I would just underline that FeRAM and STT-MRAM seem to
suffer from scalability problems, as their cell area is excessively high (the goal
for semiconductor producers is 4F^2). Moreover, FeRAM appears to be produced with
very dated production processes (180 nm), which could be a clue that this
technology is languishing. Among the prototypical ones, Phase Change memories seem
to be the only ones promising high densities.
Performing a rough comparison between the prototypical and the emerging
technologies, it is apparent that the promises of the emerging ones are much
higher: better scalability, density, performance and endurance.
While these products are still not in production or, at best, are produced only in
low volumes, researchers are nonetheless wondering extensively how these new
memories would influence the operation and design of operating systems; to make
their analysis possible, they have made the following assumptions: these memories
will be persistent, byte addressable, denser than DRAM and faster than Flash
memories. The impact of persistent memories on operating systems is therefore the
topic of the next chapter.
Chapter 2
Operating Systems
Until now I have focused on the technical features of persistent memories,
relating somehow to just the first part of the title. From now on, the focus will
shift to operating systems, and I will try to present their design issues related
to persistent memories as well as possible.
All the examples made hereafter follow the UNIX paradigm and, specifically, are
based on the Linux operating system: even though the same principles and
approaches are similarly used in other families of operating systems (e.g.
Windows), a specific paradigm is nonetheless necessary to maintain some
concreteness; thanks to the open-source nature of Linux, access to its internals
is easier, and I will take advantage of it.
Some preliminary observations follow about the models that could be influenced, or
even changed, under the pressure of the new memory technologies. Afterwards, since
persistent memories can be used either in a fast SSD or directly attached to the
memory bus, each of these approaches is presented, starting from the former, which
is indeed the more conservative.
Every operating system can be conceived both as an extended machine and as a
resource manager [130]. In the former perspective, an operating system is
responsible for hiding from the user all the complex details related to the
hardware by providing an "abstract machine" simpler to use and to program, whereas
in the latter the operating system is responsible for the management of all the
resources available on a specific computing system. In either viewpoint, the
operating system is, most of the time, a software product acting as a glue between
the hardware and the programs (and, finally, the user).
The relation between hardware and software is somewhat porous: even if each of
them represents a specific research domain, they are nonetheless inseparably
related. It can easily happen that advances (or different approaches) in software
engineering urge changes in hardware design, and vice versa. However, since
operating systems are conceived primarily to permit the most profitable use of
hardware resources, it is not only legitimate but also rational to question
whether new hardware technologies have the potential to influence software, and to
what extent. Scientists, researchers and developers claim that the new
technologies just presented will urge deep changes in operating system
engineering.
2.1 Reference models
Each science uses models, abstracting from speci�c details of problem instances in
order to describe synthetically and generically the problems themselves. Models
are indeed valuable: they permit an elegant representation and resolution of prob-
lems, acting as a useful frame within which scientists, engineers and developers can
build real solutions. Changes in the founding models usually trigger changes in a
sort of chain-reaction: it happens in mathematics, physics, and computer science
is no exception. In particular, operating systems have some fundamental mod-
els used as a reference. Since these new memories have some features that main
memory never had, researchers are thus trying to understand to what extent such
features will urge changes into current operating systems models.
2.1.1 The Von Neumann machine
Von Neumann's model is probably one of the most important models used in computer
science: it describes how computations are executed by computers. The model, shown
in figure 2.1, is in effect quite simple: memory (the "memory"), along with a
processing unit (the "control" = Control Unit + ALU) and an input/output function
(the "I/O"), all connected by a single bus, composes a complete computing system.
Instructions are fetched from memory, then decoded by the control unit, operands
are retrieved from memory, execution is performed in collaboration with the ALU,
and results are finally stored back into memory.
Figure 2.1: The Von Neumann model
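The cycle can be made concrete with a minimal sketch of an interpreter for a toy accumulator machine; the four-instruction encoding is invented purely for illustration, but the loop follows exactly the fetch, decode and execute steps just described, with instructions and data sharing the same memory.

    #include <stdint.h>

    enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };   /* toy instruction set */

    void run(uint16_t mem[])
    {
        uint16_t pc = 0, acc = 0;
        for (;;) {
            uint16_t insn = mem[pc++];             /* fetch from "the memory"  */
            uint16_t op   = insn >> 12;            /* decode in "the control"  */
            uint16_t addr = insn & 0x0FFF;
            switch (op) {                          /* execute (ALU + memory)   */
            case OP_LOAD:  acc = mem[addr];  break;
            case OP_ADD:   acc += mem[addr]; break;
            case OP_STORE: mem[addr] = acc;  break;
            case OP_HALT:  return;
            }
        }
    }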
While in the past the real implementations of computing devices were very close to
the model (the hardware design of the PDP-8 minicomputer, for example, was very
close to Von Neumann's model), today real computer systems no longer resemble it so
closely, and current architectures are much more complex: a standard workstation
can use many input/output devices attached to many different buses, can use several
CPUs, a single CPU can have many cores, and so on. Moreover, computing systems have
evolved over time to offer an ever increasing set of features: multitasking,
multithreading, networking, parallel computation, virtualization, and many others.
However, despite this complexity, the founding models are still the same as when
computing started to become a reality: CPUs still perform computations using a
fetch-decode-execute cycle based on the Von Neumann machine model. In this model,
each of the functional units (control, memory, I/O) has a specific role and
specific tasks, not shared with the others: as long as the execution model does not
change and each functional unit remains distinct from the others, the model should
hold fast. Over time, for performance reasons, some portions of the "memory" have
been brought close to the "control" through the use of L1, L2 and L3 caches;
nonetheless, even if closer, "control" and "memory" still remain logically
separated. As briefly outlined in the next paragraph, a different conclusion would
apply in the case that "control" merged with "memory". Referring to persistent
memories, the most challenging hypotheses of use occur when they are placed on the
memory bus (see section 2.3). However, such a use is almost identical to that of
common DRAM. Therefore, if engineered as such, faster, denser, even persistent
memories would not change the basics of the model.
Studies about memristance and memristors could have the potential to seriously
strain the Von Neumann model (see section A.2). Memristor-like memory cells could
be used to build reconfigurable processors or logic functional units: this would
result in a merge between "the memory" and "the control" [15, 140]. Some
researchers are pursuing the use of memristive memories to build neuromorphic
chips and their use in neural network studies [126]. These are however, for now,
futuristic scenarios. With the intention of remaining concrete, this work will not
investigate further the aspects related to potential changes to the Von Neumann
model, taking for granted that it will stay alive for the years to come. Instead,
the Von Neumann model will be used as a reference model in the background.
2.1.2 The "memory" and the memory hierarchy
It could be useful to narrow the focus to the "memory" side: after all, the new
memory technologies natively relate to it. Although not properly a model, I wish
to recall here the memory hierarchy, since it is indeed a neat and synthetic
representation of how current computing systems implement memory. When talking
about the memory hierarchy, the word "memory" is used quite differently from the
meaning it has in the Von Neumann model, hence the quotes for the latter. In this
scope, "memory" generically represents a place where computing devices can store
instructions and data, either temporarily or persistently. Consequently, the
memory hierarchy represents both the "memory" (in its upper part) and a portion of
the I/O (in its lower part) of the Von Neumann model.
The memory hierarchy represents at a glance, as already stated, the fundamental
relationship between speed and density; however, some other relevant information
is hidden inside it. In particular, information about the speed of each level is
not apparent, nor is the point where volatility ceases and persistence starts.
Moreover, a dynamic view of how the hierarchy has changed over time could be
useful.
New solid state memory technologies, as those presented above, are ideally con-
tinuing an innovation path started with the upcoming of Flash memories. Before
them, the con�guration of the memory hierarchy had remained almost the same
for about thirty years (50s � 80s): it was built of registers, caches, RAM, hard
disks, tapes (or punched cards). Although the performances had changed in time,
such changes were in absolute terms, while the relative values and the structure
itself had remained almost unchanged. Figure 2.2 shows the memory hierarchy
again. To carry more information, the following hints have been added:
- access time has been added on the right (as power of ten negative exponen-
tials of seconds);
- a thick gray margin represents the border between volatility and persistence;
- The border between memory bound and I/O bound devices has been pinned
up on the left;
- a dashed line represents both:
� the border between symmetrical and asymmetrical read/write timings;
� the border between (near to) in�nite and limited endurance.
The following further facts should be stressed about figure 2.2: firstly, the six
orders of magnitude gap between hard disks and RAM is apparent; secondly, the
thick gray margin and the I/O border coincide: fast memory is volatile, whereas
slow memory is persistent. Fast memory is accessed with load and store
instructions from the CPU, whereas slow memory needs complex access mechanisms
(I/O). Finally, fast memories have symmetrical performance and suffer no wearing,
whereas slow memories suffer from limited endurance and from asymmetrical
performance.
Figure 2.2: The memory hierarchy with hints
The memory hierarchy also carries within it some clues about the problems that
arise at every level: in a perfect world, the memory hierarchy would be infinitely
large, but flat. From an operating system viewpoint, the perfect memory would
offer:
- CPU speed;
- persistence;
- native byte addressability;
- symmetric read and write performance;
- technological homogeneity;
- infinite endurance.
Unfortunately, real memories cannot have all these "good" features: each layer of
the memory hierarchy offers only a subset of them. As a result, each level of the
memory hierarchy also depicts the problems that an operating system has to manage
in order to use it effectively.
The next paragraphs attempt a brief summary of the main techniques adopted over
time by operating system designers to overcome the limits of each layer of the
hierarchy.
Speed issues
CPU speed is a luxury enjoyed only by registers. Descending from layer to layer,
access time increases exponentially. This means that each access to a memory of a
given level has a cost in time. A well-engineered operating system will try to
minimize that cost and to maximize performance.
Unfortunately, not only does the cost of memory accesses increase at each level
downwards, but historically the gap between the speed of the "memory" and that of
"I/O"-driven memories, such as hard disks, tapes, punched cards and CDs, has
always been very large; today it is milliseconds vs nanoseconds, a delta of six
orders of magnitude. This fact has always been a problematic limit.
To circumvent this limit, developers usually adopt a large set of strategies;
among others, caching and interrupts will be analyzed.
Caching Every access to memories slower than RAM has a definite cost: in order to
minimize it, it is better to store as much data as possible in RAM. This approach
would be wasteful if used indiscriminately: the faster the memory, the lower its
density, and the greater the need to use it optimally. By exploiting spatial and
temporal locality, however, operating systems can use various levels of caches
efficiently, thus permitting the processor to work on data very fast: this way,
both memory operations and very slow I/O operations benefit from an important
speed boost.
This approach is of paramount importance for gaining performance: it is therefore
used in countless software products (operating systems, database engines,
applications, and so on) and hardware devices (routers, switches, hard disks,
SSDs, GPUs, and so on). Moreover, this technique is fundamental in big data
centers to offer fast performance, as is the case for Facebook and its use of
server clusters running Memcached1 to serve requests from web servers (front-end)
faster, interposing between them and the databases (back-end) [58]. Another famous
example is Redis2, used in the cloud services offered by Amazon [62]. The use of
caching however has its own costs:
- software and hardware are more complex: code for cache management and
accounting, or chip regions devoted to cache management;
- data is copied from its original location and thus more memory is consumed;
- data modified in caches has to be, sooner or later, written back to the original
location;
- multiple caches need to comply with cache coherence rules.
1www.memcached.org.
The benefit experienced on systems that use caching is easily calculated with the
formula recalled in the famous article by Wulf and McKee [143], where the average
access time is calculated as follows:
T_avg = hc + (1 − h)M
where h is the hit rate, (1 − h) is the miss rate, c is the time needed to access
the cache and M is the time taken to access memory. If needed, this simple formula
can be extended:
T_avg = xc + yM + zH
where x is the hit rate, y is the probability of a memory access, z is the
probability of an HDD access, while once again c is the time needed to access the
cache, M is the time taken to access memory, and H is the time taken to access a
given I/O device (x + y + z = 1).
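Plugging some purely illustrative access times into the extended formula makes its point evident: even a tiny fraction of accesses that falls through to a disk dominates the average.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative access times, in nanoseconds (assumed values). */
        double c = 10.0;         /* CPU cache hit          */
        double M = 100.0;        /* DRAM access            */
        double H = 5e6;          /* HDD access (~5 ms)     */

        double x = 0.90, y = 0.099, z = 0.001;   /* x + y + z = 1 */

        printf("T_avg = %.1f ns\n", x * c + y * M + z * H);
        /* 9 + 9.9 + 5000 ns: the 0.1% of accesses that reach the disk
         * contributes almost the entire average access time.           */
        return 0;
    }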
As just mentioned, however, caching has a cost in complexity. While hardware
caching (as is the case with the L1, L2 and L3 CPU caches) has a cost primarily
expressed in hardware complexity (a higher price), software caching (such as that
used, for example, in the Linux page cache) makes the software more complex. This
added complexity is accountable for some extra work usually done by the operating
system3, hence measurable in cycle times and, consequently, in some additional
time (latency) and energy (power). The advantage of caching, compared to its cost,
is however tremendous in the case of standard HDDs: in a standard off-the-shelf
Linux storage stack, software accounts for just 0.27% of the I/O operational
latency (0.27% x 1 ms = 2.7 µs) [129]. The software portion responsible for cache
management is only a part of the entire I/O stack: other parts deal with system
call management, device drivers, memory management, and so on. Even supposing that
the cost of the entire path traversal were entirely due to caching, it would still
be tremendously worthwhile: microseconds vs milliseconds.
2redis.io.
3Database systems usually perform caching autonomously, avoiding operating system intervention.
Taking the opportunity offered by caching, the preceding observation can be
generalized to software at large: software layers built upon slow I/O devices
generally have little impact on performance. As an example, surveys of LVM in
Linux highlight the fact that it adds only 0.03% of software latency and 0.04% of
energy consumption in the case of a disk-based storage stack.
Interrupts, or asynchronous execution: interrupts permit optimizing CPU usage by
suspending processes that have requested slow operations. The wake-up of a
suspended process is triggered by the hardware emitting an interrupt, thereby
signaling to the operating system that the requested operation has finished. This
approach does not technically avoid the slowness of I/O operations, but it takes
it into account, hence permitting the whole computing system to be used much more
efficiently.
This technique is fundamental: almost every computer uses it. However, it also has
a cost in complexity. The operating system usually has to perform at least one
context switch to suspend the process, has to start the I/O and has to schedule
another process. This sequence, here highly simplified, is complex and time (and
energy) consuming: many sources report its cost as being about 6 µs. Once again,
this value is only a small percentage of the cost of waiting for an I/O request
(to a HDD) to complete: microseconds vs milliseconds.
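The same arithmetic shows why this trade-off is about to change: a ~6 µs suspend/resume cost is negligible against a millisecond-class HDD request, but it becomes dominant against a hypothetical persistent-memory device answering in a few microseconds. The figures below are assumptions used only to illustrate the proportion.

    #include <stdio.h>

    int main(void)
    {
        double switch_us = 6.0;      /* context switch + interrupt handling    */
        double hdd_us    = 5000.0;   /* typical HDD request                     */
        double pmem_us   = 5.0;      /* hypothetical fast persistent device     */

        printf("HDD : overhead %.2f%% of the request\n",
               100.0 * switch_us / (hdd_us + switch_us));   /* ~0.12% */
        printf("PMEM: overhead %.2f%% of the request\n",
               100.0 * switch_us / (pmem_us + switch_us));  /* ~55%   */
        return 0;
    }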
Hardware, data, failure models
I will now focus on some observations regarding the models derived from the memory
hierarchy: while some of these considerations might seem obvious, the aim of this
deepening is to bring out which assumptions are implicitly taken for granted in
the design of current operating systems. Moreover, some of the following
distinctions could appear useless if considered only in relation to the classical
memory hierarchy: practically every operating system uses the same models, since
the memory hierarchy on which they are founded is the same. Conversely, I find
such distinctions valuable from a change perspective: the arrival of persistence
in the higher layers of the hierarchy offers many degrees of exploitation, not
just one. Some classification tools can thus help the analysis.
Each operating system, at least implicitly, uses a "hardware model" and a "data model": the former specifies (generically) the main functional units that it can manage, whereas the latter specifies a series of choices about how data is managed. In particular, going a step further, the data model describes, among other things, both the specific design choices related to volatility and those related to persistence. These last design choices can be referred to as the "persistence model": the part of the data model related specifically to persistence and its management.
Another set of choices, transversally related to all of the previous models (hardware, data, persistence and volatility), concerns failures. Data inside memories is subject to a long series of potential threats: power losses, hardware failures, electrical disturbances, programming errors, memory leaks, crashes, unauthorized accesses, and so on. These problems are well known to operating system designers, who take countermeasures and decide which classes of problems are managed and which are instead ignored: these design choices can be described as the "failure model".
The current data model and the current failure model are deeply based on the properties extrapolated from the classical memory hierarchy, which have been taken for granted for many decades and still are: developers have always engineered operating systems accordingly.
The current data model is quite simple: persistence is delegated to I/O devices, whereas registers, caches and RAM are volatile. Data is stored on hard disks and SSDs in files located in file systems. Speaking rather generally, the "classical" failure model, while guaranteeing security both in memory and in persistent devices, focuses on safety in memory and on consistency in persistent devices. Safety is preferred in memory as a consequence of its speed and its volatility: the goal is to set policies that avoid corruption, while accepting the risk that such events may happen. Consistency is instead needed in persistent devices as a consequence of persistence itself: consistency permits the effective and correct preservation of data in time (errors reaching persistent memories become, precisely, persistent errors). Moreover, the slow speed and the complexity of I/O operations in current persistent memories (hard disks and SSDs) further exacerbate the need for consistency: the slowness increases the likelihood of a power failure during an I/O operation, so it is important to design I/O operations to permit data survival even after such events. Strategies such as file system checks, journaling, logging and transactional semantics are all designed to minimize problems on data in persistent memories after a power failure event.
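As a minimal illustration of the idea behind journaling (a deliberately simplified sketch, not the mechanism of any specific file system: the record layout and the single-slot journal are assumptions made here only for exposition), an update is first described in a log and made durable, and only then applied in place, so that after a power failure the log tells whether the update must be replayed or discarded:

#include <string.h>
#include <unistd.h>

/* Hypothetical single-slot journal: the record describes the intended write. */
struct journal_record {
    long  target_offset;     /* where the data will eventually be written */
    long  length;            /* how many bytes                            */
    char  payload[4096];     /* copy of the new data                      */
    int   committed;         /* set only after the record is durable      */
};

/* Apply one update through the journal file. */
int journaled_write(int journal_fd, int data_fd,
                    long offset, const void *buf, long len)
{
    struct journal_record rec = { .target_offset = offset, .length = len };
    memcpy(rec.payload, buf, (size_t)len);

    /* 1. Write the intent record and force it to stable storage. */
    if (pwrite(journal_fd, &rec, sizeof rec, 0) != (ssize_t)sizeof rec) return -1;
    if (fsync(journal_fd) != 0) return -1;

    /* 2. Mark the record as committed, again durably. */
    rec.committed = 1;
    if (pwrite(journal_fd, &rec, sizeof rec, 0) != (ssize_t)sizeof rec) return -1;
    if (fsync(journal_fd) != 0) return -1;

    /* 3. Only now update the data in place; if power fails here,
     *    recovery replays the committed record found in the journal. */
    if (pwrite(data_fd, buf, (size_t)len, offset) != (ssize_t)len) return -1;
    if (fsync(data_fd) != 0) return -1;

    /* 4. Finally the journal record can be invalidated (e.g. overwritten). */
    return 0;
}

The doubled writes and the repeated fsync() calls are exactly the kind of cost that, as discussed later, becomes questionable once the persistent medium itself is as fast as memory.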
Continuing to refer to failures in computing devices, there is a substantial difference between errors in memory and those in I/O devices: in memory the potential sources of errors can be many, whereas in I/O devices errors are almost always caused by power failures and hardware faults, not by software. As noted in Chen's article [19], memory has always been perceived as an unreliable place to store data. Firstly, this is due to its volatility, but secondly to the ease of access and modification of its content: people know that operating system crashes can easily corrupt memory. On the other hand, I/O-driven memory devices have always been perceived as reliable places to store data, and not only because of their persistent behavior: since the I/O stack is slow and complex, it is unlikely that a faulting condition can successfully perform a correct I/O operation with wrong data. Moreover, precisely because I/O operations are slow and complex, it is easy to add additional safety (in software) by means of transactional semantics or other similar techniques. These observations still hold today: persistence has always been thought of as reliable, whereas the opposite happens with volatility. Unfortunately, if persistence reaches the memory bus, this property no longer holds: this aspect too should be taken into account or, at least, acknowledged.
2.1.3 A dynamic view in time
All the aspects just outlined refer to the "classic" memory hierarchy model: this model, however, started to change with the advent of Flash memories: a level was added, thus reducing part of the big gap between slow HDDs and fast RAM. Such an "insertion" led to the current memory hierarchy.
The point is now to imagine the future configuration of the memory hierarchy, using as clues the promises of the new technologies presented in the first part. Speaking rather generically, the new technologies promise to be:

- Faster than Flash, although slower than RAM or, eventually, as fast as RAM. The speed would in any case be closer to that of RAM than to that of Flash: the order of magnitude is in the tens of nanoseconds.
- Denser than RAM.
- Persistent.
- Longer-lasting than Flash: the endurance is better than that of Flash, but worse than that of RAM⁴.
- Natively byte addressable.
- Suffering from both read/write asymmetries and cell wearing.
Trying to imagine a next-generation memory hierarchy, these memories would naturally sit between RAM and SSDs. The promise of higher density is coherent with the pyramid logic: the hypothesis of a taller pyramid is therefore legitimate.
Before trying to sketch a next-generation memory hierarchy, some other con-
siderations are needed:
- about the future of RAM memory, caches and registers;
- about the use of byte addressing on the memory bus or block addressing on
other paths.
⁴ Phase Change technology, the one suffering most from cell wearing, has an endurance four orders of magnitude better (10⁹ cycles) than that of Flash (10⁵ cycles).
Referring to RAM, the Hybrid Memory Cube technology has been described as a viable enhancement of current DRAM technology: it is thus conceivable that RAM too shifts "up" in the hierarchy, getting closer to caches and registers. However, these technologies are all volatile. It could be theoretically feasible to build registers and caches with FeFETs, thus transforming them into at least semi-persistent memories (FeFET transistors have been demonstrated to retain their state only for some days) [48]; this approach represents, however, a scenario far in the future, considered only a very few times in the literature [136]. The models presented in the next paragraphs will thus continue to take for granted the volatility of the higher layers of the memory hierarchy.
Regarding the addressing technique, the new non-volatile memories fit beautifully into the byte-addressing schema, so their use on the memory bus seems the most natural choice. Conversely, Flash memories integrate natively into the block-addressing schema, at least as far as NAND Flash is concerned. Even if it is feasible to adapt a block-native memory to a byte-addressing schema, doing so is nonetheless intricate [59]. The opposite is simpler: byte-addressable memories can be adapted to be block addressed at the cost of some added hardware complexity⁵.

⁵ This approach is the one used with Moneta [16] and explained in section 2.2.2.
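Purely to illustrate how natural that adaptation is (the fixed geometry and the function names below are assumptions of this sketch, not a description of any real controller), a block-style interface over a byte-addressable part can amount to little more than a bounds check and a memory copy:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096u
#define NUM_BLOCKS  65536u                 /* 256 MiB of NVM, an assumption */

/* The byte-addressable non-volatile part, as the controller sees it. */
static uint8_t nvm[(uint64_t)NUM_BLOCKS * BLOCK_SIZE];

/* Block-style interface exposed to the rest of the system. */
int nvm_read_block(uint32_t block, void *buf)
{
    if (block >= NUM_BLOCKS) return -1;
    memcpy(buf, &nvm[(uint64_t)block * BLOCK_SIZE], BLOCK_SIZE);
    return 0;
}

int nvm_write_block(uint32_t block, const void *buf)
{
    if (block >= NUM_BLOCKS) return -1;
    memcpy(&nvm[(uint64_t)block * BLOCK_SIZE], buf, BLOCK_SIZE);
    return 0;
}

Adapting a block-native device to byte addressing requires, instead, read-modify-write cycles and internal buffering, which is why the opposite direction is the intricate one.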
The question of whether to use NVMs as slower RAM or as faster SSDs is thus legitimate. The former approach will be referred to as Storage Class Memory: non-volatile memories placed on the memory bus.
All that said, the next-generation memory hierarchy might appear as in figure 2.3.
2.1.4 Viable architectures
This representation depicts at a glance all the possibilities that engineers might have in the future to build real computer systems: not all of these layers are indispensable. For example, the smallest devices would have no RAM and no HMC, but only a few registers, a tiny cache, and a very low-power non-volatile memory on the memory bus (to be used both as a RAM replacement and as storage); such a configuration could permit engineers to build battery-less devices with advanced memory capabilities [39]. Another implementation might use the new memories just to build a faster SSD, maintaining all the other components of a classic memory hierarchy. Yet another implementation would use both HMC memory and Storage Class Memory to offer a "dual mode" fast-volatile and slow-persistent hybrid memory.

Figure 2.3: A new memory hierarchy
The next paragraphs will present the main models of usage of the new memories considered in the literature. Researchers are studying and trying to model the effects of the coming of persistent memories both on the I/O side and on the memory bus side: the models will follow either the former or the latter approach. Each scenario has the potential to greatly improve current computing performance, but there are also important issues, as will soon be explained. In particular, even the use "as is" of either:
- a hypothetical SSD featuring a speed close to that of RAM;
- a hypothetical persistent and dense DIMM on the memory bus;
on a system running an off-the-shelf operating system would be problematic.
It has to be stressed that operating systems are developed to be well-balanced systems resting on certain assumptions made by designers and developers: one of the main assumptions made by the designers of modern operating systems is that the memory hierarchy is configured like the classical pyramid, with its consequences, i.e. the data model and the failure model. Using a metaphor, operating systems behave like a weight-lever-fulcrum mechanical system, which is in an equilibrium state when the assumptions hold. Compliance with the assumptions ensures that the fulcrum stays at the right point; non-compliance results in a fulcrum shift and, consequently, in a loss of equilibrium. Still following the metaphor, the efforts made by researchers on NVM-aware operating systems are similar to the re-equilibration of a mechanical system that has lost its balance, by modifying the weights placed on both sides of the lever. Firstly, the easier approach will be analyzed, i.e. the use of the new non-volatile memories as bricks to build a very fast SSD. Afterwards, the reader will be introduced to the various Storage Class Memory approaches proposed either by developers or by researchers.
Before delving into the specificities of each approach, it is useful to note here the inversely proportional relation that arises (in this context) between the ease of setting up a test environment and the amount of effort required to adopt changes in software. Ironically, whereas a fast SSD is tougher to engineer, develop, prototype and test [64], it is much simpler (although surely not trivial) to conceive and model the software changes needed to drive it conveniently. On the other side, as the reader will see later, in an SCM context it is much easier to set up a test environment (for example, non-volatile memories can be "emulated" using just normal DRAM); despite this ease of testing, it is much more complex to develop a complete and efficient solution to the challenges raised.
2.2 Fast SSDs
This approach is the more conservative one, since the only change in the memory hierarchy would be the presence of a new I/O device, running faster than common SSDs. Such a solution would influence neither the standard data model nor the standard failure model, as the only anomaly would be its speed. This approach, moreover, would be in continuity with the path started with SSDs⁶.
The availability of new solid-state memory technologies such as Phase Change RAM would permit manufacturers to build and sell SSDs featuring much higher speeds than those of current NAND Flash-based SSDs. The speed of PCM is about 100 ns in write mode, whereas in read mode it is about 12 ns. This is 50 times worse than DRAM when writing and 6 times worse when reading. Despite the speed decrease in comparison with DRAM, these memories would still be very fast with respect to common NAND Flash memories (∼100 µs, see table B.2).
2.2.1 Preliminary design choices
Before delving into the operating system issues related to faster-than-Flash SSDs, it is worthwhile to linger on some hardware issues, such as:
- the I/O bus;
- the SSD choice vs an MTD-like choice.
These aspects and the related choices establish a sort of framework that operating systems must take into account, thus influencing their internal design.
The I/O bus
Since SSDs, like HDDs, use an I/O bus to transfer data, engineers will have to make some choices about the bus used in those products. The speed of these new technologies justifies the concern about whether the bus is able to sustain the performance of the SSD. Driver design will subsequently follow the choices made by the engineers.
Table B.4 shows some figures about the data transfer speed of some of the most important buses; the first two are I/O buses, whereas the last ones are memory buses (it can be observed that there is a gap of one to two orders of magnitude in data transfer speed between the two bus classes).
⁶ This approach is convenient also as a learning tool: in the effort to build, with a new technology, a device that is otherwise quite common, the focus can be fixed on gaining an adequate know-how about the peculiarities of those new technologies.
Current alternatives for the I/O bus are SATA and PCI Express. In order to evaluate the two alternatives, figures about hardware features are fundamental, but they are not the only factors that must be taken into account. Other factors influencing the choice are:
- protocol overhead: for example, 8b/10b encodings are much less efficient than 128b/130b ones⁷;
- potential for further technical improvements;
- scalability;
- ability to adapt to virtualization schemas;
- ability to adapt to multi-core and multi-processor requests;
- quality of the I/O stack and the potential to improve it: hardware features determine how device drivers work.

⁷ XXb/YYb, where XX is the payload and YY is the transfer size (XX ≤ YY).
A well-built bus will permit the development of "good" drivers, whereas a problematic bus will force developers to bypass problems in software, thus raising the software complexity.
Speaking for a moment only from a hardware standpoint, SATA was conceived to be used with standard HDDs, as an improvement over standard PATA: this fact still influences its behavior. In comparison with HDDs (given a very low 1 ms access time in both read and write), SATA can theoretically execute a 4K transfer in about 6.83 µs, i.e. 146 times faster than the hard disk can service the same amount of data (table B.5). Such a difference makes SATA appear as an infinitely fast channel for transporting data to the HDD. The transition from a SATA HDD to a well-performing SATA SSD presents different proportions, though: some SSDs [96] offer 550 MB/s sequential read speed and 520 MB/s sequential write speed, thus getting very close to the theoretical limit of 600 MB/s. Supposing that 4K of data arriving at the SSD could just be written in bulk in one write cycle (about 0.1 ms, or 100 µs), this time would be only 14-15 times slower than the transfer time: the proportion is very different from that of HDDs. These observations alone would justify considering SATA technology as not suitable for SSDs faster than Flash. The figure of 0.9 times presented in table B.4 confirms the same hypothesis. If SATA were used as the bus for a PCM SSD, it would perform well only in the case of 4K writes performed byte by byte; in every other condition it would perform near its limit, whereas in the case of 4K reads performed in groups of 64 bytes each, SATA would behave as a bottleneck. These figures are deduced from the theoretical limits of the buses, but implementations are sometimes slower, and this would aggravate the problem. Finally, since the new memory technologies are just that, new, there is a high margin for improvement in their performance: a bus used at its limit from the beginning would thus be expected to absorb and waste every technological improvement.
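For reference, the figures just quoted follow directly from the bus speed (the rounding here is mine): at the SATA theoretical limit of 600 MB/s cited above, a 4 KB transfer takes 4096 B / (600 × 10⁶ B/s) ≈ 6.83 µs; against a 1 ms HDD access this gives 1 ms / 6.83 µs ≈ 146, while against a hypothetical 100 µs bulk PCM write it gives 100 µs / 6.83 µs ≈ 14.6, the "14-15 times" mentioned above.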
Even if these observations are quite rough, they follow the path already taken by scientists, researchers, storage manufacturers and technicians, who reached the same conclusions: SATA is being abandoned in favor of PCI Express as the bus for fast SSDs [104, 91]. The motivations for this choice are rooted not only in current hardware features but also in the other factors cited previously: PCI Express has higher speed and lower overhead, is scalable, has an appealing road-map towards future improvements, is usable efficiently by virtualized environments and by multi-core and multi-processor systems, and so on. Engineers made this choice focusing on current NAND Flash technologies: the scenario of faster memory technologies is taken into account, but still as being quite far in time. This fact underlines that they evaluated SATA to be obsolete even for NAND Flash. New SSDs using PCI Express have already been released on the market, and this trend is expected to increase steadily in the next years (examples are the Fusion-io SSDs, the Apple SSD in the MacBook Pro, and the Plextor M6e). It is thus likely that next-generation PCM SSDs will appear as PCI Express SSDs. Finally, giving a last glance at tables B.4 and B.5, PCI Express does not show the same proportion as SATA in comparison with HDDs: it too could end up acting as a bottleneck; this limit is however shifted forward in time thanks to its improvement potential, which is much better than that of SATA (for example, PCI Express generation 4 should appear this year).
SSD vs MTD-like choice
Similarly to what happened with Flash memories, a choice must be made about whether the internals of the memories are hidden or not from the other parts of the computing system and, ultimately, from the operating system. The same issue arose with Flash memories: used in SSDs, all internals are completely hidden from the system, and they are employed just like common hard disks; on the other hand, Flash memories can also be connected directly to the computing system without any intermediation. In the former case, all issues related to Flash technology (wear leveling, cell erase before re-write, error checking) are managed by a controller acting as an interface between the bus and the Flash chips: this controller implements a Flash Translation Layer (FTL), in order to present just a block device to the system. In the latter case, those issues must be managed by the operating system, which must thus take charge of implementing in software all the functions of a Flash Translation Layer, as happens in Linux with MTD devices.
PCM technology actually has better endurance than Flash. Moreover, PCM cells do not need to be erased before re-writes. Newer technologies will offer even higher endurance. These observations should permit building simpler, and thus faster, translation layers. However, the increased speed also requires the translation layer itself to be extremely fast, in order not to affect memory performance. The start-gap wear leveling technique is one of the approaches suggested for use in these translation layers [37, 118]; a simplified sketch of the idea is given below. The need for an extremely fast translation layer suggests a hardware implementation, thus supporting the choice of the SSD approach; this option would also permit conveniently hiding the read/write time asymmetry. The examples presented hereafter all use this same approach: the MTD-like one does not seem to be currently investigated.
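The start-gap idea can be rendered in very few lines. What follows is a deliberately simplified, software-level sketch of the concept only (in the actual proposals the Start and Gap registers and the remapping arithmetic live inside the memory controller, and the line size, the number of lines and the gap-movement period used here are assumptions made purely for illustration): one spare line is kept free, the logical-to-physical mapping is a simple rotation, and every fixed number of writes the free "gap" line moves by one position, so that over time every logical line visits every physical location.

#include <stdint.h>
#include <string.h>

#define LINES      1024u    /* number of logical lines (assumption)            */
#define LINE_BYTES 64u      /* wear-leveling granularity (assumption)          */
#define GAP_PERIOD 100u     /* move the gap once every GAP_PERIOD writes       */

/* LINES + 1 physical lines: one of them is always the empty "gap". */
static uint8_t  pm[(LINES + 1) * LINE_BYTES];   /* stands in for the PCM array */
static uint32_t start_reg   = 0;                /* the "Start" register        */
static uint32_t gap_reg     = LINES;            /* the "Gap" register          */
static uint32_t write_count = 0;

/* Logical-to-physical translation: the logical array is rotated by Start,
 * and positions at or beyond the gap are shifted up by one physical line. */
static uint32_t map_line(uint32_t logical)
{
    uint32_t pos = (logical + start_reg) % LINES;
    return (pos < gap_reg) ? pos : pos + 1;
}

/* Move the gap by one position; over many periods every logical line
 * ends up being written to every physical line, levelling the wear.    */
static void move_gap(void)
{
    if (gap_reg > 0) {
        memcpy(&pm[gap_reg * LINE_BYTES],
               &pm[(gap_reg - 1) * LINE_BYTES], LINE_BYTES);
        gap_reg--;
    } else {            /* the gap wraps around: rotate Start by one */
        memcpy(&pm[0], &pm[LINES * LINE_BYTES], LINE_BYTES);
        gap_reg   = LINES;
        start_reg = (start_reg + 1) % LINES;
    }
}

void wl_write_line(uint32_t logical, const void *data)
{
    memcpy(&pm[map_line(logical) * LINE_BYTES], data, LINE_BYTES);
    if (++write_count % GAP_PERIOD == 0)
        move_gap();
}

void wl_read_line(uint32_t logical, void *data)
{
    memcpy(data, &pm[map_line(logical) * LINE_BYTES], LINE_BYTES);
}

The cost is one extra line copy every GAP_PERIOD writes and two small registers: this is what makes the schema attractive for a hardware translation layer that must not slow the memory down.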
2.2.2 Impact of software I/O stack
Researchers and students of the Non-Volatile Systems Laboratory at UCSD have been conducting a series of thorough and interesting studies about the consequences of fast SSDs on operating systems since at least 2010. In particular, their observations about operating systems are the offspring of the experience gained while developing two prototypes of fast PCIe-based SSDs and of the efforts made to exploit their performance as much as possible. Both prototypes were built upon FPGA architectures: the first used common DRAM to emulate the behavior of new-generation non-volatile memories [16], whereas the second effectively used Phase Change memories [1]. The researchers claimed performances of 38 µs for a 4 KB random read and 179 µs for a 4 KB random write: these figures are in line with those of table B.4, given that software time is included (here read and write are meant as complete operating system operations). Their experience is valuable: many of the considerations made about changes in operating system design to exploit fast SSDs come from their work.
Two articles written by the UCSD scholars in particular [17, 129] describe respectively the initial efforts (the former) and the final conclusions of their work (the latter). In the first article there is an accessible description of the various scenarios that they tested to evaluate the performance of new memory technologies such as PCM and STT-RAM. The description of the testing environment used to model the behavior of PCM and STT-RAM (at that time still not available on the market) is indeed very interesting: they used common DRAM along with a programmable memory controller to introduce latencies compatible with those of PCM and STT-RAM. The solution adopted is remarkable, since a programmable memory controller makes it possible to measure (as they did in their study) how performance is affected in the presence of read/write latency asymmetries and when those latencies increase (a crude software analogue of this kind of emulation is sketched after the list below). They evaluated and measured the performance and latencies of:
- a standard RAID solution;
- a state-of-the-art PCI Express Flash;
- an NVM-emulated PCI Express SSD (this in particular became the basis for their Moneta and Onyx projects);
- a portion of DRAM used as a ramdisk to emulate a future NVM Storage Class Memory (this is rather an introductory approach to SCM, since it takes into account neither persistence nor safety; the first aim of this model, however, is to describe the performance problems concerning the software I/O stack, not to give a complete discussion of SCMs).
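A crude software analogue of that hardware emulation, sometimes convenient when a programmable memory controller is not available, is to back the "persistent" region with ordinary DRAM and to insert a calibrated busy-wait after every write to it. The sketch below is only illustrative (the extra-latency figure is an assumption, and a serious emulator would also model read latencies, bandwidth limits and the effect of CPU caches):

#include <stdint.h>
#include <string.h>
#include <time.h>

#define EMULATED_EXTRA_WRITE_NS 400L   /* assumed PCM-minus-DRAM write penalty */

/* Busy-wait for roughly ns nanoseconds (coarse, but fine for emulation). */
static void spin_ns(long ns)
{
    struct timespec t0, t;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    do {
        clock_gettime(CLOCK_MONOTONIC, &t);
    } while ((t.tv_sec - t0.tv_sec) * 1000000000L
             + (t.tv_nsec - t0.tv_nsec) < ns);
}

/* Write into the DRAM region standing in for the NVM, then pay the penalty. */
void emulated_nvm_write(void *nvm_dst, const void *src, size_t len)
{
    memcpy(nvm_dst, src, len);
    spin_ns(EMULATED_EXTRA_WRITE_NS);
}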
The most important achievement of their work is the evidence of the huge impact on latency and throughput attributable to the software I/O stack: as the speed of the memory device rises, this impact rises too. It is underlined that, using the ramdisk environment, the "cost of the system calls, file system and operating system are steep; they prevent the ramdisk from utilizing more than 12% of the bandwidth that the DDR3 memory bus can deliver". When looking at the FPGA solution, they verified that the I/O software stack was responsible for a significant performance drop; in particular, they verified how the file system was responsible for an important latency increase (about 6 µs per access). They also observed how the internal design of the file system influences throughput: they verified that the ext3 file system was responsible for a 74% reduction in bandwidth, whereas this impact was much lower when using XFS.
These observations are discussed thoroughly in [129], and many other papers in the literature use similar observations to analyze the software I/O stack. In particular, besides the description of the developed solutions, the two following charts, which neatly summarize the increasing impact of software as device speed increases, are included.
Chart 2.1: I/O software stack impact on latency (percent). Hard drive: 0.3; SATA SSD: 19.3; PCIe-Flash: 21.9; PCIe-PCM: 70.0; DDR: 94.1.
The cost of software in latency jumps from around 20% in the case of a SATA or PCI Express Flash-based SSD up to 70% for a PCI Express PCM-based SSD. In the case of an SCM, the cost would account for an impressive 94%.

Chart 2.2: I/O software stack impact on energy (percent). Hard drive: 0.4; SATA SSD: 96.9; PCIe-Flash: 75.3; PCIe-PCM: 87.7; DDR: 98.8.

Causes

Talking about the causes of such inefficiencies: as has already been stressed, the common assumption that has always been taken for granted by every operating system developer is that I/O devices are slow. This simple assumption induced developers to:
- focus on offering functionality in software to alleviate hardware deficiencies (this is the case, as already mentioned, of the page cache, the buffer cache, LVM, and so on). Such functionality, given a slow device, has a minimal cost in latency and bandwidth;
- not bother too much about the efficiency of the software layer, because software accounts for only a minimal part of the I/O: since devices are slow, efforts to develop efficient software would result in only a small improvement; such improvements would thus have a highly unfavorable cost/benefit ratio. Instead, efforts on safety, correctness and security were better rewarded.
Unfortunately, these assumptions are the "philosophical" roots that cause the I/O software stack to perform so badly when the device gets faster. Indeed, the two charts just shown suggest that software becomes a problem even with common SATA Flash SSDs.
Following these observations, researchers and developers agreed on the need to identify which parts of the I/O stack are most responsible for latency and energy costs. This analysis is the first step toward making decisions about the best strategy to improve the behavior of the I/O stack. Researchers first made some of the conceptual observations that will now be presented.
Off-the-shelf I/O stacks are developed as modular stacks, usually employing a generic block driver that works in conjunction with a device-specific driver. This design permits virtualizing and standardizing the access of programs and of the kernel to specific devices. However, most of the time it is the kernel that is responsible for all the aspects regarding both storage access and storage management. The kernel does not only set the access policy (space allocation, permission management) but is also responsible for policy enforcement. This extensive use inside the kernel of both policy setting and policy enforcement can lead to inefficiencies.
The second general observation is bound to the generality of the I/O stack: whereas its design allows great flexibility, its generality does not permit implementing all of the optimizations that could improve it the most, thus sacrificing some opportunities. While, if the devices are slow, these "opportunities" are not so significant, they become valuable in the case of fast devices.
These observations alone suggest some areas of improvement, such as the need for specific (not generic) I/O stacks devoted to fast memory devices and the necessity of avoiding, where possible, the intervention of the kernel in the management of I/O accesses. Investigating these observations further, researchers have identified the following "hot areas":
- the I/O request schema can hide bottlenecks: I/O requests in Linux use an I/O scheduler to collect requests and issue them at the proper time. This approach permits flexibility but adds latency (about 2 µs);
- interrupt management is expensive, especially when requests are small. Interrupts are intrinsically complex and expensive procedures (they add at least 6 µs of latency): in the case of a small request, the time between its issue and its servicing can be shorter than the time between a sleep and a wake-up. Moreover, a fast device using interrupts would issue interrupts frequently: the greater the demand for small I/O operations, the greater the time lost just in interrupt management. Finally, it is necessary to underline that, usually, interrupts must be managed: the presence of many interrupts due to a fast device can ironically sacrifice system responsiveness precisely because of the device speed;
- the file system is one of the causes of added latency for each I/O request (about 5 µs);
- the cost of entering and exiting the kernel is high (in the case of small requests, about 18% of the total cost).
Solutions
The insights just explained inspired the strategy, subsequently referred to as "RRR", pursued by the researchers from UCSD while trying to optimize the Linux I/O stack so as to use their prototypes of PCI Express SSDs efficiently:
Reduce: eliminate redundant or useless features and avoid those parts of the code that perform badly with fast memories. As examples, they avoided the use of the standard Linux I/O scheduler, preferring direct requests, and they chose the spin (polling) alternative to interrupts in the case of small requests [146] (see the sketch after this list).
Refactor: restructure the I/O stack in such a way that efforts are distributed among the actors (applications, operating system and hardware). For instance, separate policy management from policy enforcement, preferably assigning the former to the operating system and the latter to the hardware, where possible. As examples, in Moneta the following refactoring tasks have been implemented: development of a user-space driver to avoid entering and exiting the kernel, a virtualized hardware interface to permit each application to issue requests directly, and hardware permission checks in order to relieve the kernel from policy enforcement.
Recycle: reuse the parts of software already created, where feasible. For example, reuse some of the functionality offered by file system tools and by the file systems themselves.
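The "spin instead of sleep" choice mentioned under Reduce can be illustrated with a minimal sketch. The completion-flag interface, the threshold and the latency figures below are assumptions made only for illustration (real drivers decide per request class and per device): the point is simply that, when the device is expected to answer within a few microseconds, busy-waiting on its status costs less than a suspend/interrupt/wake-up cycle.

#include <stdint.h>

/* Hypothetical completion interface of a very fast block device. */
extern volatile uint32_t *device_status_reg;            /* non-zero when done  */
extern void device_submit(uint64_t lba, void *buf, int is_write);
extern void sleep_until_interrupt(void);                 /* classic async path  */

#define CONTEXT_SWITCH_COST_NS  6000L  /* ~6 us, the figure cited in the text   */
#define EXPECTED_DEVICE_NS      8000L  /* assumed latency of a small request    */

void fast_ssd_request(uint64_t lba, void *buf, int is_write)
{
    device_submit(lba, buf, is_write);

    if (EXPECTED_DEVICE_NS < 2 * CONTEXT_SWITCH_COST_NS) {
        /* Polling: for requests that complete in a few microseconds,
         * spinning wastes fewer cycles than sleeping and being woken up. */
        while (*device_status_reg == 0)
            ;  /* busy-wait */
    } else {
        /* Asynchronous path: suspend and let the interrupt wake us up. */
        sleep_until_interrupt();
    }
}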
The NVM Express group⁸ is currently pursuing a similar approach [91]: their goal is to develop a new standard host controller interface to be used on top of PCI Express, conceived to be adopted by PCI Express SSDs. While the PCI Express specification sets the standards for the lower layers of communication between the CPU and a compliant device, the NVM Express specification sits at a higher level of abstraction that compliant drivers must follow. Their focus is the development of an I/O stack able to support and exploit fast SSDs, thus deriving the maximum benefit from the PCI Express bus. Currently, Windows Server, Linux and VMware already offer NVM Express drivers. The NVM Express documentation reports the latency and performance gains obtained using their I/O stack instead of a standard one: although the documentation never refers directly to the Reduce-Refactor-Recycle approach, their work seems to follow the same path, somehow certifying its effectiveness.

⁸ www.nvmexpress.org
2.3 Storage Class Memory: operating systems
The use of non-volatile memories directly attached to the memory bus represents both a big opportunity for next-generation computing systems and a tough challenge: persistence on the memory bus, along with a promised density better than that of DRAM, are the opportunities.
Persistence in memory represents an opportunity because I/O operations, even if much faster than in the past (as happens in the case of NVM Express SSDs or of the Onyx prototype), are intrinsically slower than memory accesses. Persistence-related operations issued at memory speed would thus permit extremely fast storage (and retrieval) of data.
Density too represents an opportunity: more density would both lower the cost of storage in memory and permit managing a bigger amount of data at high speed.
The challenges, instead, are due principally to the side effects of persistence in memory: a thorough exploitation in the operating system is difficult, as it requires a complex re-design of some of its major parts. Other issues exist, though. One of them is heterogeneity: in case SCM were placed along with common DRAM on the same memory bus, the operating system would have to decide how best to use both memories. A design with SCM only would be architecturally simpler. Other challenges are the need for wear leveling and the need to cope with r/w asymmetries.
2.3.1 Preliminary observations
An analysis and a classification of the proposed approaches to using SCM follow in the next paragraphs. However, before presenting each specific proposal, it is worthwhile to focus on some aspects shared by all the approaches described below.
Wear leveling and r/w asymmetry
While the most important issues, namely persistence and heterogeneity, are managed differently in each specific approach, the issues arising from cell wearing and r/w asymmetry are common whatever the approach.
About wear leveling: it is reasonable to forecast the intervention of hardware engineers on memory controllers [23]: as the new technologies have different memory switching mechanics, timings and electrical needs, it is likely that CPUs will need new memory controllers to drive them. As the need for new memory controllers already exists, it would be easier and cheaper than usual to add more functionality in hardware. As an example, a fast wear leveling schema such as the start-gap one already cited could be implemented in hardware. Other schemas are also proposed in [144, 46, 148, 20]. Further wear-leveling needs could then be met with the support of software. Most of the articles about SCM do not deal with this issue, as most of the time it is assumed to be managed by hardware.
About the r/w asymmetry: a mitigation in hardware is feasible with the use of either a classic SRAM cache or even a cache built with FeFETs: either way the engineering must be careful, since a bad implementation would affect persistence. Anyway, this issue is often completely ignored in the literature. As the r/w asymmetry is in any case a feature of the new memory technologies, I assume it is taken for granted that r/w asymmetries are exposed to the software. This approach is perhaps preferable: keeping sight of the whole panorama of NVMs, memristive memory devices promise to offer an r/w asymmetry much smaller than that of PCM, thus drastically mitigating this issue. This aspect would in any case be easy to test and analyze using DRAM along with a memory emulator, to obtain projections about changes in latency and bandwidth upon changes in timing values [17].
From now on, both these issues will be set aside, as if they were managed by hardware or deliberately ignored.
Background literature
It might now be the right time to observe that in the background of most of the solutions presented hereafter stand some research efforts, made in past years, that influenced subsequent works more than others. These studies, focused on topics such as file systems, persistence and caching, were all carried out during the 90s: from a general computer science viewpoint they were not only interesting, but they also anticipated some issues that are pivotal in the SCM context. Among the articles most cited in the literature, the following are worth mentioning:
The Rio File Cache: Surviving Operating System Crashes [19]: this article was written with the intent of describing a computer system that uses a RAM I/O file cache (Rio) with the aim to "make ordinary main memory safe for persistent storage by enabling memory to survive operating system crashes [. . . ] reliability equivalent to a write-through file cache, where every write is instantly safe, and performance equivalent to a pure write-back cache". The intent of the authors was to execute every I/O operation directly on the in-memory cache facility, using the classical storage just as a backup facility. They proposed to protect the file cache with extensive memory protection and sandboxing, in order to avoid its corruption upon operating system crashes, and to allow only warm reboots, in order to avoid memory leaks when the system is rebooted.
The article claims that the proposed approach proves even safer against crashes than classical I/O to devices like HDDs and SSDs (their solution reached a probability of corrupting the file system upon crash of just 0.6%). Interesting figures about the incidence of crashes on file system corruption are supplied, supporting the "common sense" according to which disks are more reliable than memory: their figures, however, show that the increase in corruption incidence for a memory without protection is only slightly greater than that of common HDDs (just 1.5% probability, versus the 1.1% of a write-through cache). This article is evidence of the fact that, as early as the 90s, researchers had set their sights on storage kept entirely in memory: this topic is today a primary need in big data centers and in large database systems. Moreover, this article contains ideas similar to those used in the "Whole System Persistence" approach (see section 2.3.3). The trials and tests made to measure the effects of system crashes are still usable today to design a correct mechanism for exploiting persistence in main memory, such as those proposed in the next paragraphs regarding file systems and applications (see respectively sections 2.3.4 and 2.4).
File System Design for an NFS File Server Appliance [35]: this article is a technical report issued by NetApp, and it explains the design choices made in their WAFL file system (Write Anywhere File Layout), used in their storage appliances. WAFL used shadow paging extensively to obtain data consistency and fault tolerance, while offering a robust snapshot facility to manage backups easily and efficiently. The ideas neatly described in this article are used extensively as a reference in many other articles about file system design, and they certainly anticipated the times, since newer file systems such as ZFS [111] or BTRFS [121] use approaches similar to those first developed for WAFL. Finally, NetApp's appliances running WAFL did use a non-volatile (battery-backed) RAM to keep the operation log immediately available after crashes: this solution somehow implicitly uses the non-volatile memory paradigm.
The Design and Implementation of a Log-Structured File System [122]: this article proposed a new pattern for using the blocks of a file system, namely as a continuous and cyclic log. Following this pattern, every write operation triggers new block writes into free space, thus filling the free space of the disk as happens in circular buffers. Beyond the fact that this approach requires a garbage collection layer, the article proposed a genuinely new approach to file system design. It was inspired by the copy-on-write approach, and it permitted avoiding the "dual writes" needed for consistency reasons (the first one to the journal and the second to the effective data block). Finally, this approach implicitly enforces a wear leveling strategy: writing different blocks each time, writes are distributed around the disk, and the endurance of memory cells is consequently raised. This design has thus inspired those of many Flash translation layers and many Flash file systems [85]. This approach could be valuable also in the context of SCMs, where the issue of cell wearing must be taken into account.
To a lesser extent, other articles from the 90s that anticipated the topics of non-volatility are [141] and [9]: the former describes the architecture of a computer that used a persistent memory (Flash + SRAM) on the memory bus, while the latter uses a non-volatile DRAM either as a cache or to speed up recovery times.
Choosing the hardware model
Recalling the observations made previously about the hardware model, the data model and the failure model, it is worthwhile to present here the two hardware models that will be considered in what follows. Each of the proposals presented afterwards necessarily uses one of them. Storage class memory can be used in only two configurations: either alone, as a replacement for DRAM, or in tandem with it.
The first option, i.e. DRAM replacement, represents the simpler alternative, as it avoids the need to manage heterogeneity. However, there are also some problematic aspects: firstly, as persistent memories would be slower than DRAM, such a use would sacrifice performance. Moreover, this option would force operating system designers to necessarily manage the issues related to persistence: all the available memory would be persistent.
The second option is more complex: standard DRAM would share the whole set of physical addresses with SCM. Therefore, a portion of the addresses would be volatile, while the remaining ones would be persistent. Despite the need to manage memory heterogeneity, this approach is the preferred one in most implementations. This configuration, besides its complexities, permits mixing the two technologies (DRAM and NVRAM) to achieve the best compromise between performance and storage needs. Moreover, it gives developers the option of deciding when and to what extent they want to use persistent memories.
As will soon be apparent, the hardware model, the data model and the failure model are tightly related: some data and failure models are achievable only on a given hardware configuration.
2.3.2 No changes to the operating system
A first and tempting approach to using SCMs could be the easiest one: to use SCM just as common DRAM under a standard operating system. This corresponds to the choice of maintaining the same data model and the same failure model currently used in operating systems. This approach would permit a standard operating system to benefit immediately from the density increase that SCM would offer: whatever the chosen hardware configuration, the SCM would in any case be used as a standard, volatile memory.
This solution would however be problematic, since part or all of the memory (depending on the hardware configuration) would become persistent. Off-the-shelf operating systems use DRAM with the expectation of its volatility: the operating system enforces safety and security of the data inside it just as long as power is on, without the need to take care of it when power goes down, as it is taken for granted that it is erased. It has been shown in [32] that, even in the case of standard DRAM, data is not lost immediately, but gradually, within 1 to 5 minutes: this fact alone can be a source of security concerns. Even more so, if non-volatile memories were used with a standard off-the-shelf operating system, data security could be bypassed even more easily: any data ever stored in them would persist as long as it was not re-written, and the read operation would be much easier. As temporary data can store passwords, encryption keys, sensitive data, and an infinite pattern of mixed information, it would be a terribly bad practice to expose all this data to potentially unauthorized accesses. To mitigate these problems, a change in the hardware configuration or in the data and failure model would be required: just with the intent of maintaining the same levels of safety and security currently enforced by standard operating systems, the SCM should be encrypted, either in hardware [26, 3] or in software [117]. Encrypting the memory with random keys, set at each system reboot, would emulate the volatility of the memory, thus permitting a safe use of persistent memories as common DRAM. This strategy could even increase the resistance of standard DRAM to security attacks such as those presented in [32]. Another criticality of this way of managing SCM arises from performance: since SCM performance is worse than that of DRAM, a normal operating system using SCM as DRAM would suffer from reduced performance, unless the density increase were so necessary as to compensate for the performance loss. Moreover, in a hybrid hardware architecture, the operating system would use the whole memory as a single kind of memory, whereas the features of SCM and DRAM would be very different from each other: the timings of the operating system would exhibit a problematic variability and unpredictability.
Finally, this first approach is not studied in depth in the literature, since it is practically a no-use strategy with respect to persistence: the opportunities offered by persistence are simply ignored, whereas some problems arise and must be dealt with. This approach is however useful as a cognitive tool for gaining the awareness that persistence is just a property of certain memories: it must be managed in order to benefit from it.
2.3.3 Whole System Persistence
This approach, proposed by researchers from Microsoft Research in [56], is mainly
focused on the reality of large database systems and of large data centers, even if
it could also be used in standard computers.
The chosen hardware model uses only persistent memory. Since current SCM technologies are not mature enough to be used as a DRAM replacement, in this approach persistence is achieved using NVDIMMs⁹, currently available on the market. However, even if not originally conceived to be used with SCM, this approach is nonetheless perfectly suited to it. Here, persistence is exploited to achieve practically zero impact from power failures (failure model). The goal of this approach is to transform a commonly critical scenario such as a power interruption into a suspend/resume cycle: if achieved, this change would lead to systems completely resilient to power outages.

⁹ Non-volatile DIMMs. NVDIMMs are standard DRAM DIMMs that use supercapacitors and NAND Flash modules to behave persistently.
The originating observations of this approach are:
- a consolidated trend in large databases is the storage of the entire dataset in main memory. The "cloud" paradigm further urged the development of caching servers keeping large datasets entirely in memory (see section 2.1.2);
- DRAM in servers can reach big sizes, currently around 6 TB [69]: according to this figure, clusters of servers can manage tens or even hundreds of terabytes of memory each;
- power outages are expensive, especially for large environments. The cost of resuming a system increases in complex environments because, when recovering large datasets, many I/O requests are placed on the storage back-ends, which are typically slow. This "stress" suffered by the storage back-ends can itself be the cause of other critical events, such as the one experienced at Facebook in 2010 [89];
- as the quantity of memory in servers rises, the cost of recovery from the back-ends rises, since the amount of data necessary to re-build the entire state grows too.
These observations have inspired the search for a mechanism that relieves machines from the need to re-build the entire in-memory dataset when power outages happen. The key ideas of Whole System Persistence are to:
- retain all the content present in memory, thanks to persistence;
- save the entire state of the server (registers, caches) into persistent memory when a power-fail event is detected (flush-on-fail strategy);
- modify the hardware in order to provide a residual energy window long enough to permit the state save into memory, by adding capacitance to the PSU using supercapacitors;
- restore the previously saved state into registers and caches automatically, as soon as power is restored.
These steps should appear as transparent as possible to the operating system, emulating just a suspend/resume cycle. Actually, some subtleties about the saved state and the real state of devices after the power cycle point out that the resume/restart process needs some further adjustments in order to be completely transparent. Even if some minor changes to the operating system may be needed, this approach has been successfully tested and demonstrated in [110].
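A minimal sketch of what the flush-on-fail step could look like is given below, assuming an x86-like machine, an NVDIMM-backed save area and a hypothetical early power-fail notification; the routine and symbol names are invented for illustration, and a real implementation (as in [56]) must also deal with the other CPUs, with devices and with the exact residual-energy budget:

#include <stdint.h>

/* Hypothetical save area located inside the NVDIMM-backed physical range. */
struct cpu_saved_state {
    uint64_t gp_regs[16];
    uint64_t rip, rflags;
    uint32_t valid;                        /* written last                   */
};
extern struct cpu_saved_state *nvdimm_save_area;            /* assumed mapping */
extern void arch_save_registers(struct cpu_saved_state *s); /* hypothetical    */

static inline void flush_all_caches(void)
{
    /* WBINVD writes back and invalidates all CPU caches, pushing every
     * dirty line down to the (persistent) DIMMs while energy remains.   */
    __asm__ volatile("wbinvd" ::: "memory");
}

/* Invoked from the power-fail interrupt, running on the residual energy
 * provided by the supercapacitors added to the PSU.                      */
void power_fail_handler(void)
{
    arch_save_registers(nvdimm_save_area);    /* registers -> NVDIMM         */
    flush_all_caches();                       /* dirty cache lines -> NVDIMM */
    nvdimm_save_area->valid = 1;              /* mark the snapshot complete  */
    flush_all_caches();                       /* make the flag durable too   */
    for (;;)
        ;                                     /* wait for power to disappear */
}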
This proposal will certainly be of great advantage in large data centers: the focus is exactly on some of the major criticalities experienced in that context. However, some remarks must be made about it:
- although its use in everyday computing would allow using a computer immediately after switch-on (after switch-off, the system has simply handled a power-fail event), the same security problems noticed previously would arise (see section 2.3.2);
- there is no specific data consistency management: it is therefore entrusted to the operating system. This fact suggests that, since current operating systems are designed to reload at each reboot, there must be some mechanism to detect degraded system functionality and force a system reboot;
- this approach uses persistence only as a temporary storage, not as a long-term memory. The rest of the system does not even have the perception of using persistence. This strategy is certainly simple, but some of the opportunities of persistence are left unused.
2.3.4 Persistence awareness in the operating system
A step further toward the exploitation of SCMs is the intervention of the operating system: differently from what happens in the approaches shown before, in the ones presented hereafter the operating system is aware of the presence of some persistent memory device connected to the standard memory bus along with DRAM. The hardware model considered here is thus hybrid. The presence of SCM can be notified by the firmware at boot time, as normally happens for other hardware features in the BIOS, and the physical memory address space will then be divided into a DRAM portion and an SCM portion.
Almost all of the approaches presented here use persistent memory to store data in files, as happens today with common hard disks or SSDs: through a file system. This continuity with the well-known paradigm of file systems has the advantage of preserving software compatibility, and is thus a viable and acceptable approach to persistence awareness.
Anyway, while still keeping persistence awareness only at the operating system level, file systems are not the only way to exploit it. As described at the end of this topic, some proposals conceive a scenario that is both more complex and more thorough. For the moment, however, my focus is on file systems.
The typical services offered by file systems are:
- a high-level interface through which applications use storage;
- an interposing layer between applications and devices, whose job is arranging data so that it is safely stored and later safely retrieved;
- security enforcement and concurrency management on the contained data;
- a low-level interface to device drivers.
Despite all the features offered by persistent memories, the need for these services (at least for the first three) is unchanged: the features of the new non-volatile memories do not influence the need for file system services, at least as long as the file/folder paradigm is used as extensively as it is today.
Instead, what may be subject to revision are the mechanisms used to offer these services: file systems are software devices that have always been developed keeping in mind the technical details of the underlying storage media (see section A.3); as persistent memories are so different from the other media currently in use, it is thus advisable to re-design and adapt operating systems to their features.
Brief analysis of current I/O path
In Linux, applications that need to operate on files stored on I/O devices follow a sequence of common steps involving many kernel layers, illustrated in figure 2.4.

Figure 2.4: The Linux I/O path. © 2014 Oikawa

This sequence is triggered by applications, it reaches (when needed) the I/O devices, and eventually it returns to the applications themselves. Applications use file system services through filesystem-related system calls, such as open(), read(), write(), close(), and so on. All these system calls use in turn the services of a kernel layer called the Virtual File System (VFS). Its role is to hide from applications all of the implementation details of each specific file system, exposing to them just a standard and well-documented interface. The VFS manages all the software objects required to use files, folders and links, and issues the specific requests directly to each specific file system. In turn, each specific file system interacts with the page cache to check whether the data is cached in memory (see also section 2.1.2); if it is not, the kernel issues the needed I/O request to the block device driver layer. This layer in turn issues the request to the right device driver. The device driver then executes the task together with the device it drives. This sequence, even if rather roughly sketched, describes the impressive amount of work carried out by the kernel (other fundamental tasks, such as security checking, have not even been considered).
Caches anticipated the persistence shift to main memory
Following these steps, it is apparent how common it is to have file system data in memory, though used as a cache, which can reasonably be seen as a (temporary, as it is volatile) storage location whose backing store are the I/O devices. Moreover, these "cache locations" are used as the fundamental building block of the mmap() system call: with this system call, an application can map into its own virtual memory address range data stored in files of some file system. This mechanism is permitted by the page cache: if the data is not present, it is first retrieved from the I/O device; once the page cache has retrieved it, the memory pages containing that data can be mapped into the application's virtual address space. The mmap() system call can thus be seen as a sort of byte addressability on persistent data used in memory. During the 00s, in order to exploit the byte addressability of NOR Flash memories, developers added new functionality to the mmap() system call: XIP (eXecute-In-Place). These changes permit mmap(), in cooperation with a XIP-enabled file system and a XIP-enabled device driver, to connect the virtual address space of an application directly to the byte-addressable Flash chip, without using the page cache as a "middle ground". This feature was added to permit lowering the size of DRAM in portable devices [11] and to decrease their boot-up time [14]. XIP, in effect, allows processes to directly address data residing in persistent storage as if that data were in main memory.
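A tiny user-space example of this style of access is sketched below (the file path is invented; on a byte-addressable device an XIP-capable file system and driver would service the mapping without passing through the page cache, while on a conventional stack the very same code still works, but through cached page copies):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/nvm/counter.dat", O_RDWR);   /* hypothetical file */
    if (fd < 0)
        return 1;

    /* Map 4 KiB of the file into the process virtual address space. */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Byte-granularity access: plain loads and stores, no read()/write(). */
    p[0]++;                          /* touch a single byte in place        */
    strcpy(p + 1, "updated");        /* and a handful more                  */

    /* Ask the kernel to make the mapped range durable. */
    msync(p, 4096, MS_SYNC);

    munmap(p, 4096);
    close(fd);
    return 0;
}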
Consistency considerations
As stated before, caches are used principally to raise the performance of devices that otherwise behave poorly. I/O performance is raised because caching moves I/O operations from the devices to memory: there, operations are faster, bandwidth is high and latency is minimal. However, this mechanism has a further cost in reliability¹⁰. Data in the caches is always up to date, but data in the devices becomes up to date at a slower rate: since transfer costs are high, particularly in the case of small transfers, a few bulk transfers are preferred to many small ones (write-back caching strategy). This raises throughput and lowers transfer costs. This need forces the operating system to wait (usually 30 s) before transferring back to the I/O devices those pages of the cache that are marked as "dirty". The idea behind this behavior is to raise performance at the cost of some uncertainty. Continuing to discuss I/O transfers, there are some further uncertainties. Firstly, data transferred to I/O devices is unsafe until the writes are effectively executed; secondly, modifications to data stored on I/O devices often require more than just one write operation: if a modify operation is not executed completely, in all its steps, there is a risk of file system corruption. So, to summarize this topic: caches are useful, but they raise the risk of data loss; moreover, write operations are risky while in progress, as are logically connected multiple operations until they are completely executed.
These issues have been studied extensively throughout the past decades: the correct survival of data is strategic both in generic file systems and, even more, in database systems. Actually, it is from the database world that the acronym "ACID" comes (Atomicity, Consistency, Isolation, Durability), as well as the concept of transaction [31]. In order to tackle these issues, over the years many approaches have been proposed and successfully used both in databases and in file systems: the best known among them are transaction logging and copy-on-write techniques. Transaction logging is extensively used in journaling file systems¹¹, whereas copy-on-write is used in file systems that use shadow paging and in those that use the log-structured design. As a further remark, all of these techniques fit perfectly in a fault model that tries to nullify the adverse effects of power outages. These strategies against faults do not take into account software faults, bugs and crashes, for the same assumptions explained before for Rio.
¹⁰ As said before, a first cost is in complexity; see section 2.1.2.
¹¹ For example, file systems such as ext3 and ext4.
Approaching design
While trying to treasure the observations just made, and still speaking in a rather general way, the project of a file system designed specifically for SCM should at least take into account the following major differences between SCM and I/O devices:
- the access semantics is completely different (simple load/store vs complex I/O read/write);
- the access granularity is different (one byte or 64 bytes vs blocks);
- the cost of accessing the persistent media changes (low with SCM, high with I/O devices). The motivation that induced the widespread use of caches loses its relevance: the worthiness of caching could be reconsidered;
- the execution delay window is different (short with SCM, long with I/O devices) but still present. This fact can simplify the design of a file system resilient to power outages, but the risk of data loss must still be a concern. As an example, journaling can lose its appeal compared to logging and shadow paging, which seem to be patterns better suited to memory storage;
- ACID loses its last letter, Durability, as it is implicitly achieved through persistence. Atomicity, Consistency and Isolation are still needed to guarantee a long data lifespan.
Moreover, some new issues would arise. Among them, at least the following should be considered:

- memory protection against operating system crashes and programming errors should become a primary goal, since memory is less safe than I/O. The experience documented in [19] on this topic can be used as a reference;

- some subtle issues arise around how atomicity is achieved (it is fundamental in shadow paging, for example). Memory operations (load/store) effectively reach memory only after traversing the CPU caches in a write-back manner. Cache lines tagged as "dirty" must be written back to memory, but this operation can be subject to reordering: this behavior optimizes stores and raises performance, but it becomes a problem if the programmer relies on an exact order in which certain stores happen (as in the case of atomic writes, where an exact order is needed). While cache contents can be kept coherent between processors and even between cores, and the order of instructions can be constrained (for example with the mfence instruction of the x64 instruction set), currently there are no guarantees against the reordering of write-backs from cache to memory: such guarantees can only be achieved by using the mfence instruction in tandem with cache flushing, and this practice noticeably lowers performance (a minimal sketch of this pattern follows the list).
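The following is a minimal sketch, in C with x86 intrinsics, of the flush-plus-fence pattern just mentioned: every cache line touched by a persistent store is flushed explicitly, and a fence keeps the flushes ordered before the next store. The function names and the data layout are illustrative only.

/* Flush-plus-fence ordering for persistent stores (GCC/Clang, x86). */
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

#define CACHELINE 64

static void flush_range(const void *addr, size_t len)
{
    const char *p = (const char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    const char *end = (const char *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush(p);                 /* evict the line towards memory */
}

/* Commit a new value, then publish it by flipping a "valid" flag.
 * The fence between the two flushes keeps the data ahead of the flag. */
void persist_update(uint64_t *data, uint64_t *valid_flag, uint64_t value)
{
    *data = value;
    flush_range(data, sizeof *data);
    _mm_mfence();                       /* data must reach memory before the flag */

    *valid_flag = 1;
    flush_range(valid_flag, sizeof *valid_flag);
    _mm_mfence();
}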
Personal hypotheses
Here I would like to expose some ideas that came up while imagining a system that uses persistent memories along with a file system. The first one would probably lead to a bad design, whereas the second one is only sketched, even if it might prove quite interesting. I feel nonetheless that they are quite useful to underline the differences that emerge when compared with the approaches found in literature. From a "Recycle" viewpoint, a persistent memory could be exploited by:

- recycling the page cache facility: a file system would be located completely inside a part of it, and the page cache would be instructed to use that zone without a backing store;

- using a ramdisk block device along with the O_DIRECT flag of the open() system call, to avoid the page cache and not duplicate data in memory.
Both these approaches focus on the fact that, if storage is placed on the memory bus, the page cache would just replicate the same data, thus wasting space and CPU cycles. So my idea was to keep data either in the page cache only (the former approach) or in the persistent storage only (the latter). While this intent is paramount also in the approaches found in literature, my thoughts were admittedly much vaguer.

The first approach would appear somewhat redundant, since both the TMPFS and RAMFS special file systems already take profit of the page cache facility in the way I imagined: without a backing store. This approach would be somewhat similar to the proposals made in [19], but it has to be remembered that RIO is well suited only to systems whose entire memory address range is persistent, and this would not be the case: this could be a clue that this might not be a correct design choice. However, similarly to what is objected in [142] about TMPFS and RAMFS, the page cache has been developed first of all as a cache: the focus was on speed, taking volatility for granted. The page cache would also still have to be used as a volatile cache for the other, standard I/O devices. Finally, at present there is no easy way to instruct the page cache to use a given range of physical addresses, and this information would be necessary to use the page cache on a persistent zone of memory. Each of these remarks stresses how a hypothetical redesign of the page cache would be non-trivial and would unfortunately affect its core logic: simply put, it would probably be a bad design choice. A better choice would likely be the creation of a new facility devoted to SCM [142].
The second strategy is the opposite of the previous one: if the page cache cannot be used, the alternative way to avoid duplicating data is to bypass it. This approach might be better than the previous one: at least, it does not affect the logic of other important facilities of the operating system. O_DIRECT is already used as an open() modifier, either to avoid double caching (as is the case for database engines, which have their own caching mechanisms) or to avoid cache pollution (when caching is for some reason not needed). This interesting approach has already inspired other developers in the effort to build a solution suitable for persistent memory, as in the PRAMFS file system [95] and in the current efforts of the Linux community to develop DAX (a successor of XIP, see section 2.3.5). However, my idea now appears to me as somewhat limited. The approach chosen here is the use of a file system to allow every application to keep using the already developed persistence semantics without any need to rewrite application code: this means that if an application does not use O_DIRECT natively, its reads and writes still pass through the page cache. Moreover, O_DIRECT influences only files accessed with open(), not files accessed through memory mapping. So, while my idea was certainly in the right direction, it was nonetheless incomplete.
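For reference, this is a minimal sketch of the O_DIRECT pattern: the page cache is bypassed, so the buffer, the size and the offset must respect the device alignment constraints. The device path and the sizes are illustrative.

/* Bypassing the page cache with O_DIRECT (Linux). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/pmem_ramdisk", O_RDWR | O_DIRECT);   /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }

    ssize_t n = pread(fd, buf, 4096, 0);      /* goes straight to the device */
    if (n < 0) perror("pread");

    free(buf);
    close(fd);
    return 0;
}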
A gradual framework
In effect, to build a working solution like those found in literature, some facilities must be developed:

- A manager: a means for reserving and managing persistent memory, so that the standard memory manager does not use it as common volatile memory. The alternatives are: to build a specific driver that exposes persistent memory as if it were a device; to embed the management capability in a file system developed specifically to be used on memory areas; to modify the standard memory manager so that it handles persistent memory as another type of memory; or to develop some other facility dedicated to managing persistent memory only.

- A translator: a means for changing the semantics used to access data. This change of semantics is necessary, but the problem is where to place the translation mechanism: the solutions proposed in literature put it either inside the device driver, or in the file system, or in a library devoted to acting as a semantic translator.
Moreover, as solutions become more and more thorough, the following services can be offered:

- Efficiency: a means to avoid completely the use of the page cache and to access data directly.

- Safety: a means to enforce memory protection against operating system crashes.

- Consistency: a means to resist power failures during writes and to guarantee a long data lifespan.

- Exploitation: a space-management design that exploits the architecture of the memory, thus leaving behind the design approaches fit for hard disks.

- Integration: an elegant solution would permit the use of persistent memory both as storage and as memory for the kernel. Such a use is the most complex, as the kernel must be instructed about how to use persistent memory, and this can potentially expose the kernel to bugs. These types of solutions are a sort of bridge between this class of approaches and those which propose to expose persistent memories directly to applications.
The different approaches are summarized in table B.6. Before presenting the details of each specific line of the table, I shall make a final remark: these file systems do use persistent memories as storage, but all the internal structures used to keep track of files, folders, open files, and so on are still placed in DRAM. This behavior is the same as that of common file systems: it simply reflects the fact that these approaches still do not use persistent memory as memory available to programs, but just as a place for storage.
2.3.5 Adapting current file systems
The simplest approach
The simplest approach would merely permit the use of a standard file system on persistent memory: this would only need a manager and a translator. However, since the file system would run unchanged, a block device driver would be needed, and developers would have to embed into it both the functions of the manager and those of the translator. A viable starting point could be the modification of the standard Linux brd driver (implemented in brd.c). Such a solution would be functional even if inefficient: the page cache would be used as usual.
Linux developer community
Linux kernel developers are however working on a more thorough solution: DAX. The acronym stands for Direct Access (the "X" probably stands for the first letter of the XIP acronym, while "direct access" comes from the main function that compliant file systems must implement, direct_access), and it is being developed to permit the use of standard file systems on persistent memories with minor changes. It is the successor of XIP and is a sort of complete solution that offers both automatic O_DIRECT behavior for open() system calls and XIP functionality for mmap() system calls. Even if these solutions are not well documented in scientific literature (to my knowledge there are only some slides and videos), the current efforts can be found in the mailing lists and in the official Linux documentation about experimental features [99, 93]. To follow this paradigm, modifications must be made to a driver (to become DAX compliant and to use persistent memory) and to a standard file system (which then uses the DAX driver through the DAX functions): currently the file system subject to these modifications is ext4. This approach is the object of work by Linux kernel developers, a fact that should be considered a clue of its validity. Moreover, as documented in [73], efforts made in this direction induced developers to focus on refactoring the design of the I/O system calls, to increase their efficiency when used with fast storage. This fact, as well as the challenges raised by the new NVM Express standard, could lead to a deep redesign of the mechanics of the I/O subsystem in Linux. The issues presented about fast SSDs apply also in this context: software is a major cause of lost latency and throughput, and Reduce, Refactor, Recycle is still a valuable methodology.
Quill
Quill is a proposal documented in literature [2] that has been developed to require even fewer modifications to common file systems. Like the previous approach, Quill is not focused on a specific file system: its aim is to be used with standard ones. Another of its goals is to involve the kernel as little as possible, to avoid expensive context switches: most of its code runs in user mode. It acts as a user-mode translator, developed as a "service" library that interposes between each I/O system call and its effective execution, adding a sort of indirection (a minimal interposition sketch follows the list). Quill is a software facility built of three components:

- the "Nib", which catches the I/O system calls and forwards each of them to the following component, the "Hub";

- the "Hub", which chooses the right handler for the system call, depending on whether the request concerns a XIP file system or not;

- the "handlers", which effectively manage the requests: if a request concerns a standard file system, the handler selected by the "Hub" is the standard system call of that file system; if, on the other hand, the request concerns a XIP file system, a special handler serves it through an mmap() operation.
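The following is an illustrative sketch, not Quill's actual code, of how such a user-mode interposition can work on Linux: a preloaded library overrides read(), asks a "Hub"-like helper whether the descriptor belongs to an XIP-mapped file and, if so, serves the request with a memcpy from the mapped region; everything specific to Quill (the lookup and the mapping bookkeeping) is replaced here by hypothetical stubs.

/* Illustrative interposition sketch; build as a shared library and preload it.
 * The globals and fd_is_xip() are stand-ins for the "Hub" bookkeeping, which
 * in a real system would be filled in at open() time. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <unistd.h>

static char  *xip_base;      /* base of the mmap()ed file, set by the "Hub" */
static size_t xip_offset;    /* current file offset for that descriptor */

static int fd_is_xip(int fd) /* hypothetical lookup; always "no" in this stub */
{
    (void)fd;
    return xip_base != NULL;
}

ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    if (fd_is_xip(fd)) {
        memcpy(buf, xip_base + xip_offset, count);  /* user-mode copy, no kernel entry */
        xip_offset += count;
        return (ssize_t)count;
    }
    return real_read(fd, buf, count);               /* normal path for everything else */
}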
Compared with the approach of the Linux community, Quill reduces the need for file system refactoring. However, a thorough comparison between the effective performance of the two approaches does not exist. While both these approaches are surely focused on guaranteeing the needed efficiency, the management of safety against operating system crashes is not well documented, and most probably is up to the design of the manager. As a further remark, both the previous approaches rely on a driver that acts both as the manager and as the translator. This design is indeed a requirement, since common file systems expect to use a block device and, hence, a block device driver. Regarding consistency, the semantics offered depends on the actual file system used. Nevertheless, these solutions are flexible and simple enough to be used as a springboard towards a quick adoption of persistent memories in operating systems.
2.3.6 Persistent-memory file systems
A personal conviction of the author is that, since software latencies are problematic at high speed, sooner or later a file system specifically designed for memory storage will be needed in order to further increase performance; this would save the latency spent in optimizations tailored to common spinning disks and gain latency from a design suited to memory. The next approaches show the work done by researchers to develop file systems specifically designed for byte-addressable persistent memories: BPFS [23], PRAMFS [95], PMFS [25], SCMFS [142].

As a general observation, the following approaches are a step further towards the exploitation of persistent memories. Even if the specific features vary between them, these approaches are the result of efforts made to offer a larger set of features, especially those needed in a file system used in "production" environments: safety and consistency.
BPFS
BPFS is an experimental file system developed by researchers at Microsoft. The literature about it focuses on the internal structure of the file system and on the proposal of two important hardware modifications intended to permit fast use of its features. This file system aims at providing strong consistency guarantees using a design similar to that used in WAFL. Like WAFL, BPFS uses a tree-like structure that starts from a root inode: this design permits the update of an arbitrary portion of the tree with a single pointer write. However, the BPFS researchers argued that the mechanism used in WAFL to perform file system updates was too expensive (each update triggered a cascade of copy-on-write operations from the modified location up to the root of the file system tree). This remark led their work towards the proposal of "short-circuit shadow paging", i.e. a technique that adaptively uses three different update approaches: in-place writes, in-place appends, and partial copy-on-write.
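To make the mechanism concrete, the following is a minimal sketch of the general idea behind shadow paging: the modification is applied to a fresh copy of the affected block and then committed by overwriting a single pointer, which for an aligned 8-byte store on x86-64 is atomic. The stand-in allocator and the omission of cache flushing are simplifications of this sketch, not BPFS internals.

/* Copy-on-write update committed with one pointer write. */
#include <string.h>

struct block { char bytes[512]; };

static struct block pool[16];                 /* stand-in for a persistent allocator */
static int next_free;
static struct block *alloc_block(void) { return &pool[next_free++]; }

/* 'slot' is the persistent pointer held, for example, by a parent tree node. */
void cow_update(struct block **slot, const void *newdata, size_t off, size_t len)
{
    struct block *copy = alloc_block();       /* shadow copy of the old block */
    memcpy(copy, *slot, sizeof *copy);
    memcpy(copy->bytes + off, newdata, len);  /* apply the modification to the copy */

    /* ...the copy would be flushed to persistent memory here... */

    *slot = copy;                             /* single pointer write commits the update */
}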
This technique permits efficient writes, along with a management of operations focused on consistency. The side effect of these choices, however, is the loss of WAFL's powerful snapshot management. Another specificity of this approach is the proposal of two hardware modifications. Usually this type of request is to be avoided, for the simple reason that it easily goes unheard: hardware modifications are very expensive and happen only when the profitability is certain. However, one of the two proposed modifications is indeed very interesting: it tries to address the problem of memory write reordering. As previously described, the problem can currently be managed by flushing the cache (in tandem with the mfence instruction), but this approach is limited, as it considerably lowers performance. In [23] it is proposed to add to the hardware (as new instructions, similar to mfence) a mechanism that allows programmers to set ordering constraints in the L1, L2 and L3 caches: "epoch barriers". An "epoch" would be "a sequence of writes to persistent memory from the same thread, delimited by a new form of memory barrier issued by software. An epoch that contains dirty data that is not yet reflected to BPRAM is an in-flight epoch; an in-flight epoch commits when all of the dirty data written during that epoch is successfully written back to persistent storage. The key invariant is that when a write is issued to persistent storage, all writes from all previous epochs must have already been committed to the persistent storage, including any data cached in volatile buffers on the memory chips themselves. So, as long as this invariant is maintained, an epoch can remain in-flight within the cache subsystem long after the processor commits the memory barrier that marks the end of that epoch, and multiple epochs can potentially be in flight within the cache subsystem at each point in time. Writes can still be reordered within an epoch, subject to standard reordering constraints" [23]. This behavior would be achieved through the insertion of two new fields into the caches: a persistence bit and an epoch identification pointer. The other proposal made in [23] consists in adding capacitance to the RAM modules, in order to guarantee the effective completion of each write request already entrusted to the memory modules; this proposal is similar to those found in [56]. The BPFS proposal brings both lights and shadows. The lights are surely related to the care taken in this design and to the search for a solid consistency mechanism; moreover, the design is well adapted to memory access patterns. The shadows consist primarily in the fact that some features are not mentioned: security issues are not taken into account, and neither is a revision of the mmap() system call (XIP functionality). Moreover, BPFS relies on hypothetical hardware changes whose concrete realization is uncertain. Another issue is a certain amount of opacity about some details of the implementation that was actually built: [23] points out that the solution was developed on the Windows operating system platform, and that there is a complete port for Linux FUSE, but no further details are given. The problem of the depth of documentation arises also in other proposals, such as the one that follows (PRAMFS), but in this case some information is missing completely: for example, it cannot be clearly identified where management and semantic translation happen. The most likely answer is in a driver, as in the previous scenarios, but this is indeed a guess.
PRAMFS
The next proposal, PRAMFS, comes from the open source community. It consists in a much more classical file system design, along with the required features for XIP functionality, direct I/O access and protection against operating system crashes. It is a simpler approach than BPFS, but it tries to address more issues. Memory protection is achieved using the current virtual memory infrastructure, marking the pages in the TLB and in the page tables as read-only and changing the permissions only when strictly needed. This method seems the same as that of [19]. Concerning management and semantic translation, this approach seems better than the previous ones: these two fundamental functions are executed directly in the file system itself, without the intervention of a block device driver. To be exact, the documentation in this regard is not completely clear, but some clues from the PRAMFS documentation and from [60] confirm what has just been claimed. Another clue of this behavior is the way this file system is mounted: directly, by specifying the starting physical address.
mount -t pramfs -o physaddr=... (example of the mount command)
This behavior is thus similar to that of TMPFS and RAMFS, albeit adapted to persistent memory: it is a great advantage, as it avoids the overheads attributable to block device emulation, which is, at best, an unneeded layer (see section 2.3.6). Unfortunately, perhaps reflecting the very prototypical status of this proposal, the documentation makes no mention of consistency concerns and of the relative mitigation techniques.
PMFS
The PMFS file system was developed by kernel developers of the Linux community before they started to focus on the DAX approach. In its design, the direct interaction with persistent memory is clear: the file system directly manages the persistent memory, which is reclaimed from the kernel at mount time. Both the management and the translation are executed by the file system. Although the internal design of PMFS is different from the other ones reviewed before, the intent of the developers was the same: to create a lightweight file system for SCM that gives applications support for the standard read, write and mmap operations, while offering consistency and memory optimizations to increase performance. XIP features are used to make the mmap() system call efficient. Standard I/O system calls are conveniently translated into memory operations, avoiding any data replication in the page cache. In order to offer high levels of consistency, metadata updates are executed through atomic in-place updates when feasible or, otherwise, through an undo journal, while copy-on-write is used for data updates. As in other approaches, the PMFS developers remarked the need for hardware features to help consistency, and they pointed out the same consistency issues about memory operations remarked in other papers. In order to solve these issues, they proposed the insertion of a new hardware instruction (pm_wbarrier) into the instruction set. This approach is similar to that proposed by the BPFS developers, though simpler (no cache structure is changed). Such an instruction would guarantee the durability of the stores to persistent memory that have already been flushed from the CPU caches (i.e. that the store has been effectively executed). An original approach to obtain the desired memory protection is also explained: instead of using the expensive RIO strategy of a continuous write-protect and write-unprotect cycle, it would be better to use an uninterruptible, temporary write window to protect virtual memory pages.
SCMFS
The last proposal reviewed here, SCMFS, comes from academic researchers at Texas A&M University and is presented in [142]. Proposed through a neat and thorough paper, their work is centered on a new file system developed specifically for SCMs (SCMFS indeed stands for Storage Class Memory File System). A major strength of their approach is the high integration of the file system with the current Linux memory subsystem. Such an integration could pave the way to a future use of persistent structures and data by the kernel; for this reason this proposal really seems to represent a step further towards the concepts presented in [60]. The paper describes how the team modified the BIOS, so as to advertise the presence of SCM to the operating system, and the Linux memory manager, in order to create a new memory zone (ZONE_STORAGE) that is used only through new non-volatile system calls (nvmalloc(), nvfree()). In turn, the file system uses these new system calls to allocate its structures in the persistent memory zone. Following the concepts introduced earlier, the manager is the standard Linux memory manager, and the translator is the file system, which directly uses the allocated memory. Another major strength is the tight integration with the existing virtual memory hardware infrastructure: each structure used in SCMFS makes extensive use of the virtual memory concept and has been engineered to adapt easily to the current page tables, TLBs and CPU caches. For example, each file is seen as a flat address range of the virtual address space, starting from zero up to a maximum address: this range is then remapped onto a non-contiguous set of physical memory, as happens normally with application heap and stack. The whole file system space is managed within a range of virtual addresses. "Superpages" are used to avoid an excessive use of TLB entries, and preallocation is used to save valuable time in complex memory allocation procedures. Like BPFS, SCMFS also relies on some guarantees about the ordering of memory instructions but, contrary to BPFS, it uses the slower (but viable) approach of mfence and cache flushing; in this case, the hardware modification proposed by the researchers from Microsoft would be of great help. It is claimed that this operation is performed each and every time a critical piece of information is changed, in order to achieve good consistency enforcement. However, since the section about consistency is only briefly sketched, it should be further investigated whether these consistency guarantees would be sufficient when "in production". Another interesting feature of this approach is the need for a garbage collection facility: since each file receives a "big" virtual address range, under stress circumstances it could be necessary to manage fragmentation (too many holes of unused virtual addresses). A similar need arises in the file system proposed in [122]. Despite the depth of the proposal, some areas remain quite opaque: there is no description of the implementation of the I/O system calls and of the likely page cache bypass, nor is the source code made available on the Internet. Moreover, while the focus on the extensive reuse of the virtual memory infrastructure may suggest that memory protection is enforced, this topic is nonetheless not covered.
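Purely as a hypothetical illustration (the paper's exact prototypes are not reproduced here), the sketch below shows how a file system could place a piece of its metadata in persistent memory through an nvmalloc()-style allocator; the structure, the signatures and the user-space stubs are all assumptions of this sketch, not SCMFS code.

/* Hypothetical use of an nvmalloc()-style allocator at mount time.
 * The stubs make the sketch compile in user space; in SCMFS the real
 * allocator would serve pages from ZONE_STORAGE inside the kernel. */
#include <stdlib.h>

void *nvmalloc(size_t size) { return malloc(size); }  /* stand-in for the persistent allocator */
void  nvfree(void *ptr)     { free(ptr); }

struct scm_superblock {          /* illustrative on-SCM metadata */
    unsigned long magic;
    unsigned long file_table;    /* start of the file-mapping area (virtual address) */
};

static struct scm_superblock *sb;

int scm_mount(void)
{
    sb = nvmalloc(sizeof *sb);   /* would live in ZONE_STORAGE and survive reboots */
    if (!sb)
        return -1;
    sb->magic = 0x53434d46UL;    /* "SCMF", illustrative */
    sb->file_table = 0;
    return 0;
}

int main(void) { return scm_mount(); }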
2.3.7 Further steps
Other strategies to take advantage of non-volatile memories
Until now, the classical storage dichotomy between data used by processes (heap, stack) and storage (files) has been respected: when file systems reserve a memory portion for file storage, they use it exclusively for file storage. However, persistent memory is byte addressable just as DRAM is, and its use through a file system is not the only possible "pattern" of use. Rather, as long as persistence awareness belongs to the operating system only, file systems are the only way to let applications use persistence seamlessly. In fact, file systems are fundamental to the correct execution of applications, since interaction with the file system is embedded in a plethora of programs.
However, an operating system could also use persistent memory for itself, instead of merely servicing applications. Applications would then benefit from a better job done by the operating system thanks to persistent memory, using it only indirectly and unconsciously. An operating system could therefore:

- use persistent memory as a DRAM extension, to avoid swapping user virtual memory pages or to store part of its memory structures when DRAM runs low, or both;

- use persistent memory to store a part of its data structures persistently, in order to speed up boot-up, reboots and restore cycles after power failures.
Concerning the DRAM extension, this use would expose the data moved from DRAM to persistent memory to the same security issues remarked before. It could be noted, however, that in the case of swapping the same issues arise when process data is moved to hard disks. In literature, a potential use of persistent memory as a DRAM extension has been proposed in [53] and in [38].
The second approach would use persistent memory to store persistently (not as a DRAM replacement) a part of the data structures used for the execution of the operating system. This approach somehow anticipates issues presented in the next paragraphs, where the proposals to expose persistence to applications are discussed. Here it should only be remarked that this approach, though having great appeal, brings with it many potential programming bugs. While the risks will be analyzed later along with persistence in applications, the assumption made here is that kernel code is expected to be safe: correctness, safety and quality are much more easily achievable in kernel code than in user applications. The exploitation of persistent memory to decrease boot and restore times has not been studied thoroughly yet, and the scientific literature on this topic is scarce. The issues about boot and recovery time have been met before in the WSP approach, but that approach had limitations in the real exploitation of persistence. A better approach could be the use of a mixed strategy for boot and recovery: strategic structures are recreated at each boot, to preserve system health across reboots (and to exploit volatility too), while other structures and data can be left in persistent memory, ready to be used. The work needed at each startup would thus be smaller, saving time and increasing responsiveness. A study on this topic is presented in [41].
A step further: integration
If the approaches just shown were only alternatives to the file system ones, it would be very likely that operating system developers would choose file systems: a fast file system is indeed appealing. However, those approaches are not alternatives: although integration would be a complex task, the memory architecture would permit the use of persistent memory both through a file system and for the other purposes just shown, all at the same time. Just as memory is divided into physical pages, and each physical page can be owned by different processes (through virtual address ranges), similarly some pages could be used for the file system, whereas others could be used for the operating system's own advantage. Such an architecture would permit an operating system, when booting, to execute the kernel image immediately from persistent memory, and then to load the persistent file system at boot time.
Researchers have not yet delved into this topic: it seems an area in need of further investigation. To my knowledge, [60] is the only attempt at modeling integration. A key aspect is that, in order to use persistent memory following the integration concept, there must be a facility that somehow allocates persistent memory dynamically. Such a facility would manage requests from the file system and from the kernel, as well as garbage collection (if needed) and other related activities. A first approach to integration is made implicitly in the SCMFS proposal: the standard Linux memory allocator is modified to add the new nvmalloc() and nvfree() system calls. This can indeed be one of the viable ways to achieve integration: the two system calls permit the dynamic allocation of persistent memory. Even though in SCMFS those system calls are used exclusively by the file system, the kernel itself could take advantage of them. In [60] a different choice is proposed: the management of persistent space should be up to the file system, which must then assign memory to the standard memory manager upon request. Following Oikawa's proposal, persistent storage is used for temporary DRAM substitution, not for kernel structure persistence. Oikawa models the access management in three viable alternatives: directly, indirectly, or through mmap() operations. With the direct method, the Linux memory manager directly accesses the data structures of the file system to obtain memory pages; with the indirect method, instead, a special file inside the file system is used as DRAM.
Issues about persistence-awareness in operating systems
Some final remarks conclude this part, in which persistence is exposed at the operating system level. Articles proposing file system services to exploit persistence are the majority, and the efforts made by researchers are extensive. Nonetheless, the achievements on this subject, while promising, are still experimental: software solutions offering a complete set of features do not exist yet. The same situation is reflected in the level of persistence awareness of current operating systems: an operating system that is effectively persistence-aware is still far to come. Indeed, each of the big software players is currently investing in research towards the future exploitation of the new memory technologies, but these are still intentions. Moreover, research has to be broadened and deepened in order to achieve products suitable for real "in production" scenarios: unfortunately, many issues are still not addressed properly or, worse, not even considered yet. For example, one issue still centered on file systems is the fact that the "RRR" approach is underutilized: it is only rarely applied in the articles analyzed. Surely refactoring is a concern for the DAX implementors, but the other proposals do not cite the need for a thorough analysis of the efficiency of the software stack. The risk, for example, is that kernel execution could be over-used, thus increasing the need for expensive context switches. Another issue is the fact that the discussion, until now, has been focused entirely on systems working practically stand-alone, whereas concrete needs are actually different: the paradigms just seen must be proven to behave properly also in distributed and highly replicated environments, such as those of big data centers. A first effort in this direction is presented in [147], but this topic still has to be faced in depth by researchers. Moreover, researchers must adapt the architectures conceived for persistent memories to modern computing trends: virtualization, multi-core architectures, concurrent and highly parallel computations, and so on. This branch of research in computer science will have to mature over time: the process will eventually be stimulated by the effective release of the new memory technologies on the market.
2.4 Storage Class Memory and applications
A topic related to those just shown is the level of awareness of persistence in applications and, in particular, the level of awareness of persistent-memory devices. Since operating systems are the foundations on which applications are built, many researchers have wondered whether the presence of persistent memories could be exposed to applications. Ordinarily, applications use their own virtual addresses to perform their tasks: it is therefore natural to ask whether applications could use persistent memories directly, as they do with standard DRAM.
Persistence referred to applications is not a novelty, as applications have always used files and folders to manage persistent data. Moreover, the generic concept of persistence in applications is a vast research domain: over the years many efforts have been made to permit the seamless use of persistent data structures in applications. In particular, researchers concentrated on allowing object-oriented programming languages to use persistent objects through a database back-end. Some examples are ObjectStore [44], Thor [47] and the Java Persistence API (JPA) [81, 106]. Other examples, referring to persistence in applications in general [7] and in Java in particular [6], arise from the work of researchers at the University of Glasgow during the 90s. Such efforts were inspired principally by the fact that the data structures used in programming languages are badly suited to how persistence is managed by file systems, so that applications usually perform expensive adaptation work (an example is the complex process of serialization): a direct use of persistent data structures and persistent objects in programming languages would be much easier. However, the use of database back-ends introduces some complex mapping issues. Researchers think that the new memory technologies could potentially remove those complexities, by removing both the need for a database back-end and the need for serialization: the topic thus moves from persistence in general to persistence in main memory, also in the context of user applications.
Exposing storage class memory to applications sounds attractive also for other reasons: if applications could directly address persistent-memory devices, they could use their full power without intermediation, thus removing unnecessary overheads. In turn, this approach could relieve applications from the burden of relying on the kernel to execute functions related to persistence (through a file system): this would permit large savings in terms of latency and energy. As applications naturally use memory through memory instructions, no translation would be needed, increasing the effectiveness of the approach. Moreover, this strategy would finally allow programmers to use all the research done in the past decades to optimize in-memory data structures: until now, as persistence has always been relegated to slow devices, such slowness has heavily discouraged the direct use of those data structures for persistent data. This new approach to persistence would allow programmers to use highly efficient data structures also when coping with persistence.
While the benefits could be many, the issues would be many too. A first observation is rooted in the fact that, as previously shown, the idea of using SCM as if it were just a slower DRAM has many contraindications: to be properly used, SCM should be used consciously. Moreover, a part of this "consciousness" is achieved through consistency management and enforcement, as persistence is deeply related to data consistency (consistency is what makes persistence effective): if SCM were exposed to applications, applications would have to manage consistency issues.
Another observation, related to the preceding one, refers to the average quality of the code in user applications compared to that of kernel code: usually user applications cannot be trusted as safe, while kernel code almost can. This is not secondary: currently, by relying on file system services, applications delegate to the operating system all the tasks related to the needed level of consistency. This approach has the merit of letting the programmer concentrate just on the main goal: the development of a functional application. Conversely, exposing persistence to applications would increase the number of issues that developers have to manage: they would have to use persistence knowingly. Such an approach would require extensive code restructuring and rewriting in order to profit from the potentially available performance gains.
Another important claim made by scholars is that this approach would bring the typical programming issues into the domain of persistence, yielding an even more complex scenario. For example, the reader will agree that the management of pointers is central in many programming languages; the simultaneous presence of SCM and DRAM, however, would complicate their use, and would give rise to the following kinds of pointers:

- non-volatile (NV) to NV pointers;

- NV to volatile (V) pointers;

- V to NV pointers;

- V to V pointers.
Clearly, at shutdown, only the NV memory areas would survive, exposing code that uses unsafe pointers to subtle programming bugs. Moreover, the risks of dangling pointers, memory leaks, multiple free()s and locking errors would be present as in any programming environment: the risk, however, is that, in the absence of appropriate checks, such errors could persist in time, thus becoming persistent errors.
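A minimal sketch of the NV-to-volatile hazard: a node that would live in persistent memory stores a pointer to a DRAM buffer, and after a restart that pointer dangles. pmalloc() here is a hypothetical persistent allocator, stubbed with malloc() only so that the example compiles.

/* The NV-to-V pointer hazard in miniature. */
#include <stdlib.h>
#include <string.h>

static void *pmalloc(size_t size) { return malloc(size); }  /* stand-in for an SCM allocator */

struct pnode {
    char *payload;            /* NV-to-V pointer: unsafe across restarts */
};

int main(void)
{
    struct pnode *n = pmalloc(sizeof *n);  /* would survive power-off */
    n->payload = malloc(64);               /* volatile: gone after power-off */
    strcpy(n->payload, "volatile data");

    /* After a reboot, n itself would still exist in persistent memory,
     * but n->payload would reference DRAM contents that no longer exist:
     * a dangling pointer that has itself become persistent. */
    return 0;
}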
Around 2011, two competing university research groups proposed two different approaches to expose SCM to applications: NV-Heaps [22] and Mnemosyne [135]. These two papers are considered by scholars as the two reference works on persistent-memory exploitation in applications, a fact proved by the numerous citations that both papers have in literature.

While the two proposals are quite different from each other, the goals of the two research groups were nonetheless quite similar: their intent was to build a framework for programming languages able to let applications use persistence in main memory safely. A major goal undertaken by both research groups was to guarantee high consistency and, at the same time, high performance: the former relieves the programmer from the difficult and error-prone explicit management of consistency, while the latter is necessary to use persistent memories conveniently without sacrificing their high performance excessively. A brief presentation of both approaches follows:
NV-Heaps: this proposal presents itself as the most complete one. The group that developed it intended to address most of the major problems that spring from exposing SCM to applications. As will soon be shown, however, this completeness is paid for in generality. NV-Heaps consists in a C++ library built upon a Linux kernel, and it is focused on "allowing the use of persistent, user-defined object [. . . ] as an attractive abstraction for working with non-volatile program state". The system requirements are an XIP file system, along with cache epochs (proposed by the same people behind the epochs of the BPFS file system, see section 2.3.6). The services offered by the library are extensive: pointer safety through referential integrity, flexible ACID transactions, a familiar interface (using common C++ syntax), high performance and high scalability. Each NV-Heap represents a sort of persistent domain for an application, where only safe pointers can be used: NV-to-NV pointers that cross heaps and NV-to-V pointers are avoided. Moreover, the library, through transactions, permits the correct storage of data over time while preserving performance. Concurrency-related primitives such as atomic sections and generational locks are supplied. Each NV-Heap, finally, is managed through a file abstraction: each heap is completely self-contained, "allowing the system to copy, to move or transmit them just like normal files". Applications use NV-Heaps by recurring to the library services, using the file name as a handle: the library then, in cooperation with the kernel, executes the mmap() through the XIP functionality, mapping the application virtual address space onto the effective persistent memory area used by the NV-Heap. Interaction with the kernel is used only when strictly necessary.
Mnemosyne: this proposal offers fewer features than the preceding one, but its simplicity preserves its generality. Mnemosyne too is developed as a library that offers user-mode programs the ability to use persistent memory safely. The design goals that the developers decided to follow were: first, to maximize user-mode access to persistence; then, to implement consistent updates; and finally, to rely on conventional hardware. Mnemosyne is therefore developed as a low-level interface, similar to C, that provides:

- persistent memory regions, allocatable either statically or dynamically with a pmalloc() primitive similar to malloc();

- persistence primitives to update data consistently;

- durable memory transactions.

Persistent memory regions are managed simply by extending the Linux functionality for managing memory regions, quite similarly to how the kernel is modified in the SCMFS proposal (see section 2.3.6). Consistent updates are offered through single-variable updates, append updates, shadow updates and in-place updates (one of these idioms is illustrated right after this list of approaches), whereas the implemented persistence primitives consist in a persistent heap and in a log facility. Write ordering is achieved through the simpler approach (mfence and cache flushing). Finally, transactions are offered through a compiler facility that permits the conversion of common C/C++ code into transactions.
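As an illustration, here is a plain-C sketch of the "append update" idiom listed above, written with the flush-and-fence pattern discussed earlier in this chapter; it is my own rendering of the idea, not Mnemosyne's code, and the log layout is purely illustrative.

/* Append update: write the payload first, persist it, then advance the
 * tail with one aligned 8-byte store. A crash before the tail store
 * simply leaves the old tail, so readers never see partial data. */
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>

struct plog {
    uint64_t tail;          /* number of committed bytes */
    char     data[4096];
};

static void persist(const void *p, size_t len)
{
    for (uintptr_t a = (uintptr_t)p & ~(uintptr_t)63; a < (uintptr_t)p + len; a += 64)
        _mm_clflush((const void *)a);
    _mm_mfence();
}

void plog_append(struct plog *log, const void *rec, size_t len)
{
    memcpy(log->data + log->tail, rec, len);
    persist(log->data + log->tail, len);   /* payload reaches memory first */

    log->tail += len;
    persist(&log->tail, sizeof log->tail); /* then the single-word commit */
}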
Each of the two approaches presented above has its strengths: while NV-Heaps is more thorough, Mnemosyne is more general. From the point of view of generality, the former approach is somewhat problematic, as it specifically relies on the C++ programming model. Perhaps a more general model such as the latter is preferable: its services could be used to build, upon it, more specialized libraries, each aimed at serving a different programming language; in this way, further levels of consistency could be offered. For example, referential integrity in the former approach is managed through operator overloading, but this feature is not common to all programming languages. Perhaps a layered approach would be more customizable, achieving a better fine-grained level of service.

Looking at the possible weaknesses of these approaches, the most remarkable one is probably compatibility with current applications. If this approach were the only means to exploit SCM, applications would need to be rewritten or restructured in order to benefit from such improvements; without these modifications, applications would continue to use volatile memory as they always have.
While this topic has only been sketched here, it is indeed a valuable part of the studies about persistence, and it would be worthy of further investigation. As a last remark about these approaches, I have the feeling that these works represent just the first steps on a long path: these proposals remain somehow limited, as they address only a part of the approach to persistent memories which should, finally, when the time is ripe, become a complete approach.
Conclusions
In the first chapter, current memories have been presented, along with their economical and technical limits. Afterwards, the new persistent memories have been introduced, with some details about their internals. Then my study moved from the devices to operating systems: the aim has been to understand, as well as possible, the extent of the changes that operating systems should adopt in order to make the best use of the new devices. These devices represent a real disruptive change in the field of computer memories, probably the most notable of the last decades: if exploited at their full potential, they would surely change the way storage is conceived today. To reach this appealing goal, however, operating systems should undergo a deep restructuring: the approaches seen in the preceding chapter are just the first steps in this direction. Even though those approaches have been experimental attempts, all the measurements that researchers made to verify the effective performance of their proposals confirmed a high potential return in terms of latency, throughput and other performance metrics: the results were indeed promising in each of them.
Approaching the conclusion of this work, my last intent is to leave here some personal considerations about the future work that researchers and developers will probably have to do in the next years to prepare operating systems for persistent memories.

My personal conviction is that many efforts must still be made to achieve a real persistent-memory awareness in operating systems: while each of the proposals presented here is a concrete step towards it, the goal is still far away and, up to now, such an operating system does not exist yet. As these new memories are expected to reach a first degree of maturity within the next decade, this time window could give the scientific community a reasonable period to prepare this transition in operating systems as well.
A complete solution
While each of the approaches described tries to manage a subset of the potential benefits that persistent memories could offer, what is still lacking is a complete solution. Even if persistence awareness is valuable when achieved through a fast file system or, alternatively, through application awareness, it would be even better if users and developers did not have to choose one or the other, but could benefit from both. I think that this should be one of the long-term goals of operating system research: the implementation of a complete SCM solution.

In the context of a stand-alone system, a complete approach should thus allow the seamless use of persistent memories through:

- file system services;

- kernel persistent objects and data structures;

- application persistent objects and data structures.
However, these services, if conceived to remain stand-alone, would turn out to be almost useless. Indeed, to reach an exhaustive level of completeness, a correct approach to persistence should also:

- be highly scalable;

- fit distributed environments;

- fit virtualized and cloud environments;

- adapt easily to new hardware architectures;

- behave adaptively, depending on the scale at which it is used and on the metrics by which performance is measured.
Without doubt, these "wishes" represent, in their entirety, a tough target: there is enough material for many years of research in operating systems. I nonetheless have the feeling that, slowly, one step at a time, many research domains in computer science are converging. Perhaps, at the right time, the knowledge reached in each of them will represent the "critical mass" that will permit a complete product, able to exploit the new storage class memories thoroughly.
Converging research domains
In particular, from what I have read for this work, research about persistence and persistent memories is increasingly related to some other research domains of computer science. These areas can contribute to the pursuit of a complete approach to exploiting persistent memories and, in the meanwhile, they represent factors that will influence how this goal is going to be achieved:
Changing hardware architectures: not only the memory panorama is changing; current hardware architectures are also experiencing a slow but continuous change, evolving towards platforms that use many cores, possibly different from each other, and operating system design is trying to follow these trends [10, 116]. Some new hardware approaches are being developed [103], and new operating systems, should these technologies succeed, will have to adapt to the new architectures. Moreover, it is very likely that new computing architectures will be developed taking into account the recent achievements in memory technology: such efforts would thus represent a further opportunity in the search for a thorough persistence awareness.
Database systems: the field of databases is quickly approaching main memory; as underlined before, keeping the entire dataset in memory is not new, but this trend is increasing with the use of "NoSQL" database systems, like the key-value stores used in distributed caching systems. The knowledge gained in database systems has proven fundamental in managing the consistency requirements of the approaches to persistence just seen, and it is likely that every further achievement related to in-memory databases could be reused in the context of persistent memories. Database paradigms such as the key-value one have already been hypothesized as a way to exploit persistent memories [8].
Distributed systems: the need to scale software to large sizes, such as those needed in data centers, has already motivated the use of distributed paradigms both for databases and for storage systems. Research in this branch is advancing further: currently, efforts such as RAMCloud represent an interesting approach to storage systems that use only main memory [112, 128]. These efforts too could represent a valuable and useful contribution to a further exploitation of persistent memories.
File system design: while the log-structured approach has already been cited, researchers have proposed to use it also when managing DRAM [123]. These experiences could then be applied to persistent memories as well.
Transactional memories: transactional memory represents an important research field in computer science; in the past, this approach and, more generally, the need for implicit consistency when using main memory, have been deeply investigated [34, 33]. The knowledge gained in this field has been used many times in the approaches previously presented and, most likely, each new achievement in this field has the potential to influence also the research about persistent memories and their use in operating systems.
A futuristic operating system
Thinking of a hypothetical next-generation operating system, I would imagine it built similarly to the hypervisors currently used to achieve virtualization. The storage facility would be an important piece of the hypervisor, and would be the part conceived to use persistent memory. This hypothetical storage facility would:

- behave as a database, extensively using the fast key-value paradigm: such a management of data would permit the use of variable-size items in distributed environments, abandoning fixed-size blocks. Moreover, a database-like behavior would permit the storage facility to be used as a service by many software layers: operating systems, file systems, applications, and so on. Such an approach would permit the transparent movement of the stored data as needed, thus enabling, for example, replication, scaling and caching. The most fascinating hypothesis would be the ability to scale a local (possibly persistent) heap from a local process to a distributed one, so that the same data could be used concurrently by, for example, a server cluster in a data center;
- perform data allocation following a log-structured pattern: this would make it easy to manage memory wear, and such a pattern seems well suited to being used together with key-value databases. Data allocation should permit the concurrent use of many different services (persistent objects, file systems, kernel data, and so on);

- use a highly efficient snapshot facility, similar to that used in the WAFL file system or to that proposed in [8];

- use transactions and ACID semantics to guarantee reliability at the highest levels, in order to permit usage in "production" environments;

- implement the services necessary to fit multiple distributed environments, using adaptive technologies that change behavior depending on performance, traffic and other metrics.
Final salutation
Despite these personal thoughts, and whatever the future holds for memory technologies and operating systems, I hope that my work can prove to be a tool that helps the understanding of persistent-memory awareness in operating systems.
Appendix A
Asides
A.1 General
ITRS - International Technology Roadmap for Semiconductors
ITRS, acronym of International Technology Roadmap for Semiconductors, is an international organization built on the legacy of the earlier United States national organization NTRS, the National Technology Roadmap for Semiconductors. ITRS is currently sponsored by the five leading chip manufacturing regions in the world: Europe, Japan, Korea, Taiwan and the United States. The sponsoring organizations are the semiconductor industry associations of each of those regions: ESIA, JEITA, KSIA, TSIA and SIA (respectively, the European Semiconductor Industry Association, the Japan Electronics and Information Technology industries Association, the Korea Semiconductor Industry Association, the Taiwan Semiconductor Industry Association and the Semiconductor Industry Association). Its aim is to help the semiconductor industry as a whole to maintain its profitability by offering, among other services, a thorough report, produced every two years, about the status of the semiconductor industry and its roadmap for maintaining exponential growth. This is a key document, drafted by an international committee of scientists and technologists, conveying the most exhaustive and accurate assessment of the semiconductor industry and promoting a deep and vast analysis effort on current and future semiconductor technologies.
A.2 Physics and Semiconductors
Ferroelectricity
A property of matter, usually observed in materials with certain crystal structures: these materials can be electrically polarized under the effect of an electric field, maintain the polarization when the electric field ceases, and reverse (or change) the polarization if the electric field reverses (or changes). The discovery of ferroelectricity is rooted in the studies on pyroelectric and piezoelectric properties conducted by the brothers Pierre and Paul-Jacques Curie around 1880; it was first noticed as an anomalous behavior of Rochelle salt in 1894 by F. Pockels (this salt was first separated in 1655 by Elie Seignette, an apothecary in the town of La Rochelle, France). Ferroelectricity was then named as such and identified as a specific property of matter in 1924 by W. F. G. Swann [75].
Ferromagnetism
From Encyclopedia Britannica: �physical phenomenon in which certain electri-
cally uncharged materials strongly attract others. Two materials found in nature,
lodestone (or magnetite, an oxide of iron, Fe3O4) and iron, have the ability to
acquire such attractive powers, and they are often called natural ferromagnets.
They were discovered more than 2,000 years ago, and all early scienti�c studies of
magnetism were conducted on these materials. Today, ferromagnetic materials are
used in a wide variety of devices essential to everyday life, e.g., electric motors
and generators, transformers, telephones, and loudspeakers.
Ferromagnetism is a kind of magnetism that is associated with iron, cobalt,
nickel, and some alloys or compounds containing one or more of these elements. It
also occurs in gadolinium and a few other rare-earth elements. In contrast to other
substances, ferromagnetic materials are magnetized easily, and in strong magnetic
fields the magnetization approaches a definite limit called saturation. When a field is applied and then removed, the magnetization does not return to its original value; this phenomenon is referred to as hysteresis. When heated to a certain temperature called the Curie point, which is different for each substance, ferro-
magnetic materials lose their characteristic properties and cease to be magnetic;
however, they become ferromagnetic again on cooling.
The magnetism in ferromagnetic materials is caused by the alignment patterns
of their constituent atoms, which act as elementary electromagnets. Ferromag-
netism is explained by the concept that some species of atoms possess a magnetic
moment, that is, that such an atom itself is an elementary electromagnet pro-
duced by the motion of electrons about its nucleus and by the spin of its electrons
on their own axes. Below the Curie point, atoms that behave as tiny magnets in
ferromagnetic materials spontaneously align themselves. They become oriented in
the same direction, so that their magnetic �elds reinforce each other.
One requirement of a ferromagnetic material is that its atoms or ions have
permanent magnetic moments. The magnetic moment of an atom comes from its
electrons, since the nuclear contribution is negligible. Another requirement for fer-
romagnetism is some kind of interatomic force that keeps the magnetic moments
of many atoms parallel to each other. Without such a force the atoms would be
disordered by thermal agitation, the moments of neighbouring atoms would neu-
tralize each other, and the large magnetic moment characteristic of ferromagnetic
materials would not exist.
There is ample evidence that some atoms or ions have a permanent magnetic
moment that may be pictured as a dipole consisting of a positive, or north, pole
separated from a negative, or south, pole. In ferromagnets, the large coupling
between the atomic magnetic moments leads to some degree of dipole alignment
and hence to a net magnetization.
Since 1950, and particularly since 1960, several ionically bound compounds
have been discovered to be ferromagnetic. Some of these compounds are electri-
cal insulators; others have a conductivity of magnitude typical of semiconductors.
Such compounds include chalcogenides (compounds of oxygen, sulfur, selenium,
or tellurium), halides (compounds of fluorine, chlorine, bromine, or iodine), and
their combinations. The ions with permanent dipole moments in these materials
are manganese, chromium (Cr), and europium (Eu); the others are diamagnetic.
At low temperatures, the rare-earth metals holmium (Ho) and erbium (Er) have a
nonparallel moment arrangement that gives rise to a substantial spontaneous mag-
netization. Some ionic compounds with the spinel crystal structure also possess
ferromagnetic ordering. A di�erent structure leads to a spontaneous magnetization
in thulium (Tm) below 32 kelvins (K)."
Mott transition
"Mott transition describes the transition from insulating to metallic state of a
material. It appears if the electron density and therefore the electron screening of
the coulomb potential changes.
Normally we consider a material either to be a metal or an insulator, depending
on the position of the Fermi energy within the band structure. But due to screening
a transition can take place. To understand this we consider an electron in a finite quantum well. There is only a finite number of bound states inside the well. If its
width is decreased all states move up in energy and the highest ones move outside
the well. Therefore the number of bound states decreases until a critical value is
reached. Below this width there are no more bound states. An insulating material
with a certain lattice and long distances between the atoms is considered. If the
atoms are moved closer together the electron density increases, screening of the
coulomb potential appears and the energy levels move up. After a certain point
there are no more bound states for the outer electrons and the material becomes
a metal." [90].
Tunnel Junctions
As stated in Tsymbal and Kohlstedt's paper, "The phenomenon of electron tunneling has been known since the advent of quantum mechanics, but it continues to enrich our understanding of many fields of physics, as well as offering a route toward useful devices. A tunnel junction consists of two metal electrodes separated by a nanometer-thick insulating barrier layer, as was first discussed by Frenkel in 1930. Although forbidden by classical physics, an electron is allowed to traverse a potential barrier that exceeds the electron's energy. The electron therefore has a finite probability of being found on the opposite side of the barrier. A famous example is electron tunneling in superconducting tunnel junctions, discovered by Giaever, that allowed measurement of important properties of superconductors. In the 1970s, spin-dependent electron tunneling from ferromagnetic metal electrodes across an amorphous Al2O3 film was observed by Tedrow and Meservey. The latter
discovery led Jullière to propose and demonstrate a magnetic tunnel junction in
which the tunneling current depends on the relative magnetization orientation of
the two ferromagnetic electrodes, the phenomenon nowadays known as tunneling
(or junction) magnetoresistance. New kinds of tunnel junctions may be very useful
for various technological applications. For example, magnetic tunnel junctions
have recently attracted considerable interest due to their potential application in
spin-electronic devices such as magnetic field sensors and magnetic random access memories." [132].
Tunnel junctions are thus electronic devices built from layers of potentially different materials, acting as resistive switching elements and containing at least a "tunnel barrier" element. The term "tunnel" refers to the mechanism by which electrons cross the barrier element: the passage occurs by direct tunneling, as studied in quantum mechanics. The actual resistive switching depends on the underlying physical principle, but the effect is the modulation of the electronic potential barrier between layers, resulting in changes in the resistivity of the tunneling layer(s).
Ferromagnetic Tunnel Junctions
A magnetic tunnel junction consists of a sandwich of two magnetic material layers separated by a thin barrier. One of the two magnetic layers has a fixed magnetic polarization (fixed layer), whereas in the other ferromagnetic layer (free layer) the magnetization can be switched. The magnetic polarization of the free layer interacts with the polarization of the fixed layer, changing the resistance of the tunneling layer by means of the tunneling magnetoresistance effect.
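As a quantitative side note, the standard Jullière model (Jullière is mentioned in the quotation above) relates the relative resistance change of a magnetic tunnel junction to the spin polarizations P1 and P2 of the two ferromagnetic electrodes; the expression is reported here only as an illustration:

\[ \mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} = \frac{2\,P_1 P_2}{1 - P_1 P_2} \]

where R_P and R_AP are the junction resistances with parallel and antiparallel magnetizations of the two layers.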
Ferroelectric Tunnel Junctions
Once again, as stated in Tsymbal and Kohlstedt's paper, "Yet another concept is the ferroelectric tunnel junction (FTJ), which takes advantage of a ferroelectric as the barrier material. Ferroelectrics possess a spontaneous electric polarization that can be switched by an applied electric field. This adds a new functional property to a tunnel junction, which may lead to novel, yet undiscovered electronic devices based on FTJs. The discovery of ferroelectricity goes back to 1921, approximately
when the principles of quantum mechanical electron tunneling were formulated.
The basic idea of a FTJ (called a polar switch at that time) was formulated in
1971 by Esaki et al. Owing to a reversible electric polarization, FTJs are expected
to have current-voltage characteristics different from those of conventional tunnel junctions. The electric field-induced polarization reversal of a ferroelectric barrier may have a profound effect on the conductance of a FTJ, leading to resistive switching when the magnitude of the applied field equals that of the coercive field of the ferroelectric. Indeed, the polarization reversal alters the sign of the polarization charges at a barrier-electrode interface."
A ferroelectric tunnel junction is thus a device in which two electrodes sandwich a tunnel barrier with ferroelectric properties [28]. The electric polarization of the barrier can be switched by applying an opposite electric field (sufficiently strong to reach the coercive field of the ferroelectric), causing a change in the electronic potential barrier and, in turn, a different conductance by means of the Giant Electroresistance effect [149, 125].
Field Effect Transistor
Transistors are fundamental semiconductor devices featuring three electrodes. A potential difference applied to one electrode (the gate) influences the passage of a current between the other two electrodes (source and drain). Transistors are used either as switches or as amplifiers. There are two main types of transistors:
- bipolar junction transistors
- field effect transistors
A typical field effect transistor (FET) schematic is shown in Figure A.1 [57, p. 247]. If the potential difference applied to G is below the threshold, there is no conducting channel between S and D; otherwise, a conductive channel forms between S and D, thus allowing the passage of current.
Memristor
In 1971 Leon O. Chua hypothesized the existence of these devices, a fourth basic type of electrical element alongside resistors, capacitors and inductors [21].
Figure A.1: Field effect transistor, perspective (a) and front (b). © 2001 The McGraw Companies
Fascinating studies on memristance have been undertaken since Chua's hypothesis, because this type of device could change the computing paradigm: networks of memristors can supersede transistors in the functional units of a processor and can be used to build a computing paradigm based on neural networks [140]. Researchers claim that the new persistent memories using the 2-terminal configuration are, in fact, full-fledged memristors when the switching mechanism is implicitly embedded inside them, as is the case for redox memory cells [127].
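As a side note on the definition, Chua's original formulation relates the flux linkage φ and the charge q flowing through the device; the memristance is the standard relation below, reported only as an illustration:

\[ M(q) = \frac{d\varphi}{dq}, \qquad v(t) = M\big(q(t)\big)\, i(t) \]

so the effective resistance M depends on the history of the current, which is what makes a memristor behave as a memory element.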
A.3 Operating systems
Hardware technology influences file system design
As already stated, file systems are one of the sources of added latency and reduced throughput. Moreover, it has been said that this impact differs among file systems,
depending on their internal design.
File systems are software components designed to serialize and maintain data persistently on a persistent memory device. However, for more than fifty years, these "persistent memory devices" have been identified almost exclusively with hard disks; Flash, although it appeared in 1984, is still considered a sort of newcomer. During this long time, file systems have necessarily adapted to the features of hard disks: since they need to guarantee the safe endurance of data over time, this goal can be achieved more effectively only when the internals of the memory medium are exploited (or, at least, known and taken into account). An example is the transition from the old traditional Unix file system (the one developed at Bell Labs) to the newer Unix Fast File System, later called UFS [51]:
- the "new" file system distributed inodes throughout the disk, near the data blocks they pointed to, in order to drastically reduce seek time and the need to execute random reads;
- the "new" file system was organized into cylinder groups: one of the effects was the added redundancy of replicating the superblock in such a way that it was distributed among cylinders and platters too (to obtain better resiliency upon a single platter failure). It is apparent that the physical structure of hard disks was thoroughly taken into account.
The same adaptation to the physical features of the memory media happened when Flash memories became widespread: many file systems have been specifically designed for Flash memories (JFFS, YAFFS, YAFFS2, UBIFS, and so on).
As an aside, it is interesting to note that also in the case of common SSDs or Flash USB sticks, even if the internal architecture is hidden, different file system settings can change performance because they adapt better or worse to the underlying architecture: this is the case for file system block sizes. Some file system block sizes are well suited to the erase size and to the internal block size of the Flash chips, whereas others are not [87, 92].
These observations easily explain why ext3 performs so badly when used with fast SSDs.
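As an illustration of the block-size observation above, the following minimal C sketch checks whether a partition offset and a file system block size are aligned to a Flash erase-block size; the sizes used in main are made-up example values, not figures taken from a real device.

    #include <stdio.h>
    #include <stdint.h>

    /* Returns 1 when `value` is a multiple of `unit`, i.e. accesses of that
       granularity never straddle two underlying Flash erase blocks. */
    static int is_multiple_of(uint64_t value, uint64_t unit)
    {
        return unit != 0 && value % unit == 0;
    }

    int main(void)
    {
        /* Example values only: a 4 MiB erase block, a partition deliberately
           starting at 1 MiB (misaligned) and a 4 KiB file system block size. */
        uint64_t erase_block     = 4ULL * 1024 * 1024;
        uint64_t partition_start = 1ULL * 1024 * 1024;
        uint64_t fs_block_size   = 4096;

        printf("partition start aligned to erase block: %d\n",
               is_multiple_of(partition_start, erase_block));
        printf("erase block is a whole number of FS blocks: %d\n",
               is_multiple_of(erase_block, fs_block_size));
        return 0;
    }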
Appendix B
Tables
Rank  Gartner [74]                                       IEEE [80]
1     Computing Everywhere                               Wearable devices
2     The Internet of Things (IoT)                       Internet of Anything
3     3D Printing                                        Security into software design
4     Advanced, Pervasive, Invisible Analytics           Software-defined Anything (SDx)
5     Context-Rich Systems                               Cloud security and privacy concerns grow
6     Smart Machines                                     3D Printing
7     Cloud/Client Architecture                          Predictive Analytics
8     Software-Defined Infrastructure and Applications   Embedded Computing security
9     Web-Scale IT                                       Augmented Reality Applications
10    Risk-Based Security and Self-Protection            Smartphones: new opportunities for Digital Health

Table B.1: Top 10 technology trends for 2015
Table B.2 compares the memory technologies along the following parameters: feature size (nm), cell area, read latency (ns), write/erase latency (ns), endurance, data retention and write energy (fJ). Its columns cover the baseline memories (DRAM and NAND Flash), the prototypical memories (FeRAM, STT-MRAM and PCM) and the emerging memories (redox memories and FTJ, with sub-columns labelled ECM, VCM, TCM and BNF).

Table B.2: Performance comparison between memories

Figures come from the ITRS 2013 Emerging Research Devices tables ERD3, ERD4a and ERD4b [82]. Figures about the power consumption of DRAM and Flash could contain some problems related to how the values are calculated (table ERD3).
Technology - operation    Latency (µs)   4K bit per bit (µs)   4K 64 bit (µs)
PCM - Write               0.1            409.6                 51.2
PCM - Read                0.012          49.152                6.144
Emerging memory - Write   0.02           81.92                 10.24
Emerging memory - Read    0.005          20.48                 2.56

Table B.3: 4K transfer times with PCM and other memories
Bus                Year   Transfers/s   bit/transfer   Payload     Bus transfer time   Read      Read 64 bit   Write      Write 64 bit
SATA III           2008   6G            .8             600 MB/s    6.83 µs             7.2x      0.9x          59.97x     7.5x
PCI Express gen3   2010   8G            .98            985 MB/s    4.16 µs             11.82x    1.48x         98.46x     12.31x
DDR3               2005   1333M         64             10.6 GB/s   0.39 µs             126.03x   15.75x        1050.26x   131.28x
Intel QPI          2007   6.4G          16             12.8 GB/s   0.32 µs             153.6x    19.2x         1280x      160x

Table B.4: Bus latency comparison

Bus transfer time is the time elapsed when transferring 4K at the theoretical speed of the bus. The last four columns are ratios between the memory transfer times of table B.3 and the bus transfer time.
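As a worked example of how the entries of table B.4 are obtained, using the SATA III row and the PCM write figures from table B.3:

\[ t_{bus} = \frac{4096\ \mathrm{B}}{600\ \mathrm{MB/s}} \approx 6.83\ \mu\mathrm{s}, \qquad \frac{409.6\ \mu\mathrm{s}}{6.83\ \mu\mathrm{s}} \approx 60 \]

that is, a bit-per-bit PCM write of a 4K block is roughly sixty times slower than the theoretical SATA III transfer of the same block.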
Bus                Bus transfer time   HDD latency (1000 µs / bus)   SSD latency (100 µs / bus)
SATA III           6.83 µs             146x                          14.64x
PCI Express gen3   4.16 µs             240x                          24x
DDR3               0.39 µs             2564x                         256x
Intel QPI          0.32 µs             3125x                         312x

Table B.5: HDD speed vs bus theoretical speed
Column groups: Approaches (Name, Type); Required (Manager, Translator); Optional (Efficiency, Safety, Consistency, Integration).

Name        Type                Manager      Translator           Efficiency   Safety   Consistency   Integration
Linux Std   FS+DAX              -            Block driver         No cache     Driver   File system   No
Quill       S. FS+XIP+library   -            Block driver         No cache     Driver   File system   No
BPFS        SCM FS              -            n/a                  FS tune      n/a      Yes           No
PRAMFS      SCM FS              -            File system (clue)   FS tune      Yes      No            No
PMFS        SCM FS              -            File system          FS tune      Yes      Yes           No
SCMFS       SCM FS              Memory mgr   File system          n/a          n/a      Yes           Viable: nvmalloc, nvfree

Table B.6: Persistence awareness through file systems

Linux and Quill are developed to use only standard file systems with minimal changes (DAX and XIP compliance, respectively). SCM FS: storage class memory file system, i.e. the developers have built a file system specifically suited to the features of SCMs. Such a specific design usually reflects on efficiency: these file systems are built to use SCMs efficiently; in these cases, cache avoidance is taken for granted.
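As an illustration of the cache-bypassing, file-system-mediated access pattern that the DAX and XIP approaches of table B.6 aim at, the following is a minimal C sketch; the path is a hypothetical file on a file system mounted with direct-access support, and error handling is reduced to the bare minimum.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical file on a file system mounted with direct access (DAX). */
        const char *path = "/mnt/pmem/example.dat";
        const size_t len = 4096;

        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* With DAX the mapping reaches the persistent medium directly,
           without an intermediate copy in the page cache. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        strcpy(p, "hello, persistent memory");

        /* Ask the kernel to make the update durable before unmapping. */
        if (msync(p, len, MS_SYNC) != 0)
            perror("msync");

        munmap(p, len);
        close(fd);
        return 0;
    }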
Bibliography
[1] Ameen Akel et al. �Onyx: A Protoype Phase Change Memory Storage Ar-
ray�. In: Proceedings of the 3rd USENIX Conference on Hot Topics in
Storage and File Systems. HotStorage'11. Portland, OR: USENIX Asso-
ciation, 2011, pp. 2�2. url: http://dl.acm.org/citation.cfm?id=2002
218.2002220.
[2] Louis Alex Eisner, Todor Mollov, and Steven Swanson. Quill: Exploit-
ing Fast Non-Volatile Memory by Transparently Bypassing the File System.
2013.
[3] Ross Anderson and Markus Kuhn. �Tamper Resistance: A Cautionary
Note�. In: Proceedings of the 2Nd Conference on Proceedings of the Sec-
ond USENIX Workshop on Electronic Commerce - Volume 2. WOEC'96.
Oakland, California: USENIX Association, 1996, pp. 1�1. url: http:
//dl.acm.org/citation.cfm?id=1267167.1267168.
[4] Dmytro Apalkov et al. �Spin-transfer Torque Magnetic Random Access
Memory (STT-MRAM)�. in: J. Emerg. Technol. Comput. Syst. 9.2 (2013),
pp. 13�1. url: http://doi.acm.org/10.1145/2463585.2463589.
[5] Wolfgang Arden et al. More-than-Moore. Tech. rep. ITRS, 2010. url: ht
tp://www.itrs.net/ITRS%201999-2014%20Mtgs,%20Presentations%20&%
20Links/2010ITRS/IRC-ITRS-MtM-v2%203.pdf.
[6] M. P. Atkinson et al. �An Orthogonally Persistent Java�. In: SIGMOD
Rec. 25.4 (1996), pp. 68�75. url: http://doi.acm.org/10.1145/245882
.245905.
[7] Malcolm Atkinson and Ronald Morrison. �Orthogonally Persistent Object
Systems�. In: The VLDB Journal 4.3 (1995), pp. 319�402. url: http:
//dl.acm.org/citation.cfm?id=615224.615226.
[8] Katelin A. Bailey et al. �Exploring Storage Class Memory with Key Value
Stores�. In: Proceedings of the 1st Workshop on Interactions of NVM/FLASH
with Operating Systems and Workloads. INFLOW '13. Farmington, Penn-
sylvania: ACM, 2013, pp. 4�1. url: http://doi.acm.org/10.1145/2527
792.2527799.
[9] Mary Baker et al. �Non-volatile Memory for Fast, Reliable File Systems�.
In: Proceedings of the Fifth International Conference on Architectural Sup-
port for Programming Languages and Operating Systems. ASPLOS V.
Boston, Massachusetts, USA: ACM, 1992, pp. 10�22. url: http://do
i.acm.org/10.1145/143365.143380.
[10] Andrew Baumann et al. �The Multikernel: A New OS Architecture for
Scalable Multicore Systems�. In: Proceedings of the ACM SIGOPS 22Nd
Symposium on Operating Systems Principles. SOSP '09. Big Sky, Montana,
USA: ACM, 2009, pp. 29�44. url: http://doi.acm.org/10.1145/1629
575.1629579.
[11] Tony Benavides et al. �The Enabling of an Execute-In-Place Architecture
to Reduce the Embedded System Memory Footprint and Boot Time�. In:
JCP 3.1 (2008), pp. 79�89. url: http://dx.doi.org/10.4304/jcp.3.1
.79-89.
[12] Keren Bergman et al. ExaScale Computing Study: Technology Challenges
in Achieving Exascale Systems Peter Kogge, Editor & Study Lead. 2008.
[13] R. Bez et al. �Introduction to �ash memory�. In: Proceedings of the IEEE
91.4 (2003), pp. 489�502. url: http://dx.doi.org/10.1109/JPROC.200
3.811702.
[14] Tim R. Bird. �Methods to Improve Bootup Time in Linux�. In: Proceedings
of the Linux Symposium 2004. Vol. I. 2004, pp. 79�88.
[15] Julien Borghetti et al. �Memristive switches enable stateful logic operations
via material implication�. In: Nature 464.7290 (2010), pp. 873�876. url:
http://dx.doi.org/10.1038/nature08940.
[16] Adrian M. Caul�eld et al. �Moneta: A High-Performance Storage Array
Architecture for Next-Generation, Non-volatile Memories�. In: Proceedings
of the 2010 43rd Annual IEEE/ACM International Symposium on Microar-
chitecture. MICRO '43. Washington, DC, USA: IEEE Computer Society,
2010, pp. 385�395. url: http://dx.doi.org/10.1109/MICRO.2010.33.
[17] Adrian M. Caulfield et al. Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing. In: Proceedings of the 2010 ACM/IEEE
International Conference for High Performance Computing, Networking,
Storage and Analysis. Washington, DC, USA: IEEE Computer Society,
2010, pp. 1�11. url: http://dx.doi.org/10.1109/SC.2010.56.
[18] Ting-Chang Chang et al. �Developments in nanocrystal memory�. In: Ma-
terials Today 14.12 (2011), pp. 608�615. url: http://www.sciencedirec
t.com/science/article/pii/S1369702111703029.
[19] Peter M. Chen et al. �The Rio File Cache: Surviving Operating System
Crashes�. In: Proceedings of the Seventh International Conference on Ar-
chitectural Support for Programming Languages and Operating Systems.
ASPLOS VII. Cambridge, Massachusetts, USA: ACM, 1996, pp. 74�83.
url: http://doi.acm.org/10.1145/237090.237154.
[20] Sangyeun Cho and Hyunjin Lee. �Flip-N-Write: A simple deterministic
technique to improve PRAM write performance, energy and endurance�.
In: Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM Inter-
national Symposium on. 2009, pp. 347�357.
[21] L. O. Chua. �Memristor-The missing circuit element�. In: Circuit Theory,
IEEE Transactions on 18.5 (1971), pp. 507�519. url: http://dx.doi.o
rg/10.1109/TCT.1971.1083337.
[22] Joel Coburn et al. NV-Heaps: making persistent objects fast and safe with next-generation,
non-volatile memories. Vol. 39. ACM SIGARCH Computer Architecture
News 1. 2011, pp. 105�118. url: http://dl.acm.org/citation.cfm?i
d=1950380.
[23] Jeremy Condit et al. �Better I/O through byte-addressable, persistent mem-
ory�. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating
systems principles. 2009, pp. 133�146. url: http://dl.acm.org/citat
ion.cfm?id=1629589.
[24] R. H. Dennard. �Technical literature [Reprint of "Field-E�ect Transistor
Memory" (US Patent No. 3,387,286)]�. In: Solid-State Circuits Society
Newsletter, IEEE 13.1 (2008), pp. 17�25. url: http://dx.doi.org/10.
1109/N-SSC.2008.4785686.
[25] Subramanya R. Dulloor et al. �System Software for Persistent Memory�.
In: Proceedings of the Ninth European Conference on Computer Systems.
EuroSys '14. Amsterdam, The Netherlands: ACM, 2014, pp. 15�1. url:
http://doi.acm.org/10.1145/2592798.2592814.
[26] W. Enck et al. �Defending Against Attacks on Main Memory Persistence�.
In: Computer Security Applications Conference, 2008. ACSAC 2008. An-
nual. 2008, pp. 65�74. url: http://dx.doi.org/10.1109/ACSAC.2008.
45.
[27] Michael Fitsilis and Rainer Waser. �Scaling of the ferroelectric �eld ef-
fect transistor and programming concepts for non-volatile memory applica-
tions�. Aachen, Techn. Hochsch., Diss., 2005. PhD thesis. Aachen: Fakul-
tat fur Elektrotechnik und Informationstechnik der Rheinisch-Westfalischen
Technischen Hochschule Aachen, 2005. url: http://publications.rwt
h-aachen.de/record/62096.
[28] Vincent Garcia and Manuel Bibes. �Ferroelectric tunnel junctions for in-
formation storage and processing�. In: Nat Commun 5.. (2014). Review,
p. . url: http://dx.doi.org/10.1038/ncomms5289.
[29] Paolo Gargini. �The Roadmap to Success: 2013 ITRS Update�. In: Sem-
inar on 2013 ITRS Roadmap Update. 2014. url: http://www.ewh.ieee
.org/r6/scv/eds/slides/2014-Mar-11-Paolo.pdf.
[30] Bharan Giridhar et al. �Exploring DRAMOrganizations for Energy-e�cient
and Resilient Exascale Memories�. In: Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Anal-
ysis. SC '13. Denver, Colorado: ACM, 2013, pp. 23�1. url: http:
//doi.acm.org/10.1145/2503210.2503215.
[31] Theo Haerder and Andreas Reuter. �Principles of Transaction-oriented
Database Recovery�. In: ACM Comput. Surv. 15.4 (1983), pp. 287�317.
url: http://doi.acm.org/10.1145/289.291.
[32] J. Alex Halderman et al. �Lest We Remember: Cold-boot Attacks on
Encryption Keys�. In: Commun. ACM 52.5 (2009), pp. 91�98. url:
http://doi.acm.org/10.1145/1506409.1506429.
[33] Lance Hammond et al. �Transactional Memory Coherence and Consis-
tency�. In: Proceedings of the 31st Annual International Symposium on
Computer Architecture. ISCA '04. München, Germany: IEEE Com-
puter Society, 2004, pp. 102�. url: http://dl.acm.org/citation.cfm?i
d=998680.1006711.
[34] Maurice Herlihy and J. Eliot B. Moss. �Transactional Memory: Archi-
tectural Support for Lock-free Data Structures�. In: SIGARCH Comput.
Archit. News 21.2 (1993), pp. 289�300. url: http://doi.acm.org/10.1
145/173682.165164.
[35] Dave Hitz, James Lau, and Michael Malcolm. �File System Design for
an NFS File Server Appliance�. In: Proceedings of the USENIX Winter
1994 Technical Conference on USENIX Winter 1994 Technical Conference.
WTEC'94. San Francisco, California: USENIX Association, 1994, pp. 19�
19. url: http://dl.acm.org/citation.cfm?id=1267074.1267093.
[36] John E. Hopcroft, Rajeev Motwani, and Je�rey D. Ullman. Automi, lin-
guaggi e calcolabilità. Ed. by Pearson Education. Prima Edizione Italiana.
Pearson Education, 2003.
[37] H. Hunter, L. A. Lastras-Montano, and B. Bhattacharjee. �Adapting
Server Systems for New Memory Technologies�. In: Computer 47.9 (2014),
pp. 78�84. url: http://dx.doi.org/10.1109/MC.2014.233.
[38] Ju-Young Jung and Sangyeun Cho. �Dynamic Co-management of Persis-
tent RAM Main Memory and Storage Resources�. In: Proceedings of the
8th ACM International Conference on Computing Frontiers. CF '11. Is-
chia, Italy: ACM, 2011, pp. 13�1. url: http://doi.acm.org/10.1145/
2016604.2016620.
[39] Tolga Kaya and Hur Koser. �A New Batteryless Active RFID System:
Smart RFID�. in: RFID Eurasia, 2007 1st Annual. 2007, pp. 1�4. url:
http://dx.doi.org/10.1109/RFIDEURASIA.2007.4368151.
[40] Kyung Min Kim, Doo Seok Jeong, and Cheol Seong Hwang. �Nano�la-
mentary resistive switching in binary oxide system: a review on the present
status and outlook�. In: Nanotechnology 22.25 (2011), p. 254002. url:
http://dx.doi.org/10.1088/0957-4484/22/25/254002.
[41] Myungsik Kim, Jinchul Shin, and Youjip Won. �Selective Segment Ini-
tialization: Exploiting NVRAM to Reduce Device Startup Latency�. In:
Embedded Systems Letters, IEEE 6.2 (2014), pp. 33�36.
[42] Young-Jin Kim et al. �I/O Performance Optimization Techniques for Hy-
brid Hard Disk-Based Mobile Consumer Devices�. In: Consumer Elec-
tronics, IEEE Transactions on 53.4 (2007), pp. 1469�1476. url: http:
//dx.doi.org/10.1109/TCE.2007.4429239.
[43] B. T. Kolomiets. �Vitreous Semiconductors (I)�. in: physica status solidi
(b) 7.2 (1964), pp. 359�372. url: http://dx.doi.org/10.1002/pssb.19
640070202.
[44] Charles Lamb et al. �The ObjectStore Database System�. In: Commun.
ACM 34.10 (1991), pp. 50�63. url: http://doi.acm.org/10.1145/1252
23.125244.
[45] Simon Lavington. �In the Footsteps of Colossus: A Description of Oedipus�.
In: IEEE Ann. Hist. Comput. 28.2 (2006), pp. 44�55. url: http:
//dx.doi.org/10.1109/MAHC.2006.34.
[46] Benjamin C. Lee et al. �Architecting Phase Change Memory As a Scalable
Dram Alternative�. In: Proceedings of the 36th Annual International Sym-
posium on Computer Architecture. ISCA '09. Austin, TX, USA: ACM,
2009, pp. 2�13. url: http://doi.acm.org/10.1145/1555754.1555758.
[47] B. Liskov et al. �Safe and E�cient Sharing of Persistent Objects in Thor�.
In: Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data. SIGMOD '96. Montreal, Quebec, Canada: ACM,
1996, pp. 318�329. url: http://doi.acm.org/10.1145/233269.233346.
[48] T. P. Ma and Jin-Ping Han. �Why is nonvolatile ferroelectric memory
�eld-e�ect transistor still elusive?� In: Electron Device Letters, IEEE 23.7
(2002), pp. 386�388. url: http://dx.doi.org/10.1109/LED.2002.1015
207.
[49] F. Masuoka et al. �A new �ash E2PROM cell using triple polysilicon tech-
nology�. In: Electron Devices Meeting, 1984 International. Vol. 30. 1984,
pp. 464�467. url: http://dx.doi.org/10.1109/IEDM.1984.190752.
[50] Brian Matas and Christian De Suberbasaux. MEMORY 1997. Ed. by In-
tegrated Circuit Engineering Corporation. Integrated Circuit Engineering,
1997. url: http://smithsonianchips.si.edu/ice/cd/MEMORY97/titl
e.pdf.
[51] Marshall K. McKusick et al. �A Fast File System for UNIX�. in: ACM
Trans. Comput. Syst. 2.3 (1984), pp. 181�197. url: http://doi.acm.or
g/10.1145/989.990.
[52] Stephan Menzel et al. �Switching kinetics of electrochemical metallization
memory cells�. In: Phys. Chem. Chem. Phys. 15.18 (2013), pp. 6945�
6952. url: http://dx.doi.org/10.1039/C3CP50738F.
[53] Je�rey C. Mogul et al. �Operating System Support for NVM+DRAM Hy-
brid Main Memory�. In: Proceedings of the 12th Conference on Hot Top-
ics in Operating Systems. HotOS'09. Monte Verità, Switzerland:
USENIX Association, 2009, pp. 14�14. url: http://dl.acm.org/citat
ion.cfm?id=1855568.1855582.
[54] G. E. Moore. �No exponential is forever: but "Forever" can be delayed!
[semiconductor industry]�. In: Solid-State Circuits Conference, 2003. Di-
gest of Technical Papers. ISSCC. 2003 IEEE International. 2003, pp. 20�
23. url: http://dx.doi.org/10.1109/ISSCC.2003.1234194.
[55] O. Mutlu. �Memory scaling: A systems architecture perspective�. In: Mem-
ory Workshop (IMW), 2013 5th IEEE International. 2013, pp. 21�25. url:
http://dx.doi.org/10.1109/IMW.2013.6582088.
[56] Dushyanth Narayanan and Orion Hodson. Whole-system persistence. Vol. 40. ACM SIGARCH Computer Architec-
ture News 1. 2012, pp. 401�410. url: http://dl.acm.org/citation.cf
m?id=2151018.
[57] Donald A. Neamen. Electronic Circuit Analysis and Design. Ed. by Mc-
GrawHill. 2nd. McGrawHill, 2000.
[58] Rajesh Nishtala et al. �Scaling Memcache at Facebook�. In: Presented
as part of the 10th USENIX Symposium on Networked Systems Design
and Implementation (NSDI 13). Lombard, IL: USENIX, 2013, pp. 385�
398. url: https://www.usenix.org/conference/nsdi13/technical-s
essions/presentation/nishtala.
[59] S. Oikawa. �Virtualizing Storage as Memory for High Performance Storage
Access�. In: Parallel and Distributed Processing with Applications (ISPA),
2014 IEEE International Symposium on. 2014, pp. 18�25. url: http:
//dx.doi.org/10.1109/ISPA.2014.12.
[60] Shuichi Oikawa. �Non-volatile main memory management methods based
on a �le system�. In: SpringerPlus 3.1 (2014), p. 494. url: http://www.
springerplus.com/content/3/1/494.
[61] [online]. 3D NAND: Bene�ts of Charge Traps over Floating Gates. 2013.
url: http://thememoryguy.com/3d-nand-benefits-of-charge-traps
-over-floating-gates/.
[62] [online]. Amazon ElastiCache. last accessed: 2015. url: http://aws.am
azon.com/elasticache/.
[63] [online]. An idiosyncratic survey of Spintronics. last accessed: 2015. url:
https://physics.tamu.edu/calendar/talks/cmseminars/cm_talks/200
7_10_18_Levy_P.pdf.
[64] [online]. BEE3. last accessed: 2015. url: http://research.microsoft
.com/en-us/projects/bee3/.
[65] [online]. Big Data. last accessed: 2015. url: http://lookup.computerl
anguage.com/host_app/search?cid=C999999&term=Big%20Data.
[66] [online]. Comparing Technologies: MRAM vs. FRAM. last accessed: 2015.
url: http://www.everspin.com/PDF/EST02130_Comparing_Technologie
s_FRAM_vs_MRAM_AppNote.pdf.
[67] [online]. DARPA Developing ExtremeScale Supercomputer System. 2010.
url: http://www.darpa.mil/WorkArea/DownloadAsset.aspx?id=1795.
[68] [online]. Datacenter Construction Expected To Boom. 2014. url: http://
www.enterprisetech.com/2014/04/17/datacenter-construction-expec
ted-boom/.
[69] [online]. Dell PowerEdge R920 Data Sheet. last accessed: 2015. url: ht
tp://i.dell.com/sites/doccontent/shared-content/data-sheets/en
/Documents/PowerEdge_R920_Spec-Sheet.pdf.
[70] [online]. European Exascale Software Initiative [Home Page]. 2013. url:
http://www.eesi-project.eu/pages/menu/homepage.php.
[71] [online]. FRAM Structure. last accessed: 2015. url: http://www.fujits
u.com/global/products/devices/semiconductor/memory/fram/overvi
ew/structure/.
[72] [online]. Fundamentals of volatile memory technologies. 2011. url: http:
//www.electronicproducts.com/Digital_ICs/Memory/Fundamentals_o
f_volatile_memory_technologies.aspx.
[73] [online]. Further adventures in non-volatile memory. last accessed: 2015.
url: https://www.youtube.com/watch?v=UzsPnw11KX0.
[74] [online]. Gartner Identi�es the Top 10 Strategic Technology Trends for
2015. 2014. url: http://www.gartner.com/newsroom/id/2867917.
[75] [online]. History of ferroelectrics. last accessed: 2015. url: http://www.
ieee-uffc.org/ferroelectrics/learning-e003.asp.
[76] [online]. How Does Flash Memory Store Data? last accessed: 2015. url:
https://product.tdk.com/info/en/techlibrary/archives/techjourn
al/vol01_ssd/contents03.html.
[77] [online]. HP and SK Hynix Cancel Plans to Commercialize Memristor-
Based Memory in 2013. 2012. url: http://www.xbitlabs.com/news/st
orage/display/20120927125227_HP_and_Hynix_Cancel_Plans_to_Comme
rcialize_Memristor_Based_Memory_in_2013.html.
[78] [online]. Hybrid Memory Cube Consortium - Home Page. last accessed:
2015. url: http://www.hybridmemorycube.org/.
[79] [online]. IBM 350 disk storage unit. last accessed: 2015. url: http://ww
w-03.ibm.com/ibm/history/exhibits/storage/storage_350.html.
[80] [online]. IEEE-CS Unveils Top 10 Technology Trends for 2015. 2014. url:
http://www.computer.org/web/pressroom/2015-tech-trends.
[81] [online]. Introduction to the Java Persistence API. last accessed: 2015.
url: http://docs.oracle.com/javaee/6/tutorial/doc/bnbpz.html.
[82] [online]. ITRS 2013 ERD TABLES. last accessed: 2015. url: https://ww
w.dropbox.com/sh/2fme4y0avvv7uxs/AAAB10oeC7wNtQkFp5XAcenba/ITR
S/2013ITRS/2013ITRS%20Tables_R1/ERD_2013Tables.xlsx?dl=0.
[83] [online]. ITRS 2013 EXECUTIVE SUMMARY. last accessed: 2015. url:
http://www.itrs.net/ITRS%201999-2014%20Mtgs,%20Presentations%20&%
20Links/2013ITRS/2013Chapters/2013ExecutiveSummary.pdf.
[84] [online]. ITRS ERD 2013 REPORT. last accessed: 2015. url: https://ww
w.dropbox.com/sh/6xq737bg6pww9gq/AAAXRzGlUis1sVUxurZnMCY4a/201
3ERD.pdf?dl=0.
[85] [online]. Log-structured �le systems. 2009. url: http://lwn.net/Articl
es/353411/.
[86] [online]. Magnetic Core Memory. last accessed: 2015. url: http://www.
computerhistory.org/revolution/memory-storage/8/253.
[87] [online]. Managing �ash storage with Linux. 2012. url: http://free-e
lectrons.com/blog/managing-flash-storage-with-linux/.
[88] [online]. Mechanical roadmap points to hard drives over 100TB by 2025.
2014. url: http://techreport.com/news/27420/mechanical-roadmap
-points-to-hard-drives-over-100tb-by-2025.
[89] [online]. More Details on Today's Outage. 2010. url: https://www.face
book.com/notes/facebook-engineering/more-details-on-todays-out
age/431441338919.
[90] [online]. Mott transition. last accessed: 2015. url: http://lamp.tu-gra
z.ac.at/~hadley/ss2/problems/mott/s.pdf.
[91] [online]. NVM Express and the PCI Express SSD Revolution. 2012. url:
http://www.nvmexpress.org/wp-content/uploads/2013/04/IDF-2012-N
VM-Express-and-the-PCI-Express-SSD-Revolution.pdf.
[92] [online]. Optimizing Linux with cheap �ash drives. 2011. url: http:
//lwn.net/Articles/428584/.
[93] [online]. [PATCH v10 11/21] Replace XIP documentation with DAX. last
accessed: 2015. url: http://lwn.net/Articles/610316/.
[94] [online]. PCM BECOMES A REALITY. 2009. url: http://www.object
ive-analysis.com/uploads/2009-08-03_Objective_Analysis_PCM_Whi
te_Paper.pdf.
[95] [online]. Protected and Persistent RAM Filesystem. last accessed: 2015.
url: http://pramfs.sourceforge.net/.
[96] [online]. Samsung 850 PRO Speci�cations. last accessed: 2015. url: http:
//www.samsung.com/global/business/semiconductor/minisite/SSD/g
lobal/html/ssd850pro/specifications.html.
[97] [online]. Seagate preps for 30TB laser-assisted hard drives. 2014. url: ht
tp://www.computerworld.com/article/2846415/seagate-preps-for-3
0tb-laser-assisted-hard-drives.html.
[98] [online]. Solid Memory by Toshiba. last accessed: 2015. url: http://www.
toshiba-memory.com/cms/en/meta/memory_division/about_us.html.
[99] [online]. Supporting �lesystems in persistent memory. last accessed: 2015.
url: http://lwn.net/Articles/610174/.
[100] [online]. The Discovery of Giant Magnetoresistance. 2007. url: http://
www.nobelprize.org/nobel_prizes/physics/laureates/2007/advance
d-physicsprize2007.pdf.
[101] [online]. The High-k Solution. 2007. url: http://spectrum.ieee.org/
semiconductors/design/the-highk-solution.
[102] [online]. The Inconvenient Truths of NAND Flash Memory. 2007. url: ht
tps://www.micron.com/~/media/documents/products/presentation/fl
ash_mem_summit_jcooke_inconvenient_truths_nand.pdf.
[103] [online]. The Machine: A new kind of computer. last accessed: 2015. url:
http://www.hpl.hp.com/research/systems-research/themachine/.
[104] [online]. The Transition to PCI Express for Client SSDs. 2012. url: http:
//www.flashmemorysummit.com/English/Collaterals/Proceedings/20
12/20120821_S102C_Huffman.pdf.
[105] [online]. Ultrastar He8. last accessed: 2015. url: http://www.hgst.com
/hard-drives/enterprise-hard-drives/enterprise-sas-drives/ultr
astar-he8.
[106] [online]. Understanding JPA. 2008. url: http://www.javaworld.com/ar
ticle/2077817/java-se/understanding-jpa-part-1-the-object-orien
ted-paradigm-of-data-persistence.html?null.
[107] [online]. Understanding Moore's Law: Four Decades of Innovation. 2006.
url: http://www.chemheritage.org/community/store/books-and-cat
alogs/understanding-moores-law.aspx.
[108] [online]. Ushering in the 3D Memory Era with V- NAND. 2013. url: http:
//www.flashmemorysummit.com/English/Collaterals/Proceedings/20
13/20130813_KeynoteB_Elliot_Jung.pdf.
[109] [online]. WD Black - Mobile Hard Drives. last accessed: 2015. url: http:
//www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771435.pdf.
[110] [online]. Whole System Persistence Computer based on NVDIMM. 2014.
url: https://www.youtube.com/watch?v=gFuXn2QHXWo.
[111] [online]. ZFS THE LAST WORD IN FILE SYSTEMS. 2008. url: http:
//lib.stanford.edu/files/pasig-spring08/RaymondClark_ZFS_Overi
view.pdf.
[112] John Ousterhout et al. �The Case for RAMCloud�. In: Commun. ACM
54.7 (2011), pp. 121�130. url: http://doi.acm.org/10.1145/1965724.
1965751.
[113] Stanford R. Ovshinsky. �Reversible Electrical Switching Phenomena in
Disordered Structures�. In: Phys. Rev. Lett. 21.20 (1968), pp. 1450�1453.
url: http://link.aps.org/doi/10.1103/PhysRevLett.21.1450.
[114] Stuart S. P. Parkin, Masamitsu Hayashi, and Luc Thomas. �Magnetic
domain-wall racetrack memory.� In: Science (New York, N.Y.) 320.5873
(2008), pp. 190�194. url: http://www.ncbi.nlm.nih.gov/pubmed/?ter
m=18403702.
[115] David A. Patterson. �Latency Lags Bandwith�. In: Commun. ACM 47.10
(2004), pp. 71�75. url: http://doi.acm.org/10.1145/1022594.102259
6.
[116] Simon Peter et al. �Arrakis: The Operating System is the Control Plane�.
In: 11th USENIX Symposium on Operating Systems Design and Implemen-
tation (OSDI 14). Broom�eld, CO: USENIX Association, 2014, pp. 1�
16. url: https://www.usenix.org/conference/osdi14/technical-s
essions/presentation/peter.
[117] P. A. H. Peterson. �Cryptkeeper: Improving security with encrypted
RAM�. in: Technologies for Homeland Security (HST), 2010 IEEE Inter-
national Conference on. 2010, pp. 120�126. url: http://dx.doi.org/1
0.1109/THS.2010.5655081.
[118] Moinuddin K. Qureshi et al. �Enhancing Lifetime and Security of PCM-
based Main Memory with Start-gap Wear Leveling�. In: Proceedings of the
42Nd Annual IEEE/ACM International Symposium on Microarchitecture.
MICRO 42. New York, New York: ACM, 2009, pp. 14�23. url: http:
//doi.acm.org/10.1145/1669112.1669117.
[119] D. C. Ralph and M. D. Stiles. �Spin transfer torques�. In: Journal of Mag-
netism and Magnetic Materials 320.7 (2008), pp. 1190�1216. url: http:
//www.sciencedirect.com/science/article/pii/S0304885307010116.
[120] Simone Raoux, Welnic Wojciech, and Daniele Ielmini. �Phase Change
Materials and Their Application to Nonvolatile Memories�. In: Chemi-
cal Reviews 110.1 (2010). PMID: 19715293, pp. 240�267. url: http:
//dx.doi.org/10.1021/cr900040x.
[121] Ohad Rodeh, Josef Bacik, and Chris Mason. �BTRFS: The Linux B-Tree
Filesystem�. In: Trans. Storage 9.3 (2013), pp. 9�1. url: http://doi.ac
m.org/10.1145/2501620.2501623.
[122] Mendel Rosenblum and John K. Ousterhout. �The Design and Implementa-
tion of a Log-structured File System�. In: ACM Trans. Comput. Syst. 10.1
(1992), pp. 26�52. url: http://doi.acm.org/10.1145/146941.146943.
[123] Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout. Log-structured Memory for DRAM-based Storage. In: Proceedings of the 12th
USENIX Conference on File and Storage Technologies. Santa Clara, CA:
USENIX, 2014, pp. 1�16. url: https://www.usenix.org/conference/
fast14/technical-sessions/presentation/rumble.
[124] K. Sakui. �Professor Fujio Masuoka's Passion and Patience Toward Flash
Memory�. In: Solid-State Circuits Magazine, IEEE 5.4 (2013), pp. 30�33.
url: http://dx.doi.org/10.1109/MSSC.2013.2278084.
[125] Rohit Soni et al. �Giant electrode e�ect on tunnelling electroresistance in
ferroelectric tunnel junctions�. In: Nat Commun 5.. (2014). Article, p. .
url: http://dx.doi.org/10.1038/ncomms6414.
[126] D. B. Strukov and H. Kohlstedt. �Resistive switching phenomena in thin
�lms: Materials, devices, and applications�. In: MRS Bulletin 37.02 (2012),
pp. 108�114. url: http://journals.cambridge.org/article_S088376
9412000024.
[127] Dmitri B. Strukov et al. �The missing memristor found�. In: Nature 453.7191
(2008), pp. 80�83. url: http://dx.doi.org/10.1038/nature06932.
[128] Ryan Stutsman and John Ousterhout. �Toward Common Patterns for Dis-
tributed, Concurrent, Fault-Tolerant Code�. In: Presented as part of the
14th Workshop on Hot Topics in Operating Systems. Santa Ana Pueblo,
NM: USENIX, 2013. url: https://www.usenix.org/toward-common-p
atterns-distributed-concurrent-fault-tolerant-code.
[129] S. Swanson and A. M. Caul�eld. �Refactor, Reduce, Recycle: Restructur-
ing the I/O Stack for the Future of Storage�. In: Computer 46.8 (2013),
pp. 52�59. url: http://dx.doi.org/10.1109/MC.2013.222.
[130] Andrew S. Tanenbaum and Albert S. Woodhull. Operating Systems - De-
sign and Implementation. Ed. by Pearson International. 3rd. International,
Pearson, 2009.
[131] Junji Tominaga et al. �Large Optical Transitions in Rewritable Digital Ver-
satile Discs: An Interlayer Atomic Zipper in a SbTe Alloy�. In: Sym-
posium G � Phase-Change Materials for Recon�gurable Electronics and
Memory Applications. Vol. 1072. MRS Proceedings. 2008. url: http:
//journals.cambridge.org/article_S1946427400030414.
[132] Evgeny Y. Tsymbal and Hermann Kohlstedt. �Tunneling Across a Ferro-
electric�. In: Science 313.5784 (2006), pp. 181�183. url: http://www.sc
iencemag.org/content/313/5784/181.short.
[133] Julian Turner. �E�ects of Data Center Vibration on Compute System Per-
formance�. In: Proceedings of the First USENIX Conference on Sustainable
Information Technology. SustainIT'10. San Jose, CA: USENIX Associa-
tion, 2010, pp. 5�5. url: http://dl.acm.org/citation.cfm?id=186315
9.1863164.
[134] P. Vettiger et al. �The "millipede" - nanotechnology entering data storage�.
In: Nanotechnology, IEEE Transactions on 1.1 (2002), pp. 39�55. url:
http://dx.doi.org/10.1109/TNANO.2002.1005425.
[135] Haris Volos, Andres Jaan Tack, and Michael M. Swift. Mnemosyne: Lightweight persistent memory. Vol. 39. ACM SIGARCH
Computer Architecture News 1. 2011, pp. 91�104. url: http://dl.acm
.org/citation.cfm?id=1950379.
[136] Yiqun Wang et al. "A 3us wake-up time nonvolatile processor based on ferro-
electric �ip-�ops�. In: ESSCIRC. IEEE, 2012, pp. 149�152. url: http://
ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6331297.
[137] Rainer Waser et al. �Redox-Based Resistive Switching Memories � Nanoionic
Mechanisms, Prospects, and Challenges�. In: Advanced Materials 21.25-26
(2009), pp. 2632�2663. url: http://dx.doi.org/10.1002/adma.200900
375.
[138] H. A. R. Wegener et al. �The variable threshold transistor, a new electrically-
alterable, non-destructive read-only storage device�. In: Electron Devices
Meeting, 1967 International. Vol. 13. 1967, pp. 70�70. url: http:
//dx.doi.org/10.1109/IEDM.1967.187833.
[139] S. A. Wolf et al. �Spintronics: A Spin-Based Electronics Vision for the
Future�. In: Science 294.5546 (2001), pp. 1488�1495. url: http://www.
sciencemag.org/content/294/5546/1488.abstract.
[140] C. David Wright, Peiman Hosseini, and Jorge A. Vazquez Diosdado. �Be-
yond von-Neumann Computing with Nanoscale Phase-Change Memory De-
vices�. In: Advanced Functional Materials 23.18 (2013), pp. 2248�2254.
url: http://dx.doi.org/10.1002/adfm.201202383.
[141] Michael Wu and Willy Zwaenepoel. �eNVy: A Non-volatile, Main Memory
Storage System�. In: Proceedings of the Sixth International Conference on
Architectural Support for Programming Languages and Operating Systems.
ASPLOS VI. San Jose, California, USA: ACM, 1994, pp. 86�97. url:
http://doi.acm.org/10.1145/195473.195506.
[142] Xiaojian Wu, Sheng Qiu, and A. L. Narasimha Reddy. �SCMFS: A File
System for Storage Class Memory and Its Extensions�. In: Trans. Storage
9.3 (2013), pp. 7�1. url: http://doi.acm.org/10.1145/2501620.2501
621.
[143] Wm A. Wulf and Sally A. McKee. �Hitting the Memory Wall: Implications
of the Obvious�. In: SIGARCH Comput. Archit. News 23.1 (1995), pp. 20�
24. url: http://doi.acm.org/10.1145/216585.216588.
[144] Yuan Xie. �Modeling, Architecture, and Applications for Emerging Mem-
ory Technologies�. In: Design Test of Computers, IEEE 28.1 (2011), pp. 44�
51.
[145] J. Joshua Yang et al. �Metal oxide memories based on thermochemical and
valence change mechanisms�. In: MRS Bulletin 37.02 (2012), pp. 131�137.
url: http://journals.cambridge.org/article_S0883769411003563.
[146] Jisoo Yang, Dave B. Minturn, and Frank Hady. �When Poll is Better Than
Interrupt�. In: Proceedings of the 10th USENIX Conference on File and
Storage Technologies. FAST'12. San Jose, CA: USENIX Association, 2012,
pp. 3�3. url: http://dl.acm.org/citation.cfm?id=2208461.2208464.
[147] Yiying Zhang et al. �Mojim: A Reliable and Highly-Available Non-Volatile
Memory System�. In: ASPLOS '15, March 14�18, 2015, Istanbul, Turkey.
2015.
[148] Ping Zhou et al. �A Durable and Energy E�cient Main Memory Using
Phase Change Memory Technology�. In: Proceedings of the 36th Annual
International Symposium on Computer Architecture. ISCA '09. Austin,
TX, USA: ACM, 2009, pp. 14�23. url: http://doi.acm.org/10.1145/
1555754.1555759.
[149] M. Ye. Zhuravlev et al. �Giant Electroresistance in Ferroelectric Tunnel
Junctions�. In: Phys. Rev. Lett. 94 (24 2005), p. 246802. url: http:
//link.aps.org/doi/10.1103/PhysRevLett.94.246802.
Acknowledgments
The idea for this work was conceived thanks to Professor De Paoli, who told me about the new memories that are its subject: I would like to thank him for both the idea and the trust he placed in me.
Professor Mariani was asked to act as advisor for this final work: I would like to thank him for his advice and his helpfulness, which have been valuable.
Many thanks to all the people who supported me during this tough time.