Università degli Studi di Milano-Bicocca
Dipartimento di Informatica, Sistemistica e Comunicazione
Corso di laurea in Informatica
Persistent-memory awareness in
operating systems
Relatore: Flavio De Paoli
Correlatore: Leonardo Mariani
Relazione della prova finale di: Diego Ambrosini
Matricola 031852
Anno Accademico 2013-2014
Persistent-memory awareness in operating systems.
Copyright© 2015 Diego Ambrosini. Some rights reserved.
http://creativecommons.org/licenses/by/4.0/
This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0), except for the following items:
- Figure 1.3: courtesy of TDK Corporation, see [76]. © TDK Corporation, all rights reserved.
- Figure 1.4: courtesy of Fujitsu Ltd, see [71]. © Fujitsu Ltd, all rights reserved.
- Figure 1.5: courtesy of Everspin Technologies Inc, see [66]. © 2013 Everspin Technologies Inc, all rights reserved.
- Figure 1.6:
on the left, Figure 1.6a, courtesy of American Chemical Society, see [120, Page 241, Figure 1]. © 2010 American Chemical Society, all rights reserved.
on the right, Figure 1.6b, © Ovonyx Inc, all rights reserved.
- Figure 1.7: courtesy of John Wiley & Sons Inc, see [137, page 2632, Figure 1]. © 2009 John Wiley & Sons Inc, all rights reserved.
- Figure 1.8: see [52, Page 6946, Fig. 1], licensed under CC-BY agreement. © 2013 Owner Societies, some rights reserved.
- Figure 2.4: see [60, page 3, Figure 1], licensed under CC-BY agreement. © 2014 Oikawa, some rights reserved.
- Figure A.1: courtesy of McGraw-Hill Education, see [57, page 247, chapter 5, Figure 5.5]. © 2001 The McGraw-Hill Companies, all rights reserved.
Abstract
Persistence, in relation to computer memories, refers to the ability to retain data over time, without the need of any power supply. Until now, persistence has always been supplied by hard disks or Flash memory drives. Persistence has so far been conceived as a slow service, while volatility has been associated with speed, as happens in DRAM and SRAM. Such a dichotomy represents a bottleneck that is hard to bypass.
The panorama of memory devices is changing: new memory technologies are currently being developed, and are expected to be ready for commercialization in the coming years. These new technologies will offer features that represent a major qualitative change: these memories will be fast and persistent.
This work aims to understand how these new technologies will integrate into operating systems, and to what extent they have the potential to change their current design. To gain this understanding, I have pursued the following goals throughout the work:
- to analyze the economic and technological causes that are triggering these qualitative changes;
- to present the new technologies, along with a classification and a description of their features;
- to analyze the effects that these technologies might have on models that are currently used in the design of operating systems;
- to present and summarize both the opportunities and the potential issues that operating system designers will have to manage in order to use such new memory devices conveniently;
- to analyze the proposals found in the scientific literature to exploit these new technologies.
Following the structure of the title, the first chapter focuses mainly on memory devices, whereas the second chapter is centered on operating systems.
The first chapter initially tries to grasp the causes of the expected technological change, beginning with economic observations.
Subsequently, the chapter contains some considerations about how different but complementary aspects of this economic relation are urging the semiconductor industry to find new memory technologies able to satisfy the increasing demand for features and performance.
Afterwards, the focus shifts to current technologies and their features. After a brief summary of each specific technology, a short description follows of the issues shared by all current charge-based technologies.
Then, the reader finds a presentation of each of the new memory technologies, in the order of the ITRS taxonomy for memory devices: first those in a prototypical development stage (MRAM, FeRAM, PCRAM), followed by those in an emerging development stage (Ferroelectric RAM, ReRAM, Mott memory, macromolecular and molecular memory).
The second chapter aims, in its first part, to understand the extent to which current foundational models (the Von Neumann model and the memory hierarchy) are affected by the new technologies. As long as the computational model (fetch-decode-execute) does not change, the validity of the Von Neumann model seems to hold. Conversely, as far as the memory hierarchy is concerned, the changes might be extensive: two new layers should be added near DRAM. After these considerations, some additional observations are made about how persistence is just a technological property, and how a specific model would be necessary to make explicit how an operating system uses it.
Afterwards, there is a description of the use of non-volatile memory technologies such as Phase Change RAM inside fast SSDs. Even if this approach is quite traditional, the scientific literature explains how faster devices would require a deep restructuring of the I/O stack. Such a restructuring is required because the current I/O stack has been developed with a focus on functionality, not efficiency; fast devices would instead require high efficiency.
The second chapter then presents the most appealing use of persistent memories: as storage class memory, either as a replacement for common DRAM or in tandem with DRAM on the same memory bus. This approach is per se more complex, and under the umbrella of SCM there are many viable modes of use. Firstly, some preliminary observations common to all the approaches are made. Then, two simpler approaches are presented (no-change and Whole System Persistence). Finally, the approaches that aim to develop a persistent-memory aware operating system are introduced: most of them use the file system paradigm to exploit persistence in main memory. The work proceeds by first presenting some peculiarities of the current I/O path used in the Linux operating system, remarking how caching has already moved persistence into main memory; afterwards, some further considerations about consistency are made. Those observations are then used to understand the main differences between standard I/O and memory operations. After a brief presentation of some incomplete approaches proposed by the author, a framework to classify the thoroughness of the different approaches follows.
The work continues by reporting the efforts of the Linux community and then introduces each specific approach found in the literature: Quill, BPFS, PRAMFS, PMFS, SCMFS. Concluding the part about file systems, some remarks are made about integration, a means to use both file system services and memory services from the same persistent memory.
Finally, persistent-memory awareness in user applications is presented, along with a brief introduction of the two main proposals coming from two academic research groups.
Abstract (italiano)
Il concetto di "persistenza", relativamente alle tecnologie di memoria, si riferisce alla capacità di mantenere i dati anche senza la necessità di alcuna alimentazione elettrica. Sino a oggi, essa è stata prerogativa esclusiva dei dispositivi di memorizzazione lenti, quali ad esempio gli hard disk e le memorie Flash. La persistenza è sempre stata immaginata come una funzionalità intrinsecamente lenta, mentre la volatilità, caratteristica tipica delle memorie DRAM e SRAM, è sempre stata associata alla loro velocità. Tale dicotomia è tuttora un limite difficile da aggirare.
Il panorama delle memorie tuttavia sta subendo dei cambiamenti strutturali: nuove tecnologie sono in corso di sviluppo e l'industria dei semiconduttori ha in programma di cominciarne la commercializzazione nei prossimi anni. Questi nuovi dispositivi avranno delle caratteristiche che rappresenteranno un rilevante cambio qualitativo rispetto alle attuali tecnologie: la più significativa differenza è che queste memorie saranno veloci e persistenti.
Il presente studio intende proporre un'analisi di come tali nuove tecnologie potranno integrarsi nei sistemi operativi, e di quale entità potranno essere le ricadute sulla progettazione degli stessi. Vengono perciò affrontate:
- un'analisi delle cause economiche e tecnologiche di questi cambiamenti;
- una presentazione di ciascuna delle nuove tecnologie, assieme a una loro classificazione e a una breve valutazione delle loro caratteristiche;
- un'analisi degli effetti che queste nuove memorie possono avere sui principali modelli usati per lo sviluppo dei sistemi operativi;
- una panoramica sulle opportunità e sulle problematiche potenziali che gli sviluppatori dei sistemi operativi dovrebbero tenere in considerazione per sfruttare al meglio tali tecnologie;
- una rassegna delle varie proposte presenti in letteratura per usare al meglio le nuove memorie persistenti.
In stretta connessione al titolo, la prima parte del lavoro è incentrata principalmente sui nuovi dispositivi di memoria persistente, mentre nella sua seconda parte si focalizza sui sistemi operativi.
Nel primo capitolo si approfondiscono le cause di questi cambiamenti tecnologici, a partire da alcune considerazioni di natura economica; proseguendo, viene mostrato come le differenti (seppur complementari) necessità dei produttori di semiconduttori e dei loro consumatori stiano progressivamente spingendo la ricerca verso nuove tecnologie di memoria capaci di soddisfare le sempre crescenti richieste di prestazioni e funzionalità.
L'attenzione viene poi spostata sulle attuali memorie e sulle loro caratteristiche. Dopo una breve descrizione di ogni tecnologia, viene svolta una breve analisi di alcuni dei problemi comuni a tutte le tecnologie di memoria basate sulla carica elettrica.
Vengono quindi presentate le nuove memorie, seguendo l'ordine proposto dalla tassonomia della ITRS (International Technology Roadmap for Semiconductors): prima sono descritti quei dispositivi la cui produzione è già cominciata ma il cui grado di maturità del prodotto è ancora iniziale (MRAM, FeRAM, PCRAM), mentre successivamente vengono mostrati quei dispositivi il cui stato di sviluppo è ancora alle prime fasi (Ferroelectric RAM, ReRAM, Mott Memory, memorie macromolecolari e molecolari).
Il secondo capitolo, nella sua prima parte, affronta le tematiche sull'eventualità che tali memorie possano intaccare la validità di alcuni modelli fondamentali per lo sviluppo dei sistemi operativi, quali il modello della macchina di Von Neumann e la gerarchia di memoria. Viene sottolineato come la validità del modello di Von Neumann resti immutata. Tuttavia, si evidenzia come tali memorie apportino delle modifiche importanti alla attuale gerarchia di memoria, la quale vedrebbe l'aggiunta di due nuovi livelli sotto quello relativo alle memorie DRAM. Dopo queste valutazioni, ne sono proposte di ulteriori rispetto al concetto stesso di persistenza: viene sottolineato come esso sia sostanzialmente una proprietà di alcune tecnologie, e di come sia necessaria una "modellizzazione" che espliciti come il sistema operativo intenda utilizzarla.
Successivamente, si fornisce una descrizione di come le memorie persistenti (ad esempio le PCRAM) possano essere impiegate per costruire degli SSD più veloci. Sebbene un tale approccio sia piuttosto "conservativo", viene evidenziato come una simile soluzione richieda una profonda modifica dei meccanismi usati per effettuare le operazioni di I/O. L'attuale gestione degli I/O è infatti concentrata sull'offerta di numerose funzionalità, mentre la sua efficienza non è stata curata nel tempo: questo nuovo tipo di memorie tuttavia richiede un'alta efficienza del software.
Il secondo capitolo procede presentando la modalità d'uso più interessante delle nuove memorie persistenti: nel bus di memoria, in sostituzione alle comuni DRAM, oppure al loro fianco. Un simile approccio ha un grado di complessità superiore, e può essere declinato in molte differenti modalità d'uso.
Vengono svolte alcune osservazioni preliminari, comuni a tutti gli approcci; successivamente vengono presentati quelli più semplici (nessun-cambio e Whole System Persistence). In seguito sono introdotti quelli che prevedono la modifica dei sistemi operativi per realizzare una reale "consapevolezza" d'uso delle memorie persistenti: la maggior parte di essi sfrutta il paradigma del file system per ottenere tale scopo. Vengono presentati alcuni dettagli dell'attuale gestione dell'I/O in ambiente Linux, sottolineando come tramite il caching la persistenza si è già spostata dai dispositivi lenti alla memoria principale; vengono fatti poi ulteriori approfondimenti circa la consistenza dei dati. Queste osservazioni quindi sono usate per comprendere le principali differenze tra le operazioni di I/O e quelle di memoria e per approcciare effettivamente le modifiche al sistema operativo. Dopo una breve presentazione di alcune soluzioni non consolidate, sono successivamente introdotti degli elementi valutativi per comprendere l'efficacia e la profondità di quelle analizzate nel seguito.
Il lavoro continua nella presentazione delle proposte documentate sia dalla comunità degli sviluppatori Linux, sia dalla letteratura scientifica: Quill, BPFS, PRAMFS, PMFS, SCMFS. Concludendo la parte riguardante i sistemi operativi, sono fatte delle osservazioni sul concetto di integrazione, ovvero di un metodo per permettere l'uso condiviso delle memorie persistenti da parte del kernel e da parte del file system.
Infine, il lavoro si conclude toccando l'argomento della "consapevolezza" della persistenza nelle applicazioni, per valutare la proposta di permettere alle applicazioni un uso diretto delle memorie persistenti.
Contents
Copyright notes ii
Abstract iii
Abstract (italiano) vi
Contents ix
List of Figures xi
List of Tables xii
Glossary xiii
Introduction 1
1 Technology 3
1.1 Generic issues . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 An economical view . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 A technological view . . . . . . . . . . . . . . . . . . . . . 14
1.2 Technology – the present . . . . . . . . . . . . . . . . . . . . . 19
1.2.1 Mechanical devices . . . . . . . . . . . . . . . . . . . . . . 19
1.2.2 Charge-based devices . . . . . . . . . . . . . . . . . . . . . 21
1.2.3 Limits of charge-based devices . . . . . . . . . . . . . . . . 26
1.3 Technology – the future . . . . . . . . . . . . . . . . . . . . . . 27
1.3.1 Prototypical . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.2 Emerging . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.3.3 From the memory cell to memories . . . . . . . . . . . . . . . 40
2 Operating Systems 42
2.1 Reference models . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1.1 The Von Neumann machine . . . . . . . . . . . . . . . . . . 43
2.1.2 The "memory" and the memory hierarchy . . . . . . . . . . . . . 45
2.1.3 A dynamic view in time . . . . . . . . . . . . . . . . . . . . 53
2.1.4 Viable architectures . . . . . . . . . . . . . . . . . . . . . 54
2.2 Fast SSDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.2.1 Preliminary design choices . . . . . . . . . . . . . . . . . . 57
2.2.2 Impact of software I/O stack . . . . . . . . . . . . . . . . . 60
2.3 Storage Class Memory: operating systems . . . . . . . . . . . . . . 66
2.3.1 Preliminary observations . . . . . . . . . . . . . . . . . . . 67
2.3.2 No changes into operating system . . . . . . . . . . . . . . . 71
2.3.3 Whole System Persistence . . . . . . . . . . . . . . . . . . . 72
2.3.4 Persistence awareness in the operating system . . . . . . . . . 74
2.3.5 Adapting current file systems . . . . . . . . . . . . . . . . . 83
2.3.6 Persistent-memory file systems . . . . . . . . . . . . . . . . 85
2.3.7 Further steps . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4 Storage Class Memory and applications . . . . . . . . . . . . . . . . 94
Conclusions 100
A Asides 105
A.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
A.2 Physics and Semiconductors . . . . . . . . . . . . . . . . . . . . 106
A.3 Operating systems . . . . . . . . . . . . . . . . . . . . . . . . 111
B Tables 113
Bibliography 118
Acknowledgments 135
List of Figures
1.1 The ubiquitous memory hierarchy . . . . . . . . . . . . . . . . . 15
1.2 ITRS Memory Taxonomy . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 The Flash memory cell . . . . . . . . . . . . . . . . . . . . . . 22
1.4 Ferroelectric crystal bistable behavior . . . . . . . . . . . . . . 29
1.5 Magnetic Tunnel Junction . . . . . . . . . . . . . . . . . . . . . 30
1.6 Phase Change memory cell . . . . . . . . . . . . . . . . . . . . . 32
1.7 RRAM memory cell . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8 ElectroChemical Metallization switching process . . . . . . . . . . 36
2.1 The Von Neumann model . . . . . . . . . . . . . . . . . . . . . . 44
2.2 The memory hierarchy with hints . . . . . . . . . . . . . . . . . . 47
2.3 A new memory hierarchy . . . . . . . . . . . . . . . . . . . . . . 55
2.4 The Linux I/O path . . . . . . . . . . . . . . . . . . . . . . . . 76
A.1 Field effect transistor . . . . . . . . . . . . . . . . . . . . . . 111
List of Tables
B.1 Top 10 technology trends for 2015 . . . . . . . . . . . . . . . . 113
B.2 Performance comparison between memories . . . . . . . . . . . . . 114
B.3 4K transfer times with PCM and other memories . . . . . . . . . . 115
B.4 Bus latency comparison . . . . . . . . . . . . . . . . . . . . . . 116
B.5 HDD speed vs bus theoretical speed . . . . . . . . . . . . . . . . 116
B.6 Persistence awareness through file systems . . . . . . . . . . . . 117
Glossary
ACID Atomicity, Consistency, Isolation, Durability
ATA AT Attachment (I/O interface standard)
BE Bottom Electrode
BIOS Basic Input Output System
BTRFS B-Tree File System
CB Conductive Bridge
CBRAM Conductive Bridge RAM
DAX Direct Access and XIP
DDR Double Data Rate (DRAM technology)
ECC Error Correcting Code
FLOPS Floating Point Operations Per Second
FTL Flash Translation Layer
FUSE File system in USEr space
GPU Graphics Processing Unit
HMC Hybrid Memory Cube
HRS High Resistance Status
L1 Level 1 cache
L2 Level 2 cache
L3 Level 3 cache
LRS Low Resistance Status
LVM Logical Volume Manager
MFMIS Metal-Ferroelectric-Metal-Insulator-Semiconductor
MIM Metal-Insulator-Metal
MTD Memory Technology Device (Linux subsystem)
NVM Non-Volatile Memory
PATA Parallel ATA
PCI Peripheral Component Interconnect (I/O interface standard)
PMC Programmable Metallization Cell
RAMFS RAM File System
RS Resistive Switching
SATA Serial ATA
STT-MRAM Spin-Transfer Torque Magnetic RAM
TCM ThermoChemical Mechanism
TE Top Electrode
TER Tunnel Electro-Resistance
TLB Translation Look-aside Buffer
TMPFS TEMPorary File System
VFS Virtual File System
WAFL Write Anywhere File Layout (file system)
XFS X File System
XIP eXecute-In-Place
ZFS Z File System
Introduction
Computers are programmed to execute tasks by manipulating data: they should have a means for retrieving such data, manipulating it, and finally storing it back when tasks are completed. Similarly to our human experience, the devices used in computers to store and retrieve data are called memories.
Although memory naturaliter refers to the ability to remember data over time, computer science distinguishes between persistent and volatile memories, depending on the length of that "time": in fact, some memory devices are defined as volatile, whereas others are defined as persistent. The former class of devices cannot store any data permanently between power-off events, and hence that data gets lost when power is off. Conversely, memories belonging to the latter class feature the ability to "remember" the stored data over time, regardless of the power status: in these memory devices data "persists" in time¹.
Persistence and volatility are just properties of each specific memory technology, and have always been present in computing devices: for example, punched cards in the 50s offered persistence. Today, it is supplied by hard disks and SSDs. Conversely, volatility has always been present in the main memory. This scenario has remained almost unchanged throughout the years: since volatile memories were fast and expensive, whereas persistent ones were slow and cheap, persistence has always been relegated to I/O devices.
This work finds its raison d'être in the fact that the semiconductor industry is preparing to produce and sell new memory technologies whose features are much different from those seen until now. Over time, this particular industry has continued to enhance its offer, producing memories gradually better from one generation to the next: it extensively took advantage of the benefits offered by technology scaling, reaching a continuous increase both in memory densities² and in performance. Despite these enhancements, however, the fundamental features of memories have remained almost the same throughout the years: changes were almost always quantitative. The new technologies promise instead to be both fast and persistent, giving engineers the choice to use them in I/O devices or on the memory bus: in either case, such memories would effectively represent a major qualitative change. Hereafter, the term "persistent memory" refers to those technologies that are able to offer persistence in main memory. Usually, such memories are also referred to as non-volatile memories (NVM). The first chapter of this work will focus on the specificities of these new devices.
¹ Actually, the evaluating parameter to certify a memory as persistent is the ability to store data correctly for at least ten years.
Operating systems are developed to use computers conveniently and effectively, and are carefully designed to make the best use of each feature offered by the hardware. Volatility of the main memory is probably one of the most important assumptions that have always influenced the development of operating systems. Since these new technologies could move persistence to main memory, scientists and researchers are trying to understand which aspects of operating systems would need to be modified in order to adapt conveniently to such a major change. These modifications would then lead to persistence-aware operating systems: the efforts currently made by the scientific community to manage this transition are the subject of the second chapter.
² Density, when referring to memory devices, is the quantity of bits per area achieved by a given technology. Without changing the area consumed, a better density means higher capacity.
Chapter 1
Technology
Prior to analyzing computing models and operating system issues, it is useful to present the technological changes that are expected: this first part aims to describe the memory technologies used in current computing systems and to present those that we could use in the future.
1.1 Generic issues
Here follow some generic considerations about the economic and technical aspects that can be useful to gain a better understanding of the peculiarities of both the current memory technologies and those that are currently competing to become the technologies used tomorrow. The purpose of these observations is to share with the reader a sort of framework that helps to evaluate the causes that are leading to technological changes in memory devices and to understand the expectations that are placed on them.
1.1.1 An economical view
Throughout the world, each economic activity produces and distributes goods and services that are then sold to consumers (supply side). In turn, human needs, driving the demand for the goods produced by businesses, are the engine of each economic activity (demand side).
The same relation exists in the semiconductor market: computing is indeed realized by the semiconductor industry, in turn embodied by a myriad of firms that compete to survive, earn money and reach a leading position in the market (supply side). Consumers then buy semiconductor devices that satisfy their needs (demand side).
With the intent of gaining a better insight into the reasons that are triggering a qualitative change in the memory device panorama, I will focus briefly on some aspects of both the supply and the demand side of this tight relation.
Supply side – the need to pursue Moore's exponential prediction
When talking about trends in the semiconductor industry, the most frequently cited one is usually expressed as Moore's law (i.e. the number of transistors in integrated circuits grows exponentially). I will follow this tradition, believing that its use still offers a useful insight into the computer industry itself. To be exact, it has to be said that the semiconductor industry is trying to update Moore's law with a so-called "More than Moore" forecast [5]. Nonetheless, the "More than Moore" approach is still exponential, albeit in an equivalent form.
Moore's law, asserting the exponential growth of computational power, is an economic conjecture, not strictly an economic "law". Moore rooted his thoughts in his experience in the semiconductor industry, and in the fundamental principle according to which each economic activity has the primary goal of maximizing its profitability. He observed how, in the semiconductor industry, every two years the maximum profitability point coincided with both:
- a doubling of the number of transistors in integrated circuits;
- a corresponding fall in the price of each transistor (cost per function).
This double result, which started from the inception of the semiconductor industry, has continued to occur until today. The former permitted the exponential increase of computational power, whereas the latter transformed computational resources from a rare commodity into a widespread consumer good.
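As an illustration, the two observations can be condensed into a simple doubling law (a sketch of the usual formulation, assuming a two-year doubling period; the symbols are mine, not taken from Moore's papers): if $N(t)$ is the number of transistors per chip in year $t$ and $C(t)$ the cost of producing that chip, then
$$N(t) \approx N_0 \cdot 2^{(t - t_0)/2}, \qquad \text{cost per function} \approx \frac{C(t)}{N(t)},$$
so that even a roughly constant chip cost implies an exponentially falling cost per transistor.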
From a business perspective, throughout the years the pursuit of Moore's trend has guaranteed to the computer industry high revenues and the much desired maximization of profits. This has been possible because Moore founded his thoughts precisely on the profitability of the semiconductor industry [107]. Quite roughly: industry must optimize profitability, thus industry must follow Moore's law, as long as this result is achievable.
This point is much discussed by analysts: achievability is a great concern. A classic question is whether this exponential trend can continue in the future: despite the use of the word "law", this exponential growth is not a guarantee; it is instead the result of continuous research efforts, technological advancements, accumulated know-how, and so on. Until now, a series of technologies have guaranteed to the computer industry the achievability of the exponential growth, but the question of whether in the future some other new technologies will permit the same pace is currently open.
These concerns are not new: throughout the years, this question has been raised many times. Despite the concerns, Moore and many other analysts afterwards have been right in forecasting the continuation of the trend until now. Even if Moore himself admits that "no exponential is forever", he explains that the computer industry is trying to delay its end forever [54]. As a consequence, throughout the years, the lifecycle of different technologies has been carefully managed to permit this continuous delay. As depicted in reports from the International Technology Roadmap for Semiconductors (ITRS, see section A.1), the semiconductor industry currently has the firm expectation of still being able to delay the end of exponential growth for many years: it expects that the "equivalent Moore" trend will hold at least up to an impressive 2025 [83].
Through the years, "achievability" and "delay" have been made possible by two complementary strategies:
- lengthening as much as possible the life of current technologies (at least as long as production is profitable);
- promoting research efforts to design, develop and at last produce next-generation technologies, in order to be ready to step to a better technology when current ones become obsolete or less profitable.
These two goals are the two alternative approaches that inspire each research effort made in universities, research laboratories, and industries. Since the 50s, for example, the former strategy has permitted an incredible reduction in the size of transistors¹, which today is of just a few nanometers. The latter strategy, surely more challenging, usually produces new technologies: as an example, hard disks during the 50s and Flash memories during the 80s were the offspring of this strategy.
¹ i.e. scaling: passing from one technology node to a smaller one, reducing the feature size.
Demand side – a selection among current trends and needs
As the supply of the industry must comply with the needs of the customers in order to be sold, a brief analysis of the current needs of semiconductor customers is valuable. Often, these needs profoundly influence the supply, forcing producers to adapt quickly to new challenges.
As depicted in table B.1, current technology trends focus principally on the following areas:
Computing everywhere: the price of electronic devices is continuously decreasing (following the cost per function), thus facilitating their spread. Without lingering too much on the important role of computing in almost every human activity, the use of computing resources is spreading further: today portable devices like smartphones and tablets are fully fledged computers, just like laptops or traditional workstations. Wearable devices are just another step in the direction of "computing everywhere".
Another aspect of the same trend is the spread of "smart" logic into a plethora of simpler devices: washing machines, alarm systems, thermostats, home automation systems, TVs, and many other appliances used every day by millions of people. Even in much smaller devices computing is being offered as a standard feature: smart cards, networks of sensors, RFID devices, micro-devices, all have some degree of computational power.
One of the causes of such a widespread use of computing resources lies in a simple observation: if millions of transistors build up a CPU having both a relevant computational power and a relevant price, a few transistors can be used to build far simpler electronic devices that still have (reduced) computational power at a much lower price. This in turn permits industries to select the needed balance between cost and functionality of their products.
Internet of Anything: this point follows from the preceding one, as people need connectivity along with computing: without it, in a world where information and interaction are almost in "real time", computing resources would be useless. So, just as computing resources are spreading across an infinite number of devices, the same devices are increasingly becoming able to connect to various networks. Focusing on the Internet, analysts expect an exponential growth of the devices using it: from the smallest devices to the biggest data centers, connectivity to the Internet is fundamental. As a consequence of this ceaseless growth, both the network traffic and the overall amount of data produced by each device increase.
Data center growth continues: just as the falling cost per function facilitates and urges the spread of cheap and ubiquitous devices, the same cost reduction permits the concentration of bigger masses of computing power: this is the case of data centers, such as the ones currently built and used by companies like Microsoft, Google, Amazon and Facebook to provide cloud services to their customers.
Relating to data centers, analysts are expecting both:
- growth in the size of big data centers, reflecting the increased use of data center services (XaaS² patterns, social networking, hybrid cloud patterns, and so on);
- growth in the number of big data centers, as businesses are increasingly using co-location [68].
Some scientific, academic, and government institutions are trying to build exascale-level supercomputers [12, 67, 70] in order to be able to solve huge computing tasks (i.e. simulation and modeling). An exascale supercomputer would be able to reach at least one exa-FLOPS of performance. Such efforts too go in the same direction of more computing power and more storage volume in data centers.
Finally, "Big Data" too falls into this category of trends and is somehow related both to bigger data centers and to exascale computing. The Computer Desktop Encyclopedia refers to it as "the massive amounts of data collected over time that are difficult to analyze and handle using common database management tools. The data are analyzed for marketing trends in business as well as in the fields of manufacturing, medicine and science. The types of data include business transactions, e-mail messages, photos, surveillance videos, activity logs and unstructured text from blogs and social media, as well as the huge amounts of data that can be collected from sensors of all varieties" [65]. "Massive amounts of data" need huge databases and huge data storage, such as those found only in big data centers. Similarly, the trends about predictive analytics and context-rich systems, taken from table B.1, fall into this category too.
² Anything (X) as a service.
Security and safety concerns increase: this is both a current trend in technology and a consequence of the points just described. Since:
- computing devices are spreading;
- connectivity capabilities are spreading in every area where computing devices are used;
- the fields of application of such computing devices are increasing, touching very sensitive ones, such as those related to human health and medical science;
- each computing device generates data, and the total amount of data generated each year is drastically increasing;
- the use of connected services is increasing, with the effect of placing a huge volume of data "on the cloud",
then the efforts to protect such devices and their data will be significant, as will those to engineer the safest ways to use them.
The ones described above are some of the major technology trends currently noticed by analysts. Anyway, since each trend is ultimately a specific pattern of use of bare hardware resources, those trends must translate into more specific requests placed on the semiconductor industry: in the end, the semiconductor industry produces transistors and integrated circuits.
Given the trends just seen, the requests currently focus on these areas:
Speed: people are demanding an ever increasing amount of information retrieved in real time. The use of web search engines, social networks and cloud computing platforms is extensive: users expect their queries to be answered extremely fast. In order to live up to these expectations, information technology businesses need technologies allowing extreme speed. People also expect to use fast personal devices (laptops, tablets, smartphones, and so on).
Computational power and parallelism: people demand computational power, not only speed. The tasks performed by modern smartphones increase continuously both in number and in complexity: hardware must have enough computational power to fulfill every request. Moreover, people expect many tasks to be executed concurrently, hence further increasing the demand for computing performance. Data centers are no different in this respect: they are asked to serve many concurrent requests of continuously increasing complexity.
Power efficiency: while until a few years ago this matter was not of primary importance, nowadays it is indeed pivotal. Power efficiency is fundamental both in the domain of small devices and in that of the biggest installations. While it is simple to understand the need for power efficiency in a smartphone or (infinitely more) in a modern pacemaker, this issue also arises in relation to data centers. As an example, one of the Google data centers is adjacent to a power plant and its use of electrical power totals the huge figure of 75 MW [29]. Moreover, it is reported that data centers account for 1.3% of all electricity use worldwide and for 2% in the U.S. As data centers are expected to increase both in number and in size, this issue further increases in its significance.
Expectations on memories
As this work is focused on memories, it is important to remark how each of these requests directly influences the features that memories should have in order to satisfy the aforementioned needs. I have referred until now to those needs mainly as something related to computing devices, treated as a whole. However, while the effective use of each device has its own field of employment, each of them performs its specific job by executing computations on some data: devices differ according to the properties of the data upon which computations are made (i.e. the data managed by a phone is voice, whereas the data managed by an Internet router are IP packets). Data is indeed the object of each computation. While theoretically the simplest devices (for example, sensors) could just manipulate simple data and transmit it without the need to store and retrieve it, memory is nonetheless found in almost all computing devices.
Since most devices must retrieve and store data in order to effectively perform a computation, the speed of each retrieval and each store represents an upper limit on the total speed of a computation. This observation evidences how strict the relation between computations and memory is (memory taken as a whole, for the time being, without making any distinction): as the market asks for speed, this request naturally reflects on memories.
The same happens with the other requests: both those related to computational power and those related to power efficiency naturally reflect on memories.
A further fundamental feature considered when evaluating memories is density: since their purpose is the retention of data through time, a critical aspect is how much data can be contained in each memory chip. Given the technology trends just presented, this issue is expected to increase in centrality: increasingly complex computations need to manage increasing quantities of data. The pervasive and increasing presence of computing devices also generates incredibly high amounts of data, increasing the mass of data that potentially should be stored somewhere. Bigger and bigger data centers, along with exascale supercomputing, need to manage, store and retrieve huge amounts of data too. "Big data" intrinsically refers to the need to manage a huge and ever increasing amount of data. All these requirements urge the semiconductor industry to develop denser memories to manage this growth.
The need for persistence
A further remark should pertain to persistence: as persistent memories represent the triggering reason for this work, the question naturally arises of whether either the semiconductor industry or the market is demanding persistent memories.
Persistent-memory devices would be just perfect as a storage medium. Until now, storage has always deeply suffered from the classical dichotomy of fast-volatile and slow-persistent: storage has always been slow. Moreover, computers, sooner or later, must use a storage layer to save permanently the data that they manipulate. At this exact point, as computing devices need to use storage, computations pay a high price in latency: storage slowness represents an upper limit on the performance of storage-related operations.
Referring to hard disks, while their capacities have increased more than 10,000-fold since the 80s, seek time and rotational latency have improved only by a factor of two, and this trend is expected to continue [112, 115]. So, as the need for data storage increases, the problems related to storage speed are expected to increase too. Fast and persistent memories would thus give the opportunity to overcome these limitations, permitting storage to become, at last, extremely fast: this achievement would represent a major innovation in many computing areas.
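To give a rough sense of the gap being discussed (indicative orders of magnitude, not figures taken from this work): a DRAM access is on the order of $100\,\text{ns}$, while a hard disk access is on the order of $10\,\text{ms}$, so
$$\frac{t_{\text{HDD}}}{t_{\text{DRAM}}} \approx \frac{10 \times 10^{-3}\,\text{s}}{100 \times 10^{-9}\,\text{s}} = 10^{5},$$
i.e. roughly five orders of magnitude separate main memory from persistent storage; this is precisely the distance that fast persistent memories would shrink.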
As an example, a likely area of use among many would be data centers, where caching servers are extensively used to speed up data retrieval from storage. Persistent memory would represent at least a big simplification opportunity, as the caching server would become unnecessary: in turn, this simplification would represent an important opportunity for savings in cycles and energy and, ultimately, in money.
From an engineering standpoint, moreover, persistent memories would give engineers a useful element of choice, whereas today there is no choice: either speed or persistence. Consequently, they would have the opportunity to build devices better suited to the high and frequently changing needs of the market.
A last observation pertains to reality: as a matter of fact, most of the technologies that are currently under extensive research are persistent.
Memories – demand and supply
While the semiconductor industry is currently producing high volumes of DRAM and Flash memories, as well as high volumes of hard disks, these industries are nonetheless steadily preparing a transition towards new technologies: the current ones suffer from limitations that are already challenging. Analysts and the semiconductor industry itself expect that in the next years those limitations will become overwhelming, eventually frustrating both the supply and the demand side of the economic relation:
The supply: current memory technologies have always benefited from technology scaling: it suffices to think how long technologies such as hard disks (1956) and DRAM (1968) have been on the market. However, current memory technologies are reaching a point where further scaling is no longer easily achievable: while the reasons will be unfolded subsequently, the fact is that the semiconductor industry expects that such technologies (DRAM, Flash, hard disks) will become increasingly difficult to produce and to enhance, thus losing appeal.
The demand: current memory technologies are either fast and not dense, or dense and not fast. This fact, although frustrating, has always been accepted as a matter of fact. However, since the market is now demanding both speed and density, current technologies are becoming increasingly unfit to fulfill such requirements. Moreover, even an increase in speed alone or in density alone is becoming increasingly hard to reach. The issue of power efficiency is only loosely addressed by current technologies: unfortunately, each of them is power hungry. As an example, DRAM currently accounts for about 30% of the total power consumption in a data center [30].
Summarizing, since current technologies are expected to reach their limits soon, the semiconductor industry is currently searching for new memory technologies that would allow both the continuation of the exponential growth and of the scaling trend for a long time. Those technologies, which will subsequently be referred to as prototypical and emerging, have the potential to succeed in this fundamental goal. Among them, some will prevail against others; some maybe will never be produced, whereas others will eventually succeed and become mainstream. In the case of a successful technology, anyway, that technology will have assured both the maximum achievable profitability to the producing businesses (as a mix of right timing, ease of production, low costs, minimum cost per function, good know-how, etc.) and the fulfillment of the requests coming from memory consumers, be they individuals or organizations.
1.1.2 A technological view
Leaving the economic considerations behind, this section introduces the taxonomies used to present current and future technologies, as well as the technical parameters used to describe the peculiarities of each specific technology.
From a hypothetical perfect memory to the memory hierarchy
As happens in any scientific field, a given resource is evaluated by measuring the score of some evaluating parameters; thus, a perfect resource would maximize the score of each feature as if the features were free variables. In the real world, however, most of the time some of the features are not free variables but depend on each other: improvements in some of the variables often come at the cost of some other variable. Since no perfect resource exists, the same happens with memories; if, however, a perfect memory existed, it would maximize each of the following features:
- quantity;
- speed;
- low cost;
- low power consumption;
- data retention;
- write endurance.
In real memories, however, some of these features (especially quantity, speed and cost) are mutually dependent, and one is usually increased at the cost of another: fast memories are expensive and slow memories are cheaper, and the quantity depends on the compromise between speed and cost. The reason for this correlation lies in the fact that technologies that focus on speed consume a bigger chip area than those that focus on quantity. In turn, this influences data density per area: slower technologies achieve better data density than faster ones, and thus a better cost per bit.
A modern computer (as well as a modern data center) can run and solve the same class of problems as a Turing machine [36]. In Turing machines memory exists in the form of an infinitely long tape containing infinite cells. Even if memory in that model is very simple, it is indeed infinite. Somehow relating to that model, also in modern computing systems, no matter their size, memory (taken as a whole, not in the sense of the "memory" concept of the Von Neumann model) can be thought of as infinite. For example, disks, tapes and DVDs can be indefinitely added, switched and removed, effectively, although indirectly, creating an infinite memory. The memory hierarchy, as presented in figure 1.1, visually represents the consequence of the issues just presented: needing an infinite memory and having a limited amount of money, computers are necessarily engineered to use a few fast memories and a lot of slower but cheaper memories, with the intent of maximizing both performance and capacity while minimizing costs.
Figure 1.1: The ubiquitous memory hierarchy
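The reasoning behind this arrangement can be made concrete with the textbook effective access time estimate (a standard formula, not one taken from this thesis): if a fraction $h$ of the accesses is served by the fast level, then
$$T_{\text{eff}} = h \cdot T_{\text{fast}} + (1-h) \cdot T_{\text{slow}},$$
so that, as long as the hit ratio $h$ stays high, the hierarchy behaves almost as if it were entirely made of the fast (and expensive) memory, while most of the capacity is provided by the slow (and cheap) levels.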
Current memory technologies will be presented subsequently following the structure of figure 1.1, starting from the base of the pyramid and gradually reaching the top. The new ones, instead, will be presented without referring to the memory hierarchy pyramid: the taxonomy drafted by the ITRS will be used instead, as represented in figure 1.2, believing that it is more helpful for classifying the technologies that will soon be presented.
Figure 1.2: ITRS Memory Taxonomy
The memory cell and its performance parameters
Electronic technologies that do not use mechanical moving parts are commonly defined as solid-state technologies. In the context of solid-state memory technologies a key concept is that of the "memory cell". A memory cell is the smallest functional unit of a memory used to access, store and change an information bit, encoded as a zero or a one. Each memory cell contains:
- The storage medium and its switching mechanics: where the bit is encoded and the mechanism that executes the switch between 0 and 1.
- The access mechanism: the mechanism that selects the correct memory cell.
This concept could also be used, although less usefully, in the case of mechanical technologies: in that context, the definition of memory cell refers to the storage medium only, since in mechanical technologies both the switching and the access mechanics are usually shared among the whole set of memory cells.
Referring thus to solid-state memory technologies, each technology has a specific memory cell, with specific performance. The parameters usually measured to compare a specific technology with the others (a compact representation is sketched after this list) are:
- feature size (length F, in µm or nm);
- cell area (measured in F²);
- read latency (time: µs or ns);
- write latency (time: µs or ns);
- write endurance (scalar: maximum write cycles);
- data retention (time);
- write voltage (V);
- read voltage (V);
- write energy (pJ/bit or fJ/bit);
- production process (CMOS, SOI, others);
- configuration (3-terminal or 2-terminal);
- scalability.
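As a reading aid, these parameters can be grouped into a simple record; the sketch below is purely illustrative (the field names and units are mine, not those of the ITRS tables):

/* Illustrative grouping of the cell-level parameters listed above.
   Field names and units are assumptions made for this sketch. */
enum terminal_config { THREE_TERMINAL, TWO_TERMINAL };

struct memory_cell_params {
    double feature_size_nm;      /* F, in nanometers                      */
    double cell_area_f2;         /* cell area, in units of F^2            */
    double read_latency_ns;      /* read latency                          */
    double write_latency_ns;     /* write latency                         */
    double write_endurance;      /* maximum write cycles (e.g. 1e5)       */
    double retention_years;      /* data retention                        */
    double write_voltage_v;      /* write voltage                         */
    double read_voltage_v;       /* read voltage                          */
    double write_energy_fj_bit;  /* write energy per bit                  */
    enum terminal_config config; /* 3-terminal (e.g. Flash) or 2-terminal */
};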
Inside the memory cell
Despite the fact that each specific memory technology has its own memory cell details, different technologies can share similar approaches in some engineering aspects: the following insights could be useful to the reader in order to better follow the terms and the descriptions of each specific technology given subsequently.
The storage unit and the write/read logic: the storage unit is responsible for the effective data storage and retention. Each technology defines, along with the storage unit, the mechanics (and the logic) to write and to read the data. There are two fundamental methods used to store data in solid-state memory technologies:
- As 3-terminal storage units: this approach uses modified Field Effect Transistors (FETs) to store data³. Data is stored by modifying (raising or lowering) the threshold voltage of the transistor, thus influencing the current passage between the source and drain electrodes while the potential difference applied to the gate electrode is kept fixed. In such devices, reading is usually performed by sensing the current flow at the drain electrode while applying a potential difference to both the source and the control gate: depending on the value of the threshold voltage previously set (and thus on the value of the bit stored), current flow is permitted or prevented.
- As 2-terminal storage units: technologies using this approach usually build each storage unit as a stack of one or more different materials enclosed between two electrodes (terminals). Storage units built this way usually have a resistive approach: data is read by sensing whether a probe current passes (if SLC) or how much current passes (if MLC) between the two electrodes. The writing mechanics, however, depends on the specific technology: some technologies must execute the write process using additional logic (e.g. standard MRAM, see section 1.3.1), whereas some newer ones feature the writing mechanics directly embedded into the storage units (e.g. all ReRAM technologies, see section 1.3.2). In particular, this last class of newer devices presents a configuration similar to that of fundamental electric devices like resistors, capacitors and inductors, all featuring the 2-terminal approach, and such devices are sometimes referred to as "memristors" (see section A.2).
Sometimes, a specific physical property can be used to implement devices in both the 2-terminal and the 3-terminal configurations, as is the case of memory cells built using ferroelectric properties (FeFETs: 3-terminal; FTJs: 2-terminal).
³ FETs have a source, a drain and a gate electrode; see section A.2.
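To illustrate the resistive read idea for 2-terminal cells, here is a toy sketch in C (my own simplification, not a real sensing circuit or driver): the sensed probe current is compared against one threshold in the SLC case, or against several current windows in the MLC case. All threshold values are invented for illustration.

/* Toy model of reading a 2-terminal resistive storage unit: the sensed
   current distinguishes the high-resistance state (HRS) from the
   low-resistance state (LRS). Thresholds are illustrative only. */
#define SLC_THRESHOLD_UA 10.0              /* probe current threshold, microamps */

/* SLC read: LRS (high current) -> 1, HRS (low current) -> 0. */
int read_slc_cell(double sensed_current_ua)
{
    return sensed_current_ua > SLC_THRESHOLD_UA ? 1 : 0;
}

/* MLC read: several current windows map to a 2-bit symbol (0..3). */
int read_mlc_cell(double sensed_current_ua)
{
    const double thresholds_ua[] = { 5.0, 15.0, 30.0 };  /* illustrative */
    int level = 0;
    for (int i = 0; i < 3; i++)
        if (sensed_current_ua > thresholds_ua[i])
            level = i + 1;
    return level;                          /* two bits of information per cell */
}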
The access control: a memory uses memory cells as building blocks, but there must be a way to select single memory cells in order to execute a read or a write. Usually, two alternative approaches can be used:
- Active matrix: a transistor is used to access the storage unit. These technologies are usually referred to as 1T-XX technologies, where 1T stands for "1 transistor" (the controlling transistor) and the "XX" part depends on the specific technology. This is the case for DRAM, a 1T-1C technology (C stands for capacitor);
- Passive matrix (crossbar): the storage unit is accessed almost directly, with at most the single indirection of a non-linear element, used to avoid half-select problems.
Destructive vs non-destructive reads: some technologies suffer from the destruction of the data contained in a memory cell when a read is performed: these read operations are called "destructive reads". Such technologies usually have additional logic that re-writes the same data after the read operation, to prevent data loss. Obviously this is an issue that engineers would avoid where possible, thus preferring those technologies whose read operations are non-destructive. Needless to say, writes are intrinsically always destructive.
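From the controller's point of view, a destructive read therefore behaves like a read followed by a restoring write. The following self-contained C sketch is only a schematic model of that logic (the "device" is just an array standing in for the cell array; none of this is a real controller API):

/* Schematic model of a destructive-read technology: the value must be
   written back after every read to preserve it. */
#include <stdio.h>

static int cells[16];                      /* toy cell array */

static int cell_read_raw(unsigned addr)    /* destructive read */
{
    int value = cells[addr];
    cells[addr] = 0;                       /* reading destroys the content */
    return value;
}

static void cell_write(unsigned addr, int value)
{
    cells[addr] = value;
}

/* Read as exposed to the rest of the memory: read, then restore. */
static int read_with_restore(unsigned addr)
{
    int value = cell_read_raw(addr);
    cell_write(addr, value);               /* write-back hides the destruction */
    return value;
}

int main(void)
{
    cell_write(3, 1);
    printf("first read:  %d\n", read_with_restore(3));
    printf("second read: %d\n", read_with_restore(3)); /* still 1 */
    return 0;
}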
1.2 Technology – the present
The memory technologies that are currently mainstream and can be used by engineers in computing devices will now be presented. For each technology, the reader finds a brief explanation of its specificities, as well as a short discussion of the claims of the scientific community about its limitations.
1.2.1 Mechanical devices
Magnetic tapes, CD-ROM, DVD, Blu-ray Disc
Even if not very interesting in the context of this work, it is worthwhile to mention these devices, as they represent the lower part of the memory hierarchy. They are specifically engineered to store data permanently through I/O operations at the lowest possible cost. At this level of the memory hierarchy, performance is not as important as storage capacity.
Hard disks
Data is stored persistently in a magnetic layer applied on the surface of one or more rotating discs. This technology is similar, mutatis mutandis, to the one used in vinyl music discs: a moving head follows concentric rings (tracks) to read the stored data. Data is retrieved (or written) by the head sensing the magnetic field coded into the magnetic layer. Data transfers are I/O bound, data is stored and accessed in blocks, and reading is not destructive.
Even if the first hard disks appeared in 1956 (IBM RAMAC [79]), this technology is still alive and vital: it offers high storage density, a long data retention period, and a low price per bit. Hard disks are often equipped with some amount of cache memory, needed to raise their performance. This same effort to improve the performance of hard disks has led to "hybrid hard disks", i.e. fast performing hard disks with a Flash cache [42]. These products attempt to approach Flash-like performance at the price of a common hard disk.
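Because every access involves moving the head and waiting for the platter to rotate, the access time of a hard disk can be roughly decomposed as follows (a standard back-of-the-envelope estimate with indicative figures, not measurements from this work):
$$T_{\text{access}} \approx T_{\text{seek}} + \frac{T_{\text{rotation}}}{2} + \frac{B}{R}.$$
With, say, $T_{\text{seek}} \approx 4\,\text{ms}$, a 7200 rpm spindle ($T_{\text{rotation}}/2 \approx 4.2\,\text{ms}$) and a 4 KiB block transferred at $R \approx 150\,\text{MB/s}$ (about $0.03\,\text{ms}$), the mechanical terms dominate and the total stays in the millisecond range, regardless of how fast the interface is.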
The memory hierarchy pyramid describes at a glance both the advantages of hard disks and their most noticeable shortcoming: high density at the cost of slow speed. Despite the age of this technology, scaling still seems viable, as densities per square inch keep increasing: while current ones are about one TB per platter, newer technologies promise to go even higher [97, 88]. Unfortunately, hard disks have always suffered from slow transfer rates and very high (millisecond) latencies; moreover, these undesirable aspects cannot be bypassed without workarounds: the mechanical nature of the hard disk is an intrinsic limit (the head has to physically reach the position at which reads and writes are executed). The mechanical nature of HDDs has other drawbacks: the rotation of the platters, the movement of the head over the disks, and the read or write process can all be sources of failure, due either to physical breakage or to external causes such as vibrations and accidental falls [133]. As for power efficiency, hard disks consume a high amount of electrical power: consumption can range from around 1.5 W to around 9 W each⁴. Consequently, referring to the needs previously described, hard disks do not seem to comply with those high claims: while they seem just perfect for long-term and high-volume data retention, they lose appeal when high throughput, low latencies and power efficiency are needed.
⁴ Consumption of, respectively, a consumer low-power HDD [109] and an enterprise HDD [105].
1.2.2 Charge-based devices
This class of devices, all solid-state, uses the electric charge of electrons to store bits in the memory cells: hence their name. Each description summarizes the technological aspects that enable bit storage and the switching mechanics, along with a short summary of the issues specific to each technology. The issues common to the whole class of devices are instead treated afterwards.
NAND and NOR Flash memories
This technology is based on the ability of building �enhanced� �eld e�ect transistors
to achieve the desired behavior, as it happens in EPROMs and in EEPROMs [50,
pp. 9-4]. Memory cells using this technology feature a persistent storage achieved
through a 3-terminal con�guration: data is stored modulating the threshold volt-
age of the �enhanced FET� transistor. These transistors are similar to standard
FETs, except for the fact that there are two gates instead of just one (control
gate and �oating gate). The control gate (the upper one) is the same as in FET
technology, whereas the �oating gate (the lower one) acts as an electron vessel:
being made of a conductive material, it can contain ��oating� electrons thanks to
the insulator layer that encloses it (oxide). The threshold value (and hence the
contained data) is modi�ed ��lling� or �emptying� the �oating gate with electrons5.
In a SLC con�guration, a memory cell with high threshold voltage would be in a
�programmed� status (0, non conducting, vessel full); vice versa, if the threshold
was low, it would be in an "erased" status (1, conducting, vessel empty). Reading
is performed as previously described for 3-terminal memory cells, and the read
process is not destructive, as it involves neither the erase nor the program operation.
Flash technology is configured as a "1T technology", as each memory cell has exactly
one transistor.
4Consumption of, respectively, a consumer low-power HDD [109] and an enterprise HDD [105].
5The "filling" is performed through Fowler-Nordheim tunneling or hot-carrier injection, whereas "emptying" is usually achieved through Fowler-Nordheim tunneling. See [50].
Figure 1.3: The Flash memory cell. © TDK Corporation
Depending on how the Flash memory cells are linked together, either NAND Flash
or NOR Flash is produced. Whichever the case, before being programmed each cell
must be in the erased state. In both configurations, the erase operation is slow
(milliseconds), expensive (high power) and performed on groups of many bytes,
called the "erase size". Reading is not a destructive operation; NOR Flash can be
either I/O bound or memory bound, whereas NAND is only I/O bound.
Research efforts in this technology are extensive, even though it can be considered
mature: the use of a modified field-effect transistor as a memory device dates back
to papers published in 1967 [138]. Those studies later led to the development of
EPROM and EEPROM, whose main principles are further exploited in Flash technology,
conceived through Dr. Masuoka's research in the early 1980s [49, 124]: NOR Flash
was officially presented in 1984, NAND in 1987 [98].
Flash memories can be used to build a large number of memory devices: when
used to build SSDs, performance is clearly better than that of hard disks. The
latency of Flash memory cells lies between tens and hundreds of microseconds:
Flash SSDs in particular offer a significant speedup over common hard disks,
usually providing better latencies and higher throughput. Their power-efficiency
improvement over hard disks is yet to be verified: while commercial documentation
confidently claims a clear power benefit over HDDs, the actual figures found in
datasheets are less clear6. Flash memories, being solid-state, have no moving
mechanical parts, which avoids mechanical failures. Flash memories certainly
represent a first step in the direction demanded by customers. However, Flash
memories are far from perfect. Common issues are:
Cell wearing, low endurance: currently in the order of 10^4-10^5 write cycles.
This problem is rooted both in the write/erase process and in the materials used
to build the Flash transistor: both erasing and programming are achieved
using high energies in order to force electrons through the thin oxide
insulating layer (tunnel oxide). This process gradually damages the oxide
layer, eventually causing the loss of its insulating properties: since the
floating gate is made of a conductive material, the damage causes the loss
of the contained electrons [61, 13]. In order to guarantee a long life to
devices employing Flash memories, wear-leveling strategies must be adopted
to distribute writes and erases across the whole set of cells and to avoid
concentrating these operations on a few of them.
Low reliability: the NAND Flash configuration is the most used in SSDs, mainly
because of its better achievable density. This, however, has a cost in reliability:
NAND Flash devices suffer from both read disturbs and program disturbs when
reading and writing; moreover, these devices leave the factory without the
guarantee that all cells are in optimal status. For these reasons, ECC functions
are needed when using Flash devices, especially in the NAND configuration. Such
functions usually increase the complexity of either the hardware or the software,
thus they have a cost [102].
Complex writing mechanism: Flash memories follow a rather involved mechanics:
in order to be programmed, each cell must be in the erased state. Moreover,
erases are expensive and erase sizes are considerable (8K to 32K): this forces
each SSD to keep large reserves of erased blocks in order to speed up writes.
In addition, since NAND Flash is I/O bound, transfers are made at least in
whole blocks, which increases the inefficiency when only small amounts of
data change.
6For example, when comparing a consumer HDD and a consumer SSD, both in the 2.5-inch form factor, the SSD's advantage is apparent when idle or in standby mode, whereas in read or write mode the SSD can consume more energy (∼3 W vs ∼1.75 W, see [96, 109]).
All these issues are usually managed in software layers (either in the operating
system or in the SSD firmware) called Flash Translation Layers (FTLs), whose job is
to hide such problems from the computer and to efficiently manage the needed wear
leveling, error correction and, in general, failure avoidance. However, while these
layers do succeed in simulating a standard hard disk, all this complexity is
expensive and can easily be a source of performance loss.
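The following sketch gives a rough idea of the bookkeeping an FTL performs; it is a toy model under strong simplifying assumptions (one page per erase block, no ECC, no garbage collection), not the design of any real FTL. Writes are always redirected to the least-worn erased block, which combines the erase-before-program constraint with a basic form of wear leveling.

    #include <stdint.h>
    #include <string.h>

    #define PAGES      1024        /* logical pages exposed to the host           */
    #define BLOCKS     1280        /* physical blocks, including spare ones       */
    #define PAGE_SIZE  4096

    static uint8_t  flash[BLOCKS][PAGE_SIZE];
    static int      l2p[PAGES];    /* logical -> physical mapping                 */
    static int      erased[BLOCKS];/* 1 if the block is erased and writable       */
    static unsigned wear[BLOCKS];  /* erase counter, used for wear leveling       */

    void ftl_init(void)
    {
        for (int p = 0; p < PAGES; p++)  l2p[p] = -1;
        for (int b = 0; b < BLOCKS; b++) erased[b] = 1;
    }

    /* Pick the erased block with the lowest erase count (wear leveling). */
    static int least_worn_erased(void)
    {
        int best = -1;
        for (int b = 0; b < BLOCKS; b++)
            if (erased[b] && (best < 0 || wear[b] < wear[best]))
                best = b;
        return best;                /* assumes a spare erased block always exists */
    }

    /* Out-of-place update: a rewrite never reprograms the old block in place. */
    void ftl_write(int lpage, const uint8_t *data)
    {
        int new_blk = least_worn_erased();
        memcpy(flash[new_blk], data, PAGE_SIZE);
        erased[new_blk] = 0;

        int old_blk = l2p[lpage];
        if (old_blk >= 0) {         /* erase the stale copy: slow, and it wears the cell */
            wear[old_blk]++;
            erased[old_blk] = 1;
        }
        l2p[lpage] = new_blk;
    }

Real FTLs add error correction, bad-block management and background garbage collection on top of this mapping, which is precisely the complexity the paragraph above refers to.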
Dynamic RAM
DRAM technology is the main technology currently used by computing systems
(from the smaller smartphone to the biggest supercomputer) to implement what
Von Neumann called the �memory� in his well-known model. Dynamic RAM
memory cell currently consists in one control transistor and one capacitor (1T-1C
technology). These cells are then organized in grids. When the line is opened
through the transistor, charge is free to exit the capacitor if the capacitor is
charged (meaning that its value is 1; the value is 0 if the capacitor is empty).
Due to this design, the reading operation is destructive: the loss of data is
nonetheless avoided with a re-write, at the cost of some additional complexity,
whenever the cell was charged. Again, due to this design, the memory is volatile
because capacitors discharge quickly: the memory cells need to be refreshed to
retain their data, i.e. the capacitors have to be recharged periodically (typically
every cell must be refreshed within a window of about 64 ms): when the computer is
off, all data is lost. Data read and write operations are fast (memory latency is
in the nanosecond range, i.e. one or two orders of magnitude slower than the CPU)
and each byte is directly addressed by the processor.
Surprisingly, a memory composed of capacitors that needed to be continuously
refreshed was present in a machine called �Aquarius� built in Bletchley Park (UK)
during World War II [45]. However, the 1T-1C design dates back to 1968 when Dr.
Robert Dennard registered US patent no. 3,387,286 [24] and improved previous
design that required more components.
The limits of DRAM are to be ascribed principally to its low density and high
energy cost, as explained before. The scientific literature notes that as density
increases, the total refresh time of each chip also increases, causing high
overheads in both latency and bandwidth [37]. Another issue related to DRAM is
that its growth in density cannot keep pace with that of CPU cores: while in
recent years CPU core counts doubled every two years, DRAM density doubled only
every three years, so the memory-per-core ratio keeps shrinking [55]. Even if the
speed of DRAM is high, its latency remains a bottleneck for the even higher speed
of the CPU: latency improvements over time have been minimal (only 20% in ten
years), and this slow improvement trend is expected to continue in the future.
Just when the demand for speed is so pressing, this technology seems to have
difficulties sustaining the performance demanded by current processors.
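A back-of-the-envelope sketch of the refresh overhead mentioned above, using assumed figures (8192 rows per bank, 50 ns per row refresh, the 64 ms retention window); real values depend on the DRAM generation, but the calculation shows why higher density, i.e. more rows, translates directly into more time spent refreshing.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative figures only; actual values vary per DRAM generation. */
        double rows_per_bank = 8192.0;  /* rows to refresh within the window        */
        double trfc_ns       = 50.0;    /* time one row refresh keeps the bank busy */
        double window_ms     = 64.0;    /* retention window                         */

        double busy_ms = rows_per_bank * trfc_ns / 1e6;
        printf("refresh keeps the bank busy %.2f ms out of %.0f ms (%.2f%%)\n",
               busy_ms, window_ms, 100.0 * busy_ms / window_ms);
        /* ~0.41 ms out of 64 ms, i.e. ~0.6%; doubling the rows doubles the overhead. */
        return 0;
    }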
SRAM
Static RAM is the fastest and most expensive memory used outside the CPU cores:
SRAM is usually located on the same die as the CPU, in any case as near as
possible to its cores, and it mostly serves as the hardware data and instruction
cache. It is also common to find SRAM in other caches, such as those of hard
disks, routers and other devices. These memories are volatile, although their
design does not require refreshing as DRAM does. The classical design is 6T7;
this is a major issue since it increases the cost of a single memory cell and
limits density and scalability.
CPU Registers
CPU registers represent the highest level of the memory hierarchy. These mem-
ories are completely integrated into the CPU core, are used at full speed and
are set aside from the standard addressing space: registers are directly accessed
by name. Usually registers are used to store temporary data between load and
store operations. Since they are fully integrated into CPU cores, their number
is very limited: every additional register means less space for computational
functions. Information stored in registers is lost when power is off.
7A SRAM cell is built from six transistors. See [72].
1.2.3 Limits of charge-based devices
Besides the specific limits of each technology just described, charge-based
technologies share some common issues: the most important problem currently faced
by researchers and engineers is the technical concern about scaling: since feature
sizes have now reached 28 nm in DRAM and 16 nm in NAND Flash cells, researchers
wonder for how long even smaller sizes will remain achievable [94, 108].
In fact, at such small sizes:
- memory cells are very close to each other: the risk of cell-to-cell interference
is high;
- the total number of electrons that can effectively be stored either in a capacitor
or in a floating gate is small: if the electrons are too few, current technologies
cannot sense their level correctly. Moreover, in the case of capacitors, a small
capacitor means a small charge and this, in turn, means higher refresh rates;
- each functional element of the memory cell is very small and very thin, thus
the risk of electron leakage is higher.
The semiconductor industry is currently trying to extend the life of these
technologies for as long as possible. Such efforts have many aspects in common
across charge-based memories; the most used approaches are:
- Better materials: this approach permits obtaining better properties while
keeping the cell design unchanged. This is, for example, the case of high-k8
materials and production processes. A similar approach is used in Flash
memories, where the floating-gate conductor is replaced with an insulator
(different from the oxide) able to "trap" electrons inside it: this particular
approach provides better resilience to tunnel-oxide damage, as well as better
isolation against cell-to-cell interference.
8"k" is the dielectric constant. High values permit better insulation and electrical properties [101].
- 3D-Restructuring: this approach is indeed a major modi�cation to the
structure of each memory cell, even if the functional logic of each technology
does not change. 3D-stacked semiconductors are the object of extensive
research e�orts, as this particular approach would permit an important life-
cycle lengthening of both DRAM and Flash technologies. The main ad-
vantage of 3D structures is that memory cells can be stacked up vertically.
Vertical stacking permits both an optimized use of the chip area (density
optimization) and it allows higher distances between memory cells (in order
to avoid cell-to-cell interference) [108]. Currently, Samsung is already pro-
ducing SSDs using 3D vertically-stacked Flash memory cells that use charge
trap technology instead of the classic �oating gate [96]. Referring to DRAM,
3D-restructuring is currently applied into a prototypical technology called
Hybrid Memory Cube [78]. The promises of this particular technology are
high: speed and performances near to those of the CPU, high power e�ciency
and a much better density.
Despite these extensive efforts to delay the retirement of charge-based
technologies, the semiconductor industry nonetheless firmly expects that, sooner
or later, these technologies will become too difficult to produce and enhance.
Following this expectation, the technologies presented next are the object of
substantial research efforts, which aim to finally obtain products that can both
successfully replace current technologies and fully meet the ever higher
expectations of the market in the years to come.
1.3 Technology – the future
The next paragraphs present the technologies that will compete to become the
ones used in the next-generation memory hierarchy. Before delving into the
details of each specific technology, it is worth noting that most of them share
some common features:
- Charge-based approach is being dismissed: instead, the resistive ap-
proach is preferred. Technologies like DRAM and Flash memory use �elec-
trons containers� (capacitors or any material able to retain electrons) to
encode the information bit into the memory cell. However, this approach
has the disadvantages just shown. Research is thus preferring the resistive
approach: the information bit is encoded as a property of a specific switching
material: high resistance or low resistance status. There is therefore no need
to store electrical charge: electrons are used just to check the memory cell
status. This approach permits better performances and better scalability.
- Persistence: each of these new technologies is not volatile; they are able to
remember data when power is o�, as it happens for SSDs or HDDs. Some
of them still cannot guarantee a long retention time, but these are prob-
lems related with the early engineering and development stages: these new
memories are engineered to be persistent.
- These technologies use the word "RAM" in their name: this is indeed a
clue that these technologies are both approaching the speed of RAM
(sub-microsecond speeds) and seem well suited to be implemented as byte
addressable memories, as happens with common DRAM.
- Density is expected to be higher than that of DRAM: some of these
technologies, especially those in prototypical stages of development, still have
problems achieving such a goal, as their cell area in square features is too
high. Generally, however, these new technologies promise a better density
(featuring the very reduced area of 4F^2, see table B.2).
- Endurance is limited: most of these technologies have a limited endurance
with respect to that of current DRAM, but better than Flash, as the most
wearable among them features at least 10^9 write cycles.
- R/W asymmetry: most of these new technologies feature different timings
between read (faster) and write (slower) operations, as happens with Flash
memories.
1.3.1 Prototypical
Technologies in a prototypical development stage are already being commercially
produced even though the technology is not mature. Production volumes are low
and, as a consequence, prices are high. Prototypical products are often used in
niche markets, and usually suffer from the fact that some of their evaluation
features have still not reached the target levels. The research efforts undertaken
to obtain better performance, higher densities and, more generally, a product
ready to be produced in high volumes are usually extensive.
Ferroelectric RAM (still not resistive)
FeRAM, or FRAM, uses ferroelectricity (see section A.2) to store the information
bit into a ferroelectric capacitor, able to remember its bistable polarization status
in time. The capacitor acts as a dipole, whose polarization is changed under
the e�ect of an electrical �eld. One of the two polarization status is logically
associated to a �0� value, whereas the other is given a logical meaning of a �1�.
As a consequence of this switching mechanism, this memory still does not use the
resistive approach. This technology uses the active matrix configuration to access
the capacitor through a transistor: it is a one transistor - one ferroelectric
capacitor (1T-1FC) architecture.
Figure 1.4: Ferroelectric crystal bistable behavior. © Fujitsu Ltd
Similarly to the mechanism used in DRAMs, reading is destructive: to my knowledge,
the literature never describes the read process in detail, beyond remarking on its
destructiveness.
Current issues related to this technology concern scalability and the production
process in use: as noted in table B.2, feature sizes are still at the 180 nm node,
and the cell area of 22F^2 is so large that scaling becomes problematic. Another
problem is cell wearing, since repeated switching degrades the cell's performance
over time (dipole relaxation).
Currently this memory is used in some embedded computing devices.
MRAM and STT-MRAM
This technology, also referred to as Magnetic RAM, uses memory cells built in a
one transistor - one magnetic tunnel junction (1T-1MTJ) architecture (see
section A.2). The MTJ element acts as a resistive switch (like the ones shown
subsequently in ReRAM, PCRAM and FTJ RAM), encoding a bit as a different
resistance status.
Figure 1.5: Magnetic Tunnel Junction. © Everspin Technologies Inc.
The technology used to program MTJ elements differs between the classic MRAM
design and the STT-MRAM design. The classic design uses magnetic fields induced
by currents passing through nanowires [66]. STT-MRAMs instead use the Spin
Transfer Torque effect to change the magnetic polarization of the free layer [4,
119].
The memory cell is read by sensing the resistivity of the MTJ element, detecting
whether current passes: the read operation is thus not destructive, since the
status of the free layer does not change upon reading. Even if some of the ideas
used in MRAM date back to those developed for magnetic core memory in the 40s and
50s [86], MRAM is based on newer research in physics, undertaken from the 1960s
onwards, about what is now called spintronics [63, 139], culminating in 1988 with
the discovery of Giant Magneto-Resistance by Albert Fert and Peter Grünberg [100].
The first patent on MRAM technology dates to 1994 (IBM - US patent no. 5,343,422).
Phase Change RAM (PCRAM � PCM)
During the 50s, Dr. B. T. Kolomiets conducted his research work on chalcogenide
glasses, verifying their ability to change from an amorphous state to a crystalline
state under the e�ect of heating [43]. The crystalline state presents low electrical
resistance and high light re�ectivity, while the amorphous state presents the op-
posite features: high electrical resistance and low light re�ectivity. The ability to
change state as just depicted was subsequently referred as �phase change�. Phase
Change RAM indeed exploits the resistivity di�erence between the two states as-
sumed by a phase change material (the most commonly used compound is GeSbTe,
or GST) to encode the logic "0" and "1" levels. Each memory cell in PCRAM consists
of a chalcogenide layer enclosed between two electrodes. The top electrode is
directly attached to the chalcogenide layer, whereas the bottom electrode is linked
to the phase change layer via a small conductive interconnect surrounded by an
insulator. The conductive interconnect is responsible for switching the chalcogenide
status by means of the Joule effect, functioning as a heater. Each memory cell is
programmed using either a fast pulse of high current to "melt" the crystalline
status into the amorphous one, or a longer pulse of lower current to induce the
growth of crystals, modulating the use of the heater. Cell reading is performed
using a low probe current and sensing its flow, which depends on the cell's
resistivity. As a consequence of this design, reading is not destructive. Phase
Change RAM represents a very promising, yet prototypical, technology. The first
articles about electrical switching of phase change materials date to 1968 [113],
and a first prototype of a phase change memory was presented by Intel in 1970 [94].
However, since it was expensive and power hungry, this technology languished during
the 70s and 80s, until the current phase change design was developed, exploiting
the research efforts made during the 80s and 90s on optical storage: phase change
materials are the key to the writability and re-writability of CDs and DVDs [131].
Even if phase change technology still has to be improved to reach the expected
performance and effective profitability, working prototypes have already been
produced by Micron, Samsung and IBM; phase change memories are also used in some
very high performing PCI Express SSDs [1].
(a) PCM switching mechanics© Am. Chem. Soc.
(b) PCM �mushroom� type© Ovonyx Inc
Figure 1.6: Phase Change memory cell
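To make the programming mechanics more concrete, the fragment below sketches how a controller might drive a PCM cell; the pulse values are invented for illustration and apply_pulse stands in for the analog drive circuitry, so this is only a hedged model of the behavior described above, not the interface of any real device.

    /* Hypothetical pulse parameters, for illustration only: real devices use
     * current and timing values tuned to the specific GST compound.           */
    struct pcm_pulse {
        double current_ma;   /* current driven through the heater              */
        double duration_ns;  /* pulse length                                   */
    };

    /* RESET: a short, high-current pulse melts the GST and quenches it into
     * the amorphous (high resistance) state.                                  */
    static const struct pcm_pulse RESET_PULSE = { 0.60,  50.0 };

    /* SET: a longer, lower-current pulse anneals the GST back to the
     * crystalline (low resistance) state.                                     */
    static const struct pcm_pulse SET_PULSE   = { 0.30, 300.0 };

    /* Stand-in for the analog drive circuitry (not a real API). */
    extern void apply_pulse(int cell, const struct pcm_pulse *p);

    void pcm_write(int cell, int bit)
    {
        /* bit 1 -> crystalline (SET), bit 0 -> amorphous (RESET). */
        apply_pulse(cell, bit ? &SET_PULSE : &RESET_PULSE);
    }

    /* Reading uses a low probe current that cannot alter the phase: the sensed
     * resistance distinguishes the two states, so reads are not destructive.  */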
1.3.2 Emerging
Besides PCRAM, MRAM and FeRAM, new emerging technologies are currently being
developed, the best known among them being ReRAM. These emerging technologies
(briefly described below) represent a real jump in sophistication with respect to
the ones just explained. Obviously, knowledge and know-how are cumulative, and new
technologies are developed only thanks to the efforts made in earlier times;
nonetheless, the quality and depth of the scientific and technical knowledge needed
to reach the goal of commercial (and profitable) products using the technologies
described below is impressive: chemical mastery, quantum physics, nano-ionics,
materials science, nano and sub-nano scale production processes, and so on. Except
for ReRAM, these technologies are still embryonic.
Ferroelectric Memory
ITRS inserts into this emerging memory category two di�erent technologies, both
based on ferroelectricity: Ferroelectric FET technology and Ferroelectric polariza-
tion ReRAM (Ferroelectric Tunnel Junction technology � FTJ).
Ferroelectric FET or FeFETs resemble standard MOSFETs, but use a ferroelectric
layer, instead of an oxide layer, between the gate electrode and the
silicon surface [27]. One polarization state permits current passage between
source and drain electrodes, whereas the other does not. A memory based
on FeFETs would have memory cells very similar to those of Flash technol-
ogy since it would also be a 1T technology. Writing would be achieved by
changing the polarity applied on the control gate, whereas reading would be
sensed probing the current passage between the source and the drain. The
reading process therefore would be not destructive.
Real FeFETs usually employ an insulating layer between the ferroelectric
and the semiconductor in order to achieve better performances. This need
has led to various con�gurations, as MFIS or MFMIS9.
The first attempts to develop FeFET technology were made during the late 50s, and
a first patent using this approach was issued in 195710. However, more than
50 years after that first patent, this technology has proved to be decidedly
non-trivial, and it still suffers from some unaddressed problems, the biggest of
which is data retention loss. New approaches to this technology are
investigating new organic ferroelectric materials.
If improved, this technology would be interesting when applied to a DRAM-
like memory: such a solution would be much more scalable, since it would
not need a capacitor, hence reducing the minimum feature size.
Ferroelectric Polarization ReRAM - FTJ ReRAM uses a memory configuration
similar to that commonly called ReRAM, but the memory cell uses FTJ
technology to encode information bits persistently and to permit
non-destructive reads (in contrast to classic FeRAM technology, see
section A.2). As stated in the 2013 ITRS report on emerging memory devices,
"Although earlier attempts11 date back to 1970, the first demonstrations of
TER came in 2009" [84]. This memory technology is thus at a very early stage
of development, and the literature reflects this. No industry player is yet
producing such a technology, even as prototypes.
9Respectively, Metal-Ferroelectric-Insulator-Semiconductor and Metal-Ferroelectric-Metal-Insulator-Semiconductor.
10US patents 2,791,758, 2,791,759, 2,791,760 and 2,791,761.
11To model tunnel electro-resistance (TER), the mechanism used in FTJs (ed.)
Resistive RAM (ReRAM) � Redox memories
The "resistive" term suggests that this technology, as in the case of MRAM, uses
resistivity to encode data into memory cells: each cell has an RS (Resistive
Switching) element, responsible for the actual storage of data. This element, in
an SLC (single level cell) approach, encodes a "zero" as a non-conducting state
(high resistance status - HRS - RESET status), whereas a "one" is encoded as a
conducting state (low resistance - LRS - SET status). Each memory cell acts as a
building block of bigger memories, usually arranged in grids, as happens for
DRAM.
Figure 1.7: RRAM cell. © John Wiley & Sons
Resistive RAM (or Redox memory, as clas-
si�ed by ITRS) indeed is a generic term rep-
resenting a series of di�erent strategies adopted
to induce the resistance switching of the RS ele-
ment by means of chemical reactions (nanoionic
reduction-oxidation e�ects) [137].
Whatever the speci�c switching process,
each RS element is generally built as a
capacitor-like MIM (Metal-Insulator-Metal)
structure, composed of an insulating or resis-
tive material �I� (usually a thin �lm oxide) sandwiched between two (possibly
di�erent) electron conductors �M�. �M�s are sometimes referred to as �top elec-
trode� (TE) and �bottom electrode� (BE). These RS elements can be electrically
switched between at least two di�erent resistance states, usually after an initial
electroforming cycle, which is required to activate the switching property [145].
Resistive RAM represents a class of very promising, yet emerging, technologies
for the next-generation memory. Expectations are very high, even when compared to
the prototypical technologies: high endurance, long retention, extreme speed and
reliability, very low power consumption, and high scalability with relative ease
of production. Moreover, scientists and researchers claim that a high degree of
further improvement is achievable.
Resistive RAM technologies are tightly linked to the research efforts made on
thin film oxides, especially during the 60s and 70s. The first patent on a memory
technology using a "memory array of cells containing bistable switchable
resistors" dates back to 197312. However, as pointed out in Kim's paper [40],
that technology languished until the late 90s and 2000s, when a newer approach
was proposed and pursued. The current scientific literature about these
technologies well describes the emerging nature of ReRAM: most papers focus on
the "core" research side. Recurring topics are the need to model correctly the
atomic behavior of the MIM compound during the resistive switch, the need to
understand thoroughly the interactions between the materials used in the RS
element, and the need to develop better laboratory tools and techniques to analyze
the resistive switch mechanics precisely. On the other hand, introductory papers
are scarce, and few papers are dedicated to implementors and computer scientists.
Nonetheless, most of the big electronics industry players are currently developing
prototypical memories based on redox memory cells, and some startup companies hold
new intellectual property based on ReRAM projects. Even if the profitability and
the related commercialization of these products seem to be quite far off, reliable
samples have already been produced using current production processes [77].
Before presenting each speci�c switching mechanism, it could be useful to spec-
ify the meaning of the following terms as they are used frequently in subsequent
descriptions:
- �lamentary / non-�lamentary: in this context, �lamentary means that
the change in resistivity into the RS element is achieved creating a �lamen-
tary conductive link between the two conductors (�M�), i.e. the passage of
current is not uniform into the RS element and the most part of the resistive
material continues to act as an insulator. Conversely, non-�lamentary mech-
anisms achieve the resistive switch uniformly into the resistive material (�I�),
i.e. the current passes through the whole volume of the resistive material
[84, section 4.1.2.1];
- unipolar / bipolar: indicates whether the specific mechanism uses one fixed
polarity between the two electrodes, or whether the polarity is inverted
between them in order to switch the cell status. In the unipolar case the
current has to be somehow modulated in order to produce the status switch. In
the bipolar case the mechanism is simpler: a polarity inversion causes the
switch between states [84, section 4.1.2.1].
There are four different approaches to redox memories: each one uses a specific
combination of the alternative features just presented.
ElectroChemical Metallization: mechanism (ECM), sometimes referred as Elec-
trochemical Metallization Bridge, �Conductive Bridge� (CB) or �Programmable
Metallization Cell� (PMC). This technology uses the �lamentary bipolar ap-
proach: one of the two electrodes is electrochemically active, whereas the
other electrode is electrochemically inert. The �I� material is a solid elec-
trolyte, allowing the movement of charged ions towards the electrodes. The
change in resistance depends on the creation of a conductive path between
the electrodes, under the e�ect of an electric �eld among them.
Figure 1.8: ElectroChemical Metallization switching process© 2013 Owner Societies
The following reactions take place under the effect of an electric field with a
(sufficiently) positive potential applied to the active electrode (RESET to SET
transition):
- Oxidation: the material of the active electrode loses electrons and
disperses its ions (M+, cations) into the solid electrolyte
(M → M^z+ + z e^-);
- Migration: the positively charged ions move towards the low potential
electrode under the effect of the high electric field;
- Reduction and electrocrystallization: on the surface of the inert
electrode the reduction process takes place, where electrons from the
electrode react with the arriving ions, forming a filament of the same
metal as the active electrode, growing preferentially in the direction of
the active electrode (M^z+ + z e^- → M).
The memory cell thus retains its SET status until a sufficient voltage of opposite
polarity causes the opposite reactions, leading back to the RESET status.
Such an approach to memory production is currently pursued by NEC (Nanobridge
technology), Crossbar (PMC) and Infineon (CBRAM).
Metal Oxide - Bipolar filamentary: the Valence Change Mechanism (VCM), like ECM,
uses ion movement to achieve the resistive switch, in this case the movement of
oxygen anions. It relies on defects in crystal structures (usually oxygen
vacancies), which are positively charged, and on the ability of anions to move
through such holes in the "I" element.
Referring to the redox feature, in this context reduction refers to the act of
recreating the original crystalline structure by filling a vacancy (usually
acquiring oxygen anions), whereas oxidation refers to the creation of a vacancy
(usually losing oxygen anions). Reduction and oxidation have the effect of
changing the atomic valence of the atoms in the crystal structure where the
change happens, hence the name "Valence Change".
The resistive switch is induced under the effect of an electric field: with one
polarity, it creates a conductive tunnel of accumulated vacancies, whereas the
opposite polarity has the effect of restoring the anions to their place.
Currently, Panasonic and Toshiba are developing ReRAM memories in their
laboratories, and samples have already been demonstrated [84].
Metal Oxide - Unipolar filamentary: the ThermoChemical Mechanism (TCM) represents
another approach used to create a filamentary conductive link between the two
electrodes of the MIM compound. This approach is somewhat similar to the one used
in Phase Change RAM: instead of a different polarity of the electric field, as
happens in VCM and ECM, in TCM a modulation of current and voltage (not of pulse
time, as in PCM) is used to induce the SET-RESET and RESET-SET transitions while
maintaining a fixed polarity.
The "I" layer does not completely prevent the passage of current: when in the
RESET status, the current flow experiences high resistivity; conversely, in the
SET status, the resistivity is low. To obtain a RESET-SET switch, a limited
current under a high-potential electric field is used: the limited current induces
the Joule effect which, in turn, triggers a redox process (similar to that of VCM)
that creates a filamentary breakdown of the oxide ("I"), leading to a conduction
channel between the electrodes and to an immediate decrease in resistivity.
Conversely, to obtain a SET-RESET switch, a high current with low voltage is used:
this current has the effect of "breaking" the conductive link, as if it were a
traditional household fuse. For this reason, TCM is also referred to as a
fuse-antifuse mechanism.
This approach still seems far from maturity: ITRS does not report any big
electronics firm pursuing this technology.
Metal Oxide - Bipolar non-filamentary: the last class of redox-based approaches
uses a non-filamentary strategy, sometimes also referred to as interfacial
switching. The resistance switch is triggered by field-driven redistributions of
oxygen vacancies close to the insulator-metal junctions.
ITRS refers to this technology as the least mature among the ReRAM approaches.
Mott memory
Researchers are investigating the feasibility of memory cells using the Mott
transition effect as the resistance-switching mechanism. Such memory cells could
be configured either as modified FETs (as is the case for FeFETs) or as MIM
compounds (as happens in ReRAMs). Relying on ITRS documentation and on clues found
on the web, as well as on the scientific literature, research in this technology
appears to be at a very early stage and there is no information about produced
prototypes. Research efforts are still concentrated on the chemical and physical
properties of Mott insulators.
Carbon Memory
Some researchers have proposed carbon as a new material with which to build
resistive, non-volatile memory cells. The investigated configurations include
both 2-terminal and 3-terminal memory cells. In this approach, memory cells would
exploit some of the physical and electrical features of carbon allotropes
(diamond, graphite and fullerene), especially those of graphite (graphene and
carbon nanotubes are the most common examples). Some approaches would use the
transition between a diamond-like state (insulating) and a graphite-like state
(conducting) as the switching mechanism. Others would use local modifications in
carbon nanotubes to induce a resistance switch; others again would place an
insulating diamond-like carbon between conductors, as in the case of
electrochemical metallization, to electrically induce a conductive graphite-like
filament. Research on carbon allotropes is more mature in this field than that
behind other emerging memories (such as Mott memory, for example): starting in
1859, when the English chemist Benjamin Collins Brodie discovered the atomic
weight of graphite, knowledge about this material has grown throughout the last
century.
Carbon Memory is another memory technology at an embryonic state of development.
Macromolecular Memory
Macromolecular technologies focus, as in the case of Redox memories, on
Metal-Insulator-Metal compounds. The material between the two electrodes is a
polymer layer, which must exhibit a resistive switching ability. The term
"macromolecular" is however quite general, as ITRS reports that many polymers are
currently being investigated, and some have shown different behaviors that could
be used to build new memory technologies: some have ferroelectric features,
whereas others feature the formation of metallic filaments. However, the status
of these research efforts is embryonic.
Molecular Memory
Molecular memory technologies represent another research field that is still at a
very early stage. Such a technology would be based on single molecules or on small
clusters of them. These molecules would be used in resistive switching elements to
store the information bit. As in Redox memories, current would be used to switch
the resistance of the molecule, and reading would not be destructive. The promises
of this technology are very high, since theoretically each memory cell could reach
the dimensions of a single molecule: the first studies on molecular memory report
exceptional power efficiency and high switching speed. ITRS admits however that
many research efforts are still needed in order to gain an adequate understanding
of this technology.
Other memories
Besides those officially included in the ITRS taxonomy, other technologies are
being investigated by researchers, laboratories and industry. Among these other
technologies, Racetrack [114], Millipede [134] and Nanocrystal [18] memories are
worth mentioning here.
1.3.3 From the memory cell to memories
Memory cells, such as those that have been presented so far, are the building
blocks of actual memories. Each memory cell can contain at least one single bit of
information, and MLC technologies permit the encoding of more bits, usually two or
three. Memory cells are then assembled together to provide bytes, cache lines,
pages, and so on.
Since byte or block addressability depends on the way in which cells are linked
together, as is the case for Flash cells, the engineers' decisions about this
topic are pivotal, as they influence the way in which those technologies will be
used: block addressable memories fit natively into I/O devices, whereas byte
addressable ones are well suited to be used both on the memory bus and in I/O
devices. Regarding the technologies presented so far as prototypical and emerging,
it seems that engineers expect to connect the cells in such a way as to allow byte
addressability. In fact, as subsequently explained, one of the most widely held
expectations about these memories is that they will be attached to the memory bus,
which requires byte addressability.
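The practical difference between the two access granularities can be sketched as follows; block_read, block_write and the 4 KiB sector size are hypothetical stand-ins for a generic block driver, and the persistent-memory path deliberately omits the cache flushing and ordering issues discussed later.

    #include <stdint.h>
    #include <string.h>

    #define SECTOR 4096

    /* Hypothetical driver hooks standing in for a block device. */
    extern void block_read (uint64_t lba, void *buf);
    extern void block_write(uint64_t lba, const void *buf);

    /* Block-addressable path: 8 bytes changed, 8192 bytes moved. */
    void update_counter_block(uint64_t lba, size_t offset, uint64_t value)
    {
        uint8_t buf[SECTOR];
        block_read(lba, buf);                       /* read-modify-write cycle  */
        memcpy(buf + offset, &value, sizeof value);
        block_write(lba, buf);
    }

    /* Byte-addressable path: the persistent region is mapped and the update
     * is an ordinary store (durability ordering is omitted in this sketch). */
    void update_counter_pm(uint64_t *mapped_counter, uint64_t value)
    {
        *mapped_counter = value;
    }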
One last remark about these technologies should concern their effective
performance. Figures about performance are not always easy to find. Moreover,
semiconductor producers are sometimes reluctant to give extensive information
about their products. Indeed, they may consider it counterproductive to disclose
such data, as the figures could reveal issues that they prefer to hide and
eventually manage using software layers (through firmware or FTLs, for example).
Despite these remarks, table B.2 reports some figures about current, prototypical
and emerging technologies that can be used for a first comparison.
Following those figures, I would just underline that FeRAM and STT-MRAM seem to
suffer from scalability problems, as their cell area is excessively high (the goal
for semiconductor producers is 4F^2). Moreover, FeRAM appears to be produced with
very dated production processes (180 nm), which could be a clue that this
technology is languishing. Among the prototypical ones, Phase Change memories seem
to be the only ones promising high densities.
Performing a rough comparison between the prototypical and the emerging
technologies, it is apparent that the promises of the emerging ones are much
higher: better scalability, density, performance and endurance.
While these products are still not in production or, at best, are produced only in
low volumes, researchers are nonetheless wondering extensively how these new
memories would influence the operation and design of operating systems; to make
their analysis possible, they have made the following assumptions: these memories
will be persistent, byte addressable, denser than DRAM and faster than Flash
memories. The impact of persistent memories on operating systems is therefore the
topic of the next chapter.
Chapter 2
Operating Systems
Until now I have focused on the technical features of persistent memories,
relating somehow to just the first part of the title. From now on, the focus will
shift to operating systems, and I will try to present their design issues related
to persistent memories as well as possible.
All the examples made hereafter follow the UNIX paradigm and, specifically, are
based on the Linux operating system: even though the same principles and
approaches are similarly used in other families of operating systems (e.g.
Windows), a specific paradigm is nonetheless necessary to maintain some
concreteness; thanks to the open-source nature of Linux, access to its internals
is easier, and I will take advantage of it.
Some preliminary observations follow about the models that could be influenced, or
even changed, under the pressure of the new memory technologies. Afterwards, since
persistent memories can be used either in a fast SSD or directly attached to the
memory bus, each of these approaches is presented, starting from the former, which
is indeed the more conservative.
Every operating system can be conceived both as an extended machine and as a
resource manager [130]. In the former perspective, an operating system is
responsible for hiding from the user all the complex details related to the
hardware by providing an "abstract machine" simpler to use and to program, whereas
in the latter the operating system is responsible for the management of all the
resources available on a specific computing system. In either viewpoint, the
operating system is, most of the time, a software product acting as a glue between
the hardware and the programs (and, finally, the user).
The relation between hardware and software is somewhat porous: even if each of
them represents a specific research domain, they are nonetheless inseparably
related. It can easily happen that advances (or different approaches) in software
engineering urge changes in hardware design, and vice versa. However, since
operating systems are conceived primarily to permit the most profitable use of
hardware resources, it is not only legitimate but also rational to question
whether new hardware technologies have the potential to influence software, and to
what extent. Scientists, researchers and developers claim that the new
technologies just presented will urge deep changes in operating system
engineering.
2.1 Reference models
Each science uses models, abstracting from speci�c details of problem instances in
order to describe synthetically and generically the problems themselves. Models
are indeed valuable: they permit an elegant representation and resolution of prob-
lems, acting as a useful frame within which scientists, engineers and developers can
build real solutions. Changes in the founding models usually trigger changes in a
sort of chain-reaction: it happens in mathematics, physics, and computer science
is no exception. In particular, operating systems have some fundamental mod-
els used as a reference. Since these new memories have some features that main
memory never had, researchers are thus trying to understand to what extent such
features will urge changes into current operating systems models.
2.1.1 The Von Neumann machine
Von Neumann's model is probably one of the most important models used in computer
science: it describes how computations are executed by computers. The model, shown
in figure 2.1, is in effect quite simple: memory (the "memory"), along with a
processing unit (the "control" = Control Unit + ALU) and an input/output function
(the "I/O"), all connected by a single bus, composes a complete computing system.
Instructions are fetched from memory, then decoded by the control unit, operands
are retrieved from memory, execution is performed in collaboration with the ALU,
and results are finally stored back into memory.
Figure 2.1: The Von Neumann model
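The cycle can be made concrete with a minimal sketch of an interpreter for a toy accumulator machine; the four-instruction encoding is invented purely for illustration, but the loop follows exactly the fetch, decode and execute steps just described, with instructions and data sharing the same memory.

    #include <stdint.h>

    enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };   /* toy instruction set */

    void run(uint16_t mem[])
    {
        uint16_t pc = 0, acc = 0;
        for (;;) {
            uint16_t insn = mem[pc++];             /* fetch from "the memory"  */
            uint16_t op   = insn >> 12;            /* decode in "the control"  */
            uint16_t addr = insn & 0x0FFF;
            switch (op) {                          /* execute (ALU + memory)   */
            case OP_LOAD:  acc = mem[addr];  break;
            case OP_ADD:   acc += mem[addr]; break;
            case OP_STORE: mem[addr] = acc;  break;
            case OP_HALT:  return;
            }
        }
    }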
While in the past the real implementations of computing devices were very close to
the model (the hardware design of the PDP-8 minicomputer, for example, was very
close to Von Neumann's model), today real computer systems no longer resemble it so
closely, and current architectures are much more complex: a standard workstation
can use many input/output devices attached to many different buses, can use several
CPUs, a single CPU can have many cores, and so on. Moreover, computing systems have
evolved over time to offer an ever increasing set of features: multitasking,
multithreading, networking, parallel computation, virtualization, and many others.
However, despite this complexity, the founding models are still the same as when
computing started to become a reality: CPUs still perform computations using a
fetch-decode-execute cycle based on the Von Neumann machine model. In this model,
each of the functional units (control, memory, I/O) has a specific role and
specific tasks, not shared with the others: as long as the execution model does not
change and each functional unit remains distinct from the others, the model should
hold fast. Over time, for performance reasons, some portions of the "memory" have
been brought close to the "control" through the use of L1, L2 and L3 caches;
nonetheless, even if closer, "control" and "memory" still remain logically
separated. As briefly outlined in the next paragraph, a different conclusion would
apply in the case that "control" merged with "memory". Referring to persistent
memories, the most challenging hypotheses of use occur when they are placed on the
memory bus (see section 2.3). However, such a use is almost identical to that of
common DRAM. Therefore, if engineered as such, faster, denser, even persistent
memories would not change the basics of the model.
Studies about memristance and memristors could have the potential to seriously
strain the Von Neumann model (see section A.2). Memristor-like memory cells could
be used to build reconfigurable processors or logic functional units: this would
result in a merge between "the memory" and "the control" [15, 140]. Some
researchers are pursuing the use of memristive memories to build neuromorphic
chips and their use in neural network studies [126]. These are however, for now,
futuristic scenarios. With the intention of remaining concrete, this work will not
investigate further the aspects related to potential changes to the Von Neumann
model, taking for granted that it will stay alive for the years to come. Instead,
the Von Neumann model will be used as a reference model in the background.
2.1.2 The "memory" and the memory hierarchy
It could be useful to narrow the focus to the "memory" side: after all, the new
memory technologies natively relate to it. Although not properly a model, I wish
to recall here the memory hierarchy, since it is indeed a neat and synthetic
representation of how current computing systems implement memory. When talking
about the memory hierarchy, the word "memory" is used quite differently from the
meaning it has in the Von Neumann model, hence the quotes for the latter. In this
scope, "memory" generically represents a place where computing devices can store
instructions and data, either temporarily or persistently. Consequently, the
memory hierarchy represents both the "memory" (in its upper part) and a portion of
the I/O (in its lower part) of the Von Neumann model.
The memory hierarchy represents at a glance, as already stated, the fundamental
relationship between speed and density; however, some other relevant information
is hidden inside it. In particular, information about the speed of each level is
not apparent, nor is the point where volatility ceases and persistence starts.
Moreover, a dynamic view of how the hierarchy has changed over time could be
useful.
New solid state memory technologies, as those presented above, are ideally con-
tinuing an innovation path started with the upcoming of Flash memories. Before
them, the con�guration of the memory hierarchy had remained almost the same
for about thirty years (50s � 80s): it was built of registers, caches, RAM, hard
disks, tapes (or punched cards). Although the performances had changed in time,
such changes were in absolute terms, while the relative values and the structure
itself had remained almost unchanged. Figure 2.2 shows the memory hierarchy
again. To carry more information, the following hints have been added:
- access time has been added on the right (as power of ten negative exponen-
tials of seconds);
- a thick gray margin represents the border between volatility and persistence;
- The border between memory bound and I/O bound devices has been pinned
up on the left;
- a dashed line represents both:
� the border between symmetrical and asymmetrical read/write timings;
� the border between (near to) in�nite and limited endurance.
The following further facts should be stressed about figure 2.2: firstly, the six
orders of magnitude gap between hard disks and RAM is apparent; secondly, the
thick gray margin and the I/O border coincide: fast memory is volatile, whereas
slow memory is persistent. Fast memory is accessed with load and store
instructions from the CPU, whereas slow memory needs complex access mechanisms
(I/O). Finally, fast memories have symmetrical performance and suffer no wearing,
whereas slow memories suffer from limited endurance and from asymmetrical
performance.
Figure 2.2: The memory hierarchy with hints
The memory hierarchy also carries within it some clues about the problems that
arise at every level: in a perfect world, the memory hierarchy would be infinitely
large, but flat. From an operating system viewpoint, the perfect memory would
offer:
- CPU speed;
- persistence;
- native byte addressability;
- symmetric read and write performance;
- technological homogeneity;
- infinite endurance.
Unfortunately, real memories cannot have all these "good" features: each layer of
the memory hierarchy offers only a subset of them. As a result, each level of the
memory hierarchy also depicts the problems that an operating system has to manage
in order to use it effectively.
The next paragraphs attempt a brief summary of the main techniques adopted over
time by operating system designers to overcome the limits of each layer of the
hierarchy.
Speed issues
CPU speed is a luxury enjoyed only by registers. Descending from layer to layer,
access time increases exponentially. This means that each access to a memory of a
given level has a cost in time. A well-engineered operating system will try to
minimize that cost and to maximize performance.
Unfortunately, not only does the cost of memory accesses increase at each level
downwards, but historically the gap between the speed of the "memory" and that of
"I/O"-driven memories, such as hard disks, tapes, punched cards and CDs, has
always been very large; today it is milliseconds vs nanoseconds, a delta of six
orders of magnitude. This fact has always been a problematic limit.
To circumvent this limit, developers usually adopt a large set of strategies;
among others, caching and interrupts will be analyzed.
Caching Every access to memories slower than RAM has a definite cost: in order to
minimize it, it is better to store as much data as possible in RAM. This approach
would be wasteful if used indiscriminately: the faster the memory, the lower its
density, and the greater the need to use it optimally. By exploiting spatial and
temporal locality, however, operating systems can use various levels of caches
efficiently, thus permitting the processor to work on data very fast: this way,
both memory operations and very slow I/O operations benefit from an important
speed boost.
This approach is of paramount importance for gaining performance: it is therefore
used in countless software products (operating systems, database engines,
applications, and so on) and hardware devices (routers, switches, hard disks,
SSDs, GPUs, and so on). Moreover, this technique is fundamental in big data
centers to offer fast performance, as is the case for Facebook and its use of
server clusters running Memcached1 to serve requests from web servers (front-end)
faster, interposing between them and the databases (back-end) [58]. Another famous
example is Redis2, used in the cloud services offered by Amazon [62]. The use of
caching however has its own costs:
- software and hardware are more complex: code for cache management and
accounting, or chip regions devoted to cache management;
- data is copied from its original location and thus more memory is consumed;
- data modified in caches has to be, sooner or later, written back to the original
location;
- multiple caches need to comply with cache coherence rules.
1www.memcached.org.
The benefit experienced on systems that use caching is easily calculated with the
formula recalled in the famous article by Wulf and McKee [143], where the average
access time is calculated as follows:
T_avg = hc + (1 − h)M
where h is the hit rate, (1 − h) is the miss rate, c is the time needed to access
the cache and M is the time taken to access memory. If needed, this simple formula
can be extended:
T_avg = xc + yM + zH
where x is the hit rate, y is the probability of a memory access, z is the
probability of an HDD access, while once again c is the time needed to access the
cache, M is the time taken to access memory, and H is the time taken to access a
given I/O device (x + y + z = 1).
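Plugging some purely illustrative access times into the extended formula makes its point evident: even a tiny fraction of accesses that falls through to a disk dominates the average.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative access times, in nanoseconds (assumed values). */
        double c = 10.0;         /* CPU cache hit          */
        double M = 100.0;        /* DRAM access            */
        double H = 5e6;          /* HDD access (~5 ms)     */

        double x = 0.90, y = 0.099, z = 0.001;   /* x + y + z = 1 */

        printf("T_avg = %.1f ns\n", x * c + y * M + z * H);
        /* 9 + 9.9 + 5000 ns: the 0.1% of accesses that reach the disk
         * contributes almost the entire average access time.           */
        return 0;
    }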
As just mentioned, however, caching has a cost in complexity. While hardware
caching (as is the case with the L1, L2 and L3 CPU caches) has a cost primarily
expressed in hardware complexity (a higher price), software caching (such as that
used, for example, in the Linux page cache) makes the software more complex. This
added complexity is accountable for some extra work usually done by the operating
system3, hence measurable in cycle times and, consequently, in some additional
time (latency) and energy (power). The advantage of caching, compared to its cost,
is however tremendous in the case of standard HDDs: in a standard off-the-shelf
Linux storage stack, software accounts for just 0.27% of the I/O operational
latency (0.27% x 1 ms = 2.7 µs) [129]. The software portion responsible for cache
management is only a part of the entire I/O stack: other parts deal with system
call management, device drivers, memory management, and so on. Even supposing that
the cost of the entire path traversal were entirely due to caching, it would still
be tremendously worthwhile: microseconds vs milliseconds.
2redis.io.
3Database systems usually perform caching autonomously, avoiding operating system intervention.
Taking the opportunity offered by caching, the preceding observation can be
generalized to software at large: software layers built upon slow I/O devices
generally have little impact on performance. As an example, surveys of LVM in
Linux highlight the fact that it adds only 0.03% of software latency and 0.04% of
energy consumption in the case of a disk-based storage stack.
Interrupts, or asynchronous execution: interrupts permit optimizing CPU usage by
suspending processes that have requested slow operations. The wake-up of a
suspended process is triggered by the hardware emitting an interrupt, thereby
signaling to the operating system that the requested operation has finished. This
approach does not technically avoid the slowness of I/O operations, but it takes
it into account, hence permitting the whole computing system to be used much more
efficiently.
This technique is fundamental: almost every computer uses it. However, it also has
a cost in complexity. The operating system usually has to perform at least one
context switch to suspend the process, has to start the I/O and has to schedule
another process. This sequence, here highly simplified, is complex and time (and
energy) consuming: many sources report its cost as being about 6 µs. Once again,
this value is only a small percentage of the cost of waiting for an I/O request
(to a HDD) to complete: microseconds vs milliseconds.
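The same arithmetic shows why this trade-off is about to change: a ~6 µs suspend/resume cost is negligible against a millisecond-class HDD request, but it becomes dominant against a hypothetical persistent-memory device answering in a few microseconds. The figures below are assumptions used only to illustrate the proportion.

    #include <stdio.h>

    int main(void)
    {
        double switch_us = 6.0;      /* context switch + interrupt handling    */
        double hdd_us    = 5000.0;   /* typical HDD request                     */
        double pmem_us   = 5.0;      /* hypothetical fast persistent device     */

        printf("HDD : overhead %.2f%% of the request\n",
               100.0 * switch_us / (hdd_us + switch_us));   /* ~0.12% */
        printf("PMEM: overhead %.2f%% of the request\n",
               100.0 * switch_us / (pmem_us + switch_us));  /* ~55%   */
        return 0;
    }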
Hardware, data, failure models
I will now focus on some observations regarding the models derived from the memory
hierarchy: while some of these considerations might seem obvious, the aim of this
deepening is to bring out which assumptions are implicitly taken for granted in
the design of current operating systems. Moreover, some of the following
distinctions could appear useless if considered only in relation to the classical
memory hierarchy: practically every operating system uses the same models, since
the memory hierarchy on which they are founded is the same. Conversely, I find
such distinctions valuable from a change perspective: the arrival of persistence
in the higher layers of the hierarchy offers many degrees of exploitation, not
just one. Some classification tools can thus help the analysis.
Each operating system, at least implicitly, uses a "hardware model" and a "data model": the former specifies (generically) the main functional units that it can manage, whereas the latter specifies a series of choices about how data is managed. In particular, going a step further, the data model describes, among other things, both the specific design choices related to volatility and those related to persistence. These last design choices can be referred to as the "persistence model": the part of the data model related specifically to persistence and its management.
Another set of choices, transversally related to all of the previous models (hardware, data, persistence and volatility), concerns failures. Data inside memories is subject to a long series of potential threats: power losses, hardware failures, electrical disturbances, programming errors, memory leaks, crashes, unauthorized accesses, and so on. These problems are well known to operating system designers, who take countermeasures and decide which classes of problems are managed and which are instead ignored: these design choices can be described as the "failure model".
The current data model and the current failure model are deeply based on the properties extrapolated from the classical memory hierarchy, which have been taken for granted for many decades and still are: developers have always engineered operating systems accordingly.
The current data model is quite simple: persistence is delegated to I/O devices, whereas registers, caches and RAM are volatile. Data is stored on hard disks and SSDs in files located in file systems. Speaking rather generally, the "classical" failure model, while guaranteeing security both in memory and in persistent devices, focuses on safety in memory and on consistency in persistent devices. Safety is preferred in memory as a consequence of its speed and its volatility: the goal is to set policies that avoid corruption, while accepting the risk that such events may happen. Consistency is instead needed in persistent devices as a consequence of persistence itself: consistency permits the effective and correct preservation of data in time (errors reaching persistent memories become, precisely, persistent errors). Moreover, the slow speed and the complexity of I/O operations in current persistent memories (hard disks and SSDs) further exacerbate the need for consistency: the slowness increases the likelihood of a power failure during an I/O operation, so it is important to design I/O operations to permit data survival even after such events. Strategies such as file system checks, journaling, logging and transactional semantics are all designed to minimize problems on data in persistent memories after a power failure event.
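As a minimal illustration of the idea behind journaling (a deliberately simplified sketch, not the mechanism of any specific file system: the record layout and the single-slot journal are assumptions made here only for exposition), an update is first described in a log and made durable, and only then applied in place, so that after a power failure the log tells whether the update must be replayed or discarded:

#include <string.h>
#include <unistd.h>

/* Hypothetical single-slot journal: the record describes the intended write. */
struct journal_record {
    long  target_offset;     /* where the data will eventually be written */
    long  length;            /* how many bytes                            */
    char  payload[4096];     /* copy of the new data                      */
    int   committed;         /* set only after the record is durable      */
};

/* Apply one update through the journal file. */
int journaled_write(int journal_fd, int data_fd,
                    long offset, const void *buf, long len)
{
    struct journal_record rec = { .target_offset = offset, .length = len };
    memcpy(rec.payload, buf, (size_t)len);

    /* 1. Write the intent record and force it to stable storage. */
    if (pwrite(journal_fd, &rec, sizeof rec, 0) != (ssize_t)sizeof rec) return -1;
    if (fsync(journal_fd) != 0) return -1;

    /* 2. Mark the record as committed, again durably. */
    rec.committed = 1;
    if (pwrite(journal_fd, &rec, sizeof rec, 0) != (ssize_t)sizeof rec) return -1;
    if (fsync(journal_fd) != 0) return -1;

    /* 3. Only now update the data in place; if power fails here,
     *    recovery replays the committed record found in the journal. */
    if (pwrite(data_fd, buf, (size_t)len, offset) != (ssize_t)len) return -1;
    if (fsync(data_fd) != 0) return -1;

    /* 4. Finally the journal record can be invalidated (e.g. overwritten). */
    return 0;
}

The doubled writes and the repeated fsync() calls are exactly the kind of cost that, as discussed later, becomes questionable once the persistent medium itself is as fast as memory.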
Continuing to refer to failures in computing devices, there is a substantial difference between errors in memory and those in I/O devices: in memory the potential sources of errors can be many, whereas in I/O devices errors are almost always caused by power failures and hardware faults, not by software. As noted in Chen's article [19], memory has always been perceived as an unreliable place to store data. Firstly, this is due to its volatility, but secondly to the ease of access and modification of its content: people know that operating system crashes can easily corrupt memory. On the other hand, I/O-driven memory devices have always been perceived as reliable places to store data, and not only because of their persistent behavior: since the I/O stack is slow and complex, it is unlikely that a faulting condition can successfully perform a correct I/O operation with wrong data. Moreover, precisely because I/O operations are slow and complex, it is easy to add additional safety (in software) by means of transactional semantics or other similar techniques. These observations still hold today: persistence has always been thought of as reliable, whereas the opposite happens with volatility. Unfortunately, if persistence reaches the memory bus, this property no longer holds: this aspect too should be taken into account or, at least, acknowledged.
2.1.3 A dynamic view in time
All the aspects just outlined refer to the "classic" memory hierarchy model: this model, however, started to change with the advent of Flash memories: a level was added, thus reducing part of the big gap between slow HDDs and fast RAM. Such an "insertion" led to the current memory hierarchy.
The point is now to imagine the future configuration of the memory hierarchy, using as clues the promises of the new technologies presented in the first part. Speaking rather generically, the new technologies promise to be:

- Faster than Flash, although slower than RAM or, eventually, as fast as RAM. The speed would in any case be closer to that of RAM than to that of Flash: the order of magnitude is in the tens of nanoseconds.
- Denser than RAM.
- Persistent.
- Longer-lasting than Flash: the endurance is better than that of Flash, but worse than that of RAM⁴.
- Natively byte addressable.
- Suffering from both read/write asymmetries and cell wearing.
Trying to imagine a next-generation memory hierarchy, these memories would naturally sit between RAM and SSDs. The promise of higher density is coherent with the pyramid logic: the hypothesis of a taller pyramid is therefore legitimate.
Before trying to sketch a next-generation memory hierarchy, some other con-
siderations are needed:
- about the future of RAM memory, caches and registers;
- about the use of byte addressing on the memory bus or block addressing on
other paths.
⁴ Phase Change technology, the one suffering most from cell wearing, has an endurance four orders of magnitude better (10⁹ cycles) than that of Flash (10⁵ cycles).
Referring to RAM, the Hybrid Memory Cube technology has been described as a viable enhancement of current DRAM technology: it is thus conceivable that RAM too shifts "up" in the hierarchy, getting closer to caches and registers. However, these technologies are all volatile. It could be theoretically feasible to build registers and caches with FeFETs, thus transforming them into at least semi-persistent memories (FeFET transistors have been demonstrated to retain their state only for some days) [48]; this approach represents, however, a scenario far in the future, considered only a very few times in the literature [136]. The models presented in the next paragraphs will thus continue to take for granted the volatility of the higher layers of the memory hierarchy.
Regarding the addressing technique, the new non-volatile memories fit beautifully into the byte-addressing schema, so their use on the memory bus seems the most natural choice. Conversely, Flash memories integrate natively into the block-addressing schema, at least as far as NAND Flash is concerned. Even if it is feasible to adapt a block-native memory to a byte-addressing schema, doing so is nonetheless intricate [59]. The opposite is simpler: byte-addressable memories can be adapted to be block addressed at the cost of some added hardware complexity⁵.

⁵ This approach is the one used with Moneta [16] and explained in section 2.2.2.
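Purely to illustrate how natural that adaptation is (the fixed geometry and the function names below are assumptions of this sketch, not a description of any real controller), a block-style interface over a byte-addressable part can amount to little more than a bounds check and a memory copy:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096u
#define NUM_BLOCKS  65536u                 /* 256 MiB of NVM, an assumption */

/* The byte-addressable non-volatile part, as the controller sees it. */
static uint8_t nvm[(uint64_t)NUM_BLOCKS * BLOCK_SIZE];

/* Block-style interface exposed to the rest of the system. */
int nvm_read_block(uint32_t block, void *buf)
{
    if (block >= NUM_BLOCKS) return -1;
    memcpy(buf, &nvm[(uint64_t)block * BLOCK_SIZE], BLOCK_SIZE);
    return 0;
}

int nvm_write_block(uint32_t block, const void *buf)
{
    if (block >= NUM_BLOCKS) return -1;
    memcpy(&nvm[(uint64_t)block * BLOCK_SIZE], buf, BLOCK_SIZE);
    return 0;
}

Adapting a block-native device to byte addressing requires, instead, read-modify-write cycles and internal buffering, which is why the opposite direction is the intricate one.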
The question of whether to use NVMs as slower RAM or as faster SSDs is thus legitimate. The former approach will be referred to as Storage Class Memory: non-volatile memories placed on the memory bus.
All that said, the next-generation memory hierarchy might appear as in figure 2.3.
2.1.4 Viable architectures
This representation depicts at a glance all the possibilities that engineers might have in the future to build real computer systems: not all of these layers are indispensable. For example, the smallest devices would have no RAM and no HMC, but only a few registers, a tiny cache, and a very low-power non-volatile memory on the memory bus (to be used both as a RAM replacement and as storage); such a configuration could permit engineers to build battery-less devices with advanced memory capabilities [39]. Another implementation might use the new memories just to build a faster SSD, maintaining all the other components of a classic memory hierarchy. Yet another implementation would use both HMC memory and Storage Class Memory to offer a "dual mode" fast-volatile and slow-persistent hybrid memory.

Figure 2.3: A new memory hierarchy
The next paragraphs will present the main models of usage of the new memories considered in the literature. Researchers are studying and trying to model the effects of the coming of persistent memories both on the I/O side and on the memory bus side: the models will follow either the former or the latter approach. Each scenario has the potential to greatly improve current computing performance, but there are also important issues, as will soon be explained. In particular, even the use "as is" of either:
- a hypothetical SSD featuring a speed close to that of RAM;
- a hypothetical persistent and dense DIMM on the memory bus;
on a system running an off-the-shelf operating system would be problematic.
It has to be stressed that operating systems are developed to be well-balanced systems resting on certain assumptions made by designers and developers: one of the main assumptions made by the designers of modern operating systems is that the memory hierarchy is configured like the classical pyramid, with its consequences, i.e. the data model and the failure model. Using a metaphor, operating systems behave like a weight-lever-fulcrum mechanical system, which is in an equilibrium state when the assumptions hold. Compliance with the assumptions ensures that the fulcrum stays at the right point; non-compliance results in a fulcrum shift and, consequently, in a loss of equilibrium. Still following the metaphor, the efforts made by researchers on NVM-aware operating systems are similar to the re-equilibration of a mechanical system that has lost its balance, by modifying the weights placed on both sides of the lever. Firstly, the easier approach will be analyzed, i.e. the use of the new non-volatile memories as bricks to build a very fast SSD. Afterwards, the reader will be introduced to the various Storage Class Memory approaches proposed either by developers or by researchers.
Before delving into the specificities of each approach, it is useful to note here the inversely proportional relation that arises (in this context) between the ease of setting up a test environment and the amount of effort required to adopt changes in software. Ironically, whereas a fast SSD is tougher to engineer, develop, prototype and test [64], it is much simpler (although surely not trivial) to conceive and model the software changes needed to drive it conveniently. On the other side, as the reader will see later, in an SCM context it is much easier to set up a test environment (for example, non-volatile memories can be "emulated" using just normal DRAM); despite this ease of testing, it is much more complex to develop a complete and efficient solution to the challenges raised.
2.2 Fast SSDs
This approach is the more conservative one, since the only change in the memory hierarchy would be the presence of a new I/O device, running faster than common SSDs. Such a solution would influence neither the standard data model nor the standard failure model, as the only anomaly would be its speed. This approach, moreover, would be in continuity with the path started with SSDs⁶.
The availability of new solid-state memory technologies such as Phase Change RAM would permit manufacturers to build and sell SSDs featuring much higher speeds than those of current NAND Flash-based SSDs. The speed of PCM is about 100 ns in write mode, whereas in read mode it is about 12 ns. This is 50 times worse than DRAM when writing and 6 times worse when reading. Despite the speed decrease in comparison with DRAM, these memories would still be very fast with respect to common NAND Flash memories (∼100 µs, see table B.2).
2.2.1 Preliminary design choices
Before delving into the operating system issues related to faster-than-Flash SSDs, it is worthwhile to linger on some hardware issues, such as:
- the I/O bus;
- the SSD choice vs an MTD-like choice.
These aspects and the related choices establish a sort of framework that operating systems must take into account, thus influencing their internal design.
The I/O bus
Since SSDs, like HDDs, use an I/O bus to transfer data, engineers will have to make some choices about the bus used in those products. The speed of these new technologies justifies the concern about whether the bus is able to sustain the performance of the SSD. Driver design will subsequently follow the choices made by the engineers.
Table B.4 shows some figures about the data transfer speed of some of the most important buses; the first two are I/O buses, whereas the last ones are memory buses (it can be observed that there is a gap of one to two orders of magnitude in data transfer speed between the two bus classes).
⁶ This approach is convenient also as a learning tool: in the effort to build, with a new technology, a device that is otherwise quite common, the focus can be fixed on gaining an adequate know-how about the peculiarities of those new technologies.
Current alternatives for the I/O bus are SATA and PCI Express. In order to evaluate the two alternatives, figures about hardware features are fundamental, but they are not the only factors that must be taken into account. Other factors influencing the choice are:
- protocol overhead: for example, 8b/10b encodings are much less efficient than 128b/130b ones⁷;
- potential for further technical improvements;
- scalability;
- ability to adapt to virtualization schemas;
- ability to adapt to multi-core and multi-processor requests;
- quality of the I/O stack and the potential to improve it: hardware features determine how device drivers work.

⁷ XXb/YYb, where XX is the payload and YY is the transfer size (XX ≤ YY).
A well-built bus will permit the development of "good" drivers, whereas a problematic bus will force developers to bypass problems in software, thus raising the software complexity.
Speaking for a moment only from a hardware standpoint, SATA was conceived to be used with standard HDDs, as an improvement over standard PATA: this fact still influences its behavior. In comparison with HDDs (given a very low 1 ms access time in both read and write), SATA can theoretically execute a 4K transfer in about 6.83 µs, i.e. 146 times faster than the hard disk can service the same amount of data (table B.5). Such a difference makes SATA appear as an infinitely fast channel for transporting data to the HDD. The transition from a SATA HDD to a well-performing SATA SSD presents different proportions, though: some SSDs [96] offer 550 MB/s sequential read speed and 520 MB/s sequential write speed, thus getting very close to the theoretical limit of 600 MB/s. Supposing that 4K of data arriving at the SSD could just be written in bulk in one write cycle (about 0.1 ms, or 100 µs), this time would be only 14-15 times slower than the transfer time: the proportion is very different from that of HDDs. These observations alone would justify considering SATA technology as not suitable for SSDs faster than Flash. The figure of 0.9 times presented in table B.4 confirms the same hypothesis. If SATA were used as the bus for a PCM SSD, it would perform well only in the case of 4K writes performed byte by byte; in every other condition it would perform near its limit, whereas in the case of 4K reads performed in groups of 64 bytes each, SATA would behave as a bottleneck. These figures are deduced from the theoretical limits of the buses, but implementations are sometimes slower, and this would aggravate the problem. Finally, since the new memory technologies are just that, new, there is a high margin for improvement in their performance: a bus used at its limit from the beginning would thus be expected to absorb and waste every technological improvement.
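For reference, the figures just quoted follow directly from the bus speed (the rounding here is mine): at the SATA theoretical limit of 600 MB/s cited above, a 4 KB transfer takes 4096 B / (600 × 10⁶ B/s) ≈ 6.83 µs; against a 1 ms HDD access this gives 1 ms / 6.83 µs ≈ 146, while against a hypothetical 100 µs bulk PCM write it gives 100 µs / 6.83 µs ≈ 14.6, the "14-15 times" mentioned above.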
Even if these observations are quite rough, they follow the path already taken by scientists, researchers, storage manufacturers and technicians, who reached the same conclusions: SATA is being abandoned in favor of PCI Express as the bus for fast SSDs [104, 91]. The motivations for this choice are rooted not only in current hardware features but also in the other factors cited previously: PCI Express has higher speed and lower overhead, is scalable, has an appealing road-map towards future improvements, is usable efficiently by virtualized environments and by multi-core and multi-processor systems, and so on. Engineers made this choice focusing on current NAND Flash technologies: the scenario of faster memory technologies is taken into account, but still as being quite far in time. This fact underlines that they evaluated SATA to be obsolete even for NAND Flash. New SSDs using PCI Express have already been released on the market, and this trend is expected to increase steadily in the next years (examples are the Fusion-io SSDs, the Apple SSD in the MacBook Pro, and the Plextor M6e). It is thus likely that next-generation PCM SSDs will appear as PCI Express SSDs. Finally, giving a last glance at tables B.4 and B.5, PCI Express does not show the same proportion as SATA in comparison with HDDs: it too could end up acting as a bottleneck; this limit is however shifted forward in time thanks to its improvement potential, which is much better than that of SATA (for example, PCI Express generation 4 should appear this year).
SSD vs MTD-like choice
Similarly to what happened with Flash memories, a choice must be made about whether the internals of the memories are hidden or not from the other parts of the computing system and, ultimately, from the operating system. The same issue arose with Flash memories: used in SSDs, all internals are completely hidden from the system, and they are employed just like common hard disks; on the other hand, Flash memories can also be connected directly to the computing system without any intermediation. In the former case, all issues related to Flash technology (wear leveling, cell erase before re-write, error checking) are managed by a controller acting as an interface between the bus and the Flash chips: this controller implements a Flash Translation Layer (FTL), in order to present just a block device to the system. In the latter case, those issues must be managed by the operating system, which must thus take charge of implementing in software all the functions of a Flash Translation Layer, as happens in Linux with MTD devices.
PCM technology actually has better endurance than Flash. Moreover, PCM cells do not need to be erased before re-writes. Newer technologies will offer even higher endurance. These observations should permit building simpler, and thus faster, translation layers. However, the increased speed also requires the translation layer itself to be extremely fast, in order not to affect memory performance. The start-gap wear leveling technique is one of the approaches suggested for use in these translation layers [37, 118]; a simplified sketch of the idea is given below. The need for an extremely fast translation layer suggests a hardware implementation, thus supporting the choice of the SSD approach; this option would also permit conveniently hiding the read/write time asymmetry. The examples presented hereafter all use this same approach: the MTD-like one does not seem to be currently investigated.
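The start-gap idea can be rendered in very few lines. What follows is a deliberately simplified, software-level sketch of the concept only (in the actual proposals the Start and Gap registers and the remapping arithmetic live inside the memory controller, and the line size, the number of lines and the gap-movement period used here are assumptions made purely for illustration): one spare line is kept free, the logical-to-physical mapping is a simple rotation, and every fixed number of writes the free "gap" line moves by one position, so that over time every logical line visits every physical location.

#include <stdint.h>
#include <string.h>

#define LINES      1024u    /* number of logical lines (assumption)            */
#define LINE_BYTES 64u      /* wear-leveling granularity (assumption)          */
#define GAP_PERIOD 100u     /* move the gap once every GAP_PERIOD writes       */

/* LINES + 1 physical lines: one of them is always the empty "gap". */
static uint8_t  pm[(LINES + 1) * LINE_BYTES];   /* stands in for the PCM array */
static uint32_t start_reg   = 0;                /* the "Start" register        */
static uint32_t gap_reg     = LINES;            /* the "Gap" register          */
static uint32_t write_count = 0;

/* Logical-to-physical translation: the logical array is rotated by Start,
 * and positions at or beyond the gap are shifted up by one physical line. */
static uint32_t map_line(uint32_t logical)
{
    uint32_t pos = (logical + start_reg) % LINES;
    return (pos < gap_reg) ? pos : pos + 1;
}

/* Move the gap by one position; over many periods every logical line
 * ends up being written to every physical line, levelling the wear.    */
static void move_gap(void)
{
    if (gap_reg > 0) {
        memcpy(&pm[gap_reg * LINE_BYTES],
               &pm[(gap_reg - 1) * LINE_BYTES], LINE_BYTES);
        gap_reg--;
    } else {            /* the gap wraps around: rotate Start by one */
        memcpy(&pm[0], &pm[LINES * LINE_BYTES], LINE_BYTES);
        gap_reg   = LINES;
        start_reg = (start_reg + 1) % LINES;
    }
}

void wl_write_line(uint32_t logical, const void *data)
{
    memcpy(&pm[map_line(logical) * LINE_BYTES], data, LINE_BYTES);
    if (++write_count % GAP_PERIOD == 0)
        move_gap();
}

void wl_read_line(uint32_t logical, void *data)
{
    memcpy(data, &pm[map_line(logical) * LINE_BYTES], LINE_BYTES);
}

The cost is one extra line copy every GAP_PERIOD writes and two small registers: this is what makes the schema attractive for a hardware translation layer that must not slow the memory down.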
2.2.2 Impact of software I/O stack
Researchers and students of the Non-Volatile Systems Laboratory at UCSD have been conducting a series of thorough and interesting studies about the consequences of fast SSDs on operating systems since at least 2010. In particular, their observations about operating systems are the offspring of the experience gained while developing two prototypes of fast PCIe-based SSDs and of the efforts made to exploit their performance as much as possible. Both prototypes were built upon FPGA architectures: the first used common DRAM to emulate the behavior of new-generation non-volatile memories [16], whereas the second effectively used Phase Change memories [1]. The researchers claimed performances of 38 µs for a 4 KB random read and 179 µs for a 4 KB random write: these figures are in line with those of table B.4, given that software time is included (here read and write are meant as complete operating system operations). Their experience is valuable: many of the considerations made about changes in operating system design to exploit fast SSDs come from their work.
Two articles written by the UCSD scholars in particular [17, 129] describe respectively the initial efforts (the former) and the final conclusions of their work (the latter). In the first article there is an accessible description of the various scenarios that they tested to evaluate the performance of new memory technologies such as PCM and STT-RAM. The description of the testing environment used to model the behavior of PCM and STT-RAM (at that time still not available on the market) is indeed very interesting: they used common DRAM along with a programmable memory controller to introduce latencies compatible with those of PCM and STT-RAM. The solution adopted is remarkable, since a programmable memory controller makes it possible to measure (as they did in their study) how performance is affected in the presence of read/write latency asymmetries and when those latencies increase (a crude software analogue of this kind of emulation is sketched after the list below). They evaluated and measured the performance and latencies of:
- a standard RAID solution;
- a state-of-the-art PCI Express Flash;
- an NVM-emulated PCI Express SSD (this in particular became the basis for their Moneta and Onyx projects);
- a portion of DRAM used as a ramdisk to emulate a future NVM Storage Class Memory (this is rather an introductory approach to SCM, since it takes into account neither persistence nor safety; the first aim of this model, however, is to describe the performance problems concerning the software I/O stack, not to give a complete discussion of SCMs).
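A crude software analogue of that hardware emulation, sometimes convenient when a programmable memory controller is not available, is to back the "persistent" region with ordinary DRAM and to insert a calibrated busy-wait after every write to it. The sketch below is only illustrative (the extra-latency figure is an assumption, and a serious emulator would also model read latencies, bandwidth limits and the effect of CPU caches):

#include <stdint.h>
#include <string.h>
#include <time.h>

#define EMULATED_EXTRA_WRITE_NS 400L   /* assumed PCM-minus-DRAM write penalty */

/* Busy-wait for roughly ns nanoseconds (coarse, but fine for emulation). */
static void spin_ns(long ns)
{
    struct timespec t0, t;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    do {
        clock_gettime(CLOCK_MONOTONIC, &t);
    } while ((t.tv_sec - t0.tv_sec) * 1000000000L
             + (t.tv_nsec - t0.tv_nsec) < ns);
}

/* Write into the DRAM region standing in for the NVM, then pay the penalty. */
void emulated_nvm_write(void *nvm_dst, const void *src, size_t len)
{
    memcpy(nvm_dst, src, len);
    spin_ns(EMULATED_EXTRA_WRITE_NS);
}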
The most important achievement of their work is the evidence of the huge impact on latency and throughput attributable to the software I/O stack: as the speed of the memory device rises, this impact rises too. It is underlined that, using the ramdisk environment, the "cost of the system calls, file system and operating system are steep; they prevent the ramdisk from utilizing more than 12% of the bandwidth that the DDR3 memory bus can deliver". When looking at the FPGA solution, they verified that the I/O software stack was responsible for a significant performance drop; in particular, they verified how the file system was responsible for an important latency increase (about 6 µs per access). They also observed how the internal design of the file system influences throughput: they verified that the ext3 file system was responsible for a 74% reduction in bandwidth, whereas this impact was much lower when using XFS.
These observations are discussed thoroughly in [129], and many other papers in the literature use similar observations to analyze the software I/O stack. In particular, besides the description of the developed solutions, the two following charts, which neatly summarize the increasing impact of software as device speed increases, are included.
Chart 2.1: I/O software stack impact on latency (percent). Hard drive: 0.3; SATA SSD: 19.3; PCIe-Flash: 21.9; PCIe-PCM: 70.0; DDR: 94.1.
The cost of software in latency jumps from around 20% in the case of a SATA or PCI Express Flash-based SSD up to 70% for a PCI Express PCM-based SSD. In the case of an SCM, the cost would account for an impressive 94%.

Chart 2.2: I/O software stack impact on energy (percent). Hard drive: 0.4; SATA SSD: 96.9; PCIe-Flash: 75.3; PCIe-PCM: 87.7; DDR: 98.8.

Causes

Talking about the causes of such inefficiencies: as has already been stressed, the common assumption that has always been taken for granted by every operating system developer is that I/O devices are slow. This simple assumption induced developers to:
- focus on offering functionality in software to alleviate hardware deficiencies (this is the case, as already mentioned, of the page cache, the buffer cache, LVM, and so on). Such functionality, given a slow device, has a minimal cost in latency and bandwidth;
- not bother too much about the efficiency of the software layer, because software accounts for only a minimal part of the I/O: since devices are slow, efforts to develop efficient software would result in only a small improvement; such improvements would thus have a highly unfavorable cost/benefit ratio. Instead, efforts on safety, correctness and security were better rewarded.
Unfortunately, these assumptions are the "philosophical" roots that cause the I/O software stack to perform so badly when the device gets faster. Indeed, the two charts just shown suggest that software becomes a problem even with common SATA Flash SSDs.
Following these observations, researchers and developers agreed on the need to identify which parts of the I/O stack are most responsible for latency and energy costs. This analysis is the first step toward making decisions about the best strategy to improve the behavior of the I/O stack. Researchers first made some of the conceptual observations that will now be presented.
Off-the-shelf I/O stacks are developed as modular stacks, usually employing a generic block driver that works in conjunction with a device-specific driver. This design permits virtualizing and standardizing the access of programs and of the kernel to specific devices. However, most of the time it is the kernel that is responsible for all the aspects regarding both storage access and storage management. The kernel does not only set the access policy (space allocation, permission management) but is also responsible for policy enforcement. This extensive use inside the kernel of both policy setting and policy enforcement can lead to inefficiencies.
The second general observation is bound to the generality of the I/O stack: whereas its design allows great flexibility, its generality does not permit implementing all of the optimizations that could improve it the most, thus sacrificing some opportunities. While, if the devices are slow, these "opportunities" are not so significant, they become valuable in the case of fast devices.
These observations alone suggest some areas of improvement, such as the need for specific (not generic) I/O stacks devoted to fast memory devices and the necessity of avoiding, where possible, the intervention of the kernel in the management of I/O accesses. Investigating these observations further, researchers have identified the following "hot areas":
- the I/O request schema can hide bottlenecks: I/O requests in Linux use an I/O scheduler to collect requests and issue them at the proper time. This approach permits flexibility but adds latency (about 2 µs);
- interrupt management is expensive, especially when requests are small. Interrupts are intrinsically complex and expensive procedures (they add at least 6 µs of latency): in the case of a small request, the time between its issue and its servicing can be shorter than the time between a sleep and a wake-up. Moreover, a fast device using interrupts would issue interrupts frequently: the greater the demand for small I/O operations, the greater the time lost just in interrupt management. Finally, it is necessary to underline that, usually, interrupts must be managed: the presence of many interrupts due to a fast device can ironically sacrifice system responsiveness precisely because of the device speed;
- the file system is one of the causes of added latency for each I/O request (about 5 µs);
- the cost of entering and exiting the kernel is high (in the case of small requests, about 18% of the total cost).
Solutions
The insights just explained inspired the strategy, subsequently referred to as "RRR", pursued by the researchers from UCSD while trying to optimize the Linux I/O stack so as to use their prototypes of PCI Express SSDs efficiently:
Reduce: eliminate redundant or useless features and avoid those parts of the code that perform badly with fast memories. As examples, they avoided the use of the standard Linux I/O scheduler, preferring direct requests, and they chose the spin (polling) alternative to interrupts in the case of small requests [146] (see the sketch after this list).
Refactor: restructure the I/O stack in such a way that efforts are distributed among the actors (applications, operating system and hardware). For instance, separate policy management from policy enforcement, preferably assigning the former to the operating system and the latter to the hardware, where possible. As examples, in Moneta the following refactoring tasks have been implemented: development of a user-space driver to avoid entering and exiting the kernel, a virtualized hardware interface to permit each application to issue requests directly, and hardware permission checks in order to relieve the kernel from policy enforcement.
Recycle: reuse the parts of software already created, where feasible. For example, reuse some of the functionality offered by file system tools and by the file systems themselves.
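The "spin instead of sleep" choice mentioned under Reduce can be illustrated with a minimal sketch. The completion-flag interface, the threshold and the latency figures below are assumptions made only for illustration (real drivers decide per request class and per device): the point is simply that, when the device is expected to answer within a few microseconds, busy-waiting on its status costs less than a suspend/interrupt/wake-up cycle.

#include <stdint.h>

/* Hypothetical completion interface of a very fast block device. */
extern volatile uint32_t *device_status_reg;            /* non-zero when done  */
extern void device_submit(uint64_t lba, void *buf, int is_write);
extern void sleep_until_interrupt(void);                 /* classic async path  */

#define CONTEXT_SWITCH_COST_NS  6000L  /* ~6 us, the figure cited in the text   */
#define EXPECTED_DEVICE_NS      8000L  /* assumed latency of a small request    */

void fast_ssd_request(uint64_t lba, void *buf, int is_write)
{
    device_submit(lba, buf, is_write);

    if (EXPECTED_DEVICE_NS < 2 * CONTEXT_SWITCH_COST_NS) {
        /* Polling: for requests that complete in a few microseconds,
         * spinning wastes fewer cycles than sleeping and being woken up. */
        while (*device_status_reg == 0)
            ;  /* busy-wait */
    } else {
        /* Asynchronous path: suspend and let the interrupt wake us up. */
        sleep_until_interrupt();
    }
}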
The NVM Express group⁸ is currently pursuing a similar approach [91]: their goal is to develop a new standard host controller interface to be used on top of PCI Express, conceived to be adopted by PCI Express SSDs. While the PCI Express specification sets the standards for the lower layers of communication between the CPU and a compliant device, the NVM Express specification sits at a higher level of abstraction that compliant drivers must follow. Their focus is the development of an I/O stack able to support and exploit fast SSDs, thus deriving the maximum benefit from the PCI Express bus. Currently, Windows Server, Linux and VMware already offer NVM Express drivers. The NVM Express documentation reports the latency and performance gains obtained using their I/O stack instead of a standard one: although the documentation never refers directly to the Reduce-Refactor-Recycle approach, their work seems to follow the same path, somehow certifying its effectiveness.

⁸ www.nvmexpress.org
2.3 Storage Class Memory: operating systems
The use of non-volatile memories directly attached to the memory bus represents both a big opportunity for next-generation computing systems and a tough challenge: persistence on the memory bus, along with a promised density better than that of DRAM, are the opportunities.
Persistence in memory represents an opportunity because I/O operations, even if much faster than in the past (as happens in the case of NVM Express SSDs or of the Onyx prototype), are intrinsically slower than memory accesses. Persistence-related operations issued at memory speed would thus permit extremely fast storage (and retrieval) of data.
Density too represents an opportunity: more density would both lower the cost of storage in memory and permit managing a bigger amount of data at high speed.
The challenges, instead, are due principally to the side effects of persistence in memory: a thorough exploitation in the operating system is difficult, as it requires a complex re-design of some of its major parts. Other issues exist, though. One of them is heterogeneity: in case SCM were placed along with common DRAM on the same memory bus, the operating system would have to decide how best to use both memories. A design with SCM only would be architecturally simpler. Other challenges are the need for wear leveling and the need to cope with r/w asymmetries.
2.3.1 Preliminary observations
An analysis and a classification of the proposed approaches to using SCM follow in the next paragraphs. However, before presenting each specific proposal, it is worthwhile to focus on some aspects shared by all the approaches described below.
Wear leveling and r/w asymmetry
While the most important issues, namely persistence and heterogeneity, are managed differently in each specific approach, the issues arising from cell wearing and r/w asymmetry are common whatever the approach.
About wear leveling: it is reasonable to forecast the intervention of hardware engineers on memory controllers [23]: as the new technologies have different memory switching mechanics, timings and electrical needs, it is likely that CPUs will need new memory controllers to drive them. As the need for new memory controllers already exists, it would be easier and cheaper than usual to add more functionality in hardware. As an example, a fast wear leveling schema such as the start-gap one already cited could be implemented in hardware. Other schemas are also proposed in [144, 46, 148, 20]. Further wear-leveling needs could then be met with the support of software. Most of the articles about SCM do not deal with this issue, as most of the time it is assumed to be managed by hardware.
About the r/w asymmetry: a mitigation in hardware is feasible with the use of either a classic SRAM cache or even a cache built with FeFETs: either way the engineering must be careful, since a bad implementation would affect persistence. Anyway, this issue is often completely ignored in the literature. As the r/w asymmetry is in any case a feature of the new memory technologies, I assume it is taken for granted that r/w asymmetries are exposed to the software. This approach is perhaps preferable: keeping sight of the whole panorama of NVMs, memristive memory devices promise to offer an r/w asymmetry much smaller than that of PCM, thus drastically mitigating this issue. This aspect would in any case be easy to test and analyze using DRAM along with a memory emulator, to obtain projections about changes in latency and bandwidth upon changes in timing values [17].
From now on, both these issues will be set aside, as if they were managed by hardware or deliberately ignored.
Background literature
It might now be the right time to observe that in the background of most of the solutions presented hereafter stand some research efforts, made in past years, that influenced subsequent works more than others. These studies, focused on topics such as file systems, persistence and caching, were all carried out during the 90s: from a general computer science viewpoint they were not only interesting, but they also anticipated some issues that are pivotal in the SCM context. Among the articles most cited in the literature, the following are worth mentioning:
The Rio File Cache: Surviving Operating System Crashes [19]: this article was written with the intent of describing a computer system that uses a RAM I/O file cache (Rio) with the aim to "make ordinary main memory safe for persistent storage by enabling memory to survive operating system crashes [. . . ] reliability equivalent to a write-through file cache, where every write is instantly safe, and performance equivalent to a pure write-back cache". The intent of the authors was to execute every I/O operation directly on the in-memory cache facility, using the classical storage just as a backup facility. They proposed to protect the file cache with extensive memory protection and sandboxing, in order to avoid its corruption upon operating system crashes, and to allow only warm reboots, in order to avoid memory leaks when the system is rebooted.
The article claims that the proposed approach proves even safer against crashes than classical I/O to devices like HDDs and SSDs (their solution reached a probability of corrupting the file system upon crash of just 0.6%). Interesting figures about the incidence of crashes on file system corruption are supplied, supporting the "common sense" according to which disks are more reliable than memory: their figures, however, show that the increase in corruption incidence for a memory without protection is only slightly greater than that of common HDDs (just 1.5% probability, versus the 1.1% of a write-through cache). This article is evidence of the fact that, as early as the 90s, researchers had set their sights on storage kept entirely in memory: this topic is today a primary need in big data centers and in large database systems. Moreover, this article contains ideas similar to those used in the "Whole System Persistence" approach (see section 2.3.3). The trials and tests made to measure the effects of system crashes are still usable today to design a correct mechanism for exploiting persistence in main memory, such as those proposed in the next paragraphs regarding file systems and applications (see respectively sections 2.3.4 and 2.4).
File System Design for an NFS File Server Appliance [35]: this article is a technical report issued by NetApp, and it explains the design choices made in their WAFL file system (Write Anywhere File Layout), used in their storage appliances. WAFL used shadow paging extensively to obtain data consistency and fault tolerance, while offering a robust snapshot facility to manage backups easily and efficiently. The ideas neatly described in this article are used extensively as a reference in many other articles about file system design, and they certainly anticipated the times, since newer file systems such as ZFS [111] or BTRFS [121] use approaches similar to those first developed for WAFL. Finally, NetApp's appliances running WAFL did use a non-volatile (battery-backed) RAM to keep the operation log immediately available after crashes: this solution somehow implicitly uses the non-volatile memory paradigm.
The Design and Implementation of a Log-Structured File System [122]: this article proposed a new pattern for using the blocks of a file system, namely as a continuous and cyclic log. Following this pattern, every write operation triggers new block writes into free space, thus filling the free space of the disk as happens in circular buffers. Beyond the fact that this approach requires a garbage collection layer, the article proposed a genuinely new approach to file system design. It was inspired by the copy-on-write approach, and it permitted avoiding the "dual writes" needed for consistency reasons (the first one to the journal and the second to the effective data block). Finally, this approach implicitly enforces a wear leveling strategy: writing different blocks each time, writes are distributed around the disk, and the endurance of memory cells is consequently raised. This design has thus inspired those of many Flash translation layers and many Flash file systems [85]. This approach could be valuable also in the context of SCMs, where the issue of cell wearing must be taken into account.
To a lesser extent, other articles from the 90s that anticipated the topics of non-volatility are [141] and [9]: the former describes the architecture of a computer that used a persistent memory (Flash + SRAM) on the memory bus, while the latter uses a non-volatile DRAM either as a cache or to speed up recovery times.
Choosing the hardware model
Recalling the observations made previously about the hardware model, the data model and the failure model, it is worthwhile to present here the two hardware models that will be considered in what follows. Each of the proposals presented afterwards necessarily uses one of them. Storage class memory can be used in only two configurations: either alone, as a replacement for DRAM, or in tandem with it.
The first option, i.e. DRAM replacement, represents the simpler alternative, as it avoids the need to manage heterogeneity. However, there are also some problematic aspects: firstly, as persistent memories would be slower than DRAM, such a use would sacrifice performance. Moreover, this option would force operating system designers to necessarily manage the issues related to persistence: all the available memory would be persistent.
The second option is more complex: standard DRAM would share the whole set of physical addresses with SCM. Therefore, a portion of the addresses would be volatile, while the remaining ones would be persistent. Despite the need to manage memory heterogeneity, this approach is the preferred one in most implementations. This configuration, besides its complexities, permits mixing the two technologies (DRAM and NVRAM) to achieve the best compromise between performance and storage needs. Moreover, it gives developers the option of deciding when and to what extent they want to use persistent memories.
As will soon be apparent, the hardware model, the data model and the failure model are tightly related: some data and failure models are achievable only on a given hardware configuration.
2.3.2 No changes to the operating system
A first and tempting approach to using SCMs could be the easiest one: to use SCM just as common DRAM under a standard operating system. This corresponds to the choice of maintaining the same data model and the same failure model currently used in operating systems. This approach would permit a standard operating system to benefit immediately from the density increase that SCM would offer: whatever the chosen hardware configuration, the SCM would in any case be used as a standard, volatile memory.
This solution would however be problematic, since part or all of the memory (depending on the hardware configuration) would become persistent. Off-the-shelf operating systems use DRAM with the expectation of its volatility: the operating system enforces safety and security of the data inside it just as long as power is on, without the need to take care of it when power goes down, as it is taken for granted that it is erased. It has been shown in [32] that, even in the case of standard DRAM, data is not lost immediately, but gradually, within 1 to 5 minutes: this fact alone can be a source of security concerns. Even more so, if non-volatile memories were used with a standard off-the-shelf operating system, data security could be bypassed even more easily: any data ever stored in them would persist as long as it was not re-written, and the read operation would be much easier. As temporary data can store passwords, encryption keys, sensitive data, and an infinite pattern of mixed information, it would be a terribly bad practice to expose all this data to potentially unauthorized accesses. To mitigate these problems, a change in the hardware configuration or in the data and failure model would be required: just with the intent of maintaining the same levels of safety and security currently enforced by standard operating systems, the SCM should be encrypted, either in hardware [26, 3] or in software [117]. Encrypting the memory with random keys, set at each system reboot, would emulate the volatility of the memory, thus permitting a safe use of persistent memories as common DRAM. This strategy could even increase the resistance of standard DRAM to security attacks such as those presented in [32]. Another criticality of this way of managing SCM arises from performance: since SCM performance is worse than that of DRAM, a normal operating system using SCM as DRAM would suffer from reduced performance, unless the density increase were so necessary as to compensate for the performance loss. Moreover, in a hybrid hardware architecture, the operating system would use the whole memory as a single kind of memory, whereas the features of SCM and DRAM would be very different from each other: the timings of the operating system would exhibit a problematic variability and unpredictability.
Finally, this first approach is not studied in depth in the literature, since it is practically a no-use strategy with respect to persistence: the opportunities offered by persistence are simply ignored, whereas some problems arise and must be dealt with. This approach is however useful as a cognitive tool for gaining the awareness that persistence is just a property of certain memories: it must be managed in order to benefit from it.
2.3.3 Whole System Persistence
This approach, proposed by researchers from Microsoft Research in [56], is mainly
focused on the reality of large database systems and of large data centers, even if
it could also be used in standard computers.
The chosen hardware model uses only persistent memory. Since current SCM technologies are not mature enough to be used as a DRAM replacement, in this approach persistence is achieved using NVDIMMs⁹, currently available on the market. However, even if not originally conceived to be used with SCM, this approach is nonetheless perfectly suited to it. Here, persistence is exploited to achieve practically zero impact from power failures (failure model). The goal of this approach is to transform a commonly critical scenario such as a power interruption into a suspend/resume cycle: if achieved, this change would lead to systems completely resilient to power outages.

⁹ Non-volatile DIMMs. NVDIMMs are standard DRAM DIMMs that use supercapacitors and NAND Flash modules to behave persistently.
The originating observations of this approach are:
- a consolidated trend in large databases is the storage of the entire dataset in main memory. The "cloud" paradigm further urged the development of caching servers keeping large datasets entirely in memory (see section 2.1.2);
- DRAM in servers can reach big sizes, currently around 6 TB [69]: according to this figure, clusters of servers can manage tens or even hundreds of terabytes of memory each;
- power outages are expensive, especially for large environments. The cost of resuming a system increases in complex environments because, when recovering large datasets, many I/O requests are placed on the storage back-ends, which are typically slow. This "stress" suffered by the storage back-ends can itself be the cause of other critical events, such as the one experienced at Facebook in 2010 [89];
- as the quantity of memory in servers rises, the cost of recovery from the back-ends rises, since the amount of data necessary to re-build the entire state grows too.
These observations have inspired the search for a mechanism that relieves machines from the need to re-build the entire in-memory dataset when power outages happen. The key ideas of Whole System Persistence are to:
- retain all the content present in memory, thanks to persistence;
- save the entire state of the server (registers, caches) into persistent memory when a power-fail event is detected (flush-on-fail strategy);
- modify the hardware in order to provide a residual energy window long enough to permit the state save into memory, by adding capacitance to the PSU using supercapacitors;
- restore the previously saved state into registers and caches automatically, as soon as power is restored.
These steps should appear as transparent as possible to the operating system, emulating just a suspend/resume cycle. Actually, some subtleties about the saved state and the real state of devices after the power cycle point out that the resume/restart process needs some further adjustments in order to be completely transparent. Even if some minor changes to the operating system may be needed, this approach has been successfully tested and demonstrated in [110].
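A minimal sketch of what the flush-on-fail step could look like is given below, assuming an x86-like machine, an NVDIMM-backed save area and a hypothetical early power-fail notification; the routine and symbol names are invented for illustration, and a real implementation (as in [56]) must also deal with the other CPUs, with devices and with the exact residual-energy budget:

#include <stdint.h>

/* Hypothetical save area located inside the NVDIMM-backed physical range. */
struct cpu_saved_state {
    uint64_t gp_regs[16];
    uint64_t rip, rflags;
    uint32_t valid;                        /* written last                   */
};
extern struct cpu_saved_state *nvdimm_save_area;            /* assumed mapping */
extern void arch_save_registers(struct cpu_saved_state *s); /* hypothetical    */

static inline void flush_all_caches(void)
{
    /* WBINVD writes back and invalidates all CPU caches, pushing every
     * dirty line down to the (persistent) DIMMs while energy remains.   */
    __asm__ volatile("wbinvd" ::: "memory");
}

/* Invoked from the power-fail interrupt, running on the residual energy
 * provided by the supercapacitors added to the PSU.                      */
void power_fail_handler(void)
{
    arch_save_registers(nvdimm_save_area);    /* registers -> NVDIMM         */
    flush_all_caches();                       /* dirty cache lines -> NVDIMM */
    nvdimm_save_area->valid = 1;              /* mark the snapshot complete  */
    flush_all_caches();                       /* make the flag durable too   */
    for (;;)
        ;                                     /* wait for power to disappear */
}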
This proposal will certainly be of great advantage in large data centers: the focus is exactly on some of the major criticalities experienced in that context. However, some remarks must be made about it:
- although its use in everyday computing would allow using a computer immediately after switch-on (after switch-off, the system has simply handled a power-fail event), the same security problems noticed previously would arise (see section 2.3.2);
- there is no specific data consistency management: it is therefore entrusted to the operating system. This fact suggests that, since current operating systems are designed to reload at each reboot, there must be some mechanism to detect degraded system functionality and force a system reboot;
- this approach uses persistence only as a temporary storage, not as a long-term memory. The rest of the system does not even have the perception of using persistence. This strategy is certainly simple, but some of the opportunities of persistence are left unused.
2.3.4 Persistence awareness in the operating system
A step further toward the exploitation of SCMs is the intervention of the operating system: differently from what happens in the approaches shown before, in the ones presented hereafter the operating system is aware of the presence of some persistent memory device connected to the standard memory bus along with DRAM. The hardware model considered here is thus hybrid. The presence of SCM can be notified by the firmware at boot time, as normally happens for other hardware features in the BIOS, and the physical memory address space will then be divided into a DRAM portion and an SCM portion.
Almost all of the approaches presented here use persistent memory to store data in files, as happens today with common hard disks or SSDs: through a file system. This continuity with the well-known paradigm of file systems has the advantage of preserving software compatibility, and is thus a viable and acceptable approach to persistence awareness.
Anyway, while still keeping persistence awareness only at the operating system level, file systems are not the only way to exploit it. As described at the end of this topic, some proposals conceive a scenario that is both more complex and more thorough. For the moment, however, my focus is on file systems.
The typical services offered by file systems are:
- a high-level interface through which applications use storage;
- an interposing layer between applications and devices, whose job is arranging data so that it is safely stored and later safely retrieved;
- security enforcement and concurrency management on the contained data;
- a low-level interface to device drivers.
Despite all the features offered by persistent memories, the need for these services (at least for the first three) is unchanged: the features of the new non-volatile memories do not influence the need for file system services, at least as long as the file/folder paradigm is used as extensively as it is today.
Instead, what may be subject to revision are the mechanisms used to offer these services: file systems are software devices that have always been developed keeping in mind the technical details of the underlying storage media (see section A.3); as persistent memories are so different from the other media currently in use, it is thus advisable to re-design and adapt operating systems to their features.
Brief analysis of current I/O path
In Linux, applications that need to operate on files stored on I/O devices follow a sequence of common steps involving many kernel layers, illustrated in figure 2.4.

Figure 2.4: The Linux I/O path. © 2014 Oikawa

This sequence is triggered by applications, it reaches (when needed) the I/O devices, and eventually it returns to the applications themselves. Applications use file system services through filesystem-related system calls, such as open(), read(), write(), close(), and so on. All these system calls use in turn the services of a kernel layer called the Virtual File System (VFS). Its role is to hide from applications all of the implementation details of each specific file system, exposing to them just a standard and well-documented interface. The VFS manages all the software objects required to use files, folders and links, and issues the specific requests directly to each specific file system. In turn, each specific file system interacts with the page cache to check whether the data is cached in memory (see also section 2.1.2); if it is not, the kernel issues the needed I/O request to the block device driver layer. This layer in turn issues the request to the right device driver. The device driver then executes the task together with the device it drives. This sequence, even if rather roughly sketched, describes the impressive amount of work carried out by the kernel (other fundamental tasks, such as security checking, have not even been considered).
Caches anticipated the persistence shift to main memory
Following these steps, it is apparent how common it is to have file system data in memory, though used as a cache, which can reasonably be seen as a (temporary, as it is volatile) storage location whose backing store are the I/O devices. Moreover, these "cache locations" are used as the fundamental building block of the mmap() system call: with this system call, an application can map into its own virtual memory address range data stored in files of some file system. This mechanism is permitted by the page cache: if the data is not present, it is first retrieved from the I/O device; once the page cache has retrieved it, the memory pages containing that data can be mapped into the application's virtual address space. The mmap() system call can thus be seen as a sort of byte addressability on persistent data used in memory. During the 00s, in order to exploit the byte addressability of NOR Flash memories, developers added new functionality to the mmap() system call: XIP (eXecute-In-Place). These changes permit mmap(), in cooperation with a XIP-enabled file system and a XIP-enabled device driver, to connect the virtual address space of an application directly to the byte-addressable Flash chip, without using the page cache as a "middle ground". This feature was added to permit lowering the size of DRAM in portable devices [11] and to decrease their boot-up time [14]. XIP, in effect, allows processes to directly address data residing in persistent storage as if that data were in main memory.
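A tiny user-space example of this style of access is sketched below (the file path is invented; on a byte-addressable device an XIP-capable file system and driver would service the mapping without passing through the page cache, while on a conventional stack the very same code still works, but through cached page copies):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/nvm/counter.dat", O_RDWR);   /* hypothetical file */
    if (fd < 0)
        return 1;

    /* Map 4 KiB of the file into the process virtual address space. */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Byte-granularity access: plain loads and stores, no read()/write(). */
    p[0]++;                          /* touch a single byte in place        */
    strcpy(p + 1, "updated");        /* and a handful more                  */

    /* Ask the kernel to make the mapped range durable. */
    msync(p, 4096, MS_SYNC);

    munmap(p, 4096);
    close(fd);
    return 0;
}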
Consistency considerations
As stated before, caches are used principally to raise the performance of devices that otherwise behave poorly. I/O performance is raised because caching moves I/O operations from the devices to memory: there, operations are faster, bandwidth is high and latency is minimal. However, this mechanism has a further cost in reliability¹⁰. Data in the caches is always up to date, but data in the devices becomes up to date at a slower rate: since transfer costs are high, particularly in the case of small transfers, a few bulk transfers are preferred to many small ones (write-back caching strategy). This raises throughput and lowers transfer costs. This need forces the operating system to wait (usually 30 s) before transferring back to the I/O devices those pages of the cache that are marked as "dirty". The idea behind this behavior is to raise performance at the cost of some uncertainty. Continuing to discuss I/O transfers, there are some further uncertainties. Firstly, data transferred to I/O devices is unsafe until the writes are effectively executed; secondly, modifications to data stored on I/O devices often require more than just one write operation: if a modify operation is not executed completely, in all its steps, there is a risk of file system corruption. So, to summarize this topic: caches are useful, but they raise the risk of data loss; moreover, write operations are risky while in progress, as are logically connected multiple operations until they are completely executed.
These issues have been studied extensively throughout the past decades: the correct survival of data is strategic both in generic file systems and, even more, in database systems. Actually, it is from the database world that the acronym "ACID" comes (Atomicity, Consistency, Isolation, Durability), as well as the concept of transaction [31]. In order to tackle these issues, over the years many approaches have been proposed and successfully used both in databases and in file systems: the best known among them are transaction logging and copy-on-write techniques. Transaction logging is extensively used in journaling file systems¹¹, whereas copy-on-write is used in file systems that use shadow paging and in those that use the log-structured design. As a further remark, all of these techniques fit perfectly in a fault model that tries to nullify the adverse effects of power outages. These strategies against faults do not take into account software faults, bugs and crashes, for the same assumptions explained before for Rio.
¹⁰ As said before, a first cost is in complexity; see section 2.1.2.
¹¹ For example, file systems such as ext3 and ext4.
Approaching design
While trying to treasure the observations just made, and still speaking in a rather general way, the project of a file system designed specifically for SCM should at least take into account the following major differences between SCM and I/O devices:
- the access semantics is completely different (simple load/store vs complex I/O read/write);
- the access granularity is different (one byte or 64 bytes vs blocks);
- the cost of accessing the persistent media changes (low with SCM, high with I/O devices). The motivation that induced the widespread use of caches loses its relevance: the worthiness of caching could be reconsidered;
- the execution delay window is different (short with SCM, long with I/O devices) but still present. This fact can simplify the design of a file system resilient to power outages, but the risk of data loss must still be a concern. As an example, journaling can lose its appeal compared to logging and shadow paging, which seem to be patterns better suited to memory storage;
- ACID loses its last letter, Durability, as it is implicitly achieved through persistence. Atomicity, Consistency and Isolation are still needed to guarantee a long data lifespan.
Moreover, some new issues would arise. Among them, at least the following should be considered:

- memory protection against operating system crashes and programming errors should become a primary goal, since memory is less safe than I/O. The experience documented in [19] on this topic can be used as a reference;

- some subtle issues arise around how atomicity is achieved (it is fundamental in shadow paging, for example). Memory operations (load/store) effectively reach memory only after traversing the CPU caches in a write-back manner. Cache lines tagged as "dirty" must be written back to memory, but this operation can be subject to reordering: this behavior optimizes stores and raises performance, but it becomes a problem if the programmer relies on an exact order in which certain stores happen (as in the case of atomic writes, where an exact order is needed). While cache contents can be kept coherent between processors and even between cores, and the order of instructions can be constrained (for example with the mfence instruction of the x64 instruction set), currently there are no guarantees against the reordering of write-backs from cache to memory: such guarantees can only be achieved by using the mfence instruction in tandem with cache flushing, and this practice noticeably lowers performance (a minimal sketch of this pattern follows the list).
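The following is a minimal sketch, in C with x86 intrinsics, of the flush-plus-fence pattern just mentioned: every cache line touched by a persistent store is flushed explicitly, and a fence keeps the flushes ordered before the next store. The function names and the data layout are illustrative only.

/* Flush-plus-fence ordering for persistent stores (GCC/Clang, x86). */
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

#define CACHELINE 64

static void flush_range(const void *addr, size_t len)
{
    const char *p = (const char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    const char *end = (const char *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush(p);                 /* evict the line towards memory */
}

/* Commit a new value, then publish it by flipping a "valid" flag.
 * The fence between the two flushes keeps the data ahead of the flag. */
void persist_update(uint64_t *data, uint64_t *valid_flag, uint64_t value)
{
    *data = value;
    flush_range(data, sizeof *data);
    _mm_mfence();                       /* data must reach memory before the flag */

    *valid_flag = 1;
    flush_range(valid_flag, sizeof *valid_flag);
    _mm_mfence();
}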
Personal hypotheses
Here I would like to expose some ideas that came up while imagining a system that uses persistent memories along with a file system. The first one would probably lead to a bad design, whereas the second one is only sketched, even if it might prove quite interesting. I feel nonetheless that they are quite useful to underline the differences that emerge when compared with the approaches found in literature. From a "Recycle" viewpoint, a persistent memory could be exploited by:

- recycling the page cache facility: a file system would be located completely inside a part of it, and the page cache would be instructed to use that zone without a backing store;

- using a ramdisk block device along with the O_DIRECT flag of the open() system call, to avoid the page cache and not duplicate data in memory.
Both these approaches focus on the fact that, if storage is placed on the memory bus, the page cache would just replicate the same data, thus wasting space and CPU cycles. So my idea was to keep data either in the page cache only (the former approach) or in the persistent storage only (the latter). While this intent is paramount also in the approaches found in literature, my thoughts were admittedly much vaguer.

The first approach would appear somewhat redundant, since both the TMPFS and RAMFS special file systems already take profit of the page cache facility in the way I imagined: without a backing store. This approach would be somewhat similar to the proposals made in [19], but it has to be remembered that RIO is well suited only to systems whose entire memory address range is persistent, and this would not be the case: this could be a clue that this might not be a correct design choice. However, similarly to what is objected in [142] about TMPFS and RAMFS, the page cache has been developed first of all as a cache: the focus was on speed, taking volatility for granted. The page cache would also still have to be used as a volatile cache for the other, standard I/O devices. Finally, at present there is no easy way to instruct the page cache to use a given range of physical addresses, and this information would be necessary to use the page cache on a persistent zone of memory. Each of these remarks stresses how a hypothetical redesign of the page cache would be non-trivial and would unfortunately affect its core logic: simply put, it would probably be a bad design choice. A better choice would likely be the creation of a new facility devoted to SCM [142].
The second strategy is the opposite of the previous one: if the page cache cannot be used, the alternative way to avoid duplicating data is to bypass it. This approach might be better than the previous one: at least, it does not affect the logic of other important facilities of the operating system. O_DIRECT is already used as an open() modifier, either to avoid double caching (as is the case for database engines, which have their own caching mechanisms) or to avoid cache pollution (when caching is for some reason not needed). This interesting approach has already inspired other developers in the effort to build a solution suitable for persistent memory, as in the PRAMFS file system [95] and in the current efforts of the Linux community to develop DAX (a successor of XIP, see section 2.3.5). However, my idea now appears to me as somewhat limited. The approach chosen here is the use of a file system to allow every application to keep using the already developed persistence semantics without any need to rewrite application code: this means that if an application does not use O_DIRECT natively, its reads and writes still pass through the page cache. Moreover, O_DIRECT influences only files accessed with open(), not files accessed through memory mapping. So, while my idea was certainly in the right direction, it was nonetheless incomplete.
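For reference, this is a minimal sketch of the O_DIRECT pattern: the page cache is bypassed, so the buffer, the size and the offset must respect the device alignment constraints. The device path and the sizes are illustrative.

/* Bypassing the page cache with O_DIRECT (Linux). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/pmem_ramdisk", O_RDWR | O_DIRECT);   /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }

    ssize_t n = pread(fd, buf, 4096, 0);      /* goes straight to the device */
    if (n < 0) perror("pread");

    free(buf);
    close(fd);
    return 0;
}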
A gradual framework
In effect, to build a working solution like those found in literature, some facilities must be developed:

- A manager: a means for reserving and managing persistent memory, so that the standard memory manager does not use it as common volatile memory. The alternatives are: to build a specific driver that exposes persistent memory as if it were a device; to embed the management capability in a file system developed specifically to be used on memory areas; to modify the standard memory manager so that it handles persistent memory as another type of memory; or to develop some other facility dedicated to managing persistent memory only.

- A translator: a means for changing the semantics used to access data. This change of semantics is necessary, but the problem is where to place the translation mechanism: the solutions proposed in literature put it either inside the device driver, or in the file system, or in a library devoted to acting as a semantic translator.
Moreover, as solutions become more and more thorough, the following services can be offered:

- Efficiency: a means to avoid completely the use of the page cache and to access data directly.

- Safety: a means to enforce memory protection against operating system crashes.

- Consistency: a means to resist power failures during writes and to guarantee a long data lifespan.

- Exploitation: a space-management design that exploits the architecture of the memory, thus leaving behind the design approaches fit for hard disks.

- Integration: an elegant solution would permit the use of persistent memory both as storage and as memory for the kernel. Such a use is the most complex, as the kernel must be instructed about how to use persistent memory, and this can potentially expose the kernel to bugs. These types of solutions are a sort of bridge between this class of approaches and those which propose to expose persistent memories directly to applications.
The different approaches are summarized in table B.6. Before presenting the details of each specific line of the table, I shall make a final remark: these file systems do use persistent memories as storage, but all the internal structures used to keep track of files, folders, open files, and so on are still placed in DRAM. This behavior is the same as that of common file systems: it simply reflects the fact that these approaches still do not use persistent memory as memory available to programs, but just as a place for storage.
2.3.5 Adapting current file systems
The simplest approach
The simplest approach would merely permit the use of a standard file system on persistent memory: this would only need a manager and a translator. However, since the file system would run unchanged, a block device driver would be needed, and developers would have to embed into it both the functions of the manager and those of the translator. A viable starting point could be the modification of the standard Linux brd driver (implemented in brd.c). Such a solution would be functional even if inefficient: the page cache would be used as usual.
Linux developer community
Linux kernel developers are however working on a more thorough solution: DAX. The acronym stands for Direct Access (the "X" probably stands for the first letter of the XIP acronym, while "direct access" comes from the main function that compliant file systems must implement, direct_access), and it is being developed to permit the use of standard file systems on persistent memories with minor changes. It is the successor of XIP and is a sort of complete solution that offers both automatic O_DIRECT behavior for open() system calls and XIP functionality for mmap() system calls. Even if these solutions are not well documented in scientific literature (to my knowledge there are only some slides and videos), the current efforts can be found in the mailing lists and in the official Linux documentation about experimental features [99, 93]. To follow this paradigm, modifications must be made to a driver (to become DAX compliant and to use persistent memory) and to a standard file system (which then uses the DAX driver through the DAX functions): currently the file system subject to these modifications is ext4. This approach is the object of work by Linux kernel developers, a fact that should be considered a clue of its validity. Moreover, as documented in [73], efforts made in this direction induced developers to focus on refactoring the design of the I/O system calls, to increase their efficiency when used with fast storage. This fact, as well as the challenges raised by the new NVM Express standard, could lead to a deep redesign of the mechanics of the I/O subsystem in Linux. The issues presented about fast SSDs apply also in this context: software is a major cause of lost latency and throughput, and Reduce, Refactor, Recycle is still a valuable methodology.
Quill
Quill is a proposal documented in literature [2] that has been developed to require even fewer modifications to common file systems. Like the previous approach, Quill is not focused on a specific file system: its aim is to be used with standard ones. Another of its goals is to involve the kernel as little as possible, to avoid expensive context switches: most of its code runs in user mode. It acts as a user-mode translator, developed as a "service" library that interposes between each I/O system call and its effective execution, adding a sort of indirection (a minimal interposition sketch follows the list). Quill is a software facility built of three components:

- the "Nib", which catches the I/O system calls and forwards each of them to the following component, the "Hub";

- the "Hub", which chooses the right handler for the system call, depending on whether the request concerns a XIP file system or not;

- the "handlers", which effectively manage the requests: if a request concerns a standard file system, the handler selected by the "Hub" is the standard system call of that file system; if, on the other hand, the request concerns a XIP file system, a special handler serves it through an mmap() operation.
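The following is an illustrative sketch, not Quill's actual code, of how such a user-mode interposition can work on Linux: a preloaded library overrides read(), asks a "Hub"-like helper whether the descriptor belongs to an XIP-mapped file and, if so, serves the request with a memcpy from the mapped region; everything specific to Quill (the lookup and the mapping bookkeeping) is replaced here by hypothetical stubs.

/* Illustrative interposition sketch; build as a shared library and preload it.
 * The globals and fd_is_xip() are stand-ins for the "Hub" bookkeeping, which
 * in a real system would be filled in at open() time. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <string.h>
#include <unistd.h>

static char  *xip_base;      /* base of the mmap()ed file, set by the "Hub" */
static size_t xip_offset;    /* current file offset for that descriptor */

static int fd_is_xip(int fd) /* hypothetical lookup; always "no" in this stub */
{
    (void)fd;
    return xip_base != NULL;
}

ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    if (fd_is_xip(fd)) {
        memcpy(buf, xip_base + xip_offset, count);  /* user-mode copy, no kernel entry */
        xip_offset += count;
        return (ssize_t)count;
    }
    return real_read(fd, buf, count);               /* normal path for everything else */
}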
Compared with the approach of the Linux community, Quill reduces the need for file system refactoring. However, a thorough comparison between the effective performance of the two approaches does not exist. While both these approaches are surely focused on guaranteeing the needed efficiency, the management of safety against operating system crashes is not well documented, and most probably is up to the design of the manager. As a further remark, both the previous approaches rely on a driver that acts both as the manager and as the translator. This design is indeed a requirement, since common file systems expect to use a block device and, hence, a block device driver. Regarding consistency, the semantics offered depends on the actual file system used. Nevertheless, these solutions are flexible and simple enough to be used as a springboard towards a quick adoption of persistent memories in operating systems.
2.3.6 Persistent-memory file systems
A personal conviction of the author is that, since software latencies are problematic at high speed, sooner or later a file system specifically designed for memory storage will be needed in order to further increase performance; this would save the latency spent in optimizations tailored to common spinning disks and gain latency from a design suited to memory. The next approaches show the work done by researchers to develop file systems specifically designed for byte-addressable persistent memories: BPFS [23], PRAMFS [95], PMFS [25], SCMFS [142].

As a general observation, the following approaches are a step further towards the exploitation of persistent memories. Even if the specific features vary between them, these approaches are the result of efforts made to offer a larger set of features, especially those needed in a file system used in "production" environments: safety and consistency.
BPFS
BPFS is an experimental file system developed by researchers at Microsoft. The literature about it focuses on the internal structure of the file system and on the proposal of two important hardware modifications intended to permit fast use of its features. This file system aims at providing strong consistency guarantees using a design similar to that used in WAFL. Like WAFL, BPFS uses a tree-like structure that starts from a root inode: this design permits the update of an arbitrary portion of the tree with a single pointer write. However, the BPFS researchers argued that the mechanism used in WAFL to perform file system updates was too expensive (each update triggered a cascade of copy-on-write operations from the modified location up to the root of the file system tree). This remark led their work towards the proposal of "short-circuit shadow paging", i.e. a technique that adaptively uses three different update approaches: in-place writes, in-place appends, and partial copy-on-write.
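To make the mechanism concrete, the following is a minimal sketch of the general idea behind shadow paging: the modification is applied to a fresh copy of the affected block and then committed by overwriting a single pointer, which for an aligned 8-byte store on x86-64 is atomic. The stand-in allocator and the omission of cache flushing are simplifications of this sketch, not BPFS internals.

/* Copy-on-write update committed with one pointer write. */
#include <string.h>

struct block { char bytes[512]; };

static struct block pool[16];                 /* stand-in for a persistent allocator */
static int next_free;
static struct block *alloc_block(void) { return &pool[next_free++]; }

/* 'slot' is the persistent pointer held, for example, by a parent tree node. */
void cow_update(struct block **slot, const void *newdata, size_t off, size_t len)
{
    struct block *copy = alloc_block();       /* shadow copy of the old block */
    memcpy(copy, *slot, sizeof *copy);
    memcpy(copy->bytes + off, newdata, len);  /* apply the modification to the copy */

    /* ...the copy would be flushed to persistent memory here... */

    *slot = copy;                             /* single pointer write commits the update */
}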
This technique permits efficient writes, along with a management of operations focused on consistency. The side effect of these choices, however, is the loss of WAFL's powerful snapshot management. Another specificity of this approach is the proposal of two hardware modifications. Usually this type of request is to be avoided, for the simple reason that it easily goes unheard: hardware modifications are very expensive and happen only when the profitability is certain. However, one of the two proposed modifications is indeed very interesting: it tries to address the problem of memory write reordering. As previously described, the problem can currently be managed by flushing the cache (in tandem with the mfence instruction), but this approach is limited, as it considerably lowers performance. In [23] it is proposed to add to the hardware (as new instructions, similar to mfence) a mechanism that allows programmers to set ordering constraints in the L1, L2 and L3 caches: "epoch barriers". An "epoch" would be "a sequence of writes to persistent memory from the same thread, delimited by a new form of memory barrier issued by software. An epoch that contains dirty data that is not yet reflected to BPRAM is an in-flight epoch; an in-flight epoch commits when all of the dirty data written during that epoch is successfully written back to persistent storage. The key invariant is that when a write is issued to persistent storage, all writes from all previous epochs must have already been committed to the persistent storage, including any data cached in volatile buffers on the memory chips themselves. So, as long as this invariant is maintained, an epoch can remain in-flight within the cache subsystem long after the processor commits the memory barrier that marks the end of that epoch, and multiple epochs can potentially be in flight within the cache subsystem at each point in time. Writes can still be reordered within an epoch, subject to standard reordering constraints" [23]. This behavior would be achieved through the insertion of two new fields into the caches: a persistence bit and an epoch identification pointer. The other proposal made in [23] consists in adding capacitance to the RAM modules, in order to guarantee the effective completion of each write request already entrusted to the memory modules; this proposal is similar to those found in [56]. The BPFS proposal brings both lights and shadows. The lights are surely related to the care taken in this design and to the search for a solid consistency mechanism; moreover, the design is well adapted to memory access patterns. The shadows consist primarily in the fact that some features are not mentioned: security issues are not taken into account, and neither is a revision of the mmap() system call (XIP functionality). Moreover, BPFS relies on hypothetical hardware changes whose concrete realization is uncertain. Another issue is a certain amount of opacity about some details of the implementation that was actually built: [23] points out that the solution was developed on the Windows operating system platform, and that there is a complete port for Linux FUSE, but no further details are given. The problem of the depth of documentation arises also in other proposals, such as the one that follows (PRAMFS), but in this case some information is missing completely: for example, it cannot be clearly identified where management and semantic translation happen. The most likely answer is in a driver, as in the previous scenarios, but this is indeed a guess.
PRAMFS
The next proposal, PRAMFS, comes from the open source community. It consists in a much more classical file system design, along with the required features for XIP functionality, direct I/O access and protection against operating system crashes. It is a simpler approach than BPFS, but it tries to address more issues. Memory protection is achieved using the current virtual memory infrastructure, marking the pages in the TLB and in the page tables as read-only and changing the permissions only when strictly needed. This method seems the same as that of [19]. Concerning management and semantic translation, this approach seems better than the previous ones: these two fundamental functions are executed directly in the file system itself, without the intervention of a block device driver. To be exact, the documentation in this regard is not completely clear, but some clues from the PRAMFS documentation and from [60] confirm what has just been claimed. Another clue of this behavior is the way this file system is mounted: directly, by specifying the starting physical address.
mount -t pramfs -o physaddr=... (example of the mount command)
This behavior is thus similar to that of TMPFS and RAMFS, albeit adapted to persistent memory: it is a great advantage, as it avoids the overheads attributable to block device emulation, which is, at best, an unneeded layer (see section 2.3.6). Unfortunately, perhaps reflecting the very prototypical status of this proposal, the documentation makes no mention of consistency concerns and of the relative mitigation techniques.
PMFS
The PMFS file system was developed by kernel developers of the Linux community before they started to focus on the DAX approach. In its design, the direct interaction with persistent memory is clear: the file system directly manages the persistent memory, which is reclaimed from the kernel at mount time. Both the management and the translation are executed by the file system. Although the internal design of PMFS is different from the other ones reviewed before, the intent of the developers was the same: to create a lightweight file system for SCM that gives applications support for the standard read, write and mmap operations, while offering consistency and memory optimizations to increase performance. XIP features are used to make the mmap() system call efficient. Standard I/O system calls are conveniently translated into memory operations, avoiding any data replication in the page cache. In order to offer high levels of consistency, metadata updates are executed through atomic in-place updates when feasible or, otherwise, through an undo journal, while copy-on-write is used for data updates. As in other approaches, the PMFS developers remarked the need for hardware features to help consistency, and they pointed out the same consistency issues about memory operations remarked in other papers. In order to solve these issues, they proposed the insertion of a new hardware instruction (pm_wbarrier) into the instruction set. This approach is similar to that proposed by the BPFS developers, though simpler (no cache structure is changed). Such an instruction would guarantee the durability of the stores to persistent memory that have already been flushed from the CPU caches (i.e. that the store has been effectively executed). An original approach to obtain the desired memory protection is also explained: instead of using the expensive RIO strategy of a continuous write-protect and write-unprotect cycle, it would be better to use an uninterruptible, temporary write window to protect virtual memory pages.
SCMFS
The last proposal reviewed here, SCMFS, comes from academic researchers at Texas A&M University and is presented in [142]. Proposed through a neat and thorough paper, their work is centered on a new file system developed specifically for SCMs (SCMFS indeed stands for Storage Class Memory File System). A major strength of their approach is the high integration of the file system with the current Linux memory subsystem. Such an integration could pave the way to a future use of persistent structures and data by the kernel; for this reason this proposal really seems to represent a step further towards the concepts presented in [60]. The paper describes how the team modified the BIOS, so as to advertise the presence of SCM to the operating system, and the Linux memory manager, in order to create a new memory zone (ZONE_STORAGE) that is used only through new non-volatile system calls (nvmalloc(), nvfree()). In turn, the file system uses these new system calls to allocate its structures in the persistent memory zone. Following the concepts introduced earlier, the manager is the standard Linux memory manager, and the translator is the file system, which directly uses the allocated memory. Another major strength is the tight integration with the existing virtual memory hardware infrastructure: each structure used in SCMFS makes extensive use of the virtual memory concept and has been engineered to adapt easily to the current page tables, TLBs and CPU caches. For example, each file is seen as a flat address range of the virtual address space, starting from zero up to a maximum address: this range is then remapped onto a non-contiguous set of physical memory, as happens normally with application heap and stack. The whole file system space is managed within a range of virtual addresses. "Superpages" are used to avoid an excessive use of TLB entries, and preallocation is used to save valuable time in complex memory allocation procedures. Like BPFS, SCMFS also relies on some guarantees about the ordering of memory instructions but, contrary to BPFS, it uses the slower (but viable) approach of mfence and cache flushing; in this case, the hardware modification proposed by the researchers from Microsoft would be of great help. It is claimed that this operation is performed each and every time a critical piece of information is changed, in order to achieve good consistency enforcement. However, since the section about consistency is only briefly sketched, it should be further investigated whether these consistency guarantees would be sufficient when "in production". Another interesting feature of this approach is the need for a garbage collection facility: since each file receives a "big" virtual address range, under stress circumstances it could be necessary to manage fragmentation (too many holes of unused virtual addresses). A similar need arises in the file system proposed in [122]. Despite the depth of the proposal, some areas remain quite opaque: there is no description of the implementation of the I/O system calls and of the likely page cache bypass, nor is the source code made available on the Internet. Moreover, while the focus on the extensive reuse of the virtual memory infrastructure may suggest that memory protection is enforced, this topic is nonetheless not covered.
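Purely as a hypothetical illustration (the paper's exact prototypes are not reproduced here), the sketch below shows how a file system could place a piece of its metadata in persistent memory through an nvmalloc()-style allocator; the structure, the signatures and the user-space stubs are all assumptions of this sketch, not SCMFS code.

/* Hypothetical use of an nvmalloc()-style allocator at mount time.
 * The stubs make the sketch compile in user space; in SCMFS the real
 * allocator would serve pages from ZONE_STORAGE inside the kernel. */
#include <stdlib.h>

void *nvmalloc(size_t size) { return malloc(size); }  /* stand-in for the persistent allocator */
void  nvfree(void *ptr)     { free(ptr); }

struct scm_superblock {          /* illustrative on-SCM metadata */
    unsigned long magic;
    unsigned long file_table;    /* start of the file-mapping area (virtual address) */
};

static struct scm_superblock *sb;

int scm_mount(void)
{
    sb = nvmalloc(sizeof *sb);   /* would live in ZONE_STORAGE and survive reboots */
    if (!sb)
        return -1;
    sb->magic = 0x53434d46UL;    /* "SCMF", illustrative */
    sb->file_table = 0;
    return 0;
}

int main(void) { return scm_mount(); }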
2.3.7 Further steps
Other strategies to take advantage of non-volatile memories
Until now, the classical storage dichotomy between data used by processes (heap, stack) and storage (files) has been respected: when file systems reserve a memory portion for file storage, they use it exclusively for file storage. However, persistent memory is byte addressable just as DRAM is, and its use through a file system is not the only possible "pattern" of use. Rather, as long as persistence awareness belongs to the operating system only, file systems are the only way to let applications use persistence seamlessly. In fact, file systems are fundamental to the correct execution of applications, since interaction with the file system is embedded in a plethora of programs.
However, an operating system could also use persistent memory for itself, instead of merely servicing applications. Applications would then benefit from a better job done by the operating system thanks to persistent memory, using it only indirectly and unconsciously. An operating system could therefore:

- use persistent memory as a DRAM extension, to avoid swapping user virtual memory pages or to store part of its memory structures when DRAM runs low, or both;

- use persistent memory to store a part of its data structures persistently, in order to speed up boot-up, reboots and restore cycles after power failures.
Concerning the DRAM extension, this use would expose the data moved from DRAM to persistent memory to the same security issues remarked before. It could be noted, however, that in the case of swapping the same issues arise when process data is moved to hard disks. In literature, a potential use of persistent memory as a DRAM extension has been proposed in [53] and in [38].
The second approach would use persistent memory to store persistently (not as a DRAM replacement) a part of the data structures used for the execution of the operating system. This approach somehow anticipates issues presented in the next paragraphs, where the proposals to expose persistence to applications are discussed. Here it should only be remarked that this approach, though having great appeal, brings with it many potential programming bugs. While the risks will be analyzed later along with persistence in applications, the assumption made here is that kernel code is expected to be safe: correctness, safety and quality are much more easily achievable in kernel code than in user applications. The exploitation of persistent memory to decrease boot and restore times has not been studied thoroughly yet, and the scientific literature on this topic is scarce. The issues about boot and recovery time have been met before in the WSP approach, but that approach had limitations in the real exploitation of persistence. A better approach could be the use of a mixed strategy for boot and recovery: strategic structures are recreated at each boot, to preserve system health across reboots (and to exploit volatility too), while other structures and data can be left in persistent memory, ready to be used. The work needed at each startup would thus be smaller, saving time and increasing responsiveness. A study on this topic is presented in [41].
A step further: integration
If the approaches just shown were only alternatives to the file system ones, it would be very likely that operating system developers would choose file systems: a fast file system is indeed appealing. However, those approaches are not alternatives: although integration would be a complex task, the memory architecture would permit the use of persistent memory both through a file system and for the other purposes just shown, all at the same time. Just as memory is divided into physical pages, and each physical page can be owned by different processes (through virtual address ranges), similarly some pages could be used for the file system, whereas others could be used for the operating system's own advantage. Such an architecture would permit an operating system, when booting, to execute the kernel image immediately from persistent memory, and then to load the persistent file system at boot time.
Researchers have not yet delved into this topic: it seems an area in need of further investigation. To my knowledge, [60] is the only attempt at modeling integration. A key aspect is that, in order to use persistent memory following the integration concept, there must be a facility that somehow allocates persistent memory dynamically. Such a facility would manage requests from the file system and from the kernel, as well as garbage collection (if needed) and other related activities. A first approach to integration is made implicitly in the SCMFS proposal: the standard Linux memory allocator is modified to add the new nvmalloc() and nvfree() system calls. This can indeed be one of the viable ways to achieve integration: the two system calls permit the dynamic allocation of persistent memory. Even though in SCMFS those system calls are used exclusively by the file system, the kernel itself could take advantage of them. In [60] a different choice is proposed: the management of persistent space should be up to the file system, which must then assign memory to the standard memory manager upon request. Following Oikawa's proposal, persistent storage is used for temporary DRAM substitution, not for kernel structure persistence. Oikawa models the access management in three viable alternatives: directly, indirectly, or through mmap() operations. With the direct method, the Linux memory manager directly accesses the data structures of the file system to obtain memory pages; with the indirect method, instead, a special file inside the file system is used as DRAM.
Issues about persistence-awareness in operating systems
Some final remarks conclude this part, in which persistence is exposed at the operating system level. Articles proposing file system services to exploit persistence are the majority, and the efforts made by researchers are extensive. Nonetheless, the achievements on this subject, while promising, are still experimental: software solutions offering a complete set of features do not exist yet. The same situation is reflected in the level of persistence awareness of current operating systems: an operating system that is effectively persistence-aware is still far to come. Indeed, each of the big software players is currently investing in research towards the future exploitation of the new memory technologies, but these are still intentions. Moreover, research has to be broadened and deepened in order to achieve products suitable for real "in production" scenarios: unfortunately, many issues are still not addressed properly or, worse, not even considered yet. For example, one issue still centered on file systems is the fact that the "RRR" approach is underutilized: it is only rarely applied in the articles analyzed. Surely refactoring is a concern for the DAX implementors, but the other proposals do not cite the need for a thorough analysis of the efficiency of the software stack. The risk, for example, is that kernel execution could be over-used, thus increasing the need for expensive context switches. Another issue is the fact that the discussion, until now, has been focused entirely on systems working practically stand-alone, whereas concrete needs are actually different: the paradigms just seen must be proven to behave properly also in distributed and highly replicated environments, such as those of big data centers. A first effort in this direction is presented in [147], but this topic still has to be faced in depth by researchers. Moreover, researchers must adapt the architectures conceived for persistent memories to modern computing trends: virtualization, multi-core architectures, concurrent and highly parallel computations, and so on. This branch of research in computer science will have to mature over time: the process will eventually be stimulated by the effective release of the new memory technologies on the market.
2.4 Storage Class Memory and applications
A topic related to those just shown is the level of awareness of persistence in applications and, in particular, the level of awareness of persistent-memory devices. Since operating systems are the foundations on which applications are built, many researchers have wondered whether the presence of persistent memories could be exposed to applications. Ordinarily, applications use their own virtual addresses to perform their tasks: it is therefore natural to ask whether applications could use persistent memories directly, as they do with standard DRAM.
Persistence referred to applications is not a novelty, as applications have always used files and folders to manage persistent data. Moreover, the generic concept of persistence in applications is a vast research domain: over the years many efforts have been made to permit the seamless use of persistent data structures in applications. In particular, researchers concentrated on allowing object-oriented programming languages to use persistent objects through a database back-end. Some examples are ObjectStore [44], Thor [47] and the Java Persistence API (JPA) [81, 106]. Other examples, referring to persistence in applications in general [7] and in Java in particular [6], arise from the work of researchers at the University of Glasgow during the 90s. Such efforts were inspired principally by the fact that the data structures used in programming languages are badly suited to how persistence is managed by file systems, so that applications usually perform expensive adaptation work (an example is the complex process of serialization): a direct use of persistent data structures and persistent objects in programming languages would be much easier. However, the use of database back-ends introduces some complex mapping issues. Researchers think that the new memory technologies could potentially remove those complexities, by removing both the need for a database back-end and the need for serialization: the topic thus moves from persistence in general to persistence in main memory, also in the context of user applications.
Exposing storage class memory to applications sounds attractive also for other reasons: if applications could directly address persistent-memory devices, they could use their full power without intermediation, thus removing unnecessary overheads. In turn, this approach could relieve applications from the burden of relying on the kernel to execute functions related to persistence (through a file system): this would permit large savings in terms of latency and energy. As applications naturally use memory through memory instructions, no translation would be needed, increasing the effectiveness of the approach. Moreover, this strategy would finally allow programmers to use all the research done in the past decades to optimize in-memory data structures: until now, as persistence has always been relegated to slow devices, such slowness has heavily discouraged the direct use of those data structures for persistent data. This new approach to persistence would allow programmers to use highly efficient data structures also when coping with persistence.
While the benefits could be many, the issues would be many too. A first observation is rooted in the fact that, as previously shown, the idea of using SCM as if it were just a slower DRAM has many contraindications: to be properly used, SCM should be used consciously. Moreover, a part of this "consciousness" is achieved through consistency management and enforcement, as persistence is deeply related to data consistency (consistency is what makes persistence effective): if SCM were exposed to applications, applications would have to manage consistency issues.
Another observation, related to the preceding one, refers to the average quality of the code in user applications compared to that of kernel code: usually user applications cannot be trusted as safe, while kernel code almost can. This is not secondary: currently, by relying on file system services, applications delegate to the operating system all the tasks related to the needed level of consistency. This approach has the merit of letting the programmer concentrate just on the main goal: the development of a functional application. Conversely, exposing persistence to applications would increase the number of issues that developers have to manage: they would have to use persistence knowingly. Such an approach would require extensive code restructuring and rewriting in order to profit from the potentially available performance gains.
Another important claim made by scholars is that this approach would bring the typical programming issues into the domain of persistence, yielding an even more complex scenario. For example, the reader will agree that the management of pointers is central in many programming languages; the simultaneous presence of SCM and DRAM, however, would complicate their use, and would give rise to the following kinds of pointers:

- non-volatile (NV) to NV pointers;

- NV to volatile (V) pointers;

- V to NV pointers;

- V to V pointers.
Clearly, at shutdown, only the NV memory areas would survive, exposing code that uses unsafe pointers to subtle programming bugs. Moreover, the risks of dangling pointers, memory leaks, multiple free()s and locking errors would be present as in any programming environment: the risk, however, is that, in the absence of appropriate checks, such errors could persist in time, thus becoming persistent errors.
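A minimal sketch of the NV-to-volatile hazard: a node that would live in persistent memory stores a pointer to a DRAM buffer, and after a restart that pointer dangles. pmalloc() here is a hypothetical persistent allocator, stubbed with malloc() only so that the example compiles.

/* The NV-to-V pointer hazard in miniature. */
#include <stdlib.h>
#include <string.h>

static void *pmalloc(size_t size) { return malloc(size); }  /* stand-in for an SCM allocator */

struct pnode {
    char *payload;            /* NV-to-V pointer: unsafe across restarts */
};

int main(void)
{
    struct pnode *n = pmalloc(sizeof *n);  /* would survive power-off */
    n->payload = malloc(64);               /* volatile: gone after power-off */
    strcpy(n->payload, "volatile data");

    /* After a reboot, n itself would still exist in persistent memory,
     * but n->payload would reference DRAM contents that no longer exist:
     * a dangling pointer that has itself become persistent. */
    return 0;
}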
Around 2011, two competing university research groups proposed two different approaches to expose SCM to applications: NV-Heaps [22] and Mnemosyne [135]. These two papers are considered by scholars as the two reference works on persistent-memory exploitation in applications, a fact proved by the numerous citations that both papers have in literature.

While the two proposals are quite different from each other, the goals of the two research groups were nonetheless quite similar: their intent was to build a framework for programming languages able to let applications use persistence in main memory safely. A major goal undertaken by both research groups was to guarantee high consistency and, at the same time, high performance: the former relieves the programmer from the difficult and error-prone explicit management of consistency, while the latter is necessary to use persistent memories conveniently without sacrificing their high performance excessively. A brief presentation of both approaches follows:
NV-Heaps: this proposal presents itself as the most complete one. The group that developed it intended to address most of the major problems that spring from exposing SCM to applications. As will soon be shown, however, this completeness is paid for in generality. NV-Heaps consists in a C++ library built upon a Linux kernel, and it is focused on "allowing the use of persistent, user-defined object [. . . ] as an attractive abstraction for working with non-volatile program state". The system requirements are an XIP file system, along with cache epochs (proposed by the same people behind the epochs of the BPFS file system, see section 2.3.6). The services offered by the library are extensive: pointer safety through referential integrity, flexible ACID transactions, a familiar interface (using common C++ syntax), high performance and high scalability. Each NV-Heap represents a sort of persistent domain for an application, where only safe pointers can be used: NV-to-NV pointers that cross heaps and NV-to-V pointers are avoided. Moreover, the library, through transactions, permits the correct storage of data over time while preserving performance. Concurrency-related primitives such as atomic sections and generational locks are supplied. Each NV-Heap, finally, is managed through a file abstraction: each heap is completely self-contained, "allowing the system to copy, to move or transmit them just like normal files". Applications use NV-Heaps by recurring to the library services, using the file name as a handle: the library then, in cooperation with the kernel, executes the mmap() through the XIP functionality, mapping the application virtual address space onto the effective persistent memory area used by the NV-Heap. Interaction with the kernel is used only when strictly necessary.
Mnemosyne: this proposal offers fewer features than the preceding one, but its simplicity preserves its generality. Mnemosyne too is developed as a library that offers user-mode programs the ability to use persistent memory safely. The design goals that the developers decided to follow were: first, to maximize user-mode access to persistence; then, to implement consistent updates; and finally, to rely on conventional hardware. Mnemosyne is therefore developed as a low-level interface, similar to C, that provides:

- persistent memory regions, allocatable either statically or dynamically with a pmalloc() primitive similar to malloc();

- persistence primitives to update data consistently;

- durable memory transactions.

Persistent memory regions are managed simply by extending the Linux functionality for managing memory regions, quite similarly to how the kernel is modified in the SCMFS proposal (see section 2.3.6). Consistent updates are offered through single-variable updates, append updates, shadow updates and in-place updates (one of these idioms is illustrated right after this list of approaches), whereas the implemented persistence primitives consist in a persistent heap and in a log facility. Write ordering is achieved through the simpler approach (mfence and cache flushing). Finally, transactions are offered through a compiler facility that permits the conversion of common C/C++ code into transactions.
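As an illustration, here is a plain-C sketch of the "append update" idiom listed above, written with the flush-and-fence pattern discussed earlier in this chapter; it is my own rendering of the idea, not Mnemosyne's code, and the log layout is purely illustrative.

/* Append update: write the payload first, persist it, then advance the
 * tail with one aligned 8-byte store. A crash before the tail store
 * simply leaves the old tail, so readers never see partial data. */
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>

struct plog {
    uint64_t tail;          /* number of committed bytes */
    char     data[4096];
};

static void persist(const void *p, size_t len)
{
    for (uintptr_t a = (uintptr_t)p & ~(uintptr_t)63; a < (uintptr_t)p + len; a += 64)
        _mm_clflush((const void *)a);
    _mm_mfence();
}

void plog_append(struct plog *log, const void *rec, size_t len)
{
    memcpy(log->data + log->tail, rec, len);
    persist(log->data + log->tail, len);   /* payload reaches memory first */

    log->tail += len;
    persist(&log->tail, sizeof log->tail); /* then the single-word commit */
}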
Each of the two approaches presented above has its strengths: while NV-Heaps is more thorough, Mnemosyne is more general. From the point of view of generality, the former approach is somewhat problematic, as it specifically relies on the C++ programming model. Perhaps a more general model such as the latter is preferable: its services could be used to build, upon it, more specialized libraries, each aimed at serving a different programming language; in this way, further levels of consistency could be offered. For example, referential integrity in the former approach is managed through operator overloading, but this feature is not common to all programming languages. Perhaps a layered approach would be more customizable, achieving a better fine-grained level of service.

Looking at the possible weaknesses of these approaches, the most remarkable one is probably compatibility with current applications. If this approach were the only means to exploit SCM, applications would need to be rewritten or restructured in order to benefit from such improvements; without these modifications, applications would continue to use volatile memory as they always have.
While this topic has only been sketched here, it is indeed a valuable part of the studies about persistence, and it would be worthy of further investigation. As a last remark about these approaches, I have the feeling that these works represent just the first steps on a long path: these proposals remain somehow limited, as they address only a part of the approach to persistent memories which should, finally, when the time is ripe, become a complete approach.
Conclusions
In the first chapter, current memories have been presented, along with their economical and technical limits. Afterwards, the new persistent memories have been introduced, with some details about their internals. Then my study moved from the devices to operating systems: the aim has been to understand, as well as possible, the extent of the changes that operating systems should adopt in order to make the best use of the new devices. These devices represent a real disruptive change in the field of computer memories, probably the most notable of the last decades: if exploited at their full potential, they would surely change the way storage is conceived today. To reach this appealing goal, however, operating systems should undergo a deep restructuring: the approaches seen in the preceding chapter are just the first steps in this direction. Even though those approaches have been experimental attempts, all the measurements that researchers made to verify the effective performance of their proposals confirmed a high potential return in terms of latency, throughput and other performance metrics: the results were indeed promising in each of them.
Approaching the conclusion of this work, my last intent is to leave here some personal considerations about the future work that researchers and developers will probably have to do in the next years to prepare operating systems for persistent memories.

My personal conviction is that many efforts must still be made to achieve a real persistent-memory awareness in operating systems: while each of the proposals presented here is a concrete step towards it, the goal is still far away and, up to now, such an operating system does not exist yet. As these new memories are expected to reach a first degree of maturity within the next decade, this time window could give the scientific community a reasonable period to prepare this transition in operating systems as well.
A complete solution
While each of the approaches described tries to manage a subset of the potential benefits that persistent memories could offer, what is still lacking is a complete solution. Even if persistence awareness is valuable when achieved through a fast file system or, alternatively, through application awareness, it would be even better if users and developers did not have to choose one or the other, but could benefit from both. I think that this should be one of the long-term goals of operating system research: the implementation of a complete SCM solution.

In the context of a stand-alone system, a complete approach should thus allow the seamless use of persistent memories through:

- file system services;

- kernel persistent objects and data structures;

- application persistent objects and data structures.
However, these services, if conceived to remain stand-alone, would turn out to be almost useless. Indeed, to reach an exhaustive level of completeness, a correct approach to persistence should also:

- be highly scalable;

- fit distributed environments;

- fit virtualized and cloud environments;

- adapt easily to new hardware architectures;

- behave adaptively, depending on the scale at which it is used and on the metrics by which performance is measured.
Without doubt, these "wishes" represent, in their entirety, a tough target: there is enough material for many years of research in operating systems. I nonetheless have the feeling that, slowly, one step at a time, many research domains in computer science are converging. Perhaps, at the right time, the knowledge reached in each of them will represent the "critical mass" that will permit a complete product, able to exploit the new storage class memories thoroughly.
Converging research domains
In particular, from what I have read for this work, research about persistence and persistent memories is increasingly related to some other research domains of computer science. These areas can contribute to the pursuit of a complete approach to exploiting persistent memories and, in the meanwhile, they represent factors that will influence how this goal is going to be achieved:
Changing hardware architectures: not only the memory panorama is changing; current hardware architectures are also experiencing a slow but continuous change, evolving towards platforms that use many cores, possibly different from each other, and operating system design is trying to follow these trends [10, 116]. Some new hardware approaches are being developed [103], and new operating systems, should these technologies succeed, will have to adapt to the new architectures. Moreover, it is very likely that new computing architectures will be developed taking into account the recent achievements in memory technology: such efforts would thus represent a further opportunity in the search for a thorough persistence awareness.
Database systems: the field of databases is quickly approaching main memory; as underlined before, keeping the entire dataset in memory is not new, but this trend is increasing with the use of "NoSQL" database systems, like the key-value stores used in distributed caching systems. The knowledge gained in database systems has proven fundamental in managing the consistency requirements of the approaches to persistence just seen, and it is likely that every further achievement related to in-memory databases could be reused in the context of persistent memories. Database paradigms such as the key-value one have already been hypothesized as a way to exploit persistent memories [8].
Distributed systems: the need to scale software to large sizes, such as those needed in data centers, has already motivated the use of distributed paradigms both for databases and for storage systems. Research in this branch is advancing further: currently, efforts such as RAMCloud represent an interesting approach to storage systems that use only main memory [112, 128]. These efforts too could represent a valuable and useful contribution to a further exploitation of persistent memories.
File system design: while the log-structured approach has already been cited, researchers have proposed to use it also when managing DRAM [123]. These experiences could then be applied to persistent memories as well.
Transactional memories: transactional memory represents an important research field in computer science; in the past, this approach and, more generally, the need for implicit consistency when using main memory, have been deeply investigated [34, 33]. The knowledge gained in this field has been used many times in the approaches previously presented and, most likely, each new achievement in this field has the potential to influence also the research about persistent memories and their use in operating systems.
A futuristic operating system
Thinking of a hypothetical next-generation operating system, I would imagine it built similarly to the hypervisors currently used to achieve virtualization. The storage facility would be an important piece of the hypervisor, and would be the part conceived to use persistent memory. This hypothetical storage facility would:

- behave as a database, extensively using the fast key-value paradigm: such a management of data would permit the use of variable-size items in distributed environments, abandoning fixed-size blocks. Moreover, a database-like behavior would permit the storage facility to be used as a service by many software layers: operating systems, file systems, applications, and so on. Such an approach would permit the transparent movement of the stored data as needed, thus enabling, for example, replication, scaling and caching. The most fascinating hypothesis would be the ability to scale a local (possibly persistent) heap from a local process to a distributed one, so that the same data could be used concurrently by, for example, a server cluster in a data center;
- perform data allocation following a log-structured pattern: this would make it easy to manage memory wear, and such a pattern seems well suited to being used together with key-value databases. Data allocation should permit the concurrent use of many different services (persistent objects, file systems, kernel data, and so on);

- use a highly efficient snapshot facility, similar to that used in the WAFL file system or to that proposed in [8];

- use transactions and ACID semantics to guarantee reliability at the highest levels, in order to permit usage in "production" environments;

- implement the services necessary to fit multiple distributed environments, using adaptive technologies that change behavior depending on performance, traffic and other metrics.
Final salutation
Despite these personal thoughts, and whatever the future holds for memory technologies and operating systems, I hope that my work can prove to be a tool that helps the understanding of persistent-memory awareness in operating systems.
Appendix A
Asides
A.1 General
ITRS - International Technology Roadmap for Semiconductors
ITRS, acronym of International Technology Roadmap for Semiconductors, is an international organization built on the legacy of the earlier United States national organization NTRS, the National Technology Roadmap for Semiconductors. ITRS is currently sponsored by the five leading chip manufacturing regions in the world: Europe, Japan, Korea, Taiwan and the United States. The sponsoring organizations are the semiconductor industry associations of each of those regions: ESIA, JEITA, KSIA, TSIA and SIA (respectively, the European Semiconductor Industry Association, the Japan Electronics and Information Technology industries Association, the Korea Semiconductor Industry Association, the Taiwan Semiconductor Industry Association and the Semiconductor Industry Association). Its aim is to help the semiconductor industry as a whole to maintain its profitability by offering, among other services, a thorough report, produced every two years, about the status of the semiconductor industry and its roadmap for maintaining exponential growth. This is a key document, drafted by an international committee of scientists and technologists, conveying the most exhaustive and accurate assessment of the semiconductor industry and promoting a deep and vast analysis effort on current and future semiconductor technologies.
A.2 Physics and Semiconductors
Ferroelectricity
A property of matter, usually observed in materials with certain crystal structures: these materials can be electrically polarized under the effect of an electric field, maintain the polarization when the electric field ceases, and reverse (or change) the polarization if the electric field reverses (or changes). The discovery of ferroelectricity is rooted in the studies on pyroelectric and piezoelectric properties conducted by the brothers Pierre and Paul-Jacques Curie around 1880; it was first noticed as an anomalous behavior of Rochelle salt in 1894 by F. Pockels (this salt was first separated in 1655 by Elie Seignette, an apothecary in the town of La Rochelle, France). Ferroelectricity was then named as such and identified as a specific property of matter in 1924 by W. F. G. Swann [75].
Ferromagnetism
From Encyclopedia Britannica: �physical phenomenon in which certain electri-
cally uncharged materials strongly attract others. Two materials found in nature,
lodestone (or magnetite, an oxide of iron, Fe3O4) and iron, have the ability to
acquire such attractive powers, and they are often called natural ferromagnets.
They were discovered more than 2,000 years ago, and all early scienti�c studies of
magnetism were conducted on these materials. Today, ferromagnetic materials are
used in a wide variety of devices essential to everyday life, e.g., electric motors
and generators, transformers, telephones, and loudspeakers.
Ferromagnetism is a kind of magnetism that is associated with iron, cobalt,
nickel, and some alloys or compounds containing one or more of these elements. It
also occurs in gadolinium and a few other rare-earth elements. In contrast to other
substances, ferromagnetic materials are magnetized easily, and in strong magnetic
fields the magnetization approaches a definite limit called saturation. When a field is applied and then removed, the magnetization does not return to its original value; this phenomenon is referred to as hysteresis. When heated to a certain temperature called the Curie point, which is different for each substance, ferro-
magnetic materials lose their characteristic properties and cease to be magnetic;
however, they become ferromagnetic again on cooling.
The magnetism in ferromagnetic materials is caused by the alignment patterns
of their constituent atoms, which act as elementary electromagnets. Ferromag-
netism is explained by the concept that some species of atoms possess a magnetic
moment, that is, that such an atom itself is an elementary electromagnet pro-
duced by the motion of electrons about its nucleus and by the spin of its electrons
on their own axes. Below the Curie point, atoms that behave as tiny magnets in
ferromagnetic materials spontaneously align themselves. They become oriented in
the same direction, so that their magnetic �elds reinforce each other.
One requirement of a ferromagnetic material is that its atoms or ions have
permanent magnetic moments. The magnetic moment of an atom comes from its
electrons, since the nuclear contribution is negligible. Another requirement for fer-
romagnetism is some kind of interatomic force that keeps the magnetic moments
of many atoms parallel to each other. Without such a force the atoms would be
disordered by thermal agitation, the moments of neighbouring atoms would neu-
tralize each other, and the large magnetic moment characteristic of ferromagnetic
materials would not exist.
There is ample evidence that some atoms or ions have a permanent magnetic
moment that may be pictured as a dipole consisting of a positive, or north, pole
separated from a negative, or south, pole. In ferromagnets, the large coupling
between the atomic magnetic moments leads to some degree of dipole alignment
and hence to a net magnetization.
Since 1950, and particularly since 1960, several ionically bound compounds
have been discovered to be ferromagnetic. Some of these compounds are electri-
cal insulators; others have a conductivity of magnitude typical of semiconductors.
Such compounds include chalcogenides (compounds of oxygen, sulfur, selenium,
or tellurium), halides (compounds of fluorine, chlorine, bromine, or iodine), and
their combinations. The ions with permanent dipole moments in these materials
are manganese, chromium (Cr), and europium (Eu); the others are diamagnetic.
At low temperatures, the rare-earth metals holmium (Ho) and erbium (Er) have a
nonparallel moment arrangement that gives rise to a substantial spontaneous mag-
netization. Some ionic compounds with the spinel crystal structure also possess
ferromagnetic ordering. A di�erent structure leads to a spontaneous magnetization
in thulium (Tm) below 32 kelvins (K)."
Mott transition
"Mott transition describes the transition from insulating to metallic state of a
material. It appears if the electron density and therefore the electron screening of
the coulomb potential changes.
Normally we consider a material either to be a metal or an insulator, depending
on the position of the Fermi energy within the band structure. But due to screening
a transition can take place. To understand this we consider an electron in a finite quantum well. There is only a finite number of bound states inside the well. If its
width is decreased all states move up in energy and the highest ones move outside
the well. Therefore the number of bound states decreases until a critical value is
reached. Below this width there are no more bound states. An insulating material
with a certain lattice and long distances between the atoms is considered. If the
atoms are moved closer together the electron density increases, screening of the
coulomb potential appears and the energy levels move up. After a certain point
there are no more bound states for the outer electrons and the material becomes
a metal." [90].
Tunnel Junctions
As stated in Tsymbal and Kohlstedt's paper, "The phenomenon of electron tunneling has been known since the advent of quantum mechanics, but it continues to enrich our understanding of many fields of physics, as well as offering a route toward useful devices. A tunnel junction consists of two metal electrodes separated by a nanometer-thick insulating barrier layer, as was first discussed by Frenkel in 1930. Although forbidden by classical physics, an electron is allowed to traverse a potential barrier that exceeds the electron's energy. The electron therefore has a finite probability of being found on the opposite side of the barrier. A famous example is electron tunneling in superconducting tunnel junctions, discovered by Giaever, that allowed measurement of important properties of superconductors. In the 1970s, spin-dependent electron tunneling from ferromagnetic metal electrodes across an amorphous Al2O3 film was observed by Tedrow and Meservey. The latter
discovery led Jullière to propose and demonstrate a magnetic tunnel junction in
which the tunneling current depends on the relative magnetization orientation of
the two ferromagnetic electrodes, the phenomenon nowadays known as tunneling
(or junction) magnetoresistance. New kinds of tunnel junctions may be very useful
for various technological applications. For example, magnetic tunnel junctions
have recently attracted considerable interest due to their potential application in
spin-electronic devices such as magnetic field sensors and magnetic random access memories." [132].
Tunnel junctions are thus electronic devices built from layers of potentially different materials, acting as resistive switching elements and containing at least a "tunnel barrier" element. The term "tunnel" refers to the mechanism by which electrons cross the barrier element: the passage occurs by direct tunneling, as studied in quantum mechanics. The actual resistive switching depends on the underlying physical principle, but the effect is the modulation of the electronic potential barrier between layers, resulting in changes in the resistivity of the tunneling layer(s).
Ferromagnetic Tunnel Junctions
A magnetic tunnel junction consists of a sandwich of two magnetic material layers separated by a thin barrier. One of the two magnetic layers has a fixed magnetic polarization (fixed layer), whereas in the other ferromagnetic layer (free layer) the magnetization can be switched. The magnetic polarization of the free layer interacts with the polarization of the fixed layer, changing the resistance of the tunneling layer by means of the tunneling magnetoresistance effect.
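As a quantitative side note, the standard Jullière model (Jullière is mentioned in the quotation above) relates the relative resistance change of a magnetic tunnel junction to the spin polarizations P1 and P2 of the two ferromagnetic electrodes; the expression is reported here only as an illustration:

\[ \mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} = \frac{2\,P_1 P_2}{1 - P_1 P_2} \]

where R_P and R_AP are the junction resistances with parallel and antiparallel magnetizations of the two layers.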
Ferroelectric Tunnel Junctions
Once again, as stated in Tsymbal and Kohlstedt's paper, "Yet another concept is the ferroelectric tunnel junction (FTJ), which takes advantage of a ferroelectric as the barrier material. Ferroelectrics possess a spontaneous electric polarization that can be switched by an applied electric field. This adds a new functional property to a tunnel junction, which may lead to novel, yet undiscovered electronic devices based on FTJs. The discovery of ferroelectricity goes back to 1921, approximately
when the principles of quantum mechanical electron tunneling were formulated.
The basic idea of a FTJ (called a polar switch at that time) was formulated in
1971 by Esaki et al. Owing to a reversible electric polarization, FTJs are expected
to have current-voltage characteristics different from those of conventional tunnel junctions. The electric field-induced polarization reversal of a ferroelectric barrier may have a profound effect on the conductance of a FTJ, leading to resistive switching when the magnitude of the applied field equals that of the coercive field of the ferroelectric. Indeed, the polarization reversal alters the sign of the polarization charges at a barrier-electrode interface."
A ferroelectric tunnel junction is thus a device in which two electrodes sandwich a tunnel barrier with ferroelectric properties [28]. The electric polarization of the barrier can be switched by applying an opposite electric field (sufficiently strong to reach the coercive field of the ferroelectric), causing a change in the electronic potential barrier and, in turn, a different conductance by means of the Giant Electroresistance effect [149, 125].
Field Effect Transistor
Transistors are fundamental semiconductor devices featuring three electrodes. A potential difference applied to one electrode (the gate) influences the passage of a current between the other two electrodes (source and drain). Transistors are used either as switches or as amplifiers. There are two main types of transistors:
- bipolar junction transistors
- field effect transistors
A typical field effect transistor (FET) schematic is shown in Figure A.1 [57, p. 247]. If the potential difference applied to G is below the threshold, there is no conducting channel between S and D; otherwise, a conductive channel forms between S and D, thus allowing the passage of current.
Memristor
In 1971 Leon O. Chua hypothesized the existence of these devices, a fourth basic type of electrical element alongside resistors, capacitors and inductors [21].
Figure A.1: Field effect transistor, perspective (a) and front (b). © 2001 The McGraw Companies
Fascinating studies on memristance have been undertaken since Chua's hypothesis, because this type of device could change the computing paradigm: networks of memristors can supersede transistors in the functional units of a processor and can be used to build a computing paradigm based on neural networks [140]. Researchers claim that the new persistent memories using the 2-terminal configuration are, in fact, full-fledged memristors when the switching mechanism is implicitly embedded inside them, as is the case for redox memory cells [127].
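As a side note on the definition, Chua's original formulation relates the flux linkage φ and the charge q flowing through the device; the memristance is the standard relation below, reported only as an illustration:

\[ M(q) = \frac{d\varphi}{dq}, \qquad v(t) = M\big(q(t)\big)\, i(t) \]

so the effective resistance M depends on the history of the current, which is what makes a memristor behave as a memory element.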
A.3 Operating systems
Hardware technology influences file system design
As already stated, file systems are one of the sources of added latency and reduced throughput. Moreover, it has been said that this impact differs among file systems,
depending on their internal design.
File systems are software components designed to serialize and maintain data persistently on a persistent memory device. However, for more than fifty years, these "persistent memory devices" have been identified almost exclusively with hard disks; Flash, although it appeared in 1984, is still considered a sort of newcomer. During this long time, file systems have necessarily adapted to the features of hard disks: since they need to guarantee the safe endurance of data over time, this goal can be achieved more effectively only when the internals of the memory medium are exploited (or, at least, known and taken into account). An example is the transition from the old traditional Unix file system (the one developed at Bell Labs) to the newer Unix Fast File System, later called UFS [51]:
- the "new" file system distributed inodes throughout the disk, near the data blocks they pointed to, in order to drastically reduce seek time and the need to execute random reads;
- the "new" file system was organized into cylinder groups: one of the effects was the added redundancy of replicating the superblock in such a way that it was distributed among cylinders and platters too (to obtain better resiliency upon a single platter failure). It is apparent that the physical structure of hard disks was thoroughly taken into account.
The same adaptation to the physical features of the memory media happened when Flash memories became widespread: many file systems have been specifically designed for Flash memories (JFFS, YAFFS, YAFFS2, UBIFS, and so on).
As an aside, it is interesting to note that also in the case of common SSDs or Flash USB sticks, even if the internal architecture is hidden, different file system settings can change performance because they adapt better or worse to the underlying architecture: this is the case for file system block sizes. Some file system block sizes are well suited to the erase size and to the internal block size of the Flash chips, whereas others are not [87, 92].
These observations easily explain why ext3 performs so badly when used with fast SSDs.
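As an illustration of the block-size observation above, the following minimal C sketch checks whether a partition offset and a file system block size are aligned to a Flash erase-block size; the sizes used in main are made-up example values, not figures taken from a real device.

    #include <stdio.h>
    #include <stdint.h>

    /* Returns 1 when `value` is a multiple of `unit`, i.e. accesses of that
       granularity never straddle two underlying Flash erase blocks. */
    static int is_multiple_of(uint64_t value, uint64_t unit)
    {
        return unit != 0 && value % unit == 0;
    }

    int main(void)
    {
        /* Example values only: a 4 MiB erase block, a partition deliberately
           starting at 1 MiB (misaligned) and a 4 KiB file system block size. */
        uint64_t erase_block     = 4ULL * 1024 * 1024;
        uint64_t partition_start = 1ULL * 1024 * 1024;
        uint64_t fs_block_size   = 4096;

        printf("partition start aligned to erase block: %d\n",
               is_multiple_of(partition_start, erase_block));
        printf("erase block is a whole number of FS blocks: %d\n",
               is_multiple_of(erase_block, fs_block_size));
        return 0;
    }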
Appendix B
Tables
Rank  Gartner [74]                                       IEEE [80]
1     Computing Everywhere                               Wearable devices
2     The Internet of Things (IoT)                       Internet of Anything
3     3D Printing                                        Security into software design
4     Advanced, Pervasive, Invisible Analytics           Software-defined Anything (SDx)
5     Context-Rich Systems                               Cloud security and privacy concerns grow
6     Smart Machines                                     3D Printing
7     Cloud/Client Architecture                          Predictive Analytics
8     Software-Defined Infrastructure and Applications   Embedded Computing security
9     Web-Scale IT                                       Augmented Reality Applications
10    Risk-Based Security and Self-Protection            Smartphones: new opportunities for Digital Health

Table B.1: Top 10 technology trends for 2015
Table B.2 compares the memory technologies along the following parameters: feature size (nm), cell area, read latency (ns), write/erase latency (ns), endurance, data retention and write energy (fJ). Its columns cover the baseline memories (DRAM and NAND Flash), the prototypical memories (FeRAM, STT-MRAM and PCM) and the emerging memories (redox memories and FTJ, with sub-columns labelled ECM, VCM, TCM and BNF).

Table B.2: Performance comparison between memories

Figures come from the ITRS 2013 Emerging Research Devices tables ERD3, ERD4a and ERD4b [82]. Figures about the power consumption of DRAM and Flash could contain some problems related to how the values are calculated (table ERD3).
Technology - operation    Latency (µs)   4K bit per bit (µs)   4K 64 bit (µs)
PCM - Write               0.1            409.6                 51.2
PCM - Read                0.012          49.152                6.144
Emerging memory - Write   0.02           81.92                 10.24
Emerging memory - Read    0.005          20.48                 2.56

Table B.3: 4K transfer times with PCM and other memories
Bus                Year   Transfers/s   bit/transfer   Payload     Bus transfer time   Read      Read 64 bit   Write      Write 64 bit
SATA III           2008   6G            .8             600 MB/s    6.83 µs             7.2x      0.9x          59.97x     7.5x
PCI Express gen3   2010   8G            .98            985 MB/s    4.16 µs             11.82x    1.48x         98.46x     12.31x
DDR3               2005   1333M         64             10.6 GB/s   0.39 µs             126.03x   15.75x        1050.26x   131.28x
Intel QPI          2007   6.4G          16             12.8 GB/s   0.32 µs             153.6x    19.2x         1280x      160x

Table B.4: Bus latency comparison

Bus transfer time is the time elapsed when transferring 4K at the theoretical speed of the bus. The last four columns are ratios between the memory transfer times of table B.3 and the bus transfer time.
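As a worked example of how the entries of table B.4 are obtained, using the SATA III row and the PCM write figures from table B.3:

\[ t_{bus} = \frac{4096\ \mathrm{B}}{600\ \mathrm{MB/s}} \approx 6.83\ \mu\mathrm{s}, \qquad \frac{409.6\ \mu\mathrm{s}}{6.83\ \mu\mathrm{s}} \approx 60 \]

that is, a bit-per-bit PCM write of a 4K block is roughly sixty times slower than the theoretical SATA III transfer of the same block.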
Bus                Bus transfer time   HDD latency (1000 µs / bus)   SSD latency (100 µs / bus)
SATA III           6.83 µs             146x                          14.64x
PCI Express gen3   4.16 µs             240x                          24x
DDR3               0.39 µs             2564x                         256x
Intel QPI          0.32 µs             3125x                         312x

Table B.5: HDD speed vs bus theoretical speed
Column groups: Approaches (Name, Type); Required (Manager, Translator); Optional (Efficiency, Safety, Consistency, Integration).

Name        Type                Manager      Translator           Efficiency   Safety   Consistency   Integration
Linux Std   FS+DAX              -            Block driver         No cache     Driver   File system   No
Quill       S. FS+XIP+library   -            Block driver         No cache     Driver   File system   No
BPFS        SCM FS              -            n/a                  FS tune      n/a      Yes           No
PRAMFS      SCM FS              -            File system (clue)   FS tune      Yes      No            No
PMFS        SCM FS              -            File system          FS tune      Yes      Yes           No
SCMFS       SCM FS              Memory mgr   File system          n/a          n/a      Yes           Viable: nvmalloc, nvfree

Table B.6: Persistence awareness through file systems

Linux and Quill are developed to use only standard file systems with minimal changes (DAX and XIP compliance, respectively). SCM FS: storage class memory file system, i.e. the developers have built a file system specifically suited to the features of SCMs. Such a specific design usually reflects on efficiency: these file systems are built to use SCMs efficiently; in these cases, cache avoidance is taken for granted.
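As an illustration of the cache-bypassing, file-system-mediated access pattern that the DAX and XIP approaches of table B.6 aim at, the following is a minimal C sketch; the path is a hypothetical file on a file system mounted with direct-access support, and error handling is reduced to the bare minimum.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical file on a file system mounted with direct access (DAX). */
        const char *path = "/mnt/pmem/example.dat";
        const size_t len = 4096;

        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* With DAX the mapping reaches the persistent medium directly,
           without an intermediate copy in the page cache. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        strcpy(p, "hello, persistent memory");

        /* Ask the kernel to make the update durable before unmapping. */
        if (msync(p, len, MS_SYNC) != 0)
            perror("msync");

        munmap(p, len);
        close(fd);
        return 0;
    }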
Bibliography
[1] Ameen Akel et al. �Onyx: A Protoype Phase Change Memory Storage Ar-
ray�. In: Proceedings of the 3rd USENIX Conference on Hot Topics in
Storage and File Systems. HotStorage'11. Portland, OR: USENIX Asso-
ciation, 2011, pp. 2�2. url: http://dl.acm.org/citation.cfm?id=2002
218.2002220.
[2] Louis Alex Eisner, Todor Mollov, and Steven Swanson. Quill: Exploit-
ing Fast Non-Volatile Memory by Transparently Bypassing the File System.
2013.
[3] Ross Anderson and Markus Kuhn. �Tamper Resistance: A Cautionary
Note�. In: Proceedings of the 2Nd Conference on Proceedings of the Sec-
ond USENIX Workshop on Electronic Commerce - Volume 2. WOEC'96.
Oakland, California: USENIX Association, 1996, pp. 1�1. url: http:
//dl.acm.org/citation.cfm?id=1267167.1267168.
[4] Dmytro Apalkov et al. �Spin-transfer Torque Magnetic Random Access
Memory (STT-MRAM)�. in: J. Emerg. Technol. Comput. Syst. 9.2 (2013),
pp. 13�1. url: http://doi.acm.org/10.1145/2463585.2463589.
[5] Wolfgang Arden et al. More-than-Moore. Tech. rep. ITRS, 2010. url: ht
tp://www.itrs.net/ITRS%201999-2014%20Mtgs,%20Presentations%20&%
20Links/2010ITRS/IRC-ITRS-MtM-v2%203.pdf.
[6] M. P. Atkinson et al. �An Orthogonally Persistent Java�. In: SIGMOD
Rec. 25.4 (1996), pp. 68�75. url: http://doi.acm.org/10.1145/245882
.245905.
[7] Malcolm Atkinson and Ronald Morrison. �Orthogonally Persistent Object
Systems�. In: The VLDB Journal 4.3 (1995), pp. 319�402. url: http:
//dl.acm.org/citation.cfm?id=615224.615226.
[8] Katelin A. Bailey et al. �Exploring Storage Class Memory with Key Value
Stores�. In: Proceedings of the 1st Workshop on Interactions of NVM/FLASH
with Operating Systems and Workloads. INFLOW '13. Farmington, Penn-
sylvania: ACM, 2013, pp. 4�1. url: http://doi.acm.org/10.1145/2527
792.2527799.
[9] Mary Baker et al. �Non-volatile Memory for Fast, Reliable File Systems�.
In: Proceedings of the Fifth International Conference on Architectural Sup-
port for Programming Languages and Operating Systems. ASPLOS V.
Boston, Massachusetts, USA: ACM, 1992, pp. 10�22. url: http://do
i.acm.org/10.1145/143365.143380.
[10] Andrew Baumann et al. �The Multikernel: A New OS Architecture for
Scalable Multicore Systems�. In: Proceedings of the ACM SIGOPS 22Nd
Symposium on Operating Systems Principles. SOSP '09. Big Sky, Montana,
USA: ACM, 2009, pp. 29�44. url: http://doi.acm.org/10.1145/1629
575.1629579.
[11] Tony Benavides et al. �The Enabling of an Execute-In-Place Architecture
to Reduce the Embedded System Memory Footprint and Boot Time�. In:
JCP 3.1 (2008), pp. 79�89. url: http://dx.doi.org/10.4304/jcp.3.1
.79-89.
[12] Keren Bergman et al. ExaScale Computing Study: Technology Challenges
in Achieving Exascale Systems Peter Kogge, Editor & Study Lead. 2008.
[13] R. Bez et al. �Introduction to �ash memory�. In: Proceedings of the IEEE
91.4 (2003), pp. 489�502. url: http://dx.doi.org/10.1109/JPROC.200
3.811702.
[14] Tim R. Bird. �Methods to Improve Bootup Time in Linux�. In: Proceedings
of the Linux Symposium 2004. Vol. I. 2004, pp. 79�88.
[15] Julien Borghetti et al. �Memristive switches enable stateful logic operations
via material implication�. In: Nature 464.7290 (2010), pp. 873�876. url:
http://dx.doi.org/10.1038/nature08940.
[16] Adrian M. Caul�eld et al. �Moneta: A High-Performance Storage Array
Architecture for Next-Generation, Non-volatile Memories�. In: Proceedings
of the 2010 43rd Annual IEEE/ACM International Symposium on Microar-
chitecture. MICRO '43. Washington, DC, USA: IEEE Computer Society,
2010, pp. 385�395. url: http://dx.doi.org/10.1109/MICRO.2010.33.
[17] Adrian M. Caulfield et al. Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing. In: Proceedings of the 2010 ACM/IEEE
International Conference for High Performance Computing, Networking,
Storage and Analysis. Washington, DC, USA: IEEE Computer Society,
2010, pp. 1�11. url: http://dx.doi.org/10.1109/SC.2010.56.
[18] Ting-Chang Chang et al. �Developments in nanocrystal memory�. In: Ma-
terials Today 14.12 (2011), pp. 608�615. url: http://www.sciencedirec
t.com/science/article/pii/S1369702111703029.
[19] Peter M. Chen et al. �The Rio File Cache: Surviving Operating System
Crashes�. In: Proceedings of the Seventh International Conference on Ar-
chitectural Support for Programming Languages and Operating Systems.
ASPLOS VII. Cambridge, Massachusetts, USA: ACM, 1996, pp. 74�83.
url: http://doi.acm.org/10.1145/237090.237154.
[20] Sangyeun Cho and Hyunjin Lee. �Flip-N-Write: A simple deterministic
technique to improve PRAM write performance, energy and endurance�.
In: Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM Inter-
national Symposium on. 2009, pp. 347�357.
[21] L. O. Chua. �Memristor-The missing circuit element�. In: Circuit Theory,
IEEE Transactions on 18.5 (1971), pp. 507�519. url: http://dx.doi.o
rg/10.1109/TCT.1971.1083337.
[22] Joel Coburn et al. NV-Heaps: making persistent objects fast and safe with next-generation,
non-volatile memories. Vol. 39. ACM SIGARCH Computer Architecture
News 1. 2011, pp. 105�118. url: http://dl.acm.org/citation.cfm?i
d=1950380.
[23] Jeremy Condit et al. �Better I/O through byte-addressable, persistent mem-
ory�. In: Proceedings of the ACM SIGOPS 22nd symposium on Operating
systems principles. 2009, pp. 133�146. url: http://dl.acm.org/citat
ion.cfm?id=1629589.
[24] R. H. Dennard. �Technical literature [Reprint of "Field-E�ect Transistor
Memory" (US Patent No. 3,387,286)]�. In: Solid-State Circuits Society
Newsletter, IEEE 13.1 (2008), pp. 17�25. url: http://dx.doi.org/10.
1109/N-SSC.2008.4785686.
[25] Subramanya R. Dulloor et al. �System Software for Persistent Memory�.
In: Proceedings of the Ninth European Conference on Computer Systems.
EuroSys '14. Amsterdam, The Netherlands: ACM, 2014, pp. 15�1. url:
http://doi.acm.org/10.1145/2592798.2592814.
[26] W. Enck et al. �Defending Against Attacks on Main Memory Persistence�.
In: Computer Security Applications Conference, 2008. ACSAC 2008. An-
nual. 2008, pp. 65�74. url: http://dx.doi.org/10.1109/ACSAC.2008.
45.
[27] Michael Fitsilis and Rainer Waser. �Scaling of the ferroelectric �eld ef-
fect transistor and programming concepts for non-volatile memory applica-
tions�. Aachen, Techn. Hochsch., Diss., 2005. PhD thesis. Aachen: Fakul-
tat fur Elektrotechnik und Informationstechnik der Rheinisch-Westfalischen
Technischen Hochschule Aachen, 2005. url: http://publications.rwt
h-aachen.de/record/62096.
[28] Vincent Garcia and Manuel Bibes. �Ferroelectric tunnel junctions for in-
formation storage and processing�. In: Nat Commun 5.. (2014). Review,
p. . url: http://dx.doi.org/10.1038/ncomms5289.
[29] Paolo Gargini. �The Roadmap to Success: 2013 ITRS Update�. In: Sem-
inar on 2013 ITRS Roadmap Update. 2014. url: http://www.ewh.ieee
.org/r6/scv/eds/slides/2014-Mar-11-Paolo.pdf.
[30] Bharan Giridhar et al. �Exploring DRAMOrganizations for Energy-e�cient
and Resilient Exascale Memories�. In: Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Anal-
ysis. SC '13. Denver, Colorado: ACM, 2013, pp. 23�1. url: http:
//doi.acm.org/10.1145/2503210.2503215.
[31] Theo Haerder and Andreas Reuter. �Principles of Transaction-oriented
Database Recovery�. In: ACM Comput. Surv. 15.4 (1983), pp. 287�317.
url: http://doi.acm.org/10.1145/289.291.
[32] J. Alex Halderman et al. �Lest We Remember: Cold-boot Attacks on
Encryption Keys�. In: Commun. ACM 52.5 (2009), pp. 91�98. url:
http://doi.acm.org/10.1145/1506409.1506429.
[33] Lance Hammond et al. �Transactional Memory Coherence and Consis-
tency�. In: Proceedings of the 31st Annual International Symposium on
Computer Architecture. ISCA '04. München, Germany: IEEE Com-
puter Society, 2004, pp. 102�. url: http://dl.acm.org/citation.cfm?i
d=998680.1006711.
[34] Maurice Herlihy and J. Eliot B. Moss. �Transactional Memory: Archi-
tectural Support for Lock-free Data Structures�. In: SIGARCH Comput.
Archit. News 21.2 (1993), pp. 289�300. url: http://doi.acm.org/10.1
145/173682.165164.
[35] Dave Hitz, James Lau, and Michael Malcolm. �File System Design for
an NFS File Server Appliance�. In: Proceedings of the USENIX Winter
1994 Technical Conference on USENIX Winter 1994 Technical Conference.
WTEC'94. San Francisco, California: USENIX Association, 1994, pp. 19�
19. url: http://dl.acm.org/citation.cfm?id=1267074.1267093.
[36] John E. Hopcroft, Rajeev Motwani, and Je�rey D. Ullman. Automi, lin-
guaggi e calcolabilità. Ed. by Pearson Education. Prima Edizione Italiana.
Pearson Education, 2003.
[37] H. Hunter, L. A. Lastras-Montano, and B. Bhattacharjee. �Adapting
Server Systems for New Memory Technologies�. In: Computer 47.9 (2014),
pp. 78�84. url: http://dx.doi.org/10.1109/MC.2014.233.
[38] Ju-Young Jung and Sangyeun Cho. �Dynamic Co-management of Persis-
tent RAM Main Memory and Storage Resources�. In: Proceedings of the
8th ACM International Conference on Computing Frontiers. CF '11. Is-
chia, Italy: ACM, 2011, pp. 13�1. url: http://doi.acm.org/10.1145/
2016604.2016620.
[39] Tolga Kaya and Hur Koser. �A New Batteryless Active RFID System:
Smart RFID�. in: RFID Eurasia, 2007 1st Annual. 2007, pp. 1�4. url:
http://dx.doi.org/10.1109/RFIDEURASIA.2007.4368151.
[40] Kyung Min Kim, Doo Seok Jeong, and Cheol Seong Hwang. �Nano�la-
mentary resistive switching in binary oxide system: a review on the present
status and outlook�. In: Nanotechnology 22.25 (2011), p. 254002. url:
http://dx.doi.org/10.1088/0957-4484/22/25/254002.
[41] Myungsik Kim, Jinchul Shin, and Youjip Won. �Selective Segment Ini-
tialization: Exploiting NVRAM to Reduce Device Startup Latency�. In:
Embedded Systems Letters, IEEE 6.2 (2014), pp. 33�36.
[42] Young-Jin Kim et al. �I/O Performance Optimization Techniques for Hy-
brid Hard Disk-Based Mobile Consumer Devices�. In: Consumer Elec-
tronics, IEEE Transactions on 53.4 (2007), pp. 1469�1476. url: http:
//dx.doi.org/10.1109/TCE.2007.4429239.
[43] B. T. Kolomiets. �Vitreous Semiconductors (I)�. in: physica status solidi
(b) 7.2 (1964), pp. 359�372. url: http://dx.doi.org/10.1002/pssb.19
640070202.
[44] Charles Lamb et al. �The ObjectStore Database System�. In: Commun.
ACM 34.10 (1991), pp. 50�63. url: http://doi.acm.org/10.1145/1252
23.125244.
[45] Simon Lavington. �In the Footsteps of Colossus: A Description of Oedipus�.
In: IEEE Ann. Hist. Comput. 28.2 (2006), pp. 44�55. url: http:
//dx.doi.org/10.1109/MAHC.2006.34.
[46] Benjamin C. Lee et al. �Architecting Phase Change Memory As a Scalable
Dram Alternative�. In: Proceedings of the 36th Annual International Sym-
posium on Computer Architecture. ISCA '09. Austin, TX, USA: ACM,
2009, pp. 2�13. url: http://doi.acm.org/10.1145/1555754.1555758.
[47] B. Liskov et al. �Safe and E�cient Sharing of Persistent Objects in Thor�.
In: Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data. SIGMOD '96. Montreal, Quebec, Canada: ACM,
1996, pp. 318�329. url: http://doi.acm.org/10.1145/233269.233346.
[48] T. P. Ma and Jin-Ping Han. �Why is nonvolatile ferroelectric memory
�eld-e�ect transistor still elusive?� In: Electron Device Letters, IEEE 23.7
(2002), pp. 386�388. url: http://dx.doi.org/10.1109/LED.2002.1015
207.
[49] F. Masuoka et al. �A new �ash E2PROM cell using triple polysilicon tech-
nology�. In: Electron Devices Meeting, 1984 International. Vol. 30. 1984,
pp. 464�467. url: http://dx.doi.org/10.1109/IEDM.1984.190752.
[50] Brian Matas and Christian De Suberbasaux. MEMORY 1997. Ed. by In-
tegrated Circuit Engineering Corporation. Integrated Circuit Engineering,
1997. url: http://smithsonianchips.si.edu/ice/cd/MEMORY97/titl
e.pdf.
[51] Marshall K. McKusick et al. �A Fast File System for UNIX�. in: ACM
Trans. Comput. Syst. 2.3 (1984), pp. 181�197. url: http://doi.acm.or
g/10.1145/989.990.
[52] Stephan Menzel et al. �Switching kinetics of electrochemical metallization
memory cells�. In: Phys. Chem. Chem. Phys. 15.18 (2013), pp. 6945�
6952. url: http://dx.doi.org/10.1039/C3CP50738F.
[53] Je�rey C. Mogul et al. �Operating System Support for NVM+DRAM Hy-
brid Main Memory�. In: Proceedings of the 12th Conference on Hot Top-
ics in Operating Systems. HotOS'09. Monte Verità, Switzerland:
USENIX Association, 2009, pp. 14�14. url: http://dl.acm.org/citat
ion.cfm?id=1855568.1855582.
[54] G. E. Moore. �No exponential is forever: but "Forever" can be delayed!
[semiconductor industry]�. In: Solid-State Circuits Conference, 2003. Di-
gest of Technical Papers. ISSCC. 2003 IEEE International. 2003, pp. 20�
23. url: http://dx.doi.org/10.1109/ISSCC.2003.1234194.
[55] O. Mutlu. �Memory scaling: A systems architecture perspective�. In: Mem-
ory Workshop (IMW), 2013 5th IEEE International. 2013, pp. 21�25. url:
http://dx.doi.org/10.1109/IMW.2013.6582088.
[56] Dushyanth Narayanan and Orion Hodson. Whole-system persistence. Vol. 40. ACM SIGARCH Computer Architec-
ture News 1. 2012, pp. 401�410. url: http://dl.acm.org/citation.cf
m?id=2151018.
[57] Donald A. Neamen. Electronic Circuit Analysis and Design. Ed. by Mc-
GrawHill. 2nd. McGrawHill, 2000.
[58] Rajesh Nishtala et al. �Scaling Memcache at Facebook�. In: Presented
as part of the 10th USENIX Symposium on Networked Systems Design
and Implementation (NSDI 13). Lombard, IL: USENIX, 2013, pp. 385�
398. url: https://www.usenix.org/conference/nsdi13/technical-s
essions/presentation/nishtala.
[59] S. Oikawa. �Virtualizing Storage as Memory for High Performance Storage
Access�. In: Parallel and Distributed Processing with Applications (ISPA),
2014 IEEE International Symposium on. 2014, pp. 18�25. url: http:
//dx.doi.org/10.1109/ISPA.2014.12.
[60] Shuichi Oikawa. �Non-volatile main memory management methods based
on a �le system�. In: SpringerPlus 3.1 (2014), p. 494. url: http://www.
springerplus.com/content/3/1/494.
[61] [online]. 3D NAND: Bene�ts of Charge Traps over Floating Gates. 2013.
url: http://thememoryguy.com/3d-nand-benefits-of-charge-traps
-over-floating-gates/.
[62] [online]. Amazon ElastiCache. last accessed: 2015. url: http://aws.am
azon.com/elasticache/.
[63] [online]. An idiosyncratic survey of Spintronics. last accessed: 2015. url:
https://physics.tamu.edu/calendar/talks/cmseminars/cm_talks/200
7_10_18_Levy_P.pdf.
[64] [online]. BEE3. last accessed: 2015. url: http://research.microsoft
.com/en-us/projects/bee3/.
[65] [online]. Big Data. last accessed: 2015. url: http://lookup.computerl
anguage.com/host_app/search?cid=C999999&term=Big%20Data.
[66] [online]. Comparing Technologies: MRAM vs. FRAM. last accessed: 2015.
url: http://www.everspin.com/PDF/EST02130_Comparing_Technologie
s_FRAM_vs_MRAM_AppNote.pdf.
[67] [online]. DARPA Developing ExtremeScale Supercomputer System. 2010.
url: http://www.darpa.mil/WorkArea/DownloadAsset.aspx?id=1795.
[68] [online]. Datacenter Construction Expected To Boom. 2014. url: http://
www.enterprisetech.com/2014/04/17/datacenter-construction-expec
ted-boom/.
[69] [online]. Dell PowerEdge R920 Data Sheet. last accessed: 2015. url: ht
tp://i.dell.com/sites/doccontent/shared-content/data-sheets/en
/Documents/PowerEdge_R920_Spec-Sheet.pdf.
[70] [online]. European Exascale Software Initiative [Home Page]. 2013. url:
http://www.eesi-project.eu/pages/menu/homepage.php.
[71] [online]. FRAM Structure. last accessed: 2015. url: http://www.fujits
u.com/global/products/devices/semiconductor/memory/fram/overvi
ew/structure/.
[72] [online]. Fundamentals of volatile memory technologies. 2011. url: http:
//www.electronicproducts.com/Digital_ICs/Memory/Fundamentals_o
f_volatile_memory_technologies.aspx.
[73] [online]. Further adventures in non-volatile memory. last accessed: 2015.
url: https://www.youtube.com/watch?v=UzsPnw11KX0.
[74] [online]. Gartner Identi�es the Top 10 Strategic Technology Trends for
2015. 2014. url: http://www.gartner.com/newsroom/id/2867917.
[75] [online]. History of ferroelectrics. last accessed: 2015. url: http://www.
ieee-uffc.org/ferroelectrics/learning-e003.asp.
[76] [online]. How Does Flash Memory Store Data? last accessed: 2015. url:
https://product.tdk.com/info/en/techlibrary/archives/techjourn
al/vol01_ssd/contents03.html.
[77] [online]. HP and SK Hynix Cancel Plans to Commercialize Memristor-
Based Memory in 2013. 2012. url: http://www.xbitlabs.com/news/st
orage/display/20120927125227_HP_and_Hynix_Cancel_Plans_to_Comme
rcialize_Memristor_Based_Memory_in_2013.html.
[78] [online]. Hybrid Memory Cube Consortium - Home Page. last accessed:
2015. url: http://www.hybridmemorycube.org/.
[79] [online]. IBM 350 disk storage unit. last accessed: 2015. url: http://ww
w-03.ibm.com/ibm/history/exhibits/storage/storage_350.html.
[80] [online]. IEEE-CS Unveils Top 10 Technology Trends for 2015. 2014. url:
http://www.computer.org/web/pressroom/2015-tech-trends.
[81] [online]. Introduction to the Java Persistence API. last accessed: 2015.
url: http://docs.oracle.com/javaee/6/tutorial/doc/bnbpz.html.
[82] [online]. ITRS 2013 ERD TABLES. last accessed: 2015. url: https://ww
w.dropbox.com/sh/2fme4y0avvv7uxs/AAAB10oeC7wNtQkFp5XAcenba/ITR
S/2013ITRS/2013ITRS%20Tables_R1/ERD_2013Tables.xlsx?dl=0.
[83] [online]. ITRS 2013 EXECUTIVE SUMMARY. last accessed: 2015. url:
http://www.itrs.net/ITRS%201999-2014%20Mtgs,%20Presentations%20&%
20Links/2013ITRS/2013Chapters/2013ExecutiveSummary.pdf.
[84] [online]. ITRS ERD 2013 REPORT. last accessed: 2015. url: https://ww
w.dropbox.com/sh/6xq737bg6pww9gq/AAAXRzGlUis1sVUxurZnMCY4a/201
3ERD.pdf?dl=0.
[85] [online]. Log-structured �le systems. 2009. url: http://lwn.net/Articl
es/353411/.
[86] [online]. Magnetic Core Memory. last accessed: 2015. url: http://www.
computerhistory.org/revolution/memory-storage/8/253.
[87] [online]. Managing �ash storage with Linux. 2012. url: http://free-e
lectrons.com/blog/managing-flash-storage-with-linux/.
[88] [online]. Mechanical roadmap points to hard drives over 100TB by 2025.
2014. url: http://techreport.com/news/27420/mechanical-roadmap
-points-to-hard-drives-over-100tb-by-2025.
[89] [online]. More Details on Today's Outage. 2010. url: https://www.face
book.com/notes/facebook-engineering/more-details-on-todays-out
age/431441338919.
[90] [online]. Mott transition. last accessed: 2015. url: http://lamp.tu-gra
z.ac.at/~hadley/ss2/problems/mott/s.pdf.
[91] [online]. NVM Express and the PCI Express SSD Revolution. 2012. url:
http://www.nvmexpress.org/wp-content/uploads/2013/04/IDF-2012-N
VM-Express-and-the-PCI-Express-SSD-Revolution.pdf.
[92] [online]. Optimizing Linux with cheap �ash drives. 2011. url: http:
//lwn.net/Articles/428584/.
[93] [online]. [PATCH v10 11/21] Replace XIP documentation with DAX. last
accessed: 2015. url: http://lwn.net/Articles/610316/.
[94] [online]. PCM BECOMES A REALITY. 2009. url: http://www.object
ive-analysis.com/uploads/2009-08-03_Objective_Analysis_PCM_Whi
te_Paper.pdf.
[95] [online]. Protected and Persistent RAM Filesystem. last accessed: 2015.
url: http://pramfs.sourceforge.net/.
[96] [online]. Samsung 850 PRO Speci�cations. last accessed: 2015. url: http:
//www.samsung.com/global/business/semiconductor/minisite/SSD/g
lobal/html/ssd850pro/specifications.html.
[97] [online]. Seagate preps for 30TB laser-assisted hard drives. 2014. url: ht
tp://www.computerworld.com/article/2846415/seagate-preps-for-3
0tb-laser-assisted-hard-drives.html.
[98] [online]. Solid Memory by Toshiba. last accessed: 2015. url: http://www.
toshiba-memory.com/cms/en/meta/memory_division/about_us.html.
[99] [online]. Supporting �lesystems in persistent memory. last accessed: 2015.
url: http://lwn.net/Articles/610174/.
[100] [online]. The Discovery of Giant Magnetoresistance. 2007. url: http://
www.nobelprize.org/nobel_prizes/physics/laureates/2007/advance
d-physicsprize2007.pdf.
[101] [online]. The High-k Solution. 2007. url: http://spectrum.ieee.org/
semiconductors/design/the-highk-solution.
[102] [online]. The Inconvenient Truths of NAND Flash Memory. 2007. url: ht
tps://www.micron.com/~/media/documents/products/presentation/fl
ash_mem_summit_jcooke_inconvenient_truths_nand.pdf.
[103] [online]. The Machine: A new kind of computer. last accessed: 2015. url:
http://www.hpl.hp.com/research/systems-research/themachine/.
[104] [online]. The Transition to PCI Express for Client SSDs. 2012. url: http:
//www.flashmemorysummit.com/English/Collaterals/Proceedings/20
12/20120821_S102C_Huffman.pdf.
[105] [online]. Ultrastar He8. last accessed: 2015. url: http://www.hgst.com
/hard-drives/enterprise-hard-drives/enterprise-sas-drives/ultr
astar-he8.
[106] [online]. Understanding JPA. 2008. url: http://www.javaworld.com/ar
ticle/2077817/java-se/understanding-jpa-part-1-the-object-orien
ted-paradigm-of-data-persistence.html?null.
[107] [online]. Understanding Moore's Law: Four Decades of Innovation. 2006.
url: http://www.chemheritage.org/community/store/books-and-cat
alogs/understanding-moores-law.aspx.
[108] [online]. Ushering in the 3D Memory Era with V- NAND. 2013. url: http:
//www.flashmemorysummit.com/English/Collaterals/Proceedings/20
13/20130813_KeynoteB_Elliot_Jung.pdf.
[109] [online]. WD Black - Mobile Hard Drives. last accessed: 2015. url: http:
//www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771435.pdf.
[110] [online]. Whole System Persistence Computer based on NVDIMM. 2014.
url: https://www.youtube.com/watch?v=gFuXn2QHXWo.
[111] [online]. ZFS THE LAST WORD IN FILE SYSTEMS. 2008. url: http:
//lib.stanford.edu/files/pasig-spring08/RaymondClark_ZFS_Overi
view.pdf.
[112] John Ousterhout et al. �The Case for RAMCloud�. In: Commun. ACM
54.7 (2011), pp. 121�130. url: http://doi.acm.org/10.1145/1965724.
1965751.
[113] Stanford R. Ovshinsky. �Reversible Electrical Switching Phenomena in
Disordered Structures�. In: Phys. Rev. Lett. 21.20 (1968), pp. 1450�1453.
url: http://link.aps.org/doi/10.1103/PhysRevLett.21.1450.
[114] Stuart S. P. Parkin, Masamitsu Hayashi, and Luc Thomas. �Magnetic
domain-wall racetrack memory.� In: Science (New York, N.Y.) 320.5873
(2008), pp. 190�194. url: http://www.ncbi.nlm.nih.gov/pubmed/?ter
m=18403702.
[115] David A. Patterson. �Latency Lags Bandwith�. In: Commun. ACM 47.10
(2004), pp. 71�75. url: http://doi.acm.org/10.1145/1022594.102259
6.
[116] Simon Peter et al. �Arrakis: The Operating System is the Control Plane�.
In: 11th USENIX Symposium on Operating Systems Design and Implemen-
tation (OSDI 14). Broom�eld, CO: USENIX Association, 2014, pp. 1�
16. url: https://www.usenix.org/conference/osdi14/technical-s
essions/presentation/peter.
[117] P. A. H. Peterson. �Cryptkeeper: Improving security with encrypted
RAM�. in: Technologies for Homeland Security (HST), 2010 IEEE Inter-
national Conference on. 2010, pp. 120�126. url: http://dx.doi.org/1
0.1109/THS.2010.5655081.
[118] Moinuddin K. Qureshi et al. �Enhancing Lifetime and Security of PCM-
based Main Memory with Start-gap Wear Leveling�. In: Proceedings of the
42Nd Annual IEEE/ACM International Symposium on Microarchitecture.
MICRO 42. New York, New York: ACM, 2009, pp. 14�23. url: http:
//doi.acm.org/10.1145/1669112.1669117.
[119] D. C. Ralph and M. D. Stiles. �Spin transfer torques�. In: Journal of Mag-
netism and Magnetic Materials 320.7 (2008), pp. 1190�1216. url: http:
//www.sciencedirect.com/science/article/pii/S0304885307010116.
[120] Simone Raoux, Welnic Wojciech, and Daniele Ielmini. �Phase Change
Materials and Their Application to Nonvolatile Memories�. In: Chemi-
cal Reviews 110.1 (2010). PMID: 19715293, pp. 240�267. url: http:
//dx.doi.org/10.1021/cr900040x.
[121] Ohad Rodeh, Josef Bacik, and Chris Mason. �BTRFS: The Linux B-Tree
Filesystem�. In: Trans. Storage 9.3 (2013), pp. 9�1. url: http://doi.ac
m.org/10.1145/2501620.2501623.
[122] Mendel Rosenblum and John K. Ousterhout. �The Design and Implementa-
tion of a Log-structured File System�. In: ACM Trans. Comput. Syst. 10.1
(1992), pp. 26�52. url: http://doi.acm.org/10.1145/146941.146943.
[123] Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout. Log-structured Memory for DRAM-based Storage. In: Proceedings of the 12th
USENIX Conference on File and Storage Technologies. Santa Clara, CA:
USENIX, 2014, pp. 1�16. url: https://www.usenix.org/conference/
fast14/technical-sessions/presentation/rumble.
[124] K. Sakui. �Professor Fujio Masuoka's Passion and Patience Toward Flash
Memory�. In: Solid-State Circuits Magazine, IEEE 5.4 (2013), pp. 30�33.
url: http://dx.doi.org/10.1109/MSSC.2013.2278084.
[125] Rohit Soni et al. �Giant electrode e�ect on tunnelling electroresistance in
ferroelectric tunnel junctions�. In: Nat Commun 5.. (2014). Article, p. .
url: http://dx.doi.org/10.1038/ncomms6414.
[126] D. B. Strukov and H. Kohlstedt. �Resistive switching phenomena in thin
�lms: Materials, devices, and applications�. In: MRS Bulletin 37.02 (2012),
pp. 108�114. url: http://journals.cambridge.org/article_S088376
9412000024.
[127] Dmitri B. Strukov et al. �The missing memristor found�. In: Nature 453.7191
(2008), pp. 80�83. url: http://dx.doi.org/10.1038/nature06932.
[128] Ryan Stutsman and John Ousterhout. �Toward Common Patterns for Dis-
tributed, Concurrent, Fault-Tolerant Code�. In: Presented as part of the
14th Workshop on Hot Topics in Operating Systems. Santa Ana Pueblo,
NM: USENIX, 2013. url: https://www.usenix.org/toward-common-p
atterns-distributed-concurrent-fault-tolerant-code.
[129] S. Swanson and A. M. Caul�eld. �Refactor, Reduce, Recycle: Restructur-
ing the I/O Stack for the Future of Storage�. In: Computer 46.8 (2013),
pp. 52�59. url: http://dx.doi.org/10.1109/MC.2013.222.
[130] Andrew S. Tanenbaum and Albert S. Woodhull. Operating Systems - De-
sign and Implementation. Ed. by Pearson International. 3rd. International,
Pearson, 2009.
[131] Junji Tominaga et al. �Large Optical Transitions in Rewritable Digital Ver-
satile Discs: An Interlayer Atomic Zipper in a SbTe Alloy�. In: Sym-
posium G � Phase-Change Materials for Recon�gurable Electronics and
Memory Applications. Vol. 1072. MRS Proceedings. 2008. url: http:
//journals.cambridge.org/article_S1946427400030414.
[132] Evgeny Y. Tsymbal and Hermann Kohlstedt. �Tunneling Across a Ferro-
electric�. In: Science 313.5784 (2006), pp. 181�183. url: http://www.sc
iencemag.org/content/313/5784/181.short.
[133] Julian Turner. �E�ects of Data Center Vibration on Compute System Per-
formance�. In: Proceedings of the First USENIX Conference on Sustainable
Information Technology. SustainIT'10. San Jose, CA: USENIX Associa-
tion, 2010, pp. 5�5. url: http://dl.acm.org/citation.cfm?id=186315
9.1863164.
[134] P. Vettiger et al. �The "millipede" - nanotechnology entering data storage�.
In: Nanotechnology, IEEE Transactions on 1.1 (2002), pp. 39�55. url:
http://dx.doi.org/10.1109/TNANO.2002.1005425.
[135] Haris Volos, Andres Jaan Tack, and Michael M. Swift. Mnemosyne: Lightweight persistent memory. Vol. 39. ACM SIGARCH
Computer Architecture News 1. 2011, pp. 91�104. url: http://dl.acm
.org/citation.cfm?id=1950379.
[136] Yiqun Wang et al. "A 3us wake-up time nonvolatile processor based on ferro-
electric �ip-�ops�. In: ESSCIRC. IEEE, 2012, pp. 149�152. url: http://
ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6331297.
[137] Rainer Waser et al. �Redox-Based Resistive Switching Memories � Nanoionic
Mechanisms, Prospects, and Challenges�. In: Advanced Materials 21.25-26
(2009), pp. 2632�2663. url: http://dx.doi.org/10.1002/adma.200900
375.
[138] H. A. R. Wegener et al. �The variable threshold transistor, a new electrically-
alterable, non-destructive read-only storage device�. In: Electron Devices
Meeting, 1967 International. Vol. 13. 1967, pp. 70�70. url: http:
//dx.doi.org/10.1109/IEDM.1967.187833.
[139] S. A. Wolf et al. �Spintronics: A Spin-Based Electronics Vision for the
Future�. In: Science 294.5546 (2001), pp. 1488�1495. url: http://www.
sciencemag.org/content/294/5546/1488.abstract.
[140] C. David Wright, Peiman Hosseini, and Jorge A. Vazquez Diosdado. �Be-
yond von-Neumann Computing with Nanoscale Phase-Change Memory De-
vices�. In: Advanced Functional Materials 23.18 (2013), pp. 2248�2254.
url: http://dx.doi.org/10.1002/adfm.201202383.
[141] Michael Wu and Willy Zwaenepoel. �eNVy: A Non-volatile, Main Memory
Storage System�. In: Proceedings of the Sixth International Conference on
Architectural Support for Programming Languages and Operating Systems.
ASPLOS VI. San Jose, California, USA: ACM, 1994, pp. 86�97. url:
http://doi.acm.org/10.1145/195473.195506.
[142] Xiaojian Wu, Sheng Qiu, and A. L. Narasimha Reddy. �SCMFS: A File
System for Storage Class Memory and Its Extensions�. In: Trans. Storage
9.3 (2013), pp. 7�1. url: http://doi.acm.org/10.1145/2501620.2501
621.
[143] Wm A. Wulf and Sally A. McKee. �Hitting the Memory Wall: Implications
of the Obvious�. In: SIGARCH Comput. Archit. News 23.1 (1995), pp. 20�
24. url: http://doi.acm.org/10.1145/216585.216588.
[144] Yuan Xie. �Modeling, Architecture, and Applications for Emerging Mem-
ory Technologies�. In: Design Test of Computers, IEEE 28.1 (2011), pp. 44�
51.
[145] J. Joshua Yang et al. �Metal oxide memories based on thermochemical and
valence change mechanisms�. In: MRS Bulletin 37.02 (2012), pp. 131�137.
url: http://journals.cambridge.org/article_S0883769411003563.
[146] Jisoo Yang, Dave B. Minturn, and Frank Hady. �When Poll is Better Than
Interrupt�. In: Proceedings of the 10th USENIX Conference on File and
Storage Technologies. FAST'12. San Jose, CA: USENIX Association, 2012,
pp. 3�3. url: http://dl.acm.org/citation.cfm?id=2208461.2208464.
[147] Yiying Zhang et al. �Mojim: A Reliable and Highly-Available Non-Volatile
Memory System�. In: ASPLOS '15, March 14�18, 2015, Istanbul, Turkey.
2015.
[148] Ping Zhou et al. �A Durable and Energy E�cient Main Memory Using
Phase Change Memory Technology�. In: Proceedings of the 36th Annual
International Symposium on Computer Architecture. ISCA '09. Austin,
TX, USA: ACM, 2009, pp. 14�23. url: http://doi.acm.org/10.1145/
1555754.1555759.
[149] M. Ye. Zhuravlev et al. �Giant Electroresistance in Ferroelectric Tunnel
Junctions�. In: Phys. Rev. Lett. 94 (24 2005), p. 246802. url: http:
//link.aps.org/doi/10.1103/PhysRevLett.94.246802.
Acknowledgments
The idea for this work was conceived thanks to Professor De Paoli, who told me about the new memories that are its subject: I would like to thank him for both the idea and the trust he placed in me.
Professor Mariani was asked to act as advisor for this final work: I would like to thank him for his advice and his helpfulness, which have been valuable.
Many thanks to all the people who supported me during this tough time.