Upload
piers-reeves
View
212
Download
0
Embed Size (px)
Citation preview
Apr 17, 2013
Persistent Data Structures
Definitions
An immutable data structure is one that, once created, cannot be modified Immutable data structures can (usually) be copied, with modifications, to
create a new version The modified version takes up as much memory as the original version
A persistent data structure is one that, when modified, retains both the old and the new values Persistent data structures are effectively immutable, in that prior references
to it do not see any change Modifying a persistent data structure may copy part of the original, but the
new version shares memory with the original
This definition is unrelated to persistent storage, which means keeping a copy of data on disk between program executions
Why persistent data structures?
Functional programming is based on the idea of immutable data—or persistent data, which is effectively immutable
The use of immutable data structures greatly simplifies concurrent programming
Synchronization is expensive, and immutable data structures don’t need to be synchronized
Copying large data structures is expensive and wastes space, but persistent data structures can use sophisticated structure sharing to reduce the cost on disk between program executions
Lists
Lists are the original persistent data structures, and are very heavily used in functional programming
x zy
original
w
insert w delete x
As you can see, persistence is automatic with a list, and requires no additional effort
Trees and binary trees
Trees and binary trees can also be implemented in a persistent fashion, though it takes a bit more work
5
A
B C
D E F G
H I J K L M N
A’
C’
G’
Arrays and vectors
It’s more difficult to implement a persistent array The programming language Clojure implements
persistent vectors, which are like arrays but can be expanded
Any location in a vector can be accessed in (almost) O(1) time
Vectors are represented as “fat trees,” or more precisely, as 32-tries
6
Tries
A trie is like a binary search tree, only each node may have many children
Tries are most often used with strings (and have up to 26 children per node)
Each node of a 32-trie may have 32 children
7
Vector implementation I A persistent vector in Clojure is implemented as an N-level trie (N <= 7),
where the root and internal nodes are arrays of 32 references, and the leaves are arrays of 32 values
The depth of the trie (1 to 7) is also kept as an instance value For example, consider accessing location 5000 in a vector
5000 decimal is 1001110001000 binary
To acess element 5000 in a trie of depth 4: The binary number in group 4 (green) says to take the 0th reference The binary number in group 3 (orange) says to take the 5th reference The binary number in group 2 (green) says to take the 28th reference The binary number in group 1 (blue) says to take the 8th value
8
Vector implementation II
The trie can be treated as a “fat tree,” with the structure sharing discussed earlier Because the trie is fat (many children per node), there is a
high proportion of actual data to structure Access time is “almost” O(1), but as the size increases, the
constant factor grows from 1 to 7 (depth of trie) This design is especially good for appending vectors For adding single elements to the end of the vector,
there are additional special-case optimizations
9
Persistent Hash Map Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could be
implemented just like a vector For a vector, the additional space required for the trie structure is a reasonable
proportion of the total space For a hash map, the additional space required is not reasonable
There will be a large number of 32-element arrays which contain mostly nulls
The hard part is to use only as much space as needed
Basic approach: Use arrays size N <= 32, where N is the number of non-null children Use a 32-bit word to indicate which children are actually present
For example: 00010000000100010000000000101000 indicates 5 children
Find a fast function to map numbers in the range [0, 31] into the range [0, N) Many processors have an instruction to count the number of 1 bits in a word
This would make a good assignment for the next time I teach this course
10
The End
11
Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.
--Sir Winston Churchill, Speech in November 1942