US 20100333116A1
(19) United States
(12) Patent Application Publication
Prahlad et al.
(10) Pub. No.: US 2010/0333116 A1
(43) Pub. Date: Dec. 30, 2010
(54) CLOUD GATEWAY SYSTEM FOR MANAGING DATA STORAGE TO CLOUD STORAGE SITES
(76) Inventors: Anand Prahlad, Bangalore (IN); Marcus S. Muller, Tinton Falls, NJ (US); Rajiv Kottomtharayil, Marlboro, NJ (US); Srinivas Kavuri, Miyapur (IN); Parag Gokhale, Ocean, NJ (US); Manoj Vijayan, Marlboro, NJ (US)
Correspondence Address: PERKINS COIE LLP PATENT-SEA P.O. BOX 1247 SEATTLE, WA 98111-1247 (US)
(21) Appl. No.: 12/751,953
(22) Filed: Mar. 31, 2010
Related U.S. Application Data
(60) Provisional application No. 61/299,313, filed on Jan. 28, 2010, provisional application No. 61/221,993, filed on Jun. 30, 2009, provisional application No. 61/223,695, filed on Jul. 7, 2009.
[Representative drawing: clients 130, each with a data agent 195; secondary storage computing devices 165]
Publication Classification
(51) Int. Cl. G06F 9/44 (2006.01) G06F 15/167 (2006.01) H04L 29/06 (2006.01)
(52) U.S. Cl. ... 719/328; 709/216; 713/153
(57) ABSTRACT
Systems and methods are disclosed for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
[Representative drawing, continued: cloud storage site A 115A, cloud storage site B 115B, ... cloud storage site N 115N, reached over http/https/ftp protocols]
Patent Application Publication Dec. 30, 2010 Sheet 1 of 33 US 2010/0333116 A1
Patent Application Publication Dec. 30, 2010 Sheet 2 of 33 US 2010/0333116 A1
[Block diagram: storage manager 105 with management, jobs, interface, network, and index components; clients 130, each with data agents 195, a network agent 255, and a metabase 270; secondary storage computing devices 165, each with a content indexing component 205, a network agent 235, a light index 247, a deduplication module 299, a media file system agent 240, and a cloud storage submodule 236; deduplication databases 297; storage devices 115 (e.g., cloud storage sites)]
FIG. 2
Patent Application Publication Dec. 30, 2010 Sheet 3 of 33 US 2010/0333116 A1
340 Receive a file system request to write data to a target cloud storage site
350 Add data associated with received file system request to buffer
Buffer full?
Convert file system requests to vendor-specific API calls
380 Transmit buffer using vendor-specific API calls
Transmission successful?
FIG. 3A
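The write path of FIG. 3A can be sketched roughly as follows. This is a minimal illustration, not the applicants' implementation: the class name `BufferedCloudWriter`, the `vendor_put` callable (a stand-in for a vendor-specific API call), the byte-count threshold, and the retry limit are all hypothetical. The branch directions (keep buffering while the buffer is not full, retry while transmission fails) are assumptions, since the figure text does not state them.

```python
from typing import Callable, List


class BufferedCloudWriter:
    """Buffers file system writes and flushes them through a vendor adapter."""

    def __init__(self, vendor_put: Callable[[bytes], bool],
                 capacity: int = 16, max_attempts: int = 3) -> None:
        self.vendor_put = vendor_put      # hypothetical vendor-specific API call
        self.capacity = capacity          # assumed "buffer full" threshold, in bytes
        self.max_attempts = max_attempts  # assumed retry limit, not in the figure
        self.buffer: List[bytes] = []

    def buffered_bytes(self) -> int:
        return sum(len(part) for part in self.buffer)

    def write(self, data: bytes) -> bool:
        """Steps 340 and 350: accept a write request and buffer its data.

        Returns True only if this call led to a successful transmission.
        """
        self.buffer.append(data)
        if self.buffered_bytes() < self.capacity:  # "Buffer full?" answered no
            return False
        return self.flush()

    def flush(self) -> bool:
        """Step 380: transmit the buffer, retrying on failure (assumed)."""
        payload = b"".join(self.buffer)
        for _ in range(self.max_attempts):
            if self.vendor_put(payload):  # "Transmission successful?"
                self.buffer.clear()
                return True
        return False  # data stays buffered for a later attempt


# Usage: a fake vendor call that fails once, then succeeds.
sent: List[bytes] = []
attempts = {"count": 0}


def flaky_put(payload: bytes) -> bool:
    attempts["count"] += 1
    if attempts["count"] == 1:
        return False
    sent.append(payload)
    return True


writer = BufferedCloudWriter(flaky_put, capacity=8)
first = writer.write(b"abcd")   # buffer not yet full, nothing is sent
second = writer.write(b"efgh")  # buffer full: first attempt fails, retry succeeds
```

Batching writes this way trades a little latency for fewer round trips over a wide area network, which is the situation the abstract describes.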
Patent Application Publication Dec. 30, 2010 Sheet 4 of 33 US 2010/0333116 A1
300
310 Receive copy of an original data set from a file system
320 Index data
330 Deduplicate data and store deduplicated data on cloud storage
Return
FIG. 3B
Patent Application Publication Dec. 30, 2010 Sheet 5 of 33 US 2010/0333116 A1
400
[Block diagram: clients 130 (Client 1, Client 2, ... Client n); deduplication module 299 with components 410 (...tion), 420 (... generation), 425 (identifier comparison), and 430 (criteria evaluation); deduplication database 297; storage device 115]
FIG. 4
Patent Application Publication Dec. 30, 2010 Sheet 6 of 33 US 2010/0333116 A1
500
510
5151
502 chunk folder
    504 metadata file
    506 N file
    508 S file
FIG. 5A
502 chunk folder 1
    504 metadata file 1
    506 N file 1
    508 S file 1
chunk folder 2
    504 metadata file 2
    506 N file 2
FIG. 5B
Patent Application Publication Dec. 30, 2010 Sheet 7 of 33 US 2010/0333116 A1
520
522 Stream Header 1 | 524 Stream Data 1 | 522 Stream Header 2 | 524 Stream Data 2 | ... | 522 Stream Header n | 524 Stream Data n
FIG. 5C
542: C0  C1  C2  C3  ...  Cn
544: 0   5   10  15  ...  65
FIG. 5D
Patent Application Publication Dec. 30, 2010 Sheet 8 of 33 US 2010/0333116 A1
600 Prune
605 Receive selection of an archive file to prune
610 Perform lookup of archive file
615 Does archive file have references out?
620 Delete the references out
Do archive files referenced by references out have other references in?
630 Prune archive files referenced by references out
635 Does archive file have references in?
640 Delete references in
645 Add reference to archive file to deleted archive file table
650 Prune archive file
655 Add deleted time stamp to archive file table
FIG. 6
Patent Application Publication Dec. 30, 2010 Sheet 11 of 33 US 2010/0333116 A1
802
804 Chunk_001
    806 Metadata file: Non-SI data
    808 Metadata index file: Index to metadata file
    810 Container file 001: B1 B2 B3 ... Bn
    811 Container file 002: B1 B2 B3 ... Bn
    812 Container index file: 001_B1 001_B2 ... 002_B1 002_Bn
        0 1 1 0
805 Chunk_002
    807 Metadata file: Non-SI data | Link | Link | Non-SI data
    809 Metadata index file: Index to metadata file
    813 Container file 001: B1 B2 B3 B4 B5 ... Bn
    814 Container index file: 001_B1 001_B2 ... 001_Bn
FIG. 8
Patent Application Publication Dec. 30, 2010 Sheet 12 of 33 US 2010/0333116 A1
900
905 Receive selection of a job to be pruned
907 Determine archive file, volume folders, and chunk folders corresponding to job
910 Delete metadata files and metadata index files in chunk folders
915 Access container file in chunk folders
920 For the block in the container file, is its reference count in primary table equal [...]?
Set corresponding entry in container index file equal to zero
More blocks in container file?
932 Entries in container index file corresponding to the container equal to zero?
933 Delete container file
Free up space in container files?
Free up space in container files
More container files in chunk folders?
Return
FIG. 9
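The container-pruning loop of FIG. 9 can be sketched roughly as follows. The dictionaries standing in for the container files, the primary table, and the container index file are hypothetical, as is the function name `prune_containers`. The comparison in step 920 is truncated in the figure text; this sketch assumes it tests for a reference count of zero, which is consistent with the following step that zeroes the index entry. The optional "free up space" branch is left out.

```python
from typing import Dict, List


def prune_containers(containers: Dict[str, List[str]],
                     ref_counts: Dict[str, int],
                     container_index: Dict[str, Dict[str, int]]) -> List[str]:
    """Zero index entries for unreferenced blocks and delete empty containers.

    containers:      container file name -> block ids it holds (hypothetical layout)
    ref_counts:      block id -> reference count (stand-in for the primary table)
    container_index: container file name -> {block id: 1 if in use, else 0}
    Returns the names of the container files that were deleted.
    """
    deleted: List[str] = []
    for name in list(containers):              # "More container files in chunk folders?"
        for block in containers[name]:         # "More blocks in container file?"
            if ref_counts.get(block, 0) == 0:  # step 920, assumed to test for zero
                container_index[name][block] = 0
        # Step 932: delete the container only if every index entry is zero.
        if all(flag == 0 for flag in container_index[name].values()):
            del containers[name]               # step 933
            del container_index[name]
            deleted.append(name)
    return deleted


# Usage: container 001 holds only unreferenced blocks; 002 still has one in use.
containers = {"001": ["001_B1", "001_B2"], "002": ["002_B1", "002_B2"]}
ref_counts = {"001_B1": 0, "001_B2": 0, "002_B1": 2, "002_B2": 0}
container_index = {
    "001": {"001_B1": 1, "001_B2": 1},
    "002": {"002_B1": 1, "002_B2": 1},
}
deleted = prune_containers(containers, ref_counts, container_index)
```

Marking blocks in an index rather than rewriting the container on every prune defers the expensive work until a whole container can be dropped.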
Patent Application Publication Dec. 30, 2010 Sheet 13 of 33 US 2010/0333116 A1
Index content
1010 Select copy of data set
1020 Identify content
1030 Update content index
Return
FIG. 10
Patent Application Publication Dec. 30, 2010 Sheet 15 of 33 US 2010/0333116 A1
1200 Restore
1205 Receive selection of a file to restore
1210 Determine archive file ID and offset
1215 Access secondary storage
1220 Open chunk folder
1225 Parse metadata file
1230 Determine location of file from metadata
1235 Open file
1240 Restore file
Return
FIG. 12
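The restore steps of FIG. 12 can be sketched roughly as a chain of lookups. Everything concrete here is a hypothetical stand-in: `FILE_INDEX` plays the role of a table mapping a file to an archive file ID and offset, and `SECONDARY_STORAGE` is an in-memory dictionary in place of real secondary storage, with one chunk folder per archive file. The layout of the metadata file is an assumption made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table: file id -> (archive file id, offset within the archive file).
FILE_INDEX: Dict[str, Tuple[str, int]] = {
    "F1": ("AF1", 0),
    "F2": ("AF1", 5),
}

# Hypothetical secondary storage: archive file id -> chunk folder.
# The "metadata" entry maps an offset to (start, size) within the folder's data.
SECONDARY_STORAGE: Dict[str, dict] = {
    "AF1": {
        "metadata": {0: (0, 5), 5: (5, 6)},
        "data": b"helloworld!",
    }
}


def restore(file_id: str) -> bytes:
    """Return the bytes of the selected file (step 1205 supplies file_id)."""
    archive_id, offset = FILE_INDEX[file_id]           # step 1210
    chunk_folder = SECONDARY_STORAGE[archive_id]       # steps 1215 and 1220
    metadata = chunk_folder["metadata"]                # step 1225
    start, size = metadata[offset]                     # step 1230
    return chunk_folder["data"][start:start + size]    # steps 1235 and 1240


# Usage: restore both files stored in archive file AF1.
restored_f1 = restore("F1")
restored_f2 = restore("F2")
```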
Patent Application Publication Dec. 30, 2010 Sheet 16 of 33 US 2010/0333116 A1
1300
1310 Archive File ID    1320 File ID    1330 Offset
AF1                     F1              OF1
                        F2              OF2
                        F3              OF3
                        Fn              OFn
1350
1370 Archive File ID    1380 Media Chunk    1390 Start
C, J, Cycle, AF         M1, C1              AF1, OF1, Size
                        M, C2               AF1, OF2, Size
                        M2, C3              AF1, OF3, Size
FIG. 13B
Patent Application Publication Dec. 30, 2010 Sheet 17 of 33 US 2010/0333116 A1
Search Index
1410 Receive Search Request
1420 Search Content Index
1425 Generate Search Results
1430 Get Next Search Result
Archived?
Retrieve Archived Content
More Results?
1460 Provide Search Results
Return
FIG. 14
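The search flow of FIG. 14 can be sketched roughly as follows. The shape of the content index entries (`name`, `keywords`, `archived`, `content`), the function name `search_index`, and the `fetch_archived` callable are hypothetical. Keyword matching stands in for whatever query evaluation the content index actually supports.

```python
from typing import Callable, Dict, List, Optional


def search_index(keyword: str,
                 content_index: List[dict],
                 fetch_archived: Callable[[str], str]) -> List[Dict[str, Optional[str]]]:
    """Search the content index and fill in archived content where needed."""
    # Steps 1420 and 1425: find matching entries and build the result list.
    matches = [entry for entry in content_index if keyword in entry["keywords"]]
    results: List[Dict[str, Optional[str]]] = []
    for entry in matches:                        # step 1430 and "More Results?"
        content = entry["content"]
        if entry["archived"]:                    # "Archived?"
            content = fetch_archived(entry["name"])  # "Retrieve Archived Content"
        results.append({"name": entry["name"], "content": content})
    return results                               # step 1460


# Usage: one live match, one archived match, one entry that does not match.
content_index = [
    {"name": "report.txt", "keywords": {"cloud", "storage"},
     "archived": False, "content": "live copy"},
    {"name": "old.txt", "keywords": {"cloud"},
     "archived": True, "content": None},
    {"name": "notes.txt", "keywords": {"index"},
     "archived": False, "content": "notes"},
]
archive = {"old.txt": "archived copy"}
results = search_index("cloud", content_index, archive.__getitem__)
```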
Patent Application Publication Dec. 30, 2010 Sheet 18 of 33 US 2010/0333116 A1
Patent Application Publication Dec. 30, 2010 Sheet 19 of 33 US 2010/0333116 A1