EMC Storage Pool Deep Dive: Design Considerations & Caveats

Posted by veverything on March 5, 2011

This has been a common topic of discussion with my customers and peers for some time. Proper design information has been scarce at best, and some of these details appear to not be well known or understood, so I thought I would conduct my own research and share it.

Some time ago, EMC introduced the concept of Virtual Provisioning and Storage Pools in their Clariion line of arrays. The main idea behind this is to make management simple for the storage admin. The traditional method of managing storage is to take an array full of disks, create discrete RAID groups with a set of disks, and then carve LUNs out of those RAID groups and assign them to hosts. An array could have dozens to hundreds of RAID groups depending on its size, and oftentimes this would result in stranded islands of storage in these RAID groups. Some of this could be alleviated by properly planning the layout of the storage array to avoid the wasted space, but the problem is that most customers’ storage requirements change, and they can very rarely plan how to lay out an entire array on day 1. There was a need for flexible and easy storage management, and hence the concept of Storage Pools was born.

Storage pools, as the name implies, allow the storage admin to create “pools” of storage. In some cases you could even create one big pool with all of the disks in the array, which could greatly simplify management. No more stranded space, no more deep architectural design into RAID group size, layout, etc. Along with this comes a complementary technology called FAST VP, which allows you to place multiple disk tiers into a storage pool and lets the array move data blocks to the appropriate tier as needed based on performance needs. Simply assign storage from this pool as needed, in a dynamic, flexible fashion, and let the array handle the rest via auto-tiering. Sounds great, right? Well, that’s what the marketing says anyway.

First let’s take a brief look at the difference between the traditional RAID group based architecture and Storage Pools.



[Figure: traditional RAID group layout vs. Storage Pool layout]

[Figure: a 5-disk RAID5 Storage Pool under the covers, showing the Private 4+1 RAID group and its 10 Private LUNs]

Depicted in the above figure is what a storage pool looks like under the covers. In this example, it is a RAID5 protected storage pool created with 5 disks. When you create this 5-disk storage pool, FLARE creates a Private RAID5 4+1 RAID group under the covers, and from there it creates 10 Private LUNs of equal size. In my test case, I was using 143GB (133GB usable) disks, and the array created 10 Private LUNs of size 53.5GB, giving me a pool size of ~530GB. This is what you would expect from a RAID5 4+1 RG (133*4 = 532GB).
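
To sanity-check those numbers, here is a minimal back-of-the-envelope sketch in Python (my own illustration, assuming 133GB usable per disk and the 10-Private-LUN carve-up described above; the small difference versus the observed 53.5GB per Private LUN is not modeled):

```python
# Quick check of the 5-disk RAID5 pool numbers above.
# Assumptions: 133GB usable per disk, a 4+1 Private RAID group, 10 Private LUNs per RG.
USABLE_PER_DISK_GB = 133
DATA_DISKS = 4               # 4+1 RAID5: four data disks, one parity disk
PRIVATE_LUNS_PER_RG = 10

rg_usable_gb = USABLE_PER_DISK_GB * DATA_DISKS        # 532 GB for the Private RG
private_lun_gb = rg_usable_gb / PRIVATE_LUNS_PER_RG   # ~53.2 GB per Private LUN

print(f"Private RG usable capacity: {rg_usable_gb} GB")
print(f"Size of each Private LUN:   {private_lun_gb:.1f} GB")
```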

When you create a LUN from this pool and assign it to the host, the I/O is processed in a different manner than with a traditional FLARE LUN. In a traditional (ignoring Meta-LUNs for simplicity) FLARE LUN, the I/O is going to one LUN on the array, which is then written directly to a set of disks in its RAID group.

However, as new host writes come into a Pool LUN, space is allocated in 1GB slices. For Thick LUNs, this space is contiguous and completely pre-allocated. So, if one were to create a 10GB Thick Pool LUN, there would be a 1GB slice allocated on each of the 10 Private LUNs, for a total of 10x 1GB slices. As host writes come into the Pool LUN, the LBA (Logical Block Address) corresponding with the host write has a 1:1 relationship with the Pool LUN; meaning, LBAs corresponding to 0-1GB on the host would land on Private LUN0 since it contains the first 1GB slice, LBAs from 1-2GB would be written to Private LUN1, LBAs from 2-3GB to Private LUN2, and so on as shown below:

[Figure: 1GB slice layout of a 10GB Thick Pool LUN across Private LUNs 0-9]

These Private LUNs are all hitting the same Private RAID group underneath, and hence the same disks. I assume EMC creates these multiple Private LUNs for device queuing/performance-related reasons.
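
To make that 1:1 LBA-to-slice relationship concrete, here is a minimal sketch of the mapping for a fully pre-allocated Thick LUN (the function name and the round-robin wrap-around are my own illustration of the layout described above, not an EMC API):

```python
# Sketch of the Thick Pool LUN slice layout described above: 1GB slices laid out
# round-robin across the pool's Private LUNs, so slice N of the host LUN lives
# on Private LUN (N % number_of_private_luns).
SLICE_GB = 1

def private_lun_for_lba(lba_gb: float, num_private_luns: int = 10) -> int:
    """Return the Private LUN index backing a host LBA (expressed in GB)."""
    slice_index = int(lba_gb // SLICE_GB)
    return slice_index % num_private_luns

# A 10GB Thick LUN on a 5-disk pool (one Private RG, 10 Private LUNs):
for lba_gb in (0.5, 1.5, 2.5, 9.5):
    print(f"Host LBA at {lba_gb}GB -> Private LUN {private_lun_for_lba(lba_gb)}")
# Prints Private LUN 0, 1, 2 and 9 respectively.
```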

Caveat/Design Consideration #1: One very important aspect to understand is EMC’s recommendation to create R5 based pools in multiples of 5 disks. This is a VERY important thing to note, because it can lead to unexpected results if you don’t fully understand it and proceed to create pools from non-multiples of 5 disks. The pool algorithm in FLARE tries to create the Private RAID5 groups as 4+1 whenever possible. As an example, if you ignored the 5-disk multiple recommendation and created a pool with 14 disks, you will NOT get the capacity you might expect. FLARE will create 2x 4+1 R5 Private RGs and 1x 3+1 Private RG, NOT the single 13+1 Private RG you may expect, so you would end up with less capacity than you were counting on.

In my case, using 143GB disks (133GB usable), a 14-disk R5 pool would give me (4*133)+(4*133)+(3*133) = ~1460GB, not the expected (13*133) = ~1730GB. A difference of almost 300GB; quite significant! The best option in this case is to add another drive and create a 15-disk R5 pool, achieving 3x 4+1 RGs under the covers. This is important to consider when configuring the array, as you could end up with one irate customer if multiple ~300GB chunks of expected capacity go missing over the span of the array!
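
A hypothetical helper makes this math easy to check for any disk count. It simply encodes the “prefer 4+1, put the leftover disks into one smaller R5 group” behavior described above; that split is my own assumption based on the 14-disk example, so treat it as an approximation rather than a statement of exactly what FLARE does for every remainder:

```python
# Rough model of how FLARE carves an R5 pool into Private RAID groups:
# as many 4+1 (5-disk) groups as possible, leftovers into one smaller R5 group.
USABLE_PER_DISK_GB = 133

def r5_pool_layout(disk_count: int):
    """Return the Private RG sizes (in disks) for an R5 pool of disk_count disks."""
    groups = [5] * (disk_count // 5)
    if disk_count % 5:
        groups.append(disk_count % 5)   # e.g. 4 leftover disks -> a 3+1 group
    return groups

def r5_pool_usable_gb(disk_count: int) -> int:
    # Each R5 Private RG loses one disk's worth of capacity to parity.
    return sum((g - 1) * USABLE_PER_DISK_GB for g in r5_pool_layout(disk_count))

print(r5_pool_layout(14), r5_pool_usable_gb(14))   # [5, 5, 4] -> 1463 GB
print(r5_pool_layout(15), r5_pool_usable_gb(15))   # [5, 5, 5] -> 1596 GB
```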

Next, let’s take a look at some aspects of I/O performance, and some things to consider when expanding the pool.

With a pool composed of 5 disks, things are pretty simple to understand because there is 1x 4+1 Private RG underneath handling the I/O requests, but what happens when we expand the pool? Keeping in mind that we need to expand this pool by a multiple of 5, let’s add another 5 disks to it, bringing the total capacity to 530*2 = ~1060GB. Under the covers, the pool now looks like this:

[Figure: the expanded pool: two Private 4+1 RAID groups and 20 Private LUNs, with data only on the first 10]

After adding the second set of 5 disks, FLARE has created another 4+1 Private RAID group and 10 more Private LUNs from that RAID group. These new Private LUNs currently have no data on them.

Design Consideration / Caveat #2: Note that when the Storage Pool is expanded, the existing data is NOT re-striped across the new disks. Reads to the original Pool LUN will still happen only across the first 5 disks, and so will writes to the existing 10GB of LBAs that were previously written to. So do not expect a sudden increase in performance on the existing LUN by expanding the pool with additional disks.

In my testing, I brought my Pool LUN into VMware and put a single VM on it, then expanded the pool and put another VM on it. Before putting the second VM on the LUN, the data layout looked exactly as depicted above: there was data spread across the Private LUNs associated with the first Private RAID group, and no data on the Private LUNs of the second RAID group. When I cloned another VM onto the LUN, this is what it looked like:

[Figure: slice distribution after the clone: VM1 on Private LUNs 0-9 only, VM2 across all 20 Private LUNs]

VM1’s data is still spread across the first Private RG and the first 10 Private LUNs as expected, but VM2’s data is spread across BOTH Private RAID groups and all 20 Private LUNs! Think about that for a second: two VMs, on the SAME VMFS, on the SAME Storage Pool, and one gets the I/O of 5-disk striping while the other gets the I/O of 10-disk striping; talk about non-deterministic performance! That second VM will get awesome performance as it is wide-striped across 10 disks, but the first VM is still using only the first 5 disks. These are both 100GB VMs (in my testing), so not all of the slices are depicted, but it still illustrates the point. The actual allocation would show 100 slices (1 slice = 1GB, as previously mentioned) allocated across Private LUNs 0-9 for VM1, and 50 slices across Private LUNs 0-9 plus 50 slices across Private LUNs 10-19 for VM2 as the overall slice distribution. If I keep placing VMs on this Pool LUN, they will continue to get 10-disk striping UNTIL the first Private RG gets full, at which point any subsequent VMs will get only 5-disk striping.
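
Here is a small simulation of that allocation behavior. It is my own toy model (slices go round-robin across whichever Private LUNs still have free capacity), which is simply what the observed layout above suggests; the Private LUN sizes and the exact selection order are assumptions:

```python
# Toy model of Pool LUN slice allocation: each new 1GB slice goes to the next
# Private LUN in round-robin order that still has free capacity.
from collections import Counter
from itertools import cycle

SLICES_PER_PRIVATE_LUN = 53      # ~53GB Private LUNs, 1GB slices

def allocate(size_gb, private_luns, used):
    """Place size_gb worth of 1GB slices; returns {private_lun: slice_count}."""
    free = sum(SLICES_PER_PRIVATE_LUN - used[lun] for lun in private_luns)
    assert free >= size_gb, "not enough free slices in the pool"
    placed, rr, remaining = Counter(), cycle(private_luns), size_gb
    while remaining > 0:
        lun = next(rr)
        if used[lun] < SLICES_PER_PRIVATE_LUN:
            used[lun] += 1
            placed[lun] += 1
            remaining -= 1
    return placed

used = {lun: 0 for lun in range(20)}      # 10-disk pool = 20 Private LUNs
vm1 = allocate(100, range(10), used)      # VM1 placed before the expansion
vm2 = allocate(100, range(20), used)      # VM2 placed after the expansion
print(sorted(vm1))                        # Private LUNs 0-9  -> 5-disk striping
print(sorted(vm2))                        # Private LUNs 0-19 -> 10-disk striping
```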

Now, this imbalance occurred because there was still free space in the first RG, so the algorithm allocated slices there for the second VM as well, since it allocates in a round-robin fashion. If the pool had been at capacity before being expanded, we would likely get something like this (not tested, extrapolating based on the previous behavior):

[Figure: a pool filled to capacity and then expanded: the new VM’s slices land only on the second Private RAID group]

In this diagram, the blue simply represents “other” data filling the pool. If the pool was at capacity, then expanded, and then my second VM placed on it, that VM could not get slices from the first Private RAID group (because it’s full), so its slices would come ONLY from the second Private RAID group, spreading its data across only 5 disks instead of 10 like last time. Imagine a situation where a VM was created just before the first Private RG filled up: some of the VM’s I/O could be striped across 10 disks, and the rest across only 5 disks once the first Private RG fills.

Design Consideration / Caveat #3: As illustrated above, if you expand a storage pool before it gets full (or close to full), you may get unpredictable I/O performance, because depending on the conditions under which you expand the pool, you can end up with different levels of data striping across your data sets. Things can get even more hairy if you decide to add disks outside the 5-disk multiple recommendation. If, as an example, you just need enough space for 4 more disks and expand the pool by 4 disks, you would end up with 2x 4+1 RGs and 1x 3+1 RG underneath. At some point, some of the I/O could be restricted to just 3-disk striping, instead of 5 or 10.

From this, it seems the best way to utilize storage pools is to allocate as many disks as you can upfront. By this I mean, if you have a tray of disks on a Clariion or VNX, allocate all 15 disks when creating the pool. This will give you 3x 4+1 RGs underneath, and any data placed in the pool will get striped across all 15 disks consistently. It would be good to avoid creating pools with small disk counts and expanding them frequently with 5 disks at a time, as you could run into issues like the above very easily and not realize it.

There is one other issue to consider in pool expansion. Let’s say you create a pool with 15 disks and start placing data on it. All of your I/O is being wide-striped across the 15 disks and all is well, but now you need more space and need to expand the pool. Going by the 5-disk multiple rule, you should be safe adding 5 disks, right? While this is something you can do, and it will work, it may again give unexpected results.

Before expansion, all data in your 15-disk R5 pool is spread across the 15 disks, but the pool is at capacity (imagine it is full). If the pool is expanded at this point, here is what it would look like when any new VMs (or any data) are placed on it:

[Figure: after expanding the full 15-disk pool by 5 disks, new data lands only on the new Private RAID group]

After the pool is expanded, the new data is only getting striped across 5 disks, instead of the original 15! So if you placed a new VM on this device expecting very wide striping, you could be sorely disappointed, as it is only getting 5 disks’ worth of data striping.
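
Running the same toy allocator from the earlier sketch against this scenario shows the effect (again, my own extrapolation of the observed behavior, not something from EMC documentation):

```python
# Reusing allocate() and SLICES_PER_PRIVATE_LUN from the earlier sketch:
# a 15-disk pool has 30 Private LUNs (0-29); expanding by 5 disks adds 10 more (30-39).
used = {lun: 0 for lun in range(40)}
for lun in range(30):
    used[lun] = SLICES_PER_PRIVATE_LUN   # the original 15-disk pool is completely full

new_vm = allocate(100, range(40), used)  # the allocator skips the full Private LUNs
print(sorted(new_vm))                    # Private LUNs 30-39 only -> 5-disk striping
```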

Design Consideration / Caveat #4: From this, the recommendation would be to expand a storage pool by the number of disks it was initially created with. So if you have a 15-disk storage pool, expand it by another 15 disks so the new data can take advantage of the wide striping. I have also heard people recommend doubling the storage pool size, but this may be overkill. As an example, if you have a 15-disk storage pool and add another 15 disks to it, you could theoretically have some hosts’ I/O striping their data over 30 disks; so should you now expand this pool by 30 disks instead of 15? And then 60 disks the next time? As always, understand the impact of your design choices and performance requirements before making any decisions, as there is no blanket right/wrong approach here.

Hopefully EMC will introduce a re-balance feature for the pool, like what exists in the latest VMAX code, to alleviate most of these issues. But until then, these are some things to be aware of when designing and deploying a Storage Pool based configuration.

Another thing to watch out for is changing the default SP owner of a Pool LUN. Because the LUN is made up of Private LUNs underneath, that can introduce performance problems, as it has to use the redirector driver to get to the Private LUNs owned by the other SP; so make sure to balance the Pool LUNs across SPs when they are first created.

Utilizing Thin LUNs introduces a whole new level of considerations, as they do not pre-allocate the 1GB slices but rather write in 8K extents. This can cause even more unpredictable behavior under the circumstances outlined above, but that is something to be aware of when using Thin provisioning in general. Then there is the variable of utilizing Thin Provisioning on the host side, adding another level of complexity in how the data is allocated and written. I may write a follow-up post illustrating some of these scenarios in a Thin provisioned environment on both the host and array sides. Also, I did not even touch on some of the considerations when using RAID10 pools, and I will probably follow up with that later as well.

Generally speaking, if ultra-deterministic performance is required, it is still best to use traditional RAID groups. Customers may have certain workloads that simply need dedicated disks, and I see no reason not to still use RAID groups for those use cases. Again, it’s about understanding the requirements and translating them into a proper design; the good news is that the EMC arrays give you that flexibility. There is no question that using a storage pool based approach takes the management headache out of storage administration, but architects should always be aware of the considerations and caveats of any design. Layering FAST VP on top of storage pools is an excellent solution for the majority of customers, and it is important to note that the ONLY way to get automated storage tiering is to use Pool based LUNs.

As always, comments/questions/corrections are welcome!