Persistence Storage


When you make modifications, these are written to memory immediately; when the transaction is committed, they are also written to disk (the log files); and later, asynchronously, as part of the periodic savepoint (by default once per 5 minutes), they are written to disk into the data files.

You can open the table (right click + Open Definition) and switch to the "Runtime Information" tab to see information related to memory vs. disk. Also notice the information on the Parts tab (listing partitions) and the Columns tab (listing individual columns). You can add additional columns to this view by right-clicking the table to get more detailed information.

In general you do not need to worry about the table being persisted. The table is always on disk and can be fully or partially loaded into memory, where it is operated on.
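For illustration, here is a minimal SQL sketch for inspecting the savepoint behavior described above. It assumes the standard M_INIFILE_CONTENTS monitoring view and the persistence section of global.ini; check your HANA revision's documentation for the exact parameter names.

    -- Check the configured savepoint interval (default 300 seconds = 5 minutes)
    SELECT FILE_NAME, SECTION, KEY, VALUE
      FROM M_INIFILE_CONTENTS
     WHERE SECTION = 'persistence'
       AND KEY = 'savepoint_interval_s';

    -- A savepoint can also be forced manually, e.g. before planned maintenance
    ALTER SYSTEM SAVEPOINT;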

I thought that all of the data would be stored in-memory and that the hard disk would only be used for taking backups and for recovery when the HDB crashes or otherwise fails. Going by what you said, I just created a small column store table with only 5 records in it and looked at the runtime information. Below is the screenshot.

As per this, most of the data is residing in-memory and only a small percentage is on disk.

Estimated Max Size = Size in Memory + Size on Disk
Size in Memory = Main Storage Size + Delta Storage Size

Also, can you please throw some light on main storage size and delta storage size?
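For reference, the figures shown on the Runtime Information tab can also be read from the M_CS_TABLES monitoring view; a minimal sketch (MY_SCHEMA / MY_TABLE are placeholder names):

    SELECT TABLE_NAME,
           LOADED,                             -- FULL / PARTIALLY / NO
           MEMORY_SIZE_IN_TOTAL,               -- "Size in Memory"
           MEMORY_SIZE_IN_MAIN,                -- main storage part
           MEMORY_SIZE_IN_DELTA,               -- delta storage part
           ESTIMATED_MAX_MEMORY_SIZE_IN_TOTAL  -- "Estimated Max Size"
      FROM M_CS_TABLES
     WHERE SCHEMA_NAME = 'MY_SCHEMA'
       AND TABLE_NAME  = 'MY_TABLE';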

Discussions about memory vs. disk are mostly rhetorical. Here is perhaps a clearer explanation:

- Memory is volatile: you will lose its contents in case of an outage.
- To protect the data you need to "persist" the information: the log file must be written to disk as part of the commit to ensure transactional consistency. This has a direct impact on performance, so we need a fast medium for the logs (usually flash technology).
- Once you have the logs you are protected: the data in the data files does not need to be written instantly and can be persisted later as part of the savepoint operation.

So now you have everything both in memory and on disk. Now, if you restart SAP HANA:

- When you stop, all data in memory is gone (volatile medium).
- During startup only row-based tables are loaded; column-based tables are loaded only if marked with the preload flag.
- The remaining column-based tables are loaded when first used, and not completely: only those columns that are required are loaded (lazy loading, allowing a faster startup; see the sketch below).

Now you have everything on disk, but only the part that is really required in memory.

Where does SAP HANA operate? In memory (for inserts, updates and deletes also on flash disk, in the log files).

Is the data loaded into memory completely? No, only what is required is loaded. SAP is also working on a mechanism to push data out of memory (like PSA tables), because such data does not need to be kept in memory.

Is the data on disk required? Yes: the logs are super-important for consistency, and the data files give you something from which to populate memory whenever data needs to be loaded (during startup or later).

As you can see, it is just a matter of perspective.

Regarding your image: notice that you have status LOADED = PARTIALLY, which means the table is not completely in memory. As for why the table is bigger in memory: not everything is persisted on disk; some structures exist only in memory and are populated during the load operation.
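To make the load behavior above tangible, here is a small sketch of the load / unload / preload statements (table and schema names are placeholders; syntax per the SAP HANA SQL reference):

    -- Load all columns of a column table into memory
    LOAD "MY_SCHEMA"."MY_TABLE" ALL;

    -- Push the table out of memory again (it of course stays on disk)
    UNLOAD "MY_SCHEMA"."MY_TABLE";

    -- Mark the table so its columns are loaded already during startup
    ALTER TABLE "MY_SCHEMA"."MY_TABLE" PRELOAD ALL;

    -- Check the load status (FULL / PARTIALLY / NO)
    SELECT TABLE_NAME, LOADED
      FROM M_CS_TABLES
     WHERE SCHEMA_NAME = 'MY_SCHEMA' AND TABLE_NAME = 'MY_TABLE';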

Now the second part, regarding main storage and delta storage. A quick (simplified) recap of what the delta storage is: SAP HANA column-based tables (main storage) are optimized for read performance. That means the data for each column is stored as a sorted dictionary, and just the dictionary positions form the column. Adding a new record to this structure is quite an expensive operation: you need to check whether the dictionary entry exists, and if not, create a new entry and sort the dictionary again... and again for the next modification... and again.

To avoid repeating this expensive operation, it is more logical to have a temporary area (the delta store) where modified records are queued and then processed in batch. That is all: the delta area can be seen as a temporary area for modified records before they are "merged" into the main storage, which is called the delta merge operation.

In your case you inserted rows into the table, but the database has not yet performed the delta merge (which happens automatically when certain conditions are met); therefore you see that the main store is empty while the delta is full. You can trigger the delta merge manually (see the SQL guide), and then you will see that the delta is empty and the main store is occupied.
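A minimal sketch of the manual delta merge described above (placeholder names; the exact monitoring columns may vary by revision):

    -- Before the merge: freshly inserted rows sit in the delta store
    SELECT MEMORY_SIZE_IN_MAIN, MEMORY_SIZE_IN_DELTA, RAW_RECORD_COUNT_IN_DELTA
      FROM M_CS_TABLES
     WHERE SCHEMA_NAME = 'MY_SCHEMA' AND TABLE_NAME = 'MY_TABLE';

    -- Trigger the delta merge manually
    MERGE DELTA OF "MY_SCHEMA"."MY_TABLE";

    -- Re-running the first query should now show the delta (nearly) empty
    -- and the main storage populated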

> ...if you are saying that the tables we create are on the persistence layer...

Yes, all tables must be on the persistence level so that in case of a power outage you do not lose data (but they are not updated there instantly; that happens later, asynchronously).

> ...then why are we logging all the changes and creating the savepoints regularly...

We log the changes to be covered in case of a power outage (when the content of memory is lost): in that case the data is loaded from the data files and then post-processed using the information from the log files (which contain every transaction that was ever committed). Savepoints are required so that, once in a while (usually once per 5 minutes), the data on disk is consistent, and so that we never need more than the last 5 minutes of log information (otherwise we might need A LOT of log files to process).

> ...since the data is already on the persistence layer, and the in-memory data is loaded from the persistence layer only, we don't require any backup. Right?

Not at all. Backup is vital and super-important. If there is a failure and you lose or corrupt a data file (I have never seen this, but it might hypothetically happen), it is very likely that SAP HANA will fail (and you will lose the information in memory); and even if not, SAP HANA has no mechanism to reconstruct a data file from the information in memory. If this happened, you would need the data file and all log files to restore operation. Alternatively, you might switch to a second data center if a disaster recovery solution is implemented (but even then, backups are important). You might also encounter logical or human errors: for example, someone deletes a table by mistake and you need a one-week-old backup to undo the damage. The backup strategy is just as important as with other databases.

> ...and all the data that is changed in the source system will be replicated to the tables, i.e. the ones stored in the persistence layer...

If you are using SAP HANA in a side-car scenario (you replicate data from other systems and have no native data), there are still other objects that should be preserved, for example modeling content, users / roles, etc.
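As a minimal illustration of the backup statement (the file prefix is a placeholder; backups go to the configured backup location):

    -- Complete data backup
    BACKUP DATA USING FILE ('MONDAY_FULL');

    -- The backup catalog can be inspected afterwards
    SELECT ENTRY_TYPE_NAME, UTC_START_TIME, STATE_NAME
      FROM M_BACKUP_CATALOG
     ORDER BY UTC_START_TIME DESC;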

More or less, yes. When you open a table in Studio, Studio sends an SQL query to the server and the server processes it: if the required table columns are not in memory, they are loaded into memory first; the data is then extracted from memory, and the query result is returned to Studio, where it is displayed.
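You can observe this column-wise lazy loading yourself; a sketch, assuming placeholder names and the standard M_CS_COLUMNS monitoring view:

    -- Touch only one column of a cold (unloaded) table
    SELECT COUNT(DISTINCT "COL1") FROM "MY_SCHEMA"."MY_TABLE";

    -- Then check which columns were actually brought into memory
    SELECT COLUMN_NAME, LOADED, MEMORY_SIZE_IN_TOTAL
      FROM M_CS_COLUMNS
     WHERE SCHEMA_NAME = 'MY_SCHEMA' AND TABLE_NAME = 'MY_TABLE';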