Case Study – Data Warehouse: Far EasTone Telecommunications (FET) CDR DW POC
Situation overview
• The customer currently uses Teradata, with high maintenance costs
• Complex procedures are processed automatically
• Competing against Oracle
Strategy & solution
• Provide a stable, scalable, and reliable Operational Data Store (ODS) solution that shares the CDR (Call Detail Record) storage footprint and heavy processing load of the Teradata EDW system, in order to improve the performance of the FET EDW
Solution Design Consideration – Import/Export Strategy
[Diagram: import/export data flow]
Sources: Voice CDR, SMS CDR, GPRS CDR, WAP CDR
CDR ODS (MS SQL 2008): Loading
Teradata: PSTAGE, PDATA, PMART
Transfer methods shown: Bulk Insert, SSIS, FastLoad, FastExport, Bulk Export, Bulk Copy, Transform
Fact Table Data Mart (MS SQL 2008): Subscriber Profile, CDR Summary
SSAS Cubes
Our Solution Architecture
[Diagram: solution architecture]
Teradata: 12 nodes; capacity: 11 TB; current data volume: ~9 TB (FET + KGT)
CDRs: 750 GB in 874 text files
Loading: multi-threaded Bulk Insert through the ADO.NET provider
Target: HP DL580 + EMC CX3-20, SQL 2008
SQL 2008 new features in this POC: Data Partitioning, Page Compression, Improved Query Performance
SQL Query Analyzer
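The multi-threaded loading of many CDR text files can be sketched roughly as below. This is only an illustration of the idea of spreading files across worker threads, not the POC's actual loader: `load_files_in_parallel`, `fake_bulk_insert`, and the file names are hypothetical, and a real loader would replace `fake_bulk_insert` with the code that bulk-inserts one file into the database.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def load_files_in_parallel(
    paths: Iterable[str],
    load_one: Callable[[str], int],
    workers: int = 4,
) -> Dict[str, int]:
    """Run load_one(path) for every file on a pool of worker threads.

    load_one is a hypothetical stand-in for whatever performs the real
    bulk insert of one text file; it returns the number of rows loaded.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so counts line up with paths.
        counts = list(pool.map(load_one, paths))
    return dict(zip(paths, counts))


def fake_bulk_insert(path: str) -> int:
    # Placeholder loader: pretends every file holds 1,000 rows.
    return 1000


if __name__ == "__main__":
    files = [f"cdr_{i:03d}.txt" for i in range(874)]  # hypothetical names
    loaded = load_files_in_parallel(files, fake_bulk_insert, workers=8)
    print(len(loaded), "files,", sum(loaded.values()), "rows")
```

The worker count is a tuning knob; a sensible value depends on the storage and the database server, which this sketch does not model.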
ODS POC Final Result
Columns: Item | Description | KPI | Data Source Row Count | Data Source Size | MSFT

Item 1
Description: Case I: Load 8/1 ~ 8/30 CDR data into the database with a pre-defined schema, perform the EOD process, then split into 6 modules. Case II: Load 8/31 CDR data and split it into 6 modules.
KPI: Case I: 2.0 hrs; Case II: 15 mins
Data Source Size: 755 GB CDR text files
MSFT: Case I: 3hr4m; Case II: 6m58s

Item 2
Description: Replicate PMART.CUS_SUBSCR_CURR & PMART.CUS_SUBSCR_CURR_PP data from Teradata via ETL Automation tools.
KPI: Yes or No
Data Source Size: 45 GB in Teradata
MSFT: YES

Item 3
Description: Case I: Generate 8/30 ~ 8/31 CDR_VOICE_DLY data into the database with a pre-defined schema. Case II: Generate 2008/08 CDR_VOICE_MLY monthly CDR aggregation data into the database with a pre-defined schema.
KPI: Case I: 24 mins; Case II: 30 mins
Data Source Row Count: MO: 389,400,000; MT: 370,040,000
Data Source Size: MO: 140 GB (58 GB after compression); MT: 100 GB (54 GB after compression)
MSFT: Case I: 22m; Case II: 90m

Item 4
Description: Write back 2008/08 CDR_VOICE_MLY monthly and daily CDR aggregation data into Teradata with a pre-defined schema via ETL Automation tools.
KPI: Yes or No
Data Source: N/A
MSFT: YES
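To put the compression figures reported for item 3 in perspective, the space saved can be worked out as a small calculation. The helper name `space_saved` is introduced here for illustration only; the sizes are the ones listed in the results.

```python
def space_saved(before_gb: float, after_gb: float) -> float:
    """Fraction of space saved by compression (0.0 to 1.0)."""
    return 1.0 - after_gb / before_gb


# Sizes as reported in item 3 of the results (GB before and after compression).
mo_saved = space_saved(140, 58)
mt_saved = space_saved(100, 54)

print(f"MO: {mo_saved:.1%} saved, MT: {mt_saved:.1%} saved")
```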
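SAN Storage
For storage, Microsoft recommends expanding FET's existing EMC CX3-80 SAN by adding disks to provide the required capacity; IT staff can manage the system effectively without learning new operations and maintenance skills. The recommended SAN layout is shown below, with data placed in separate RAID groups according to how it is used.
Data type | Data placed | Data capacity | RAID configuration | Capacity per disk
High Performance | F1/F2 Voice CDR, SMS CDR, GPRS CDR | 4 TB | RAID 0+1 | 146 GB
Ordinary Performance | PMart data extension | 3 TB | RAID 5 | 300 GB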
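As a rough illustration of what such a layout implies in disk counts, the sketch below estimates the number of disks per tier. It rests on assumptions that are not stated in the slides: the listed capacities are treated as usable capacity, 1 TB is taken as 1000 GB, disks deliver their nominal size, and the RAID 5 tier is a single group with one disk's worth of parity. The function `disks_needed` is hypothetical.

```python
import math


def disks_needed(usable_gb: float, disk_gb: float, raid: str) -> int:
    """Estimate the disk count for a usable capacity under a RAID level."""
    data_disks = math.ceil(usable_gb / disk_gb)
    if raid == "0+1":
        # Every data disk has a mirror copy.
        return data_disks * 2
    if raid == "5":
        # Assumption: one RAID 5 group, so one extra disk for parity.
        return data_disks + 1
    raise ValueError(f"unsupported RAID level: {raid}")


print("High Performance:", disks_needed(4000, 146, "0+1"), "disks")
print("Ordinary Performance:", disks_needed(3000, 300, "5"), "disks")
```

In practice an array would likely split a tier into several RAID groups and reserve hot spares, so these figures are a lower bound under the stated assumptions.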
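Solution Design Consideration – High Availability
• The two database servers handle different workloads. Outside peak periods such as EOD/EOM processing, CPU utilization is not expected to exceed 40%.
• The two SQL database servers will form an Active-Active failover cluster, each backing up the other; if one server fails, the other can take over the entire workload.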
Case Study – High Performance: National Police Agency, 15 billion call records, 4~5 second search
Situation overview
• CIB was a pure Oracle database environment
• The customer needs to handle 15 billion records
• On Oracle, inserting the data and building the indexes took 15 hours
Strategy & solution
• The partner came to MTC for POC support
• The POC reduced preparation time to 6 hours and achieved a 5-second search across 15 billion records, exceeding CIB's expectations
• Also provided backup and synchronized VLDB solutions in several options for CIB
• Microsoft provides enterprise-class, mission-critical systems and support
National Police Agency call record analysis platform architecture
[Architecture diagram labels: QM; Node1; Node2; Linked servers; Distribution Link Architecture; SSIS (Bulk Insert) + Create 5 Indexes (shown twice)]
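The two ideas visible in the diagram labels, loading first and indexing afterwards, and searching across more than one node, can be sketched as below. This is not the platform's implementation: in-memory SQLite databases stand in for the SQL Server nodes and linked servers, the split of data across nodes is an assumption, and the table `call_record`, its columns, and both function names are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def build_node(rows):
    """Create one stand-in node: bulk load first, build the index afterwards."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(
        "CREATE TABLE call_record (caller TEXT, callee TEXT, started TEXT)"
    )
    conn.executemany("INSERT INTO call_record VALUES (?, ?, ?)", rows)
    # Indexing after the load avoids maintaining the index row by row.
    conn.execute("CREATE INDEX ix_caller ON call_record (caller)")
    conn.commit()
    return conn


def search_all_nodes(nodes, caller):
    """Send the same query to every node in parallel and merge the results."""

    def query(conn):
        cur = conn.execute(
            "SELECT caller, callee, started FROM call_record WHERE caller = ?",
            (caller,),
        )
        return cur.fetchall()

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(query, nodes))
    return sorted(row for part in parts for row in part)


if __name__ == "__main__":
    node1 = build_node([("0911", "0922", "2008-08-01")])
    node2 = build_node([("0911", "0944", "2008-08-03")])
    print(search_all_nodes([node1, node2], "0911"))
```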
Testing Result – Size / Records
Description | Size | Records
1 | 1008 GB | 15,000,000,000
2 | 1.008 TB | 15 billion records
Testing Result – Demo
Action | Time duration in different flash tests | Overall (hr) | Oracle
SSIS (loading data) | 1:54 ~ 2:07 | 2 | 5.5
Create 5 Indexes | 3:53 ~ 4:20 | 4.2 | 10.3
Search | 3~6 sec | 5 sec |
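The preparation-time comparison can be checked with a short calculation. It assumes that the "Overall (hr)" column holds the POC's own result and that the "Oracle" column is also in hours, which matches the roughly 6-hour and 15-hour figures quoted in the case study summary.

```python
# Hours taken per step, from the demo results.
poc_hours = {"load": 2.0, "index": 4.2}
oracle_hours = {"load": 5.5, "index": 10.3}

poc_total = sum(poc_hours.values())
oracle_total = sum(oracle_hours.values())
speedup = oracle_total / poc_total

print(f"POC: {poc_total:.1f} h, Oracle: {oracle_total:.1f} h, ratio: {speedup:.1f}x")
```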
Agricultural Production and Marketing Information Integration Platform
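• Breadth: The plan takes the Agriculture and Food Agency as the data provider, and the Agriculture and Food Agency, the Council of Agriculture Statistics Office, and the Economic Research Section of the Planning Department as the data users. This year's interview plan schedules three sessions each with the Statistics Office and the Agriculture and Food Agency and two sessions with the Economic Research Section. Interview topics include selecting the key agricultural product items, monitoring baselines, data sources, data cleansing principles, data mining models, and presentation methods.
• Depth: Time, region, production volume, and price are the main analysis items.
• Operational functions:
1. Latest news viewing 2. Document management 3. Quantitative indicators
Principle for selecting the range of product items: (based on this year's schedule)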
IBM HS22
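Agricultural Production and Marketing Information Integration Platform
Council of Agriculture BI project scope
(1) Platform software
• Basic database functions
• Data ETL (Extract, Transform, Load) functions
• Multidimensional analysis functions
• Indicator management and dashboard functions
• Reporting service functions
• Web presentation functions
• Decision-support presentation functions
(2) Application scope validation
• Data interfacing functions
• Filtering and selecting key agricultural product items
• Setting monitoring baselines for agricultural products
• Time-series data comparison
• Alert (prompt) functions
• Data drill-down and drill-up functions
• Chart presentation functions
• Sorting functions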
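Drill-down and drill-up amount to aggregating the same facts at a finer or coarser level of a hierarchy. A minimal sketch follows; the sample rows, the year/month/region hierarchy, and the `aggregate` helper are all hypothetical and only illustrate the idea, not the platform's actual data model.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, month, region, quantity)
ROWS = [
    (2009, 1, "North", 120),
    (2009, 1, "South", 80),
    (2009, 2, "North", 100),
    (2010, 1, "North", 90),
]


def aggregate(rows, depth):
    """Sum quantity grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *keys, quantity in rows:
        totals[tuple(keys[:depth])] += quantity
    return dict(totals)


by_year = aggregate(ROWS, 1)        # drilled up: totals per year
by_year_month = aggregate(ROWS, 2)  # drilled down: totals per year and month

print(by_year)
print(by_year_month)
```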
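Personalized home page
Agricultural Information BI Center
Agricultural product price analysis
Time-region decomposition tree analysis
Region-time decomposition tree analysis
Price forecast analysis platform
Green onion production and price analysis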
The Common, non-AMO pitfalls
• Too large dimensions
• Big Distinct Count Measure Groups
  ... with bad partitioning
• Many-to-Many dimensions
  ... with large intermediate fact tables
• Parent/Child dimension
  ... with too many members
• Near Real Time
  ... with a constant, high-throughput flow of data
• Partitioning
  ... with too many partitions
See a pattern?
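Excel tool integration: time-series forecasting analysis
Excel tool integration: scenario analysis (what-if analysis)
Change Data Capture (CDC)
• Provides change history information for tables
• Monitors data state effectively
• Provides an efficient way to consolidate data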
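The idea of keeping a change history that downstream jobs can read incrementally can be illustrated as below. Note the substitution: SQL Server's CDC feature harvests changes from the transaction log, whereas this sketch uses triggers on an in-memory SQLite database purely to show what a change table looks like. The tables `subscriber` and `subscriber_changes`, their columns, and both function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create a source table plus a change table filled by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subscriber (id INTEGER PRIMARY KEY, tier TEXT);
        CREATE TABLE subscriber_changes (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, tier TEXT
        );
        CREATE TRIGGER trg_ins AFTER INSERT ON subscriber BEGIN
            INSERT INTO subscriber_changes (op, id, tier)
            VALUES ('I', NEW.id, NEW.tier);
        END;
        CREATE TRIGGER trg_upd AFTER UPDATE ON subscriber BEGIN
            INSERT INTO subscriber_changes (op, id, tier)
            VALUES ('U', NEW.id, NEW.tier);
        END;
        CREATE TRIGGER trg_del AFTER DELETE ON subscriber BEGIN
            INSERT INTO subscriber_changes (op, id, tier)
            VALUES ('D', OLD.id, OLD.tier);
        END;
        """
    )
    return conn


def changes_since(conn, last_seq):
    """Return only the changes recorded after a given sequence number."""
    cur = conn.execute(
        "SELECT seq, op, id, tier FROM subscriber_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_db()
    db.execute("INSERT INTO subscriber VALUES (1, 'basic')")
    db.execute("UPDATE subscriber SET tier = 'premium' WHERE id = 1")
    print(changes_since(db, 0))
```

A consumer that remembers the last sequence number it processed only has to read the rows after it, which is what makes incremental loading cheaper than rescanning the source table.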