Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
BigDataandAnalyticsUseCases
Dr.Abzetdin AdamovSchoolofInformationTechnologyandEngineering
ADAUniversitywww.site.ada.edu.az/~aadamov
Objectives
à Technological Landscape
à DefineDataScienceandwhataDataScientistdoes
à ListBigDataandDataScienceUseCases
à DefineStandardParametersforUseCases
Aftercompletingthislesson,youshouldbeableto:
MostRequestedUsesofBigData
• LogAnalytics&Storage• Fusionofmulti-INT(multi-formats)• RFIDTracking&Analytics• FraudDetection&Modeling• RiskModeling&Management• 360°ViewofaPerson,Place,orThing• WarehouseExtension(casepatterns)• Email/CallCenterTranscriptAnalysis• CallDetailRecordAnalysis• IBMWatson
DowehaveBigData?
real-timeBigDatausecases
AdTechnologyDigitalMarketingFraudDetectionInternetofThings
Cyberthreat SecurityNetworkMonitoringPersonalizedMedicine
BIGBIGDATA
DatathatisTOOLARGE&TOOCOMPLEXforconventionaldatatools
tocapture,storeandanalyze.
SharestradedonUSStockMarketseach
day:
7Billion
DatageneratedinoneflightfromNY
toLondon:
10Terabytes
NumberoftweetsperdayonTwitter:
400Million
Numberof‘Likes’eachdayonFacebook:
3Billion
The3V’sofBigData
VOLUME VARIETY VELOCITY90% OFTHEWORLD’S
DATAWASGENERATEDINTHELASTTWOYEARS
BigDataEverywhere!
6
WhatisAnalytics?
Dataonitsownisuselessunlessyoucanmakesenseofit!
WHATISANALYTICS?Thescientificprocessoftransformingdataintoinsightformakingbetterdecisions,offeringnewopportunitiesforacompetitive
advantage
7
TheCaseforBusinessAnalytics
• TheBusinessenvironmenttodayismorecomplexthaneverbefore.
• Businessesareexpectedtobediligentlyresponsivetotheincreasingdemandsofcustomers,variousstakeholdersandevenregulators.
• Organizationshavebeenturningtotheuseofanalytics.
• Morethan83%ofGlobalCIOssurveyedbyIBMin2010singledoutBusinessIntelligenceandAnalyticsasoneoftheirvisionaryplansforenhancingcompetitiveness.Inmostcasestheprimaryobjectiveof
anorganizationthatseekstoturntoanalyticsis:• Revenue/Profitgrowth• Optimizeexpenditure
SOLUTION
BUSINESSNEED
GOAL
8
TypesofAnalytics
1
32
AnalyticsAnalytics
PrescriptiveAnalytics
DescriptiveanalyticsPredictiveanalytics
Enablingsmartdecisionsbasedondata
What should we do?
Miningdatatoprovidebusinessinsights
What has happened?
PredictingthefuturebasedonhistoricalpatternsWhat could happen?
9
Disrupt or Be Disrupted41% of executives say
digital disruption increases risk of being put out of business
“New technologies combine to create a business innovation
platform, not just a technology platform, helping transform
every industry on the planet.”IDC (Dec. 2014)
Digital Vortex 2015
ChallengingEnvironment– 5Vs5VsthatbestdescribethenatureofBigDataProblem
BillyBeanTransformedBaseballScouting
Theuseofdataanalyticshaschangedthewaymanymajorleaguefrontofficesdobusiness– emphasizingdataovertraditionalplayermetrics.
GoogleTransformedAdvertising
LinkedInTransformedNetworkGrowth
NetflixTransformedStreamingVideo
AmazonTransformedProductRecommendations
RetailTransformedMarketBasketAnalysis
QuestionWhichproductsareveryfrequentlypurchasedtogether?
SolutionApproachDeterminefrequentitemsetsfromtransactionlogsUsetodesignphysicalstorelayoutaccordinglyUsetodesignspecialoffersandcouponstrategy
ExampleTheurbanlegendofDiapers&Beer
FinancialUseCasesTransformedAnalytics
Financialfirmsuseparametersaboutcustomerstodeterminerisk
à Likelihoodofcustomerrepayingaloan• Collectdata(FICO,networth,etc.)
• Buildapredictivemodel• Usemodeltopriceloans
Creditcardcompanieslookattransactionfactorstodetectfraud
à Likelihoodanygiventransactionisfraudulent• Trackspendinghabits• Buildaspendingmodel• Alertwhentransactionfallsoutsidemodel
CustomerProfiling FraudDetection
DemandforDataScientists
TheDataScienceSkillsetContinuumSoftwareengineer
ResearchScientist
DataEngineer
DataScientist
AppliedScientist
Role Data Engineer AppliedScientist
Function Buildproduction-gradedataproducts Findsignal/meaninginthe dataAppliesstatistical/ML modelsandtunesthealgorithm
Goodat…. DataandSystemsarchitectureHadoop,PIG/HIVE,MapReduce,ops/adminJava,Python, Perl,SQL,C++,NoSQL(HBase,Cassandra,Mongo)
Statistics, MachinelearningTextprocessing,NLPR,Python,Matlab,SAS,SQLScriptingVisualization /tellingthestory
WhatisDataScience?
Whatisadatascientist?Apersonwhodoesthis
“Buildingsoftwareproducts(akaDataProducts)whosecore functionalityreliesonapplyingstatisticalormachinelearningmethodstodata.”
Nicehistoricalperspective:http://whatsthebigdata.com/2012/04/26/a-very-short-history-of-data-science/
DataScienceSkillSetDataScience
ComputerScience
Math&Statistics
MachineLearning
Unicorn
SubjectMatterExpertise
TraditionalSoftware
TraditionalResearch
DataScientistSkillSetOurperspective:• It’sahybridrole:dataengineer+appliedscientist• Combinesmanydisciplines:
DataScience
Machinelearning
Statistics
Datamunging
HackerskillsData
engineering
Visualization
Businessmindset
Harness thegrowingandchangingnatureofdataWhatisBigData?
StreamingStructured
Challengeiscombining transactionaldatastoredinrelationaldatabaseswithless structureddata
BigData=AllData
Gettherightinformationtotherightpeople attherighttimeintherightformat
Unstructured
“ ”
Connectivity Data AnalyticsThings
IoT =sensor-acquireddata
UsingaDataLakeModernArchitecture
Alldatasources areconsidered
Leveragesthepowerofon-premtechnologies andthecloudforstorageandcapture
Nativeformats,streamingdata,bigdata
Extractandload,no/minimal transform
Storageofdatainnear-nativeformat
Orchestrationbecomespossible
Streamingdataaccommodationbecomespossible
Refineriestransformdataonread
Producecurateddatasetstointegratewithtraditionalwarehouses
Usersdiscoverpublished datasets/services using familiartools
CRMERPOLTP LOB
DATA SOURCES
FUTURE DATA SOURCESNON-RELATIONAL DATA
EXTRACT AND LOADDATA LAKE DATA REFINERY PROCESS
(TRANSFORM ON READ)
Transform relevant data into data sets
BI AND ANALYTCIS
Discover and consume predictive analytics, data sets and other reports
DATA WAREHOUSE
Star schemas,viewsother read-optimized structures
A Broad Perspective To Set The Scene
SaaS
PaaS IaaS
Data Center Cloud Edge / IoT
APP
APP
APP
APP
APP
APP
APP
27© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
TheExplosionofUnstructuredData
2005 20152010
• More than 90% is unstructured data
• Approx. 500 quadrillion files• Quantity doubles every 2 years• Most unstructured data is neither stored nor analyzed!
1.8 trillion gigabytes of data was created in 2011…
10,000
0
GB
of D
ata
(IN B
ILLI
ON
S)
STRUCTURED DATA
UNSTRUCTURED DATA
Source:Cloudera
Value
Simple
Secure
Scalable
Open
Manageable& Visible
Cost Effective $
ApplicationCentricInfrastructure
ModernDataWarehouseThe
Dream
TheReality
Let’ssetONlightbulbsinyourhead
Recommenda-tionengines
Smartmetermonitoring
Equipmentmonitoring
Advertisinganalysis
Lifesciencesresearch
Frauddetection
Healthcareoutcomes
Weatherforecastingforbusinessplanning
Oil&Gasexploration
Socialnetworkanalysis
Churnanalysis
Trafficflowoptimization
ITinfrastructure&WebAppoptimization
Legaldiscoveryanddocumentarchiving
DataAnalyticsisneededeverywhere
IntelligenceGathering
Location-basedtracking&services
PricingAnalysis
PersonalizedInsurance
CROSS-INDUSTRIALUSECASESMicrosoft’sVision
TheInternetofThings– Manufacturing
GLOBALOPERATIONS
Icanseemyproductionlinestatusandrecommendadjustmentstobettermanageoperationalcost.
Iknowwhen todeploytherightresourcesforpredictivemaintenance tominimizeequipmentfailuresandreduceservicecost.
Igain insightintousagepatternsfrommultiplecustomersandtrackequipmentdeterioration,enablingmetoreengineer productsforbetterperformance.
MANUFACTURINGPLANT
Aggregate productdata,customersentiment,andotherthird-partysyndicateddatatoidentifyandcorrectqualityissues.
Manageequipmentremotely, usingtemperature limitsandothersettingstoconserveenergyandreducecosts.
Monitorproductionflowinnear-realtimetoeliminatewasteandunnecessaryworkinprocessinventory.
GLOBALFACILITYINSIGHT
Implementcondition-basedmaintenancealerts toeliminatemachinedown-timeandincrease throughput.
THIRD-PARTYLOGISTICS
Providecross-channelvisibilityintoinventoriestooptimizesupplyandreducesharedcostsinthevaluechain.
CUSTOMER SITE
Transmitsoperationalinformationtothepartner (e.g.OEM)andtofieldserviceengineers forremoteprocessautomationandoptimization.
Management
R&D
FieldService
TheInternetofThings– Oil&Gas
Utilizeadvanced3Dand4Dvisualizationsbasedonanalyticalgorithmstomodelsubsurfacegeology
ProductionManager
Onsitepersonnel
Establishnear real-timecommunicationandautomaticallypublisheventsandalarms tothefieldtoguideandprotectonsitepersonnelandassets
Integrate allupstreamdataontoaunifiedplatformtofacilitateanalytics,informationsharing,andorganizationaltransition
1.Exploration 2.Development
3.Drilling4.Production
Geologist
Consolidatedatafromsurveys,drilllogs,andexternal sourcestogenerateadvancedreservoirmodelsandproductionforecasts
Maximizerecoverybymonitoringnearreal-time productiondataandgeneratingalertsforconditionalmaintenanceneeds
Combinenear real-timedrillingandseismicdatatooptimizedrillingtrajectoriesandrecoverypotential,whileminimizingenvironmental risk
OperationsControlCenter
Findnewhydrocarbonreservoirsquickerwithseismicdatauploadedtothecloudandprepared foranalysis
NORTHSHOREPRODUCTION
PHARMACY
TheInternetofThings– Pharma
CustomerService
Monitordevicedatatomakemore timelyhealthdecisions,suchasadjustingdosages
Enableadvancedproducttrackingandauthenticationtopreventcounterfeits
Developbetterproducts,faster,informedbyamuchlargerdatasetbasedonpatientoutcomes
R&D
Anticipatemedicaldevicemaintenanceneeds,andalertpatientstoscheduleadoctorvisitforreplacementorrepair
HealthcareProvider
Monitormedicaldevice functionalityforbettercustomerservice,reduced risk,andinsighttoimproveproductdesigns
Manageequipmentremotely, usingappropriateKPIs
Reducemachinedowntimewithcondition-basedmaintenancealerts
PatientHome
DistributionManufacturing
Aggregate andcorrelatedatafromdisparatemedicaldeviceswithmedicationsandhealthoutcomesforadvancedinsight
TheInternetofThings– Healthcare
HOSPITALPATIENTHOME OUTPATIENT FACILITY
Connectpatientdatatocontextualdata, sothelatestpatientdataautomaticallydisplaysoncareproviderdevicesbasedontheirlocationandrole.
Transformthevehicle intoasmartenvironment thatmonitorshealthindicators.
Monitorpatientconditionswithin-homemedicaldevicesthatalert care teamstaffwhenahealtheventoccurs.
Makeauthorizedpatientdataaccessiblefromaunifiedpoint,enablingaholisticviewofthepatient’sjourneysoproviderscanoptimizeeachcare interaction.
Integrate datafromexistingandnon-traditionalsourcestodriveBigDataanalytics,enablingcareprocessinnovationandhealthcare transformation.
HEALTHCARE ECOSYSTEMCombinedatafromvarioussourcestouncoverinsightsthatenableanenhancedpatientjourney,improvedoperationalefficiency,andbetter riskmanagement.
Makepatientdatavisibleandactionableinnear real-time,enabling improvedoutcomesthroughdata-drivendecisionmaking,bettercoordinationanderrorreduction.
PHARMACY
GOVERNMENT
RESEARCH
INSURANCECOMPANIES
DEMOGRAPHICS
WEATHER RETAIL
Enableaninteractiveexperiencebetweenpatientsandcollaborativecare teams,andreduceresponsetimesbyprovidingremoteaccesstothelatestpatientdata.
35
TheInternetofThings– Retail
Marketing
MOBILEEXPERIENCE
STOREPURCHASEHISTORY:
Dogfood
MTWThF
WeatherData
Merchandizing
IN-STORESHOPPING
OnlineBehavior
ShoppingRoute
REFLECTION
BESTDEAL
INSPIRATION,DISCOVERY,
PRE-SHOPPING
PurchaseHistory
RIGHTOFFER,RIGHTTIME,RIGHTPLACE
IoTDATAFUELSCUSTOMERANDPRODUCTINSIGHTS
100100100
1001
0010
0101
1010
0101
0110
0101
1010
0101
0000
1010
1010
1011
0100
1011
0100
1010
10
CHECKOUT
Retail
200ft
Have youseenthese!
We’re ready fortherain!#ShoppingSuccess
42
CROSS-INDUSTRIALUSECASESbasedonStandardParameters
USE-CASEPARAMETERS
1.Company 2.DataSources 3.Techniques 4.BusinessValue
EXAMPLE:DATASOURCE
• ServerTelemetry• MonitoringLogs• MemoryConsumption• Processing(CPU)Consumption• NetworkFlow
EXAMPLE:TECHNIQUES
• PatternRecognition• ProactiveMonitoring• EarlyAlertDelivery
EXAMPLE:BUSINESSVALUE
P R O F I T = R E V E NUE – CO S T Toincreasethis… … increasethis… …ordecreasethis…
EXAMPLE:TELECOMMUNICATIONS
DataSources Techniques BusinessValue
• CustomerRecords• ContractData• PurchaseOrders• CallCenter
• ETL• Analytics
Problem:ETLOffload
PRO F I T = R E V E NUE – C O S T Toincreasethis… … increasethis… …ordecreasethis…
+
ETL- HadoopDataAnalytics
EXAMPLE:CARDISSUER
DataSources Techniques BusinessValue
• CustomerPurchaseHistory• MerchantDesignations• MerchantSpecialOffers
• ETL• MachineLearning
Problem:LowProfit
P RO F I T = R E V E NUE – C O S T Toincreasethis… … increasethis… …ordecreasethis…
+ +
EXAMPLE:WASTE&RECYCLING
DataSources Techniques BusinessValue
• TruckGeolocationData• 20000trucks• 5secinterval
• LandfillGeo-boundaries
• Realtime Streaming• BatchComputation• GraphAlgorithms
Problem:IdleAlerts
P RO F I T = R E V E NUE – C O S T Toincreasethis… … increasethis… …ordecreasethis…
+ +ShortPath–GraphAlgorithms
• ImmediateAlerts• TaxReduction• RouteOptimization
EXAMPLE:BANKING
DataSources Techniques BusinessValue
• Anti-MoneyLaundering• ConsumerTransactions
• BayesianLearning• PeerGroupAnalysis
Problem:FraudDetection
PRO F I T = R E V E NUE – C O S T Toincreasethis… … increasethis… …ordecreasethis…
+ +SuspiciousEvents=
EXAMPLE:DNAANALYSIS
DataSources Techniques BusinessValue
• Birth,Death,Census,• Military,ImmigrationRecords• SearchBehaviorActivity• DNASNP
• RecordLinking• SearchRelevance• Clickstream• SecurityForensics• DNAMatching
Problem:SearchRelevance,DNAMatching
PRO F I T = R E V E NUE – C O S T Toincreasethis… … increasethis… …ordecreasethis…
+ +
Q&A ?Abzetdin Adamov,Assoc Prof.Emailmeat:[email protected]:@Linktomeat:www.linkedin.com/in/adamovVisitmyblogat:aadamov.wordpress.com