Milano, September 12, 2013 Andrea Ghiglietti
Francesca Ieva
Table overview
Big Data means…
Example I: Big Data and Social Networks
Example II: Big Data in HealthCare Context
... discussion ...
Profile Monitoring, Process Monitoring & Multichannel Data Analysis in Manufacturing Applications
... discussion ...
Big Data means... 3V
Volume, Variety, Velocity
http://www.pros.com/big-vs-big-data/
Volume: ... a lot of data, more than can easily be handled by a single database, computer or spreadsheet
Variety: ... different kinds of information, lacking inherent structure or predictable size, rate of arrival, transformation or analysis when processed
Velocity: ... process incoming data and get answers quickly enough as to not delay research or decision making ...
Big Data means... more Vs?
http://www.pros.com/big-vs-big-data/
--------> FROM STATISTICIAN TO DATA SCIENTIST <--------
Article “THE DEATH OF THE STATISTICIAN”
http://www.analyticbridge.com/profiles/blogs/the-death-of-the-statistician
Does the statistician's role have to change?
Which skills are required of a data scientist today?
- Establish a method for assessing the viability of information, regardless of field type and size of data
- Establish a method that is quick and cost-effective
- Confirm a variable's relevance before investing in the creation of a fully formed model
http://www.pros.com/big-vs-big-data/
PRE-PROCESSING!!!
Big Data means... NEW data
Big data “proxies” of social life
Big data originate from clinical practice and support clinical practice
Big Data means... NEW questions
"You don't know what question you're going to answer tomorrow, and when you ask it,
you'll be relieved that you kept the data"
http://www.informationweek.com/big-data/news/big-data-analytics/big-datas-big-question-what-to-keep/240158277
“Who exactly is going to help us make sense of all this data and do we need to recruit
new people or re-train existing staff?”
http://www.itpro.co.uk/business-intelligence/20200/big-data-creates-equally-big-questions#ixzz2dpoxN6yt
CRITICAL QUESTIONS FOR BIG DATA
Provocations for a cultural, technological, and scholarly phenomenon
- Big Data changes the definition of knowledge?
- Claims to objectivity and accuracy are misleading?
- Bigger data are always better data?
- Just because it is accessible does not make it ethical?
Ask the right Questions
Don’t get bogged down by Big Data
Big data is massive and messy, and it’s coming at you
fast. These characteristics pose a problem for data
storage and processing, but focusing on these factors
has resulted in a lot of navel-gazing and an unnecessary
emphasis on technology.
http://www.thoughtworks.com/big-data-analytics
It’s not about Data. It’s about Insight and Impact
The potential of Big Data is in its ability to solve
problems and provide new opportunities.
So to get the most from your Big Data investments, focus
on the questions you’d love to answer for your business.
This simple shift can transform your perspective,
changing big data from a technological problem to a
business solution.
The value of data is only realised through insight. And insight is useless until it's turned into action.
Finding the right questions will lead you to the well.
To strike upon insight, you first need to know where to dig.
Big Data means... New answers from new “us”
Data scientist, the sexiest job of the 21st century
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.
There is no point in bringing data into the data warehouse without integrating it.
If the data arrives at the data warehouse in an unintegrated state, it cannot be used to support a corporate view of data.
And a corporate view of data is the essence of the architected environment.
Example I: Big Data in Social Networks
more than 950 million users, spending on average
6.5 hours per month, generate every day…
“If you aren’t taking advantage of big data, then you
don’t have big data, you have just a pile of data”
Is this Big Data or just a pile of data?
Facebook on Big Data analytics: an insider’s view
(http://www.informationweek.com/cloud-computing/platform/facebook-on-big-data-analytics-an-inside/240150902?pgno=1)
- very interesting interview with Jay Parikh.
said Jay Parikh,
VP of infrastructure
of Facebook.
How to deal with this huge amount of data?
Hadoop: a sweeping software platform for processing and analyzing epic amounts of data.
In terms of raw Hadoop capacity, Facebook has reached the upper limit:
the company owns the world's largest Hadoop cluster, weighing in at 100 petabytes.
..and yet, the company says,
that's not big enough!
..but Hadoop is not perfect...
“single point of failure” :
if a master server
overseeing the cluster
went down, the whole
cluster went down.
Facebook has solved the
problem with Corona.
Traditionally, Hadoop used
a single “job tracker” to
manage tasks across a
cluster of servers, but
Corona creates multiple
job trackers.
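The difference between a single job tracker and multiple job trackers can be sketched with a toy model. This is only an illustration of the "single point of failure" idea: it does not use any Hadoop or Corona API, and the one-tracker-per-job assignment, the job names and the tracker names are assumptions made for the example.

```python
def jobs_still_running(jobs, tracker_of, failed_tracker):
    """Return the jobs whose (hypothetical) tracker is still alive."""
    return [job for job in jobs if tracker_of[job] != failed_tracker]


jobs = ["job-1", "job-2", "job-3", "job-4"]

# Single job tracker: every job depends on the same master.
single = {job: "tracker-0" for job in jobs}

# Multiple job trackers: here, one hypothetical tracker per job.
multiple = {job: f"tracker-{i}" for i, job in enumerate(jobs)}

# The same tracker fails in both setups.
print(jobs_still_running(jobs, single, "tracker-0"))    # []
print(jobs_still_running(jobs, multiple, "tracker-0"))  # ['job-2', 'job-3', 'job-4']
```

With one tracker, its failure stops every job; with several, only the jobs assigned to the failed tracker are affected.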
..but Facebook will soon outgrow this cluster!
It’s not possible to run
Hadoop across
geographically separate
facilities because
network packets
couldn’t travel between
the servers fast enough.
Prism replicates and
moves data wherever
it’s needed across a
vast network of
computing facilities.
• What can Facebook do with that amount of data?
Descriptive statistics
• Which techniques can Facebook use?
A/B testing
• ..and what else?
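As an illustration of what an A/B test computes, here is a minimal sketch of a two-proportion z-test, one common way to compare two variants. The slides do not say which test Facebook uses; the function name and the counts below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the difference between the two variants is unlikely to be due to chance alone.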
Interesting links on Facebook and Big Data you may like to explore:
The human face of Big Data
(https://www.facebook.com/FaceOfBigData)
Big Data: Facebook’s next big idea
(http://www.zdnet.com/big-data-facebooks-next-big-idea-7000001983/)
Traditional EDW vs Big Data
(http://blog.prabasiva.com/2012/04/09/traditional-edw-vs-big-data/)
Most Data isn’t big, but businesses are wasting money pretending it is
(http://qz.com/81661/most-data-isnt-big-and-businesses-are-wasting-money-pretending-it-is/)
Big Data! If you don’t have it, you better get yourself some!
“If your data is little, your rivals are going to kick sand
in your face and steal your girlfriend”
Even web giants like Facebook and Yahoo generally aren’t dealing
with big data, and the application of Google-style tools is
inappropriate.
Is more data always better?
- Big data has become a synonym for “data analysis,” which is confusing and counter-productive.
- Supersizing your data is going to cost you and may yield very little.
If you’re looking for
correlations, gathering more
data could actually hurt you.
In some cases, big data is
as likely to confuse as it is
to enlighten.
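One way to see why more data can hurt a search for correlations: the more unrelated variables are screened against a target, the larger the strongest purely accidental correlation becomes. The sketch below is a simulation under assumed settings (Gaussian noise, 50 observations, a fixed seed); the function names are hypothetical and it is not taken from the cited article.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def running_max_abs_correlation(n_variables, n_obs, seed=0):
    """Screen pure-noise variables against a pure-noise target.

    Returns the largest |correlation| seen after each additional variable.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best, history = 0.0, []
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, noise)))
        history.append(best)
    return history


history = running_max_abs_correlation(n_variables=1000, n_obs=50)
for k in (10, 100, 1000):
    print(f"best |r| among the first {k} noise variables: {history[k - 1]:.2f}")
```

None of the simulated variables is related to the target, yet the best correlation found can only grow as more of them are examined.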
Does your business need data?
But buying into something as faddish as
the supposed importance of the size of
one’s data is the kind of thing only pointy-haired
Dilbert bosses would do.
The important thing is gathering
the right data, not gathering some
arbitrary quantity of it.
Example II: Big Data in HealthCare context
One of the major challenges for statistics applied to clinical practice is to distil causal conclusions from observational data, whenever available…
Long complex sequences of events and measurements will become increasingly available in medicine.
VOLUME VARIETY
Main issues:
- Enhancing information available in routinely collected data
- Gaining insights into the economic burden of diseases
- Causal relationships between covariates and complex outcomes
- Heterogeneity between individuals (frailty and risk assessment)
- Changes in structures over time
- Providers’ profiling
Let’s think about…
… how many times during your life YOU may contact the National Health Service
Birth
Visits and controls (blood, dental care, screening, …)
Hospitalizations (orthopaedics, …)
Drugs (headache, stomachache, …)
All these data are routinely collected and stored in healthcare databases
Let’s think about…
… that this happens for your PARENTS, your FRIENDS, your FELLOW CITIZENS, … for all their life.
Millions of people interact
every day with the
national health service
There is no point in bringing data into the data warehouse without integrating it.
If the data arrives at the data warehouse in an unintegrated state, it cannot be used to support a corporate view of data.
And a corporate view of data is the essence of the architected environment.
Now, let’s change perspective and think about…
… CLINICIANS, who take care of people affected by a disease
… HEALTHCARE government, which has to quantify the burden of such disease
2010 MDC 01: 108199
2010 MDC 04: 101739
2010 MDC 05: 191255
2010 MDC 11: 61633
2011 MDC 01: 106791
2011 MDC 04: 100680
2011 MDC 05: 186742
2011 MDC 11: 60983
Hospitalizations (tot.) 2010: 1.424.106
Hospitalizations (tot.) 2011: 1.398.318
Hospitalizations for heart failure between 2000 and 2012: 2.448.111
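To give a feeling for how such counts translate into a measure of burden, the sketch below computes the share of each MDC in the yearly total. It assumes that the MDC figures are hospitalization counts included in the yearly totals, which the slide suggests but does not state; the dictionary and function names are hypothetical.

```python
# Figures copied from the slide (dots there are thousands separators).
total_hospitalizations = {2010: 1_424_106, 2011: 1_398_318}
mdc_counts = {
    2010: {"01": 108_199, "04": 101_739, "05": 191_255, "11": 61_633},
    2011: {"01": 106_791, "04": 100_680, "05": 186_742, "11": 60_983},
}


def mdc_shares(year):
    """Share of the year's total hospitalizations falling in each MDC.

    Assumes the MDC counts are subsets of the yearly total.
    """
    total = total_hospitalizations[year]
    return {mdc: count / total for mdc, count in mdc_counts[year].items()}


for year in (2010, 2011):
    for mdc, share in mdc_shares(year).items():
        print(f"{year} MDC {mdc}: {share:.1%}")
```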
Chronic Heart Failure (CHF)
Chronic Heart Failure is a degenerative disease of the cardiovascular system.
Starting from the Administrative Database of a Regional District, it can be defined by epidemiologists and clinicians using MDC codes.
The joint modelling of the death outcome and the hospitalization process is of interest.
Hospitalizations represent the observable process of the latent degenerative disease evolution.
Major Diagnostic Category for identification of cases:
01 - Nervous System
04 - Respiratory System
05 - Circulatory System
11 - Kidney
15.298 patients (35.224 records) with first admission ending in 2006 (patients with admission date = discharge date = death date have been removed)
4 years of follow-up (up to December 31st 2010)
ID   Date of admission   Date of discharge   Date of death   Gender   …
1    15/09/2006          31/09/2006          NA              F
1    21/02/2007          23/02/2007          NA              F
1    31/3/2007           04/04/2007          NA              F
1    10/11/2007          18/11/2007          NA              F
2    10/01/2008          15/01/2008          23/10/2009      M
2    16/06/2009          01/07/2009          23/10/2009      M
3    11/04/2008          28/04/2008          3/07/1010       F
…    …                   …                   …               …        …
List of ICD-9-CM codes referred
to HF has been created as the
union of codes from
“Heart failure mortality rate” by
AHRQ-IQI and from CMS-HCC
Model Category 80.
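A minimal sketch of how records in this layout (one row per hospitalization) could be turned into one trajectory per patient, the starting point for modelling the hospitalization process and the death outcome. The records below are made up for the example, and the function and field names are hypothetical, not the authors' code.

```python
from collections import defaultdict
from datetime import date

# Illustrative records: (patient ID, admission, discharge, death or None).
# The values are invented for this sketch.
records = [
    (1, date(2007, 2, 21), date(2007, 2, 23), None),
    (1, date(2006, 9, 15), date(2006, 9, 30), None),
    (2, date(2008, 1, 10), date(2008, 1, 15), date(2009, 10, 23)),
    (2, date(2009, 6, 16), date(2009, 7, 1), date(2009, 10, 23)),
]


def build_trajectories(records):
    """Group hospitalization records by patient and summarise each history."""
    by_patient = defaultdict(list)
    for pid, admission, discharge, death in records:
        by_patient[pid].append((admission, discharge, death))

    trajectories = {}
    for pid, stays in by_patient.items():
        stays.sort(key=lambda stay: stay[0])  # order by admission date
        # Days between a discharge and the following admission.
        gaps = [(nxt[0] - prev[1]).days for prev, nxt in zip(stays, stays[1:])]
        trajectories[pid] = {
            "n_hospitalizations": len(stays),
            "days_in_hospital": sum((d - a).days for a, d, _ in stays),
            "days_between_stays": gaps,
            "death": stays[-1][2],  # None if the patient is still alive
        }
    return trajectories


for pid, summary in build_trajectories(records).items():
    print(pid, summary)
```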
Example: identification of clinical patterns of patients affected by chronic or acute heart diseases
[Figure: two patient timelines along a time axis, running from birth (one ending in death), with the start and end of follow-up marked and clinical events such as infarction and diabetes placed along them. X = clinical event. Record labels shown: R-410 AD VC; SDO PH; R-250 E-P20F-A10AVC PS.]
[Figure: latent states S1, S2, S3, S4 underlying the observed sequences "Hosp 1, Hosp 2, Hosp 3, Death" and "Hosp 1, Hosp 2, Hosp 3, Hosp 4, Hosp 5, Hosp 6/death"]
Hidden Markov Process
representing the latent disease progression
generating patients’ trajectories
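The idea of a latent progression generating observed trajectories can be sketched with a small simulation. This is a toy hidden Markov model, not the authors' model: the four states follow the figure, but the transition probabilities, the hospitalization probabilities and the function name are all assumptions chosen for illustration.

```python
import random

# Hypothetical left-to-right transitions: a patient stays in the current
# latent state or moves to a worse one. S4 is absorbing (death).
TRANSITIONS = {
    "S1": {"S1": 0.70, "S2": 0.25, "S4": 0.05},
    "S2": {"S2": 0.60, "S3": 0.30, "S4": 0.10},
    "S3": {"S3": 0.50, "S4": 0.50},
    "S4": {"S4": 1.00},
}

# Hypothetical probability of observing a hospitalization in each state.
HOSPITALIZATION_PROB = {"S1": 0.1, "S2": 0.3, "S3": 0.6}


def simulate_patient(rng, max_steps=100):
    """Simulate one observed trajectory; the latent states stay hidden."""
    state = "S1"
    events = []
    for _ in range(max_steps):
        if state == "S4":
            events.append("death")
            break
        # Emission: a hospitalization may be observed in the current state.
        if rng.random() < HOSPITALIZATION_PROB[state]:
            events.append("hosp")
        # Transition of the latent state.
        next_states = list(TRANSITIONS[state])
        weights = list(TRANSITIONS[state].values())
        state = rng.choices(next_states, weights=weights)[0]
    return events


rng = random.Random(2013)
for _ in range(3):
    print(simulate_patient(rng))
```

Only the sequence of hospitalizations and the death are observed; the state sequence that produced them is what a hidden Markov model would have to infer.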
Main Issues:
- Record Linkage
- Epidemiology
- Performance Assessments
- Survival Analysis
- Continuity of care
- Cost-Effectiveness analysis
References and links
All the material will be available soon on the website
(Please, interact and send links or papers you’d like to add)
Contact us @ Polimi!