8/18/2019 An Introduction to Big Data Concepts - Data Otaku - Site Home - MSDN Blogs
An Introduction to Big Data Concepts
Bryan C Smith, 1 Nov 2011 4:41 PM
The idea that data collected in computerized systems could be used to inform and thereby improve decision making has been around for quite some time. Over the last couple of decades, ideas of how to assemble a decision support system have coalesced around the concept of a data warehouse.
The construction of a proper data warehouse requires a non-trivial investment. This investment is made with the expectation of benefits, but these are often difficult to enumerate prior to the warehouse's construction and subsequent employment. For this reason, the data warehouse requires a leap of faith.
For many years, preparation for this leap was a significant part of the conversation with customers interested in Business Intelligence (BI). Today, in recognition of the data warehouse as a tool for navigating business challenges and uncertainty, the conversation tends to focus on maximizing the impact of BI on the organization.
As customers focus on how best to extract insights from data, there is growing recognition of untapped data resources, especially unstructured data. These data remain largely untapped because:
1. The value of these data relative to the cost of their processing and storage is low.
2. These data are not easily stored and analyzed within the confines of the traditional data warehouse.
To illustrate these points, consider the data in a web log. These data could be very insightful to a business interested in engaging customers through a website. However, individual data records, holding information on a single page request or single image retrieval, are not likely to be high in value, especially over the longer periods of time in which data are stored in a traditional data warehouse.
Furthermore, the structure of many elements within the log records, such as the URI of the referrer or the query string associated with a requested resource, is highly variable in nature. Differing questions posed against these data may require them to be interpreted in differing ways. Significant pre-processing of the data in order to neatly fit it into the traditional data warehouse may be unnecessary or even counter-productive.
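To make the "differing interpretations" point concrete, here is a minimal sketch that keeps a web log record in its raw form and interprets it only when a question is asked. It assumes the log uses the Apache/NCSA combined log format (the post does not name a format), and the function names `parse_record`, `search_terms`, and `campaign`, as well as the query-string parameters `q` and `campaign`, are hypothetical choices made for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed format: Apache/NCSA "combined" log format
# host ident user [time] "method path protocol" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_record(line):
    """Split one log line into its coarse, well-defined fields (or None if malformed)."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

def search_terms(record):
    """Question A: what did visitors search for on the referring site?
    Hypothetical assumption: the referrer URI carries search terms in a 'q' parameter."""
    query = parse_qs(urlsplit(record["referrer"]).query)
    return query.get("q", [])

def campaign(record):
    """Question B: which marketing campaign brought the visitor in?
    Hypothetical assumption: the requested path carries a 'campaign' parameter."""
    query = parse_qs(urlsplit(record["path"]).query)
    return query.get("campaign", ["(none)"])[0]

line = ('203.0.113.7 - - [01/Nov/2011:16:41:00 -0700] '
        '"GET /products?id=42&campaign=fall HTTP/1.1" 200 5120 '
        '"http://search.example.com/results?q=big+data" "Mozilla/5.0"')

record = parse_record(line)
print(search_terms(record))  # ['big data']
print(campaign(record))      # fall
```

The record itself is stored once with only its coarse structure extracted; each question applies its own parsing to the variable elements (the referrer URI and the query string). Forcing both interpretations into a fixed warehouse schema up front is exactly the kind of pre-processing the paragraph above describes as possibly unnecessary.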
Web logs are a commonly cited form of unstructured data. A better term for these data may be "complex" or "mixed-typed" data, as at some level these data have a well understood and meaningful structure. However, this structure is often at a level of granularity higher than the level at which analysis is to be performed, and it's this mismatch that leads to the "unstructured" moniker. Other forms of unstructured data include XML or JSON documents, images, video, or PDF, Word, or HTML documents.
The challenges of working with unstructured data, illustrated in the web log example, are often characterized in terms of four Vs. The four Vs are identified as:
1. Volume: Defined as the total number of bytes associated with the data. Unstructured data are estimated to account for 80% of the data in existence, and the overall volume of data is rising.
2. Velocity: Defined as the pace at which the data are to be consumed. As volumes rise, the value of individual data points tends to diminish more rapidly over time.
3. Variety: Defined as the complexity of the data in this class. This complexity resists traditional means of analysis.
4. Variability: Defined as the differing ways in which the data may be interpreted. Differing questions require differing interpretations.
The four Vs articulate the broad challenges of working with unstructured data, but the dominant challenge tends to be in terms of data volume. As a result, the effort to extract insights from unstructured data is often referred to as Big Data.
Because of the challenges of the four Vs, Big Data necessitates an alternative approach to Business Intelligence. This alternative approach, which we might refer to as the unstructured data warehouse or the Big Data warehouse, does not invalidate the traditional data warehouse but does acknowledge its limitations in extracting insights from the full range of available data resources.
What exactly the unstructured data warehouse is, and how it will relate to the traditional structured data warehouse, has yet to be determined, but ideas are beginning to coalesce around distributed, algorithmic technologies such as Apache Hadoop.
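Hadoop's MapReduce model expresses a job as a map function that emits key/value pairs and a reduce function that combines all values sharing a key, with the framework grouping (shuffling) map output by key in between and running the pieces in parallel across a cluster. The sketch below imitates those three phases in a single Python process to count page requests in web log lines; it is not Hadoop's API, and the names `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical. It assumes combined-format log lines, where the request is the first double-quoted field.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit (requested path, 1) for one log line.
    Assumes the request is the first double-quoted field of the line."""
    fields = line.split('"')
    if len(fields) < 3:
        return []  # malformed line: emit nothing instead of failing the whole job
    request = fields[1].split()  # e.g. ['GET', '/products?id=42', 'HTTP/1.1']
    if len(request) < 2:
        return []
    path = request[1].split("?", 1)[0]  # ignore the query string for this question
    return [(path, 1)]

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: combine every value emitted for one key."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence (a cluster would run them in parallel)."""
    pairs = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

logs = [
    '203.0.113.7 - - [01/Nov/2011:16:41:00 -0700] "GET /products?id=42 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [01/Nov/2011:16:41:05 -0700] "GET /products?id=7 HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Nov/2011:16:42:10 -0700] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    'garbled line',
]
print(run_job(logs))  # {'/products': 2, '/about': 1}
```

Because each mapper call looks at one line in isolation and each reducer call looks at one key in isolation, the work can be partitioned across many machines; that independence is what lets this style of processing scale with data volume.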