Daniel Dao & Nick Buroojy
Guest Lecture
CIVITAS LEARNING, INC. – CONFIDENTIAL INFORMATION
OVERVIEW
• What is Civitas Learning
• What We Do
• Mission Statement
• Demo
• What I Do
• How I Use Databases
• Nick Buroojy
WHAT IS CIVITAS LEARNING
Civitas Learning
Mid-sized startup
Data-driven company
Education
“We partner with forward-thinking colleges and universities, harnessing the power of insight and action analytics to help a million more students learn well and finish strong.” – The Million More Mission
WHAT WE DO
• Work with institutions to provide insights through various applications
  • Inspire
Inspire for Faculty Demo
WHAT I DO
• My role in the company
• How my work is broken down
  • Product
    • Dev managers, PSMs, engineers
  • Frontend
    • Work with HTML/CSS/ReactJS
  • Backend
    • Writing APIs
    • Working with models
    • Writing SQL
    • Optimizing performance
    • Writing tests
HOW I USE DATABASES
Nick Buroojy
• Graduated from Carnegie Mellon
• Bachelor's in Computer Science
• Software Engineering
• I've been working in software for about 6 years
• I've been at Civitas for three years
• I've worked at Apple, Google, Civitas
Goals
At the end of this lecture, you will be able to:
• Describe the process Civitas uses to manipulate data.
• Describe the differences between column-oriented and row-oriented data stores.
• Explain how Redshift uses distributed compute for query performance.
• Describe the use of the data layout options DISTKEY and SORTKEY.
Civitas Data Flow
[Diagram: data moves between the Web Server, Secure File Transfer Protocol (SFTP), and Redshift through repeated Extract, Transform, and Load steps.]
Extract
• As long as the data is in the tables, there are export commands that can simply dump the data to a file.
[Diagram labels: APP, LOAD, RAW, Postgres]
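As a rough illustration of the extract step, the sketch below dumps a table to a flat file using only Python's standard library. An in-memory SQLite database stands in for the source database, and the table name `app_person` and its columns are hypothetical. In Postgres itself, the `COPY ... TO` command plays this role.

```python
import csv
import io
import sqlite3


def dump_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, header row first."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Stand-in source database with a hypothetical table (assumption, not Civitas's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_person (person_id INTEGER, birth_dt TEXT)")
conn.executemany(
    "INSERT INTO app_person VALUES (?, ?)",
    [(1, "1990-01-31"), (2, "1985-07-04")],
)

buffer = io.StringIO()  # a real export would open a file on disk instead
dump_table_to_csv(conn, "app_person", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting text is the kind of file that can then be moved elsewhere and loaded into another system.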
Transform
[Diagram labels: SRC_, RAW]
SELECT
    SPBPERS.SPBPERS_PIDM       AS raw_person_id,
    SPBPERS.SPBPERS_BIRTH_DATE AS raw_birth_dt,
    SPBPERS.SPBPERS_DEAD_DATE  AS raw_death_dt,
    SPBPERS.SPBPERS_SEX        AS raw_gender,
    null                       AS raw_primary_language,
    null                       AS raw_country_of_origin
FROM src_banner_saturn.spbpers
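To make the transform query concrete, here is a small sketch that runs it against stand-in data. It assumes SQLite in place of the real source database, attaches an in-memory database under the name `src_banner_saturn` so the query can run unchanged, and invents one sample row; the column types are guesses.

```python
import sqlite3

# The transform query from the slide: source columns are renamed to a common
# raw_ schema, and fields the source does not provide are filled with null.
TRANSFORM_SQL = """
SELECT
    SPBPERS.SPBPERS_PIDM       AS raw_person_id,
    SPBPERS.SPBPERS_BIRTH_DATE AS raw_birth_dt,
    SPBPERS.SPBPERS_DEAD_DATE  AS raw_death_dt,
    SPBPERS.SPBPERS_SEX        AS raw_gender,
    null                       AS raw_primary_language,
    null                       AS raw_country_of_origin
FROM src_banner_saturn.spbpers
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Stand-in for the source schema (assumption: types and sample data are made up).
conn.execute("ATTACH DATABASE ':memory:' AS src_banner_saturn")
conn.execute(
    """
    CREATE TABLE src_banner_saturn.spbpers (
        SPBPERS_PIDM INTEGER,
        SPBPERS_BIRTH_DATE TEXT,
        SPBPERS_DEAD_DATE TEXT,
        SPBPERS_SEX TEXT
    )
    """
)
conn.execute(
    "INSERT INTO src_banner_saturn.spbpers VALUES (?, ?, ?, ?)",
    (101, "1990-01-31", None, "F"),
)

rows = [dict(row) for row in conn.execute(TRANSFORM_SQL)]
print(rows)
```

Every source system can get its own query of this shape, so that downstream tables all see the same `raw_` column names.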
Load
[Diagram labels: Redshift, Table, File, SFTP]
Flat file: Plain text file that is non-hierarchical, usually in the form of CSV or TSV. Each row represents one row in the database.
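The flat file definition above can be sketched in a few lines. The example below parses a small TSV file with Python's `csv` module; the column names and values are made up for illustration.

```python
import csv
import io

# A hypothetical flat file: one header line, then one line per database row.
# There is no nesting, only tab-separated fields.
flat_file = (
    "person_id\tbirth_dt\tgender\n"
    "101\t1990-01-31\tF\n"
    "102\t1985-07-04\tM\n"
)

reader = csv.DictReader(io.StringIO(flat_file), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row["person_id"], row["birth_dt"], row["gender"])
```

Note that every value arrives as a string; types are applied by the table the file is loaded into.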
Data Flow
[Diagram labels: Redshift, Web Server, Secure File Transfer Protocol]
Redshift Performance
• Columnar data storage
• Distributed data storage
• DISTKEY
• SORTKEY
• Parallel query execution
• COPY/UNLOAD
Columnar data storage
Row-oriented data store example:
Source: docs.aws.amazon.com
Columnar data storage
Column-oriented data store example:
Source: docs.aws.amazon.com
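The difference between the two layouts can be sketched with plain Python lists. This is a toy model with made-up records, not how Redshift lays out disk blocks: it only shows which values end up next to each other.

```python
# Three hypothetical records: (id, name, age).
rows = [
    (1, "Ada", 36),
    (2, "Grace", 45),
    (3, "Alan", 41),
]

# Row-oriented: all values of one record are stored next to each other.
row_layout = [value for record in rows for value in record]

# Column-oriented: all values of one column are stored next to each other.
columns = list(zip(*rows))
column_layout = [value for column in columns for value in column]

AGE = 2  # position of the age column
num_rows = len(rows)
num_columns = len(rows[0])

# Reading one column from the columnar layout touches one contiguous run...
ages_from_columns = column_layout[AGE * num_rows:(AGE + 1) * num_rows]

# ...while the row layout has to step across every record to pick it out.
ages_from_rows = row_layout[AGE::num_columns]

print(row_layout)
print(column_layout)
print(sum(ages_from_columns) / num_rows)
```

Analytic queries that aggregate a few columns over many rows benefit from the columnar layout because the unused columns are never read.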
Distributed data storage
• Why?
• DB constraints
• Disk
• CPU
• Network
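Partial aggregations
• SUM
[Figure: the values 1, 3, 6, 6, 1, 6, 8, 3, 5, 6, 7 spread across four nodes; each node computes a partial sum (10, 21, 15, 6) and the partial sums are added together.]
10 + 21 + 15 + 6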
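Partial aggregations
• COUNT
[Figure: the values 1, 3, 6, 6, 1, 6, 8, 3, 5, 6, 7 spread across four nodes; each node computes a partial count (3, 4, 3, 1) and the partial counts are added together.]
3 + 4 + 3 + 1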
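Partial aggregations
• AVG = SUM / COUNT
[Figure: the values 1, 3, 6, 6, 1, 6, 8, 3, 5, 6, 7 spread across four nodes; the average is the combined sum divided by the combined count.]
SUM / COUNT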
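The three slides above can be sketched as follows. The assignment of values to nodes is an assumption chosen to match the partial sums (10, 21, 15, 6) and partial counts (3, 4, 3, 1) shown on the slides.

```python
# Values held by each of four nodes (assumed grouping, consistent with the
# partial sums and counts on the slides).
node_values = [
    [1, 3, 6],
    [6, 1, 6, 8],
    [3, 5, 7],
    [6],
]

# Each node works only on its own data and returns two small numbers.
partials = [(sum(values), len(values)) for values in node_values]

# The coordinator combines the partial results.
total_sum = sum(partial_sum for partial_sum, _ in partials)
total_count = sum(partial_count for _, partial_count in partials)
average = total_sum / total_count

print(partials)      # per-node (sum, count) pairs
print(total_sum)     # 10 + 21 + 15 + 6
print(total_count)   # 3 + 4 + 3 + 1
print(average)
```

AVG is computed from the combined SUM and COUNT rather than by averaging the per-node averages, which would give the wrong answer when nodes hold different numbers of rows.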
Partial aggregations
• Redshift can distribute
  • AVG
  • SUM
  • COUNT
  • MAX
  • MIN
  • STDDEV
  • …
• More challenging (slower)
  • COUNT DISTINCT
  • ORDER BY x LIMIT n
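A short sketch of why the last two are harder. It uses the same assumed per-node values as the SUM and COUNT slides and is a toy model, not Redshift's actual execution plan: the point is only that each node has to send back more than a single number.

```python
import heapq

# Assumed per-node values, consistent with the earlier partial sums and counts.
node_values = [
    [1, 3, 6],
    [6, 1, 6, 8],
    [3, 5, 7],
    [6],
]

# COUNT DISTINCT: adding per-node distinct counts double-counts values that
# appear on more than one node.
naive_distinct = sum(len(set(values)) for values in node_values)

# A correct answer needs the distinct values themselves to be combined.
true_distinct = len(set().union(*node_values))

# ORDER BY x LIMIT n: each node can send only its own n smallest values,
# but the coordinator still has to merge and re-sort the candidates.
n = 3
candidates = [x for values in node_values for x in heapq.nsmallest(n, values)]
top_n = heapq.nsmallest(n, candidates)

print(naive_distinct, true_distinct)
print(top_n)
```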
DISTKEY
• Allows the Redshift user to specify which records are stored on the same node
• Used to keep data balanced across nodes
• Used for join locality
• Can perform a join without “shuffling”, that is, without sending data between nodes
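Join locality can be sketched with a toy model. The example assumes a simple modulo in place of Redshift's real hash function, and the `students` and `enrollments` tables are hypothetical. Both tables are distributed on the same key, so each node can join its own rows without asking other nodes for data.

```python
from collections import defaultdict

NUM_NODES = 4


def node_for(key):
    """Pick a node for a distribution-key value (toy hash: modulo)."""
    return key % NUM_NODES


def distribute(rows, key_index):
    """Place each row on the node chosen by its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes


# Hypothetical tables, both distributed on student_id (column 0).
students = [(1, "Ada"), (2, "Grace"), (5, "Alan")]
enrollments = [(1, "CS101"), (5, "CS101"), (5, "MATH200"), (2, "BIO110")]

student_nodes = distribute(students, 0)
enrollment_nodes = distribute(enrollments, 0)

# Each node joins only the rows it already holds: no shuffling.
joined = []
for node in range(NUM_NODES):
    local_students = dict(student_nodes[node])
    for student_id, course in enrollment_nodes[node]:
        joined.append((local_students[student_id], course))

rows_per_node = {node: len(enrollment_nodes[node]) for node in range(NUM_NODES)}
print(joined)
print(rows_per_node)  # a skewed key leaves some nodes idle
```

The per-node row counts show the other half of the trade-off: a key that puts most rows on a few nodes keeps joins local but leaves the cluster unbalanced.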
SORTKEY
• Sets the order in which records are stored
• Allows queries to skip ranges
• Allows for faster joins (merge vs. hash)
• Faster ORDER BY queries
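Range skipping can be sketched with a toy model of per-block minimum and maximum values. The block size and data are made up; the idea is that when records are stored in sort-key order, a range filter can rule out whole blocks from their metadata alone.

```python
BLOCK_SIZE = 4  # toy value; real storage blocks hold far more rows


def build_blocks(sorted_values, block_size=BLOCK_SIZE):
    """Split sorted values into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(sorted_values), block_size):
        chunk = sorted_values[start:start + block_size]
        blocks.append({"min": chunk[0], "max": chunk[-1], "values": chunk})
    return blocks


def range_query(blocks, low, high):
    """Return values in [low, high] and how many blocks had to be read."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # skipped using metadata only
        blocks_read += 1
        hits.extend(v for v in block["values"] if low <= v <= high)
    return hits, blocks_read


values = sorted([17, 3, 42, 8, 23, 15, 4, 16, 31, 9, 27, 12])
blocks = build_blocks(values)
hits, blocks_read = range_query(blocks, 15, 20)

print(hits, blocks_read, len(blocks))
```

With unsorted storage, every block's range would likely overlap the filter and nothing could be skipped.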
PRIMARY KEY
• Redshift doesn't enforce primary keys or foreign keys
• Primary key must be non-null and unique
• Used by query optimizer
• Civitas checks our keys after building each table
• COUNT(pk) == COUNT(*) == COUNT(DISTINCT pk)
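The key check on this slide can be sketched with SQLite standing in for Redshift. The helper name `primary_key_is_valid` and the sample tables are hypothetical. `COUNT(pk)` ignores nulls and `COUNT(DISTINCT pk)` ignores duplicates, so all three counts agree only when the key is both non-null and unique.

```python
import sqlite3


def primary_key_is_valid(conn, table, pk):
    """Return True when column `pk` is non-null and unique in `table`."""
    # Table and column names are interpolated, so they must come from trusted code.
    count_pk, count_all, count_distinct = conn.execute(
        f"SELECT COUNT({pk}), COUNT(*), COUNT(DISTINCT {pk}) FROM {table}"
    ).fetchone()
    return count_pk == count_all == count_distinct


conn = sqlite3.connect(":memory:")
samples = {
    "good": [(1,), (2,), (3,)],
    "has_duplicate": [(1,), (1,), (2,)],
    "has_null": [(1,), (None,), (2,)],
}
for table, rows in samples.items():
    conn.execute(f"CREATE TABLE {table} (pk INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", rows)

results = {table: primary_key_is_valid(conn, table, "pk") for table in samples}
print(results)
```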
COPY
• Loads flat file data from bulk storage (S3) into Redshift
• Each node loads some parts of the data
• Master node doesn't touch the data, and is not a bottleneck
• UNLOAD: opposite direction, Redshift -> S3
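The parallel load can be sketched with threads standing in for nodes. This is a toy model: the file parts, their names, and the round-robin assignment are assumptions, and an in-memory dictionary stands in for bulk storage. The coordinator hands out part names only, and each worker reads and parses its own parts.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical file parts in bulk storage, keyed by name.
parts = {
    "person.csv.part0": "1,Ada\n2,Grace\n",
    "person.csv.part1": "3,Alan\n",
    "person.csv.part2": "4,Edsger\n5,Barbara\n",
}
NUM_NODES = 2


def load_parts(part_names):
    """Work done by one node: read and parse only its assigned parts."""
    rows = []
    for name in part_names:
        rows.extend(csv.reader(io.StringIO(parts[name])))
    return rows


# The coordinator assigns part names round-robin; it never reads file contents.
names = sorted(parts)
assignments = [names[node::NUM_NODES] for node in range(NUM_NODES)]

with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    node_tables = list(pool.map(load_parts, assignments))

total_rows = sum(len(table) for table in node_tables)
print(assignments)
print(total_rows)
```

Splitting the input into several parts matters here: a single large file would give only one worker anything to do.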
Summary
• Process Civitas uses to manipulate data
• Columnar data layout
• Distributed query aggregations
• Data layout options
• Careers at Civitas Learning
Questions?