Emerging Uses of Big Data in Immigration Research

Final Report Submitted to SSHRC, October 13th, 2016

Grant No. 421-2015-2036

William Ashton, Pallabi Bhattacharyya, Eleni Galatsanou, Sally Ogoe and Lori Wilkinson¹

This research was supported by the Social Sciences and Humanities Research Council of Canada.

¹ The names are written in alphabetical order and do not reflect order of authorship. Special thanks to Kim Lemky for her input with proposal development.




Table of Contents

Key messages
Executive summary
1. Introduction
2. Context – the issue
3. Implications and Potential Uses of big data in Immigration Research
4. Methodology
5. Results
  5.1. How has “big data” been used in immigration studies?
  5.2. Challenges and limitations of using big data in immigration studies
  5.3. Opportunities for immigration research with big data
6. Knowledge Mobilization
  6.1. Integrated knowledge mobilization
  6.2. End-of-Grant and post-grant knowledge mobilization
7. References


Key messages

How has big data been used in immigration studies?

§ There is an increasing trend in utilizing big data in immigration studies from 2007 onwards, with more and more published studies using big data than ever before. The majority of studies are published as journal articles, while labour market and social integration are the two main research themes that dominate the literature.

§ Close to 60% of immigration studies use Statistics Canada data for their research, either in the form of the Census, longitudinal and other surveys, or databases linked by Statistics Canada. The Canadian Census and the Longitudinal Survey of Immigrants to Canada (LSIC) are utilized the most among the large databases.

Challenges with using big data in immigration studies

§ Big datasets, despite their large number of participants and variables, have issues of population underrepresentation, especially in administrative data, and often contribute to inaccurate and unsettled relationships between selected variables. Large datasets often lack specific variables or additional explanation of variables, and immigration researchers rely on assumptions and proxies to complete their studies.

§ Methodological challenges and challenges with data manipulation and validation influence the kind of studies researchers can do and the kind of questions they can ask. Storage, management, and sharing of big data remain a challenge for all users of big data. Without a university affiliation, most of this data is out of the reach of non-government settlement organizations, groups that can critically analyze this information for practical purposes.

§ There are ethical issues with the use of linked data and with administrative data, and many of the large datasets did not seek ethical approval from participants. Credibility of results is a major issue since data from administrative sources, or collected from social media, the internet and online communication sources, are subject to bias.

Opportunities

§ Big data can prove very beneficial to settlement and community organizations for designing and implementing policies and programs, provided it is released in a timely manner and is easy to access. Researchers should be encouraged to partner with institutions and organizations to examine questions relevant to the practical needs of the settlement sector.

§ Great knowledge can be gained from examining data from online sources (large datasets, social media, web login history, IP addresses and others), which can provide new opportunities for immigration studies. Addressing inherent risks and training more researchers in statistical modeling and data analysis are mandatory steps if immigration research is to utilize all forms of big data.

§ By the year 2018, new developments concerning linked administrative and survey data are expected to be released and will give additional opportunities for further research on a variety of different topics.


Executive summary

This project seeks to understand the use, challenges, and opportunities of big datasets in immigration research. Big data are large sets of data whose size is beyond the ability of typical software tools to capture, store, manage, and analyze. The datasets come from government surveys, from geolocation such as Facebook tags, and from digital transactions such as financial services. At the outset of this project, it was not clear how or if big data are being used by researchers to dig deeper and more comprehensively into the lives of immigrants. Such findings can inform immigration settlement services, business products and services offered to immigrants, along with government policies and programs.

A scoping review method isolated key themes embedded within a large body of immigration literature. The established six-stage framework by Levac, Colquhoun, and O’Brien (2010) enabled a rigorous approach that identified 453 papers, and after confirming high levels of inter-rater reliability, there were 251 immigration studies (1994-2016) used in this knowledge synthesis project. About 60% were journal articles, with another 20% as reports, and the balance from grey literature and theses. The papers examined five themes, yet were dominated by labour market (57%) and social integration (25%) papers. The studies used 15 databases, with 60% of the studies drawing from Statistics Canada; 65% used one dataset and 15% utilized more than three sets.

Authors reported many challenges that were grouped under five topics, including sampling issues, low response rates, invalidated variables, access, and confidentiality. The authors identified opportunities as well, ranging from the need for diverse knowledge on data manipulation given the complexity and growing nature of some datasets, to designing experiments on an unprecedented scale, skills development of researchers to use datasets, linking databases to examine different aspects of immigrants’ lives, and the need to advance hardware and software along with tools and techniques of data mining.

Big data has great potential in immigration research. As mentioned earlier, geocoded data can help aid organizations identify where people in need might be located, but also what they might need! The Syrian Humanitarian Tracker has been used by thousands of internally displaced people and asylum seekers abroad to identify safe travel routes and to avoid human traffickers. This kind of data has been used in North America to provide real-time alerts, with information, about missing children (e.g., Amber Alerts in Canada). The monitoring of social media data by non-government aid agencies can help them better determine how many people might be waiting to access English or French language courses.

Canada has recently provided extensive and almost on-demand data on the Syrian refugee arrivals. Through Immigration, Refugees and Citizenship Canada’s online database, the public can now view maps on its website that provide geographically linked information on the number and location of resettled Syrian refugees in Canada. The Government of Canada’s “Open Data” initiative provides some data on citizenship uptake, international students, temporary foreign work permit holders and other immigration-related issues. These are static tables, but they provide additional information for users which was not previously made available. Statistics Canada also provides tabular information based on results from the National Census that allows site visitors to identify various trends in immigrants and labour market identifiers, ethnic identity and second-generation and language use among newcomers.


Statistics Canada, together with Immigration, Refugees and Citizenship Canada, has been working to provide researchers with even more data opportunities. This is in addition to the wealth of data Statistics Canada provides in Public Use Microdata Files, which novice and intermediate statisticians can use to produce their own statistical estimates and equations (e.g., Ethnic Diversity Survey, Longitudinal Survey of Immigrants to Canada, etc.). More advanced data users affiliated with universities can access the master data files of over three dozen current surveys. This allows users to look at small-scale trends and patterns in data that cannot be released to the general public. Although it is possible for independent researchers to gain access to administrative-level data such as the master data file for IMDB, which is housed in Ottawa, these opportunities are given to only a few select researchers. There have been some very innovative and recent uses of the data, such as the GIS Mapping Project (Garcea et al., 2016), which uses data from the Immigrant Landing File and IMDB to geo-code landings and other records among immigrants and refugees to Canada’s western and northern regions.

1. Introduction

What do big data and refugee movements have in common? Turns out, a lot actually. The past 18 months have seen a boom in the speed, volume and type of data being shared both publicly and by governments concerning refugee movements in Europe, Africa and the Middle East. Large-scale migration from Syria, although over five years old, really began in earnest in Spring 2015 when large numbers of asylum seekers braved the 4 km journey between Bodrum, Turkey and Kos, Greece in rickety plastic dinghies and broken-down, overcrowded fishing boats. The signing of the EU-Turkey Repatriation of Refugees Agreement in late 2015 was supposed to end, or at least greatly decrease, the number of would-be refugees making the perilous journey across the Mediterranean Sea. Instead, it is estimated that by early December 2016, over 170,000 new asylum seekers will arrive on the shores of Italy, surpassing the numbers arriving in Greece (IOM, 2016).

How do we know this? The IOM releases weekly figures of arrivals, deportations and resettlement of refugees in Europe. They depend on data sharing agreements with the various countries and then tabulate and distribute this information to a network of policy analysts, researchers and non-governmental organizations on a weekly basis. The Canadian government did the same, reporting weekly on the Immigration, Refugees and Citizenship Canada (IRCC) website on the numbers and locations of the newly arrived Syrians to Canada. It is the first time in history that this level of detailed, geographically connected data was widely distributed and publicly available in such a short period of time. Both the IOM and IRCC report that traffic to these websites has increased dramatically in the past year as the thirst for knowledge and data about the newly arriving refugees increased.

These refugee movements, along with the various government and non-government interventions to count and account for refugee movement, got our team thinking about the extent to which big data has been used in immigration research in Canada. Our project aims to identify studies using big data and to offer some insight into the type of data they use, the topics they pursue and where these studies might be published.

The movement to use ‘big data’ in all sorts of research and decision making is part of a growing trend that takes advantage of the large amount of data we generate as humans in the 21st century. In 2015, “we produced as much data as was created in all previous years of human civilization” (Ratti and Helbing, 2016: 1). This data has been used to track the movement and activities of the Syrian people as they flee the conflict that mires their country. The Syria Tracker Crisis Map (Humanitarian Tracker, 2016) is an app created by displaced Syrians intended to provide information for Syrians on the run (Rathnam, 2015). The data comes largely from social media platforms such as Twitter, Facebook and Skype, along with media reports, to produce information which is posted to the site and intended to assist NGOs and other organizations with the needs of displaced Syrians. It has helped identify geographic regions where disease amongst the displaced people might occur next, assisted in the prevention of human trafficking, and is credited with predicting when the next attack will occur (Rathnam, 2015). We wondered if these initiatives were taking place in the Canadian research context.

This report begins with a discussion of the context of big data and research in general. The implications of using this kind of data, along with the methodology of the study, follow. Section 5 details the quantitative findings of our study, while section 6 details the knowledge mobilization plans and some concluding comments.

2. Context – the issue

Traditionally, big data has been defined as mega-sized sets of information that are too large for common data processing software to handle. Not so long ago, big data would have included the public use microdata files (PUMF) of the Census of Canada. Back in the mid-1990s, microcomputer power was so under-developed that PUMF files had to be hand-loaded onto a microprocessor and time to analyze the information had to be ‘booked’ on an institution’s mainframe computer. In that age, a census file of half a million people would take hours to compute even the simplest cross-tabulations because computer processing power was weak. By the start of the 21st century, microprocessors became smaller and faster, and our ability to store data digitally, instead of by analog, has meant an explosion in data storing, sharing and computing capabilities. And because the cost of owning a computer and storing large amounts of data has greatly declined, almost anyone with database and statistical knowledge can work with large datasets. Now, researchers can download and analyze census and other similar data on their own and can manipulate the data in mere seconds to produce similar (and often better) results faster than in previous years.

Alongside the development of better personal computer processors, companies, governments and other organizations began to discover that valid predictions and trends could be calculated by examining the data they already collect. And we collect a lot of data. It’s been estimated that 2.5 exabytes of new data are generated worldwide on a daily basis (Hilbert and López, 2011). This, together with the data-collecting power of various software programs developed to glean information from the internet, has made it easier than ever to conduct data analyses. It has also meant a proliferation of data collection activities, including social media and internet traffic monitoring, administrative data mining, and linking data between datasets, the extent of which the public is often unaware.

Data isn’t limited to the affluent or to developed nations. Nearly 3.5 billion people have some access to the Internet, representing 40% of the world’s population, and almost half of them are currently living in Asia (Internet Live Statistics, 2016). According to ICT (2016) there are over 7 billion mobile phone subscriptions representing about 60% of the world’s population. This means a growing number of people, regardless of their living situation, have some access to electronic communication, a type of data that is easily collected and monitored. This type of data is sometimes mined in big data projects, so our ability to measure certain features of life, and certainly aspects of online connectivity, has greatly contributed to the big data movement. This data is particularly important in terms of adding a geographic dimension: where people are making inquiries about information is a very important indicator for several social and health problems. Internet searches for ‘flu’ help health researchers identify potential outbreaks and clusters before they are reported to health authorities. The Syrian Humanitarian Tracker mentioned above is another example of geo-coded data in use.

Governments are increasingly using their administrative databases to track trends among their citizens and to monitor certain aspects of their lives. This administrative data is a different type of information, no less ‘messy’ than the data mined from internet sources. Government tax records, provincial health databases and criminal justice administrative data have all been used to produce reports, inform policy and provide meaningful, large-scale information to government offices. Historically, this data has only been available to government employees. Outsiders, such as NGOs, independent researchers and academics, were largely prevented from accessing these interesting data sources due to privacy concerns. Today, however, governments are increasingly connecting with academics and universities to analyze these data. The Manitoba Centre for Health Policy Research and PolicyWise for Children and Families in Alberta are two initiatives where independent researchers routinely gain access to various administrative databases with the express purpose of producing analyses and reports that inform government policy and decision making. This type of data is also associated with the big data movement because of the size and ever-changing nature of the information held within the database.

Less common are other, usually a bit smaller, forms of big data that rely on data linkage. Data linkage refers to the connection of two or more unrelated sets of data. In the immigration field, the most widely used and well-known linked data is the IMDB or Immigrant Database. It links the Immigrant Landing File and the Canadian Tax Files for all immigrants who have landed in Canada since 1980. The data is prepared as a ‘cube’, meaning that it is less flexible than other data: users are usually constrained to changing the conditions of three or four variables at a time, and statistical analysis is largely not possible. We can think of these kinds of data as more static. In an attempt to make them more user friendly, administrators have mounted them in platforms such as 2020 Ivision Browser, which allow users to create their own tables by manipulating a series of rows and columns. Sadly, these formats tend to be clumsy and frustrating for novice users. Additionally, these large databases are sources of potential identity disclosure, so only a few vetted and pre-approved researchers and government employees ever have a chance to use these very complex but rich data. It is this latest form of data, linked databases, that has produced the most recent and voluminous immigration research in Canada.

“Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze” (Manyika et al., 2011: 1). But what is typical database software? Although many well-known and commonly used statistical packages do have limits as to the number of gigabytes or terabytes they can process, their computing limitations don’t seem to prevent researchers and statisticians from harnessing the vast information contained within them. The OECD (2013) has characterized big data as having these three attributes:

• Volume: the amount of data within a database “challenges the capacities of traditional software and storage”


• Velocity: data can be distributed, disseminated and analyzed with increasing speed. The goal is to process data in real time

• Variety: data that comes from a variety of different data sources, preferably triangulated

There are other ‘Vs’ that have been added by various researchers, including value, which refers to the economic and social uses of data (Laczko and Rango, 2014). Other Vs identified include value, visualization, veracity and variability (van Rijmenam, 2014). As McNulty (2014) observes, it is the 3 Vs that remain the most widely used to categorize data, and they are the categories we rely upon in this report.
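The data linkage described in this section, where two otherwise unrelated files are connected through a shared identifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the field names (`person_id`, `landing_year`, `employment_income`), the records, and the `link_records` helper are hypothetical and do not reflect the actual structure of the Immigrant Landing File, the tax files, or the IMDB.

```python
from collections import defaultdict

# Hypothetical landing records: one row per person, keyed by a made-up person_id.
landing_records = [
    {"person_id": 1, "landing_year": 1998, "category": "economic"},
    {"person_id": 2, "landing_year": 2005, "category": "refugee"},
    {"person_id": 3, "landing_year": 2010, "category": "family"},
]

# Hypothetical tax records: zero or more rows per person, one per tax year.
tax_records = [
    {"person_id": 1, "tax_year": 2012, "employment_income": 41000},
    {"person_id": 1, "tax_year": 2013, "employment_income": 43500},
    {"person_id": 2, "tax_year": 2013, "employment_income": 28000},
]


def link_records(landings, taxes, key="person_id"):
    """Attach every matching tax row to its landing row (a left join on `key`)."""
    taxes_by_key = defaultdict(list)
    for row in taxes:
        taxes_by_key[row[key]].append(row)

    linked = []
    for landing in landings:
        # People with no tax rows are kept, with an empty list of matches.
        matches = taxes_by_key.get(landing[key], [])
        linked.append({**landing, "tax_rows": matches})
    return linked


linked = link_records(landing_records, tax_records)
for person in linked:
    print(person["person_id"], person["category"], len(person["tax_rows"]))
```

Keeping unmatched people in the result, rather than dropping them, makes gaps in coverage visible, which matters given the underrepresentation concerns raised in the key messages.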

3. Implications and Potential Uses of big data in Immigration Research

Big data has great potential in immigration research. As mentioned earlier, geocoded data can help aid organizations identify where people in need might be located, but also what they might need! The Syrian Humanitarian Tracker has been used by thousands of internally displaced people and asylum seekers abroad to identify safe travel routes and to avoid human traffickers. This kind of data has been used in North America to provide real-time alerts, with information, about missing children (e.g., Amber Alerts in Canada). The monitoring of social media data by non-government aid agencies can help them better determine how many people might be waiting to access English or French language courses.

Canada has recently provided extensive and almost on-demand data on the Syrian refugee arrivals. Through Immigration, Refugees and Citizenship Canada’s online database, the public can now view maps on its website that provide geographically linked information on the number and location of resettled Syrian refugees in Canada. The Government of Canada’s “Open Data” initiative provides some data on citizenship uptake, international students, temporary foreign work permit holders and other immigration-related issues. These are static tables, but they provide additional information for users which was not previously made available. Statistics Canada also provides tabular information based on results from the National Census that allows site visitors to identify various trends in immigrants and labour market identifiers, ethnic identity and second-generation and language use among newcomers.

Statistics Canada, together with Immigration, Refugees and Citizenship Canada, has been working to provide researchers with even more data opportunities. This is in addition to the wealth of data Statistics Canada provides in Public Use Microdata Files, which novice and intermediate statisticians can use to produce their own statistical estimates and equations (e.g., Ethnic Diversity Survey, Longitudinal Survey of Immigrants to Canada, etc.). More advanced data users affiliated with universities can access the master data files of over three dozen current surveys. This allows users to look at small-scale trends and patterns in data that cannot be released to the general public. Although it is possible for independent researchers to gain access to administrative-level data such as the master data file for IMDB, which is housed in Ottawa, these opportunities are given to only a few select researchers. There have been some very innovative and recent uses of the data, such as the GIS Mapping Project (Garcea et al., 2016), which uses data from the Immigrant Landing File and IMDB to geo-code landings and other records among immigrants and refugees to Canada’s western and northern regions.


In summary, there are many possibilities for using this data, and non-government agencies, researchers and policy analysts are making increasing use of this timely and important data. This report outlines some of the outcomes of this research.

4. Methodology

This project seeks to understand the use and challenges of big data in immigration research. A scoping review method served to isolate key themes embedded within subject areas, which enabled a preliminary summary of a large body of immigration literature. Given a short timeline, Levac, Colquhoun and O’Brien’s (2010) six-stage framework provided a rigorous method for mapping literature.

Step one of the scoping review is determining the search terms to be used and, in our case, the databases most commonly used. We used several search terms including: administrative data, refugees, mental health, evidence based research, data driven research, big data immigration, labour market income and jobs, among others. We also searched for major databases which we were aware of. These include: big data, Longitudinal Survey of Immigrants to Canada (LSIC), Landed Immigrants Data file (LIDS), Immigrant Landing File (ILF), Immigrant Database (IMDB), Census of Canada, National Household Survey (NHS), and Canadian Community Health Survey (CCHS).

The second stage identified articles from journals as well as other relevant non-indexed literature including government reports, OECD reports, and unpublished studies. These search terms were used to search university library websites and databases, such as EBSCOhost, which indexes and abstracts over 3,000 periodicals and provides the full text of over 1,500 periodicals. Many of these periodical journals are peer-reviewed. In addition, librarians from the University of Manitoba and Brandon University reviewed the search procedures, and their suggestions were used to assist us. This search resulted in a data pool with the relevant literature extracted.

The third stage of the study involved two researchers independently reviewing the abstracts, research themes, methodology and study limitations for each article and report. All this information helped in deciding if the item involved big data and how it has been used in immigration studies. The mapping of the data was done in two separate spreadsheets for a total of 453 studies. Both reviewers independently extracted the required information and coded the literature as “include”, “exclude” or “uncertain”. Twenty-four studies were duplicates of one another and 127 studies were excluded. A total of 251 studies eventually met the project criteria: being about a topic of immigration in Canada and possibly involving ‘big data’. There were also about 60 general research papers and research articles that examine big data as a research methodology, which were separately coded to answer a few of the study questions on challenges and opportunities.

The fourth stage involved confirming consistency of coding the articles between the two researchers. Ten random entries were given to each research assistant to recode as a process of checking validity and reliability in coding of data (for a total of 20 studies recoded). After small changes were made to ensure consistency, they proceeded to code all articles. A third researcher scrutinized the coding results and resolved any discrepancies. These checks served to strengthen the reliability of the coding.


The fifth stage involved collating and summarizing results with a descriptive numeric analysis (number of documents, type, year published, etc.) and a thematic analysis (e.g., identifying themes used). Reporting results included identifying implications for research, policy and practice. The last stage, consultation, was completed with the advisory panel for the purpose of knowledge mobilization. There were also interim consultations with the advisory panel throughout the project to include their ideas and to share the research process and outcomes.

There were a few methodological challenges which the researchers faced while doing the scoping review. It was difficult to decide at the beginning if the articles in the search list were appropriate for the intended research topic, especially whether the data used was really ‘big data’. In the end, we decided to code the datasets in a way that allowed us to easily identify the administrative linked data and separate those from other types of data (some of which is arguably not ‘big data’ at all). We found that keywords that authors used to describe their studies did not accurately reflect the actual content of the article. Our search was conducted using English terms only, which biased our search results. Given the six-month time constraint to complete this project, we ran out of time to include the valuable French language papers. As a result, we may have missed some of the available literature. It was also very difficult to decide if we had included all the relevant articles, and the only way we could assure ourselves that we had searched exhaustively was through repeated searches leading to similar articles showing up again and again. One of the principal investigators assisted the team by identifying the names of large datasets as a way of filtering relevant material in and out. For this review, both reviewers had to use their judgment to determine whether each study as a whole sufficiently addressed the research questions of the scoping review. This might have left some gap within the methodological process subject to reviewer bias.
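The coding consistency check in stage four can be illustrated with a small sketch. The report does not say which agreement statistic was used, so the choice of simple percent agreement and Cohen's kappa here is an assumption, and the two lists of labels are invented for illustration rather than taken from the project's spreadsheets.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items that both reviewers coded identically."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must code the same non-empty set of items")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two reviewers, corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both reviewers pick the same label independently.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten recoded studies (not the project's real data).
reviewer_1 = ["include", "include", "exclude", "uncertain", "include",
              "exclude", "include", "exclude", "include", "include"]
reviewer_2 = ["include", "include", "exclude", "include", "include",
              "exclude", "include", "uncertain", "include", "include"]

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

Kappa is typically lower than raw agreement because it discounts matches that would occur by chance, which is why it is often reported alongside the simpler percentage.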

5. Results

5.1. How has “big data” been used in immigration studies?

According to the report “Big Data in Action for Development”, produced by the World Bank Group (López et al., 2014), there are two basic big data approaches identified in the global context. One way of utilizing big data is “for projects or processes which seek to analyze behaviors outside of government or development agencies in order to heighten awareness and inform decision making” (López et al., 2014, p. 11). The second approach helps in analyzing “behaviors internal to a single institution, such as the government, often to streamline and improve services” (López et al., 2014, p. 11). Generally speaking, our review finds that most of the data used in published research on immigrants falls under the second approach: government-level administrative data. There were, however, a handful of studies that utilized non-administrative data to examine certain aspects of the migration experience. The “Big Data in Action for Development” report (López et al., 2014) gives an apt example of how text analysis of social media data has been able to successfully identify problems of different groups of people such as refugees and also map human migration patterns, utilizing big data to develop action projects that bring data scientists and practitioners together to solve different social issues.

Topics uncovered in our scoping review on immigration and ‘big data’ include labour market outcomes, cultural and ethnic diversity, health assessments, and various aspects related to social integration. The following charts and tables reflect some of the major themes and observations we made as we surveyed the existing research.

The majority of immigration studies utilizing big data were published after 2007, and the trend appears to be upward, with more and more published studies using big data than ever before.

Figure 1 - Number of Big Data Studies Published by Year of Publication

Figure 1 examines the growing trend of publications involving big data and immigration. Approximately 79% (199 out of 251) of the big data research studies identified have been published between 2007 and 2016. In 2010, the largest number of studies involving big data were published, with a total of 41. After 2010, about 20 studies per year involved big data. Clearly, researchers and policy makers are beginning to rely more on big data for their research in the field of immigration studies and the trend seems to be continuing. This is likely a reflection of better access to big data and researchers with better ‘big data’ skill sets than in previous years.

[Figure 1 chart: x-axis, year of publication (1994 to 2016); left axis, number of studies (0 to 45); right axis, percentage (0% to 18%); series: # studies, Percentage, Linear (# studies)]


The majority of immigration research utilizing big data is published in academic journals.

Figure 2 - Frequency distribution of big data studies by publication type

[Figure 2 chart: Article 60%, Report 20%, Grey 18%, Thesis 2%]

Most of the big data immigration research we could locate has been published as journal articles (60%). Another 20% are reports, with the remainder being grey literature (18%) or doctoral or master’s theses (2%) (see Fig. 2). Out of the 251 studies, 150 appear in peer-reviewed academic journals. Another 51 studies are government reports and 44 appear in the grey literature (defined as technical and working papers, statistical reports, etc.).

Labour market issues dominate studies using big data. Within the reviewed literature, the predominant theme was immigrant labour market outcomes, with 57% of the reviewed papers (144 out of 251). This is not surprising given that an examination of publication themes of the major peer-reviewed journals in immigration worldwide reveals that labour market studies are the theme of over 60% of all published articles for the past 15 years (Wilkinson, 2015), a trend that has continued since the mid-1990s (Li, 1997). The next most popular topic was social integration with 25% (63 out of 251) of the reviewed studies. Health, ethnic and cultural diversity and others make up the remaining 18% of the articles using big data.

Figure 3 presents the major themes and sub-topics of immigration studies that use big data. As mentioned above, the majority of studies utilizing big data in their research have examined the field of immigrant labour market outcomes. Human capital, labour market discrimination, labour health and jobs, and income are the topics most discussed. Labour market integration is the most predominant sub-theme, not only within the labour market domain but in all of the research we identified (105 studies, or 72.9% of the labour market studies). Another 26 studies were categorized broadly as “labour market” and included related topics such as immigrant self-employment and entrepreneurship, immigrants and the dynamics of high-wage jobs, and immigrant earnings and economic outcomes. Within the broader theme of “social integration”, we find sub-themes such as “housing”, “immigrant attraction and retention”, “immigration and education”, “integration”, “settlement” and “social capital” among the most published topics. A ‘catch-all’ category contains the papers that couldn’t be categorized in other ways. It contains research focusing on aspects related to migration and children, the number of immigrant arrivals, and other studies that couldn’t be categorized, such as a study on “racialized selection of immigrants in Canada” and a study on “Global banking and financial services to immigrants”. Within the health theme, physical health was most predominant, with mental health and the influence of language on various aspects related to health as major themes of research. There were 5 studies based on “multiculturalism and diversity” included within the broader category of diversity.

Figure 3 - In depth examination of big data immigration topics

Close to 60% of the immigration studies utilizing big data use Statistics Canada surveys as their data source.

In Canada, the federal government funds the bulk of social science research. It is also the main collector of public use microdata in the form of surveys. It is not surprising that government data makes up the majority of data sources for immigration and big data research. Almost a third (29%) of all the data sources involved some linked government data, usually between an administrative database and a survey. Another third used longitudinal surveys. Figure 4 also shows that almost 60% of the studies reviewed utilize Statistics Canada data. Only nine percent of the studies use data extracted from independent and non-government surveys and internet sources.
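The two headline shares in this passage can be reproduced from the category percentages shown in Figure 4. The sketch below is illustrative only: the percentages are transcribed from the figure's labels, while the grouping rules (which categories count as "Statistics Canada" and which as "linked") are an assumption about how the totals were formed, not a method stated in the report.

```python
# Category shares (percent of data sources) transcribed from the labels of Figure 4.
FIGURE_4_SHARES = {
    "Statistics Canada - Longitudinal Survey": 33,
    "Statistics Canada - Census": 12,
    "Statistics Canada - Survey": 11,
    "Census linked with Statistics Canada - Survey": 2,
    "Census linked with Statistics Canada - Longitudinal Survey": 2,
    "Linked Data from Government Sources": 14,
    "Linked Data from Government and Independent Sources": 8,
    "Census linked with Government Administrative Data": 3,
    "Government Administrative Data": 6,
    "Other (independent and non-government surveys, internet sources)": 9,
}


def share_where(predicate):
    """Sum the shares of every category whose label satisfies the predicate."""
    return sum(pct for label, pct in FIGURE_4_SHARES.items() if predicate(label))


# Assumed grouping rules (hypothetical, chosen to mirror the wording of the text).
total = share_where(lambda label: True)
statistics_canada = share_where(lambda label: "Statistics Canada" in label)
linked = share_where(lambda label: "linked" in label.lower())

print(f"All categories: {total}%")
print(f"Statistics Canada sources: {statistics_canada}%")
print(f"Linked data sources: {linked}%")
```

Under these grouping assumptions the categories sum to 100%, the Statistics Canada categories to 60% and the linked categories to 29%, which agrees with the figures quoted in the text.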

[Figure 3 chart data, number of studies per sub-topic. Diversity: multiculturalism/diversity 5. Health: health 8; health and language 2; mental health 4. Labour market: human capital 6; labour market 26; labour market and discrimination 2; labour market and health 2; labour market and income 3; labour market integration 105. Others: immigrant children 2; migration flows 5; other 18. Social integration: housing 3; immigrant attraction and retention 11; immigration and education 15; integration 20; settlement 4; social capital 10.]

Figure 4 - Types of data sources used in immigration research
[Pie chart: Statistics Canada - Longitudinal Survey 33%; Statistics Canada - Census 12%; Statistics Canada - Survey 11%; Census linked with Statistics Canada - Survey 2%; Census linked with Statistics Canada - Longitudinal Survey 2%; Linked Data from Government Sources 14%; Linked Data from Government and Independent Sources 8%; Census linked with Government Administrative Data 3%; Government Administrative Data 6%; Other (mostly independent and non-gov. surveys, internet sources) 9%]

The Canadian Census and the Longitudinal Survey of Immigrants to Canada (LSIC) are the datasets most utilized by researchers.

Figure 5 - Number of Data Sources per study
[Pie chart: 1 data source 65%; 2 data sources 14%; 3 data sources 6%; >3 data sources 15%]

Figure 5 represents the number of big data sources used in the reviewed studies. Among the 251 studies, 163 (65%) used one dataset, 35 (14%) used two datasets, 14 (6%) used three datasets, while another 39 (15%) of the studies used more than three datasets. In summary, big data is often not limited to the examination of a single data source, which makes analysis more difficult but more holistic.

Figure 6 - Most Popular Datasets used

Figure 6 shows the types of databases used by the studies included in our scoping review. Studies using more than one database were counted more than once. The Census data has been used in 73 studies out of 251 (29%) and is, by far, the most cited data. This is followed by the Longitudinal Survey of Immigrants to Canada (LSIC), appearing in 24% of the studies captured in the scoping review. This is interesting, given that the survey ended data collection in 2004 but researchers continue to publish from the wealth of this work. Databases using internet-based data, independent surveys, and data from other countries (for comparative purposes) were categorized as “others” and were found in fifty-three studies. Reflecting the preoccupation with labour market outcomes, the Survey of Labour and Income Dynamics (SLID) (29 studies), Labour Force Survey (LFS) (15 studies), Workplace and Employee Survey (WES) (14 studies), Survey of Financial Security (SFS) (2 studies), Survey of Family Expenditures (Famex) and Youth in Transition Survey (YITS) are also cited as sources of data.
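Because studies using more than one database were counted once per database, the dataset shares are shares of studies and can add up to more than 100%. The following minimal sketch shows that counting rule on a small, entirely hypothetical set of studies; the study names and their dataset lists are invented for illustration, and only the final check (73 of 251 studies) uses numbers from the report.

```python
from collections import Counter

# Hypothetical study-to-dataset mapping, for illustration only.
studies = {
    "study_a": ["Census"],
    "study_b": ["Census", "LSIC"],
    "study_c": ["LSIC", "SLID", "LFS"],
    "study_d": ["Census", "SLID"],
}


def dataset_usage(study_datasets):
    """Count how many studies use each dataset.

    A study that uses several datasets adds one to each of them.
    """
    counts = Counter()
    for datasets in study_datasets.values():
        # set() guards against a dataset being listed twice for one study.
        counts.update(set(datasets))
    return counts


def usage_shares(counts, n_studies):
    """Express each dataset count as a percentage of the number of studies."""
    return {name: round(100 * n / n_studies, 1) for name, n in counts.items()}


counts = dataset_usage(studies)
shares = usage_shares(counts, len(studies))

# The Census share reported in the text: 73 studies out of 251.
census_share = usage_shares(Counter({"Census": 73}), 251)["Census"]
```

In the hypothetical example the shares sum to 200% because the four studies use eight datasets between them, while the Census figure from the report works out to 29.1%, which rounds to the 29% quoted above.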

[Figure 6 chart: horizontal bar chart of the number of studies using each dataset, on an axis from 0 to 80. Dataset labels legible in the chart include the Census; Other (internet data, independent surveys and other smaller sources); Survey of Labour and Income Dynamics (SLID); Labour Force Survey (LFS); Workplace and Employee Survey (WES); Canadian Community Health Survey (CCHS); National Population Health Survey (NPHS); Longitudinal Administrative Databank (LAD); Intergenerational Income Database (IID); Panel Study of Income Dynamics (PSID); International Adult Literacy and Skills Survey (IALSS); Labour Market Activity Survey (LMAS); National Survey of Work and Lifelong Learning (WALL); Longitudinal Survey of Low-Income Students (L-SLIS); and Survey of Family Expenditures (Famex).]

5.2. Challenges and limitations of using big data in immigration studies

Although there are numerous benefits to using big data, we know there are various challenges and limitations, and the results of the scoping review revealed many. We have summarized them here and added some of our own ideas.

Experts in big data talk about limitations regarding the sample selection and sampling bias built into big data. Big data sets, despite their large number of participants and variables, have issues of population underrepresentation, especially in administrative data, and these issues often contribute to inaccurate and unsettled relationships between selected variables. For example, people who have no contact with the medical system may not appear in administrative databases, and their absence can skew the results. This has bigger implications for immigration research. Researchers who engage in immigration research face the issue of having to deal with relatively small sample sizes even within large datasets. The National Longitudinal Survey of Children and Youth, for example, chronically undersampled immigrant and refugee children for the first few years of its existence. Even LSIC has built-in biases, particularly in the small sample size of adults aged 55 and older (Hochbaum 2013). To be fair, LSIC was intended to follow newcomers at arrival and very few immigrants arrive in Canada who are aged 55 and older. Sampling issues can also occur even in targeted databases. For instance, even though the Longitudinal Survey of Immigrants to Canada (LSIC) contains a large volume of information about the immigrant experience, it is very difficult to examine newcomers from all but the five largest immigrant-sending countries. Even sex differences can be difficult to examine given the small sample size. Pottie and colleagues (2008) describe the recruitment problems with longitudinal studies, particularly with immigrants, as many fear for their future in Canada (they might be deported) and are reluctant to participate in the study. Simone and her research team (2014) complain that LSIC does not include refugee claimants and temporary workers, so these groups are largely ignored in research.

Administrative data can be difficult to use, particularly in immigration studies. Often, this data must be linked with the Immigrant Landing File (ILF), and even though it is a census of all newcomers to Canada, it may not be possible to make positive matches with new data. Individuals who are difficult to match (who may have incorrect identifiers entered into the administrative database, for instance) are not ‘matched’ and are then excluded from the database. There are known patterns to this type of exclusion, with those arriving as children or as teens more likely to be ‘unmatchable’ due to identification-based database errors of this sort. Undoubtedly, when ILF is linked with a Statistics Canada survey, researchers have a chance to examine data in ways that were not previously possible. This has opened up research on questions related to refugees, who have historically been excluded from analyses or mixed in with other types of newcomers. While this has been important to advancing our knowledge of this group, there are other problems with linking ILF to surveys meant for the general population. Lightman and her colleagues (2013) had difficulty analyzing the Survey of Labour and Income Dynamics because items pertaining to the immigrant experience, such as country of last job and country of highest level of education, were not asked in that survey. It is frustrating from a Canadian standpoint because nearly one-quarter of the Canadian population was not born here and the failure of general surveys to routinely collect these items embeds even more bias into results. It is a frustration echoed by other researchers investigating other databases (Rango 2015; Papademetriou et al., 2009).
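The linkage problem described in this section, where records that cannot be positively matched to the landing file are dropped and coverage then differs by age at arrival, can be illustrated with a small sketch. Everything in it is hypothetical: the identifiers, field names and records are invented, and exact matching on a single identifier is a simplifying assumption rather than a description of how the ILF is actually linked.

```python
from collections import defaultdict

# Hypothetical landing records (a stand-in for a landing file):
# person_id -> age group at arrival.
landing_file = {
    "A001": "adult",
    "A002": "child",
    "A003": "teen",
    "A004": "adult",
    "A005": "child",
}

# Hypothetical administrative records. "A0O2" carries a mistyped identifier
# (letter O instead of zero), so it cannot be matched exactly.
admin_records = [
    {"person_id": "A001", "visits": 3},
    {"person_id": "A0O2", "visits": 1},
    {"person_id": "A004", "visits": 2},
]


def link_exact(landing, admin):
    """Exact-match linkage on person_id.

    Returns the linked rows and the administrative rows left unmatched.
    """
    linked, unmatched = [], []
    for row in admin:
        age_group = landing.get(row["person_id"])
        if age_group is None:
            unmatched.append(row)
        else:
            linked.append({**row, "age_group": age_group})
    return linked, unmatched


def coverage_by_group(landing, linked):
    """Share of landing records in each age group present in the linked file."""
    totals, found = defaultdict(int), defaultdict(int)
    for age_group in landing.values():
        totals[age_group] += 1
    for person_id in {row["person_id"] for row in linked}:
        found[landing[person_id]] += 1
    return {group: found[group] / totals[group] for group in totals}


linked, unmatched = link_exact(landing_file, admin_records)
coverage = coverage_by_group(landing_file, linked)
```

Reporting coverage by group before analysis makes both sources of loss visible: people who never appear in the administrative source, and people who appear but are dropped because their identifier does not match.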


Another challenge is that users of some comprehensive big data sets, such as the ones obtained from Statistics Canada, have reported that it is nearly impossible to validate the measures and indicators within the data due to missing data, not knowing how to manipulate the information, inconsistency in how the surveys or censuses were collected, and difficulty in operationalizing and measuring selected variables (Liodakis 2002). The impact on immigration studies in Canada is that it influences the kind of studies researchers can do and the kind of questions they can ask.

Storage, management, and sharing of big data remain a challenge for all users of big data. These three issues are connected. In Canada, as elsewhere, many of these large data sets, both administrative and survey-based, are located in secure facilities. The protection of data is a big issue in 21st century research and the possibility that the anonymity of participants could be jeopardized if the data were to fall into the wrong hands is real (Franke et al., 2015). With the increased security of data comes restricted access, meaning fewer and fewer researchers can access the data. Rigorous security screening is often required before a researcher can access data. Researchers are also constrained by the office hours of the organization whose data they are using. Students have a particularly difficult time in accessing data, although the Research Data Centres of Statistics Canada have been a golden opportunity for them to access data that only their supervisors could examine in previous decades. One of the unintended consequences of secure access is that researchers outside government and universities find it next to impossible to access data. Without a university affiliation, most of this data is out of the reach of non-government settlement organizations, groups that can critically analyze this information for practical purposes. It sets up a two-tiered system of access where researchers are increasingly the gatekeepers of data that community organizations could use.

Smith and McKeen (2013) point to a related issue which they call “immature technology”. By immature technology, they lament that big data becomes available to the public at a faster pace than the technology needed to manage and analyze mega-datasets. Alongside immature technology, we would like to add that researchers with advanced skill sets in both data management and statistics are in short supply, so the reality is that even if more big data were to become available, there will be a shortage of qualified people to analyze and interpret the numbers. According to IBM, there will be a shortage of over 4.6 million data and analytical workers in the coming decade.

There are ethical issues with the use of linked data and with administrative data. Often, this data is collected from individuals who are in need of a service such as medical care. No one tells them when they visit their physician (or file their income tax or send their child to school) that the data collected could be used for research purposes in the future. All universities and all federally and provincially funded research require an ethics review prior to the beginning of data collection, yet many of the data sets used in studies for this scoping review did not seek ethical approval from participants. One notorious example is the IMDB, which links the ILF with the tax record file. All immigrants who have arrived in Canada in or after 1980 are part of this dataset.

Data from administrative sources, or collected from social media, internet and online communication sources, are subject to bias, even if they are presented as a census. Credibility of results is a major issue. Zagheni and his colleagues (2014) point out that information collected from social media can be produced by anyone (or anything, such as a cyberbot) without going through the rigorous process of data verification, validation and analysis. There is no valid way for researchers to distinguish data generated by computers from data generated by humans, so this sort of data is always suspect.


5.3. Opportunities for immigration research with big data

Despite these problems, there is real potential for big data in advancing future immigration research. In this period of displacement crisis, “accurate and timely statistics are needed to assist migrants effectively” (Rango, 2015, para. 3). Migration data can be very helpful for settlement organizations to use when developing new programs or expanding on existing ones. One of the biggest challenges that community organizations faced during the arrival of Syrian refugees over a short period of time was not knowing how big their families were. This is where the timely release of big data, much like the handy maps of Syrian arrivals to Canada released by IRCC, can be used to a greater degree. Even basic information such as average family size can help community organizations locate appropriate housing in a more timely fashion. Big data on migrants helps not only in creating “sensible migration policies” but also in ensuring adequate distribution of resources. If the sharing of these data can be more democratic, with easier access to this vital data, particularly for community organizations, then big data can be harnessed to help design and implement better policies and programs for newcomers in a more timely fashion.

Good research is informed by both policy and practice. Researchers should be encouraged to partner with institutions and organizations to examine questions relevant to the practical needs of the settlement sector. More of this research needs to be done and if these partnerships can be sustained, more data will be used and more articles and reports will be read. Franke et al. (2015) remind us that researchers can benefit from the insight of community settlement service workers, particularly in interpreting the results of data analysis. Since big data is complex, continuously changing and growing, it requires diverse knowledge on data manipulation.

The Syrian crisis has shown, with great clarity, the ubiquitous nature of cell phone use and internet access. Only a few studies have used this data to examine immigration. Although there are inherent risks with using this data, great knowledge can also be gained when we examine it. This reflects how “unprecedentedly large and complex amount of data is being generated in real time every time a call or an online payment is made, or every time people interact on social media, which is usually referred to as big data” (Rango, 2015, para. 9). Activities using social media, web login history, IP addresses, emails and text messages can be geolocated and turned into migration data using statistical modeling. “Massive datasets and social networking sites provide opportunities to design experiments on a scale that was previously impossible in the social sciences” (Grimmer, 2015, p. 81). When used appropriately, this data can shed light on important aspects of immigration.

In order for immigration research to benefit from the current and future opportunities with a variety of new big data sources, there is a need to train more people in data analysis and statistical modeling. The complex work done by researchers such as Jedwab and Soroka (2014) is one example of how complex these analyses can be. If we invest in additional training of students to handle large datasets, more research can use big data. Renewed training may actually encourage more people to enter the immigration field as more data analysis opportunities become available.

There are some exciting developments concerning linked administrative and survey data relating to immigration studies in Canada. In 2016, IRCC, along with Statistics Canada, announced some new data linkages to be released in the coming 18 months. The ILF will be linked to the following datasets: the Canadian Employer-Employee Dynamics Database (including international students and temporary workers), the 2011 National Household Survey, the 2016 Census of Canada, the Canadian Community Health Survey and the 2013 General Social Survey on Social Identity. IMDB will continue to be released. The new Settlement Outcomes Survey will also be linked to administrative databases. Health records from Ontario, BC and Manitoba will also be linked to the ILF. Some researchers may be able to access iCARE, the administrative database that houses information on all settlement services used by newcomers across Canada (IRCC, 2016). In short, there will be great opportunities in the very near future for researchers and community organizations to learn more about newcomers on a variety of different topics.

6. Knowledge Mobilization

The potential knowledge users of synthesis results include, but are not limited to, researchers in various academic institutions, government officials, policy analysts, and representatives of the integration and settlement sectors, as well as graduate students. The research findings from this report will be disseminated widely to ensure an optimal research uptake from all research knowledge groups. This project utilized three types of knowledge mobilization strategies: integrated, end-of-grant and post-grant. The knowledge mobilization strategies were developed by the research team and were communicated to the advisory panel members.

6.1. Integrated knowledge mobilization.

The Advisory Committee was comprised of three individuals: one representative of the Manitoba provincial government and two representatives of non-governmental organizations (NGOs) in British Columbia and Alberta. The purpose of the Advisory Committee is to assist the research team in the interpretation and the dissemination of the research findings and to provide input and feedback on the reporting. The three individuals represent groups of potential knowledge users and participated in one meeting (May) to provide feedback on the research approach and their input on how this research can benefit their organizations. An informal consultation took place via e-mail in August, where the preliminary research outcomes were shared with the Advisory Committee, inviting the members’ feedback. The advisory panel will assist the team in disseminating the results of the research.

6.2. End-of-Grant and post-grant knowledge mobilization.

The research findings will be disseminated in various ways to ensure an optimal research uptake from all research knowledge groups. The following end-of-grant and post-grant knowledge mobilization activities are planned:

• Conduct a meeting with the Advisory Committee (November) to discuss the research findings and the most efficient and practical ways to package and disseminate the research findings to non-academic audiences and front-end big data users.

• Present the research findings at “The Canadian Institute for Identities and Migration (CIIM) & Immigration Research West (IRW) Regional Symposium: Migration and Refuge in Western Canada” on October 21st, 2016.

• Present the research findings at the Social Sciences and Humanities Research Council’s Fall Forum in Ottawa on November 22, 2016.

• Submit a proposal for a presentation at the 19th National Metropolis Conference.

• Submit a publication to Canadian Diversity, Association for Canadian Studies.

• Produce a web-based resource of the tabulated database used during the study selection and charting the data phases of the research process.

All materials produced from this study will be hosted on the websites of the Rural Development Institute, Brandon University, and Immigration Research West, University of Manitoba. Links to the above websites will be distributed through Research Partners Network (e.g. Rural Policy Learning Commons) and Immigration Research West Network.

7. References

Benton, M. (2014). Smart inclusive cities: How new apps, big data, and collaborative technologies are transforming immigrant integration. Washington, DC: The Migration Policy Institute.

Papademetriou, D. G., Somerville, W., & Sumption, M. (2009). The social mobility of immigrants and their children. Migration Policy Institute, Washington.

Franke, B., Plante, J. F., Roscher, R., Lee, A., Smyth, C., Hatefi, A., ... & Hoffman, M. M. (2015). Statistical Inference, Learning and Models in Big Data. arXiv preprint arXiv:1509.02900.

Garcea, J., Jason D., Winston Z., & Wilkinson, L. (2016). Geo-spatial Data of Immigration to Canada’s Western and Northern Region. Winnipeg: Immigration Research West. Accessed online at http://gistest.usask.ca/irw/

Grimmer, J. (2015). We are all social scientists now: how big data, machine learning, and causal inference work together. PS: Political Science & Politics, 48(01), 80-83.

Smith, H. A., & McKeen, J. D. (2012). Big data and Data Analytics. CIO Brief, 18(3), 1-6.

Hilbert, M., & López, P. (2011). The world’s technological capacity to store, communicate, and compute information. Science, 332(6025), 60-65.

Hochbaum, C. V. (2013). Too Old to Work?: The Influence of Retraining on Employment Status for Older Immigrants to Canada. Canadian Ethnic Studies, 44(3), 97-120.

Huebner, R. A. (2013). A Survey of Educational Data-Mining Research. Research in Higher Education Journal, 19.

Humanitarian Tracker (2016). Syria Tracker. Accessed online at http://www.humanitariantracker.org/syria-tracker

ICT (2016). World Telecommunication/ICT Indicators Database. 12 July 2016. Accessed online at http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx

International Organization for Migration (2016). Mixed Migration Flows in the Mediterranean and Beyond. Geneva: IOM. Accessed online at http://migration.iom.int/docs/WEEKLY%20Flows%20Compilation%20No26%206%20October%202016.pdf

International Live Statistics (2016). Internet Users by Country. Accessed online at http://www.internetlivestats.com/internet-users/

Immigration Refugees and Citizenship Canada (2016). Map of destination communities and settlement service provider organizations, updated 21 September 2016. Ottawa: IRCC. Accessed online at http://www.cic.gc.ca/english/refugees/welcome/map.asp

Immigration Refugees and Citizenship Canada (2016). Researching Syrian Refugees: Data Availability and Access. Ottawa: IRCC.

Jedwab, J. & Soroka, S. (2014). “Indexing Integration: A Review Of National And International Models”. A report prepared for the Department of Citizenship and Immigration Canada by the Association for Canadian Studies (Canadian Institute for Identities and Migration), 30.

Khandor, E., & Koch, A. (2011). The global city: newcomer health in Toronto. Toronto Public Health.

Laczko, F. & Rango, M. (2014). Can big data help us achieve a ‘migration data revolution’? Migration Policy Practice, 6(2), 20-29.

Levac, D., Colquhoun, H., & O'Brien, K. K. (2010). Scoping studies: advancing the methodology. Implementation Science, 5(1), 1.

Li, P. S. (1997). A Review of the Academic Literature on Immigration Studies in Canada. Ottawa: Citizenship and Immigration Canada.

López, H., et al. (2014). Big data in Action for Development. The World Bank group. Accessed online at http://live.worldbank.org/sites/default/files/Big%20Data%20for%20Development%20Report_final%20version.pdf

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. Accessed online at http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation

McNulty, E. (2014). “Understanding big data: the 7 Vs”. Dataconomy. Accessed online at http://dataconomy.com/seven-vs-big-data/

Mendez, P., Wyly, E., & Hiebert, D. (2011). Landing at home: Insights on immigration and metropolitan housing markets from the longitudinal survey of immigrants to Canada.

Liodakis, N., & Satzewich, Victor. (2002). The Vertical Mosaic Within: Class, Gender and Nativity within Ethnicity, ProQuest Dissertations and Theses.

OECD (2013). Exploring Data-driven Innovation as a New Source of Growth: Mapping the Policy Issues Raised by ‘big data’. Accessed online at http://www.oecd-ilibrary.org/science-and-technology/exploring-data-driven-innovation-as-a-new-source-of-growth_5k47zw3fcp43-en

Pandey, M., & Townsend, J. (2011). Provincial nominee programs: An evaluation of the earnings and retention rates of nominees. Prairie Metropolis Centre.

Pottie, K., Ng, E., Spitzer, D., Mohammed, A., & Glazier, R. (2008). Language proficiency, gender and self-reported health: an analysis of the first two waves of the longitudinal survey of immigrants to Canada. Canadian Journal of Public Health / Revue Canadienne de Santé Publique, 505-510.

Rango, M. (2015). “How big data Can Help Migrants”. World Economic Forum. Extracted from online source https://www.weforum.org/agenda/2015/10/how-big-data-can-help-migrants/

Rathnam, L. (2015). “Can big data help the Syrian conflict?” icrunchdata.com, October 8. Accessed online at https://icurnchdata.com/big-data-help-syrian-conflict/

Ratti, C. & Helbing, D. (2016). “The hidden danger of big data”. The Straits Times, September 12. Accessed online at www.straitstimes.com/opinion/the-hidden-danger-of-big-data.com

Simone, D., & Newbold, K. B. (2014). Housing trajectories across the urban hierarchy: analysis of the longitudinal survey of immigrants to Canada, 2001–2005. Housing Studies, 29(8), 1096-1116.

Sweetman, A., & Warman, C. (2012). The structure of Canada’s immigration system and Canadian labour market outcomes. Queen’s Economics Department Working Paper No. 1292.

UN Global Pulse (2014). Estimating Migration Flows Using Online Search Data, Global Pulse Project Series no. 4.

Van Rijmenam, M. (2014). Why the 3Vs are not sufficient to describe big data. Datafloq. Accessed online at http://dataconomy.com/seven-vs-big-data/

Wilkinson, L. (2015). “Immigration research and policy developments: what do we know and where are we headed?” plenary panel paper presented at the Canadian Sociological Association Annual Conference, Ottawa, University of Ottawa.

Zagheni, E., Garimella, V. R. K., & Weber, I. (2014). Inferring international and internal migration patterns from twitter data. In Proceedings of the 23rd International Conference on World Wide Web (pp. 439-444). ACM.