HAL Id: hal-01672282
https://hal.archives-ouvertes.fr/hal-01672282
Submitted on 23 Dec 2017
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Visual Network Exploration for Data Journalists
Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray

To cite this version:
Tommaso Venturini, Mathieu Jacomy, Liliana Bounegru, Jonathan Gray. Visual Network Exploration for Data Journalists. Scott A. Eldridge II; Bob Franklin. The Routledge Handbook of Developments in Digital Journalism Studies, Routledge, 2018, 9781138283053. hal-01672282
VISUAL NETWORK EXPLORATION FOR DATA JOURNALISTS
TOMMASO VENTURINI, MATHIEU JACOMY, LILIANA BOUNEGRU, JONATHAN GRAY
Networks are classic but under-acknowledged figures of journalistic storytelling. Who is connected to whom and by which means? Which organizations receive support from which others? What resources or information circulate through which channels and which intermediaries enable and regulate their flows? These are all customary stories and lines of inquiry in journalism and they all have to do with networks. Additionally, the recent spread of digital media has increasingly confronted journalists with information coming not only in the traditional form of statistical tables, but also of relational databases. Yet, journalists have so far made little use of the analytical resources offered by networks. To address this problem, in this chapter we examine how “visual network exploration” may be brought to bear in the context of data journalism in order to explore, narrate and make sense of large and complex relational datasets. We borrow the more familiar vocabulary of geographical maps to show how key graphical variables such as position, size and hue can be used to interpret and characterise graph structures and properties. We illustrate this technique by taking as a starting point a recent example from journalism, namely a catalogue of French information sources compiled by Le Monde’s Décodex. We establish that good visual exploration of networks is an iterative process where practices to demarcate categories and territories are entangled and mutually constitutive. To enrich investigation we suggest ways in which the insights of the visual exploration of networks can be supplemented with simple calculations and statistics of distributions of nodes and links across the network. We conclude with a reflection on the knowledge-making capacities of this technique and how these compare to the insights and instruments that journalists have used in the Décodex project, suggesting that visual network exploration is a fertile area for further exploration and collaborations between data journalists and digital researchers.
INTRODUCTION
Few people know as well as journalists that the world is made of relations. Following alliances, unveiling links, unravelling threads is, and has long been, a central part of their investigations. If social scientists can speculate about longstanding structures and global arrangements, journalists have no such leisure. Their work consists in tracing the specific associations that connect individuals and institutions to uncover how lumps of money, influence and knowledge are exchanged through them and where unethical behaviour, corruption, fraud or unfair political influence may occur. The advent of digital technologies has made such work both easier and more difficult. Easier, because it has increased the traceability of economic and political associations. More difficult, because it has swamped journalists with more information than their investigative toolkit is used to handling.
When, for example, the reporters of the International Consortium of Investigative Journalists (ICIJ) received the 2.6 terabytes and 11.5 million documents composing the so-called 'Panama Papers', they obviously could not process them manually (Baruch & Vaudano, 2016). Note that this is not just a ‘big data’ problem. The trouble with the leak was not only its size, but the fact that its interest came from the links it established between specific individuals and particular tax havens. Extracting “key” figures through statistical aggregation or abstracted computational models would miss the point of many of the stories that journalists were most keen to explore. The inquiry could not simplify the dataset, but had to explore each and every one of the connections it exposed. This was done, among other ways, through a tool called Linkurious (http://linkurio.us), whose interest comes less from its computational power than from the way in which it allows its users to see and follow the connections of a network.
The Panama Papers case is interesting, but also interestingly isolated. Despite longstanding interest, the use of networks in journalism remains comparatively marginal (cf. Bounegru et al., 2016 for an overview of the emerging uses of networks in journalism). The reasons are not difficult to imagine. Graph mathematics is more demanding and less widely known than traditional statistical approaches and does not come with the same readily accessible and publicly recognised vocabulary of visual motifs. For all its computational power, graph mathematics does not fit journalistic needs because it tends to be obscure for both reporters and their readers.
In this chapter, we address this difficulty by suggesting a technique for the visual exploration of networks. As we will try to show, when performed correctly, the visual representation of networks translates some of the most important graph structures into graphical variables (thereby supporting investigative work) and allows the interpretation of networks with conventions similar to those developed for geographical maps (thereby remaining legible for a large audience). After having introduced the mathematical and historical bases of our approach, we will present our technique for the visual exploration of networks. Using as an example the network of the French information sphere, we will illustrate the recursive work of interpretation and categorisation that allows the network to be read as an organised territory. Visual network exploration, which is growing in prominence amongst digital methods researchers for social and cultural research, may be useful not only for studying media landscapes, but also for digital journalism practitioners who are interested in exploring and telling stories with networks and relational data.
UNDERSTANDING FORCE-DIRECTED LAYOUTS
Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value, which you will have experienced if you have ever used a public transportation map. Such maps are distinctively different from road maps or city maps. It is not only that transportation maps are simpler (the level of detail depending only on the resolution of the map), it is that they represent a network and not a geographical territory. An illustration of this difference can be found in the famous map of the London tube as designed by Harry Beck in 1933. Before Beck’s redesign, the diagram was a classic geographical map locating stations according to their coordinates. After the redesign, it became a network of correspondences in which stations are positioned according to their relative proximity and connectivity. The gain in legibility is evident, as the function of the transportation map is not to situate stations in urban space, but relative to each other, so as to help users to move from one to another (a type of orientation that strikingly resembles the one used by traditional sea navigators; see, for example, Turnbull, 2000, pp. 133-165).
Figure 1. London tube map (a) in 1920 before Beck's redesign and (b) in 1933 after Beck's redesign.
Another example of such a mapping approach comes from early works in Social Network Analysis (Freeman, 2000). Jacob Moreno, founder of SNA, is explicit about the importance of visualization: 'A process of charting has been devised by the sociometrists, the sociogram, which is more than merely a method of presentation. It is first of all a method of exploration' (1953, pp. 95-96). In an interview given by Moreno to the New York Times in 1933, network analysis is presented as a 'new geography'. More important than the title, however, is the figure that accompanies that interview, depicting friendships among fourth grade pupils. The sociogram presented in this figure powerfully reveals how friendship is not equally distributed in the class. One only needs to know that triangles represent boys and circles girls to see how inter-gender relationships are discouraged at that specific age (or at least the declaration of such friendships). The trick, of course, only works because the nodes are not positioned randomly in the space, but in a way that minimizes line-crossing (in Moreno’s own words 'the fewer the number of lines crossing, the better the sociogram', 1953, p. 141). It is because triangles are pushed on one side and circles on another that it is easy to spot the existence of a single inter-gender connection.
Figure 2. Sociogram representing friendship among school pupils (original title and image accompanying Moreno’s 1933 New York Times interview) (a) in the original version and (b) in the modern force-directed spatialisation.
Moreno’s rule of spatialisation is easy to follow on a graph of a few dozen nodes and edges but impracticable on larger networks. Graphs with thousands of nodes and edges are so intricate that the direct counting of line-crossings becomes prohibitively time-consuming. An indirect approach consists of drawing connected nodes closer together to minimize the length of the edges and therefore the possibility of crossings. But even in this case, since each node may be connected to several other nodes which are themselves connected to many other nodes, minimizing the length of the edges is far from a trivial exercise.
Thus we might explore the network using a technique called 'force-directed spatialisation'. Such spatialisation follows a physical analogy: nodes are charged with a repulsive force that drives them apart, while edges act as springs binding the nodes that they connect. Once the algorithm is launched, it changes the disposition of nodes until it reaches a balance of forces (Jacomy et al., 2014). Such equilibrium reduces line-crossings and improves the legibility of the graph. Fruchterman and Reingold (1991), who proposed the first efficient force-directed algorithm, cite line-crossing as the second of their aesthetic criteria.
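The physical analogy can be made concrete with a small sketch. The code below is a toy illustration written for this passage, not ForceAtlas2 and not the Fruchterman and Reingold algorithm: the force formulas (repulsion of magnitude 1/distance, spring attraction of magnitude equal to distance), the parameter names and the eight-node test graph are all assumptions chosen for brevity.

```python
import math
import random
from itertools import combinations


def force_directed_layout(nodes, edges, iterations=500, step=0.05,
                          max_move=0.2, seed=1):
    """Toy spring-and-repulsion layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with magnitude 1/distance
        # (unit vector times 1/d equals the offset divided by d squared).
        for a, b in combinations(nodes, 2):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist2 = max(dx * dx + dy * dy, 1e-9)
            force[a][0] += dx / dist2
            force[a][1] += dy / dist2
            force[b][0] -= dx / dist2
            force[b][1] -= dy / dist2
        # Attraction: each edge is a spring pulling with magnitude = distance.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= dx
            force[a][1] -= dy
            force[b][0] += dx
            force[b][1] += dy
        # Move each node a small, capped step along its net force.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: (p[0], p[1]) for n, p in pos.items()}


def mean_distance(layout, pairs):
    """Average Euclidean distance between the given node pairs."""
    pairs = list(pairs)
    return sum(math.dist(layout[a], layout[b]) for a, b in pairs) / len(pairs)


# Hypothetical graph: two tightly knit groups joined by a single bridge edge.
LEFT = ["a1", "a2", "a3", "a4"]
RIGHT = ["b1", "b2", "b3", "b4"]
INTRA = list(combinations(LEFT, 2)) + list(combinations(RIGHT, 2))
INTER = [(a, b) for a in LEFT for b in RIGHT]
LAYOUT = force_directed_layout(LEFT + RIGHT, INTRA + [("a1", "b1")])
```

Comparing `mean_distance(LAYOUT, INTRA)` with `mean_distance(LAYOUT, INTER)` gives a rough numerical counterpart to what the eye does when it reads dense zones as clusters: densely linked nodes should end up closer to each other than to the rest of the graph.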
Yet, scholars working with networks soon realised that avoiding line-crossing is not the most interesting effect of force-directed layouts. At equilibrium, the visual density of nodes and edges becomes an approximate but reliable proxy of the mathematical structure of the graph (for a detailed mathematical proof, see Venturini et al., forthcoming). Groups of nodes gathering in the layout tend to correspond to the clusters identified by community-detection techniques (Noack, 2009); structural holes (Burt, 1995) tend to look like sparser zones; central nodes move towards middle positions; and bridges are positioned somewhere between different regions (Jensen et al., 2015).
The trick of force-directed algorithms is all the more remarkable, given that the space of networks is relative rather than absolute (it can be rotated or mirrored without distortion of information) and that it is a consequence and not a condition of element positioning. In traditional geographical representation, the space is defined a priori by the way the horizontal and vertical axes are constructed. Points are projected on such pre-existing space according to a set of rules that assign a univocal position to a pair of coordinates. The same is true for any Cartesian diagram (scatter plots for instance), but not for networks, in which the space is defined by the position of the nodes and not the other way around.
Despite such differences (which should not be forgotten), force-directed algorithms allow reading networks as geographical maps, translating complicated mathematical concepts into the more conventional vocabulary of regions and margins, paths and landmarks, centres and peripheries (Lynch, 1960). This is a crucial advantage that explains why force-directed algorithms have become the de facto standard of network visualisation: they facilitate the exploration of networks and relations by means of more familiar and intuitive spatial metaphors, as well as through less familiar computational and statistical metrics.
THE DÉCODEX: A CONTROVERSIAL CASE STUDY
In the following pages, we will illustrate the technique of visual network exploration drawing on a concrete example. Our case study is a network of websites extracted from a listing compiled by the French newspaper Le Monde. Since 2009, a group of journalists gathered under the name of Les Décodeurs (www.lemonde.fr/les-decodeurs/article/2014/02/12/l-equipe-des-decodeurs_4365082_4355770.html) has verified the accuracy of thousands of stories circulating in the French blogosphere and in social media. In January 2017 (at the beginning of the French presidential campaign), Les Décodeurs launched an online tool called the Décodex (www.lemonde.fr/verification), allowing readers to search for the most important sources of online information relevant to French public debates (though not necessarily in French). Each source is accompanied by a short description and, more crucially, by an evaluation of its trustworthiness according to the journalists of Le Monde.
Figure 3. User interface of the Décodex tool by Le Monde
Not surprisingly, the classification provided by Les Décodeurs has stirred much debate in the French media sphere. Several of the sources categorized as imprecise or unreliable, along with other newspapers and blogs, have contested the Décodex, with critiques ranging from challenging the over-simplistic way in which websites are classified; to questioning the right of Le Monde (which is itself a rival source of information) to rate the reliability of other websites; to disputing the legitimacy and interest of such classification in general (arguing that some of the websites in the list are meant to circulate opinions rather than information). Les Décodeurs themselves admitted the difficulty of their exercise, the many ambiguities that they were obliged to decide on and the errors and inaccuracies that may have derived from them. At the same time, they defended their work by pointing at the increasing quantity of false or partisan information circulating online and by affirming their openness to discussing their classification and revising it if necessary.
The controversy around the Décodex is a good example of the difficulties connected to the detection of fake news online (Bounegru et al., 2017), but also of the more general debates surrounding all kinds of classification. Categorizing things is never a self-evident or innocent practice (Bowker & Star, 1999) and should always be carried out with the greatest caution. This is true for the initial classification of the Décodex, but it is also true for the network extracted from it. As we will see in the following pages, the visual exploration of networks involves a constant toing and froing between categorization and observation, typology and topology.
To build our example network, we have extracted, in collaboration with Les Décodeurs, all the websites contained in the Décodex and investigated the way in which they cite each other. To do so, we employed Hyphe (http://hyphe.medialab.sciences-po.fr), a web crawler developed by the médialab of Sciences Po, which facilitates exploring websites and following the hyperlinks present in their pages. All the websites comprising the Décodex corpus have been crawled at a depth of one click starting from the home page. We thus obtained a network with 653 nodes and 5943 edges. Whilst Les Décodeurs focus on editorial judgements about how to classify websites in the French media landscape, our network exploration examines the relations between them and other websites by means of their linking practices. While some researchers focus on how networks are held together through financial ties, organisational affiliations, business relationships and family and social relations, we consider their relations according to the hyperlink, in accordance with a longer tradition of digital methods, digital sociology and new media studies research (see, e.g. Marres & Rogers, 2005; Rogers, 2013).
The treatment of social platforms (such as Facebook, Twitter, YouTube…) in our crawl requires some additional explanation. These platforms are both sources of information as a whole and containers of multiple individual sources in the form of pages or accounts. Since extracting all the hyperlinks from a site as large as Facebook would have been impossible, we only crawled the accounts that were specifically mentioned in the Décodex. We have, however, kept a record of all the links pointing toward the main social media platforms to investigate how they are cited by the other websites of our corpus.
A VISUAL EXPLORATION OF THE DÉCODEX NETWORK
The visual exploration of networks exploits three visual variables to graphically represent their features: position, size and hue (for a definition of these variables and their semiotic affordances, see Bertin, 1967). For the reasons discussed above, position is crucial in translating the mathematical characteristics of the graphs. Force-directed layouts create regions where numerous nodes are densely assembled and regions that are less crowded. These differences of density, determined by the uneven distribution of links, reveal the uneven association between the entities of the network. Everything may be connected in this world, but not everything is equally connected.
Discerning the spatial structure of networks, however, is not always straightforward. In the easiest cases, the difference in the density of association is such that clusters appear as well-defined knots of nodes and edges separated by empty (or almost empty) zones. These zones are called 'structural holes' (Burt, 1995) and, when they exist, they provide crucial guidance for the interpretation of the network. Thanks to the ruptures created by structural holes, the boundaries of clusters can be easily detected, like cliffs separating a plateau from a valley. Most natural and social networks, however, do not exhibit such a clear separation and the borders of their clusters tend to be gradual like hillside slopes. The fuzziness of clusters’ frontiers is not necessarily an obstacle to their recognition (one can point at a hill even when it is impossible to say exactly where it starts and ends), but it certainly makes their identification more difficult. This is why visual network analysis is often more like an exploratory expedition, where meanings and findings are progressively and hermeneutically generated, than the statistical confirmation of a set of pre-existing hypotheses (on the difference between exploratory and confirmatory analysis see Tukey, 1977 and Behrens and Chong-Ho, 2003).
This is certainly the case for our Décodex network, which, at first look, does not present any manifest structural hole or any clear spatial structure. To visualise our network we used two main tools: Gephi (https://gephi.org) for filtering and spatializing the network (using in particular the force-directed algorithm ForceAtlas2) and Graph Recipes (http://tools.medialab.sciences-po.fr/graph-recipes) to tweak the visual rendering of the network. Though no structural holes are evident in the Décodex network, looking closely at the layout makes it possible to notice that the network does not spatialize as a perfect circle, but rather in an avocado-like shape with a smaller top and a larger bottom. These irregularities (as weak and subtle as they can be) often suggest the presence of polarising effects which can be interesting to investigate further.
Figure 4. The Décodex network spatialized by ForceAtlas2. The size of nodes is proportional to in-degree.
The first and most crucial way to explore our network is to look at the identity of the nodes that occupy its different regions. This may seem trivial, but it is not. It is a distinct advantage of visual exploration compared to other forms of statistical analysis that it does not aggregate the individual entities that compose its corpus: each and every node is visible in the layout and can be interrogated by the researcher. Even in a network as small as the one in our example, however, the quantity of nodes can make it difficult (and time-consuming) to look at all of them.
This is where the second variable of our visual exploration, size, comes in handy. Since, in networks, nodes are defined first and foremost by their connections, we have ranked the nodes according to the number of edges pointing to them. In the jargon of network analysis this number is called 'in-degree' and nodes with a high in-degree are called 'authorities', because they are recognised and referred to by many others. In the previous figure and in all the following ones, we have sized the nodes according to their in-degree so that greater authority literally translates into increased visual prominence.
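As a minimal sketch of this ranking, the in-degree can be computed by counting how often each site appears as the target of a hyperlink. The edge list below is hypothetical (it is not taken from the Décodex crawl), and the `base` and `scale` sizing parameters are assumptions added so that nodes nobody cites remain visible.

```python
from collections import Counter

# Hypothetical (source, target) hyperlinks, for illustration only.
EDGES = [
    ("blog-a.example", "lemonde.fr"),
    ("blog-b.example", "lemonde.fr"),
    ("lefigaro.fr", "lemonde.fr"),
    ("lemonde.fr", "lefigaro.fr"),
    ("blog-a.example", "lefigaro.fr"),
]


def in_degree(edges):
    """Number of incoming links per node (nodes never cited count as 0)."""
    return Counter(target for _source, target in edges)


def node_sizes(edges, base=2.0, scale=1.5):
    """Node radius grows linearly with in-degree, starting from `base`."""
    nodes = {node for edge in edges for node in edge}
    degree = in_degree(edges)
    return {node: base + scale * degree[node] for node in nodes}


def top_authorities(edges, k=3):
    """The k most cited nodes, most cited first."""
    return in_degree(edges).most_common(k)
```

With these toy links, `top_authorities(EDGES, 1)` singles out `lemonde.fr`, which receives three of the five links.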
Reading the names of the websites that occupy the two poles of our avocado, it seems natural to suppose that their separation derives from a linguistic fracture. The websites in the lower part are predominantly French, while those in the upper part are more international. A way to highlight this is to show the uneven distribution of TLDs (Top Level Domains) in the network.
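One simple way to tabulate TLDs is to take the last dot-separated label of each host name. The sketch below uses only the Python standard library; the URL list is hypothetical, and treating the last label as the TLD is a simplifying assumption (it reports `uk` for `bbc.co.uk`, for instance, rather than `co.uk`).

```python
from collections import Counter
from urllib.parse import urlsplit


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'fr' for www.lemonde.fr."""
    # Prefix '//' when no scheme is given so urlsplit still finds the host.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host.rsplit(".", 1)[-1]


def tld_distribution(urls):
    """Count how many URLs fall under each TLD."""
    return Counter(top_level_domain(url) for url in urls)


# Hypothetical sample of sources, for illustration only.
URLS = [
    "https://www.lemonde.fr",
    "http://www.lefigaro.fr/politique",
    "nytimes.com",
    "https://www.bbc.co.uk/news",
]
```

Coloring each node by the value returned by `top_level_domain` would be one way to produce a map of the kind described here.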
Fig. 5. Distribution of TLDs in the Décodex network.
The linguistic separation we just highlighted, however, is not particularly surprising or interesting. This kind of division is regularly observed in networks of websites and hyperlinks. Detecting it is important, but rather in a negative way: it makes us aware that in order to generate more interesting findings, we will have to look beyond it.
Further exploring the network, we may notice the role of not just languages, but also social network platforms, such as YouTube, Facebook, Twitter, Instagram and Dailymotion. With the notable exception of Wikipedia, all the main social media platforms are located in the middle right of the layout, somewhere in between the English and the French websites (as one would expect given their multilingualism), but also separated from both by their distinctive nature (and possibly by the different way in which they have been treated in the crawl).
Moreover, by focussing on the lower and larger part of the network, we can recognise two different sub-poles, with national sources (such as Le Monde, Le Figaro, France Info, Libération...) occupying most of the lower region and the regional press clustering at the bottom right of the layout.
Fig. 6. Zoom on the French regional press
The distinctive positions of the platforms and of the national and regional press are both interesting and non-trivial findings, but we can push our analysis further. The way to do so is by playing with the third visual variable exploited by the visual exploration of networks: the hue of the nodes. This is a laborious but revealing part of our visual exploration. It consists in categorizing the nodes of the network according to multiple classifications and visualizing these classes on the network as different colors or (as in this paper) as different shades of grey. It is important to notice that the operations of classifying the nodes and of reading the disposition of classes are not separated, but performed at the same time. As will become clear in the next pages, our technique does not consist simply in the projection of a set of pre-existing categories on a connectivity-based layout, but in recursively using the categories to make sense of the layout and the layout to define the categories. It is important to remember that color is a ‘non-mixable’ visual variable. A node can be red or blue, for example, but not the two at the same time. When categorizing nodes, it is therefore necessary to employ exclusive categories. A website, for example, can be classed in the category 'news' or 'satire', but not in both. In the (not uncommon) case of nodes resisting a unique classification, researchers can introduce a residual category such as 'multiple' or 'misc'.
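The constraint of exclusive categories can be sketched as a small helper that collapses whatever labels a coder has attached to a site into exactly one class. The function names, the grey values and the rule used here (several distinct labels become 'multiple', no label becomes 'misc') are assumptions made for illustration; the source only says that a residual category may be introduced.

```python
# Hypothetical palette: one shade of grey per exclusive category.
GREY_SHADES = {
    "news": "#111111",
    "satire": "#777777",
    "multiple": "#bbbbbb",
    "misc": "#dddddd",
}


def exclusive_category(labels, residual="multiple", unknown="misc"):
    """Collapse a collection of labels into exactly one category."""
    distinct = set(labels)
    if len(distinct) == 1:
        return distinct.pop()
    # Several competing labels, or none at all, fall into a residual class.
    return residual if distinct else unknown


def shade_nodes(node_labels):
    """Map each node to the single shade of its exclusive category."""
    return {
        node: GREY_SHADES.get(exclusive_category(labels), GREY_SHADES["misc"])
        for node, labels in node_labels.items()
    }
```

Because every node ends up with exactly one shade, the resulting map never needs to mix colors.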
As a first step in our combined exploration of topology and typology, we will color the nodes of the network according to the original categories of the Décodex. These categories refer to the trustworthiness of the sources as manually assessed by the journalists of Le Monde. The four categories are 'reliable', 'imprecise', 'unreliable' and 'satirical'. Precisely because these categories have been defined before and independently from the extraction of the network, their disposition does not follow the spatial articulation of the network. Rather, it is possible to find nodes of every category in almost all regions of the network. A notable exception is the satirical websites, which are to be found on the right side of the layout both in its upper and lower part. Arguably, this position is not due to the hyperlinks between the satirical websites (which do not cite each other very much), but to their strong connection with social media platforms, to which all these sites extensively link.
Fig. 7. The 'satirical' websites according to the original Décodex classification (nodes have been emphasized by the black color and by doubling their radius despite their low degree)
The other classes are distributed more evenly but not randomly. The 'reliable' websites tend to occupy the center of both the international and the French poles, while the 'imprecise' and 'unreliable' ones take a more marginal position. More interestingly, looking at the lower part of the network, we observe two groups of 'imprecise' and 'unreliable' sources: while a majority of these nodes are positioned above the core of national and reliable websites (and hence in between the French and the international websites), a significant minority is located below them.
Fig. 8. Highlight of the 'reliable' websites (left) and 'unreliable' and 'imprecise' websites (right)
To account for this separation, we introduce an additional categorisation based on the political leaning of the websites. In particular, we distinguish the websites that disseminate unreliable or imprecise information because they pursue a right-wing or extreme-right agenda (which occupy the center of the network) and the websites exhibiting a more general conspiratorial attitude (which occupy the bottom of the network).
Fig. 9. Highlight of the ‘conspiratorial’ websites (left) and 'right' and 'extreme right' websites (right)
Through our iterative exploration of typology and topology we have eventually revealed a partitioning of the network that, while invisible at first glance, allows us to interpret some of the main contours of the French media landscape. Though these territories are not separated by clear structural holes, the nodes that they contain are fairly consistent. Interestingly, our final classification produces a homogeneous partition of the layout not in spite of, but because of its heterogeneity, which mixes linguistic categories, trustworthiness classes and political leanings. The fact that a non-homogeneous categorization turns out to offer the best characterization of the structure of our network should not come as a surprise. Networks are complex objects which articulate diverse elements through disparate logics. In this, they remind us of a passage by Jorge Luis Borges cited by Foucault as a perfect example of a heterogeneous classification that, while defying our traditional categories, is nonetheless highly efficient in describing the culture in which it has been elaborated:
“[Borges] quotes a ‘certain Chinese encyclopaedia’ in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies’. In the wonderment of this taxonomy, the thing we apprehend in one great leap, the thing that, by means of the fable, is demonstrated as the exotic charm of another system of thought, is the limitation of our own” (Foucault, 1970, p. XV).
Fig. 10. The heterogeneous territories of the Décodex network.
LINKING PATTERNS IN THE DÉCODEX NETWORK
Now that, by means of visual exploration, we have defined a heterogeneous but hermeneutically robust partitioning of our network, we can use it as a basis for a statistical analysis. While praising the advantages of visual interpretation, we are also aware that not all structural properties can be rendered visually. The direction of edges or the connection between different classes, in particular, are not easily read in network images. These questions, however, can be investigated by other means once the partitioning of the network has been defined.
Fig. 11. Distribution of the number of nodes per category
Fig. 11 shows the distribution of nodes in the regions identified in our final classification (see figure 10), to which we have added the ‘satirical’ websites (which we discussed above but did not include in figure 10 for the sake of legibility) as well as ‘other reliable’ and ‘other unreliable’. These two residual categories together comprise about one fifth of the nodes of the network. This relatively high figure is not uncommon. Given the heterogeneity of the networks they work with, social scientists and journalists should aim at classifications that are robust and insightful (capable of delineating homogeneous zones in the graph) rather than comprehensive.
Fig. 12. Connectivity between the categories of our final classification. Rows convey how many times the nodes of a given category cite the nodes of other categories. Columns convey how many times the nodes of a given category are cited by the nodes of other categories.
Our empirical categories are powerful tools to unveil different linking strategies in the network. Figure 12 above presents the links in the corpus aggregated by categories. As we can see, not all categories cite or are cited in the same way. ‘French national media’ and ‘platforms’ are much cited and by various actors (their columns contain larger circles), while ‘satirical’ websites are scarcely cited (their column is almost empty). Platforms do not cite much, but this is merely a consequence of our method since (as explained above) most of them had not been crawled. ‘Right-wing’, conspiracy theorist and other ‘unreliable’ websites are on the contrary the origins of the highest number of citations and, very interestingly, they seem to favour ‘reliable’ sources over ‘unreliable’ ones. As expected, the reliable websites do not link back to them, and this asymmetry reveals an important hierarchy. To investigate this linking pattern, we will compare the incoming and outgoing links of some of the most interesting categories.
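Aggregating links by category amounts to counting, for every hyperlink, the pair formed by the category of the citing site and the category of the cited site. The sketch below illustrates this with hypothetical node names, categories and links; none of the figures come from the Décodex network.

```python
from collections import Counter

# Hypothetical classification and hyperlinks, for illustration only.
CATEGORY = {
    "n1": "national media",
    "n2": "national media",
    "u1": "unreliable",
    "u2": "unreliable",
    "s1": "satirical",
}
LINKS = [
    ("u1", "n1"), ("u1", "n2"), ("u2", "n1"),
    ("u1", "u2"), ("n1", "n2"), ("s1", "n1"),
]


def category_matrix(links, category_of):
    """Count links per (citing category, cited category) pair."""
    return Counter((category_of[src], category_of[dst]) for src, dst in links)


def citing_totals(matrix):
    """Row sums: how many links each category sends out."""
    totals = Counter()
    for (citing, _cited), count in matrix.items():
        totals[citing] += count
    return totals


def cited_totals(matrix):
    """Column sums: how many links each category receives."""
    totals = Counter()
    for (_citing, cited), count in matrix.items():
        totals[cited] += count
    return totals


MATRIX = category_matrix(LINKS, CATEGORY)
```

In this toy corpus the 'unreliable' row is the fullest while the 'national media' column receives most links, which mimics on a very small scale the kind of pattern read off the rows and columns of the figure.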
Fig. 13. Hierarchical structure in the corpus, based on our final categories. Black arrows on the right side summarize the link structure between these categories.
Fig. 14. Simplified version of the statistical analysis presented in figure 13.
This kind of hierarchical structure is common on the web and has been explained as a consequence of preferential attachment (Barabási & Albert, 1999): actors tend to link to other websites that they perceive as higher in the hierarchy and avoid linking to those that they perceive as lower. This style of preferential attachment, whereby smaller actors link to establishment actors without reciprocation of the linking act, has elsewhere been called “aspirational linking” (Rogers, 2013). Links in a network do not always produce a hierarchy of categories but this behaviour does. This linking pattern, and the way it fits our empirical categories, may suggest an alternative way to characterise the trustworthiness being investigated by Les Décodeurs: reliable sources are cited by all types of websites, while unreliable sources are only cited by few other types (if any).
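One hedged way to put a number on this one-directional linking is a simple asymmetry score between two categories: the difference between links going 'up' and links coming back 'down', divided by their sum. This score is an illustrative assumption, not a measure used in the chapter, and the counts below are invented.

```python
# Hypothetical link counts per (citing category, cited category) pair.
COUNTS = {
    ("unreliable", "reliable"): 30,
    ("reliable", "unreliable"): 2,
    ("conspiracy", "unreliable"): 9,
    ("unreliable", "conspiracy"): 3,
}


def link_asymmetry(counts, lower, upper):
    """Score in [-1, 1]: +1 if only `lower` cites `upper`, -1 for the reverse,
    0 if links are balanced or absent."""
    up = counts.get((lower, upper), 0)
    down = counts.get((upper, lower), 0)
    total = up + down
    return (up - down) / total if total else 0.0
```

A score close to 1 between two categories would be consistent with aspirational linking from the first to the second; on its own it does not establish that the second is more trustworthy.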
This observation is in many ways at odds with what is often affirmed about the “post-truth era” in which we have supposedly landed. While fake news is said to leverage the horizontality of digital media to blur the boundaries between true and false, the linking patterns of the (French) information sphere suggest a different picture. Despite their different ideological leanings, all websites agree on the overall hierarchy of reliability by citing in one direction and not in the other. The ‘right-wing’ websites, for example, try to blur the lines by citing both their peers and more reliable sources, but they also try to draw a line between themselves and the even less reliable ‘conspiracy theorist’ websites. Whatever its position in the pyramid of hyperlinking, every actor tries to improve its situation by linking upwards to authorities above, and not linking to less reputable websites below, thus reinforcing the hierarchy.
CONCLUSION
This chapter discussed the visual exploration of networks with the aim of improving the understanding of one of the dominant visual-analytical forms of our digital age, the network diagram, and its potential role in relation to the study and practice of digital journalism. Drawing on graph semiotics and traditional cartography, this chapter proposed a model whereby the interpretation of network topology, with its regions, paths, cores and peripheries, is guided by three visual variables: position, size and hue. The process that we described is one that emphasizes the exploratory and iterative character of the investigation. While this may seem counter-intuitive at first, we emphasised that in order to surface the multiple logics that play out in the structure of a network graph, analysis should not limit itself to one classificatory principle. Multiple heterogeneous criteria of classification are often necessary to characterize the topology of a network map. Finally, we advocated mixing methods, complementing visual network exploration with statistical analyses in order to further characterise network properties. Through the case study of a French media hyperlink map, we tried to show how the visual exploration of networks reveals new angles which other analyses may leave unexplored. In this case the chapter illustrated an alternative way to assess websites’ reliability that complements the traditional fact-checking approach of qualifying content with an examination of the linking patterns between different regions of the network as reputational markers (Rogers, 2013). In this analysis we have thus combined the manual classification of reliability undertaken by Le Monde’s journalists with the standing of a source according to the hyperlinks that it receives and gives. This approach enabled us to bring fresh findings to current debates around fake news. In spite of the proliferation of fabricated content of various shades, reputation hierarchies on the web seem to be maintained (at least to some extent), as fake and hyper-partisan sites deploy aspirational hyperlinking styles which favour, perhaps surprisingly, authoritative sources.
REFERENCES
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509. Retrieved from http://www.sciencemag.org/cgi/content/abstract/sci;286/5439/509
Baruch, J., & Vaudano, M. (2016, April 8). « Panama papers » : un défi technique pour le journalisme de données. Le Monde. Paris. Retrieved from http://data.blog.lemonde.fr/2016/04/08/panama-papers-un-defi-technique-pour-le-journalisme-de-donnees
Behrens, J. T., & Chong-Ho, Y. (2003). Exploratory Data Analysis. In I. B. Weiner (Ed.), Handbook of Psychology (pp. 33–64). London: Wiley. http://doi.org/10.1002/0471264385.wei0202
Bounegru, L., Gray, J., Venturini, T., & Mauri, M. (2017). A Field Guide to Fake News. Retrieved from fakenews.publicdatalab.org
Bounegru, L., Venturini, T., Gray, J., & Jacomy, M. (2016). Narrating Networks: Exploring the Affordances of Networks as Storytelling Devices in Journalism. Digital Journalism,
Bowker, G. C., & Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences (Inside Technology S.). Cambridge MA: MIT Press.
Burt, R. S. (1995). Structural Holes: The Social Structure of Competition. Cambridge MA: Harvard University Press. Retrieved from http://books.google.com/books?id=E6v0cVy8hVIC&pgis=1
Foucault, M. (1970). The Order of Things. New York: Pantheon Books.
Freeman, L. C. (2000). Visualizing Social Networks. Journal of Social Structure, 1(1).
Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(NOVEMBER), 1129–1164. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/spe.4380211102/abstract
Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PloS One, 9(6), e98679. http://doi.org/10.1371/journal.pone.0098679
Jensen, P., Morini, M., Karsai, M., Venturini, T., Vespignani, A., Jacomy, M., … Fleury, E. (2015). Detecting global bridges in networks. Journal of Complex Networks, cnv022. http://doi.org/10.1093/comnet/cnv022
Lynch, K. (1960). The image of the city. Cambridge MA: MIT Press. Retrieved from http://books.google.com/books?hl=it&lr=&id=_phRPWsSpAgC&pgis=1
Marres, N., & Rogers, R. (2005). Recipe for Tracing the Fate of Issues and Their Publics on the Web. In B. Latour & P. Weibel (Eds.), Making Things Public: Atmospheres of Democracy (pp. 922–935). Cambridge, MA: MIT Press.
Moreno, J. (1953). Who Shall Survive? (Second Edition). New York: Beacon House Inc.
Noack, A. (2009). Modularity clustering is force-directed layout. Physical Review E, 79(2). http://doi.org/10.1103/PhysRevE.79.026102
Rogers, R. (2013). Digital Methods. Cambridge, MA: MIT Press.
The New York Times. (1933). Emotions Mapped by New Geography. The New York Times, 3 April.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Turnbull, D. (2000). Masons, Tricksters and Cartographers. London: Routledge.
Venturini, T., Jacomy, M., & Jensen, P. (n.d.). What do we See, When we Look At Networks. Towards a Positive Measure of Spatialisation Quality for Force-Driven Network Layouts. Forthcoming.