Dbpedia Statistics


  • 8/10/2019 Dbpedia Statistics


    DBpedia Usage Report

    usage data as of 2014-11-15

    covering DBpedia 3.3 (2009) to 3.9 (2013)

    a periodic report on the DBpedia SPARQL endpoint
    and associated Linked Data deployment

    Some of the statistics in this document were previously published as part of:

    DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia

    by Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer

    published January 2015




    Virtuoso "Anytime Query" Functionality
    HTTP Statistics
        HTTP Logs
        Number of Hits
        Number of Visits
    General Trends
        Hits per Endpoint
        Hits per Statement Type
        Hits per Keyword
            DISTINCT
            FILTER
            Functions like CONCAT, CONTAINS, ISIRI
            Use of GEO objects
            GROUP BY
            LIMIT / OFFSET
            OPTIONAL
            ORDER BY
            UNION
        Query Clause Patterns
    Additional Topics of Interest
        Memory Fragmentation
        Virtuoso (Mis-)Configuration




    The DBpedia deployment comprises:

    • Virtuoso Universal Server instance(s), handling the SPARQL endpoint and Linked Data deployment, and supporting negotiable RDF and other document formats
    • A physical computer, hosted in OpenLink Software's data center

    As the size of the DBpedia dataset has increased, and its use by the Linked Data community has grown, the Virtuoso software and computing hardware have been migrated to increasingly powerful (virtual) machines, as outlined in the table below:






    DBpedia    Virtuoso                         Processor                    Cores  RAM
    3.3 - 3.4  6.x Row Store, 4-node Cluster    AMD Opteron 8220, 2.8 GHz    4      32 GB
    3.5 - 3.7  6.x Row Store, 4-node Cluster    Intel Xeon E5520, 2.27 GHz   8      48 GB
    3.8        6.x Row Store, 4-node Cluster    Intel Xeon E5-2630, 2.3 GHz  8      64 GB
    3.9        7.x Column Store, Single-Server  Intel Xeon E5-2630, 2.3 GHz  8      64 GB

    Prior to DBpedia 3.9, we used the Virtuoso 6.x Row Store Engine, in a four-node Shared-Nothing Cluster configuration. As of DBpedia 3.9, we moved to the newer Virtuoso 7.x Column Store Engine, operating in Single-Server (i.e., one node, no clustering) mode.

    The Virtuoso 6.x Row Store Engine Cluster provides parallelization of query execution, even when the cluster nodes are on the same machine, and horizontal scale-out. A Virtuoso Cluster (Row Store or Column Store, v6 or v7) can be grown to satisfy desired response times for given RDF dataset collections.

    The Virtuoso 7.x Column Store Engine provides similar parallelization to the v6 Cluster setup, but its vectored execution model does so with a Single-Server setup. In addition, it leverages column-wise storage and key compression for highly compact working sets.



    DBpedia's Virtuoso configuration (following some revisions discussed later in this document) now includes:

    • Query Cost Estimation Timeout of 120 seconds. This is the query plan optimization threshold that comes into play during the early stages of solution construction.

    • Query Execution Timeout of 120 seconds. This is the query solution preparation threshold. If the timeout stops execution before the solution is complete (i.e., if the solution is partial), this is signified to the query client via HTTP response headers.

    • Maximum SPARQL query solution (a.k.a. result set) size of 10,000 rows. This is the maximum number of solution rows (SELECT queries) or statements (CONSTRUCT or DESCRIBE queries) returned per query solution retrieval round-trip.

    Virtuoso "Anytime Query" Functionality

    The "Anytime Query" is a core feature of Virtuoso that enables it to handle the challenges inherent in providing a publicly accessible interface for ad hoc querying at Web scale.

    This feature allows any SPARQL- and HTTP-protocol-savvy user agent (a.k.a. client) to issue long-running and/or large-solution queries, whose complete solutions would exceed the configured query timeout and/or result set limits, and to receive partial solutions conforming to those thresholds. It also enables the use of LIMIT and OFFSET to create windows (a.k.a. cursors) that slide through the set of data that constitutes the query's complete solution.

    Note: Even while paging through a partial query solution, Virtuoso continues to work towards a complete solution in the background.
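The LIMIT/OFFSET windowing described above can be sketched as a small helper that appends a window to a base query. This is only an illustration: the helper name `paged_queries`, the example query, and the page size are assumptions made here, not part of the DBpedia or Virtuoso tooling.

```python
from typing import Iterator


def paged_queries(base_query: str, page_size: int, max_pages: int) -> Iterator[str]:
    """Yield copies of base_query with successive LIMIT/OFFSET windows appended."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for page in range(max_pages):
        offset = page * page_size
        yield f"{base_query}\nLIMIT {page_size}\nOFFSET {offset}"


# Hypothetical query. An ORDER BY is included because SPARQL does not
# guarantee a stable row order between requests without one.
BASE = "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/Person> } ORDER BY ?s"

if __name__ == "__main__":
    for query in paged_queries(BASE, page_size=10000, max_pages=3):
        print(query)
        print("---")
```

A client would send each generated query in turn and stop as soon as a window comes back with fewer rows than the page size.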

    HTTP Statistics

    HTTP Logs

    The HTTP server log files used in this report exclude traffic generated by:

    • IP addresses that were temporarily rate limited after their burst period
    • IP addresses that were banned after misuse
    • Applications, spiders, and other crawlers that were blocked after frequently hitting the rate limiter or generally claiming too many resources

    The system uses a combination of firewall rules and ACLs (Access Control Lists) to quickly drop such connections, so legitimate users of dbpedia.org can connect and perform their lookup. To save time, these dropped connections are not recorded in the log files.

    The data was extracted from reports generated by Webalizer v2.2x.

    Number of Hits

    In the table below, the Duration (Days) column represents the number of days for which logs were analyzed, which may not have been all the days that DBpedia version was live. A "hit" is any request from an HTTP client.

    DBpedia  Duration  Total Hits Logged  Average Hits  Median Hits  Minimum Hits     Maximum Hits
    Version  (Days)    for Duration       per Day       per Day      on a Single Day  on a Single Day
    3.3      129       94,661,592         733,811       711,306      188,991          1,319,539
    3.4      153       185,519,930        1,212,549     1,165,893    351,226          2,371,657
    3.5      252       282,898,279        1,122,612     1,035,444    386,454          2,908,720
    3.6      165       219,178,587        1,328,355     1,286,750    256,945          2,495,031
    3.7      285       594,338,675        2,085,399     1,930,728    1,057,398        8,873,473
    3.8      196       570,440,410        2,910,410     2,717,775    1,085,640        7,678,490
    3.9      350       1,062,399,840      3,035,428     2,865,828    1,523,222        10,894,488

    The increasing popularity of DBpedia is clearly visible in this graph.
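The per-day columns of the hits table are ordinary summary statistics over daily hit counts. The sketch below shows one plausible way to derive them; the function name `summarize_daily_hits` and the sample figures are hypothetical and are not taken from the report's logs.

```python
from statistics import mean, median


def summarize_daily_hits(daily_hits):
    """Summarize a list of per-day hit counts (one entry per analyzed day)."""
    if not daily_hits:
        raise ValueError("need at least one day of data")
    return {
        "duration_days": len(daily_hits),
        "total": sum(daily_hits),
        "average": round(mean(daily_hits)),
        "median": median(daily_hits),
        "minimum": min(daily_hits),
        "maximum": max(daily_hits),
    }


# Hypothetical daily totals for a five-day window (not real log data).
sample = [700_000, 650_000, 900_000, 720_000, 1_300_000]
summary = summarize_daily_hits(sample)
print(summary)
```

Because a single burst day pulls the average up, comparing the average against the median is a quick way to see how skewed a period was.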



    Number of Visits

    In the table below, the Duration (Days) column represents the number of days for which logs were analyzed, which may not have been all the days that DBpedia version was live. A "hit" is any request from an HTTP client.

    DBpedia  Duration  Total Visits Logged  Average Visits  Median Visits  Minimum Visits   Maximum Visits
    Version  (Days)    for Duration         per Day         per Day        on a Single Day  on a Single Day
    3.3      129       1,257,751            9,750           9,775          2,036            13,126
    3.4      153       1,733,296            11,329          11,473         1,546            14,198
    3.5      252       4,181,263            16,592          16,902         2,969            23,129
    3.6      165       3,212,697            19,471          17,664         691              56,064
    3.7      285       6,832,027            23,972          22,262         10,741           127,869
    3.8      196       3,302,887            16,851          16,711         2,960            27,416
    3.9      350       7,709,151            22,026          18,197         9,071            52,227

    Again, a graph of this data clearly shows DBpedia's increasing popularity.

    The sudden drop in visits per day between the 3.7 and the 3.8 datasets is explained by the combination of a few factors:

    • Some applications started to use their own private DBpedia endpoint
    • Other applications that had been abusing the DBpedia endpoint were blocked
    • Language-specific DBpedia endpoints emerged and took on some of the burden

    The average hits per day were unchanged by the decrease in visits per day.



    General Trends

    Hits per Endpoint

    The DBpedia server does not provide only a SPARQL endpoint, but also serves as a Linked Data Hub, returning resources in a number of different formats.

    For each dataset we selected a number of days' worth of log files at random and processed those in order to show the various endpoints called.

    Endpoint   3.3         3.4         3.5         3.6         3.7         3.8         3.9
    /class     138,652     220,760     136,831     176,658     308,895     248,948     332,269
    /data      1,354,774   2,681,181   2,229,723   2,546,992   3,713,946   4,246,415   6,744,332
    /fct       2,023       10,826      11,741      18,146      4,033       155,972     1,042,182
    /ontology  80,254      168,370     141,923     178,250     115,740     96,670      62,921
    /page      2,634,040   4,702,418   1,809,556   1,687,188   3,787,481   7,377,123   5,063,329
    /property  230,950     311,492     137,293     175,670     175,811     128,434     70,887
    /resource  2,975,842   4,080,218   2,554,086   2,309,011   4,436,109   7,143,015   3,978,892
    /sparql    2,090,387   4,177,231   2,265,576   9,112,052   15,902,113  15,475,379  14,287,304
    other      111,683     188,033     146,790     82,106      265,590     270,324     848,087
    total      9,618,605   16,541,529  9,422,519   16,286,073




    Hits per Statement Type

    Next we focused on the calls to the /sparql endpoint and counted the number of statements per type. As the log files only record the full SPARQL query on a GET request, all the PUT requests are counted as unknown.

    Statement Type  3.3         3.4         3.5         3.6         3.7         3.8         3.9
    ASK             55,970      269,481     360,010     326,029     158,761     677,219     1,747,369
    CONSTRUCT       55,042      31,057      13,998      11,436      22,304      37,751      250,965
    DESCRIBE        10,818      7,612       4,499       6,522       62,007      111,292     474,614
    SELECT          1,890,769   3,662,725   1,657,659   8,030,204   11,204,239  13,515,570  11,255,726
    unknown         77,788      206,356     229,410     737,861     4,454,802   1,133,547   558,630
    total           2,090,387   4,177,231   2,265,576   9,112,052   15,902,113  15,475,379  14,287,304
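Counting statements per type amounts to finding the query form of each logged query and falling back to "unknown" when no query text was recorded. A minimal sketch of that idea follows; the function names and the regular expressions are assumptions made for illustration, not the tooling actually used for this report, and a real SPARQL parser would be more robust.

```python
import re
from collections import Counter

# IRIs and string literals are blanked out first, so that a word such as
# "ask" inside a PREFIX IRI is not mistaken for the query form.
_IRI_OR_STRING = re.compile(r"<[^>]*>|\"[^\"]*\"|'[^']*'")
_QUERY_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def statement_type(query):
    """Return the query form, or "unknown" when no query text was logged."""
    if not query:
        return "unknown"
    stripped = _IRI_OR_STRING.sub(" ", query)
    match = _QUERY_FORM.search(stripped)
    return match.group(1).upper() if match else "unknown"


def count_statement_types(queries):
    """Tally statement types over an iterable of logged query strings."""
    return Counter(statement_type(q) for q in queries)


# Hypothetical log sample; None stands for a request whose query was not logged.
logged = [
    "PREFIX x: <http://example.org/ask/> SELECT ?s WHERE { ?s a x:Thing }",
    "ask { ?s ?p ?o }",
    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }",
    None,
]
print(count_statement_types(logged))
```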





    Hits per Keyword

    Finally, we analyzed each SPARQL query and counted the use of some common keywords and constructions.

    DISTINCT

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             0          0          0          0          0          0          0
    CONSTRUCT       5          0          0          0          0          7          0
    DESCRIBE        0          0          0          0          0          1          0
    SELECT          368,943    416,036    286,540    1,560,674  1,486,796  3,433,444  1,875,024


    FILTER

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             717        381        2,518      3,027      1,446      15,498     1,736,374
    CONSTRUCT       42,981     29,833     9,900      1,384      6,282      21,081     970
    DESCRIBE        14         17         35         24         59         11         23
    SELECT          864,479    500,840    527,005    2,028,392  3,267,746  4,883,180  6,571,309

    Functions like CONCAT, CONTAINS, ISIRI

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             698        172        21         39         65         477        161
    CONSTRUCT       42,965     29,767     9,762      536        3,981      20,363     1,237
    DESCRIBE        5          14         35         21         48         0          9
    SELECT          166,715    252,112    389,645    1,712,885  2,861,807  3,493,900  4,891,144

    Use of GEO objects

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             0          0          0          0          0          0          21
    CONSTRUCT       20         3          0          573        2,373      670        39
    DESCRIBE        0          0          0          0          0          0          0
    SELECT          524,254    254,639    656,466    536,008    1,036,452  1,247,246  588,137

    GROUP BY

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             0          0          0          0          0          0          0
    CONSTRUCT       0          0          0          0          0          0          0
    DESCRIBE        0          0          0          0          0          0          0
    SELECT          40         198        52         959        578        21,731     100,035


    LIMIT / OFFSET

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             0          0          0          0          2          0          0
    CONSTRUCT       19         9          74         688        12,900     19,635     50,632
    DESCRIBE        2          43         3          0          37         2,406      99
    SELECT          86,001     239,198    191,987    842,976    857,958    1,056,462  2,269,709


    OPTIONAL

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             76         0          0          0          1,165      1          0
    CONSTRUCT       12         185        8,951      4,782      731        18,407     38,787
    DESCRIBE        0          0          0          0          0          0          6
    SELECT          578,765    872,706    784,562    2,108,278  1,871,918  2,317,782  3,912,584


    ORDER BY

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             0          0          0          0          0          0          0
    CONSTRUCT       0          1          1          33         141        20         20
    DESCRIBE        1          1          0          0          0          5          1
    SELECT          43,556     45,831     32,007     260,521    129,376    210,002    160,031


    UNION

    Statement Type  3.3        3.4        3.5        3.6        3.7        3.8        3.9
    ASK             1,268      16,990     8          76         34,495     4,533      3,804
    CONSTRUCT       2          110        580        964        894        1,614      42
    DESCRIBE        2          584        1,377      351        0          0          4
    SELECT          61,625     115,527    436,808    906,685    1,358,623  2,786,082  3,769,670



    Query Clause Patterns

    It is also interesting to look at the percentages of the sample set including each SPARQL query clause. For instance, the GROUP BY clause was apparently an overlooked feature until DBpedia 3.9 was released, while the ORDER BY clause remains infrequently used (even in situations where OFFSET and LIMIT are used for paging).

    Clause          3.3    3.4    3.5    3.6    3.7    3.8    3.9
    DISTINCT        19.51  11.36  17.29  19.44  13.27  25.40  16.66
    FILTER          45.72  13.67  31.79  25.26  29.17  36.13  58.38
    Functions       8.82   6.88   23.51  21.33  25.54  25.85  43.46
    GEO objects     27.73  6.95   39.60  6.67   9.25   9.23   5.23
    GROUP BY        0.00   0.01   0.00   0.01   0.01   0.16   43.46
    LIMIT / OFFSET  4.55   6.53   11.58  10.50  7.66   7.82   20.16
    OPTIONAL        30.61  23.83  47.33  26.25  16.71  17.15  34.76
    ORDER BY        2.30   1.25   1.93   3.24   1.15   1.55   1.42
    UNION           3.26   3.15   26.35  11.29  12.13  20.61  33.39
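A percentage of this kind is the share of sampled queries whose text contains a given clause. The sketch below illustrates the calculation for a few clauses; the clause patterns, the function name `clause_percentages`, and the sample queries are assumptions for illustration, and keyword matching by regular expression is only an approximation of real query parsing.

```python
import re

# Hypothetical clause patterns; matching is case-insensitive.
CLAUSE_PATTERNS = {
    "GROUP BY": r"\bGROUP\s+BY\b",
    "LIMIT / OFFSET": r"\b(?:LIMIT|OFFSET)\b",
    "OPTIONAL": r"\bOPTIONAL\b",
    "ORDER BY": r"\bORDER\s+BY\b",
}


def clause_percentages(queries):
    """Return, per clause, the percentage of queries that contain it."""
    total = len(queries)
    result = {}
    for name, pattern in CLAUSE_PATTERNS.items():
        regex = re.compile(pattern, re.IGNORECASE)
        matching = sum(1 for q in queries if regex.search(q))
        result[name] = round(100.0 * matching / total, 2) if total else 0.0
    return result


# Hypothetical sample of four queries.
sample_queries = [
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
    "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s LIMIT 10 OFFSET 20",
    "SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p",
    "ASK { ?s ?p ?o }",
]
print(clause_percentages(sample_queries))
```

Each query is counted once per clause, however many times the clause occurs in it, so the result is a share of queries rather than a count of keyword occurrences.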



    Additional Topics of Interest

    Memory Fragmentation

    In recent times (leading up to the preparation of this report), a particular user agent was issuing the following query repeatedly every hour:

        DEFINE output:format "CSV"
        DEFINE sql:signal-void-variables 1
        DEFINE input:default-graph-uri <http://dbpedia.org>
        SELECT ?p, ?lname, ?occupation, ?gender,
               ( group_concat ( ?altName ; separator="@" ) AS ?altNames )
        WHERE
          {
            ?p a dbpedia-owl:Person ;
               rdfs:label ?name .
            BIND ( CONCAT ( ?name, "G", LANG ( ?name ) ) AS ?lname )
            OPTIONAL { ?p rdfs:label ?altName }
            OPTIONAL { ?p dbpedia-owl:occupation ?occupation }
            OPTIONAL { ?p dbpedia-owl:gender ?gender }
          }
        GROUP BY ?p ?lname ?occupation ?gender ?altName
        OFFSET 1189778
        LIMIT 10000

    It may not be obvious at a quick glance, but the full solution of this query would never contain more records than the requested OFFSET of 1,189,778, so it would never return any records to the client, never mind approaching or exceeding the requested LIMIT of 10,000. Nonetheless, the query had to be processed each time it was received.
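The effect of an OFFSET that lies beyond the end of the complete solution can be sketched with a plain sequence standing in for the solution rows. The function name `result_window` and the solution size of 1,000,000 rows are hypothetical; the report does not state the actual size.

```python
def result_window(solution_rows, offset, limit):
    """Return the part of a complete solution selected by OFFSET and LIMIT."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return solution_rows[offset:offset + limit]


# Hypothetical complete solution of 1,000,000 rows.
complete_solution = range(1_000_000)

beyond_end = result_window(complete_solution, offset=1_189_778, limit=10_000)
near_end = result_window(complete_solution, offset=995_000, limit=10_000)
print(len(beyond_end), len(near_end))
```

With these assumed numbers the first window is empty and the second holds 5,000 rows. In the sketch the slice is cheap, whereas a query engine generally still has to produce the skipped rows before it can tell that the window is empty.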

    Query patterns like this led Virtuoso, or more accurately, the standard glibc memory allocator (malloc), to create a fragmented memory state (at the operating system level) en route to memory exhaustion and inevitable invocation of Linux's out-of-memory (OOM) process killer.

    The resultant memory fragmentation couldn't be addressed with malloc. Therefore, Linux builds of Virtuoso now incorporate the TLSF (Two-Level Segregate Fit) allocator.

    Note: This allocator is still in a testing phase, but so far it appears to be having the desired effect; i.e., these kinds of queries are effectively controlled by restrictions in the INI file [SPARQL] section (MaxQueryCostEstimationTime and MaxQueryExecutionTime), combined with Virtuoso's "Anytime Query" functionality.



    Virtuoso (Mis-)Configuration

    A number of INI file parameters were inadvertently set to values inappropriate to the DBpedia service and its host environment. These values also contributed to the overall instability in recent months.

    The inappropriate INI file settings included:

    • DefaultIsolation = READ_COMMITTED
      This is not appropriate for a read-only database that provides ad hoc querying over the web, because it introduces significant overhead due to ACID implications.

    • ResultSetMaxRows = 50000
      This encouraged queries constructed with wholesale data extraction in mind, at the expense of other users. The current setting of 10,000 may be further reduced in the near future.

    Current settings are shown below:

        [Parameters]
        ...
        DefaultIsolation = 1 ; READ_UNCOMMITTED, was READ_COMMITTED