Hunter X Scholar – Figure out Famous Scholars in Your Research Area


  • 8/14/2019 Hunter X Scholar Finger out Famous Men in Your Research Area


    Chi-Jen Wu

    Electrical Engineering

    National Taiwan University

    [email protected]

    Abstract

With the growth of the WWW, publishing research information on the web has become an essential practice for scientists and researchers; an enormous number of web pages on the Internet provide information on scientists, research papers, and technical documents, and are indexed by search engines. For a junior student or researcher, it is a nontrivial task to find the authoritative scholars (experts) in his or her research area. An interesting challenge thus arises: how can we figure out the important scholars on a research topic? An effective scholar-search system involves analysis of reputation, publications, citations, and activities across a large number of scholars.

In this project, we present the design and implementation of a prototype scholar-search system based on a web-mining approach. Our system computes a ranking of the scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. We also designed the %-index, a new ranking function for positioning scholars within a specific research field. The ranking criteria are based on publications and citations, computed from the query results of Google and Google Scholar. Based on the experimental results, our approach outperforms other existing approaches within a specific research field.

1 Introduction

With the growth of the WWW, publishing research information on the web has become an essential practice in academia; an enormous number of web pages on the Internet provide information on scientists, research papers, and technical documents, and are indexed by search engines. We can expect the contents of the Web to grow ever vaster as time goes by. The variety of research areas has also diversified over the past decade, with more and more new research areas emerging, such as Data Mining, P2P systems,


Network Coding, and so on. For a junior student or researcher starting his or her research work, it is a nontrivial task to find authoritative scholars in a research area with a general-purpose search engine. For example, using the popular web search engine Google to search for the research field "data mining", Google retrieves more than 20,800,000 relevant web pages; even Google Scholar [1], a special-purpose search engine for scholarly literature developed by Google, returns more than 1,400,000 papers in the data mining area. For most people, these results are far too large to identify the important scholars and their significant papers.

An interesting challenge thus arises: how can we figure out the important, famous scholars on a research topic? In fact, constructing rankings of scholarly authority is a relatively new subfield of information retrieval research. The problem differs from the traditional expert-finding problem [2-5]: in essence, the goal of an expert-finding system is to identify a list of people with the appropriate skills and knowledge on a given topic. Scholar search is a deeper expert-finding problem; it must not only identify the right scholars who possess the required knowledge within a research community, but also rank their level of authority in the field. Consider a simple scenario: a junior researcher wishes to find a list of scholars who have made significant contributions and/or published a seminal paper in his or her research field. Unfortunately, a traditional expert-finding system using standard IR techniques may return a Ph.D. student, who may have a certain level of expertise in the area but is not a famous scholar in the field [10,19]. In general, scholar search is a more complex and difficult task than expert finding, especially since there are no standards specifying the criteria or qualifications for particular levels of scholarly authority.

In this project, we present the design and implementation of a prototype scholar-search system based on a web-mining approach; in this work we focus on the problems of scholar finding and scholar ranking. Our system computes the ranking of scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. For scholar finding, we use a search engine and digital libraries to find documents about a given topic, and extract the authors from the collected documents. We then estimate the extracted authors' relevance to the given topic through statistical analysis of web pages. We assume that authors with many articles on a certain topic are more likely to be experts on that topic, and that authors with highly cited papers are indicative of authorities. For scholar ranking, we design the %-index, a ranking function for positioning scholars. The ranking criteria are based on publications and citations, computed from the query results of


the scientific literature digital archives, such as Google Scholar, MS Libra Academic Search [17], or CiteSeerX [18]. The ranking function, the %-index, is a novel way of estimating an individual scholar's impact in a single research field: it indicates that the total citations of a scholar's papers amount to m% of the total citations of all papers in the field. In summary, we hope our system makes studying more convenient for junior students. Our system is available at: http://140.109.22.36/cjwu/dm_scholar/mycgi.htm

Our contributions in this work are: 1) a web-mining approach to searching for famous/authoritative scholars; 2) a flexible ranking function, the %-index, that facilitates scholar ranking within a smaller research field; and 3) the development and demonstration of our scholar-search system as a working web service. A main advantage of our approach is that users can query any research topic and obtain a list of authoritative scholars without a dedicated database for the task. Another interesting application of our system is automatically routing submitted papers to reviewers at conferences [6,7]: assigning submitted manuscripts to reviewers is a common task of journal editors and conference chairs, and our system can help committees find the right reviewers under severe time pressure.

In the following section we describe related work on expert-finding systems and previous efforts on ranking scholars and research institutions. Section 3 describes our system design and methodology. We then present preliminary experimental results and demonstrate several examples in Section 4. Finally, we conclude the paper in Section 5.

2 Related Work

We consider an effective scholar-search system to be composed of expert finding, expert ranking, and expert profiling. First, expert finding [2,8-11] is the task of finding the right scholars for a specific topic with high probability. Within a research community such as computer science there may be many candidates relevant to a given topic, and the expert-finding step retrieves a list of candidates deemed the most likely scholars for that topic. Second, expert ranking [15,16] sorts the candidates by level of authority; it involves analysis of reputation, publications, citations, and activities across the candidate scholars. Finally, expert profiling [12,13] digs out and extracts the profile information of an individual scholar from the Web, including basic information, contact information, and educational history. In this section, we describe the related work on these three components.


2.1 Expert Finding

Traditional expert finding identifies a list of people with the appropriate skills and knowledge on a given topic [5]. Most previous efforts rely on manually constructed expert databases [21], or on text, citation, or document analysis to match a user's research topic [3,8,11,14].

Since our research is based on web mining and search engines, we discuss the relevant work here. In 1997, Kautz et al. [2] developed the first expert-extraction system, called ReferralWeb, based on web-mining and search-engine techniques. ReferralWeb automatically generates a representation of a social network from evidence gathered from web pages; based on the links of the social network, it allows users to search for people likely to be experts on a given topic within a workgroup. More recently, Harada et al. [14] proposed the NEXAS system, an extension to web search engines that attempts to find real-world entities reflected on the Web and uses them to search for people relevant to a topic. Zhang et al. [9] proposed a mixture model for expert finding, whose main idea is to use Probabilistic Latent Semantic Analysis (PLSA) [21] to capture the semantic relevance between the query and the experts. The same authors also developed the Arnetminer system [13], which addresses several key issues in scholar search, such as scholar finding and scholar profiling. In 2007, Microsoft Research Asia developed a similar system, MS Libra Academic Search [17], a free computer-science bibliography search engine whose principal idea is based on the object-level vertical search technique proposed by Nie et al. [22].

2.2 Expert Ranking

We are aware of a few systems that employ expert-ranking techniques, such as Arnetminer, Libra, and CiteSeerX. The main idea of Arnetminer is similar to ReferralWeb, while the other systems are based on information-retrieval schemes. Because no standards specify the criteria for particular levels of scholarly authority, ranking scholars is a difficult task that is unlikely to yield a unanimous solution. The best-known ranking index is the impact factor, defined as the average number of citations per journal article over a two-year period. In 2005, the h-index [16] was proposed to measure an individual scholar's impact: a scholar has index h if he or she has published h papers that have each received at least h citations. More recently, Ren and Taylor [15] provided an automatic publication-ranking framework to support rankings of scholars and research institutions; they discussed the most important ranking policies and pointed out several problems in publication ranking. In our scholar-search system, we focus on a user-defined research area. However, the above criteria are better at discriminating between


scholars across a whole research field than within a single research area, and it is also hard to gather impact factors for every paper and author. Lacking these materials, we intend the %-index to work better in smaller research areas and to yield more valid evaluations of prominent scholars.
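As a concrete illustration, the h-index described above can be computed from a list of per-paper citation counts. This is a generic sketch of the published definition, not code from any of the systems under discussion:

```python
def h_index(citations):
    """h-index: the largest h such that the scholar has h papers
    with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still supports an index of `rank`
        else:
            break
    return h
```

For example, a scholar with papers cited [10, 8, 5, 4, 3] times has h-index 4: four papers with at least 4 citations each, but not five papers with at least 5.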

2.3 Expert Profiling

Another important challenge for a scholar-search system is the expert-profiling task: extracting the profile of an individual researcher from the Web automatically. Several research efforts have addressed this task [12,13,23]. Recently, Tang and coworkers [12,13] presented a unified approach to extracting scholar profiles over an academic social network; their system also addresses the name-disambiguation problem [24] during integration. Many profile-extraction methods have been proposed; an overview can be found in [25].

3 Our Approach

In this section, we describe our scholar-search system and its components, and demonstrate the system through several simple scholar-search experiments. First, we give an overview of the system's main concepts, the corresponding task components, and their interplay. We then built a system prototype on these concepts and began experimenting: we ran a number of search experiments and compared our approach with other systems, such as Arnetminer and Libra.

3.1 System Overview

Our scholar-search system consists of three main components, depicted in Figure 1. In the first step of our approach, a purpose-built crawler crawls a scientific literature digital archive to gather the candidate set, called the c set. The crawler collects scientific literature related to the given query topic q and extracts the authors' names from the crawled articles; it also analyzes the citations of each article and each candidate. After obtaining the c set, the second step estimates the associations between the topic and the candidates. For estimating relevance to the given topic, we make the following claim.

Claim 1: Authors with many articles on a certain topic are more likely to be experts on topic q.

This claim should be reasonable because a more important scholar's name


should be more visible on the Web [14]; e.g., an important scholar's name is recorded on a large number of web pages, such as conference programs, seminar pages, and journal papers. Based on this claim, we estimate the extracted authors' relevance to the given topic through statistical analysis of web pages. A number of statistical methods have been proposed for estimating term association from co-occurrence measures [26]. In our study, we adopt the chi-square test because the parameters it requires are easily gathered from a search engine. After this step, we rank the candidates according to their chi-square scores and determine the top-k candidates in the c set. Finally, we apply the ranking function, the %-index, a novel way of estimating an individual scholar's impact in a single research field; we define it in the next section. In the following sections, we describe the details of the two components, scholar extraction and scholar ranking.
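The three-step pipeline just outlined can be sketched as follows. This is an illustrative Python sketch, not the system's Perl code; the `chi_square` and `percent_index` scoring callbacks are hypothetical stand-ins for the search-engine queries and citation analysis in the real system:

```python
def rank_scholars(papers, chi_square, percent_index, k=20):
    """papers: list of (author_names, citation_count) tuples crawled
    for a topic. chi_square(name) -> relevance score;
    percent_index(name) -> %-index. Returns the top-k scholar names."""
    # Step 1: extract the candidate set (c set) from crawled literature
    candidates = {a for authors, _ in papers for a in authors}
    # Step 2: keep the top-k candidates by chi-square relevance
    top_k = sorted(candidates, key=chi_square, reverse=True)[:k]
    # Step 3: order the finalists by %-index
    return sorted(top_k, key=percent_index, reverse=True)
```

In the real system the two scoring functions would issue web queries; here they can be any callables mapping a name to a score, e.g. dictionary lookups over precomputed values.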

3.2 Relevance Estimation

Figure 1. A system overview of our approach and its main components

In our system, we use the chi-square test to estimate the strength of the relation between an extracted scholar and the given research topic through their co-occurrence on web pages. The chi-square test is easy to implement in our system; the required parameters are listed in Table 1. Given a query topic q and a candidate's


name c, we assume q and c are independent. Following [27], the expected frequencies are:

E(q, c) = (a+c)(a+b)/n,
E(q, ¬c) = (b+d)(a+b)/n,
E(¬q, c) = (a+c)(c+d)/n, and
E(¬q, ¬c) = (b+d)(c+d)/n.

Then the conventional chi-square statistic follows:

χ²(q, c) = Σ_{X ∈ {q, ¬q}, Y ∈ {c, ¬c}} [n(X, Y) − E(X, Y)]² / E(X, Y)
         = n(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)],

where n(X, Y) is the observed number of pages in each cell.

This chi-square test plays a crucial role in our system and serves approximately as a co-occurrence index. In our implementation we use Google as the search engine, but other major search engines are equally applicable. The chi-square method provides a simple way to estimate the relevance between a candidate and a research topic and is easy to implement, but its performance is strongly dominated by the size and number of retrieved web pages.

Notation | Required parameter
n | The total number of Web pages
a | The number of Web pages containing both the candidate's name and the topic
b | The number of Web pages containing the topic but not the candidate's name
c | The number of Web pages containing the candidate's name but not the topic
d | The number of Web pages containing neither (d = n − a − b − c)

Table 1. The required parameters for the chi-square test (we set n = 8 billion in our experiments)
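Given the four hit counts of Table 1, the 2×2 chi-square statistic reduces to the standard closed form n(ad − bc)²/((a+b)(c+d)(a+c)(b+d)). A minimal sketch (in Python rather than the system's Perl):

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table of Table 1:
    a = pages with both name and topic, b = topic only,
    c = name only, d = neither (n = a + b + c + d)."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0
```

In the real system, a, b, and c would come from search-engine result counts for the queries "name AND topic", "topic NOT name", and "name NOT topic", with d derived from the fixed n.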

3.3 Scholar Ranking

Existing ranking indexes have several limitations. A major one is that they are used to rank a whole research field, such as all of computer science, and it is hard to infer a scholar's contributions within a sub-field. For example, the


CiteSeerX provides a citation-count ranking of all computer-science scholars, but it is not easy to find a significant researcher in the network coding area. Hence we design a novel ranking function, the %-index, to estimate an individual scholar's impact in a single research field, such as network coding. The %-index indicates that the total citations of a scholar's papers amount to m% of the total citations of all papers in the field. Formally,

%-index(C_i) = (citations(C_i) / citations(Π)) × 100%,

where C_i denotes a scholar in the c set and Π denotes the set of collected scientific literature.

However, assessing scholars is a complex social and scientific process. Our %-index could be used alone, but it should probably serve as one quantitative indicator within a more comprehensive methodology. In addition to publications, many other factors, such as research impact, funding, and students, reflect the importance of a scholar and could be taken into account in future work.
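The %-index itself is a simple ratio once the citation totals have been gathered. A minimal sketch, assuming the scholar's citation count and the field-wide total are already known (illustrative Python, not the system's Perl):

```python
def percent_index(scholar_citations, field_total_citations):
    """%-index: the scholar's total citations as a percentage of the
    total citations of all collected papers in the research field."""
    if field_total_citations == 0:
        return 0.0  # empty field: no basis for a percentage
    return 100.0 * scholar_citations / field_total_citations
```

For example, a scholar whose papers have gathered 250 citations in a field whose collected papers total 1,000 citations has a %-index of 25.0.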

3.4 System Implementation and Demo

Our scholar-search system is implemented as a CGI dynamic web page in Perl. Figure 2 shows the portal page of our system: users enter a research topic to retrieve the top-k scholars for that topic (we generally set k = 20). Figure 3 shows the output of the system. One drawback is that the system needs more than five minutes to process a request, because the Google search engine does not accept a burst of rapid requests. We also show the results of querying "data mining" and "association rules" in Table 2 and Table 3, respectively; note that association rules is a sub-field of data mining. We analyze these results in the following section.

Figure 2. Portal of our system. Figure 3. The ranking results.


Ranking | Candidate | %-index | χ² test

    1 J Han 9.647531009 2652383

    2 M Kamber 4.334521751 3927792

    3 E Frank 3.833536147 2044977

    4 IH Witten 3.833536147 1977850

    5 G Piatetsky-Shapiro 3.728926937 28694484

    6 P Smyth 3.595141009 4588678

    7 T Hastie 3.498359699 2951988

    8 R Tibshirani 3.498359699 2219211

    9 J Friedman 3.498359699 186502

    10 JC Bezdek 3.280601752 424565

    11 UM Fayyad 3.218690179 25534724

    12 R Agrawal 2.558300065 4679901

    13 JA Hartigan 2.019598215 180509

    14 J Shawe-Taylor 1.906449478 1827019

    15 N Cristianini 1.906449478 1503047

    16 J Pei 1.84311465 1646410

    17 PS Yu 1.813226305 2855124

    18 U Fayyad 1.641012503 6263246

    19 Y Yin 1.609700903 248230

    20 Ming-Syan Chen 1.60258463 3107667

    Table 2. The ranking result of data mining research area

Ranking | Candidate | %-index | χ² test

    1 R Agrawal 31.18867812 63317224.61

    2 R Srikant 20.87173693 255685481.9

    3 A Swami 9.587025034 234630270.1

    4 T Imielinski 8.729262056 350995142.7

    5 J Han 8.076648629 33356763.11

    6 G Piatetsky-Shapiro 6.964678599 107474023.8

    7 H Mannila 6.30909199 67368432.23

    8 P Smyth 5.750133793 13005709.05

    9 H Toivonen 5.503359696 41276775.5

    10 UM Fayyad 5.249152643 26075405.81

    11 PS Yu 3.891895106 18344775.38

    12 Ming-Syan Chen 3.432538503 16059136.61

    13 R Motwani 3.041565083 8234266.614


    14 S Brin 2.84979485 7073997.568

    15 MJ Zaki 2.347327109 53639929.75

    16 A Savasere 2.229886424 35516464.2

    17 DW Cheung 1.948920735 5068106.193

    18 B Liu 1.934054825 3484209.628

    19 N Pasquier 1.87905096 17276391.84

    20 R Taouil 1.855265505 18117526.04

    Table 3. The ranking result of association rules research area, a subset

    of data mining

4 Experimental Results

To validate our system, we used it to perform two rankings: the first assessed scholars in the data mining area, and the second evaluated the association rules field. We compared both rankings with the Arnetminer and MS Libra systems by analyzing the co-occurrence of a scholar's name and the query topic on the Web. First we give overview statistics of the two fields. Note that, due to the limitations of Google Scholar, our crawler retrieved only the first 1,000 papers of each Google Scholar search result.

Figure 4 depicts the citation distribution of the collected papers; citation impact declines after roughly the 100th paper. There is one very highly cited item in the data mining field: a book, Neural Networks: A Comprehensive Foundation by Simon Haykin, with more than 11,407 citations. Although this author has a very highly cited work, his importance may not exceed that of the top-k scholars in data mining; his chi-square score is 107,836.


Next, we compared our approach with Arnetminer and MS Libra by analyzing the co-occurrence of a scholar's name and the query topic on the Web. The rankings of our approach, Arnetminer, and MS Libra are shown in the following tables: Table 4 shows the results of querying the data mining area, and Table 5 shows the output for association rules. The number beside each name is the Web co-occurrence count of that name and the given research topic (again via Google). We now discuss the performance metric for evaluating the three systems. Let R_i be the co-occurrence count of a name and the given research topic in the tables, and let T_i denote the set of scholars produced by each system.

median = median(T_1 ∪ T_2 ∪ T_3),
R(R_i) = 1 if R_i ≥ median, 0 otherwise,

    get the median number in the union set. Figure 6 and 7 show the comparison results of

    two experiments. In first experiment, query data mining, the performances of three

    systems are similar. Then in the second experiment, our approach is more

    distinguishable than the two approaches. Noted mining association rules is the sub

    field of whole data mining area. The reason is the impact of our %-index metric,

    which is designed to ranking a specific research field.
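Under our reading of the metric above, each system is scored by the fraction of its scholars whose co-occurrence count reaches the median of the three pooled candidate lists. The comparison can be sketched in Python; this is an illustrative reconstruction, not the original evaluation code:

```python
from statistics import median

def system_score(counts, pooled_median):
    """Fraction of a system's scholars whose name/topic co-occurrence
    count is at least the pooled median."""
    return sum(1 for r in counts if r >= pooled_median) / len(counts)

def compare(t1, t2, t3):
    """Score three systems against the median of their pooled counts."""
    m = median(t1 + t2 + t3)
    return [system_score(t, m) for t in (t1, t2, t3)]
```

A system whose candidates all co-occur with the topic at least as often as the pooled median scores 1.0; one whose candidates all fall below it scores 0.0.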

Figure 6. Comparison results of the first experiment (data mining)


Our Approach (T1) | Arnetminer (T2) | MS Libra (T3)
J Han (46,600) | Jiawei Han (46,600) | Rakesh Agrawal (21,500)
M Kamber (13,900) | Christos Faloutsos (17,300) | Tomasz Imielinski (5,610)
E Frank (20,600) | Philip S. Yu (19,800) | Jiawei Han (46,600)
IH Witten (23,900) | Mohammed Javeed Zaki (2,510) | Philip S. Yu (19,800)
G Piatetsky-Shapiro (12,000) | Heikki Mannila (19,100) | Christos Faloutsos (17,200)
P Smyth (15,800) | Rakesh Agrawal (21,500) | Ramakrishnan Srikant (14,300)
T Hastie (10,600) | Jian Pei (11,700) | Heikki Mannila (19,100)
R Tibshirani (8,470) | Usama M. Fayyad (9,230) | Ian H. Witten (23,900)
J Friedman (7,420) | Eamonn J. Keogh (1,750) | Padhraic Smyth (15,800)
JC Bezdek (2,620) | Charu C. Aggarwal (8,060) | Hans-Peter Kriegel (9,760)
UM Fayyad (9,230) | Johannes Gehrke (10,900) | Gregory Piatetsky-Shapiro (12,000)
R Agrawal (21,500) | Wei Wang (57,900) | Arun N. Swami (2,500)
JA Hartigan (1,280) | Srinivasan Parthasarathy (5,410) | Ming-Syan Chen (8,870)
J Shawe-Taylor (6,210) | Haixun Wang (6,020) | Mohammed Javeed Zaki (2,510)
N Cristianini (4,980) | Jiong Yang (7,570) | Hannu Toivonen (7,070)
J Pei (11,700) | Salvatore J. Stolfo (5,660) | Raymond T. Ng (7,120)
PS Yu (19,800) | Bing Liu (18,100) | Usama M. Fayyad (9,230)
U Fayyad (8,240) | Gregory Piatetsky-Shapiro (12,000) | Salvatore J. Stolfo (5,660)
Y Yin (4,460) | Chris Clifton (5,130) | Jim Gray (15,300)
Ming-Syan Chen (8,870) | Ming-Syan Chen (8,870) | Vipin Kumar (21,400)

Table 4. Comparison of the scholar-ranking systems for the data mining research area (median = 9,230)

Figure 7. Comparison results of the second experiment (association rules)


Our Approach (T1) | Arnetminer (T2) | MS Libra (T3)
R Agrawal (13,000) | Jiawei Han (13,600) | Rakesh Agrawal (13,000)
R Srikant (9,360) | Philip S. Yu (6,860) | Tomasz Imielinski (5,320)
A Swami (1,720) | Rakesh Agrawal (13,000) | Ramakrishnan Srikant (9,360)
T Imielinski (5,320) | Ramakrishnan Srikant (9,360) | Arun N. Swami (1,720)
J Han (13,600) | David Wai-Lok Cheung (1,210) | Heikki Mannila (6,700)
G Piatetsky-Shapiro (5,470) | Ke Wang (15,100) | Hannu Toivonen (19,000)
H Mannila (6,700) | Bing Liu (4,150) | Jiawei Han (13,600)
P Smyth (3,200) | Mohammed Javeed Zaki (1,300) | A. Inkeri Verkamo (2,880)
H Toivonen (19,000) | Yasuhiko Morimoto (976) | Philip S. Yu (6,860)
UM Fayyad (4,150) | Takeshi Tokuyama (1,070) | Ming-Syan Chen (4,300)
PS Yu (6,860) | Takeshi Fukuda (1,090) | Yongjian Fu (2,770)
Ming-Syan Chen (4,300) | Shinichi Morishita (1,610) | Shamkant B. Navathe (1,360)
R Motwani (3,250) | Charu C. Aggarwal (2,910) | Jong Soo Park (1,920)
S Brin (3,020) | Frans Coenen (897) | Mohammed Javeed Zaki (1,300)
MJ Zaki (1,300) | Paul H. Leng (27) | Edward Omiecinski (1,770)
A Savasere (75) | Yiming Ma (1,810) | Rajeev Motwani (3,250)
DW Cheung (1,210) | Ling Feng (1,660) | Wei Li (13,500)
B Liu (4,150) | Ming-Syan Chen (4,300) | Vipin Kumar (4,450)
N Pasquier (3,290) | Vassilios S. Verykios (67) | Ashoka Savasere (75)
R Taouil (2,980) | Wynne Hsu (2,530) | Srinivasan Parthasarathy (2,280)

Table 5. Comparison of the scholar-ranking systems for the association rules research area, a sub-field of data mining (median = 2,880)

5 Conclusion

In this project, we presented the design and implementation of a prototype scholar-search system based on a web-mining approach. The system computes a ranking of the scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. We also designed the %-index, a new ranking function for positioning scholars within a specific research field. Our contributions are: 1) a web-mining approach to searching for famous/authoritative scholars; 2) a flexible ranking function, the %-index, that facilitates scholar ranking in a smaller research field; and 3) the development and demonstration of our scholar-search system as a working web service. A main advantage of our approach is that users can query any research topic and obtain a list of authoritative scholars without a dedicated database for the task. Based on the experimental results, our approach outperforms Arnetminer and MS Libra within a specific research field. We hope our system makes studying more convenient for junior students.


    References

1. P. Jacso. Google Scholar: the Pros and the Cons. Online Information Review, pages 208-214, 2005.
2. Henry Kautz, Bart Selman, and Mehul Shah. ReferralWeb: Combining Social Networks and Collaborative Filtering. Communications of the ACM, vol. 30, no. 3, March 1997.
3. Krisztian Balog, Toine Bogers, Leif Azzopardi, Maarten de Rijke, and Antal van den Bosch. Broad Expertise Retrieval in Sparse Data Environments. SIGIR 2007, pp. 551-558.
4. M. Maybury. Expert Finding Systems. Technical Report MTR 06B000040, MITRE Corporation, 2006.
5. Nick Craswell and Arjen P. de Vries. Overview of the TREC-2005 Enterprise Track. In Proceedings of the 15th Text Retrieval Conference (TREC), 2006.
6. S. T. Dumais and J. Nielsen. Automating the Assignment of Submitted Manuscripts to Reviewers. In N. Belkin, P. Ingwersen, and A. M. Pejtersen (Eds.), SIGIR'92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, pp. 233-244, 1992.
7. Stefano Ferilli, Nicola Di Mauro, Teresa Maria Altomare Basile, Floriana Esposito, and Marenglen Biba. Automatic Topics Identification for Reviewer Assignment. IEA/AIE 2006, pp. 721-730.
8. Toine Bogers, Klaas Kox, and Antal van den Bosch. Using Citation Analysis for Expert Retrieval in Workgroups. Proceedings of the 8th Belgian-Dutch Information Retrieval Workshop (DIR 2008), pp. 21-28, Maastricht, April 2008.
9. Jing Zhang, Jie Tang, Liu Liu, and Juanzi Li. A Mixture Model for Expert Finding. In Proceedings of the 2008 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008).
10. Jun Zhang, Mark S. Ackerman, and Lada Adamic. Expertise Networks in Online Communities: Structure and Algorithms. Proceedings of the 16th International Conference on World Wide Web, May 8-12, 2007, Banff, Alberta, Canada.
11. Tim Reichling, Michael Veith, and Volker Wulf. Expert Recommender: Designing for a Network Organization. Computer Supported Cooperative Work, vol. 16, no. 4-5, pp. 431-465, October 2007.
12. Jie Tang, Duo Zhang, and Limin Yao. Social Network Extraction of Academic Researchers. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007).
13. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Extraction and Mining of Academic Social Network. In Proceedings of the Fourteenth ACM


    SIGKDD International Conference on Knowledge Discovery and Data Mining

    (SIGKDD2008).

14. Masanori Harada, Shin-ya Sato, and Kazuhiro Kazama. Finding Authoritative People from the Web. Joint Conference on Digital Libraries (JCDL 2004), June 2004.
15. Jie Ren and Richard Taylor. Automatic and Versatile Publications Ranking for Research Institutions and Scholars. Communications of the ACM, June 2007.
16. J. E. Hirsch. An Index to Quantify an Individual's Scientific Research Output. Proceedings of the National Academy of Sciences, vol. 102, issue 46, pp. 16569-16572, 2005.
17. MS Libra Academic Search, http://libra.msra.cn/
18. CiteSeerX, http://citeseerx.ist.psu.edu/
19. G. E. Littlepage and A. L. Mueller. Recognition and Utilization of Expertise in Problem-Solving Groups: Expert Characteristics and Behavior. Group Dynamics: Theory, Research, and Practice, 1, pp. 324-328, 1997.
20. D. Yimam-Seid and A. Kobsa. Expert Finding Systems for Organizations: Problem and Domain Analysis and the DEMOIR Approach. Journal of Organizational Computing and Electronic Commerce, 13(1), pp. 1-24, 2003.
21. Thomas Hofmann. Probabilistic Latent Semantic Analysis. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI'99), 1999.
22. Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma. Object-Level Vertical Search. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR), 2007.
23. K. Balog and M. de Rijke. Finding Experts and their Details in E-mail Corpora. In 15th International World Wide Web Conference (WWW 2006), May 2006.
24. R. Bekkerman and A. McCallum. Disambiguating Web Appearances of People in a Social Network. In Proceedings of the 14th International World Wide Web Conference, pp. 463-470, 2005.
25. Jie Tang, Mingcai Hong, Duo Zhang, Bangyong Liang, and Juanzi Li. Information Extraction: Methodologies and Applications. In Emerging Technologies of Text Mining: Techniques and Applications, Hercules A. Prado and Edilson Ferneda (Eds.), Idea Group Inc., Hershey, USA, 2007, pp. 1-33.
26. R. Rapp. Automatic Identification of Word Translations from Unrelated English and German Corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 519-526, 1999.
27. Pu-Jen Cheng, Jei-Wen Teng, Ruei-Cheng Chen, Jenq-Haur Wang, Wen-Hsiang Lu, and Lee-Feng Chien. Translating Unknown Queries with Web Corpora for Cross-Language Information Retrieval. SIGIR 2004, pp. 146-153.