Hunter X Scholar – Figure out Famous Scholars in Your Research Area


  • 8/14/2019 Hunter X Scholar Finger out Famous Men in Your Research Area


    Chi-Jen Wu

    Electrical Engineering

    National Taiwan University

    [email protected]

    Abstract

With the growth of the WWW, publishing research information on the web has become an essential practice for scientists and researchers; an enormous number of web pages on the Internet provide information on scientists, research papers, and technical documents, and are indexed by search engines. For a junior student or researcher, it is a nontrivial task to find the authoritative scholars (experts) in his or her research area. An interesting challenge thus arises: how can we figure out the important scholars on a research topic? An effective scholar-search system involves analysis of reputation, publications, citations, and activities across a large number of scholars.

In this project, we present the design and implementation of a prototype scholar-search system based on a web-mining approach. Our system computes a ranking of the scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. We also designed the %-index, a new ranking function for positioning scholars within a specific research field. The ranking criteria are based on publications and citations, computed from the query results of Google and Google Scholar. Based on the experimental results, our approach outperforms other existing approaches within a specific research field.

1 Introduction

With the growth of the WWW, publishing research information on the web has become an essential practice in academia; an enormous number of web pages on the Internet provide information on scientists, research papers, and technical documents, and are indexed by search engines. We can expect the contents of the Web to grow ever vaster as time goes by. The variety of research areas has also diversified over the past decade, with more and more new research areas emerging, such as Data Mining, P2P systems,


Network Coding, and so on. For a junior student or researcher starting his or her research work, it is a nontrivial task to find authoritative scholars in a research area with a general-purpose search engine. For example, using the popular web search engine Google to search for the research field "data mining", Google retrieves more than 20,800,000 relevant web pages; even Google Scholar [1], a special-purpose search engine for scholarly literature developed by Google, returns more than 1,400,000 papers in the data mining area. For most people, these results are far too large to identify the important scholars and their significant papers.

An interesting challenge thus arises: how can we figure out the important, famous scholars on a research topic? In fact, constructing rankings of scholarly authority is a relatively new subfield of information retrieval research. The problem differs from the traditional expert-finding problem [2-5]: in essence, the goal of an expert-finding system is to identify a list of people with the appropriate skills and knowledge on a given topic. Scholar search is a deeper expert-finding problem; it must not only identify the right scholars who possess the required knowledge within a research community, but also rank their level of authority in the field. Consider a simple scenario: a junior researcher wishes to find a list of scholars who have made significant contributions and/or published a seminal paper in his or her research field. Unfortunately, a traditional expert-finding system using standard IR techniques may return a Ph.D. student, who may have a certain level of expertise in the area but is not a famous scholar in the field [10,19]. In general, scholar search is a more complex and difficult task than expert finding, especially since there are no standards specifying the criteria or qualifications for particular levels of scholarly authority.

In this project, we present the design and implementation of a prototype scholar-search system based on a web-mining approach; in this work we focus on the problems of scholar finding and scholar ranking. Our system computes the ranking of scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. For scholar finding, we use a search engine and digital libraries to find documents about a given topic, and extract the authors from the collected documents. We then estimate the extracted authors' relevance to the given topic through statistical analysis of web pages. We assume that authors with many articles on a certain topic are more likely to be experts on that topic, and that authors with highly cited papers are indicative of authorities. For scholar ranking, we design the %-index, a ranking function for positioning scholars. The ranking criteria are based on publications and citations, computed from the query results of


the scientific literature digital archives, such as Google Scholar, MS Libra Academic Search [17], or CiteSeerX [18]. The ranking function, the %-index, is a novel way of estimating an individual scholar's impact in a single research field: it indicates that the total citations of a scholar's papers amount to m% of the total citations of all papers in the field. In summary, we hope our system makes studying more convenient for junior students. Our system is available at: http://140.109.22.36/cjwu/dm_scholar/mycgi.htm

Our contributions in this work are: 1) a web-mining approach to searching for famous/authoritative scholars; 2) a flexible ranking function, the %-index, that facilitates scholar ranking within a smaller research field; and 3) the development and demonstration of our scholar-search system as a working web service. A main advantage of our approach is that users can query any research topic and obtain a list of authoritative scholars without a dedicated database for the task. Another interesting application of our system is automatically routing submitted papers to reviewers at conferences [6,7]: assigning submitted manuscripts to reviewers is a common task of journal editors and conference chairs, and our system can help committees find the right reviewers under severe time pressure.

In the following section we describe related work on expert-finding systems and previous efforts on ranking scholars and research institutions. Section 3 describes our system design and methodology. We then present preliminary experimental results and demonstrate several examples in Section 4. Finally, we conclude the paper in Section 5.

2 Related Work

We consider an effective scholar-search system to be composed of expert finding, expert ranking, and expert profiling. First, expert finding [2,8-11] is the task of finding the right scholars for a specific topic with high probability. Within a research community such as computer science there may be many candidates relevant to a given topic, and the expert-finding step retrieves a list of candidates deemed the most likely scholars for that topic. Second, expert ranking [15,16] sorts the candidates by level of authority; it involves analysis of reputation, publications, citations, and activities across the candidate scholars. Finally, expert profiling [12,13] digs out and extracts the profile information of an individual scholar from the Web, including basic information, contact information, and educational history. In this section, we describe the related work on these three components.


2.1 Expert Finding

Traditional expert finding identifies a list of people with the appropriate skills and knowledge on a given topic [5]. Most previous efforts rely on manually constructed expert databases [21], or on text, citation, or document analysis to match a user's research topic [3,8,11,14].

Since our research is based on web mining and search engines, we discuss the relevant work here. In 1997, Kautz et al. [2] developed the first expert-extraction system, called ReferralWeb, based on web-mining and search-engine techniques. ReferralWeb automatically generates a representation of a social network from evidence gathered from web pages; based on the links of the social network, it allows users to search for people likely to be experts on a given topic within a workgroup. More recently, Harada et al. [14] proposed the NEXAS system, an extension to web search engines that attempts to find real-world entities reflected on the Web and uses them to search for people relevant to a topic. Zhang et al. [9] proposed a mixture model for expert finding, whose main idea is to use Probabilistic Latent Semantic Analysis (PLSA) [21] to capture the semantic relevance between the query and the experts. The same authors also developed the Arnetminer system [13], which addresses several key issues in scholar search, such as scholar finding and scholar profiling. In 2007, Microsoft Research Asia developed a similar system, MS Libra Academic Search [17], a free computer-science bibliography search engine whose principal idea is based on the object-level vertical search technique proposed by Nie et al. [22].

2.2 Expert Ranking

We are aware of a few systems that employ expert-ranking techniques, such as Arnetminer, Libra, and CiteSeerX. The main idea of Arnetminer is similar to ReferralWeb, while the other systems are based on information-retrieval schemes. Because no standards specify the criteria for particular levels of scholarly authority, ranking scholars is a difficult task that is unlikely to yield a unanimous solution. The best-known ranking index is the impact factor, defined as the average number of citations per journal article over a two-year period. In 2005, the h-index [16] was proposed to measure an individual scholar's impact: a scholar has index h if he or she has published h papers that have each received at least h citations. More recently, Ren and Taylor [15] provided an automatic publication-ranking framework to support rankings of scholars and research institutions; they discussed the most important ranking policies and pointed out several problems in publication ranking. In our scholar-search system, we focus on a user-defined research area. However, the above criteria are better at discriminating between


scholars across a whole research field than within a single research area, and it is also hard to gather impact factors for every paper and author. Lacking these materials, we intend the %-index to work better in smaller research areas and to yield more valid evaluations of prominent scholars.
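As a concrete illustration, the h-index described above can be computed from a list of per-paper citation counts. This is a generic sketch of the published definition, not code from any of the systems under discussion:

```python
def h_index(citations):
    """h-index: the largest h such that the scholar has h papers
    with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still supports an index of `rank`
        else:
            break
    return h
```

For example, a scholar with papers cited [10, 8, 5, 4, 3] times has h-index 4: four papers with at least 4 citations each, but not five papers with at least 5.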

2.3 Expert Profiling

Another important challenge for a scholar-search system is the expert-profiling task: extracting the profile of an individual researcher from the Web automatically. Several research efforts have addressed this task [12,13,23]. Recently, Tang and coworkers [12,13] presented a unified approach to extracting scholar profiles over an academic social network; their system also addresses the name-disambiguation problem [24] during integration. Many profile-extraction methods have been proposed; an overview can be found in [25].

3 Our Approach

In this section, we describe our scholar-search system and its components, and demonstrate the system through several simple scholar-search experiments. First, we give an overview of the system's main concepts, the corresponding task components, and their interplay. We then built a system prototype on these concepts and began experimenting: we ran a number of search experiments and compared our approach with other systems, such as Arnetminer and Libra.

3.1 System Overview

Our scholar-search system consists of three main components, depicted in Figure 1. In the first step of our approach, a purpose-built crawler crawls a scientific literature digital archive to gather the candidate set, called the c set. The crawler collects scientific literature related to the given query topic q and extracts the authors' names from the crawled articles; it also analyzes the citations of each article and each candidate. After obtaining the c set, the second step estimates the associations between the topic and the candidates. For estimating relevance to the given topic, we make the following claim.

Claim 1: Authors with many articles on a certain topic are more likely to be experts on topic q.

This claim should be reasonable because a more important scholar's name


should be more visible on the Web [14]; e.g., an important scholar's name is recorded on a large number of web pages, such as conference programs, seminar pages, and journal papers. Based on this claim, we estimate the extracted authors' relevance to the given topic through statistical analysis of web pages. A number of statistical methods have been proposed for estimating term association from co-occurrence measures [26]. In our study, we adopt the chi-square test because the parameters it requires are easily gathered from a search engine. After this step, we rank the candidates according to their chi-square scores and determine the top-k candidates in the c set. Finally, we apply the ranking function, the %-index, a novel way of estimating an individual scholar's impact in a single research field; we define it in the next section. In the following sections, we describe the details of the two components, scholar extraction and scholar ranking.
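The three-step pipeline just outlined can be sketched as follows. This is an illustrative Python sketch, not the system's Perl code; the `chi_square` and `percent_index` scoring callbacks are hypothetical stand-ins for the search-engine queries and citation analysis in the real system:

```python
def rank_scholars(papers, chi_square, percent_index, k=20):
    """papers: list of (author_names, citation_count) tuples crawled
    for a topic. chi_square(name) -> relevance score;
    percent_index(name) -> %-index. Returns the top-k scholar names."""
    # Step 1: extract the candidate set (c set) from crawled literature
    candidates = {a for authors, _ in papers for a in authors}
    # Step 2: keep the top-k candidates by chi-square relevance
    top_k = sorted(candidates, key=chi_square, reverse=True)[:k]
    # Step 3: order the finalists by %-index
    return sorted(top_k, key=percent_index, reverse=True)
```

In the real system the two scoring functions would issue web queries; here they can be any callables mapping a name to a score, e.g. dictionary lookups over precomputed values.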

3.2 Relevance Estimation

Figure 1. A system overview of our approach and its main components

In our system, we use the chi-square test to estimate the strength of the relation between an extracted scholar and the given research topic through their co-occurrence on web pages. The chi-square test is easy to implement in our system; the required parameters are listed in Table 1. Given a query topic q and a candidate's


name c, we assume q and c are independent. Following [27], the expected frequencies are:

E(q, c) = (a+c)(a+b)/n,
E(q, ¬c) = (b+d)(a+b)/n,
E(¬q, c) = (a+c)(c+d)/n, and
E(¬q, ¬c) = (b+d)(c+d)/n.

Then the conventional chi-square statistic follows:

χ²(q, c) = Σ_{X ∈ {q, ¬q}, Y ∈ {c, ¬c}} [n(X, Y) − E(X, Y)]² / E(X, Y)
         = n(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)],

where n(X, Y) is the observed number of pages in each cell.

This chi-square test plays a crucial role in our system and serves approximately as a co-occurrence index. In our implementation we use Google as the search engine, but other major search engines are equally applicable. The chi-square method provides a simple way to estimate the relevance between a candidate and a research topic and is easy to implement, but its performance is strongly dominated by the size and number of retrieved web pages.

Notation | Required parameter
n | The total number of Web pages
a | The number of Web pages containing both the candidate's name and the topic
b | The number of Web pages containing the topic but not the candidate's name
c | The number of Web pages containing the candidate's name but not the topic
d | The number of Web pages containing neither (d = n − a − b − c)

Table 1. The required parameters for the chi-square test (we set n = 8 billion in our experiments)
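Given the four hit counts of Table 1, the 2×2 chi-square statistic reduces to the standard closed form n(ad − bc)²/((a+b)(c+d)(a+c)(b+d)). A minimal sketch (in Python rather than the system's Perl):

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table of Table 1:
    a = pages with both name and topic, b = topic only,
    c = name only, d = neither (n = a + b + c + d)."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0
```

In the real system, a, b, and c would come from search-engine result counts for the queries "name AND topic", "topic NOT name", and "name NOT topic", with d derived from the fixed n.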

3.3 Scholar Ranking

Existing ranking indexes have several limitations. A major one is that they are used to rank a whole research field, such as all of computer science, and it is hard to infer a scholar's contributions within a sub-field. For example, the


CiteSeerX provides a citation-count ranking of all computer-science scholars, but it is not easy to find a significant researcher in the network coding area. Hence we design a novel ranking function, the %-index, to estimate an individual scholar's impact in a single research field, such as network coding. The %-index indicates that the total citations of a scholar's papers amount to m% of the total citations of all papers in the field. Formally,

%-index(C_i) = (citations(C_i) / citations(Π)) × 100%,

where C_i denotes a scholar in the c set and Π denotes the set of collected scientific literature.

However, assessing scholars is a complex social and scientific process. Our %-index could be used alone, but it should probably serve as one quantitative indicator within a more comprehensive methodology. In addition to publications, many other factors, such as research impact, funding, and students, reflect the importance of a scholar and could be taken into account in future work.
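The %-index itself is a simple ratio once the citation totals have been gathered. A minimal sketch, assuming the scholar's citation count and the field-wide total are already known (illustrative Python, not the system's Perl):

```python
def percent_index(scholar_citations, field_total_citations):
    """%-index: the scholar's total citations as a percentage of the
    total citations of all collected papers in the research field."""
    if field_total_citations == 0:
        return 0.0  # empty field: no basis for a percentage
    return 100.0 * scholar_citations / field_total_citations
```

For example, a scholar whose papers have gathered 250 citations in a field whose collected papers total 1,000 citations has a %-index of 25.0.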

3.4 System Implementation and Demo

Our scholar-search system is implemented as a CGI dynamic web page in Perl. Figure 2 shows the portal page of our system: users enter a research topic to retrieve the top-k scholars for that topic (we generally set k = 20). Figure 3 shows the output of the system. One drawback is that the system needs more than five minutes to process a request, because the Google search engine does not accept a burst of rapid requests. We also show the results of querying "data mining" and "association rules" in Table 2 and Table 3, respectively; note that association rules is a sub-field of data mining. We analyze these results in the following section.

Figure 2. Portal of our system. Figure 3. The ranking results.


Ranking | Candidate | %-index | χ² test

    1 J Han 9.647531009 2652383

    2 M Kamber 4.334521751 3927792

    3 E Frank 3.833536147 2044977

    4 IH Witten 3.833536147 1977850

    5 G Piatetsky-Shapiro 3.728926937 28694484

    6 P Smyth 3.595141009 4588678

    7 T Hastie 3.498359699 2951988

    8 R Tibshirani 3.498359699 2219211

    9 J Friedman 3.498359699 186502

    10 JC Bezdek 3.280601752 424565

    11 UM Fayyad 3.218690179 25534724

    12 R Agrawal 2.558300065 4679901

    13 JA Hartigan 2.019598215 180509

    14 J Shawe-Taylor 1.906449478 1827019

    15 N Cristianini 1.906449478 1503047

    16 J Pei 1.84311465 1646410

    17 PS Yu 1.813226305 2855124

    18 U Fayyad 1.641012503 6263246

    19 Y Yin 1.609700903 248230

    20 Ming-Syan Chen 1.60258463 3107667

    Table 2. The ranking result of data mining research area

Ranking | Candidate | %-index | χ² test

    1 R Agrawal 31.18867812 63317224.61

    2 R Srikant 20.87173693 255685481.9

    3 A Swami 9.587025034 234630270.1

    4 T Imielinski 8.729262056 350995142.7

    5 J Han 8.076648629 33356763.11

    6 G Piatetsky-Shapiro 6.964678599 107474023.8

    7 H Mannila 6.30909199 67368432.23

    8 P Smyth 5.750133793 13005709.05

    9 H Toivonen 5.503359696 41276775.5

    10 UM Fayyad 5.249152643 26075405.81

    11 PS Yu 3.891895106 18344775.38

    12 Ming-Syan Chen 3.432538503 16059136.61

    13 R Motwani 3.041565083 8234266.614


    14 S Brin 2.84979485 7073997.568

    15 MJ Zaki 2.347327109 53639929.75

    16 A Savasere 2.229886424 35516464.2

    17 DW Cheung 1.948920735 5068106.193

    18 B Liu 1.934054825 3484209.628

    19 N Pasquier 1.87905096 17276391.84

    20 R Taouil 1.855265505 18117526.04

    Table 3. The ranking result of association rules research area, a subset

    of data mining

4 Experimental Results

To validate our system, we used it to perform two rankings: the first assessed scholars in the data mining area, and the second evaluated the association rules field. We compared both rankings with the Arnetminer and MS Libra systems by analyzing the co-occurrence of a scholar's name and the query topic on the Web. First we give overview statistics of the two fields. Note that, due to the limitations of Google Scholar, our crawler retrieved only the first 1,000 papers of each Google Scholar search result.

Figure 4 depicts the citation distribution of the collected papers; citation impact declines after roughly the 100th paper. There is one very highly cited item in the data mining field: a book, Neural Networks: A Comprehensive Foundation by Simon Haykin, with more than 11,407 citations. Although this author has a very highly cited work, his importance may not exceed that of the top-k scholars in data mining; his chi-square score is 107,836.


Next, we compared our approach with Arnetminer and MS Libra by analyzing the co-occurrence of a scholar's name and the query topic on the Web. The rankings of our approach, Arnetminer, and MS Libra are shown in the following tables: Table 4 shows the results of querying the data mining area, and Table 5 shows the output for association rules. The number beside each name is the Web co-occurrence count of that name and the given research topic (again via Google). We now discuss the performance metric for evaluating the three systems. Let R_i be the co-occurrence count of a name and the given research topic in the tables, and let T_i denote the set of scholars produced by each system.

median = median(T_1 ∪ T_2 ∪ T_3),
R(R_i) = 1 if R_i ≥ median, 0 otherwise,

    get the median number in the union set. Figure 6 and 7 show the comparison results of

    two experiments. In first experiment, query data mining, the performances of three

    systems are similar. Then in the second experiment, our approach is more

    distinguishable than the two approaches. Noted mining association rules is the sub

    field of whole data mining area. The reason is the impact of our %-index metric,

    which is designed to ranking a specific research field.
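Under our reading of the metric above, each system is scored by the fraction of its scholars whose co-occurrence count reaches the median of the three pooled candidate lists. The comparison can be sketched in Python; this is an illustrative reconstruction, not the original evaluation code:

```python
from statistics import median

def system_score(counts, pooled_median):
    """Fraction of a system's scholars whose name/topic co-occurrence
    count is at least the pooled median."""
    return sum(1 for r in counts if r >= pooled_median) / len(counts)

def compare(t1, t2, t3):
    """Score three systems against the median of their pooled counts."""
    m = median(t1 + t2 + t3)
    return [system_score(t, m) for t in (t1, t2, t3)]
```

A system whose candidates all co-occur with the topic at least as often as the pooled median scores 1.0; one whose candidates all fall below it scores 0.0.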

Figure 6. Comparison results of the first experiment (data mining)


Our Approach (T1) | Arnetminer (T2) | MS Libra (T3)
J Han (46,600) | Jiawei Han (46,600) | Rakesh Agrawal (21,500)
M Kamber (13,900) | Christos Faloutsos (17,300) | Tomasz Imielinski (5,610)
E Frank (20,600) | Philip S. Yu (19,800) | Jiawei Han (46,600)
IH Witten (23,900) | Mohammed Javeed Zaki (2,510) | Philip S. Yu (19,800)
G Piatetsky-Shapiro (12,000) | Heikki Mannila (19,100) | Christos Faloutsos (17,200)
P Smyth (15,800) | Rakesh Agrawal (21,500) | Ramakrishnan Srikant (14,300)
T Hastie (10,600) | Jian Pei (11,700) | Heikki Mannila (19,100)
R Tibshirani (8,470) | Usama M. Fayyad (9,230) | Ian H. Witten (23,900)
J Friedman (7,420) | Eamonn J. Keogh (1,750) | Padhraic Smyth (15,800)
JC Bezdek (2,620) | Charu C. Aggarwal (8,060) | Hans-Peter Kriegel (9,760)
UM Fayyad (9,230) | Johannes Gehrke (10,900) | Gregory Piatetsky-Shapiro (12,000)
R Agrawal (21,500) | Wei Wang (57,900) | Arun N. Swami (2,500)
JA Hartigan (1,280) | Srinivasan Parthasarathy (5,410) | Ming-Syan Chen (8,870)
J Shawe-Taylor (6,210) | Haixun Wang (6,020) | Mohammed Javeed Zaki (2,510)
N Cristianini (4,980) | Jiong Yang (7,570) | Hannu Toivonen (7,070)
J Pei (11,700) | Salvatore J. Stolfo (5,660) | Raymond T. Ng (7,120)
PS Yu (19,800) | Bing Liu (18,100) | Usama M. Fayyad (9,230)
U Fayyad (8,240) | Gregory Piatetsky-Shapiro (12,000) | Salvatore J. Stolfo (5,660)
Y Yin (4,460) | Chris Clifton (5,130) | Jim Gray (15,300)
Ming-Syan Chen (8,870) | Ming-Syan Chen (8,870) | Vipin Kumar (21,400)

Table 4. Comparison of the scholar-ranking systems for the data mining research area (median = 9,230)

Figure 7. Comparison results of the second experiment (association rules)


Our Approach (T1) | Arnetminer (T2) | MS Libra (T3)
R Agrawal (13,000) | Jiawei Han (13,600) | Rakesh Agrawal (13,000)
R Srikant (9,360) | Philip S. Yu (6,860) | Tomasz Imielinski (5,320)
A Swami (1,720) | Rakesh Agrawal (13,000) | Ramakrishnan Srikant (9,360)
T Imielinski (5,320) | Ramakrishnan Srikant (9,360) | Arun N. Swami (1,720)
J Han (13,600) | David Wai-Lok Cheung (1,210) | Heikki Mannila (6,700)
G Piatetsky-Shapiro (5,470) | Ke Wang (15,100) | Hannu Toivonen (19,000)
H Mannila (6,700) | Bing Liu (4,150) | Jiawei Han (13,600)
P Smyth (3,200) | Mohammed Javeed Zaki (1,300) | A. Inkeri Verkamo (2,880)
H Toivonen (19,000) | Yasuhiko Morimoto (976) | Philip S. Yu (6,860)
UM Fayyad (4,150) | Takeshi Tokuyama (1,070) | Ming-Syan Chen (4,300)
PS Yu (6,860) | Takeshi Fukuda (1,090) | Yongjian Fu (2,770)
Ming-Syan Chen (4,300) | Shinichi Morishita (1,610) | Shamkant B. Navathe (1,360)
R Motwani (3,250) | Charu C. Aggarwal (2,910) | Jong Soo Park (1,920)
S Brin (3,020) | Frans Coenen (897) | Mohammed Javeed Zaki (1,300)
MJ Zaki (1,300) | Paul H. Leng (27) | Edward Omiecinski (1,770)
A Savasere (75) | Yiming Ma (1,810) | Rajeev Motwani (3,250)
DW Cheung (1,210) | Ling Feng (1,660) | Wei Li (13,500)
B Liu (4,150) | Ming-Syan Chen (4,300) | Vipin Kumar (4,450)
N Pasquier (3,290) | Vassilios S. Verykios (67) | Ashoka Savasere (75)
R Taouil (2,980) | Wynne Hsu (2,530) | Srinivasan Parthasarathy (2,280)

Table 5. Comparison of the scholar-ranking systems for the association rules research area, a sub-field of data mining (median = 2,880)

5 Conclusion

In this project, we presented the design and implementation of a prototype scholar-search system based on a web-mining approach. The system computes a ranking of the scholars relevant to a given research area, e.g., data mining, and shows the top-k scholars. We also designed the %-index, a new ranking function for positioning scholars within a specific research field. Our contributions are: 1) a web-mining approach to searching for famous/authoritative scholars; 2) a flexible ranking function, the %-index, that facilitates scholar ranking in a smaller research field; and 3) the development and demonstration of our scholar-search system as a working web service. A main advantage of our approach is that users can query any research topic and obtain a list of authoritative scholars without a dedicated database for the task. Based on the experimental results, our approach outperforms Arnetminer and MS Libra within a specific research field. We hope our system makes studying more convenient for junior students.


    References

1. P. Jacso. Google Scholar: the Pros and the Cons. Online Information Review, pages 208-214, 2005.
2. Henry Kautz, Bart Selman, and Mehul Shah. ReferralWeb: Combining Social Networks and Collaborative Filtering. Communications of the ACM, vol. 30, no. 3, March 1997.
3. Krisztian Balog, Toine Bogers, Leif Azzopardi, Maarten de Rijke, and Antal van den Bosch. Broad Expertise Retrieval in Sparse Data Environments. SIGIR 2007, pp. 551-558.
4. M. Maybury. Expert Finding Systems. Technical Report MTR 06B000040, MITRE Corporation, 2006.
5. Nick Craswell and Arjen P. de Vries. Overview of the TREC-2005 Enterprise Track. In Proceedings of the 15th Text Retrieval Conference (TREC), 2006.
6. S. T. Dumais and J. Nielsen. Automating the Assignment of Submitted Manuscripts to Reviewers. In N. Belkin, P. Ingwersen, and A. M. Pejtersen (Eds.), SIGIR'92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, pp. 233-244, 1992.
7. Stefano Ferilli, Nicola Di Mauro, Teresa Maria Altomare Basile, Floriana Esposito, and Marenglen Biba. Automatic Topics Identification for Reviewer Assignment. IEA/AIE 2006, pp. 721-730.
8. Toine Bogers, Klaas Kox, and Antal van den Bosch. Using Citation Analysis for Expert Retrieval in Workgroups. Proceedings of the 8th Belgian-Dutch Information Retrieval Workshop (DIR 2008), pp. 21-28, Maastricht, April 2008.
9. Jing Zhang, Jie Tang, Liu Liu, and Juanzi Li. A Mixture Model for Expert Finding. In Proceedings of the 2008 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008).
10. Jun Zhang, Mark S. Ackerman, and Lada Adamic. Expertise Networks in Online Communities: Structure and Algorithms. Proceedings of the 16th International Conference on World Wide Web, May 8-12, 2007, Banff, Alberta, Canada.
11. Tim Reichling, Michael Veith, and Volker Wulf. Expert Recommender: Designing for a Network Organization. Computer Supported Cooperative Work, vol. 16, no. 4-5, pp. 431-465, October 2007.
12. Jie Tang, Duo Zhang, and Limin Yao. Social Network Extraction of Academic Researchers. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007).
13. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Extraction and Mining of Academic Social Network. In Proceedings of the Fourteenth ACM


    SIGKDD International Conference on Knowledge Discovery and Data Mining

    (SIGKDD2008).

14. Masanori Harada, Shin-ya Sato, and Kazuhiro Kazama. Finding Authoritative People from the Web. Joint Conference on Digital Libraries (JCDL 2004), June 2004.
15. Jie Ren and Richard Taylor. Automatic and Versatile Publications Ranking for Research Institutions and Scholars. Communications of the ACM, June 2007.
16. J. E. Hirsch. An Index to Quantify an Individual's Scientific Research Output. Proceedings of the National Academy of Sciences, vol. 102, issue 46, pp. 16569-16572, 2005.
17. MS Libra Academic Search, http://libra.msra.cn/
18. CiteSeerX, http://citeseerx.ist.psu.edu/
19. G. E. Littlepage and A. L. Mueller. Recognition and Utilization of Expertise in Problem-Solving Groups: Expert Characteristics and Behavior. Group Dynamics: Theory, Research, and Practice, 1, pp. 324-328, 1997.
20. D. Yimam-Seid and A. Kobsa. Expert Finding Systems for Organizations: Problem and Domain Analysis and the DEMOIR Approach. Journal of Organizational Computing and Electronic Commerce, 13(1), pp. 1-24, 2003.
21. Thomas Hofmann. Probabilistic Latent Semantic Analysis. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI'99), 1999.
22. Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma. Object-Level Vertical Search. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR), 2007.
23. K. Balog and M. de Rijke. Finding Experts and their Details in E-mail Corpora. In 15th International World Wide Web Conference (WWW 2006), May 2006.
24. R. Bekkerman and A. McCallum. Disambiguating Web Appearances of People in a Social Network. In Proceedings of the 14th International World Wide Web Conference, pp. 463-470, 2005.
25. Jie Tang, Mingcai Hong, Duo Zhang, Bangyong Liang, and Juanzi Li. Information Extraction: Methodologies and Applications. In Emerging Technologies of Text Mining: Techniques and Applications, Hercules A. Prado and Edilson Ferneda (Eds.), Idea Group Inc., Hershey, USA, 2007, pp. 1-33.
26. R. Rapp. Automatic Identification of Word Translations from Unrelated English and German Corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 519-526, 1999.
27. Pu-Jen Cheng, Jei-Wen Teng, Ruei-Cheng Chen, Jenq-Haur Wang, Wen-Hsiang Lu, and Lee-Feng Chien. Translating Unknown Queries with Web Corpora for Cross-Language Information Retrieval. SIGIR 2004, pp. 146-153.