TopicTrend By: Jovian Lin Discover Emerging and Novel Research Topics

Page 1: TopicTrend

TopicTrend

By: Jovian Lin

Discover Emerging and Novel Research Topics

Page 2: TopicTrend

Introduction

Formulating a research idea is the first step toward success in academia.

A worthy research idea must be original and innovative.

In order to come up with innovative research ideas, researchers have to read a lot of published articles…

… which is time-consuming.

Page 3: TopicTrend

“Is there any shortcut to success?” “No.”

“There are efficient ways to achieve success”

Search Engines in Digital Libraries:

Page 4: TopicTrend

Search engines support information seeking and retrieval.

Introduction

“Search Query” → Search Engine → List of titles (of articles)

Page 5: TopicTrend

Search Results

Page 6: TopicTrend

Search engines support information seeking and retrieval.

However, is this enough for the junior researcher?

Introduction

FYP students, 1st-year PhD students

• Define a research topic (from zero knowledge)
• Help in survey
• Identify emerging/new research areas to explore
• Determine related topics

How useful is this result to the junior researcher?

Page 7: TopicTrend

Junior researchers want to:
• Understand research topics and trends.
• Recognize HOT topics.
• Understand how topics interact and influence research activity.

Problem Definition

Page 8: TopicTrend

Junior researchers want to:
• Understand research topics and trends.
• Recognize HOT topics.
• Understand how topics interact and influence research activity.

Problem Definition

Current Inefficient Method:
1. Enter a search query
2. View results
3. Select a few articles to read
4. Extract new terms from selected article

Page 9: TopicTrend

Search Results

Page 10: TopicTrend

Information overload!


Page 12: TopicTrend

Junior researchers want to:
• Understand research topics and trends.
• Recognize HOT topics.
• Understand how topics interact and influence research activity.

Problem Definition

Desired Efficient Method:
1. Enter a search query
2. View results
   • Visualization of the research topics
   • List of HOT research topics (related to the search query)

Do it quickly: TopicTrend

Page 13: TopicTrend

Quick Demo

Page 14: TopicTrend
Page 15: TopicTrend

Recruited 4 participants:
• Chemistry / PhD
• Engineering (Transportation) / PhD
• Comp Science (AI) / PhD
• Engineering / FYP

Participants:
• Tested TopicTrend using queries from their respective domains.
• Rated TopicTrend’s output (w.r.t. their query). [Quantitative]
• Filled out a questionnaire. [Qualitative]

Evaluation

Page 16: TopicTrend

Evaluation

Query: “machine learning”

Topic      Rating
Topic A    1
Topic B    0
Topic C    1
Topic D    1
Topic E    1
Topic F    1
Topic G    1
Topic H    1
Topic I    1
Topic J    1

Score: 9/10

Topics shown in the visualization: Topic A, Topic B, Topic C, Topic D, Topic E, Topic F, Topic G, Topic H, Topic I, Topic J
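The 9/10 in the example above reads as the fraction of suggested topics that the participant marked as relevant. A minimal sketch of that arithmetic, assuming binary ratings with equal weight per topic (the function name `query_score` is illustrative and not part of TopicTrend):

```python
# Ratings for the "machine learning" example: 1 = relevant, 0 = not relevant.
ratings = {
    "Topic A": 1, "Topic B": 0, "Topic C": 1, "Topic D": 1, "Topic E": 1,
    "Topic F": 1, "Topic G": 1, "Topic H": 1, "Topic I": 1, "Topic J": 1,
}


def query_score(ratings):
    """Return the fraction of suggested topics rated relevant (assumed scoring rule)."""
    return sum(ratings.values()) / len(ratings)


score = query_score(ratings)
print(f"{sum(ratings.values())}/{len(ratings)} = {score:.0%}")
```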

Page 17: TopicTrend

Evaluation

Average score = 68.125%

Quantitative

Page 18: TopicTrend

Evaluation

Questionnaire using a five-point Likert scale (1 = Disagree, 5 = Agree).

Some examples:
• “The system was easy to use.”
• “The system gave interesting results.”
• “I was able to get a better understanding of the topics.”
• “I was able to discover trends.”
• “I was able to discover relationships between topics.”
• “I was able to discover potential, novel topics.”

Details in Project Report.

Qualitative

Scores: 4.75 / 5, 4 / 5, 4 / 5, 4 / 5, 4 / 5, 4 / 5

Page 19: TopicTrend

Conclusion

TopicTrend is a visualization tool that helps junior researchers:
• Understand research topics and trends.
• Recognize HOT topics.
• Understand how topics interact and influence research activity.

However, results were mediocre due to the presence of stop phrases (e.g., “problem set”, “proposed model”, etc.).

Solutions and Future Work:
• TF-IDF weight: no need to manually enter stop words. It is a statistical measure of how important a word is: the importance increases with the number of times a word appears in the document, but is offset by the frequency of the word in the corpus.
• Latent Dirichlet Allocation (LDA): view each abstract as a mixture of topics. (David Blei)
• Online LDA: find topics faster than normal LDA; analyze in a stream.
• Dynamic Topic Models (DTM): capture the word evolution of each topic over time.
• Search by exemplar (instead of search by keyword): benefits users who have difficulty expressing their query.
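To illustrate why TF-IDF could suppress stop phrases without a hand-made list, here is a minimal word-level sketch. It assumes the common variant tf = count / document length and idf = log(N / document frequency); the function name `tf_idf` and the three toy abstracts are made up for illustration, and TopicTrend itself does not implement this.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical abstracts: "proposed model for" occurs in every one of them.
abstracts = [
    "proposed model for topic detection",
    "proposed model for image retrieval",
    "proposed model for speech recognition",
]
weights = tf_idf(abstracts)
print(weights[0])
```

Words that occur in every abstract, such as "proposed" and "model", get a weight of zero because log(N / N) = 0, while words specific to one abstract keep a positive weight.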


Page 21: TopicTrend

Thank You

Page 22: TopicTrend

Backup Slides

Page 23: TopicTrend

OpenNLP: a machine-learning-based toolkit for the processing of natural language text.

Used OpenNLP to retrieve a list of noun phrases (NPs).

Implementation

An article → OpenNLP Tools → NP A, NP B, NP C, NP D, NP E, NP F

OpenNLP Tools:
1. Sentence Detection
2. Tokenization
3. Part-of-Speech (POS) Tagging
4. Chunking and Retrieving NPs

Page 24: TopicTrend

Sentence Detection

Implementation

Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate. Those contraction-less sentences don't have boundary/odd cases...this one does.

• Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

• Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

• Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.

• Those contraction-less sentences don't have boundary/odd cases...this one does.
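The difficulty in this step is that a period does not always end a sentence ("Nov.", "Mr.", "N.V."). OpenNLP handles this with a trained model; the sketch below is only a rule-based stand-in that assumes a small hand-picked abbreviation set, and the name `split_sentences` is illustrative.

```python
# Assumed abbreviation list, chosen only to cover this example text.
ABBREVIATIONS = {"Mr.", "Nov."}


def split_sentences(text):
    """Split on whitespace; a word ending in . ! or ? closes a sentence
    unless the word is a known abbreviation."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences


text = (
    "Pierre Vinken, 61 years old, will join the board as a nonexecutive "
    "director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch "
    "publishing group. Rudolph Agnew, 55 years old and former chairman of "
    "Consolidated Gold Fields PLC, was named a director of this British "
    "industrial conglomerate. Those contraction-less sentences don't have "
    "boundary/odd cases...this one does."
)
sentences = split_sentences(text)
for sentence in sentences:
    print("•", sentence)
```

A fixed abbreviation list breaks down quickly on real articles, which is one reason to prefer a trained sentence detector.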

Page 25: TopicTrend

Tokenization

Implementation

• Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

• Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

• [Pierre] [Vinken] [,] [61] [years] [old] [,] [will] [join] [the] [board] [as] [a] [nonexecutive] [director] [Nov.] [29] [.]

• [Mr.] [Vinken] [is] [chairman] [of] [Elsevier] [N.V.] [,] [the] [Dutch] [publishing] [group] [.]
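The token lists above can be reproduced with a simple rule: split on whitespace, then peel punctuation off the end of each word unless the word is a known abbreviation. This is a sketch under that assumption and is not how the OpenNLP tokenizer works internally; the abbreviation set and the name `tokenize` are illustrative.

```python
# Assumed abbreviation list, chosen only to cover the two example sentences.
ABBREVIATIONS = {"Mr.", "Nov.", "N.V."}
PUNCTUATION = ",.;:!?"


def tokenize(sentence):
    """Whitespace split, then detach trailing punctuation from each word."""
    tokens = []
    for word in sentence.split():
        trailing = []
        while word and word[-1] in PUNCTUATION and word not in ABBREVIATIONS:
            trailing.append(word[-1])
            word = word[:-1]
        if word:
            tokens.append(word)
        tokens.extend(reversed(trailing))  # restore original punctuation order
    return tokens


first = tokenize("Pierre Vinken, 61 years old, will join the board as a "
                 "nonexecutive director Nov. 29.")
second = tokenize("Mr. Vinken is chairman of Elsevier N.V., the Dutch "
                  "publishing group.")
print(" ".join(f"[{t}]" for t in first))
print(" ".join(f"[{t}]" for t in second))
```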

Page 26: TopicTrend

Part-of-Speech Tagging

Implementation

• Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

• Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

• [NNP] [NNP] [,] [CD] [NNS] [JJ] [,] [MD] [VB] [DT] [NN] [IN] [DT] [JJ] [NN] [NNP] [CD] [.]

• [NNP] [NNP] [VBZ] [NN] [IN] [NNP] [NNP] [,] [DT] [JJ] [NN] [NN] [.]

Page 27: TopicTrend

Text Chunking and Extracting NPs

Text chunking consists of dividing a text into syntactically correlated groups of words, such as noun phrases (NP) and verb phrases (VP). It uses the Tokenization and POS Tagging data. For example:

He reckons the current account deficit will narrow to only # 1.8 billion in September.

Becomes:

[NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ] .

Implementation

Page 28: TopicTrend

Text Chunking and Extracting NPs

Text chunking consists of dividing a text into syntactically correlated groups of words. It uses the Tokenization and POS Tagging data.

Implementation

Note the:
• B-Chunk: the first token of a chunk
• I-Chunk: a token that continues the current chunk
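The B-/I- tags make it straightforward to collect the noun phrases: a `B-NP` tag opens a new NP, each following `I-NP` tag extends it, and any other tag closes it. The sketch below assumes the chunker returns one tag per token; the tags are written out by hand to match the bracketed example sentence, and the function name `noun_phrases` is illustrative.

```python
def noun_phrases(tokens, tags):
    """Group tokens tagged B-NP / I-NP into noun-phrase strings."""
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-NP":
            if current:  # two NPs can be adjacent
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["He", "reckons", "the", "current", "account", "deficit", "will",
          "narrow", "to", "only", "#", "1.8", "billion", "in", "September", "."]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP",
        "I-VP", "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "B-PP", "B-NP", "O"]

nps = noun_phrases(tokens, tags)
print(nps)
```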


Page 30: TopicTrend

An algorithm to calculate the score of an NP.

Implementation

NP A, NP B, NP C, NP D, NP E, NP F

Example 1:
# (0 ~ 2 years) = 10
# (2 ~ 4 years) = 2
# (4 yrs & beyond) = 1
Score = (10 + 1) / (10 + 2 + 1 + 20) = 11 / 33 = 0.333

Example 2:
# (0 ~ 2 years) = 1
# (2 ~ 4 years) = 2
# (4 yrs & beyond) = 10
Score = (1 + 1) / (1 + 2 + 10 + 20) = 2 / 33 = 0.061
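The worked examples suggest that the score adds 1 to the count from the most recent period and divides by the total count plus 20, so an NP that appears mostly in recent years scores higher. A minimal sketch under that reading; the function name `np_score` and the parameter names `bonus` and `damping` are labels introduced here, not terms from the slides.

```python
def np_score(recent, middle, old, bonus=1, damping=20):
    """Score an NP from its counts in three age bands.

    recent: count for 0 ~ 2 years
    middle: count for 2 ~ 4 years
    old:    count for 4 years and beyond
    The constants 1 and 20 are read off the worked examples.
    """
    return (recent + bonus) / (recent + middle + old + damping)


emerging = np_score(recent=10, middle=2, old=1)   # 11 / 33
fading = np_score(recent=1, middle=2, old=10)     # same total, mostly old
print(f"emerging: {emerging:.3f}")
print(f"fading:   {fading:.3f}")
```

Both NPs have the same total count, yet the one concentrated in the last two years scores well above the one concentrated four or more years back. The constant in the denominator also keeps NPs with very few occurrences from reaching a high score.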


Page 32: TopicTrend

Re-rank the list of NPs based on the score.

Implementation

Re-rank: NP B, NP D, NP E, NP C, NP A, NP F

New!: NP A, NP B, NP C, NP D, NP E, NP F
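Re-ranking amounts to sorting the NPs by score in descending order. A short sketch with hypothetical scores (the values below are invented for illustration and do not come from the slides):

```python
# Hypothetical scores per NP.
scores = {
    "NP A": 0.08,
    "NP B": 0.33,
    "NP C": 0.12,
    "NP D": 0.27,
    "NP E": 0.19,
    "NP F": 0.05,
}

# Highest score first; sorted() is stable, so ties keep their original order.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```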

Page 33: TopicTrend

Implementation

Calculate the relationship strength between NPs by considering the common articles (PIIs) that they have.

The more articles they have in common, the thicker the edge.
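One way to compute this is to keep, for each NP, the set of article identifiers it appears in and to use the size of the intersection as the edge weight. The sketch below assumes that representation; the function name `edge_weights` and the PII values are placeholders.

```python
from itertools import combinations


def edge_weights(np_articles):
    """Map each NP pair to the number of articles (PIIs) they share.

    np_articles: {np: set of article identifiers}
    Pairs with no article in common get no edge.
    """
    weights = {}
    for a, b in combinations(sorted(np_articles), 2):
        shared = len(np_articles[a] & np_articles[b])
        if shared:
            weights[(a, b)] = shared
    return weights


# Placeholder identifiers standing in for real PIIs.
np_articles = {
    "NP A": {"pii-1", "pii-2", "pii-3"},
    "NP B": {"pii-2", "pii-3", "pii-4"},
    "NP C": {"pii-4"},
    "NP D": {"pii-9"},
}
edges = edge_weights(np_articles)
print(edges)
```

The resulting count can be mapped directly to line thickness when drawing the graph.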

Page 34: TopicTrend

The End