A Classification-based Approach to Question Answering in Discussion Boards
Liangjie Hong and Brian D. Davison
Department of Computer Science and Engineering, Lehigh University
SIGIR 2009

Outline

  • Introduction
  • Related Work
  • Problem Definition
  • Classification Methods
  • Experiments
  • Conclusion

Introduction

  • Online users share ideas, discuss issues and form communities within discussion boards (online forums)
  • Knowledge discovery and information extraction
  • Several potential applications about mining QA content:
      • Search engines
      • Online QA services
      • Experts in social media
      • Knowledge base of automatic chat-bots

Related Work

  • Cong et al., 2008
      • Developed a classification-based method for question detection using sequential pattern features extracted from both questions and non-questions in forums
      • Preprocess by applying a POS tagger while keeping 5W1H and modal words
      • Time-consuming
      • Focus on question sentences or question paragraphs

Related Work (cont'd)

  • Knowledge acquisition from discussion boards
      • Zhou and Hovy, 2005
      • Feng et al., 2006
  • Using non-textual features like click count to predict the quality of answers
      • Jeon et al., 2006

  • In general, none of this related work needs to detect questions

Tasks

  • Identifying question-related first posts
  • Finding potential answers in subsequent responses within the corresponding threads

Tasks (cont'd)

Some questions:

  • Can we detect question-related threads in an efficient and effective manner?
  • What other features can be used to improve the performance?
  • How much can the combinations of some simple heuristics improve performance?
  • Are traditional relevance-based approaches suitable for this QA content?

Problem Definition

  • Questions
      • Focus on finding whether the first post is a question post
      • Treat the whole post as a question post


Problem Definition (cont'd)

  • Answers
      • If one of the replies contains answers to the questions posed in the first post, regard that reply as an answer post
      • Also treat a reply that does not contain the answer itself but links to other potential answers as an answer post

  • Result from the system: question-answer post pairs

Classification Methods (1/3)

  • NTU CSIE LIBSVM 2.88

  • Question detection:
      • Question mark
      • 5W1H words
      • Total number of posts within one thread
      • Authorship
      • N-gram

Classification Methods (2/3)

  • Answer detection:
      • The position of the answer post
      • Authorship
      • N-gram
      • Stop words
      • Query likelihood model score

Classification Methods (3/3)

  • Cong et al., 2008
      • Sequential pattern mining
  • Graph-based model
  • Query likelihood language model
  • KL-divergence language model
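
The query likelihood model score used as an answer-detection feature can be illustrated with a small Python sketch. The slides do not say which smoothing method or parameters were used, so Dirichlet smoothing, the value of `mu`, the whitespace tokenizer, and the function names below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def query_likelihood(question, reply, collection_counts, mu=100.0):
    """Log P(question | reply) under a Dirichlet-smoothed unigram model.

    mu is an assumed smoothing parameter, not a value from the slides.
    """
    reply_counts = Counter(tokenize(reply))
    reply_len = sum(reply_counts.values())
    collection_size = sum(collection_counts.values())
    vocab_size = len(collection_counts)
    score = 0.0
    for term in tokenize(question):
        # Add-one smoothing keeps the background probability non-zero
        # for terms that never occur in the collection.
        p_bg = (collection_counts.get(term, 0) + 1) / (collection_size + vocab_size)
        p = (reply_counts.get(term, 0) + mu * p_bg) / (reply_len + mu)
        score += math.log(p)
    return score


# Toy thread: the first post is the question, the rest are replies.
posts = [
    "how do i install the nvidia driver",
    "open the restricted drivers manager and install the nvidia driver",
    "thanks that worked for me after a quick reboot yesterday evening",
]
collection_counts = Counter(t for p in posts for t in tokenize(p))
scores = [query_likelihood(posts[0], r, collection_counts) for r in posts[1:]]
```

In this toy thread the first reply shares several terms with the question and therefore receives the higher score; the score could then be passed to the classifier as one feature among the others listed above.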

Experiments (1/9)

  • Data crawled
      • 555,954 threads from Ubuntu dataset
      • 721,422 threads from Photography On The Net
  • Question detection task: randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset
  • Answer detection task: randomly sampled 500 question-related threads from both datasets

Experiments (2/9)

                      Predicted positive   Predicted negative
  Actual positive     TP                   FN
  Actual negative     FP                   TN

  accuracy  = (TP + TN) / (TP + TN + FP + FN)
  precision = TP / (TP + FP)
  recall    = TP / (TP + FN)
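
As a quick worked example of these three measures, the Python sketch below computes them from confusion-matrix counts. The counts are made up for illustration and are not results from the experiments; the function name `evaluate` is likewise hypothetical.

```python
def evaluate(tp, fn, fp, tn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of actual positives that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative counts only (not taken from the slides).
metrics = evaluate(tp=40, fn=10, fp=20, tn=30)
```

With these illustrative counts, accuracy is 70/100, precision is 40/60, and recall is 40/50.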

Experiments (3/9)

  • Users do use language patterns

Experiments (4/9)

  • QM+5W+LEN -> local
  • AUTH -> global
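
The local/global distinction can be made concrete with a short Python sketch of the question-detection features. It assumes that QM is the presence of a question mark, 5W counts 5W1H words, LEN is the number of posts in the thread, and AUTH is some per-user statistic gathered across the whole forum (here a hypothetical count of threads the author has started). The exact feature definitions are not given in the slides, so every name below is illustrative.

```python
import re

# The six 5W1H words.
FIVE_W_ONE_H = {"what", "when", "where", "who", "why", "how"}


def question_features(first_post, thread_length, author_thread_count):
    """Build a feature dict for the first post of a thread.

    QM, 5W and LEN need only the thread itself (local information);
    AUTH must be looked up from forum-wide statistics (global information).
    """
    tokens = re.findall(r"[a-z0-9']+", first_post.lower())
    return {
        "QM": int("?" in first_post),
        "5W": sum(1 for t in tokens if t in FIVE_W_ONE_H),
        "LEN": thread_length,
        "AUTH": author_thread_count,  # hypothetical global statistic
        "bigrams": list(zip(tokens, tokens[1:])),  # simple word 2-grams
    }


features = question_features(
    "How do I mount a USB drive? What filesystem should I use?",
    thread_length=4,
    author_thread_count=3,
)
```

A real pipeline would map the n-grams to indices and write all features out as sparse vectors for the SVM; that step is omitted here.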

Experiments (7/9)

  • Senior users usually answer questions (near to the top post)
  • Only need local information, and performs well

Experiments (8/9)

  • Propose a ranking scheme
  • Ranking score:
  • V1: position + authorship, V2: position, V3: authorship
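
The ranking formula itself is not preserved in this transcript, so the following Python sketch is only a guess at the general shape of such a scheme. It assumes a position score that decays with distance from the top post and an authorship score given by a hypothetical per-user value (`author_reply_ratio`), and it combines them by simple addition for V1. None of these choices should be read as the authors' actual formula.

```python
def rank_replies(replies, variant="V1"):
    """Sort replies from most to least likely answer.

    V1 uses position and authorship, V2 position only, V3 authorship only.
    The scoring functions are illustrative assumptions.
    """
    def score(reply):
        # Replies nearer the top post get a higher position score.
        position = 1.0 / reply["position"]
        # Hypothetical authorship score, e.g. how often this user replies.
        authorship = reply["author_reply_ratio"]
        if variant == "V2":
            return position
        if variant == "V3":
            return authorship
        return position + authorship

    return sorted(replies, key=score, reverse=True)


# Toy replies; position 1 is the first reply after the question post.
replies = [
    {"id": "r1", "position": 1, "author_reply_ratio": 0.2},
    {"id": "r2", "position": 2, "author_reply_ratio": 0.9},
    {"id": "r3", "position": 3, "author_reply_ratio": 0.5},
]
order_v1 = [r["id"] for r in rank_replies(replies, "V1")]
order_v2 = [r["id"] for r in rank_replies(replies, "V2")]
order_v3 = [r["id"] for r in rank_replies(replies, "V3")]
```

On this toy input the three variants produce different orderings, which shows how position and authorship can pull the ranking in different directions.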

Conclusion

  • Use of N-grams and the combination of several non-content features can improve the performance
  • Relevance-based retrieval methods would not be effective in tackling the problem, but the performance can be improved by combining them with non-content features
  • Designed a simple ranking scheme that outperforms previous approaches

  • Combine several potential answers together to make a better answer?
  • A good understanding of the interaction of question answering in discussion boards

Thank You!