Open Source Question Answer System Advisor Dr. Chris Pollett Committee Dr. Mark Stamp Dr. Robert Chun By Salil Shenoy

Open Source Question Answer System - SJSU


Open Source Question Answer System

Advisor: Dr. Chris Pollett

Committee: Dr. Mark Stamp, Dr. Robert Chun

By Salil Shenoy

Outline

• Problem Statement
• Background on Question Answering Systems
• Implementation
• Experiments
• Conclusion

Problem Statement

• Improve an existing English Question Answering System in Yioop
• Implement a Hindi Question Answering System for Yioop

English Question Answer System

Hindi Search in Yioop…

What type of questions are asked…

Existing Question Answer Systems…

Question Answer Systems

• Types of Question Answering Systems
  • Closed Domain, e.g. BASEBALL, LUNAR, IRSL, etc.
  • Open Domain, e.g. START, Google, etc.

Question Answer Systems Paradigms

• Text Based / IR Based
• Knowledge Based
• Hybrid

Part of Speech Tagging

• Approaches to PoS tagging:
  • Rule Based (Brill Tagger)
  • Machine Learning

• For our system, we implemented a variant of the Brill tagger for Hindi PoS tagging

Part of Speech Tagging

• The input is the preprocessed sentence: stemmed, with n-gram processing applied and stop words and punctuation removed
• Terms present in the lexicon in the database are tagged, and words not found are tagged as unknown
• PoS tagging rules are then applied to the unknown words
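The lookup-then-rules flow on this slide can be sketched as follows. This is a minimal illustration only: the in-memory `LEXICON`, the function names, and the two fallback rules are assumptions made for the example, not the lexicon or rule set used in Yioop. The tag names are the ones that appear in the examples in this deck.

```python
# Hypothetical mini-lexicon; the real system reads its lexicon from a database.
LEXICON = {
    "गाँधी": "NN",
    "का": "IN",
    "जन्म": "NN",
    "हुआ": "VB",
}


def tag_tokens(tokens):
    """Tag tokens from the lexicon, then apply fallback rules to unknown ones."""
    # Pass 1: lexicon lookup; anything missing is marked UNKNOWN.
    tagged = [(tok, LEXICON.get(tok, "UNKNOWN")) for tok in tokens]

    # Pass 2: rewrite UNKNOWN tags with simple rules (both rules are assumed).
    result = []
    for tok, tag in tagged:
        if tag == "UNKNOWN":
            if tok.isdigit():
                tag = "QT"  # assumed rule: all-digit tokens are quantifiers
            else:
                tag = "NN"  # assumed rule: other unknown tokens default to noun
        result.append((tok, tag))
    return result


def format_tagged(tagged):
    """Render (token, tag) pairs in the token~TAG notation used on the slides."""
    return " ".join(f"{tok}~{tag}" for tok, tag in tagged)
```

Calling `format_tagged(tag_tokens(tokens))` on a preprocessed token list yields a string in the same `token~TAG` form as the example slide. A fuller Brill-style variant would also use the tags of neighbouring words when rewriting an unknown tag.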

PoS Tagging Example

• Example:

• महात्मा गाँधी का जन्म 2 अक्टूबर को हुआ (Mahatma Gandhi was born on 2 October)

• महात्मा~NN गाँधी~NN का~IN जन्म~NN 2~QT अक्टूबर~NN हुआ~VB

Parse Tree Generation

• For Hindi, we use the following grammar rules to extract triplets

Parse Tree

SENTENCE

Noun Phrase   Post Phrase   Verb Phrase

NN: महात्मा गाँधी   IN: का   NN: जन्म 2 अक्टूबर   VB: हुआ

Triplet Extraction

• Input is the parse tree
• Two types of triplets
  • Concise
  • Raw

• For Hindi, the triplet is [Subject - Object - Verb]

Triplet Extraction

• For the parse tree we saw on the previous slide, the triplets and the corresponding answers are as follows:

• [महात्मा गाँधी - qqq - हुआ] => जन्म 2 अक्टूबर
• [qqq - जन्म 2 अक्टूबर - हुआ] => महात्मा गाँधी
• [महात्मा गाँधी - जन्म 2 अक्टूबर - qqq] => हुआ
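The pattern in these triplets (mask one slot with `qqq`, and the masked value becomes the answer) can be sketched as below. The function name and the dictionary return shape are assumptions chosen for illustration. The sketch does not cover how Yioop stores triplets or how concise and raw triplets differ.

```python
PLACEHOLDER = "qqq"


def question_triplets(subject, obj, verb):
    """Map each masked [Subject - Object - Verb] triplet to its answer.

    One slot at a time is replaced by the placeholder; the value that was
    masked out is the answer stored for that question triplet.
    """
    parts = [subject, obj, verb]
    answers = {}
    for i, answer in enumerate(parts):
        masked = parts.copy()
        masked[i] = PLACEHOLDER
        answers[" - ".join(masked)] = answer
    return answers


# Usage with the example from the slide:
for triplet, answer in question_triplets("महात्मा गाँधी", "जन्म 2 अक्टूबर", "हुआ").items():
    print(f"[{triplet}] => {answer}")
```

At query time, a question parsed into the same masked form can be answered by looking its triplet up in this mapping.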

PoS Tagging Example

• Example:

• नरेंद्र मोदी भारत के कितने प्रधानमंत्री हैं (roughly, "Narendra Modi is which prime minister of India")

• नरेंद्र~NN मोदी~NN भारत~NNP के~IN प्रधानमंत्री~NN हैं~VB

Parse Tree

SENTENCE

Noun Phrase   Post Phrase   Verb Phrase

NN: नरेंद्र मोदी भारत   IN: के   NN: प्रधानमंत्री   VB: हैं

• For the parse tree we saw on the previous slide, the triplets and the corresponding answers are as follows:

• [नरेंद्र मोदी भारत - qqq - हैं] => प्रधानमंत्री
• [qqq - प्रधानमंत्री - हैं] => नरेंद्र मोदी भारत
• [नरेंद्र मोदी भारत - प्रधानमंत्री - qqq] => हैं

Named Entities

• Named entity recognition is one of the major components of a QA system
• Ways of recognizing entities in text…
  • Maintain a handwritten dictionary of entities
  • Use automated methods, e.g. using Wikipedia to recognize entities

Named Entities

• User can add, delete, and view entities for a selected locale

• User can view entities for a locale
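A dictionary-based recognizer with per-locale add, delete, and view operations might look like the sketch below. The `EntityStore` class, its method names, the locale tag `en-US`, and the greedy longest-match strategy are all assumptions for illustration; they are not Yioop's interface or storage.

```python
from collections import defaultdict


class EntityStore:
    """In-memory stand-in for a per-locale named-entity dictionary."""

    def __init__(self):
        self._entities = defaultdict(set)

    def add(self, locale, entity):
        self._entities[locale].add(entity)

    def delete(self, locale, entity):
        # discard() is a no-op if the entity is not present.
        self._entities[locale].discard(entity)

    def view(self, locale):
        return sorted(self._entities[locale])

    def recognize(self, locale, tokens, max_len=4):
        """Greedy longest-match lookup of token n-grams in the dictionary."""
        known = self._entities[locale]
        found, i = [], 0
        while i < len(tokens):
            # Try the longest candidate starting at position i first.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                candidate = " ".join(tokens[i:i + n])
                if candidate in known:
                    found.append(candidate)
                    i += n
                    break
            else:
                i += 1  # no entity starts here; move on one token
        return found


store = EntityStore()
store.add("en-US", "Mahatma Gandhi")
store.add("en-US", "India")
print(store.recognize("en-US", "Mahatma Gandhi was born in India".split()))
```

Keeping one set per locale means the Hindi and English dictionaries never mix, which matches the per-locale editing described on the slide.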

Experiments

• We created one index for English and another for Hindi by setting up crawls on Wikipedia pages.
• To set up a crawl in Yioop, under the Manage Crawls tab, click on Options and add the seed sites.
• For Hindi, I used https://hi.wikipedia.org/ as the seed site, with 'co.in' and 'in' as the domains to crawl.
• For English, I used https://en.wikipedia.org/ as the seed site, with 'com' as the domain to crawl.

Experiments

[Figure panels: Yioop without Hindi Q/A; Yioop with Hindi Q/A]

Before/After Hindi Q/A integration

Part of Speech tagger Performance

English vs Hindi Q/A

• I used 4 topics to evaluate the retrieval efficiency of the system

• For a given topic, I asked the same set of questions to the English and Hindi systems.

• For example:
  • "Who was Mahatma Gandhi" / "महात्मा गांधी कौन थे"
  • "Who is Albert Einstein?" / "अल्बर्ट आइंस्टीन कौन थे"

• The accuracy is evaluated by comparing the retrieved answer with a dataset of known answers

Average Precision Score

Reciprocal Rank
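For reference, the two metrics named on these slides are commonly computed as in the sketch below, using their standard textbook definitions. The function names and the toy ranking are assumptions for illustration; the scores reported in the presentation are not reproduced here.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Average the precision values measured at each relevant result's rank."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


# Toy ranking: results "b" and "d" are the relevant ones.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(reciprocal_rank(ranked, relevant))    # first relevant result is at rank 2
print(average_precision(ranked, relevant))  # mean of precision at ranks 2 and 4
```

Averaging these values over all questions for a topic gives the mean reciprocal rank and mean average precision for that topic.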

Accuracy of Question Answering

• I created a set of 25 questions and a corresponding set of answers known to be true
• I asked each question from the question set and compared the retrieved answer with the known answer set to evaluate efficiency

Accuracy English Q/A

Accuracy Hindi Q/A

Questions?