Open Source Question Answer System
Advisor: Dr. Chris Pollett
Committee: Dr. Mark Stamp, Dr. Robert Chun
By Salil Shenoy
Outline
• Problem Statement
• Background on Question Answering Systems
• Implementation
• Experiments
• Conclusion
Problem Statement
• Improve an existing English Question Answering System in Yioop
• Implement a Hindi Question Answering System for Yioop
Question Answer Systems
• Types of Question Answering Systems:
• Closed domain, e.g. BASEBALL, LUNAR, IRSL, etc.
• Open domain, e.g. START, Google, etc.
Part of Speech Tagging
• Approaches to PoS tagging:
• Rule based (Brill tagger)
• Machine learning
• For our system, we implemented a variant of the Brill tagger for Hindi PoS tagging
Part of Speech Tagging
• The input is a preprocessed sentence: stemmed, split into n-grams, with stop words and punctuation removed
• Terms present in the lexicon in the database are tagged, and words not found are tagged as unknown
• PoS tagging rules are then applied to the unknown words
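The two passes described above (lexicon lookup, then rules for unknown words) can be sketched roughly as follows. This is a minimal illustration, not the Yioop implementation: the in-memory `LEXICON`, the `UNK` marker, and the two fallback rules (digits become `QT`, everything else defaults to `NN`) are assumptions made for the example.

```python
# Hypothetical mini-lexicon; the system described here keeps its lexicon in a database.
LEXICON = {
    "महात्मा": "NN",
    "गाँधी": "NN",
    "का": "IN",
    "जन्म": "NN",
    "अक्टूबर": "NN",
    "हुआ": "VB",
}

UNKNOWN = "UNK"  # assumed marker for words missing from the lexicon


def lexicon_tag(tokens):
    """First pass: tag terms found in the lexicon, mark the rest as unknown."""
    return [(tok, LEXICON.get(tok, UNKNOWN)) for tok in tokens]


def apply_rules(tagged):
    """Second pass: illustrative rules for unknown words (assumed, not the author's)."""
    result = []
    for tok, tag in tagged:
        if tag == UNKNOWN:
            if tok.isdigit():
                tag = "QT"  # treat numbers as quantifiers
            else:
                tag = "NN"  # default remaining unknown words to noun
        result.append((tok, tag))
    return result


def format_tagged(tagged):
    """Render (token, tag) pairs in the token~TAG style used on the slides."""
    return " ".join(f"{tok}~{tag}" for tok, tag in tagged)


tokens = ["महात्मा", "गाँधी", "का", "जन्म", "2", "अक्टूबर", "हुआ"]
tagged = apply_rules(lexicon_tag(tokens))
print(format_tagged(tagged))
```

A real Brill-style tagger would use contextual transformation rules learned from a tagged corpus; the fixed fallbacks here only show where such rules plug in.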
PoS Tagging Example
• Example:
• महात्मा गाँधी का जन्म 2 अक्टूबर को हुआ (English: Mahatma Gandhi was born on 2 October)
• महात्मा ~NN गाँधी ~NN का ~IN जन्म ~NN 2 ~QT अक्टूबर ~NN हुआ ~VB
Triplet Extraction
• The input is the parse tree
• Two types of triplets:
• Concise
• Raw
• For Hindi, the triplet is [Subject - Object - Verb]
Triplet Extraction
• For the parse tree we saw on the previous slide, the triplets and the corresponding answers are as follows:
• [महात्मा गाँधी - qqq - हुआ] => जन्म 2 अक्टूबर
• [qqq - जन्म 2 अक्टूबर - हुआ] => महात्मा गाँधी
• [महात्मा गाँधी - जन्म 2 अक्टूबर - qqq] => हुआ
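The pattern in the triplets above, where each slot is replaced in turn by the placeholder `qqq` and mapped to the value that was removed, can be sketched as below. The function name and the dictionary return type are assumptions for illustration; only the `qqq` placeholder and the Subject - Object - Verb order come from the slides.

```python
PLACEHOLDER = "qqq"  # placeholder used on the slides for the unknown slot


def question_triplets(subject, obj, verb):
    """Map each masked triplet string to the component that was masked out."""
    parts = [subject, obj, verb]
    mapping = {}
    for i, answer in enumerate(parts):
        masked = parts.copy()
        masked[i] = PLACEHOLDER
        mapping[" - ".join(masked)] = answer
    return mapping


for triplet, answer in question_triplets(
    "महात्मा गाँधी", "जन्म 2 अक्टूबर", "हुआ"
).items():
    print(f"[{triplet}] => {answer}")
```

Storing the masked form as the key means a question parsed into the same shape can be answered with a single lookup.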
PoS Tagging Example
• Example:
• नरेंद्र मोदी भारत के कितने प्रधानमंत्री हैं
• नरेंद्र ~NN मोदी ~NN भारत ~NNP के ~IN प्रधानमंत्री ~NN हैं ~VB
Parse Tree
SENTENCE
  NounPhrase
    NN: नरेंद्र मोदी भारत
  PostPhrase
    IN: के
    NN: प्रधानमंत्री
  VerbPhrase
    VB: हैं
• For the parse tree we saw on the previous slide, the triplets and the corresponding answers are as follows:
• [नरेंद्र मोदी भारत - qqq - हैं] => प्रधानमंत्री
• [qqq - प्रधानमंत्री - हैं] => नरेंद्र मोदी भारत
• [नरेंद्र मोदी भारत - प्रधानमंत्री - qqq] => हैं
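One way to read a Subject - Object - Verb triplet off such a parse tree is sketched below. The tree layout (a dictionary from phrase name to tagged leaves) and the choice of which phrase supplies which slot are assumptions inferred from the example, not a description of the actual Yioop code.

```python
# Assumed representation: phrase name -> list of (tag, text) leaves.
parse_tree = {
    "NounPhrase": [("NN", "नरेंद्र मोदी भारत")],
    "PostPhrase": [("IN", "के"), ("NN", "प्रधानमंत्री")],
    "VerbPhrase": [("VB", "हैं")],
}


def leaves_with_tag(tree, phrase, tag_prefix):
    """Join the text of all leaves under `phrase` whose tag starts with `tag_prefix`."""
    return " ".join(
        text for tag, text in tree.get(phrase, []) if tag.startswith(tag_prefix)
    )


def extract_triplet(tree):
    """Return (subject, object, verb), the order the slides give for Hindi."""
    subject = leaves_with_tag(tree, "NounPhrase", "NN")
    obj = leaves_with_tag(tree, "PostPhrase", "NN")  # skips the postposition (IN)
    verb = leaves_with_tag(tree, "VerbPhrase", "VB")
    return (subject, obj, verb)


print(" - ".join(extract_triplet(parse_tree)))
```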
Named Entities
• Named entity recognition is one of the major components of a QA system
• Ways to recognize entities in text:
• Maintain a handwritten dictionary of entities
• Use automated methods, e.g. using Wikipedia to recognize entities
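The handwritten-dictionary approach can be sketched as a longest-match lookup over the tokens of a sentence. The entries, the entity labels, and the simple lowercase and whitespace tokenization below are illustrative assumptions, not the dictionary used in the system.

```python
# Hypothetical hand-written dictionary: token tuple -> entity label.
ENTITIES = {
    ("mahatma", "gandhi"): "PERSON",
    ("albert", "einstein"): "PERSON",
    ("india",): "LOCATION",
}
MAX_LEN = max(len(key) for key in ENTITIES)


def find_entities(text):
    """Scan left to right, preferring the longest dictionary match at each position."""
    tokens = text.lower().replace("?", " ").split()
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in ENTITIES:
                found.append((" ".join(key), ENTITIES[key]))
                i += n  # skip past the matched entity
                break
        else:
            i += 1  # no entity starts at this token
    return found


print(find_entities("Who was Mahatma Gandhi"))
```

Trying longer spans first keeps a multi-word name from being split into shorter matches.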
Experiments
• We created one index for English and another for Hindi by setting up crawls of Wikipedia pages.
• To set up a crawl in Yioop, go to the Manage Crawls tab, click Options, and add the seed sites.
• For Hindi, I used https://hi.wikipedia.org/ with 'co.in' and 'in' as the domains to crawl.
• For English, I used https://en.wikipedia.org/ with 'com' as the domain to crawl.
English vs Hindi Q/A
• I used 4 topics to evaluate the retrieval efficiency of the system.
• For a given topic, I asked the same set of questions to the English and Hindi systems.
• For example:
• "Who was Mahatma Gandhi" / "महात्मा गांधी कौन थे"
• "Who is Albert Einstein?" / "अल्बर्ट आइंस्टीन कौन थे"
• Accuracy is evaluated by comparing the retrieved answer with a dataset of known answers.
Accuracy of Question Answering
• I created a set of 25 questions and a corresponding set of answers known to be true.
• I asked the questions from the question set and compared the retrieved answers with the known answer set to evaluate efficiency.
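The comparison against the known answer set can be sketched as a simple accuracy calculation. The slides do not say how two answers are judged equal, so the exact match after case and whitespace normalization used here is an assumption, and the sample questions and answers are placeholders rather than data from the experiments.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(answer.lower().split())


def accuracy(retrieved, known):
    """Fraction of questions whose retrieved answer matches the known answer.

    Both arguments map question -> answer; a question missing from
    `retrieved` counts as wrong.
    """
    if not known:
        raise ValueError("known answer set is empty")
    correct = sum(
        1
        for question, expected in known.items()
        if normalize(retrieved.get(question, "")) == normalize(expected)
    )
    return correct / len(known)


# Placeholder data only; not results from the experiments.
known = {"q1": "answer one", "q2": "answer two"}
retrieved = {"q1": "Answer  One", "q2": "something else"}
print(accuracy(retrieved, known))
```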