Presentation on English to Bangla Text Conversion
Saugata Bose
Assistant Professor
Department of Computer Science and Engineering, ULAB
Flow of our Session
What happens earlier
What qualifies me for applying for PhD
Text Translation
Machine Translation
Significance
Introductory Ideas
Machine Translation
Direct
Indirect
Word by Word, Phrase by Phrase
Requirements
Bilingual Dictionary
Rearrangement Rule
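The direct approach above can be sketched in a few lines: look each word up in a bilingual dictionary, then apply a rearrangement rule. This is only a minimal illustration, not the system described in the talk. The three dictionary entries and the single subject-verb-object to subject-object-verb rule are assumptions made for the example.

```python
# Toy bilingual dictionary (hypothetical entries, for illustration only).
BILINGUAL_DICT = {
    "i": "আমি",
    "eat": "খাই",
    "rice": "ভাত",
}


def rearrange_svo_to_sov(words):
    """Assumed rearrangement rule: English S-V-O becomes Bangla S-O-V.

    Only three-word sentences are handled; anything else is returned as is.
    """
    if len(words) == 3:
        subject, verb, obj = words
        return [subject, obj, verb]
    return list(words)


def translate_direct(sentence):
    """Word-by-word translation followed by the rearrangement rule."""
    words = sentence.lower().rstrip(".").split()
    reordered = rearrange_svo_to_sov(words)
    # Words missing from the dictionary pass through unchanged.
    return " ".join(BILINGUAL_DICT.get(word, word) for word in reordered)


print(translate_direct("I eat rice"))
```

A fixed dictionary entry such as "eat" cannot agree with the subject in person, which is one reason the direct approach needs more than a dictionary and one reordering rule.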
All information necessary for the generation of the target text without looking back to the original text
SL analysis
TL Generation
SL to TL transfer
source -> Logical form of SOURCE -> Logical form of TARGET -> target
Empirical
Empirical System
Statistical Approach
Example Based Approach
Source text == stored example translation
Requirements
Bilingual corpus
Best Match algorithm
Grammar is not major focus
Good quality of bilingual data in very large corpus
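As a rough sketch of the example-based idea, the snippet below picks the stored example whose source side best matches the input and returns its stored translation. The slides do not say which best-match algorithm is meant, so the Jaccard word-overlap score here is an assumption, and the three-pair corpus is a hypothetical stand-in for a large bilingual corpus.

```python
# Hypothetical miniature bilingual corpus: (English, Bangla) example pairs.
EXAMPLES = [
    ("I eat rice", "আমি ভাত খাই"),
    ("I read books", "আমি বই পড়ি"),
    ("He goes to school", "সে স্কুলে যায়"),
]


def jaccard(a, b):
    """Word-set overlap between two sentences, in the range [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(source, examples):
    """Return the (source, target) pair whose source side scores highest."""
    return max(examples, key=lambda pair: jaccard(source, pair[0]))


matched_source, translation = best_match("I eat rice daily", EXAMPLES)
print(matched_source, "->", translation)
```

No grammar is consulted at any point, which is why the quality of the result depends almost entirely on the size and quality of the corpus.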
Previous Works
EBMT approach by Dr. Mumit Khan and Anwarus Salam
Tagging and parsing the English sentence
Translate from the source language to the target language following some sentence rules
CYK-CNF approach by Sajib Dasgupta, Abu Wasif and Sharmin Azam
Same as EBMT approach
Apply CNF to convert the parse tree to normal form
Transfer the English parse tree to a Bangla one
Generate the Bangla sentence
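CYK parsing works on grammars in Chomsky Normal Form (CNF), where every rule is either A -> B C or A -> word. The sketch below is a small CYK recognizer over a toy CNF grammar. It only decides whether a sentence is grammatical and is not the cited authors' implementation. The grammar rules and lexicon are assumptions chosen for illustration.

```python
from itertools import product

# Toy CNF grammar (assumed): S -> NP VP, VP -> V NP, plus lexical rules.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "i": {"NP"},
    "rice": {"NP"},
    "eat": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the CNF grammar derives the word sequence from start."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right parts
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("i eat rice".split()))
print(cyk_recognize("eat i rice".split()))
```

A full translation system would also store back-pointers in the table so the English parse tree can be recovered and then transferred to a Bangla tree.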
Am I ready???
A Framework for Detecting External Plagiarism from Monolingual Documents: Use of Shallow NLP and N-gram Frequency Comparison Approach, presented at the 2nd International Conference on Information and Communication Technology for Competitive Strategies (ICTCS-2016) (Paper ID: 89), March 2016. [Conference Proceedings by ACM ICPS, Proceedings Volume ISBN No 978-1-4503-3962-9]
Propose a Framework
Investigate the role of machine learning in the proposed framework.
Scope
External Plagiarism
Text Pre-processing & NLP Techniques
Comparison Methodologies
Feature Vector
Suspicious Documents
Original Documents
Corpus
Lowercasing
Chunking
Punctuation
Stop Words
Stemming/Lemmatizing
1-gram
2-gram
3-gram
4-gram
5-gram
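A minimal sketch of the pre-processing and n-gram steps listed above, using only the standard library. Chunking and stemming/lemmatizing are left out because they would need an NLP toolkit, and the stop-word list is a tiny assumed sample rather than the one used in the study.

```python
import string

# Tiny assumed stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.split() if token not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The cat sat, and the dog ran.")
for n in range(1, 6):  # 1-gram through 5-gram, as in the pipeline above
    print(n, ngrams(tokens, n))
```

When a document has fewer than n tokens after cleaning, the n-gram list is simply empty.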
Feature Selection
Reduced Feature Set
Train Classifier
Apply Classifier on Test Data
Plagiarism Detection
Experimental Setup (cont.)
Comparison Methodologies
Machine learning algorithm
N-gram frequency-based similarity measure
J48 Classifier, Naive Bayes Classifier
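The slides do not give the formula for the n-gram frequency-based similarity measure, so the following is one plausible sketch under a stated assumption: the score is the share of the suspicious document's n-grams, counted with frequency, that also occur in the original. The function names and the five-value feature vector are hypothetical and exist only to show how such scores could feed a classifier.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Frequency table of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_similarity(suspicious, original, n):
    """Assumed measure: shared n-gram frequency / n-grams in suspicious text."""
    susp = ngram_counts(suspicious.lower().split(), n)
    orig = ngram_counts(original.lower().split(), n)
    total = sum(susp.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((susp & orig).values())
    return shared / total


def feature_vector(suspicious, original, max_n=5):
    """One similarity score per n-gram size, from 1-gram up to max_n-gram."""
    return [ngram_similarity(suspicious, original, n) for n in range(1, max_n + 1)]


print(feature_vector("the quick brown fox jumps", "the quick brown dog jumps"))
```

Each suspicious/original document pair then becomes one row of numeric features, which is the form a decision tree or Naive Bayes classifier expects.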
Experiment and Findings-1
Generating Decision Tree
95 instances
121 attributes
Selecting Features
Build train model
95 instances
27 attributes
Accuracy: 94.6809 % on J48
Accuracy: 65.9574 % on Naive Bayes
Accuracy: 71.2766 % on Naive Bayes
Accuracy: 93.617 % on J48
Accuracy: 89.0052 % on J48
Accuracy: 86.3874 % on Naive Bayes
Thank You