vundemodalu-manjush
Why is it difficult to process text data, and why is sentiment analysis so difficult? Methods, improvements, problems, and solutions.
Meet him
I am a very happyyy person....
Remove repetition of letters
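A minimal sketch of this step, assuming that a run of three or more identical characters is noise while a run of two may be genuine spelling ("happy", "good"). The function name `squeeze_repeats` is hypothetical. Collapsing to a single character is one possible choice; collapsing to two is a common alternative that avoids over-shortening some stretched words.

```python
import re

# Three or more repeats of the same character collapse to one.
# Runs of exactly two are left alone so that "happy" and "good" survive.
_REPEATS = re.compile(r"(.)\1{2,}")

def squeeze_repeats(text: str) -> str:
    """Collapse stretched letters and punctuation, e.g. 'happyyy' -> 'happy'."""
    return _REPEATS.sub(r"\1", text)

print(squeeze_repeats("I am a very happyyy person...."))
# I am a very happy person.
```

The same rule also shortens "...." to ".", which may or may not be wanted depending on how punctuation is used later.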
I am a very happyyy person.... 8)
Convert smileys
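One way this could look, as a sketch: replace each emoticon with a plain word token so that later steps can score it like any other word. The mapping and the token names (`SMILE`, `COOL`, and so on) are assumptions made for illustration, not the deck's actual table.

```python
import re

# Hypothetical emoticon table; extend it with whatever appears in the data.
EMOTICONS = {
    ":)": "SMILE",
    ":D": "LAUGH",
    "8)": "COOL",
    ":(": "SAD",
}

# Longest emoticons first, so a longer one is never shadowed by a shorter prefix.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)

def convert_smileys(text: str) -> str:
    """Replace known emoticons with word tokens and tidy the whitespace."""
    replaced = _PATTERN.sub(lambda m: " " + EMOTICONS[m.group(0)] + " ", text)
    return " ".join(replaced.split())

print(convert_smileys("I am a very happyyy person.... 8)"))
# I am a very happyyy person.... COOL
```

A plain substring match like this also fires on text such as "(see page 8)", so the table is worth checking against real samples first.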
@raju is a very happyyy person.... :)
@sachin is a very happyyy person.... :)
@raju @sachin is a very happyyy person.... :)
Getting huge data
Collecting useful data
Preprocessing
This is f**king sHittt.I hate you :)
This is s**r .Are you are watching star plus :D
Don't worry about everything
Regex: test before you run
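A sketch of what "test before you run" might look like in practice: a cleaning regex is checked against a few hand-written cases before it touches the full dataset. The rule shown (restoring the space after a full stop glued to the next word) and the sample strings are illustrative assumptions.

```python
import re

# Hypothetical cleaning rule: a full stop directly followed by a letter
# gets a space inserted after it ("Enthusiast.He" -> "Enthusiast. He").
_GLUED_STOP = re.compile(r"\.(?=[A-Za-z])")

def fix_glued_stops(text: str) -> str:
    return _GLUED_STOP.sub(". ", text)

# (raw input, expected output) pairs, including inputs that must stay untouched.
CASES = [
    ("Sarup is a tech Enthusiast.He has a great taste",
     "Sarup is a tech Enthusiast. He has a great taste"),
    ("I am a very happyyy person....", "I am a very happyyy person...."),
    ("version 2.0 is out", "version 2.0 is out"),
]

def failing_cases():
    """Return (raw, actual, expected) for every case the regex gets wrong."""
    return [
        (raw, fix_glued_stops(raw), want)
        for raw, want in CASES
        if fix_glued_stops(raw) != want
    ]

failures = failing_cases()
print("all cases pass" if not failures else failures)
```

Adding a case such as "example.com" would show that this rule also breaks domain names, which is exactly the kind of surprise worth catching before a long run.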
Get Large Data
Filter to Useful Data
Clean
Data Munging 20%
Ask questions
+ve (or) -ve?
???
Magic Box
Input data
Data modeling 60%
A computer is a dumb machine
HeHe what's that??
A 1/0 machine
We need to tag words
Assign numbers to text
Worry about adjectives first
Awesome 4
Ugly 3
Why 0
Scores.txt
Data
Sentiment
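The Scores.txt idea can be sketched as a dictionary lookup: sum the scores of the words that appear in the table and read the sign of the total. The score values follow the slide, except that the minus sign on "ugly" is an assumption (the slide shows a bare 3, and the sign may have been lost); the function names are hypothetical.

```python
import re

# Word scores in the spirit of Scores.txt.
# Assumption: "ugly" is negative; the slide shows the value without a sign.
SCORES = {"awesome": 4, "ugly": -3, "why": 0}

def sentence_score(text: str, scores=SCORES) -> int:
    """Sum the scores of known words; unknown words contribute nothing."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(scores.get(w, 0) for w in words)

def polarity(text: str) -> str:
    total = sentence_score(text)
    if total > 0:
        return "pos"
    if total < 0:
        return "neg"
    return "neutral"

print(polarity("What an awesome day"))  # pos
print(polarity("Why is it so ugly"))    # neg
print(polarity("I love twitter"))       # neutral: no word is in the table
```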
Less accurate. Why?
Most words are ignored
What's the solution?
TFIDF
Normal TfIdf = Tf * Idf
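For reference, one common textbook form of the standard weighting, where $N$ is the number of documents and $\mathrm{df}(t)$ is the number of documents containing term $t$. Smoothed variants exist, and which one the deck had in mind is not stated.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}
```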
Slightly modified
Tf = score, Idf = update count
Awesome 4
Ugly 3
good 2
Scores.txt
Data
Sentiment
Fun 2.014 5
Soft 2.92 20
Dynamic.txt
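The deck does not spell out the update rule, so the following is only one possible reading of Dynamic.txt: each word missing from Scores.txt keeps a running average of the scores of the sentences it appeared in, together with a count of how often it was updated. The `Entry` class, `update_dynamic`, and the running-mean rule are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    score: float = 0.0  # running mean of the sentence scores seen with this word
    count: int = 0      # number of updates applied to this entry

def update_dynamic(dynamic: dict, words: list, sentence_score: float, known: set) -> None:
    """Fold one scored sentence into the dynamic table (assumed rule)."""
    for word in set(words):
        if word in known:
            continue  # already covered by the static score table
        entry = dynamic.setdefault(word, Entry())
        entry.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        entry.score += (sentence_score - entry.score) / entry.count

known = {"awesome", "ugly", "good"}
dynamic = {}
update_dynamic(dynamic, ["fun", "awesome"], 4.0, known)
update_dynamic(dynamic, ["fun", "good"], 2.0, known)
print(dynamic["fun"])  # Entry(score=3.0, count=2)
```

Keeping the count next to the score makes it possible to trust frequently updated words more than words seen once.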
How do we know, if it's correct?
Testing accuracy
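A minimal sketch of an accuracy check, assuming a small hand-labelled set of sentences is available. The labelled examples and `toy_predict` are made up for illustration; any classifier that maps text to a label could be passed in instead.

```python
def accuracy(predict, labelled):
    """Fraction of (text, gold_label) pairs where predict(text) equals gold_label."""
    if not labelled:
        raise ValueError("need at least one labelled example")
    hits = sum(1 for text, gold in labelled if predict(text) == gold)
    return hits / len(labelled)

def toy_predict(text):
    # Deliberately crude stand-in for a real model.
    return "pos" if "love" in text.lower() else "neg"

# Hypothetical hand-labelled sample.
LABELLED = [
    ("I love twitter", "pos"),
    ("I hate facebook", "neg"),
    ("I love this song", "pos"),
    ("This is awesome", "pos"),  # toy_predict gets this one wrong
]

print(accuracy(toy_predict, LABELLED))  # 0.75
```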
Mixed sentences
I hate facebook, but I love twitter
I hate rahul #politics, but I love modi :)
Closest possible one is pos
I hate facebook, but I love twitter
arg1 arg2 arg3 arg4
Key Word
Main word + Args
Tagger + Pattern
Polarity
Problems with this model
Training data
Processing speed
I hate facebook, but I love twitter
I love
output
I hate facebook, but I love twitter
I hate
output
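The two outputs above can be reproduced with a rough sketch: split the sentence at a contrast word, pick the clause that mentions the key word, and score only that clause. The polarity table, the list of contrast words, and the function names are assumptions; this is not the tagger-and-pattern model from the deck.

```python
import re

# Assumptions: a tiny polarity table and a fixed list of contrast words.
POLARITY = {"love": 1, "hate": -1}
_CONTRAST = re.compile(r"[,;]?\s*\b(?:but|however|although)\b\s*", re.IGNORECASE)

def clause_for(sentence: str, key_word: str):
    """Return the clause that mentions key_word, or None if no clause does."""
    for clause in _CONTRAST.split(sentence):
        if key_word.lower() in re.findall(r"\w+", clause.lower()):
            return clause.strip()
    return None

def target_polarity(sentence: str, key_word: str) -> str:
    clause = clause_for(sentence, key_word)
    if clause is None:
        return "unknown"
    score = sum(POLARITY.get(w, 0) for w in re.findall(r"\w+", clause.lower()))
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

sentence = "I hate facebook, but I love twitter"
print(clause_for(sentence, "twitter"), target_polarity(sentence, "twitter"))
# I love twitter pos
print(clause_for(sentence, "facebook"), target_polarity(sentence, "facebook"))
# I hate facebook neg
```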
Problems with this model?
Sarup is a tech enthusiast. He has great taste in music. He is not only a designer but also startup-minded.
Sarup
is a tech enthusiast
Coreference Resolution
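To make the idea concrete, here is a deliberately naive baseline, not a real coreference model: a sentence-initial "He" or "She" is replaced by the most recent capitalised sentence-initial word. The heuristic, the pronoun list, and the function name are assumptions, and the example text is the slide's sentence, lightly cleaned.

```python
import re

PRONOUNS = {"he", "she"}

# Split after sentence-ending punctuation when a capital letter follows,
# with or without a space in between.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s*(?=[A-Z])")

def resolve_pronouns(text: str) -> list:
    """Replace sentence-initial he/she with the last sentence-initial name seen."""
    antecedent = None
    resolved = []
    for sentence in _SENTENCE_BREAK.split(text.strip()):
        first, _, rest = sentence.partition(" ")
        if first.lower() in PRONOUNS:
            if antecedent:
                sentence = antecedent + " " + rest
        elif first[:1].isupper():
            antecedent = first  # naive: treat the first word as the subject
        resolved.append(sentence)
    return resolved

text = ("Sarup is a tech enthusiast. He has great taste in music. "
        "He is not only a designer but also startup-minded.")
for line in resolve_pronouns(text):
    print(line)
# Sarup is a tech enthusiast.
# Sarup has great taste in music.
# Sarup is not only a designer but also startup-minded.
```

The heuristic ignores pronouns in the middle of a sentence and would happily pick "This" or "The" as an antecedent, which hints at why a dedicated model is needed.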
Problems with using Stanford NLP
We are designing our own coreference model
Problems?
Thank you