Lexalytics Text Analytics Workshop: Perfect Text Analytics


Presentation by Seth Redmore, VP Product Management, at the Text Analytics Summit 2010


<ul><li> 1. Perfect Text Analytics<br />Seth Redmore<br />VP, Product Management<br /></li>
<li> 2. Perfect<br />perfect<br /> [adj., n. pur-fikt; v. per-fekt]<br />1. conforming absolutely to the description or definition of an ideal type: a perfect sphere; a perfect gentleman.<br />2. excellent or complete beyond practical or theoretical improvement: There is no perfect legal code. The proportions of this temple are almost perfect.<br />All rights reserved 2010 Lexalytics Inc.<br /></li>
<li> 3. Text Analytics<br />"The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources." (Wikipedia)<br />In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.<br /></li>
<li> 4. Perfect is Fast<br />Average human reading speed: 250 wpm<br />Conservative computer reading speed: 6000 wpm/core (our speed on a moderate single core)<br />Each core is equivalent to the reading bandwidth of 12 people.<br />Modern machines have 8 cores.<br />That's just about 100 people in a box.<br />Nice.<br /></li>
<li> 5. Perfect is Useable<br />"I don't like the results" is not the same as "the results are incorrect"<br />Understanding the behavior is key to usefulness<br />Can you make better decisions?<br />Can you make more money or save money?<br />What is the most controversial area of text analytics?<br />Thomson Reuters: trading with Sentiment Analysis increased Alpha (profit over market) by 80 basis points<br /></li>
<li> 6. Useable: How much can you differ?<br />"In my shop, that up until now has relied exclusively on human coding, we consider anything below 90% to be unacceptably inaccurate. There is no doubt that automated sentiment is getting much much better, but to suggest that people should be okay with 20% of their data being wrong is just absurd." Katie Delahaye Paine<br />Why is 10% wrong so much less absurd than 20% wrong?<br />Figure panels: 20% Error; 10% Error<br /></li>
<li> 7. Perfect is Consistent<br />Same results for same content, every time<br />University of Pittsburgh Multi-Perspective Question Answering Corpus: 535 documents, 11k+ sentences.<br />40 hours of training for each rater<br />~80% inter-rater agreement<br /></li>
<li> 8. Perfect is (new) Knowledge<br />Discover the stuff you don't know<br />Text Analytics is really, really great at telling you the who, the what, and the where. Sometimes the how.<br />You have to supply the why, but that question is way easier to answer when you know the other w's and the h<br /></li>
<li> 9. Perfect Includes Everything<br />Running our top-of-the-line software flat out across one year will cost you about $0.002/document analyzed (news-article-sized content), assuming 3 docs/core-second on an 8-core machine<br />The more data, the better, and the greater the worth of your text analytics<br /></li>
<li> 10. Perfect is Trainable<br />Can you solve YOUR business problem with it?<br />Can you optimize to suit different kinds of content and roll those results up into a single reporting system?<br /></li>
<li> 11. Perfect Text Analytics<br />Fast<br />Useable<br />Consistent<br />Knowledge<br />(that is)<br />Inclusive<br />Trainable<br /></li>
<li> 12. Customer Snapshots<br />(or, rubber, meet road)<br /></li>
<li> 13. Reputation Management<br /></li>
<li> 14. Politics<br /></li>
<li> 15. Market Intelligence<br />Diagram labels: Client Employee; User Authentication; Single Sign-on; External Content Providers; SinglePoint; Client Company; Web 2.0 Collaboration; Search Results; Secondary Research Suppliers; MI Analyst; Text Analytics; Integrated Index; News &amp; Journals; NL Search Engine; FIREWALL; Internal Document Repository; Optional Document Repository; Financial analyst reports; Internal research; Content Processing; Custom Web Crawls &amp; Gov. Databases; Trash can; crawl, FTP or CD<br /></li>
<li> 16. Hospitality<br /></li>
<li> 17. Financial Services<br />Turns news into numbers for automatic trading systems<br /><ul><li>Company stocks + commodities</li> <li>Resilient server product</li></ul>Diagram labels: Algorithmic Trading (QED firm); Financial data; Indicators; Buy/Sell; RNSE Server; Indicators<br /><ul><li>Ultimate customers are financial institutions</li> <li>QED (Quantitative and Event-Driven Trading): banks, hedge funds</li> <li>JPMorgan, SocGen, Alpha Equities and others</li></ul></li></ul>
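
The speed and cost figures on the "Perfect is Fast" and "Perfect Includes Everything" slides can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on those slides; the function names and the 365-day year are assumptions made for illustration. Taken literally, 6000 wpm against 250 wpm works out to 24 human readers per core, so the slide's figure of 12 per core (and about 100 per machine) may rest on a faster human reader or a more conservative machine rate than the ones stated.

```python
# Figures quoted on the slides; the 365-day year is an assumption.
HUMAN_WPM = 250               # average human reading speed
MACHINE_WPM_PER_CORE = 6000   # quoted "conservative" machine speed per core
CORES = 8                     # cores per machine
DOCS_PER_CORE_SECOND = 3      # news-article-sized documents
COST_PER_DOC_USD = 0.002      # quoted cost per analyzed document
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def readers_per_core(machine_wpm, human_wpm):
    """How many human readers one core replaces, by raw words per minute."""
    return machine_wpm / human_wpm


def docs_per_year(docs_per_core_second, cores, seconds=SECONDS_PER_YEAR):
    """Documents one machine can process running flat out for a year."""
    return docs_per_core_second * cores * seconds


def cost_for(n_docs, cost_per_doc=COST_PER_DOC_USD):
    """Cost of analyzing n_docs at the quoted per-document rate."""
    return n_docs * cost_per_doc


if __name__ == "__main__":
    per_core = readers_per_core(MACHINE_WPM_PER_CORE, HUMAN_WPM)
    print(f"Readers per core:    {per_core:.0f}")
    print(f"Readers per machine: {per_core * CORES:.0f}")
    print(f"Documents per year:  {docs_per_year(DOCS_PER_CORE_SECOND, CORES):,}")
    print(f"Cost of 1M docs:     ${cost_for(1_000_000):,.2f}")
```

Swapping in a different human reading speed or core count shows how sensitive the "people in a box" comparison is to those two inputs.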

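
The "~80% inter-rater agreement" figure on the "Perfect is Consistent" slide can be made concrete with a small calculation. The slides do not say how agreement was measured, so this sketch assumes the simplest measure, raw percent agreement between two raters; the function name and the ten sentiment labels are made up for illustration and are not taken from the MPQA corpus.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two raters assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both raters must label the same number of items")
    if not labels_a:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels from two human raters on the same ten sentences.
rater_1 = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neu"]
rater_2 = ["pos", "neg", "neu", "neg", "neg", "pos", "pos", "neg", "pos", "neu"]

if __name__ == "__main__":
    print(f"Agreement: {percent_agreement(rater_1, rater_2):.0%}")
```

Running the same function on two passes of an automated system over identical content would return 1.0, which is the sense of "same results for same content, every time".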
