Fun with Text - Managing Text Analytics

Published on 14-Apr-2017

<p>Cohan Sujay Carlos, CEO, Aiaioo Labs</p>
<p>Fun with Text: Managing Text Analytics</p>
<p>What I am going to talk about: Text Analytics. I will examine 3 kinds of opportunities, discuss 3 text analytics problems, and touch upon 3 things to watch out for and 3 things to embrace.</p>
<p>What if we can master text? What do we get from it? There are opportunities in every vertical:</p>
<p>Aerospace / Defense / Automotive: filing of various routine documents / technical specification standardization / competitive intelligence and customer feedback management</p>
<p>Healthcare / Life sciences: reporting / storing relevant patents and publications / analysis of research and competitive intelligence</p>
<p>Legal and Government: legal and administrative filings / case document and administrative record management / analysis of legal and administrative documents (land records, case files)</p>
<p>Do you observe a pattern? In every vertical:</p>
<p>Output Text / Store and Transform Text / Ingest and Analyze Text</p>
<p>How do we unlock the value in text?</p>
<p>Output Text / Store and Transform Text / Ingest and Analyze Text</p>
<p>Natural Language Generation / Natural Language Understanding / Natural Language Processing (aka Text Analytics)</p>
<p>Use Case 1: Customer Service</p>
<p>Let's say you have some text and a database or spreadsheet with columns...</p>
<p>John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.</p>
<p>...and you have to fill in the database fields from the information in the text: Reporter / Location (of Reporter) / Product</p>
<p>Use Case 1: Land Records</p>
<p>Let's say you have some text and a database or spreadsheet with columns...</p>
<p>Property K45L234 (lot 23-24) in Wake County of 3000 sq ft was sold to James Fischer on 3-30-1997.</p>
<p>...and you have to fill in the database fields from the information in the text: Title Number / Lot / County</p>
<p>Use Case 1: M&amp;A Transactions</p>
<p>Let's say you have some text and a database or spreadsheet with columns...</p>
<p>Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001.</p>
<p>...and you have to fill in the database fields from the information in the text: Acquirer / Acquired / Date</p>
<p>Use Case 1: Customer Service [ Information Extraction ]</p>
<p>John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.</p>
<p>Identifying entities and the relations between them: entities are pieces of text that could go into the fields in the database. Relations tell you about the connections between entities; they connect the entities that belong in a row.</p>
<table>
<tr><th>Reporter</th><th>Location (of Reporter)</th><th>Product</th></tr>
<tr><td>John Chambers</td><td>Springfield, MA</td><td>Ford Ranger</td></tr>
</table>
<p>Here the relation "Location of Reporter" links John Chambers to Springfield, MA, not to Boston, MA, where the truck was purchased.</p>
<p>Information extraction converts unstructured information into structured information. It can improve efficiencies in processes where humans read text and copy fields into databases.</p>
<p>How can text analytics methods be used to automate entity and relation extraction? Rule-based methods / Machine learning methods</p>
<p>Aiaioo Labs aiaioo.com</p>
<p>Rule-based frameworks for entity and relation extraction?</p>
<p>http://services.gate.ac.uk/annie/</p>
<p>How does GATE/ANNIE identify entities and the relations? It uses lists of first names and last names of persons, and names of places, and matches them in the text:</p>
<p>John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.</p>
<p>Lists: Jack, Jill, John / Chambers, Miller, Farnsworth / Springfield, Boston, Cambridge / MA, CA, MD</p>
<p>Machine learning frameworks for entity and relation extraction? Apache OpenNLP: https://opennlp.apache.org/</p>
<p>Machine learning frameworks need training data: https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html</p>
<p>How does OpenNLP identify entities and the relations? From examples such as:</p>
<p>John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.</p>
<p>It learns to recognize:</p>
<p>John Archer of Maryland reported a problem with his Figo.</p>
<p>Vince Chambers of Denver, CO had trouble with his Focus.</p>
<p>How to choose between text analytics methods for entity and relation extraction?</p>
<table>
<tr><th></th><th>Rule-based methods</th><th>Machine learning methods</th></tr>
<tr><td>Time to a reasonably performing model</td><td>3 months</td><td>1+ years</td></tr>
<tr><td>Precision</td><td>Typically higher</td><td>Typically lower</td></tr>
<tr><td>Flexibility</td><td>Typically less</td><td>Typically more</td></tr>
<tr><td>Recall</td><td>Typically less</td><td>Typically higher (and typically higher overall performance)</td></tr>
</table>
<p>Can you classify these door heights as Short or Tall? [Figure: doors of various heights]</p>
<p>In analytics, an analyst comes up with a rule: If door_height &lt; 6 then Short else Tall</p>
<p>In machine learning, the computer comes up with a rule from examples.</p>
<p>How do we unlock the value in text? The first use case:</p>
<p>Output Text / Store and Transform Text / Ingest and Analyze Text</p>
<p>Information Extraction: identifying entities and the relations between them</p>
<p>How do we unlock the value in text? The second use case:</p>
<p>Output Text / Store and Transform Text / Ingest and Analyze Text</p>
<p>Text Categorization: labeling text with one or more category labels</p>
<p>Use Case 2: Organizing Text for Storage</p>
<p>Let's say you have some text and you want to mark it as one of: Report / Inquiry</p>
<p>John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.</p>
<p>Use Case 2: Organizing Text [ Text Categorization ]</p>
<p>Start by collecting some samples of documents of each of your categories, then train a classifier with them:</p>
<table>
<tr><th>Report</th><th>Inquiry</th></tr>
<tr><td>I have a problem</td><td>Where can I buy a</td></tr>
<tr><td>This complaint is about</td><td>Do you sell furniture</td></tr>
</table>
<table>
<tr><th>Politics</th><th>Sports</th></tr>
<tr><td>The United Nations</td><td>Manchester United</td></tr>
<tr><td>The United States and</td><td>Manchester and Barca</td></tr>
</table>
<p>Run the classifier on a new piece of text, and the classifier will return a label. For "Nations and States", it returns Politics.</p>
<p>How can text analytics methods be used to automate organization/categorization? Rule-based methods / Machine learning methods</p>
<p>But rule-based methods work for classification too. Rule-based text categorization is often used in social media sentiment classification.</p>
<p>How do we use rules to identify sentiment? We use lists of negative and positive words (usually adjectives), available in the AFINN word list, and match them in the text:</p>
<p>I am sad that Steve Jobs died.</p>
<p>Negative: sad, bad, evil, distraught, dead, died / Positive: thrilled, excited, amazed, happy, love, joy</p>
<p>Can we use entity and relation extraction to do better? I am sad that [Steve Jobs died].</p>
<p>Analysis: the negative (-ve) entity "sad" is related to the negative (-ve) event "Steve Jobs died", so this person holds a positive opinion of Steve Jobs.</p>
<p>How to choose between text analytics methods for text categorization?</p>
<table>
<tr><th></th><th>Rule-based methods</th><th>Machine learning methods</th></tr>
<tr><td>Precision</td><td>Typically higher</td><td>Typically lower</td></tr>
<tr><td>Flexibility</td><td>Typically less</td><td>Typically more</td></tr>
<tr><td>Recall</td><td>Typically less</td><td>Typically higher (and typically higher overall performance)</td></tr>
</table>
<p>How do we unlock the value in text? The third use case:</p>
<p>Output Text / Store and Transform Text / Ingest and Analyze Text</p>
<p>Question Answering: generating a response to an inquiry</p>
<p>Use Case 3: Answering Questions</p>
<p>Let's say you get a question and you want the answer to be one of: Yes / No</p>
<p>Do you ship your cars to Boston, MA?</p>
<p>First you classify the question into one of 3 types:</p>
<p>Yes/No questions: Do you ship your cars to Boston, MA?</p>
<p>Factoid questions: Who is the CEO of Apple?</p>
<p>Non-factoid questions: Why is the sky blue?</p>
<p>Then look for answers in databases that you created using entity / relation extraction:</p>
<table>
<tr><th>Product</th><th>Ships To</th></tr>
<tr><td>Cars</td><td>USA</td></tr>
</table>
<table>
<tr><th>CEO</th><th>Firm</th></tr>
<tr><td>Tim Cook</td><td>Apple</td></tr>
</table>
<p>To watch out for (Text Analytics Traps): Testing on Training Data / Using US Training Data for India / Treating All Data Sources as One</p>
<p>To embrace (Text Analytics Tricks): UI Compensation for AI Inaccuracy / Raising Precision at the Cost of Recall / Domain-Specific Rules</p>
<p>About Aiaioo Labs: AI Research Lab. http://aiaioo.com http://aiaioo.com/publications http://aiaioo.wordpress.com</p>
<p>THANK YOU</p>
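<p>The GATE/ANNIE slide describes matching lists of names and places against the text. Below is a minimal Python sketch of that idea. It assumes the toy lists shown on the slide and uses two hypothetical entity rules: a first name followed by a last name gives a Person, and a city, a comma and a state give a Location. It adds one hypothetical relation rule: Person "of" Location gives the Location of Reporter. This is not ANNIE's API. ANNIE itself is a Java pipeline with its own gazetteer format and JAPE grammar rules, and every function name here is illustrative.</p>

```python
import re

# Toy gazetteers copied from the GATE/ANNIE slide (real ANNIE lists are far larger).
FIRST_NAMES = {"Jack", "Jill", "John"}
LAST_NAMES = {"Chambers", "Miller", "Farnsworth"}
CITIES = {"Springfield", "Boston", "Cambridge"}
STATES = {"MA", "CA", "MD"}


def tag_tokens(text):
    """Gazetteer lookup: split text into word/punctuation tokens and tag list matches."""
    tagged = []
    for tok in re.findall(r"\w+|[^\w\s]", text):
        if tok in FIRST_NAMES:
            label = "FIRST"
        elif tok in LAST_NAMES:
            label = "LAST"
        elif tok in CITIES:
            label = "CITY"
        elif tok in STATES:
            label = "STATE"
        else:
            label = None
        tagged.append((tok, label))
    return tagged


def extract_entities(tagged):
    """Hypothetical rules: FIRST LAST -> Person; CITY , STATE -> Location.

    Returns (type, text, start, end) tuples with token offsets, end exclusive.
    """
    entities = []
    i = 0
    while i < len(tagged):
        tok, label = tagged[i]
        if label == "FIRST" and i + 1 < len(tagged) and tagged[i + 1][1] == "LAST":
            entities.append(("Person", f"{tok} {tagged[i + 1][0]}", i, i + 2))
            i += 2
        elif (label == "CITY" and i + 2 < len(tagged)
              and tagged[i + 1][0] == "," and tagged[i + 2][1] == "STATE"):
            entities.append(("Location", f"{tok}, {tagged[i + 2][0]}", i, i + 3))
            i += 3
        else:
            i += 1
    return entities


def location_of_person(tagged, entities):
    """Hypothetical relation rule: a Person, then the word 'of', then a Location."""
    relations = []
    for a, b in zip(entities, entities[1:]):
        if (a[0] == "Person" and b[0] == "Location"
                and b[2] == a[3] + 1 and tagged[a[3]][0] == "of"):
            relations.append((a[1], "location_of", b[1]))
    return relations


sentence = ("John Chambers of Springfield, MA reported a problem with the clutch "
            "on his Ford Ranger purchased in Boston, MA in 2005.")
tagged = tag_tokens(sentence)
entities = extract_entities(tagged)
print([(kind, text) for kind, text, _, _ in entities])
print(location_of_person(tagged, entities))
```

<p>On the Use Case 1 sentence, this finds John Chambers, Springfield, MA and Boston, MA. The relation rule attaches only Springfield, MA to the reporter. Ford Ranger is not found, because the slide shows no list of vehicle models. A rule-based system would need one more list to fill the Product column.</p>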
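<p>The door-height slides contrast a rule written by an analyst with a rule the computer learns from examples. The sketch below is a hedged illustration of the simplest learned rule: a single threshold, or decision stump. It tries a cut point halfway between each pair of adjacent heights and keeps the one with the fewest training errors. The labeled heights are hypothetical, because the heights in the slide's figure are not given in the text. The function names are illustrative.</p>

```python
# Hypothetical labeled examples: (height in feet, label).
EXAMPLES = [(5.0, "Short"), (5.5, "Short"), (5.75, "Short"),
            (6.25, "Tall"), (6.5, "Tall"), (7.0, "Tall")]


def analyst_rule(door_height):
    """Analytics: the hand-written rule from the slide."""
    return "Short" if door_height < 6 else "Tall"


def learn_threshold(examples):
    """Machine learning in miniature: fit a one-feature decision stump.

    Tries a cut halfway between each pair of adjacent distinct heights and
    keeps the one with the fewest errors under 'Short if height < cut else Tall'.
    """
    heights = sorted({h for h, _ in examples})
    if len(heights) < 2:
        raise ValueError("need at least two distinct heights to place a cut")
    candidates = [(lo + hi) / 2 for lo, hi in zip(heights, heights[1:])]

    def errors(cut):
        return sum(("Short" if h < cut else "Tall") != label for h, label in examples)

    return min(candidates, key=errors)


learned_cut = learn_threshold(EXAMPLES)
print(learned_cut)  # 6.0 for these examples
print(analyst_rule(5.9), "Short" if 5.9 < learned_cut else "Tall")
```

<p>With these examples, the learned cut lands at 6.0 feet, the same boundary the analyst chose by hand. With different examples, the learned boundary would move without anyone rewriting the rule.</p>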
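<p>The text categorization slides say to collect samples for each category, train a classifier and run it on new text, but they do not name a classifier. One common choice is multinomial Naive Bayes. The sketch below is a from-scratch version with add-one (Laplace) smoothing, trained on the sample snippets shown on the slides. The class and method names are hypothetical. Ignoring words never seen in training is a simplifying assumption.</p>

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log prior + sum of log smoothed word likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Training snippets taken from the slides.
topics = NaiveBayes().train([
    ("The United Nations", "Politics"),
    ("The United States and", "Politics"),
    ("Manchester United", "Sports"),
    ("Manchester and Barca", "Sports"),
])
print(topics.classify("Nations and States"))

intents = NaiveBayes().train([
    ("I have a problem", "Report"),
    ("This complaint is about", "Report"),
    ("Where can I buy a", "Inquiry"),
    ("Do you sell furniture", "Inquiry"),
])
print(intents.classify("John Chambers of Springfield, MA reported a problem with the clutch"))
```

<p>With these four snippets, "Nations and States" comes out as Politics, matching the slide, because "nations" and "states" appear only in the Politics samples. Four training phrases are far too few for real use; the point is only the collect, train, run workflow.</p>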
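<p>The sentiment slide matches lists of positive and negative words against the text. Below is a minimal sketch of that rule using the words shown on the slide. It counts each match as +1 or -1, which is an assumption: the actual AFINN list gives each word an integer score between -5 and +5, and a fuller version would sum those scores instead.</p>

```python
import re

# Word lists shown on the slide. Each match counts as -1 or +1 here
# (a simplification of AFINN's -5..+5 integer scores).
NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}


def lexicon_sentiment(text):
    """Rule-based sentiment: positive matches minus negative matches."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


print(lexicon_sentiment("I am sad that Steve Jobs died."))  # ('negative', -2)
```

<p>The rule labels "I am sad that Steve Jobs died." as negative. The slide's analysis, however, concludes that the writer holds a positive opinion of Steve Jobs. Word counting cannot see that "sad" is about the event "Steve Jobs died", and that is the gap entity and relation extraction is proposed to fill.</p>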
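<p>Use Case 3 first classifies a question as yes/no, factoid or non-factoid, then looks up the answer in tables built by entity and relation extraction. The sketch below makes several assumptions:</p>
<ul>
<li>The question type comes from a crude first-word rule.</li>
<li>The regular expressions are written only for the slide's example questions.</li>
<li>The STATE_COUNTRY table, needed to know that Boston, MA is in the USA, is a hypothetical addition that does not appear on the slide.</li>
</ul>
<p>It is an illustration of the two steps, not a general question answering system.</p>

```python
import re

YES_NO_STARTS = {"do", "does", "did", "is", "are", "was", "were", "can", "will", "has", "have"}
FACTOID_STARTS = {"who", "what", "when", "where", "which"}

# Tables from the slide, as if filled in by entity/relation extraction.
SHIPS_TO = {"cars": "USA"}      # Product -> Ships To
CEO_OF = {"Apple": "Tim Cook"}  # Firm -> CEO
# Hypothetical extra table (not on the slide): state abbreviation -> country.
STATE_COUNTRY = {"MA": "USA"}


def question_type(question):
    """Crude rule: classify a question by its first word."""
    first = question.strip().split()[0].lower()
    if first in YES_NO_STARTS:
        return "yes/no"
    if first in FACTOID_STARTS:
        return "factoid"
    return "non-factoid"


def answer(question):
    """Answer from the tables, or return None when the tables cannot help."""
    kind = question_type(question)
    if kind == "factoid":
        m = re.search(r"CEO of (\w+)", question)
        if m and m.group(1) in CEO_OF:
            return CEO_OF[m.group(1)]
    elif kind == "yes/no":
        m = re.search(r"ship your (\w+) to [^,]+, ([A-Z]{2})", question)
        if m:
            shipped_to = SHIPS_TO.get(m.group(1).lower())
            country = STATE_COUNTRY.get(m.group(2))
            if shipped_to and country:
                return "Yes" if shipped_to == country else "No"
    return None  # e.g. non-factoid questions need a different approach


for q in ["Do you ship your cars to Boston, MA?",
          "Who is the CEO of Apple?",
          "Why is the sky blue?"]:
    print(q, "->", question_type(q), "->", answer(q))
```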
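<p>The comparison tables and the "Raising Precision at the Cost of Recall" trick both rest on two measures:</p>
<ul>
<li>Precision: the share of extracted items that are correct.</li>
<li>Recall: the share of correct items that were extracted.</li>
</ul>
<p>The sketch below computes both and shows the trade-off as a confidence threshold rises. The scored extractions are hypothetical, including the spurious "clutch".</p>

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold.

    By convention here, an empty prediction set has precision 1.0.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 1.0
    recall = correct / len(gold) if gold else 1.0
    return precision, recall


# Hypothetical extractor output with confidence scores, and the true entities.
SCORED = [("John Chambers", 0.95), ("Springfield, MA", 0.90),
          ("Ford Ranger", 0.60), ("clutch", 0.55)]
GOLD = {"John Chambers", "Springfield, MA", "Ford Ranger"}

for threshold in (0.5, 0.8):
    kept = [text for text, score in SCORED if score >= threshold]
    p, r = precision_recall(kept, GOLD)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

<p>Raising the threshold from 0.5 to 0.8 drops both the spurious match and the correct but low-confidence Ford Ranger. Precision rises from 0.75 to 1.00, while recall falls from 1.00 to 0.67.</p>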