18
Career Opportunities for Linguists in Industry Vita Markman Computational Linguistics Engineer DIMG

Career Opportunities for Linguists in Industry Vita Markman Computational Linguistics Engineer

  • Upload
    diallo

  • View
    52

  • Download
    0

Embed Size (px)

DESCRIPTION

Career Opportunities for Linguists in Industry Vita Markman Computational Linguistics Engineer DIMG. Linguists in Industry??!!!. Overview I. What can a linguist do in industry? II. Some concrete examples of what linguists work on “in the real world” - PowerPoint PPT Presentation

Citation preview

Page 1: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Career Opportunities for Linguists in Industry

Vita MarkmanComputational Linguistics Engineer

DIMG

Page 2: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Vita Markman 2

Linguists in Industry??!!!Overview

I. What can a linguist do in industry?

II. Some concrete examples of what linguists work on “in the real world”

II. From school to work: what skills/knowledge one needs

IV. Useful References

V. Conclusion and Q&A

Page 3: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

3

But Why?Because:●Having options is good●You may want to take a year off before going to grad school, yet work on something relevant● It's really fun●Research jobs are NOT confined to the academic setting●Not all industry jobs are like from 'The Office Space'

Page 4: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

4

Explosion of Text Data● Explosion of linguistic data online

● Social media (twitter, facebook, blogosphere)

● Linguistic data is not easily amenable to analysis. It requires much processing and much insight into the nature and structure of language

● Modifying old NLP techniques and tools as well as inventing new ones: for example, no off-the-shelf parser can handle Twitter conversations.

Page 5: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

5

A linguist in the industry?

Examples:●Information retrieval (esp. search engines that do semantic search)

●Voice recognition/generation – think 'google voice'!

●Text classification and text clustering

●Text mining – finding grains of useful info in unstructured text

●E-discovery

●Analyzing the language of social media (topic and sentiment extraction from short fragmented and noisy data)

Page 6: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

6

Chat filtering: an example● Computational linguistics application for on-

line virtual worlds

● Ensuring safety of on-line chat environments

● Filtering chat for appropriate content

● Filtering is an example of a classification problem: classify text as ‘appropriate’ or ‘inappropriate’

Page 7: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Chat filtering: an example

The problem of determining whether lines involve inappropriate content is similar to spam detection

Simple word/phrase matches are not enough. It will be too aggressive and not be enough:

some nefarious lines may pass throughMost inappropriate talk is made up of

completely innocent words!

Page 8: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Chat filteringPeople use innocent words to make inappropriate phrases

People also find ways to say things with MixEdCAse or b,r,o.k.en w.o.r,ds that get around the filter

We want: a general way of saying: “if you see MixED CaSE or br.ok.en word,s it is probably a sign of something bad”

We also want: to capture inappropriate combinations of innocent words

A solution: a model, much like the ones used for spamWhat is the general idea?

Page 9: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

The idea behind filteringLook at the words and other features that make up

appropriate and inappropriate chat, and ask how likely is a word to appear in inappropriate chat?

Example: If a pair of words such as “stew pit” never appears in regular, appropriate chat, it is probably an indicator of something inappropriate.

Having done that, we can ask for a new chat segment, is it likely to be inappropriate?

Page 10: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

That said…Filtering is an example of text classification,

used very broadlyClassifying documents by content/topic

requires a labeled set of data and a learning algorithm

The difficult part is not the algorithm itself, but the fine-tuning of its parameters and manipulating and preprocessing the data

This is where creative and analytical thinking becomes truly crucial and the job becomes really fun!

Page 11: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Another example: TwitterClustering super-short twitter posts by topic Clustering = finding groups in unlabeled dataThis data is very noisy and fragmentedlets see im on lates on monday so dont start till two and could get down

from work .......mail me xxx 7/16/2009 11:50:26 AMWell i'm watching mate running at silly o'clock but i'll be free in the late

afternoon 7/17/2009 8:18:12 AM

Resistant to parsing and part of speech taggingGoal: arrive at K clusters, each representing a

topic of the post! Problem: very little to go on!

Page 12: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

TwitterPossible solutions: padding – adding more relevant

words to each postApplying spellcheckers to reduce the noise in the

dataRemoving various content-less function words,

known as stop-words. Since even those posts that share a common topic,

do not often share many words in common, look at the mutual contexts in which the words in posts appear

This technique is known as Latent Semantic Analysis

Page 13: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

13

From school to workplace●What skills are needed for a future (relevant) career in computational linguistics ?–Statistics: basic understanding of sampling methods and probabilistic reasoning–Some linear algebra, esp. as it relates to matrix and vector manipulation–Some calculus–Machine learning algorithms that are used commonly in NLP such as Naïve Bayes, HMM, and Expectation Maximization–Some programming (python, java, C++)

Page 14: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

14

From school to workplace cont'dHow can these skills be learned?

●Formal instruction●Self-teaching●Taking a class on-line

Page 15: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Companies that employ linguistsGoogle (Bay area, Los Angeles)Yahoo (Bay area, Los Angeles)IBMMicrosoft (Seattle, WA)Smaller search engines and semantic search

engines: Ask.comAutonomy (Bay area)H5 (San Francisco)Cognition (Los Angeles)

Companies that do Machine Translation (Systran, Language Weaver)

Entertainment Companies (Los Angeles)

Page 16: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

Learning while workingInternships: Companies hire students to work

and learn

This can be an invaluable experience !

It can really supplement classroom learning via the actual application of the learned material

Sometimes academic knowledge and practical application do not go hand-in-hand…

Page 17: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

17

Resources●Association for Computational Linguistics (Conference in Portland OR in June, 2011)

●KDD Nuggets – a website for data mining and knowledge discovery, contains useful links to tutorials, news, and jobs

●Dice.com – job website for technical jobs only

●NLTK.org – natural language toolkit in python. Can be used to try a 'do it yourself' document classification and clustering.

●NLP Group at Information Science Institute at USC, located in Marina del Rey – great talks to get an overview of what’s going on in the field

●Books and articles:

●Jurafsky and Martin 2009 - The bible of computational linguistics

●Chris Manning 2008 – Information Retrieval

●Mitchel 1997 Machine Learning;

●WEKA – Data Mining free software and book (Witten and Frank 2005)

Page 18: Career Opportunities for Linguists in Industry  Vita Markman Computational Linguistics Engineer

18

Conclusion●As a linguist, you can do lots of interesting work in a non-academic setting●You must supplement your knowledge of linguistics by some math and computer science knowledge and you are good to go!●Bottom line: all you really need is to be open to learning new things, which is exactly why we all go to school in the first place!

Thank you!