Brave New Search World
Ran Hock
Online Strategies
[email protected]



Page 2: Brave new search world

Brave New Search World

• The nature of “search” is changing radically.
• Structure is being created from (relatively) unstructured data.
• The “Semantic Web” is becoming an actuality.
• Natural Language Processing (NLP) and other technologies are being extensively applied to search and search-related activities.

Page 3: Brave new search world

Brave New Search World

• These technologies are making the following kinds of things happen:
  – “Knowledge graphs”
  – “Entity” identification in numerous applications
  – Natural language search statements
  – Actual searching of images (not just of image metadata)
• These advances are coming not just from Google but from numerous services, especially for “news” search.

Page 4: Brave new search world

Some Themes/Perspectives

• What is happening is more evolutionary than revolutionary. Many, but not all, of the "pieces" of the technology have been around for a while.
• Structure is being derived out of (not total) chaos. We are going from words to meaning.
• Google isn’t the only player here.
• We can take real advantage of the developments.
• Using what you already know about “search” is important.

Page 5: Brave new search world

Unstructuredness of Data

• Part of the “organization of knowledge” problem
• Particularly acute for textual material
• To a computer, a “word” is a string of characters bounded by spaces or punctuation and has no “meaning”.
• When we are searching for something, we are searching for meaningful things, not character strings.
• Meaning can be derived from context by the use of NLP.
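
To make the "string of characters" point concrete, here is a minimal Python sketch. It assumes a simple regular-expression tokenizer (an illustration, not any particular search engine's method). Both sentences yield the identical token "jaguar", and nothing in the token itself says whether an animal or a car is meant.

```python
import re

def tokenize(text):
    """Split text into lowercase 'words': runs of letters or digits
    bounded by spaces or punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

sentences = [
    "The jaguar stalked its prey.",
    "The Jaguar accelerated down the motorway.",
]

for sentence in sentences:
    # The program only sees character strings; "jaguar" looks the same in both.
    print(tokenize(sentence))
```

Telling the two senses apart requires looking at the neighbouring words, which is the job of the NLP techniques discussed later.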

Page 6: Brave new search world

Where We Were Recently

• Boolean Logic
  – Actually a precursor/example of Artificial Intelligence (AI) applied to “search”.
  – Still a part of search AI
• Boolean is (from our infancy) a central aspect of how we think, a part of our “consciousness”
• Old approach: Searching by concepts

Page 7: Brave new search world

Where We Were Recently

“Old” (circa 1975 – 2???) search strategy (searching by “concepts”)

[Figure: concept-based search strategy diagram; only the label “OR” survives.]
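
The concept-based strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions: the usual "building block" reading is assumed, in which synonyms within one concept are ORed and the concepts are then ANDed together; the sample documents, the `build_index` and `concept_search` names, and the whitespace tokenization are all hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def concept_search(index, concepts):
    """OR the terms inside each concept, then AND the concepts together."""
    result = None
    for synonyms in concepts:
        hits = set()
        for term in synonyms:
            hits |= index.get(term, set())                      # OR within a concept
        result = hits if result is None else result & hits     # AND between concepts
    return result or set()

docs = {
    1: "teenagers and smoking habits",
    2: "adolescents who use tobacco",
    3: "smoking among adults",
    4: "teenagers and diet",
}
index = build_index(docs)

# (teenagers OR adolescents) AND (smoking OR tobacco)
matches = concept_search(index, [["teenagers", "adolescents"], ["smoking", "tobacco"]])
print(sorted(matches))
```

Document 2 is retrieved even though it shares no word with "teenagers smoking"; the searcher had to supply the synonyms by hand, which is exactly the work newer systems try to automate.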

Page 8: Brave new search world

Where We Were Recently (cont.)

• Ranking of web search results was/is based on a wide range (ca. 200) of factors, or “signals”
• User-controlled field searching (intitle: etc.)
• Etc.

Page 9: Brave new search world

The “Newer” Technologies

• Semantic Web Technologies
• Artificial Intelligence (AI) used at a broad level and utilizing various AI subfields
• AI - Expert Systems approaches
• AI - Natural Language Processing (NLP)
• AI - NLP - Entity identification (extraction, disambiguation, classification, etc.)
• AI - Machine Learning
• Big Data processing

Page 10: Brave new search world

Technologies: The Semantic Web

• W3C “informal” definition – "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (from Tim Berners-Lee et al., The Semantic Web. Scientific American, May 2001.)

Page 11: Brave new search world

Technologies: The Semantic Web

• Essence:
  – “strings to things”
  – “words to meaning”
• Technologically accomplished on webpages by means of a specialized XML markup language, etc.

Page 12: Brave new search world

Technologies: The Semantic Web

• Idea born pre-1999
• In practice, also requires other technologies such as Natural Language Processing, etc.
• 2006 - Berners-Lee and colleagues stated that: "This simple idea…remains largely unrealized".
• 2013 - more than four million Web domains contained Semantic Web markup.

Page 13: Brave new search world

Technologies: AI - Expert Systems

• Search results ranking has long used an “expert systems” approach, mimicking what an experienced researcher looks for:
  – Words appearing in the title
  – Number of times cited (linked-to)
  – Proximity of words
  – Words in the abstract
  – Words in headings
  – Etc.
• This will continue, more and more automatically.
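
A rule-based ranker of this kind might look roughly like the Python sketch below. The weights, the document fields, and the `score` and `min_gap` helpers are hypothetical choices for illustration; no search engine publishes its actual signals in this form.

```python
# Hypothetical weights for each "expert" signal; real engines tune many more.
WEIGHTS = {"title": 3.0, "heading": 2.0, "abstract": 1.5, "inlinks": 0.1, "proximity": 2.0}

def words(text):
    return text.lower().split()

def min_gap(tokens, terms):
    """Smallest distance (in words) between two different query terms, or None."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in terms]
    best = None
    for (i, a), (j, b) in zip(positions, positions[1:]):
        if a != b and (best is None or j - i < best):
            best = j - i
    return best

def score(doc, query_terms):
    """Add up weighted signals the way an experienced researcher might weigh them."""
    terms = {t.lower() for t in query_terms}
    s = WEIGHTS["title"] * len(terms & set(words(doc["title"])))
    s += WEIGHTS["heading"] * len(terms & set(words(" ".join(doc["headings"]))))
    s += WEIGHTS["abstract"] * len(terms & set(words(doc["abstract"])))
    s += WEIGHTS["inlinks"] * doc["inlinks"]        # times cited / linked-to
    gap = min_gap(words(doc["body"]), terms)
    if gap is not None:
        s += WEIGHTS["proximity"] / gap             # closer terms score higher
    return s

doc_a = {"title": "solar energy storage", "headings": ["battery storage"],
         "abstract": "solar storage", "body": "solar energy storage systems", "inlinks": 10}
doc_b = {"title": "garden notes", "headings": [], "abstract": "",
         "body": "solar lamps need no storage", "inlinks": 1}

query = ["solar", "storage"]
ranked = sorted([("a", doc_a), ("b", doc_b)], key=lambda d: score(d[1], query), reverse=True)
print([name for name, _ in ranked])
```

The design is deliberately transparent: every signal is a hand-written rule, which is what distinguishes this "expert systems" style from the machine-learned ranking described later in the deck.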

Page 14: Brave new search world

Technologies: Natural Language Processing

• A part of artificial intelligence and computational linguistics

• Deals with helping computers “understand” written and spoken languages

• Plays a key role in voice input for search, natural language search statements, translations, and more.

Page 15: Brave new search world

Technologies: Natural Language Processing

Google's syntactic systems
• predict part-of-speech tags for each word in a given sentence,
• identify morphological features such as gender and number,
• label relationships between words, such as subject, object, modification, etc.,
• leverage large amounts of unlabeled data,
• incorporate neural net technology.

research.google.com/pubs/NaturalLanguageProcessing.html

Page 16: Brave new search world

Technologies: Natural Language Processing

Google’s semantic systems
• identify entities in free text,
• label them with types (such as person, location, or organization),
• cluster mentions of those entities within and across documents (co-reference resolution),
• incorporate multiple sources of knowledge and information to aid with analysis of text.

research.google.com/pubs/NaturalLanguageProcessing.html

Page 17: Brave new search world

Technologies: Entity Extraction

• A.k.a. named-entity recognition, entity identification
• Complementary to other natural language processing
• Identifies things, people, places, etc. within text (and speech).
• Relates to the idea of concepts referred to earlier.
• Because “text” is based on language, “structure” is there but the structure is not readily evident to a computer.

Page 18: Brave new search world

Technologies: Entity Extraction

• Context-based connections allow discernment of different meanings of a word.
• Entity extraction draws inferences based on the logical content of the data.
• Entity extraction may be the single most important tool for bringing structure to unstructured data, specifically text.
• Also used for search query “suggestions”.
• An excellent example is found in Silobreaker.
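
As a toy illustration of context-based disambiguation, the Python sketch below looks up words in a tiny hand-made gazetteer and picks the entity type whose clue words overlap most with the surrounding text. The gazetteer, the clue words, and the `extract_entities` function are hypothetical; production systems such as Silobreaker presumably rely on far richer statistical models.

```python
import re

# Hypothetical mini-gazetteer: surface string -> candidate (type, context clue words)
GAZETTEER = {
    "jaguar": [("ANIMAL", {"prey", "jungle", "cat"}),
               ("COMPANY", {"car", "motorway", "engine"})],
    "paris": [("PLACE", {"france", "city"})],
}

def extract_entities(text):
    """Return (entity, type) pairs, using the other words in the text as context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set(tokens)
    found = []
    for token in tokens:
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue
        # Choose the candidate whose clue words overlap most with the context.
        label, _ = max(candidates, key=lambda c: len(c[1] & context))
        found.append((token, label))
    return found

print(extract_entities("The Jaguar stalked its prey in the jungle."))
print(extract_entities("The Jaguar engine roared on the motorway near Paris."))
```

The same character string "jaguar" is typed differently in the two sentences, which is the step from "strings" to "things" that gives unstructured text a usable structure.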

Page 22: Brave new search world

Technologies: Machine Learning

Computers teaching themselves

Google RankBrain
• Used in processing search results, part of Google’s Hummingbird search algorithm
• A way of interpreting a search statement in order to find web pages that may not have the specific words in the search statement.
• Uses patterns from seemingly unconnected other “complex” searches to find similarities with the current search, then applies that information to find the content most likely to be useful.
• Google regards this as the third most important signal.

Page 23: Brave new search world

Technologies: Big Data

• The existence of “big data” collections provides unprecedented opportunities for computational approaches that help computers “understand” text.
• In image entity identification experiments using neural networks, the accuracy of machine learning algorithms improves vastly when used with large pools of data.
• "...Google’s search engine queries a 100 petabyte index that incorporates over 200 indicators and whose algorithms change more than 500 times per year."

Page 24: Brave new search world

Specific Applications of These (and Other) Technologies

• Continued gradual incorporation of “expert” techniques
• Natural language search statements
• Search by voice
• Image recognition and search: search of images, search by image, and facial recognition
• Knowledge Graphs
• Entities in news search

Page 25: Brave new search world

Gradual Incorporation of “Expert” Techniques

• An “ordinary” search isn’t what it used to be.
• Google has quietly taken over more of the “old” “professional searcher” techniques and now automatically adds not just word variants, but synonyms.

Page 26: Brave new search world

Gradual Incorporation of “Expert” Techniques

• Suggested searches (based on known connections and not just based on your character string)

A "data-driven" approach - trillions of words, vs. "rules". Not just word variants.

• The old “synonyms” (~diet) option didn’t just go away. It is now applied automatically. (Few people use the OR.)
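
What "applied automatically" amounts to can be sketched as a query rewrite. The Python below is a minimal illustration assuming a small hand-written synonym table (`SYNONYMS` and `expand_query` are hypothetical names); as the slide notes, a real engine derives such relations from data rather than from rules.

```python
# Hypothetical synonym table; a real engine would learn such relations from data.
SYNONYMS = {
    "diet": ["nutrition", "food"],
    "car": ["automobile", "auto"],
}

def expand_query(query):
    """Rewrite each word as an OR group of itself and its known synonyms."""
    groups = []
    for word in query.lower().split():
        alternatives = [word] + SYNONYMS.get(word, [])
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(word)
    return " ".join(groups)

print(expand_query("healthy diet"))
# healthy (diet OR nutrition OR food)
```

The searcher types two words; the engine quietly runs the OR statement that a professional searcher would once have written out by hand.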

Page 27: Brave new search world

Gradual Incorporation of “Expert” Techniques

• “Did you mean” is now more often “Showing results for”

Page 28: Brave new search world

Gradual Incorporation of “Expert” Techniques

• “Fuzzy Logic” – As well as searching for words that are “close”, Google may drop some of your “concepts” for some records
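
Both behaviours (matching "close" words and quietly dropping a concept) can be imitated with Python's standard `difflib` module. This is a rough sketch, not Google's method: the `soft_match` function, the 0.8 similarity cutoff, and the "at least N concepts" rule are assumptions made for illustration.

```python
from difflib import get_close_matches

def soft_match(doc_words, concepts, min_concepts):
    """True if at least `min_concepts` of the query concepts appear in the
    document, counting near-identical spellings as matches."""
    vocabulary = list(set(doc_words))
    matched = 0
    for term in concepts:
        # A concept counts if some document word is at least 80% similar to it.
        if get_close_matches(term, vocabulary, n=1, cutoff=0.8):
            matched += 1
    return matched >= min_concepts

doc = "organising a conference in berlin".split()
query = ["organizing", "conference", "budget"]

print(soft_match(doc, query, min_concepts=3))  # strict: every concept required
print(soft_match(doc, query, min_concepts=2))  # relaxed: one concept may be dropped
```

Under the relaxed setting the document is returned even though "budget" never appears in it and "organizing" is spelled differently, which is why results may not contain every word you typed.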

Page 29: Brave new search world

Gradual Incorporation of “Expert” Techniques

– If Google “thinks” you want specific facts and “sees” a matching answer, you may get that immediately.

Page 30: Brave new search world

Specific Applications: Natural Language Search Statements

• Don’t hesitate to use them!

• The above two searches give different (and relevant) answers

• This is especially important for Google Now and Siri!

Page 31: Brave new search world

Specific Applications: Voice Search

• Apple (iOS) - Siri
• Google – Google Now
• Bing – Cortana (recently deceased?)
• These “expect” natural language, so natural language will yield the best results.

Page 32: Brave new search world

Specific Applications: Image Recognition and Search: Search of Images

Not much recent obvious change in Bing’s or Google’s regular image search, but:
• “Categorization” (aspect of entity extraction) is now shown on image search results pages
• Google, Microsoft (Bing) and Apple are heavily into research on image identification and classification.
• What’s happening/coming can be anticipated by looking at Google Photos.

Page 33: Brave new search world

Specific Applications: Image Recognition and Search: Search of Images

Bing Image Search

Page 34: Brave new search world

Specific Applications: Image Recognition and Search: Search of Images

Page 35: Brave new search world

Specific Applications: Image Recognition and Search: Search of Images

• In December 2015, Microsoft beat out 5 competitors (including Google) in the ImageNet contest for machine recognition of images
• Machines were trained to recognize images using a “deep neural networking” method.
• Competitors must locate and identify objects from 100,000 photographs found in Flickr and search engines and then place them in 1,000 object categories.
• Microsoft, the winner, had an error rate of 3.5 percent for classification and 9 percent for localization.
• Machine learning using neural networking is also very successfully used for translations, such as in Skype’s new translation offering

Page 36: Brave new search world

Specific Applications: Image Recognition and Search: Search by Image

Page 37: Brave new search world

Specific Applications: Image Recognition and Search: Entity and Facial Recognition in Google Photos

Page 38: Brave new search world

Specific Applications: Knowledge Graphs

• Knowledge graphs do not originate with Google (but Google has made the term widely known.)

• “Knowledge graph theory was initiated by C. Hoede, a discrete mathematician at the University of Twente and F.N. Stokman, a mathematical sociologist at the University of Groningen, both in the Netherlands.” (ca 1982) http://doc.utwente.nl/64931/1/memo1876.pdf

Page 39: Brave new search world

Specific Applications: Google Knowledge Graph

• The Google Knowledge Graph, overall, is a database about “things” and the connections between those things.

• Delivers and summarizes key facts about people, places, things.

• The selection of those facts is based on connections regarding that entity and related entities and on what other users have asked about that entity.

Page 40: Brave new search world

Specific Applications: Google Knowledge Graph

• Launched May 2012
• At its heart, Google Knowledge Graph is a database of facts.
• At that time it contained 18 billion facts between 570 million objects.
• The kinds of things included vary with the kind of entity.
• Content comes primarily from Wikipedia, World Factbook, Freebase/Wikidata, plus other sources.
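
A "database of facts" about things and their connections is commonly modelled as subject-predicate-object triples. The Python sketch below is a minimal illustration of that idea; the `TinyKnowledgeGraph` class, its method names, and the predicate labels are hypothetical and say nothing about how Google's system is actually built.

```python
from collections import defaultdict

class TinyKnowledgeGraph:
    """Stores facts as (subject, predicate, object) triples."""

    def __init__(self):
        self.facts = defaultdict(list)   # subject -> [(predicate, object), ...]

    def add(self, subject, predicate, obj):
        self.facts[subject].append((predicate, obj))

    def summary(self, subject):
        """Key facts about one entity, as shown in a knowledge panel."""
        return dict(self.facts.get(subject, []))

    def related(self, subject):
        """Objects of this entity's facts that are themselves known entities."""
        return {obj for _, obj in self.facts.get(subject, []) if obj in self.facts}

kg = TinyKnowledgeGraph()
kg.add("Marie Curie", "born_in", "Warsaw")
kg.add("Marie Curie", "field", "Physics")
kg.add("Marie Curie", "spouse", "Pierre Curie")
kg.add("Pierre Curie", "field", "Physics")

print(kg.summary("Marie Curie"))
print(kg.related("Marie Curie"))
```

Because "Pierre Curie" is both the object of one fact and the subject of another, the graph can move from one entity to a related one, which is the kind of connection a knowledge panel surfaces.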

Page 43: Brave new search world

Specific Applications: Google Knowledge Graph

• The key power of Google Knowledge Graph lies in its utilization of connections between entities as searched for by other users.
• At present, its main weakness is its heavy, un-vetted reliance on Wikipedia, which is not always right, e.g., the Wikipedia article on Knowledge Graph.

Page 44: Brave new search world

WRONG!

Page 46: Brave new search world

Bing’s Knowledge Graph

• Named “Snapshot”, it uses Bing’s Satori technology
• Launched in June 2012
• Utilizes Wikipedia, Freebase, Qwiki, LinkedIn, Britannica, etc.
• Builds into results interactive features such as audio and video

Page 49: Brave new search world

Specific Applications: News Applications

Examples of News Sites Effectively Using These Technologies

• Silobreaker (example shown earlier)
• EMM

Page 50: Brave new search world

Specific Applications: News Applications

EMM – Europe Media Monitor
• From the European Commission
• Computerized analysis of news trends and story content
• Makes extensive use of NLP techniques for entity extraction and clustering
• “Organizes” a vast quantity of knowledge very efficiently.

Page 54: Brave new search world

So, how do we as researchers take advantage of this?

• Get in the habit of using what's new (Siri, Google Now, natural language). Join the Evolution!

• Actually pay attention to Google Instant (suggestions).

• Don't forsake the old. There are times when you need to turn the auto-pilot off and take charge.

• Ask questions you didn't bother asking before [because you didn't think the search engine would do it.]

Page 55: Brave new search world

So, how do we as researchers take best advantage of this?

• Increase awareness of information quality criteria
• Worry a bit
  – Worrisome - the general public's further reliance on quick, single, local, twitter-length answers
  – Worrisome - Localization
  – Worrisome - "echo chambers"
  – "Machines making decisions on our behalf"

• Enjoy the new.

Page 56: Brave new search world

Questions?

Ran Hock
Online Strategies
[email protected]