Search and the ‘Net in 2015
Michael Hunter, Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library
2015
For today . . .
The Searchscape Behind the Screen:
Current Web Search Developments
New Services
The Social Web and Research
Data Visualization
Bing, Yahoo and DuckDuckGo
Google
Linklist: http://people.hws.edu/hunter/searchnet15links.htm
USC Annenberg’s Digital Future Report 2014
http://www.digitalcenter.org/wp-content/uploads/2014/12/2014-Digital-Future-Report.pdf
"General Internet Activities"
E-Reading Rises as Device Ownership Jumps
By Kathryn Zickuhr and Lee Rainie
http://www.pewinternet.org/2014/01/16/e-reading-rises-as-device-ownership-jumps
American adults 18+ - % who read at least 1 book in that year
American adults 18+ - % who own each device
New Top Level Domains
First made available 1/29/14
Over 150 now live on donuts.co (2/15/15)
Content-significant: .bike, .energy, .delivery, .legal, .guru
Brand-specific (“vanity domains”): .android, .walmart, .nyc
Allow for non-roman scripts (Arabic, Chinese, etc.)
Require proof of identity/relationship to TLD
Unique TLD costs $185,000
Growth of Query Types over 1 year
http://searchenginewatch.com/sew/how-to/2383498/how-will-voice-search-impact-a-search-marketers-world
Voice Search
http://googleblog.blogspot.com/2014/10/omg-mobile-voice-survey-reveals-teens.html
Google's Voice Search (2010): 36 languages
Apple's Siri (2011): 11 languages
MS's Cortana (2014): 6 languages
Study by Northstar: 1400 American smartphone users, 400 age 13-17, 1000 age 18+
40% - ask for directions
39% - dictate a text message
32% - make a phone call
27% - check the weather
23% of 18+ - questions about cooking
51% of 13-17, 32% of 18+ - "just for fun"
Web Access in 2015
Mobile has outpaced Desktop
Web Search in 2015
Who’s crawling the Web?
Google, Bing (aka Yahoo!), Gigablast, Blekko, DuckDuckGo, Baidu, Yandex
Market Share Growth Oct. 2013 – Oct. 2014
Source: www.comscore.com
[Bar chart: search market share (%), 2013 vs. 2014, for Google, Bing, Yahoo!, Ask and AOL]
Behind the Screen: WAY beyond matching keywords
Semantic Processing: internal to the search engine
Predictive Operations: from data about and from the user
ANSWERS, NOT JUST SEARCH RESULTS
Semantic Processes
NLP Parsing
Pattern Matching
Knowledgebase Entities
Structured Data
Term Frequency Data
Unsilo.com
NLP Parsing
Machine-learned meaning derived from human or natural language speech or text. (Adapted from Wikipedia)
Analysis of large sets of documents (corpora) that have been human-annotated with parts of speech and other semantic information
Machine “learns” the relationships and meaning through statistical inference
Visualization at http://nlpviz.bpodgursky.com
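As a rough illustration of how a machine can "learn" from a human-annotated corpus, the sketch below counts how often each word carries each part-of-speech tag in a tiny hand-labeled sample, then tags new text with the most frequent tag. The corpus, the tag names and the function names are all invented for illustration; real systems train far richer statistical models on much larger corpora.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated corpus of (word, part-of-speech) pairs.
# Invented for illustration only.
ANNOTATED = [
    [("the", "DET"), ("duck", "NOUN"), ("swims", "VERB")],
    [("they", "PRON"), ("duck", "VERB"), ("quickly", "ADV")],
    [("a", "DET"), ("duck", "NOUN"), ("quacks", "VERB")],
]

def train(sentences):
    """Count how often each word appears with each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return counts

def tag(words, counts, default="NOUN"):
    """Give each word its most frequent tag; unseen words get the default."""
    result = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        result.append((word, best))
    return result

model = train(ANNOTATED)
tagged = tag(["The", "duck", "quacks"], model)
print(tagged)
```

In the sample, "duck" is labeled as a noun twice and as a verb once, so the statistics favor the noun reading. The same principle operates at much larger scale in production parsers.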
Knowledgebase Entities
Google’s Knowledge Graph – Bing’s Satori
Google’s Knowledge Graph – rooted in the (human) community-created entities in Freebase
Crowdsourcing too slow; often ignores specialized areas of knowledge, non-English content
Knowledge Vault – Automated extraction of raw data and creation of entities derived from that data
DOM trees: structures that help browsers represent and interact with documents in HTML and other formats (Wikipedia)
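To make the idea of a DOM tree concrete, here is a minimal sketch that uses Python's standard `html.parser` module to print the nesting of elements in a small page. The sample HTML and the `OutlinePrinter` class are invented for illustration, the sample is assumed to be well-formed, and this is not a description of how Knowledge Vault itself reads pages.

```python
from html.parser import HTMLParser

class OutlinePrinter(HTMLParser):
    """Collect an indented outline of the element tree while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Each opening tag is one level deeper than its parent.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + repr(text))

# Hypothetical, well-formed sample page.
HTML = "<html><body><h1>Geneva</h1><p>City in <b>New York</b></p></body></html>"

parser = OutlinePrinter()
parser.feed(HTML)
outline = parser.lines
print("\n".join(outline))
```

An automated extractor can use this kind of position information (for example, "the text inside the first heading") as one signal when deciding which strings on a page name an entity and which describe its attributes.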
More Semantic Processing…
Term Frequency Data:
Frequency, proximity, order
Aids in discovery across subject areas, filetypes and entire domains
Pattern Matching Algorithms:
Focuses on recognition of patterns and regularities in text, data and images
Structured Data:
Structured Web tables and data sets (.xls, .kml, .sdf)
Human-created tags – Schema.org
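The three term signals named above (frequency, proximity, order) can be sketched in a few lines. The sample sentence and the function names are invented for illustration; a real engine computes such statistics over its whole index and combines them with many other signals.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, term):
    """All index positions at which a term occurs."""
    return [i for i, t in enumerate(tokens) if t == term]

def term_frequency(tokens):
    """Frequency: how many times each term occurs."""
    return Counter(tokens)

def min_distance(tokens, a, b):
    """Proximity: smallest gap, in tokens, between any a and any b."""
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

def appears_before(tokens, a, b):
    """Order: True if some occurrence of a precedes some occurrence of b."""
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    return bool(pos_a and pos_b) and min(pos_a) < max(pos_b)

doc = "Gun control debates continue; supporters of gun control cite new data."
tokens = tokenize(doc)
freq = term_frequency(tokens)

print(freq["gun"])                                # frequency of "gun"
print(min_distance(tokens, "gun", "control"))     # adjacent terms score 1
print(appears_before(tokens, "gun", "control"))   # phrase order
```

Terms that occur often, close together and in the expected order are stronger evidence that a document is about the phrase "gun control" than the same words scattered across a page.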
Schema.org
Organization backed by G, B, Y and other engines to standardize metadata for use by crawler-based services.
Helps create "real" answers and "rich snippets"
Schema example: Restaurant with a menu
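One way a restaurant page could describe itself with Schema.org vocabulary is a small JSON-LD block. The sketch below builds one in Python. `Restaurant`, `PostalAddress`, `servesCuisine` and `hasMenu` are Schema.org terms; the restaurant name, address and menu URL are placeholders invented for illustration.

```python
import json

# All values below are hypothetical placeholders.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Lakeside Bistro",
    "servesCuisine": "American",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Geneva",
        "addressRegion": "NY",
    },
    "hasMenu": "https://example.com/menu",
}

json_ld = json.dumps(restaurant, indent=2)

# The JSON-LD is embedded in the page inside a script element.
print('<script type="application/ld+json">')
print(json_ld)
print("</script>")
```

Because the cuisine, location and menu are labeled explicitly, a crawler does not have to guess which text on the page is which, and can reuse the values in a rich snippet.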
Predictive Operations: Inferring the user’s intent
“The Holy Grail of Search”
Location-based results (IP and GPS): weather, entertainment, restaurants…
Anonymous past searches and user behavior
Personal data volunteered by user
Time of day
Device used
Semantic Processing and Predictive Operations
--Correctly interpret the query, or a portion of the query
--Give a “best guess” answer based on highly trusted sources (knowledgebase) and similar searches
--Aggregate and grow the knowledgebase through iterative, real-time web crawls
Discovery Apps: Personalized Search on Steroids
Combines your personal preferences, location, demographic characteristics and social network data (people, preferences, interests, events)
Suggests entertainment, restaurants and more
Chat with your social network friends
“Current events you may like within X miles”
Gravy – free on iTunes
Personal Assistant Apps
Connects to your e-mail, calendar and Facebook events
Prompts for transportation times, quickest routes
Includes some discovery and chat features
Relies heavily on user-supplied personal data
Sunrise, Tempo, et al.
Apps and the Deep Web
Currently, crawler-based search engines cannot access content in apps (posts, links, personal data) unless the app allows it.
User must have the app loaded in order to access content, even if it appears in the search engine's results
Education apps continue to grow in content, quality and use
Google is working on it…..
New Services
Qwant
A fresh approach to search
Aims to offer a European-based service that respects users' privacy
No cookies or other tracking of user's search behavior
No filtering of content unless user-initiated
Launched in France in 2013
Search verticals offered: Web, News, Social, Images, Videos, Shopping, Boards (online forums, mostly European)
16 interface languages, which influence search results
Oaddo - www.oaddo.org
Members suggest and curate all content
Members vote on search enhancements and new features
"All cultures are invited to join the conversation"
Hierarchical tabs suggest other relevant concepts, ideas, locations and more.
Only accessible with Chrome, Safari or Firefox
Oaddo - www.oaddo.org
Based on a graph database model
Content stored around subject-focused nodes: facilitates cross-keyword and cross-language retrieval
Logical connections stored in a relationships db: facilitates hierarchical organization and hidden connections
Still in alpha; limited content, but of high quality
Graph Database
http://www.slideshare.net/slidarko/graph-windycitydb2010
Structurally different from a relational database (index-based)
Composed of a set of nodes (e.g. subject clusters) connected to one another by lines (e.g. relationships)
Every element in a node has a pointer to every other element it is related to
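A minimal sketch of the pointer idea: each node keeps its own list of labeled links to related nodes, so related items are found by following links rather than by looking them up in a separate index. The `Graph` class, the node names and the relationship labels are invented for illustration and do not reflect how Oaddo or any particular graph database is implemented.

```python
from collections import defaultdict

class Graph:
    """Each node holds direct, labeled links to the nodes it is related to."""

    def __init__(self):
        # node -> list of (relationship, neighbor)
        self.edges = defaultdict(list)

    def relate(self, a, relationship, b):
        # Store the link on both ends so it can be followed either way.
        self.edges[a].append((relationship, b))
        self.edges[b].append((relationship, a))

    def neighbors(self, node):
        return [n for _, n in self.edges.get(node, [])]

    def reachable(self, start, max_hops):
        """Follow links outward from start for at most max_hops steps."""
        seen = {start}
        frontier = [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for n in self.neighbors(node):
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
        seen.discard(start)
        return seen

g = Graph()
g.relate("Geneva, NY", "located_in", "Finger Lakes")
g.relate("Seneca Lake", "part_of", "Finger Lakes")
g.relate("Finger Lakes", "located_in", "New York")

print(g.reachable("Geneva, NY", 1))
print(sorted(g.reachable("Geneva, NY", 2)))
```

Two hops from "Geneva, NY" already surface "Seneca Lake", a connection that was never stored directly. This is the sense in which a graph model can expose hidden relationships.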
From the top…..
CC Search
search.creativecommons.org/
Searches media in the public domain
Flickr, YouTube, Jamendo, Wikimedia Commons, SoundCloud and others…
Some sponsored results appear that are not in the public domain
Verify use conditions for each result
Search and the Dark Web
Dark Web: networks with server addresses intentionally obscured
Often house online criminal activities
Includes TOR networks
Hidden Services have .onion TLD
Only accessible via TOR’s private browser
Content not password-protected, but not accessible to crawler-based services due to lack of linkage
Memex
DOD’s Dark Web Search Engine
Software to visualize and organize big data
Searches text, handwritten text, images, geographic data embedded in photos….
Identifies hidden relationships among websites, deep web sites and forums
Can access Dark Web obscured networks
Used in online criminal investigations: sex-trafficking ads, ISIS funding and other money laundering
Contact [email protected]
http://www.wsj.com/articles/sleuthing-search-engine-even-better-than-google-1423703464
Instya meta engine
www.instya.com
Launched April, 2015
Results from each source appear in their own browser tab
Sources include: Web (7), Image (8), News (11), Video (7), Shopping (11), Dictionary (14), Answers (8), Social (11)
Domain search offers website data and analysis: 7 backlink sources, 6 website stats, 10 domain information sites
The Social Web and Research
Why search the social web???
Public responses, attitudes, opinions
Breaking news, events
Trending topics and people
Latest product reviews
First-hand accounts of events: text, image, audio, video (primary sources)
Security, technology topics (latest virus, etc.)
Locate individuals/experts and their networks
People interested in a topic/hobby
Social web research projects
BuzzSumo - meta for social networks
Discovers the most shared content
Crawls FB, TW, LinkedIn, Pinterest, Google+
Backlink and sharer data for 20 or more instances
Advanced search features: Boolean, URL or domain search, author search, Twitter user search
Filters: Article, Infographic, Guest Post, Giveaways, Interviews, Videos, Date
Requires (free) account; other fee-based options
Twitter Search - search.twitter.com
Now includes every public Tweet since 2006
Searchable with all search features previously available at twitter.com/search-advanced
Indexes ca. ½ trillion tweets, and grows by several billion tweets a week.
Tweets deal with “everyday human experiences to major historical events”
Entire TV and sports seasons, conferences, places, events, industry discussions
Long-lived hashtags across countries and ideologies: #ScotlandDecides #HongKong #Ferguson #Hamas
Opinion Mining with TW
Identify TW users with differing opinions on a debated topic
Linguistic analysis of ca. 1 million public tweets with “guns” or “gun control” sent 4/15/13-4/18/13
Members of TW lists such as “Prevent Gun Violence” or “Guns Save Lives”: sample of 263; 85 for reforms, 178 against reforms
Belonging to no relevant TW lists: sample 500; 276 for, 120 against, 204 did not voice opinion (re-tweets of relevant tweets from others)
Ashwin Rajadesingan and Huan Liu, “Identifying Users with Opposing Opinions in Twitter Debates” http://www.public.asu.edu/~huanliu/papers/sbp14.pdf
TW as social indicator and health predictor – UPenn study
Linguistic and emoticon analysis of geo-tagged tweets combined with health data from over 1,300 US counties
Tweets expressing negative emotions (stress, anger, fatigue) are associated with higher heart disease risk
Tweets with positive emotions (optimism, enthusiasm) are associated with lower levels of risk
http://www.upenn.edu/pennnews/news/twitter-can-predict-rates-coronary-heart-disease-according-penn-research
Education and the social searchscape
Offers first-hand accounts of events and conditions
Informative of current world cultures and trends on a wide range of subjects
Gateway to blogs and other online communication that can enhance scholarship
Channel for updates to educational programs
Embedded links and other information often highly relevant and recent
Requires careful evaluation of information found there
Data Visualization
Enables patterns to emerge in big data
More accessible to visual learners
Facilitates sharing across languages
Can be made compatible with a wide range of data formats
Responsive to real-time changes
Showcase of 2014 projects: http://flowingdata.com/2014/12/19
Bing, Yahoo and DuckDuckGo
Looking for a niche
Bing and Yahoo represent 29% of all US searches (http://comscore.com, 12/1/14)
Yahoo
Focus is on local and personalized search results
Now partnered with Yelp, local business search engine
Bing
Focus is on lifestyle, travel, images, maps
Social search results (FB, TW) in a sidebar
Bing Image Search
High quality images
Related search offered, based on descriptive text associated with the image
Clustering by topic
Filters: Size, People, Color, Date, Type, License, Layout, SafeSearch
Image Match with a URL or image you upload
Entity Comparisons
Google Bing
Bing for Schools
http://www.bing.com/classroom
Safe search filters and ad-free environment
Requires registration by a school
Not possible to access it for home use
Daily lesson plan available based on the image used each day on the Bing homepage
Excludes Bing apps
DuckDuckGo
http://ddg.gg
Offers anonymous search functionality
Popularity spiked after NSA PRISM search engine scandal
Does not save search history of any type
G. does, using it "to increase relevancy"
Included as a search option in Apple's latest version of Safari
Has been blocked in China !!!
Center for Student Work
http://centerforstudentwork.elschools.org/
Joint initiative of the Harvard Graduate School of Education and Expeditionary Learning
Free resource of searchable K-12 exemplary student projects
To help students "know what they are aiming for, and what it looks like when they get there."
Subjects: English, Mathematics, Visual Arts, Social Studies, World Languages, Science and Technology, Health and Wellness
Knowledge Vault
Beyond the Graph…
Knowledge Graph seeded from Freebase entities and human additions
Automated generation of entities increases number and discovers hidden relationships among entities and their attributes
Entities now appear at top of results page with related topics or other relevant information
Type of additional information varies depending on entity
Right to be Forgotten ruling
EU's European Court of Justice, May 2014
G. and other search engines must remove results deemed to be "inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes for which they were processed and in the light of the time that has elapsed."
http://curia.europa.eu/jcms/upload/docs/application/pdf/2014-05/cp140070en.pdf
Does not require them to be removed from the servers on which they are located
Makes the content more difficult to find
Of the initial 12,000 removal requests:
33% - fraud accusations
20% - related to violent/serious crimes
12% - related to child pornography arrests
App indexing
G. currently indexes content from apps that open their content to G's crawlers
Results from apps are combined with mobile search results if the searcher has that app installed on their mobile device.
Agawi - streaming technology that breaks apps up into small files, allowing users to access content in the app while the full app is loading. (Similar to YouTube's streaming video technology)
G. acquired Agawi in the fall of 2014
Google’s device-dependent results sets
The intent and context of queries vary between devices
G.'s search results on mobile devices vary from those on desktops or laptops by as much as 43%
Mobile results:
Tend to focus more on local-based results
Display pages with smaller file size, on average
Based on analysis of first 30 results for 10,000 keyword searches
“US Google Ranking Factors 2014” http://www.searchmetrics.com/news-and-events/mobile-optimization/
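A figure such as "results vary by as much as 43%" can be approximated by comparing the two ranked lists for the same query. The sketch below shows one simple measure: the share of desktop results that are missing from the mobile list. The slides do not state the exact method used in the Searchmetrics study, so the measure is an assumption, and the function name and URLs are invented for illustration.

```python
def percent_different(desktop, mobile):
    """Percentage of desktop results (by URL) absent from the mobile list."""
    if not desktop:
        return 0.0
    mobile_urls = set(mobile)
    missing = [url for url in desktop if url not in mobile_urls]
    return 100.0 * len(missing) / len(desktop)

# Hypothetical top-5 results for one query on each device type.
desktop_results = ["a.example", "b.example", "c.example", "d.example", "e.example"]
mobile_results = ["a.example", "c.example", "f.example", "g.example", "e.example"]

difference = percent_different(desktop_results, mobile_results)
print(difference)
```

Averaging this value over many queries gives a single summary number. A fuller comparison would also account for results that appear on both devices but at different ranks.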
http://www.comscore.com/Insights/Presentations_and_Whitepapers/2013/The_Digital_World_in_Focus
Maps Gallery, In-depth articles
Interactive digital thematic map collections: historic city plans, climate trends, housing affordability, shipwrecks, up-to-date evacuation routes
In-depth articles caveat: "How to write the In Depth Articles that Google Loves" copyblogger.com
Content farm orientation?
Requires careful evaluation of each item; unvetted websites in particular
Google's tech projects
Google for Kids - under 13; more parental controls
Project Loon - Provide Web access via high-altitude balloons
Self-driving cars
Google Glass 2
Smart contact lenses
Continuous health monitoring via disease-detecting nanoparticles
Liftware - stabilized spoon for tremor sufferers
"Google Tracker 2015" http://arstechnica.com
Search in the Future
Will continue to be more specialized: Shopping - Amazon, Travel - Kayak, Movies - IMDB, Real-time news - TW
Discovery software will integrate more diverse types of data, crowdsourced to expert
Semantic processing and predictive search will grow
Social web will increase as a tool for social change
Search engines will be challenged by governments worldwide in the areas of commercial monopoly and individual privacy
Thank You and Enjoy Your Searching!
Michael Hunter, Reference Librarian
Hobart and William Smith Colleges, Geneva, NY 14456
(315) 781-3014 [email protected]