Divide and Conquer: Challenges in Scaling Federated Search Presented by Abe Lederman, President and...

Divide and Conquer: Challenges in Scaling Federated Search

Presented by Abe Lederman, President and CTO

Deep Web Technologies, LLC

Search Engine Meeting, 24 April 2006, Boston, MA

SEARCH ALL OF THESE SOURCES ONE AT A TIME

OR SEARCH THEM ALL AT ONCE

Finding the Gold Hidden in the World Wide Web

“Google-type” search engines “pan” the surface web for gold

“Deep Web” search engines go mining for gold

Challenges Overview

• Managing a large number of sources

• Searching a large number of sources in parallel

• Organizing and ranking the results returned

Challenges of Managing Thousands of Data Sources

Locate Reliable Sources

Categorize Sources by Content

Configure Sources for Searching

Maintain Sources

Challenges in Searching Thousands of Sources

Automatically Select Sources to Search

Retrieve Results from Cache

Perform Many Searches in Parallel

Bring Back Best Results
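The "perform many searches in parallel" step can be sketched as follows. This is a minimal illustration, assuming each source can be wrapped as a Python callable that takes a query and returns a list of results. The names `search_all`, `fake_source_a`, and `fake_source_b` are hypothetical and do not come from the presentation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_all(sources, query, max_workers=8):
    """Query every source concurrently; return {source_name: results}.

    `sources` is a hypothetical mapping of name -> callable(query) -> list.
    A source that raises is recorded with an empty result list, so one
    failing source does not sink the whole federated search.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results

# Stand-in sources for illustration only.
def fake_source_a(query):
    return [f"A: {query} paper 1", f"A: {query} paper 2"]

def fake_source_b(query):
    raise TimeoutError("source did not respond")

if __name__ == "__main__":
    print(search_all({"a": fake_source_a, "b": fake_source_b}, "fusion"))
```

A thread pool is a reasonable fit here because the work is dominated by waiting on remote sources rather than by local computation.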

Source Selection Optimizer

Search Conductor

Source Selection Optimizer

Source Descriptions

Previous Results
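One way a source selection step could combine source descriptions with previous results is sketched below. The scoring formula, the equal weighting, and the names `select_sources`, `descriptions`, and `previous_hits` are all assumptions made for illustration; the slides do not say how the optimizer actually scores sources.

```python
def select_sources(query, descriptions, previous_hits, top_k=2):
    """Rank sources by description overlap plus past usefulness.

    descriptions:  name -> free-text description of the source
    previous_hits: name -> fraction of past searches that returned
                   useful results (0.0 to 1.0); missing means unknown
    Both inputs and the equal weighting are illustrative assumptions.
    """
    terms = set(query.lower().split())
    scored = []
    for name, text in descriptions.items():
        words = set(text.lower().split())
        overlap = len(terms & words) / len(terms) if terms else 0.0
        history = previous_hits.get(name, 0.5)  # neutral prior when unknown
        scored.append((0.5 * overlap + 0.5 * history, name))
    # Highest score first; ties broken alphabetically by source name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]

descriptions = {
    "physics_db": "physics research papers and conference proceedings",
    "bio_db": "biology research papers",
    "news": "general news articles",
}
previous_hits = {"physics_db": 0.9, "bio_db": 0.4}

if __name__ == "__main__":
    print(select_sources("physics papers", descriptions, previous_hits))
```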

Caching of Search Results

Reduces the load (cost) of accessing sources

CHALLENGES

• Requires a large database

• Need to determine how often to update the cache

• Works best with lots of users doing similar searches
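The update-frequency question above can be made concrete with a time-to-live (TTL) cache. This is a minimal in-memory sketch, not the large database the slide refers to; the class name `ResultCache`, the TTL policy, and the query normalization are assumptions for illustration.

```python
import time

class ResultCache:
    """Tiny in-memory cache keyed by (source, normalized query)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def _key(source, query):
        # Normalize so "Solar  Cells" and "solar cells" share one entry,
        # which helps when many users run similar searches.
        return source, " ".join(query.lower().split())

    def get(self, source, query):
        key = self._key(source, query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # stale: force a fresh search
            return None
        return results

    def put(self, source, query, results):
        self._store[self._key(source, query)] = (self.clock(), results)

if __name__ == "__main__":
    cache = ResultCache(ttl_seconds=3600)
    cache.put("example_source", "Solar  Cells", ["result 1"])
    print(cache.get("example_source", "solar cells"))
```

A shorter TTL keeps results fresher at the cost of more traffic to the underlying sources, which is the trade-off the slide points to.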

We Address Scalability Through a Grid-Based Solution

• Uses open standards (Web Services, WSDL, SOAP, XML)

• Runs on distributed nodes

• Is platform independent (Java based)

• Is very flexible, providing a framework for integrating various filtering and analysis tools

Distributing the Workload as Grid Services

Information Services

Filtering Services

Aggregation Services

Presentation Services

Search Conductor

Select sources to search

Can I get more results from “good” sources?

Enough good results?

Deliver results to user

Perform Search

Get Next Results
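The Search Conductor steps above suggest a loop that keeps asking "good" sources for more results until enough good results have been collected. The sketch below is one possible reading of that flow, not the actual implementation. The data shape (each source is an iterator of pages of `(score, title)` tuples), the `good_score` threshold, and the name `conduct_search` are all assumptions.

```python
def conduct_search(sources, wanted=5, good_score=0.5, max_rounds=3):
    """Keep fetching pages until enough good results are collected.

    `sources` maps a name to an iterator that yields pages, where each
    page is a list of (score, title) tuples. This shape is assumed.
    """
    good = []
    active = dict(sources)                 # sources still worth asking
    for _ in range(max_rounds):
        if not active:
            break
        for name in list(active):
            page = next(active[name], None)    # perform search / get next results
            if page is None:                   # source is exhausted
                del active[name]
                continue
            hits = [(s, t) for s, t in page if s >= good_score]
            good.extend(hits)
            if not hits:                       # not a "good" source; stop asking
                del active[name]
        if len(good) >= wanted:                # enough good results?
            break
    good.sort(key=lambda r: -r[0])
    return [title for _, title in good[:wanted]]   # deliver results to user

def example_sources():
    # Stand-in sources for illustration only.
    return {
        "a": iter([[(0.9, "a1"), (0.8, "a2")], [(0.7, "a3"), (0.6, "a4")]]),
        "b": iter([[(0.2, "b1")], [(0.95, "b2")]]),
    }

if __name__ == "__main__":
    print(conduct_search(example_sources(), wanted=3))
```

In this sketch a source that returns no good results on a page is dropped, so later rounds only go back to sources that have already proven useful.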

Searching a large number of sources can lead to a flood of results

Challenges in Organizing and Ranking Results

Multi-tier Relevance Ranking

User-driven Ranking

Clustering of Results

Multi-tier Relevance Ranking

• QuickRank – Ranks results based on occurrence of search terms in title, author, and snippet

• MetaRank – Ranks results utilizing custom algorithms applied to meta-data

• DeepRank – Downloads and indexes full-text documents

HEAVY LIFTING REQUIRED!
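The QuickRank tier is described as ranking on occurrences of search terms in title, author, and snippet. A minimal sketch of that idea follows. The field weights and the name `quick_rank` are assumptions, since the slides do not give the actual scoring formula.

```python
# Assumed weights: a match in the title counts more than one in the snippet.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "snippet": 1.0}

def quick_rank(results, query):
    """Sort results by weighted counts of query terms in each field."""
    terms = query.lower().split()

    def score(result):
        total = 0.0
        for field, weight in FIELD_WEIGHTS.items():
            words = result.get(field, "").lower().split()
            total += weight * sum(words.count(term) for term in terms)
        return total

    # sorted() is stable, so equally scored results keep their input order.
    return sorted(results, key=score, reverse=True)

sample_results = [
    {"title": "Grid computing", "author": "Smith",
     "snippet": "federated search on a grid"},
    {"title": "Federated search at scale", "author": "Lee",
     "snippet": "scaling federated search"},
    {"title": "Cooking", "author": "Jones", "snippet": "recipes"},
]

if __name__ == "__main__":
    for r in quick_rank(sample_results, "federated search"):
        print(r["title"])
```

Because it only needs the fields a source already returns, a ranking like this is cheap compared with downloading and indexing full text as DeepRank does.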

User-driven Ranking

• Credibility of source

• Date range

• Document length

• Document type

• Geographic proximity

• Popularity of document

• Reading level

• Relevance

Desired: Blending (weighting) of the above criteria
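Blending user-chosen criteria could be done with a weighted average, sketched below. This assumes each criterion has already been normalized to a score between 0 and 1. The function name `blended_score` and the example weights are hypothetical.

```python
def blended_score(signals, weights):
    """Weighted average of per-criterion scores, each already in [0, 1].

    signals: criterion name -> score for one document
    weights: criterion name -> importance chosen by the user
    A criterion with a weight but no signal counts as 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight

# Example: this user cares most about relevance, then source credibility.
user_weights = {"relevance": 3, "credibility": 2, "popularity": 1}
document = {"relevance": 0.8, "credibility": 0.5, "popularity": 0.2}

if __name__ == "__main__":
    print(round(blended_score(document, user_weights), 3))
```

Dividing by the total weight keeps the blended score in the same 0 to 1 range regardless of how many criteria the user turns on.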

Clustering

A Grand Challenge for Federated Search

Source: Walter Warnick, Ph.D., DOE OSTI. Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science. Presented at the Annual Meeting of the American Association for the Advancement of Science, February 16-20, 2006.

Mathematician’s Scientific Discovery

Biology Researcher’s Scientific Discovery

Physics Scientific Discovery

Math Databases:

• Research Papers

• Correspondence

• Conferences

Biology Databases:

• Research Papers

• Correspondence

• Conferences

Physics Databases:

• Research Papers

• Correspondence

• Conferences

Global Discovery

Search Portal

Math Community

Biology Community

Physics Community

Knowledge Diffusion in Action

Grid of Grids

Each circle = a portal with 10-100 sources

End result is thousands of sources in 2

Scaling to the Next Level

Abe Lederman

122 Longview Drive

Los Alamos, NM 87544

abe@deepwebtech.com

www.deepwebtech.com

Thank You!
