Metasearch Engines

Page 1: Metasearch Engines

Metasearch Engines

Fabricio Echeverría

[email protected]

Joseph Brodsky

Page 2: Metasearch Engines

Agenda

• Word indexes

• Web Search Engine

• Information Retrieval Systems

• Metasearch engines

• Questions

Page 3: Metasearch Engines

In search of extended dynamic memory

Page 4: Metasearch Engines

Word Index: Onomastics of Catalan Names
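
A word index of this kind can be pictured as an alphabetical list of words, each followed by the places where it occurs. The sketch below is only an illustration under stated assumptions: the function name build_word_index and the sample lines are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def build_word_index(lines):
    """Map each word to the sorted list of 1-based line numbers where it occurs."""
    index = defaultdict(set)
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            cleaned = word.strip(".,;:").lower()
            if cleaned:  # skip tokens that were only punctuation
                index[cleaned].add(line_number)
    # Sort the words alphabetically, as a printed index would.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}


# Hypothetical sample text, used only to exercise the function.
sample_lines = [
    "Jordi and Montserrat",
    "Montserrat and Núria",
]
word_index = build_word_index(sample_lines)
for word, numbers in word_index.items():
    print(word, numbers)
```

Each entry pairs a word with the lines that contain it, which is the same idea the search-engine index applies to web pages.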

Page 5: Metasearch Engines

Web Search Engine

• Programming language: Python

• High-RAM management

• Shared storage

• Parallel processing

Page 7: Metasearch Engines

Python Code: Web Search Engine

def crawl_web(seed):  # returns index, graph of outgoing links
    tocrawl = [seed]
    crawled = []
    graph = {}  # <url>, [list of pages it links to]
    index = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            # get_page returns None for unknown pages; treat them as empty.
            content = get_page(page) or ""
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph

def get_next_target(page):
    # Find the next <a href="..."> and return its URL and the end position.
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def union(a, b):
    # Append to a every element of b that a does not already contain.
    for e in b:
        if e not in a:
            a.append(e)

def add_page_to_index(index, url, content):
    words = content.split()
    pos = 0
    for word in words:
        pos = content.find(word, pos)
        add_to_index(index, word, url, pos)
        # Move past this word so a repeated word is not found at the same position.
        pos += len(word)

def add_to_index(index, keyword, url, pos):
    # Each index entry is a list of [url, character position] pairs.
    if keyword in index:
        index[keyword].append([url, pos])
    else:
        index[keyword] = [[url, pos]]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    else:
        return None

cache = {
    'http://www.udacity.com/cs101x/final/multi.html': """<html>
<body>
<a href="http://www.udacity.com/cs101x/final/a.html">A</a><br>
<a href="http://www.udacity.com/cs101x/final/b.html">B</a><br>
</body>""",
    'http://www.udacity.com/cs101x/final/b.html': """<html><body>
Monty likes the Python programming language
Thomas Jefferson founded the University of Virginia
When Mandela was in London, he visited Nelson's Column.
</body></html>""",
    'http://www.udacity.com/cs101x/final/a.html': """<html><body>
Monty Python is not about a programming language
Udacity was not founded by Thomas Jefferson
Nelson Mandela said "Education is the most powerful weapon which you can
use to change the world."</body></html>""",
}

def get_page(url):
    if url in cache:
        return cache[url]
    else:
        print("Page not in cache: " + url)
        return None

http://www.udacity.com/cs101
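
To see what the crawler produces, it can be run end to end on a very small in-memory site. The following is a condensed sketch rather than the slide's exact code: TINY_SITE and its example.test URLs are hypothetical, the helper functions are folded into two, and pages missing from the dictionary are treated as empty.

```python
# Hypothetical two-page site standing in for the cache of real pages.
TINY_SITE = {
    "http://example.test/index.html":
        '<a href="http://example.test/a.html">A</a> start page',
    "http://example.test/a.html": "Python is a programming language",
}


def get_all_links(page):
    """Return every URL found in an <a href="..."> tag, in order."""
    links = []
    while True:
        start_link = page.find('<a href=')
        if start_link == -1:
            return links
        start_quote = page.find('"', start_link)
        end_quote = page.find('"', start_quote + 1)
        links.append(page[start_quote + 1:end_quote])
        page = page[end_quote:]


def crawl_web(seed):
    """Crawl from seed; return (index, graph) as in the slide's crawler."""
    tocrawl, crawled, index, graph = [seed], [], {}, {}
    while tocrawl:
        page = tocrawl.pop()
        if page in crawled:
            continue
        content = TINY_SITE.get(page, "")  # unknown pages count as empty
        pos = 0
        for word in content.split():
            pos = content.find(word, pos)
            index.setdefault(word, []).append([page, pos])
            pos += len(word)
        outlinks = get_all_links(content)
        graph[page] = outlinks
        for link in outlinks:
            if link not in tocrawl:
                tocrawl.append(link)
        crawled.append(page)
    return index, graph


index, graph = crawl_web("http://example.test/index.html")
print(index.get("Python"))
print(graph)
```

The index maps each word to [url, position] pairs, and the graph maps each crawled URL to the list of pages it links to.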

Page 8: Metasearch Engines

Information Retrieval Systems

Page 9: Metasearch Engines

Metasearch Engines

• A metasearch engine is the union of searches (queries) across several search engines and their search indexes.
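
The union described in this definition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real metasearch implementation: ENGINE_A and ENGINE_B are hypothetical stand-ins for search engines, each modeled as a dictionary from a query to a ranked list of URLs, and the merge simply keeps the first occurrence of every URL.

```python
# Hypothetical engines: each maps a query to its own ranked list of URLs.
ENGINE_A = {"python": ["http://example.test/a", "http://example.test/b"]}
ENGINE_B = {"python": ["http://example.test/b", "http://example.test/c"]}


def search(engine, query):
    """Look the query up in one engine; an unknown query yields no results."""
    return engine.get(query, [])


def metasearch(engines, query):
    """Send the query to every engine and return the union of the results."""
    merged = []
    for engine in engines:
        for url in search(engine, query):
            if url not in merged:  # drop duplicates, keep first-seen order
                merged.append(url)
    return merged


results = metasearch([ENGINE_A, ENGINE_B], "python")
print(results)
```

A production system would also need to reconcile the different rankings of each engine; this sketch only shows the union step.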

Page 10: Metasearch Engines

http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf

Page 11: Metasearch Engines

http://dg3rtljvitrle.cloudfront.net/slides/chap10.pdf
