Uncensorable, Untraceable Search Engines for Freedom of Information
Michael Christen, [email protected]
Campus Party 2012, Berlin
Michael Christen, [email protected], http://yacy.net
Uncensorable, Untraceable Search Engines for Freedom of Information. Talk at Campus Party 2012 Berlin - http://www.campus-party.eu/2012/
Abstract
Search portals on the web are vital decision-making tools that shape people's knowledge and cultural values. Free content should be accessible with free search. Instead of going through a centralized server that acts as a gatekeeper, keeps logs of your searches and directs you to selected information, your own self-made search engine can deliver information with no censorship and no tracking.
In this talk, search use cases such as project search, file search (with an attached downloader), faceted search with user-defined categories, social search and peer-to-peer search are explained and demonstrated. You will become familiar with search engine technology in general and with different software modules that can be used to create amazing search portals with unusual but useful functions in just a few minutes.
Human Rights
Knowledge is free
Access must be free for everyone
Privacy is a human right
Human Rights statement from United Nations
UN World Summit on the Information Society 2003: Charter of Civil Rights for a Sustainable Knowledge Society
(a) Knowledge is the heritage and the property of humanity and is thus free.
(b) Access to knowledge must be free.
(c) Everyone has an unlimited right of access to the documents of public and publicly controlled bodies.
(d) The right to privacy is a human right and is essential for free and self-determined human development in the knowledge society.
From: http://www.worldsummit2003.de/en/web/375.htm
Centralized Search Portals
• can trace your behaviour
• danger of censoring, blocking, spamming
• they own your data
Access to Information: the bridge between data and user
[Diagram: free data → search → user]
Free data includes, among others: free software, data under Creative Commons licenses, open access archives.
Search as it is today: proprietary and centralized. It traces you, and data can be censored, blocked, removed or spammed.
Users need proprietary, centralized software to discover free content.
Free information can only be truly free if it can be accessed with free search.
Access to Information: the bridge between data and user (continued)
[Diagram labels: free data, search, user, community, relevancy, ranking, ordering]
In a specific community, people share the same relevancy criteria.
Free information can only be truly free if it can be accessed with free search.
Ranking influences standards and opinions within a community!
Centralized search engines have a cultural impact on communities!
Your Own Search Engine
• Freedom of Information: no data access limits, no censoring, no filtering, no user observation, no content spamming, your ranking.
• Independence from Centralized Search Portals: collect your own search index and search in a special way as needed for the content.
• Privacy: you are the search engine operator, so nobody can trace you!
Requirements for a "homebrew" search engine
• Available: the software must be free.
• Hackable: APIs and transparency.
• Software modules: search technology.
• Examples: for use cases and possibilities.
• Demo: a "Hello World" search engine is a good starting point for hacking.
• Easy: everyone must be able to install and operate the software.
• Knowledge: learn how the search engine components work.
Examples for use cases and possibilities
• your own search portal: projects + communities, share knowledge
• search for files (ftp/smb) ... with downloader?
• data protection & sanctuaries: persecuted content, torrents etc.
• social search: share your search experience
• federated search: your intelligence service, topic-oriented (news) feeds
• distributed search: share your search index
Knowledge: how search engine components work
Components of a search server:
• web interface
• crawler: robots, balancer, queues
• parser: pdf, html, rss, zip, xls, doc, eml
• network interfaces: file, http, ftp, smb, oai-pmh
• api: opensearch, gsa, solr
• monitoring: I/O, requests, disk/RAM
• administration / steering
• search index: facets, schema, moderation, ranking
• document cache
Knowledge: how search engine components work
• Easy: 3-minute installation, just decompress and start
• Available: all parts are free software (http://yacy.net, http://lucene.apache.org/solr/)
• Hackable: lots of APIs, many standards
Knowledge: how search engine components work
Demo (Solr):
• curl -OL "http://archive.apache.org/dist/lucene/solr/3.6.1/apache-solr-3.6.1.tgz"
• tar xfz apache-solr-3.6.1.tgz
• cd apache-solr-3.6.1/example/
• java -jar start.jar
• open http://localhost:8983/solr/admin/
• curl 'http://localhost:8983/solr/update/json?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id":"data1", "title":"Hello World"}}}'
• curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @exampledocs/books.json -H 'Content-type:application/json'
• curl 'http://localhost:8983/solr/select/?q=*%3A*'
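The two curl calls in the Solr demo (add a document, then select everything) can also be scripted. The following is a minimal Python sketch, assuming the Solr 3.6.1 example server from the demo is running on localhost:8983; the function names are made up for illustration and are not part of Solr.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Base URL taken from the demo; adjust it for your own setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_payload(doc):
    """Wrap one document in the JSON 'add' command used in the demo."""
    return json.dumps({"add": {"doc": doc}})


def build_select_url(query="*:*", base=SOLR_BASE):
    """Build the select URL; urlencode percent-encodes the query string."""
    return f"{base}/select/?{urlencode({'q': query})}"


def post_update(payload, base=SOLR_BASE):
    """Send the payload to the JSON update handler (needs a running Solr)."""
    request = urllib.request.Request(
        f"{base}/update/json?commit=true",
        data=payload.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the server running, `post_update(build_add_payload({"id": "data1", "title": "Hello World"}))` should mirror the first curl call, and opening `build_select_url()` mirrors the last one.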
Knowledge: how search engine components work
Demo (YaCy):
• curl -OL "http://yacy.net/release/yacy_v1.04_20120709_9000.tar.gz"
• tar xfz yacy_v1.04_20120709_9000.tar.gz
• cd yacy
• ./startYACY.sh
• open http://localhost:8090
• the Solr search interface is at http://localhost:8090/solr/select?q=*:*&start=0&rows=10
• start a web crawl at http://localhost:8090/CrawlStartSite_p.html
Your own search portal: projects + communities, share knowledge
[Diagram: a search engine surrounded by the tools a project uses to create, share and produce: documents, (micro)blogging, discussion, project steering, bug tracker, version control]
Demo:
• Make a federated search portal for: gnu.org, fsfe.org, campus-party.eu
• Add an FTP video archive from ftp://dewy.fem.tu-ilmenau.de/CCC/
Search for files (ftp/smb) ... with downloader?
Demo:
• Choose "File Search" or open http://localhost:8090/yacyinteractive.html
• After searching, click "create a download script"
• Copy and paste the result into your terminal
Data protection & sanctuaries: persecuted content, torrents etc.
Demo:
• Index thepiratebay using the sitemap provided by its robots.txt
• Use http://localhost:8090/CrawlStartSite_p.html and check the "Sitemap URL" option.
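To find the sitemap URL that the crawl start page asks for, you can read it out of a site's robots.txt. This is a small Python sketch using the standard library's robots.txt parser (the `site_maps()` method needs Python 3.8 or later); the helper name `sitemap_urls` is invented here, and the robots.txt text is passed in as a string so the example runs without network access.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> List[str]:
    """Return all 'Sitemap:' URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /private
Sitemap: http://example.org/sitemap.xml
"""
```

Calling `sitemap_urls(EXAMPLE_ROBOTS)` yields the URL to paste into the "Sitemap URL" field.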
Federated search: your intelligence service, topic-oriented (news) feeds
Demo:
• Feed YaCy with RSS feeds at http://localhost:8090/Load_RSS_p.html
• Activate the scheduler to do this frequently
• Do a web search and add /date to the query to order by date
• Change the page to RSS format by replacing the html extension of the result page with rss
• Read the search result page with your RSS reader
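The last three demo steps amount to rewriting a result page URL. Here is a hedged Python sketch of that rewrite; the function name `to_feed_url` is hypothetical, and it assumes that the `/date` modifier is appended to the `query` parameter separated by a space, which the slide does not spell out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_feed_url(result_url, extension="rss", order_by_date=False):
    """Turn a YaCy result page URL (.html) into its rss or json variant."""
    parts = urlsplit(result_url)
    if not parts.path.endswith(".html"):
        raise ValueError("expected a result page URL ending in .html")
    # Swap only the file extension of the path.
    path = parts.path[: -len("html")] + extension
    params = parse_qsl(parts.query, keep_blank_values=True)
    if order_by_date:
        # Assumption: the modifier is added to the 'query' parameter.
        params = [(k, v + " /date") if k == "query" else (k, v) for k, v in params]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(params), parts.fragment)
    )
```

For example, `to_feed_url("http://localhost:8090/yacysearch.html?query=foaf", order_by_date=True)` produces a URL you can subscribe to in an RSS reader.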
Distributed search: share your search index
Peer-to-Peer Shared Search Index
YaCy has an integrated peer-to-peer protocol to connect to other YaCy users.
But how can this scale? How are peers connected?
Distributed search: share your search index
Search Engine Cluster
[Diagram: a grid of 16 search engines. Horizontal scaling: more documents. Vertical scaling: more performance.]
A search engine cluster consists of independent search engines arranged in the form of a search matrix.
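One common way to read such a search matrix is "columns partition the documents, rows replicate them". The Python sketch below illustrates that reading only; it is an assumption about the diagram, not a description of how any particular cluster product works, and the class `SearchMatrix` is invented for this example.

```python
import zlib


class SearchMatrix:
    """Toy search matrix: columns split the documents, rows are replicas."""

    def __init__(self, columns, rows):
        # engines[row][column] is one independent inverted index:
        # a dict mapping a word to the set of document ids containing it.
        self.engines = [[{} for _ in range(columns)] for _ in range(rows)]
        self.columns = columns
        self.rows = rows
        self._next_row = 0

    def _column_for(self, doc_id):
        # Horizontal scaling: each document lives in exactly one column.
        return zlib.crc32(doc_id.encode("utf-8")) % self.columns

    def index(self, doc_id, text):
        column = self._column_for(doc_id)
        # Every row holds a full copy of the column's index.
        for row in range(self.rows):
            engine = self.engines[row][column]
            for word in text.lower().split():
                engine.setdefault(word, set()).add(doc_id)

    def search(self, word):
        # Vertical scaling: queries rotate over the rows (round robin),
        # and one query asks every column of the chosen row.
        row = self._next_row
        self._next_row = (self._next_row + 1) % self.rows
        hits = set()
        for column in range(self.columns):
            hits |= self.engines[row][column].get(word.lower(), set())
        return sorted(hits)
```

Adding columns lets the matrix hold more documents, while adding rows lets it answer more queries at the same time.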
Distributed search: share your search index
[Diagram: the same grid of 16 search engines]
We want to take the search matrix out of the data center and into your home.
Distributed search: share your search index
[Diagram: the grid of search engines, now shown together with a grid of peers]
The distributed search matrix in your home is connected using a peer-to-peer protocol.
Distributed search: share your search index
DHT: Distributed Hash Table
[Diagram: 16 peers connected through the DHT]
• DHT-Store: crawl the web, create a web index, distribute the index
• DHT-Read: search in a distributed hash table
The YaCy search engine cluster consists of independent search engines, but they are connected in an efficient way using a distributed hash table.
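To make the DHT-Store and DHT-Read roles concrete, here is a toy distributed hash table in Python. It uses a generic consistent-hashing ring, which is an assumption chosen for illustration and not YaCy's actual index distribution scheme; the class and method names are invented.

```python
import bisect
import hashlib


def ring_position(key):
    """Map any string to a position on the hash ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    """All peers live in one process here; a real DHT talks over the network."""

    def __init__(self, peer_names):
        self._ring = sorted((ring_position(name), name) for name in peer_names)
        self._positions = [position for position, _ in self._ring]
        # Each peer keeps its own slice of the index: word -> set of URLs.
        self._stores = {name: {} for name in peer_names}

    def responsible_peer(self, word):
        # The first peer at or after the word's position owns the word;
        # the modulo wraps around at the end of the ring.
        index = bisect.bisect_left(self._positions, ring_position(word))
        return self._ring[index % len(self._ring)][1]

    def store(self, word, url):
        """DHT-Store: hand an index entry to the peer that owns the word."""
        peer = self.responsible_peer(word)
        self._stores[peer].setdefault(word, set()).add(url)

    def read(self, word):
        """DHT-Read: ask only the owning peer, not the whole network."""
        peer = self.responsible_peer(word)
        return sorted(self._stores[peer].get(word, set()))
```

The point of the design is that a search for one word contacts one owning peer instead of all peers, which is what lets the network grow.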
Distributed search: share your search index
Everyone can join the network. Nobody can censor the search index.
Social search: share your search experience
Peer-to-Peer Shared Search Index
Peer-to-Peer Shared Search Experience
Knowledge: how search engine components work
Demo (Seeks):
• read http://seeks-project.info/wiki/index.php/Download#Download
• or just build seeks yourself:
> git clone git://seeks.git.sourceforge.net/gitroot/seeks/seeks
> cd seeks
> ./autogen.sh
> ./configure LDFLAGS="-Wl,--no-as-needed" --disable-opencv
> make
> cd src && ./seeks
• attach YaCy: use the opensearch interface from http://localhost:8090/yacysearch.rss?query=%query
• in seeks/src/plugins/websearch/websearch-config add the line
search-engine opensearch_rss http://localhost:8090/yacysearch.rss?query=%query yacy default
• set seeks as your web proxy at port 8250
• open your browser at http://s.s/websearch-hp
APIs in Search Interface - Opensearch, SRU
• Standards: SRU
• APIs: Opensearch (search results with RSS), JSON, AJAX tools
• Tools: search widget, ready-to-use code snippets to embed search everywhere
• Facets: file types, protocols, domains, authors, user-generated ontologies
• Every link is verified before it is displayed: the content is loaded, parsed and used to generate a search snippet
APIs in Search Interface - Opensearch
How to get Opensearch/JSON search results:
• do a normal web search in YaCy
• replace the 'html' extension of the result page URL with 'rss'
• for JSON, replace the 'html' extension with 'json'
Example:
> curl 'http://localhost:8080/yacysearch.rss?query=foaf&maximumRecords=10'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='/yacysearch.xsl' version='1.0'?>
<rss version="2.0" xmlns:yacy="http://www.yacy.net/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<!-- very short example -->
<item>
  <title>Friend of a Friend (FOAF) project</title>
  <link>http://www.foaf-project.org/</link>
  <pubDate>Fri, 23 May 2008 02:00:00 +0200</pubDate>
</item>
<item>
  <title>FOAF - Wikipedia</title>
  <link>http://de.wikipedia.org/wiki/FOAF</link>
  <pubDate>Tue, 08 Jan 2008 01:00:00 +0100</pubDate>
</item>
<item>
  <link>http://microformats.org/wiki/xfn-to-foaf</link>
  <pubDate>Fri, 09 May 2008 02:00:00 +0200</pubDate>
</item>
</rss>
SRU standard for queries: http://www.loc.gov/standards/sru/specs/search-retrieve.html
Opensearch standard: http://www.opensearch.org
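An RSS result like the one on this slide is easy to consume from a script. The following Python sketch parses the items with the standard library; the sample text is a shortened copy of the slide's example, the function name `parse_search_rss` is invented, and a real response will carry more elements than the three read here.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape shown on the slide (not a live response).
SAMPLE_RSS = """<rss version="2.0" xmlns:yacy="http://www.yacy.net/">
<item>
  <title>Friend of a Friend (FOAF) project</title>
  <link>http://www.foaf-project.org/</link>
  <pubDate>Fri, 23 May 2008 02:00:00 +0200</pubDate>
</item>
<item>
  <link>http://microformats.org/wiki/xfn-to-foaf</link>
  <pubDate>Fri, 09 May 2008 02:00:00 +0200</pubDate>
</item>
</rss>"""


def parse_search_rss(xml_text):
    """Return one dict per <item>; missing elements become empty strings."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() finds items at any depth, with or without a <channel> wrapper.
    for item in root.iter("item"):
        results.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "pubDate": item.findtext("pubDate", default=""),
            }
        )
    return results
```

The second sample item has no title, as on the slide, so the parser falls back to an empty string instead of failing.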
Search Interface Integration
How to integrate a YaCy search portal: just copy and paste a code snippet into your web page source code.
Code snippet example #1: a search window in an iframe
<iframe name="target2" src="http://141.52.175.43:8080/yacysearch.html?display=2&resource=local" width="100%" height="180" frameborder="0" scrolling="auto" id="target2"></iframe>
Code snippet example #2: a search box (points to a new page). Code snippet #2 looks like:
<form method="get" accept-charset="UTF-8" action="http://141.52.175.43:8080/yacysearch.html">
  <div>
    <div>MySearch</div>
    <input type="text" name="query" value="" maxlength="80" />
    <input type="hidden" name="verify" value="true" />
    <input type="hidden" name="maximumRecords" value="10" />
    <input type="hidden" name="meanCount" value="5" />
    <input type="hidden" name="resource" value="local" />
    <input type="hidden" name="urlmaskfilter" value=".*" />
    <input type="hidden" name="prefermaskfilter" value="" />
    <input type="hidden" name="display" value="2" />
    <input type="hidden" name="nav" value="all" />
    <input type="submit" name="Enter" value="Search" />
  </div>
</form>
The YaCy administration interface offers more code snippets, for example at /ConfigSearchBox.html.
Your YaCy peer provides help pages with code snippets for easy integration!
APIs in Harvesting: Dublin Core Dump Import
Standards: YaCy can import standard Dublin Core metadata XML files as input for indexing.
How to import Dublin Core files: just place the XML files into the hand-over directory at DATA/SURROGATES/in/
The Dublin Core XML file standard: http://dublincore.org/documents/dc-xml-guidelines/
Example:
<?xml version="1.0" encoding="utf-8"?>
<!-- YaCy surrogate using dublin core notion -->
<surrogates xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title><![CDATA[Alan Smithee]]></dc:title>
    <dc:identifier>http://de.wikipedia.org/wiki/Alan_Smithee</dc:identifier>
    <dc:description>
      <![CDATA['''Alan Smithee''' ist ein Anagramm von „The Alias Men“.]]>
    </dc:description>
    <dc:language>de</dc:language>
    <dc:date>2009-04-14T00:00:00Z</dc:date> <!-- date is in ISO 8601 -->
  </record>
</surrogates>
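If your data is not yet in this format, a surrogate file can be generated from plain records. This Python sketch builds the same element structure as the example on this slide; the function name `build_surrogate` is invented, and the standard library writer escapes special characters instead of emitting CDATA sections, which is equivalent XML but not byte-identical to the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Make the serializer use the 'dc' prefix shown on the slide.
ET.register_namespace("dc", DC_NS)

# Only the Dublin Core fields used in the slide's example record.
FIELDS = ("title", "identifier", "description", "language", "date")


def build_surrogate(records):
    """Serialize a list of dicts into a <surrogates> document string."""
    root = ET.Element("surrogates")
    for rec in records:
        record = ET.SubElement(root, "record")
        for field in FIELDS:
            if field in rec:
                element = ET.SubElement(record, f"{{{DC_NS}}}{field}")
                element.text = rec[field]
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file inside DATA/SURROGATES/in/ is the hand-over step the slide describes; the file name is up to you.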
Summary
1. Access to knowledge and the right to privacy are human rights. Communities need their own ranking. Centralized search engines are not sufficient to provide these rights to everyone. We need decentralized systems.
2. We demonstrated search use cases that current search portal providers do not match. Free content needs search technology that is more appropriate for such content.
3. We explained how search technology works in general. This was just the tip of the iceberg. There is a lot more to know.
4. We demonstrated search tools which are easy, available and hackable: Solr, YaCy and Seeks. For each tool you will find a short tutorial in these slides.
5. Please support the idea of free search and the projects. Please help, test the software, ask questions, tell other people and help hacking!
Thank You for Listening
Dipl. Inf. Michael Christen, [email protected], http://yacy.net
Download: http://yacy.net, http://latest.yacy.net
Discussion: http://forum.yacy.de
News: http://twitter.com/#!/yacy_search, http://blog.yacy.de, http://blog.yacy-kochbuch.de
Documentation: http://wiki.yacy.net, http://yacy-kochbuch.de
Bugs: http://bugs.yacy.net
Development: https://gitorious.org/yacy
All images are CC0; many are from http://openclipart.org