Bielefeld Academic Search Engine: a Scientific Search Service for Institutional Repositories

Open Scholarship 2006: New Challenges for Open Access Repositories
Univ. of Glasgow, 18-20 October 2006

Friedrich Summann
Bielefeld University Library

Overview:

• BASE: concept and content
• BASE user-interface and further visions
• BASE dataflow
• OAI harvesting challenges
• BASE interfaces
• Demo

BASE: concept and content

• BASE uses Fast Data Search
• BASE runs on a Linux-based multi-node system
• BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
• BASE displays result lists as bibliographic data and full-text hits
• The BASE frontend is written in PHP, using the search API from Fast Data Search
• BASE offers sorting, search refinement and search history

http://www.base-search.net

BASE: concept and content

[Figure: system architecture diagram. Labels: File traverser, Web crawler, Connectors; Document processing (pipeline); Filter, Search, Index files; Query & result processing (pipeline); Search API; Tuning, administration and debugging.]

BASE: concept and content

At present 3.8 million documents in 274 collections, 15 of them web-crawled data

BASE: concept and content

• Projekt Gutenberg-DE
• Internet Library of Early Journals Oxford
• Various Institutional Repositories
• Springer Link Metadata
• Cornell HistMath Fulltext Crawl
• University of Michigan Historical Math
• CiteSeer
• Zentralblatt Mathematik
• Bielefeld Univ: Math. Preprints
• ArXiv
• OPAC UL Bielefeld
• Ifo Institute Munich
• PubMed
• Journals of Enlightenment (Digital Collection of Bielefeld UL)

Special view on IR server collections

Collections are listed in a configuration file:

[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."

Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …

Parametric search possible

Frontend is ready for multi view (independent views with own configuration and layouts on the same backend)
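The collection configuration and clustering described on this slide could be read along the following lines. This is only a sketch in Python, not the actual BASE frontend code (which is PHP): it assumes the file is INI-like and can be parsed with Python's `configparser`. The `[ftubath]` entry, its URL, the `CLUSTERS` mapping and both function names are hypothetical and exist only for illustration.

```python
import configparser

# Sample in the format shown on the slide. The [ftubath] section content is
# a hypothetical placeholder, not taken from the real configuration.
SAMPLE = '''
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_en = "Birmingham Univ."

[ftubath]
url = "http://example.org/bath/"
desc_en = "Hypothetical entry for illustration"
descdd_en = "Bath Univ."
'''

# Hypothetical cluster definition: cluster label -> collection section names.
CLUSTERS = {"Institutional Repositories Europe": ["ftubirmingham", "ftubath"]}


def load_collections(text):
    """Parse the INI-like text into {section: {key: value}} without quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {key: value.strip('"') for key, value in parser[name].items()}
        for name in parser.sections()
    }


def cluster_labels(collections, clusters, cluster_name, lang="en"):
    """Return the short drop-down labels for every collection in a cluster."""
    key = f"descdd_{lang}"
    return [
        collections[name][key]
        for name in clusters[cluster_name]
        if name in collections
    ]


if __name__ == "__main__":
    collections = load_collections(SAMPLE)
    print(cluster_labels(collections, CLUSTERS, "Institutional Repositories Europe"))
```

Keeping clusters as plain lists of section names means a view can define its own groupings without touching the per-collection entries.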

BASE: end-user interface (1)

Displays search results as bibliographic data and full text hits

BASE: end-user interface (2)

The result list (left hand side)

If the document contains metadata (e.g. title, author, abstract), the displayed description is highlighted

BASE: end-user interface (3)

The result list (right hand side)

• Various options to sort the result set
• Search refinement by author, keyword, document type, language etc.
• Search history comprises up to 10 queries
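A search history limited to 10 queries, as mentioned on this slide, can be modelled with a bounded queue. The sketch below is an illustration in Python, not the BASE implementation; the class name `SearchHistory` and the choice to drop the oldest query first are assumptions.

```python
from collections import deque


class SearchHistory:
    """Keeps only the most recent queries, up to a fixed limit."""

    def __init__(self, limit=10):
        # A deque with maxlen silently discards the oldest entry when full.
        self._queries = deque(maxlen=limit)

    def add(self, query):
        self._queries.append(query)

    def recent(self):
        """Return the stored queries, newest first."""
        return list(reversed(self._queries))


if __name__ == "__main__":
    history = SearchHistory()
    for number in range(12):
        history.add(f"query {number}")
    print(len(history.recent()))
```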

BASE: end-user interface (4)

Search Refinement

Select an author ...

... only documents by this author are displayed
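The refinement step shown here amounts to counting the authors in the current result set and then filtering by the selected one. In BASE this is presumably done by the search engine itself; the Python sketch below only illustrates the idea on an in-memory list. The record layout (`title`, `authors`), the function names and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical result records, for illustration only.
RESULTS = [
    {"title": "Record one", "authors": ["Author A", "Author B"]},
    {"title": "Record two", "authors": ["Author A"]},
    {"title": "Record three", "authors": ["Author C"]},
]


def author_facets(results):
    """Count how many results each author appears in (the refinement menu)."""
    return Counter(author for record in results for author in record.get("authors", []))


def refine_by_author(results, author):
    """Keep only the results that list the selected author."""
    return [record for record in results if author in record.get("authors", [])]


if __name__ == "__main__":
    print(author_facets(RESULTS).most_common())
    print([record["title"] for record in refine_by_author(RESULTS, "Author A")])
```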

Google Scholar integration

Check citations (citing articles) in Google Scholar ...

Vision: DDC Browsing

BASE dataflow

[Figure: dataflow diagram. Sources (OAI data, web pages, database records) pass through harvesting, pre-processing and processing into the internal index (FAST), which is served by the user interface (PHP).]

OAI-compliant university repositories in BASE

[Figure: map with repository counts per country or region. The remaining counts printed on the map lost their labels in extraction.]

• USA 82
• Canada 14
• South America 2
• Africa 3
• India 5
• Australia 11
• New Zealand 1

OAI harvesting challenges

Repositories do not respond or deliver error messages

Data contain only References without any Fulltext

Links to the Document are not included or do not work

Access to fulltext often is restricted

XML file is not well-formed

Field content varies
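Several of the problems listed above can be detected mechanically when a harvested response is inspected. The following Python sketch is not the BASE harvester; it only shows how an OAI-PMH `ListRecords` response in `oai_dc` format might be checked for malformed XML, protocol-level error elements and records without a document link. The function name `check_response` and the rule that a usable link is a `dc:identifier` starting with `http://` or `https://` are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def check_response(xml_text):
    """Return a list of problem descriptions for one ListRecords response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Challenge: the XML file is not well-formed.
        return [f"not well-formed XML: {exc}"]

    problems = []

    # Challenge: the repository delivers an error message instead of records.
    for error in root.iter(f"{OAI}error"):
        problems.append(f"repository error: {error.get('code')}")

    # Challenge: links to the document are not included.
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(
            f"{OAI}header/{OAI}identifier", default="(no identifier)"
        )
        links = [
            element.text
            for element in record.iter(f"{DC}identifier")
            if element.text and element.text.startswith(("http://", "https://"))
        ]
        if not links:
            problems.append(f"{identifier}: no link to the document")

    return problems
```

A check like this cannot tell whether a link actually works or whether the full text is access-restricted; those cases need a follow-up request to the document URL.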

Some Rules from the Harvesting Practice

Standard repository software is great - for OAI harvesting as well

Small collections – small problems

Getting the related fulltext is complicated

Libraries produce better metadata

Data aggregation may produce problems

Writing e-mails helps - sometimes

BASE interfaces

Search form

HTTP calls

Web Service

Local integration (via search form)

E-Repository Integration

<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
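The same request that the search form submits can be prepared from a script. The Python sketch below only builds the POST request and does not send it. The URL and the field names `q` and `s` come from the form on this slide; the function name `build_search_request` is hypothetical, and whether the endpoint still accepts this request today is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form target taken from the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query, scope="all"):
    """Build (but do not send) the POST request the search form would submit."""
    # The form limits the query field to 512 characters, so mirror that here.
    body = urlencode({"q": query[:512], "s": scope}).encode("utf-8")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )


if __name__ == "__main__":
    request = build_search_request("open access")
    print(request.get_method(), request.full_url)
    print(request.data)
```

Passing the request to `urllib.request.urlopen` would submit it; that step is left out so the sketch has no network dependency.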

Prototype: search based on SOAP interface (EU project DRIVER)

Thank you!