CASE STUDY: SPAREBANK1 GRUPPEN Sébastien Muller

Enterprise Search Case Study: SpareBank1 Gruppen

Customer Requirement

“Better portal search”

Project background

•SpareBank1 Gruppen

• 19 individual bank portals and 1 front page (forside)

•Boost 25 umbrella project

• “Semantic” URLs:

https://www2.sparebank1.no/9898/3_privat?_nfpb=true&_nfls=false&_pageLabel=page_privat_innhold&pId=1233149354625&_

• New search GUI

•CMS with no easy way of telling which bank has published what

• Mass duplications

• Access to other portal-specific articles

• Web crawlers

What is better search?

At the very least:

•Relevant hits

•Faceting

•Query completion

•Spelling check and suggestions

•Basic search analytics

Relevant hits

• Relevancy = “... the quality of results returned from a query ...”

• Based on hits in fields generated from document processing

• Clean and meta-data rich index

• Pushed from CMS or extracted by crawlers

Crawling and Indexing

•Clean and meta-data rich index

•OpenPipeline

• Ignore irrelevant articles

• Extract article text contents

• Detect duplicates

• Facet data

• Populate index fields including *_qc and *_sp fields
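The processing steps listed above can be pictured as a chain of small stages, each of which either enriches a document or drops it. What follows is a minimal Python sketch of that idea, not OpenPipeline's actual API: the `Document` class, the stage functions, the 20-character relevance threshold, the `bank` facet taken from the URL path, and the field names `content_qc` and `content_sp` are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    html: str
    fields: dict = field(default_factory=dict)


def extract_text(doc):
    # Crude tag stripping; a real stage would target the article container.
    text = re.sub(r"<[^>]+>", " ", doc.html)
    doc.fields["content"] = " ".join(text.split())
    return doc


def ignore_irrelevant(doc):
    # Hypothetical relevance rule: drop pages with almost no text.
    return doc if len(doc.fields["content"]) >= 20 else None


class DuplicateDetector:
    """Drops documents whose extracted text has already been seen."""

    def __init__(self):
        self.seen = set()

    def __call__(self, doc):
        text = doc.fields["content"].lower().encode("utf-8")
        digest = hashlib.sha1(text).hexdigest()
        if digest in self.seen:
            return None
        self.seen.add(digest)
        return doc


def populate_fields(doc):
    # Assumption: the numeric path segment identifies the publishing bank.
    match = re.search(r"/(\d+)/", doc.url)
    if match:
        doc.fields["bank"] = match.group(1)  # facet data
    # Copies feeding query completion (*_qc) and spell checking (*_sp).
    doc.fields["content_qc"] = doc.fields["content"]
    doc.fields["content_sp"] = doc.fields["content"]
    return doc


def run_pipeline(docs, stages):
    indexed = []
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
            if doc is None:  # a stage rejected the document
                break
        else:
            indexed.append(doc)
    return indexed
```

Calling `run_pipeline(docs, [extract_text, ignore_irrelevant, DuplicateDetector(), populate_fields])` returns only the documents that survived every stage, ready to be sent to the index.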

Crawling and Indexing

• Crawlers will be as smart as you make them

• Very rigid logic

• Heavily reliant on article quality

• Don’t blame the crawler

https://www2.sparebank1.no/portal/4702/3_privat?_nfpb=true&_nfls=false&_pageLabel=page_privat_innhold&pId=1233149354625&_nfls=false

https://www2.sparebank1.no/portal/9898/3_privat?_nfpb=true&_nfls=false&_pageLabel=page_privat_innhold&pId=1233149354625&_nfls=false
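The two URLs above differ only in the numeric path segment, which is how the same article ends up crawled once per portal. A minimal Python sketch of grouping such URLs follows. It assumes that the `pId` query parameter identifies the article; the slides suggest this but do not state it. The helper names `article_key` and `group_by_article` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def article_key(url):
    """Return the pId query parameter, assumed here to identify the article."""
    values = parse_qs(urlsplit(url).query).get("pId")
    return values[0] if values else None


def group_by_article(urls):
    """Map each article key to every URL under which it was crawled."""
    groups = {}
    for url in urls:
        groups.setdefault(article_key(url), []).append(url)
    return groups
```

Any key that maps to more than one URL is a candidate duplicate. The crawler can then index a single copy and record the other portals as metadata.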

Relevant hits

Scoring model

<bean id="qf" class="com.findwise.jellyfish.solr.querymodifier.dismax.StaticQueryFieldSetter">
  <property name="queryFields">
    <list value-type="java.lang.String">
      <value>keyword^4</value>
      <value>content1^8</value>
      <value>content2^3</value>
      <value>content3^2</value>
      <value>stem1^1.5</value>
      <value>stem2^1.2</value>
      <value>stem3</value>
    </list>
  </property>
</bean>
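The field list above reads as a set of per-field boosts for Solr's DisMax query parser. As a rough illustration, the same weights can be written out as a plain `qf` request parameter. This Python sketch is an assumption about what the boosts amount to at query time, not a description of what the query modifier bean does internally; `build_qf` and `dismax_params` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Field boosts copied from the bean definition; None means no explicit boost.
QUERY_FIELDS = [
    ("keyword", 4),
    ("content1", 8),
    ("content2", 3),
    ("content3", 2),
    ("stem1", 1.5),
    ("stem2", 1.2),
    ("stem3", None),
]


def build_qf(fields):
    """Render (field, boost) pairs in DisMax 'qf' syntax: name^boost."""
    parts = []
    for name, boost in fields:
        parts.append(name if boost is None else f"{name}^{boost}")
    return " ".join(parts)


def dismax_params(user_query):
    """URL-encode a DisMax request carrying the user query and the boosts."""
    return urlencode({
        "defType": "dismax",
        "q": user_query,
        "qf": build_qf(QUERY_FIELDS),
    })
```

With these weights a match in `content1` counts eight times as much as a match in the unboosted `stem3`, so exact matches in the primary content field outrank stemmed matches.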

Relevant hits

•Spell checker

• Request handler for each bank

• Index based

•Stop words
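An index-based spell checker draws its suggestions from terms that actually occur in the indexed content, so it only proposes words that will return hits. The Python sketch below illustrates that idea with one vocabulary per bank. It is not Solr's spell check component; the per-bank split is inferred from the per-bank request handlers, and the stop-word list and the 0.8 similarity cutoff are assumptions.

```python
import difflib
from collections import defaultdict

# Hypothetical stop words, kept out of the suggestion dictionaries.
STOP_WORDS = {"og", "i", "the", "and"}


def build_dictionaries(docs):
    """docs: iterable of (bank_id, text). Returns one term set per bank."""
    dictionaries = defaultdict(set)
    for bank_id, text in docs:
        for token in text.lower().split():
            if token not in STOP_WORDS:
                dictionaries[bank_id].add(token)
    return dictionaries


def suggest(dictionaries, bank_id, word, n=3):
    """Suggest indexed terms for one bank that are close to a misspelled word."""
    vocabulary = sorted(dictionaries.get(bank_id, ()))
    word = word.lower()
    if word in vocabulary:
        return []  # the word exists in this bank's index, nothing to correct
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=0.8)
```

Because each bank has its own vocabulary, a term published only by one bank is never suggested to visitors of another bank's portal.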

Result

System Architecture

•Solr is incredibly flexible

• Master/slave

•Security constraints

• Search services available publicly

• Search analytics available internally but limited

• Indexing

Quality Assurance

•Crawler-friendly content modifications

• Edit

• Delete

• Add

• Share

• Risk analysis, etc.

Lessons Learnt

•Scope creep

•Garbage in, garbage out

•Documentation is only useful if it gets read