Building an easy to use search solution (for different languages) Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference “Making search work” track

www.netgenlabs.com

Speaker

•Co-owner of Netgen, a web development agency in Zagreb, Croatia

•Started as developer 11 years ago

•Now I do a variety of things, best described as International Business Developer

So I am still a developer! :)

Use case

•Regulatory reform project: cutting unneeded legislation, laws and/or procedures

•Netgen is the technology implementation partner

•Project led by Sense Consulting

•Croatia, Egypt, Vietnam, Armenia, Iraq - mostly “exotic” countries

We would rather work in Denmark, but it seems that it doesn’t need such a solution :(

How we use search

Solution

•In 2006: a simple filter

•Today: a flexible information architecture powered by eZ Publish CMS, with Solr for search

•Usually 70% common features, 30% customisation

•Aiming for 90%/10%

•If you are interested in tech specifics, ask me later…

Search features

• Simple (default) and advanced search (with filters)

• Full text search on complex data, boosting on attribute level

• Filtering with multilevel tags/taxonomies

• Stopwords

• Search-time spell checking based on indexed data

• Sometimes using faceting on result set
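
As a rough illustration of how several of these features could map onto a single Solr request, here is a minimal sketch that only builds the query string. It assumes the eDisMax query parser and a spellcheck component configured on the request handler. The field names (`title`, `summary`, `body`, `tags`), the boost values and the helper `build_solr_query` are hypothetical and not taken from the project.

```python
from urllib.parse import parse_qs, urlencode


def build_solr_query(phrase, tag_filters=(), facet_fields=()):
    """Build a Solr query string (hypothetical fields and boosts)."""
    params = [
        ("q", phrase),
        ("defType", "edismax"),               # eDisMax query parser
        ("qf", "title^3 summary^2 body"),     # attribute-level boosting (assumed fields)
        ("spellcheck", "true"),               # needs a configured spellcheck component
    ]
    for tag in tag_filters:
        # One filter query per selected tag; escape backslashes and quotes.
        escaped = tag.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'tags:"{escaped}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return urlencode(params)


query_string = build_solr_query(
    "business licence", tag_filters=["Trade/Import"], facet_fields=["tags"]
)
print(parse_qs(query_string))
```

The simple search would call this with the phrase only; the advanced search adds tag filters and facet fields.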

Additional features

•Sometimes using multi search

•Typing suggestions

•Latest search phrase list

Challenges

Characters

•At the beginning we didn’t have Unicode - it was a mess!

•Unicode solved a lot of problems but not all

•The same character can have more than one byte encoding, and these are not normalised by default
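
A minimal sketch of the normalisation problem, using Python's standard `unicodedata` module. The letter "č" can be stored as one code point or as "c" plus a combining caron; the two look identical but compare as different strings until both are normalised to the same form. The helper name `normalise_for_index` is hypothetical, and choosing NFC (rather than NFKC or another form) is an assumption.

```python
import unicodedata


def normalise_for_index(text):
    """Return text in Unicode NFC form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


composed = "Luka\u010d"      # "č" as a single code point (U+010D)
decomposed = "Lukac\u030c"   # "c" followed by a combining caron (U+030C)

print(composed == decomposed)
print(normalise_for_index(composed) == normalise_for_index(decomposed))
```

Applying the same normalisation to both indexed content and the search phrase is what makes the two spellings match.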

Indexing

•Indexing files such as Word or PDF documents proved problematic because of character problems

•token delimiter configuration could be language specific

•stemming sometimes supported, sometimes not

Searching

•search phrase input problems

Blind work

•the biggest challenge is that developers don’t know the language

•first level of testing is very hard

•still can’t trust Google Translate

What vehicle would you use to transport 10 cases of Heineken?

How to overcome this?

Main idea

•let’s try to assess search result quality

•use editors for rating (not the public)

•use most frequently searched terms (we can’t test all)

•rate results above the fold
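
Picking the most frequently searched terms could look roughly like this. It is a sketch that assumes the search phrases are available as a plain list of strings; the function name `most_frequent_terms` and the trimming and lower-casing of phrases are assumptions, not details from the project.

```python
from collections import Counter


def most_frequent_terms(search_log, limit):
    """Return the `limit` most common phrases as (phrase, count) pairs."""
    counts = Counter(
        phrase.strip().lower()      # treat "CMS" and "cms " as the same phrase
        for phrase in search_log
        if phrase.strip()           # skip empty searches
    )
    return counts.most_common(limit)


log = ["CMS", "cms ", "Solr", "search", "solr", "cms", ""]
print(most_frequent_terms(log, 2))
```

The resulting short list is what editors would be asked to rate, since rating every possible phrase is not feasible.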

The tool

•integrated in the public site

•thumbs up/down buttons added to the first X results, shown only to editors

Demo

•imported articles on CMS topics from various sources into a test instance

•rating result quality of 7 search terms

•Thumbs up/down for the 3 suggested search results

•Test periods are used for framing test data

Rating side

Analysing side

Rate measures

•Discounted Cumulative Gain (DCG) - rate sum discounted based on position in search results

•Normalised Discounted Cumulative Gain (NDCG) - discounted rate sum normalised against best possible outcome (to get percentage as the unit)

•Popularity-based NDCG - takes into account the popularity of the search term

http://en.wikipedia.org/wiki/Discounted_cumulative_gain
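
The three measures can be sketched as follows. This uses the common textbook definition of DCG (each rating divided by log2 of its position plus one) and treats thumbs up as 1 and thumbs down as 0. The slides do not give the exact formulas used in the tool, so the rating scale, the handling of the "no good result" case, and the popularity weighting (an average of per-term NDCG weighted by search count) are all assumptions.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain: rating at position p is divided by log2(p + 1)."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(ratings, start=1))


def ndcg(ratings):
    """DCG divided by the DCG of the same ratings in the best possible order."""
    ideal = dcg(sorted(ratings, reverse=True))
    # Assumption: if no result was rated as good, report 0 instead of dividing by zero.
    return dcg(ratings) / ideal if ideal > 0 else 0.0


def popularity_weighted_ndcg(ratings_by_term, searches_by_term):
    """Average NDCG over terms, weighted by how often each term was searched."""
    total = sum(searches_by_term[term] for term in ratings_by_term)
    if total == 0:
        return 0.0
    weighted = sum(
        ndcg(ratings) * searches_by_term[term]
        for term, ratings in ratings_by_term.items()
    )
    return weighted / total


# Thumbs up = 1, thumbs down = 0, in the order the results were shown.
print(dcg([1, 0, 1]))  # 1.5
print(ndcg([1, 0, 1]))
print(popularity_weighted_ndcg(
    {"cms": [1, 1, 0], "solr": [0, 0, 0]},
    {"cms": 30, "solr": 10},
))
```

Multiplying the NDCG value by 100 gives the percentage mentioned above.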

Known problems

•What if good results are not showing? - something bad is going on with the search engine

•what if there is no good result?

•what about new content added over time?

•at the end of the day, measurements are good for comparing between test periods, not meaningful by themselves

Improvements

•opening rating to public users

•using clicks as rates

•implement “did you find what you were looking for?” feature

•integrate with analytics

•use rate data to boost particular items in search!

Questions now or later

[email protected] ilukac.com/twitter ilukac.com/facebook ilukac.com/gplus ilukac.com/linkedin