Building a New Content Platform in 7 Months

Brian Bishop, VP of Platform Development

Springer for R&D and MarkLogic | 15.04.2023 | 2

Who is Springer?

• Leading global scientific publisher

• 6,000 employees in 25 countries

• 890 million EUR in turnover

• 2,000 journals / 7,000 new book titles published every year

• 50,000 eBooks

• Largest open access portfolio worldwide (over 300 open access journals)

Springer’s history with MarkLogic

2008

2009

2009 WINNER

2009 MarkLogic Innovation Award

2010

MarkLogic cluster

RESTful APIs

realtime.springer.com

citations.springer.com

iPhone apps

What were we trying to do?

The business goal

“Double the sales from Corporate market …in 5 years”

-- Derk Haank, CEO

• How?

• Hire more sales people

• Investment in dedicated content platform

What we used to do

Sure, we’ve got what you want.

In here somewhere.

But…

Professional R&D researchers are different from academic researchers

R&D researchers need content categorized according to their needs

They have unique collaboration…

…and security needs

They must satisfy fundamentally different stakeholders

They operate in rapidly changing environments

That require flexible business models to suit each customer’s situation

So we built them a new site

Agile development

Agile process

Goals are prioritized (top to bottom) and stories are prioritized (left to right)

Velocity is measured every week, allowing us to accurately forecast when a certain level of work can be completed
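
The forecasting idea can be sketched in a few lines of Python. This is only an illustration, assuming velocity is counted in story points per week and the forecast simply divides the remaining work by the average of recent weeks; the function name and all numbers are hypothetical, not taken from the project.

```python
from math import ceil
from statistics import mean


def weeks_to_complete(weekly_velocities, remaining_points):
    """Estimate the weeks still needed, using the average observed velocity."""
    if not weekly_velocities:
        raise ValueError("need at least one week of measured velocity")
    average = mean(weekly_velocities)
    if average <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used week still has to be scheduled.
    return ceil(remaining_points / average)


# Hypothetical data: story points completed in each of the last four weeks.
recent_velocities = [18, 22, 20, 20]
forecast_weeks = weeks_to_complete(recent_velocities, remaining_points=130)
print(f"Forecast: about {forecast_weeks} more weeks")
```

Measuring every week keeps the average current, so the forecast shifts as soon as the team speeds up or slows down.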

Agile development should lead to a lot of this

We track how much work we are doing against each goal

Story #150 - Abstract page default area (Articles only) 

• As Henry, I want to see a quick summary of article information so that I can decide if the article is relevant to me without having to sift through lots of irrelevant content.

• And then in a later iteration…

• Still another iteration…

So what did we do for our new audience?

What’s specific to Corporate customers

Content organized according to the way customers see the world

Show how their peers are using the content

Extra security/reporting for Deposit Accounts

Some cool new enhancements

We have rasterized all pages of all documents (over 60 million pages)

Limit search results to only accessible content

Links directly to sections of HTML

Auto-suggest based on Google search terms

What have we done with MarkLogic that’s cool?

• Indexed 5.6 million XML metadata files (2TB)

• Faceted search

• Transform XML on the fly

• Related documents

• Local-Disk Failover

• Customized search library

• Store Entitlements as queries

[Figure: “Coolness factor” chart; labels: Facebook, Instagram]

Search customizations

• Exact phrase match weighs a lot

• Titles weigh a lot

• Abstracts weigh some

• References are excluded

• Publication level weighs more than document level

• Full-text weighs some

• Search customized to browser language

• Future search enhancements:

• Highly cited weigh more

• Highly downloaded weigh more
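
As a rough illustration of how the current rules could combine into one relevance score, here is a minimal Python sketch. It is not the MarkLogic search library used on the site: the numeric weights, the field names, and the `level` flag are all assumptions, chosen only to reflect the relative sizes on the slide ("a lot", "some", excluded).

```python
# Hypothetical weights; the slide gives only relative sizes.
FIELD_WEIGHTS = {"title": 8.0, "abstract": 2.0, "full_text": 1.0}
# "references" has no entry above, so reference text never contributes.
EXACT_PHRASE_BOOST = 4.0
PUBLICATION_LEVEL_BOOST = 1.5


def score(doc, query):
    """Score one document (a dict of field name -> text) against a query."""
    phrase = query.lower()
    terms = phrase.split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "").lower()
        words = text.split()  # naive whitespace tokenization, for brevity
        total += weight * sum(words.count(term) for term in terms)
        # An exact multi-word phrase match earns an extra boost.
        if len(terms) > 1 and phrase in text:
            total += weight * EXACT_PHRASE_BOOST
    # Publication-level records outrank document-level ones.
    if doc.get("level") == "publication":
        total *= PUBLICATION_LEVEL_BOOST
    return total


in_title = {"title": "Carbon nanotube growth", "abstract": "We study growth", "level": "document"}
in_abstract = {"title": "Growth methods", "abstract": "A carbon nanotube survey", "level": "document"}
in_references = {"title": "Unrelated", "references": "carbon nanotube", "level": "document"}

ranked = sorted(
    [in_references, in_abstract, in_title],
    key=lambda d: score(d, "carbon nanotube"),
    reverse=True,
)
print([d["title"] for d in ranked])
```

The planned enhancements (citations, downloads) would fit the same shape as further multipliers on the total.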

Storing Entitlements in MarkLogic

Content (2TB)

<content> Journal_ID: 0001, ContentType: Article, DatePublished: 4/4/2012, Subject: Mathematics, Author: John Smith, Language: English, Keywords: “k theory”

Entitlements

<material_ID=“001”> Subject: Engineering

<material_ID=“002”> Journal_ID: 0001-0099

<material_ID=“003”> Subject: Engineering, SearchTerm: “carbon nanotube”, DatePublished: 2000-2012

Customers

<customer=“001”> material_ID: 001

These are stored as serialized queries

Benefits of this approach

• Each query can be arbitrarily complex and completely customized

• We’ll come back to this in a second

• The stored queries automatically select any new content that matches the query's criteria

• Every day we insert thousands of new documents

• These documents are immediately available on the site

• And immediately included in the query, so customers have access

• Many materials, potentially tens of thousands, can be associated with a user

• all looked up and combined into a single request

• on the fly
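
To make the mechanism concrete, here is a small Python sketch of entitlements kept as queries rather than as lists of document IDs. Plain Python predicates stand in for MarkLogic's serialized queries. The material definitions loosely follow the examples from the entitlements slide, while customer "002", the field names, and the sample documents are hypothetical.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Query = Callable[[Document], bool]

# Each material is a stored query, not a list of document IDs.
MATERIALS: Dict[str, Query] = {
    "001": lambda d: d.get("subject") == "Engineering",
    "002": lambda d: 1 <= d.get("journal_id", 0) <= 99,
    "003": lambda d: (
        d.get("subject") == "Engineering"
        and "carbon nanotube" in d.get("title", "").lower()
        and 2000 <= d.get("year", 0) <= 2012
    ),
}

# A customer points at any number of materials (customer "002" is made up).
CUSTOMER_MATERIALS: Dict[str, List[str]] = {"001": ["001"], "002": ["002", "003"]}


def entitlement_query(customer_id: str) -> Query:
    """Combine all of a customer's materials into one OR query."""
    queries = [MATERIALS[m] for m in CUSTOMER_MATERIALS.get(customer_id, [])]
    return lambda doc: any(q(doc) for q in queries)


def accessible(customer_id: str, documents: List[Document]) -> List[str]:
    """Return the IDs of the documents the customer may read."""
    allowed = entitlement_query(customer_id)
    return [d["id"] for d in documents if allowed(d)]


corpus: List[Document] = [
    {"id": "a", "subject": "Engineering", "journal_id": 150, "year": 2011, "title": "Bridge loads"},
    {"id": "b", "subject": "Mathematics", "journal_id": 1, "year": 2012, "title": "K theory notes"},
]
print(accessible("001", corpus))

# A newly inserted document is covered at once; no entitlement record changes.
corpus.append(
    {"id": "c", "subject": "Engineering", "journal_id": 300, "year": 2012, "title": "Carbon nanotube yarns"}
)
print(accessible("001", corpus))
```

Because access is evaluated against the query at request time, nothing has to be recomputed when the daily batch of new documents arrives.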

What if every customer created their own package?

What if you could subscribe to a search query?

Take for example

Scout Diagnostics are interested in:

• Cerebrospinal fluid

• Alzheimer’s disease

• Peptides

• But:

– “Published:1990-2012”

– AND Only in “Subject: biomedical”

Custom query model

• Do a search for what you want

• Facet to your heart’s content

• Your price is determined by the number of documents

• Database licensing model

• Access, not ownership

• You automatically get access to new documents that match your query

• At the end of a year we see how much the package has grown

• …and re-negotiate
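
The pricing idea reduces to simple arithmetic, sketched below in Python. Everything specific here is an assumption made for illustration: the per-document rate, the flat linear pricing, the field names, and the tiny sample corpus. The talk only says that price follows the number of matching documents and is revisited after a year.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Query = Callable[[Document], bool]

PRICE_PER_DOCUMENT = 0.50  # hypothetical rate, not from the talk


def package_size(query: Query, corpus: List[Document]) -> int:
    """Count the documents the custom query currently selects."""
    return sum(1 for doc in corpus if query(doc))


def package_price(query: Query, corpus: List[Document], rate: float = PRICE_PER_DOCUMENT) -> float:
    """Price the package by the number of matching documents."""
    return package_size(query, corpus) * rate


def biomedical_1990_2012(doc: Document) -> bool:
    """A simplified custom query: one subject plus a publication-date range."""
    return doc.get("subject") == "biomedical" and 1990 <= doc.get("year", 0) <= 2012


corpus_at_signing = [
    {"subject": "biomedical", "year": 2005},
    {"subject": "biomedical", "year": 1985},
    {"subject": "physics", "year": 2010},
]
# A year later, one more matching document has been published.
corpus_at_renewal = corpus_at_signing + [{"subject": "biomedical", "year": 2012}]

signing_price = package_price(biomedical_1990_2012, corpus_at_signing)
renewal_price = package_price(biomedical_1990_2012, corpus_at_renewal)
growth = package_size(biomedical_1990_2012, corpus_at_renewal) - package_size(
    biomedical_1990_2012, corpus_at_signing
)
print(signing_price, renewal_price, growth)
```

The customer reads the new document as soon as it is published; only the price discussion waits for the renewal.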

Suggested marketing

Custom Query package!

Enough silliness.

Springer for R&D

• Conceived: May 1, 2011

• Born: Nov. 15, 2011

• Weight: 11TB

• 2 TB XML

• 3.5 TB PDF

• 5.5 TB Images

• Proud parents:

Performance

…and we should STAY fast at scale

Summary

• Springer needed to:

• Build a content platform from scratch

• Dedicated to a particular market segment

• MarkLogic allowed us to:

• Leverage our significant XML assets

• Increase development speed by having the database do more of the heavy lifting

• Solve a difficult technical problem related to granting access

• Offer completely new business models tailored to our market

• Have a highly performant solution that operates at scale

Thank you.

Brian Bishop, VP of Platform Development

[email protected]

@mochasteak