2nd Annual Review Meeting, Luxembourg, 2015-11-25 Andrea Volpini, Marcello Colacino Work Package 8: Video News Showcase

New Thinking in the Practice of Digital Journalism


© MICO Project. Funded by the European Commission FP7 (grant agreement no: 610480). No reproduction without written permission.

Hello, I am: @cyberandy

This presentation is about:

@mico_project

New thinking in the practice of digital journalism

Quaddles Roost deviantart.net © Quaddles 2011


Support and stimulate innovation in Digital News Journalism

Mission statement: What is this showcase really about?

The massive amount of content produced inside and outside the newsroom needs to be organised and curated to meet the evolving demands of the audience. Content offerings should become seamlessly accessible across different devices. The cross-media analysis, querying and recommendation functionalities provided by MICO can play a crucial role for both readers and content creators.


Challenges: What really is Digital News Journalism?

● Originating in the UK in 1970, the first type of digital journalism was called teletext: it was brief and instant.

● Throwing news online without context and analysis simply doesn’t work. The focus for digital news is on interactivity, engagement and community.

Here is how the Guardian (@pilhofer) has re-organised the newsroom.


“If our role as journalists is to help communities better organize their knowledge and themselves, then it is apparent that we are in the service business and that we must draw on many tools, including content, and place value on the relationships we build with members of our communities, which will also take many forms. Thus we are in the relationship business.” -- Jeff Jarvis


Organizing Knowledge

Interactivity, Engagement, Community

What, Who, Where, When

● Making editorial decisions about what connects to what and why

● Providing context and analysis of the facts

● Inviting others to explore and discover

● Publishing interoperable datasets that follow the linked data principles

This slide is deliberately inspired by “Thinking Outside the Search Box”: http://www.slideshare.net/TheMediaConsortium/aron-pilhofer


Showcase Description: Innovating in Digital News Journalism with MICO

Our goal is to augment and extend the existing publishing workflows of news organizations. MICO ‘unveils’ the hidden semantics of raw multimedia content and helps with:

● organizing knowledge internally (news publisher) and externally (stakeholders)

● reducing the complexity of content management operations

● extending readers’ dwell time by repurposing matching content


Let’s move to our real-world use cases: Greenpeace and Shoof

Greenpeace Italy has built a magazine website (magazine.greenpeace.it) to increase supporter retention and loyalty.

Insideout Today is developing Shoof, a micro-video recording app for people living in Cairo.


Traditional Content Publishing

Media is managed using different tools and published over different platforms. Editorial teams struggle to reuse their vast archives. Readers approaching a single asset online might have a hard time finding the context for that asset:

- When was it produced?
- Where?
- What was the story behind it?

TEXT

IMG

VIDEO


Content Publishing with MICO

NAMED ENTITY

By extracting structured content from text and video at a fine-grained level (using Named Entity Recognition), MICO helps with classification and cross-media publishing.

TEXT, VIDEO, IMG


MICO in the Linked Open Data Cloud

MICO uses Linked Open Data as a support for content editing and content curation.

MICO allows content publishers to create and publish their metadata as Linked Open Data (Knowledge Graph), thereby contributing to the organization of public knowledge.
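As a rough illustration of what publishing metadata as Linked Open Data can look like, the sketch below serialises one article and the entities it mentions as JSON-LD. The schema.org vocabulary, the article URL and the DBpedia link are assumptions chosen for the example; they are not taken from the MICO data model.

```python
import json


def article_to_jsonld(url, headline, entities):
    """Describe an article and the entities it mentions as a JSON-LD document.

    `entities` is a list of (schema.org type, name, Linked Open Data URI) tuples.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "@id": url,
        "headline": headline,
        # sameAs links each entity to a public Linked Open Data identifier.
        "mentions": [
            {"@type": kind, "name": name, "sameAs": same_as}
            for (kind, name, same_as) in entities
        ],
    }


doc = article_to_jsonld(
    "http://magazine.greenpeace.it/example-article/",  # hypothetical URL
    "Example headline",
    [("Organization", "Gazprom", "http://dbpedia.org/resource/Gazprom")],
)
print(json.dumps(doc, indent=2))
```

Because every entity carries a stable URI, other publishers can link to the same concept without agreeing on a shared database first.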


MICO and semantic tagging in general require the publisher to make editorial decisions.

For Greenpeace Italy this has created a new level of awareness of the relationship between the organization, the concepts it uses to classify its contents, and its audiences.

Building Publisher Awareness: What are the concepts we care about the most? Why?


Ordering Knowledge

Content classification enables content discovery and creates alternative entry points and routes to content. MICO brings cross-media semantic classification to the reader, enabling queries like:

● All contents related to Gazprom

● All contents related to Gazprom mentioning Putin

● All contents related to the renewables industry sector in Italy

● All contents related to fishery that are interesting for a given user or for a cluster of users

What, Where, Who, When
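Queries such as "all contents related to Gazprom mentioning Putin" map naturally onto SPARQL graph patterns. The helper below is a minimal sketch of how such a query string could be assembled; the use of `dct:subject` for "related to" and `schema:mentions` for "mentioning", as well as the DBpedia URIs, are assumptions for illustration and not the properties actually used in the showcase.

```python
PREFIXES = (
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
    "PREFIX schema: <http://schema.org/>\n"
)


def related_content_query(topic_uri, mentioned_uri=None):
    """Build a SPARQL SELECT for content about a topic.

    If `mentioned_uri` is given, the content must also mention that entity.
    """
    patterns = [f"?content dct:subject <{topic_uri}> ."]
    if mentioned_uri is not None:
        patterns.append(f"?content schema:mentions <{mentioned_uri}> .")
    body = "\n  ".join(patterns)
    return f"{PREFIXES}SELECT DISTINCT ?content WHERE {{\n  {body}\n}}"


query = related_content_query(
    "http://dbpedia.org/resource/Gazprom",
    "http://dbpedia.org/resource/Vladimir_Putin",
)
print(query)
```

Each additional constraint (sector, country, user cluster) would add one more triple pattern on `?content`, which is what makes the classification reusable across media types.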


WordLift is a WordPress plugin to organise posts and pages by adding facts, text and media.

WordLift analyses articles using Named Entity Recognition (NER).

WordLift adds semantic annotations and combines information publicly available as linked open data to support editorial workflows.

Platform Components 1/3


HelixWare is an Online Video Hosting Platform designed for telcos, internet service providers, enterprises, news and media publishers and developers.

HelixWare also features a WordPress plugin for integrating online videos with the world-famous open source CMS. HelixWare runs in the cloud.

Platform Components 2/3


Shoof is a hyperlocal video recording app for messaging and independent news gathering, designed for people living in Cairo, Egypt. Shoof works in conjunction with HelixWare and pulls all recorded videos into the cloud, making them accessible across multiple screens. Shoof captures the pulsing moments of a city and shares them over the cloud.

Platform Components 3/3


NEVINE: Lives in Cairo, Egypt. Very active on social networks and willing to contribute to the news-making process as a field journalist. Always in search of fresh updates and social engagement.

SIRA: Works in Rome, Italy as a marketer for an environmental NGO and is responsible for supporter retention. She needs to revamp the NGO magazine, turning the digital version into a tool to enhance supporter retention and loyalty.

Showcase Personas


● Greenpeace is a vertical content publisher very focused on multimedia content production about environmental issues

● Greenpeace does not accept money from either companies or governments. Individual contributions are the only source of funding

● Greenpeace in Italy has more than 70,000 supporters and communicates with them through “Greenpeace News”, a traditional printed magazine

● “Greenpeace News” was in the past an important acquisition tool: many one-off donations came from it. Now it’s “just” a communication tool and a cost

Greenpeace: Background


● A test environment has been created to test integrations with MICO.

● The “Greenpeace News” magazine was officially launched in April 2015. Issues #116, #117 and #118 are live, while #119 is close to going live.

● We started to collect real usage data for MICO. These data were used for WP5 validation.

● WordLift & MICO were introduced in production: the editorial staff started to enhance Greenpeace content through semantic content enrichment and classification.

● A first release of the Greenpeace “knowledge graph” is live and published on Apache Marmotta as linked data.

● WordLift, HelixWare and MICO have been connected via a Java gateway (now available as open source) and AMQP. This is the basis of our integration infrastructure, which others can re-use.

Greenpeace: State Of Art


Greenpeace: 2015 Timeline (1st Qtr to 4th Qtr)

● MICO / GP test environment is configured

● The Greenpeace News Magazine is officially launched with issue #116

● Greenpeace News issue #117 is live

● Greenpeace News #118 is live together with WordLift + MICO

● Greenpeace editorial staff starts to work with semantic classification on the magazine contents

● Greenpeace News issue #119 is planned within November


Greenpeace: Architecture


● Working with WordLift & MICO helped the organization think through its role and identity. Important editorial decisions had to be taken to create its first vocabulary.

● The adoption of WordLift & MICO in production is still too recent to evaluate the overall benefits of the solution against the suggested KPIs: Pages / Visits, Engagement Rate, Direct donations, Organic traffic, Donors upgrade, Donors engagement, Donors loyalty and Prospects.

● However, the semantic classification provided by WordLift already offers meaningful content discovery opportunities and represents a clear enhancement in the user experience.

● Test integrations with HelixWare & MICO show that annotating videos using automatic speech recognition and entity detection makes these opportunities even more valuable.

Greenpeace: Insights


● independent news agencies struggle to find their audience on the web

● the shift to mobile requires a completely new information architecture and “mobile first” approaches

● video is key for sustaining organic growth and advertising revenues

● creating premium and attention-grabbing news content is expensive

● metrics are changing as advertisers seek engagement and time rather than just clicks and impressions

● next-generation UGC is entering the newsroom, but it is hard to manage for the small guys

Shoof: Background


● The application was launched in closed beta with a selected number of users in May this year

● Several independent media partners have been pro-actively involved in the project since the very beginning

● Videos produced from Shoof are sent to HelixWare and from HelixWare to MICO for content filtering and analysis

● Videos produced with Shoof can be re-used by news sites, bloggers and websites using the HelixWare WordPress plugin

● This plugin has been extended to interface with the MICO gateway and to put the power of semantic annotations in the hands of editors, journalists and content curators

Shoof: State Of Art


Shoof: Architecture


● Concerns about user privacy are high; the technology needs to help news organizations use UGC materials more carefully (this is why face detection greatly helps)

● Content ownership is also an important issue (audio tampering detection might help here in understanding what is original and what is not)

● Launching Shoof requires a massive labor force for content filtering and moderation (and yes, this is an area where MICO is needed)

● A first end-to-end integration has shown:

○ how MICO can help create chapters of Shoof videos (this makes it quicker for editors and for viewers to review the material)

○ how videos can be re-organised with face detection (adding the person tag is just one way of exploiting this functionality)

Shoof: Insights


● MICO TE-204 (Face Detection) outperformed BetaFaceAPI for frontal face detection and proved to be immediately useful in our showcase.

● Despite its higher WER (Word Error Rate), MICO TE-214 (Automatic Speech Recognition) when combined with Named Entity Recognition proved to be already useful for interlinking audiovisual contents with text articles as required in our showcase.

● A complete integration workflow with MICO TE-206 (Temporal Video Segmentation) and HelixWare & Shoof has been implemented providing video chapters to WordPress users.
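The Word Error Rate mentioned above for TE-214 is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance; it is a generic illustration of the metric, not the evaluation script used in the validation, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER of `hypothesis` against `reference` (both plain strings)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "greenpeace protests against gazprom",
    "greenpeace protest against gazprom",
)
print(wer)
```

In this invented example one word in four is wrong, yet both entity names survive intact, which suggests why a transcript with a relatively high WER can still be useful as input to Named Entity Recognition.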

Validation Test: Good News


● The lack of language models for Italian and Arabic is a limitation that should be removed.

● During the testing phase the MICO platform was still quite unstable.

● The current version of the broker limits us to using only one extraction pipeline at a time.

● Long processing times might become an issue when content load increases.

● More work is planned in WP5 that will bring cross-media recommendation into action.

Validation Test: Work in Progress...


http://facedetection-mico.insideout.io

Face Detection (MICO TE-204): Are You Human?

[Figure: face detection results compared across Ground Truth, MICO and BetaFaceAPI]

Face detection uses the latest version of libccv, which includes a comprehensive state-of-the-art face detection algorithm.
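A comparison of detectors against ground truth can be scored in several ways, and the slides do not state which one was used. One common approach, sketched below as an assumption, matches each detected box to an unmatched ground-truth box by intersection over union (IoU) and reports precision and recall. The box format `(x, y, width, height)`, the 0.5 threshold and the sample boxes are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def precision_recall(detections, ground_truth, threshold=0.5):
    """Greedily match detections to ground-truth faces and score the result."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)  # each ground-truth face counts only once
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


ground_truth = [(10, 10, 50, 50), (100, 20, 40, 40)]
detections = [(12, 12, 50, 50), (300, 300, 30, 30)]
print(precision_recall(detections, ground_truth))
```

Running the same function over the output of two detectors on the same annotated frames gives directly comparable numbers.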

https://github.com/insideout10/facedetection


Temporal Video Segmentation (MICO TE-206): Show Me This Video’s Chapters...

Temporal Video Segmentation detects edited shot boundaries and the key frames within those boundaries. The extractor uses the Fraunhofer TVS library. The extractor has been integrated with WordPress by extending the HelixWare WordPress Plugin.
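To make the step from shot boundaries to video chapters concrete, here is a minimal sketch that merges very short shots into longer chapters. It does not use the Fraunhofer TVS library or the HelixWare plugin; the input format (boundary timestamps in seconds) and the `min_length` rule are assumptions made for the example.

```python
def shots_to_chapters(boundaries, duration, min_length=10.0):
    """Turn shot-boundary timestamps (seconds) into (start, end) chapters.

    A boundary only starts a new chapter if the current chapter is already
    at least `min_length` seconds long; the final chapter may be shorter.
    """
    chapters = []
    start = 0.0
    for cut in sorted(boundaries):
        if 0.0 < cut < duration and cut - start >= min_length:
            chapters.append((start, cut))
            start = cut
    chapters.append((start, duration))
    return chapters


chapters = shots_to_chapters([4.0, 12.5, 15.0, 40.0], duration=60.0)
print(chapters)
```

A key frame from each resulting chapter could then serve as its thumbnail in the player.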


Text Analysis (MICO TE-220): Show Me All Contents Related to...

http://magazine.greenpeace.it

This extractor uses the Redlink Analysis Service (open source technology) to extract Named Entities and is integrated with WordLift, a WordPress plugin that helps bloggers and news publishers organise content and reach their audiences.

All Contents Related to Enel and Smartgrid.

http://magazine.greenpeace.it/enel-volta-pagina/
http://magazine.greenpeace.it/entity/greenpeace/


Automatic Speech Recognition (MICO TE-214): Show Me All Videos Related to this Article

This Extractor sends the detected speech to the Redlink Analysis Service (TE-220) to extract Named Entities from Videos. All contents are then filtered using ONLY Named Entities created by editors using WordLift on the website.
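The filtering step described above can be pictured as a set intersection: entities detected in a video's speech are kept only if editors have already created them on the website, and a video is related to an article when the two share at least one such entity. The following sketch illustrates that idea with in-memory data; the function names, the `example.org` URIs and the video ids are hypothetical, and the real showcase resolves this through a SPARQL query rather than Python sets.

```python
def keep_known_entities(detected_uris, vocabulary):
    """Drop every detected entity that editors have not created on the website."""
    return {uri for uri in detected_uris if uri in vocabulary}


def videos_related_to(article_uris, video_entities, vocabulary):
    """Return ids of videos sharing at least one vocabulary entity with the article."""
    wanted = keep_known_entities(article_uris, vocabulary)
    related = []
    for video_id, detected in video_entities.items():
        if keep_known_entities(detected, vocabulary) & wanted:
            related.append(video_id)
    return sorted(related)


# Hypothetical editor-curated vocabulary and per-video detections.
vocabulary = {
    "http://example.org/entity/lego",
    "http://example.org/entity/greenpeace",
}
video_entities = {
    "video-1": ["http://example.org/entity/lego", "http://example.org/entity/shell"],
    "video-2": ["http://example.org/entity/shell"],
}
result = videos_related_to(
    ["http://example.org/entity/lego"], video_entities, vocabulary
)
print(result)
```

Restricting matches to the curated vocabulary trades recall for precision: noisy entities produced by speech recognition errors never reach the reader.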

All Contents related to Lego.

[Figure: an article on Lego; Lego as a Named Entity on the website; the SPARQL query (“Show Me All Content Related to the Article”); a video on Lego]


Demo

Code available @
https://github.com/insideout10/mico-gateway
https://github.com/insideout10/helixware-mico-plugin
https://github.com/insideout10/helixware-plugin
https://github.com/insideout10/facedetection

Demo available @
http://magazine.greenpeace.it
http://facedetection-mico.insideout.io


Andrea Volpini, Marcello Colacino

Insideout10 S.r.l.

Viale Bruno Buozzi, 47 00197 Rome, Italy

[email protected]@insideout.io

www.insideout.io

Thanks for your attention!

Quaddles Roost deviantart.net © Quaddles 2011