Harald Sack Internet Technologies and Systems (ITS) Future Internet Technologies / Semantic Technologies Hasso-Plattner-Institute for IT Systems Engineering Research Seminar Oct 5th, 2010 Mediaglobe & CONTENTUS from 10.000 feet above ground

Mediaglobe & Contentus - from 10.000 Feet Above Ground

DESCRIPTION

Presentation from my Research talk, Oct 5, 2010 on our 2 research projects MEDIAGLOBE and CONTENTUS within the German THESEUS research programme.

Page 1: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Harald Sack
Internet Technologies and Systems (ITS)
Future Internet Technologies / Semantic Technologies
Hasso-Plattner-Institute for IT Systems Engineering

Research Seminar, Oct 5th, 2010

Mediaglobe & CONTENTUS from 10.000 feet above ground

Page 2: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Dr. Harald Sack, Mediaglobe & Contentus, Research Seminar, 5. Oct. 2010, Hasso-Plattner Institute for IT Systems Engineering, Potsdam

Mediaglobe & Contentus from 10.000 feet above ground

• Semantic Technologies & Multimedia Retrieval

• Theseus Research Program

• Project Mediaglobe

• Project Theseus/Contentus

Page 3: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Semantic Technologies & Multimedia Retrieval

• 2009/01 started with 1 senior researcher ...

• 2009/03 Jörg Waitelonis

• 2009/12 Zalan Kramer

• 2010/01 Johannes Hercher

• 2010/03 Bernhard Quehl

• 2010/03 Haojin Yang

• 2010/05 Nadine Ludwig, Johannes Osterhoff

• 2010/07 Magnus Knuth

• 2010/09 Joscha Jäger

• 2010/11 N.N.

Page 4: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Semantic Technologies & Multimedia Retrieval

• Research Topics

• Semantic Web Technologies

• Ontological Engineering

• Information Retrieval

• Multimedia Retrieval

• Multimedia Analysis

• Social Networking

• Data/Information Visualization

Page 5: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Semantic Technologies & Multimedia Retrieval

• Research Projects

Page 6: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Mediaglobe & Contentus from 10.000 feet above ground

• Semantic Technologies & Multimedia Retrieval

• Theseus Research Program

• Project Mediaglobe

• Project Theseus/Contentus

Page 7: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Theseus Research Program

• THESEUS - New Technologies for the Internet of Services

• GOAL: to develop a new Internet-based infrastructure that makes better use of the knowledge available on the Internet.

• FOCUS: Computational Linguistics and Semantic Technologies

• Overall Budget: 200 million euros / Time Frame: 2007 - 2012

• Partners:

antibodies-online GmbH / Averbis GmbH / B2M Software AG / Blue Order Technologies AG / CIM Aachen GmbH / defa-spektrum GmbH / Deutsche Thomson oHG / DISY Informationssysteme GmbH / Empolis GmbH / EXAPT Systemtechnik GmbH / Festo AG & Co. KG / Festool GmbH / Fraunhofer-Gesellschaft / German National Library / German Research Center for Artificial Intelligence (DFKI) / Hasso-Plattner-Institut für Softwaresystemtechnik (HPI) GmbH / Hessian Telemedia Technology Competence Center (httc e.V.) / imc information multimedia communication AG / InfoChem Gesellschaft für chemische Information mbH / Infoman AG / Institut für Rundfunktechnik GmbH / intelligent views gmbh / jCOM1 AG / Karlsruhe Institute of Technology (KIT) / Ligmatech Automationssysteme GmbH / Ludwig-Maximilians-Universität (LMU) / Medien Bildungsgesellschaft Babelsberg GmbH / Metris GmbH / mufin GmbH / neofonie GmbH / ontoprise GmbH / raumobil GmbH / Research Center for Information Technology Karlsruhe (FZI) / RESprotect GmbH / RWTH Aachen University / SAP AG / SEEBURGER AG / Siemens AG / Sterling SIHI GmbH / Technische Universität Darmstadt / Technische Universität Dresden / Technische Universität München / Transinsight GmbH / Universität des Saarlandes / Universität Freiburg / Universität Karlsruhe (TH) / Universität Leipzig / Universität Stuttgart / Universitätsklinikum Erlangen / VDMA – Verband Deutscher Maschinen- und Anlagenbau e.V. / Yellowmap AG

www.theseus-programm.de

Page 8: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Theseus Research Program

Page 9: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Theseus Research Program

THESEUS Core Technology Cluster
• WP1: CTC Management (HHI)
• WP2: Video, Audio, Metadata, Platforms (HHI)
• WP3: Ontology Management (FZI)
• WP4: Semantic Access to Media and Services (DFKI)
• WP5: User Interface, Visualization (IGD)
• WP6: Statistical Machine Learning (Siemens)
• WP7: DRM/IPR Management (IIS)
• WP8: Evaluation (IDMT)

THESEUS Use Cases
• ALEXANDRIA - A Knowledge Platform on the Internet
• CONTENTUS - Technologies for the Library of the Future
• MEDICO - Intelligent Searches in Medical Databases
• ORDO - Order in a Digital World
• PROCESSUS - Making Better Use of Corporate Knowledge
• TEXO - An Infrastructure for Web-Based Services

THESEUS SME 2009
• MEDIAGLOBE + 11 other projects

Page 10: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Mediaglobe & Contentus from 10.000 feet above ground

• Semantic Technologies & Multimedia Retrieval

• Theseus Research Program

• Project Mediaglobe

• Project Theseus/Contentus

Page 11: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - About

• THESEUS SME Project

• Affiliated with THESEUS/CONTENTUS

• Sept 2009 – Aug 2011 / to be extended until June 2012

• 4 Partners / Budget: 2.5 million €

• Topic

• Open Up Audiovisual Media Archives with historic & documentary content

• Enable exploratory and semantic search in Audiovisual Media Archives

• Business Cases

• Semantic Search Engine Infrastructure and Services for

• Media Archives,

• Broadcasters and Producers

www.projekt-mediaglobe.de

Page 12: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - Partners

Project Management Research & Development

AV Archive Media Asset Management System

Page 13: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - Topics

Automated Media Analysis

Semantic Search

Digitization of AV Media

Rights Management

Media Archive Requirements

User Interface Design

Metadata Engineering

Page 14: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - Topics

Topic: Requirement Analysis and Media Census
• Data collection from more than 200 AV archives in Germany about digitization, online distribution, and rights management

Topic: Efficient Digitization of AV-Archives
• Workflow definition and evaluation, best practices

Topic: Software Enabled Digital Rights Management
• Workflow definition and best practices for unique determination of copyrights

Topic: Automated AV Media Analysis
• Extraction of textual and semantic metadata for semantic search

Page 15: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - Topics

Topic: Metadata Engineering
• Definition, interlinking and validation of a (semantic) metadata model for media archives

Topic: Semantic Search
• Combining semantic metadata of heterogeneous provenance into a semantic search index to enable high precision/recall multimedia retrieval and exploratory search

Topic: User Interface Design
• Support of innovative search strategies with semantic data/information visualization

Page 16: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - Responsibilities

• Structural AV-Segmentation
• Intelligent Character Recognition
• Face/Body Detection
• Genre Detection
• Speaker Detection
• Automated Speech Recognition

• Ontology Design
• Entity-Mapping / Schema Mapping
• Semantic Enabled Retrieval
• Exploratory Search
• GUI Design
• Data/Information Visualization

• Media Asset Management
• Distribution

Page 17: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Automated Media Analysis

Structural Analysis

Intelligent Character Recognition

Face Detection + Tracking

Audio Analysis

Genre Analysis

Semantic Analysis

Context Analysis

Entity Mapping

Evaluation Framework

Media Transcoding

Persistent Storage

UIMA - Unstructured Information Management Architecture

digitized AV-Media

Page 18: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Media Transcoding

Archival and Distribution
• SD - DVCpro 50
• HD - DVCpro HD

Processing
• MPEG4/AVC
• Downscaling

Evaluation Framework
• Accurate manual annotation of 25 video clips (750 min) from defa spektrum archive
• TREC video test datasets

Page 19: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

video

scenes

shots

subshots

frames

Structural Analysis

Page 20: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Structural Analysis

shots

• Shot Boundary Detection

• Identification of
  • Hard Cuts
  • Drop Outs
  • Soft Cuts, e.g., Dissolve, Wipe, Cross-Fade

Analytical Shot Boundary Detection
• Analysis of Luminance/Chrominance Histograms
• Analysis of Edge Distribution
• Analysis of Motion Vectors

Machine Learning
• Classification of Hard/Soft Cuts based on Image Features
• Random Trees
• Support Vector Machines

histogram differences
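
The histogram-difference idea named on this slide can be illustrated with a minimal sketch. It is not the Mediaglobe implementation: frames are assumed to be flat lists of 8-bit luminance values, and the function names, the bin count, and the threshold are hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def luma_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit luminance values (0..255)."""
    hist = [0.0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    total = float(len(frame)) or 1.0
    return [count / total for count in hist]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_hard_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i where a cut is assumed between frame i-1 and frame i."""
    hists = [luma_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_difference(hists[i - 1], hists[i]) > threshold]


# Toy sequence (assumed data): three dark frames followed by three bright frames.
dark = [20] * 64
bright = [230] * 64
cuts = detect_hard_cuts([dark, dark, dark, bright, bright, bright])
```

A single-frame threshold of this kind only catches abrupt changes. Soft cuts spread the change over many frames, which is presumably why the slide also lists edge distribution, motion vectors, and learned classifiers.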

Page 21: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Structural Analysis

Analytical Shot Boundary Detection
• How to differentiate between Soft Cuts and Camera Rotation, Pan, and Zoom?
• Analysis of Motion Vectors
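
One way motion vectors can help with this question is a coherence heuristic: during a pan, most blocks move in roughly the same direction, while a dissolve tends to produce weak or scattered vectors. The sketch below is an assumption for illustration, not the method used in the project. The vectors are assumed to come from some block-matching step that is not shown, and the thresholds are arbitrary.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]


def mean_speed(vectors: List[Vector]) -> float:
    """Average length of the block motion vectors."""
    if not vectors:
        return 0.0
    return sum(math.hypot(vx, vy) for vx, vy in vectors) / len(vectors)


def motion_coherence(vectors: List[Vector]) -> float:
    """Length of the mean vector divided by the mean length; near 1.0 if all blocks move alike."""
    speed = mean_speed(vectors)
    if speed == 0.0:
        return 0.0
    mean_x = sum(vx for vx, _ in vectors) / len(vectors)
    mean_y = sum(vy for _, vy in vectors) / len(vectors)
    return math.hypot(mean_x, mean_y) / speed


def looks_like_pan(vectors: List[Vector], min_coherence: float = 0.8,
                   min_speed: float = 1.0) -> bool:
    """Hypothetical decision rule: fast and coherent motion is treated as a camera pan."""
    return mean_speed(vectors) >= min_speed and motion_coherence(vectors) >= min_coherence


# Toy vectors (assumed data): a pan to the right versus scattered motion.
pan = [(4.0, 0.1), (3.8, -0.2), (4.1, 0.0), (3.9, 0.2)]
scattered = [(1.5, -2.0), (-2.2, 0.4), (0.3, 2.5), (-0.5, -1.8)]
```

This measure does not cover zooms or rotations, where vectors point in different directions by design. Those would need a global motion model rather than a simple average.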

Page 22: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Structural Analysis

(Preliminary) Evaluation
• Yovisto/Mediaglobe
• CTC 2 - Shot Detection (HHI)
• Advene Shot Detection
• Student seminar project (analytical analysis, AL)
• Student seminar project (machine learning, ML)

                     recall   precision   f1 measure
yovisto/mediaglobe   0.76     0.77        0.75
Advene               0.64     0.76        0.67
HHI                  0.78     0.77        0.77
Students AL          0.72     0.78        0.71
Students ML          0.80     0.81        0.80
new                  0.87     0.83        0.85
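
For readers unfamiliar with the three measures, the following sketch shows how recall, precision, and F1 could be computed for shot boundaries on a single clip. The slides do not say how detections were matched to the manual annotation or how results were aggregated over clips, so the frame tolerance, the greedy matching, and the toy numbers are assumptions.

```python
from typing import List, Tuple


def match_boundaries(detected: List[int], truth: List[int], tolerance: int = 2) -> int:
    """Count detections that hit a distinct annotated boundary within `tolerance` frames."""
    unmatched = sorted(truth)
    hits = 0
    for d in sorted(detected):
        for t in unmatched:
            if abs(d - t) <= tolerance:
                unmatched.remove(t)  # each annotated boundary can be matched only once
                hits += 1
                break
    return hits


def precision_recall_f1(detected: List[int], truth: List[int],
                        tolerance: int = 2) -> Tuple[float, float, float]:
    hits = match_boundaries(detected, truth, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy frame numbers (assumed data, unrelated to the evaluation above).
p, r, f1 = precision_recall_f1(detected=[10, 51, 120, 300],
                               truth=[10, 50, 200, 299, 400])
```

In this toy case three of four detections are correct and three of five annotated boundaries are found.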

Page 23: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Intelligent Character Recognition

• Preprocessing
  • Keyframe extraction
  • Script identification
  • Script filtering
  • Adaptation of script geometry (Deskew)
  • Image quality enhancement

• Optical Character Recognition (OCR)
  • with standard software (tesseract)

• Postprocessing
  • Keyterm spotting
  • Lexical analysis
  • Statistical filtering

Prof. Rudolf Agsten LDPD
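
The postprocessing steps are only named on the slide. As one possible reading, the sketch below treats statistical filtering as keeping tokens that recur across several OCR'd keyframes of the same overlay, and lexical analysis as snapping tokens to a lexicon with Python's `difflib`. The function names, thresholds, and the noisy OCR tokens are invented for illustration.

```python
from collections import Counter
from difflib import get_close_matches
from typing import Iterable, List


def filter_by_frequency(tokens_per_frame: Iterable[List[str]],
                        min_share: float = 0.5) -> List[str]:
    """Keep tokens that appear in at least `min_share` of the OCR'd frames."""
    frames = [set(tokens) for tokens in tokens_per_frame]
    counts = Counter(token for frame in frames for token in frame)
    return sorted(t for t, c in counts.items() if c / len(frames) >= min_share)


def correct_with_lexicon(tokens: List[str], lexicon: List[str],
                         cutoff: float = 0.8) -> List[str]:
    """Replace each token by its closest lexicon entry; drop tokens without a close match."""
    corrected = []
    for token in tokens:
        match = get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        if match:
            corrected.append(match[0])
    return corrected


# Invented OCR output for three keyframes showing the same text overlay.
ocr_frames = [
    ["Prof.", "Rudolf", "Agsten"],
    ["Prof.", "Rudo1f", "Agsten"],
    ["Prof.", "Rudolf", "Ags+en", "|||"],
]
stable_tokens = filter_by_frequency(ocr_frames)
repaired = correct_with_lexicon(["Rudo1f", "Agsten", "xyz"],
                                lexicon=["Rudolf", "Agsten", "Prof."])
```

Voting across frames removes one-off recognition errors, and the lexicon lookup repairs near misses such as a digit read in place of a letter.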

Page 24: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Intelligent Character Recognition

(a) Original
(b) DCT
(c) Weighted DCT
(d) Normalized
(e) Binarized
(f) Mask after erosion & dilation
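
The last two panels, (e) and (f), correspond to thresholding followed by a morphological opening. The pure-Python sketch below shows that step on a tiny grid. The 3x3 neighbourhood, the threshold, and the toy image are assumptions, and the DCT stages of panels (b) to (d) are not reproduced.

```python
from typing import Callable, Iterable, List

Grid = List[List[int]]


def binarize(image: List[List[float]], threshold: float) -> Grid:
    """Map every pixel to 1 (text candidate) or 0 (background)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def _morph(mask: Grid, rule: Callable[[Iterable[int]], bool]) -> Grid:
    """Apply `rule` to each pixel's 3x3 neighbourhood (clipped at the image border)."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            out[y][x] = 1 if rule(window) else 0
    return out


def erode(mask: Grid) -> Grid:
    """A pixel survives only if its whole neighbourhood is set."""
    return _morph(mask, all)


def dilate(mask: Grid) -> Grid:
    """A pixel is set if any pixel in its neighbourhood is set."""
    return _morph(mask, any)


# Toy "normalized" image (assumed data): a 3x3 blob plus one isolated noise pixel.
image = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.8, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
mask = dilate(erode(binarize(image, threshold=0.5)))
```

Erosion removes the isolated pixel, and the following dilation restores the blob to its original extent, so the resulting mask keeps only the compact region.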

Page 25: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Intelligent Character Recognition

(h) sequence 1

(i) sequence 2

(j) Adapted sequence 1

(k) Adapted sequence 2

Page 26: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Metadata Engineering

• Requirement Analysis
• Semantic Data Modelling
• Vocabulary Inter-Linking
• MPEG-7 Compliance

Page 27: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Metadata Engineering

• Entity Mapping
• Mapping keyterms (text) to semantic entities
• Context Analysis and Disambiguation

[Figure: the user tag "Truman" matches several candidate entities in the LOD Cloud, each marked with a question mark: Truman Capote, Harry S. Truman, Truman, Minnesota, and The Truman Show]

Page 28: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Metadata Engineering

• Entity Mapping
• Mapping keyterms (text) to semantic entities
• Context Analysis and Disambiguation

[Figure: Context Graph Analysis, showing "Truman" together with Potsdam, Eisenhower, and Inauguration]
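
A minimal sketch of context-based disambiguation for the "Truman" example: each candidate entity is scored by how many of the co-occurring terms it is linked to. The link table is a hand-made toy stand-in for Linked Open Data, and the scoring rule is an assumption for illustration, not the project's context graph analysis.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy link graph; a real system would query Linked Open Data instead.
LINKS: Dict[str, Set[str]] = {
    "Harry S. Truman": {"Potsdam", "Eisenhower", "Inauguration", "Missouri"},
    "Truman Capote": {"New Orleans", "In Cold Blood"},
    "Truman, Minnesota": {"Minnesota", "Martin County"},
    "The Truman Show": {"Jim Carrey", "Peter Weir"},
}


def rank_candidates(candidates: List[str], context: Set[str]) -> List[Tuple[int, str]]:
    """Score each candidate by the number of context terms it links to, best first."""
    scored = [(len(LINKS.get(c, set()) & context), c) for c in candidates]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


ranking = rank_candidates(list(LINKS), {"Potsdam", "Eisenhower", "Inauguration"})
best_score, best_entity = ranking[0]
```

With this context, only one candidate shares any links with the co-occurring terms, so the ambiguous tag resolves to Harry S. Truman.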

Page 29: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

Automated Media Analysis

Semantic Search

• Creation of a Semantic Search Index
• Query String Mapping and Refinement
• Faceted Search
• Search by Timeline
• Geographical Search
• Exploratory Search
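
To make faceted search concrete, the sketch below filters a tiny in-memory list of records by facet values and counts the remaining values of another facet, as a search interface would display next to the result list. The records, field names, and function names are invented placeholders and do not describe the Mediaglobe index.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

# Invented toy records; field names and values are placeholders, not archive data.
ARCHIVE: List[Record] = [
    {"title": "Clip A", "person": "Harry S. Truman", "place": "Potsdam", "year": "1945"},
    {"title": "Clip B", "person": "Harry S. Truman", "place": "Washington", "year": "1949"},
    {"title": "Clip C", "person": "Dwight D. Eisenhower", "place": "Potsdam", "year": "1945"},
]


def filter_by_facets(records: List[Record], **facets: str) -> List[Record]:
    """Keep records whose metadata matches every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Count how many records carry each value of `facet`."""
    return Counter(r[facet] for r in records if facet in r)


potsdam = filter_by_facets(ARCHIVE, place="Potsdam")
remaining_people = facet_counts(potsdam, "person")
```

Selecting one facet narrows the result set, and the counts for the other facets show which further refinements are still available. The same pattern extends to timeline and geographical filters by using year or place as the facet.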

Page 30: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

User Interface Design

Page 31: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

User Interface Design

Page 32: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Mediaglobe - HPI Research

User Interface Design

Page 33: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Mediaglobe & Contentus from 10.000 feet above ground

• Semantic Technologies & Multimedia Retrieval

• Theseus Research Program

• Project Mediaglobe

• Project Theseus/Contentus

Page 34: Mediaglobe & Contentus - from 10.000 Feet Above Ground

CONTENTUS
• Use Case (among 5 others) of the German Theseus Research Program
• Time Frame: 2007 - 2012
• 7 Project Partners
• Supported by the Bundesministerium für Wirtschaft und Technologie (German Federal Ministry of Economics and Technology)

Page 35: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Motivation
• Deterioration of media (books, video, records, DVD, CD, …)
• Enormous amount of multimedia objects
• High cost and manpower required to run a digitization workflow
• Almost no internet-based linking of cultural goods

Page 36: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Project Goals
• Development of concepts and technologies for "Next Generation Multimedia Libraries"
• Automatic quality control & restoration
• Automatic metadata generation
• Semi-automatic semantic linking
• Incorporation of social networks and expert communities

Page 37: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Contentus Process Chain HPI Research

Page 38: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Contentus Service Platform

Page 39: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Contentus Process Chain

Backend Media Processing

Page 40: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Selected Contentus Components

Face Detection / Dirt Detection & Removal

Page 41: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Selected Contentus Components

Face Detection / Scratch Detection & Removal

Page 42: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Selected Contentus Components

Layout Detection / OCR Preprocessing

Page 43: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Selected Contentus Components

Audio Analysis / Audio Annotation

Page 44: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Contentus SMMS Process Chain

Backend Media Processing

Frontend Processing

Page 45: Mediaglobe & Contentus - from 10.000 Feet Above Ground

SMMS GUI DEMO - D2

Page 46: Mediaglobe & Contentus - from 10.000 Feet Above Ground

Mediaglobe & Contentus from 10.000 feet above ground

• Semantic Technologies & Multimedia Retrieval

• Project Mediaglobe

• Project Theseus/Contentus

Thank you for your Attention!