VIDEOLECTURES.NET: exchange ideas / share knowledge
Outline of the talk
About videolectures.net and K4A
Technical solutions in preparation
Towards content personalisation
Automatic Transcriptions
Enhanced Recommender Services
Visitors analytics
OCWC on videolectures.net
Jozef Stefan Institute Department of Knowledge Technologies @ Center for Knowledge Transfer
Selection of FP6 & FP7 Projects (Integrated Projects and Networks of Excellence only):
FP7 IP ACTIVE – Enabling the Knowledge Powered Enterprise
FP7 IP COIN – COllaboration and INteroperability for networked enterprises
FP7 IP EURIDICE – Inter-Disciplinary Research on Intelligent Cargo for Efficient, Safe and Environment-friendly Logistics
FP7 NoE PASCAL2 – Pattern Analysis, Statistical Modeling and Computational Learning
FP7 NoE T4ME – Machine Translation & Multilingual Information Retrieval
FP6 IP NeOn – Lifecycle Support for Networked Ontologies
FP6 IP ECOLEAD – European Collaborative Networked Organizations Leadership Initiative
FP6 IP SEKT – Semantically-Enabled Knowledge Technologies
Jozef Stefan Institute (JSI) is the leading Slovene research institution for natural sciences (800+ people) in the areas of computer science, physics, chemistry
Department of Knowledge Technologies has around 60 people working in various areas of artificial intelligence (machine learning, data mining, semantic technologies, computational linguistics, decision support)
Spin-offs: Cyc-Europe, Quintelligence, LiveNetLife, Temida, XLab
Selection of Portals and Products:
Text-Garden (http://www.textmining.net)
Enrycher (http://enrycher.ijs.si/)
VideoLectures.NET (http://videolectures.net/)
IST-World (http://www.ist-world.org/)
Project Intelligence (http://pi.ijs.si/)
Search-Point (http://searchpoint.ijs.si/)
OntoGen (http://ontogen.ijs.si/)
Document-Atlas (http://docatlas.ijs.si/)
AnswerArt (http://answerart.net/)
Semantic-Graphs, Document-Atlas, VideoLectures.NET
Videolectures: Basic facts
10000 videolectures - CC
10000 unique visitors per day
Recorded events 2009: 70, 2868 videos
Shared business models:
Research projects
Events
Academic institutions
Baseline funds
In-house developed services with strong support in research in semantics
JSI infrastructure, 5 permanent, 10-15 part time
Goal: contributing to global change in higher education by offering open access to high-quality scientific material
International dimension
European research supported by the European Commission (from 3M to 10M Euro scale RTD projects)
International institutions: EC, CEEMAN, CERN, Cluster Network, EFMD, IPSA, CLSP, MIT, UC Irvine, Yale, Stanford, TEDx, CMU, University of Ljubljana, Slovenian public research agency…
Active participation in: Opencast, OCWC, EuroCRIS
Knowledge4All foundation
K4A
Originates from Pascal NoE
Knowledge and content exchange network
Inspired and led by the most active institutions and organisations around the world in the area of free and open scientific content
Effective and pragmatic
Global impact
Distributed, networked, bottom-up governance
Funds, joint projects
Using existing University networks and resources
Distinctive element: all content to be scientifically approved
K4A - Five pillars of activity
Infrastructure: ICT Matterhorn - Interoperability, Channels, Semantics
Science: Journal and conferences
Online scientific video journal to global university
Education: courses and content
Quality assurance – peer reviewed content
Research: facilitating the systems, accessing the content, enabling interaction
IPRs, multilinguality, standards
Business models (added value models)
Other continent connections: case study in engagement and interaction
World Summit Award 09
“With this, ‘Videolectures.Net’ has approximately outrun 20.000 other products and projects from 157 countries participating in the 4th edition of the WSA, the United Nations based contest for e-content and creativity in the Information Society.”
Technology stack
5 servers serving 20 TB of data
700,000 unique files
300,000 web requests daily (90,000 dynamic)
Application: Django software / VideoLectures
Services: Nginx, Apache, PostgreSQL, Memcached; Flash streaming server; Windows Media Server; cloud storage, static web hosting
System level: Ubuntu Linux Server; Windows Server; Linux
Servers: web server, database; development; storage, processing; Flash video streaming; Windows streaming; Amazon S3
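The traffic and storage figures on the technology stack slide can be turned into rough per-second and per-file averages. This is a back-of-envelope sketch only: it assumes requests are spread evenly over 24 hours and that 1 TB means 10^12 bytes, neither of which is stated in the talk.

```python
# Back-of-envelope averages from the figures quoted on the slide.
# Assumptions (not from the talk): load is uniform over the day,
# and 1 TB = 10**12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60

daily_requests = 300_000      # web requests per day
dynamic_requests = 90_000     # of which dynamic
total_bytes = 20 * 10**12     # 20 TB of data
unique_files = 700_000

avg_requests_per_second = daily_requests / SECONDS_PER_DAY
dynamic_share = dynamic_requests / daily_requests
avg_file_size_mb = total_bytes / unique_files / 10**6

print(f"average load:   {avg_requests_per_second:.2f} requests/s")
print(f"dynamic share:  {dynamic_share:.0%}")
print(f"mean file size: {avg_file_size_mb:.1f} MB")
```

Real traffic is bursty, so peak load will be well above the daily average computed here.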
Technologies and Research
Graph/Social Network Analysis (GraphGarden/SNAP, IST-World, FPIntelligence)
Complex Data Visualization (DocAtlas, NewsExplorer, SearchPoint)
Computational Linguistics (Enrycher, AnswerArt)
Social Computing/Web2.0 (LiveNetLife)
Decision Support (DEX)
Light-Weight Semantic Technologies (OntoGen, OntoBridge)
Deep Semantics & Reasoning (Cyc)
Statistical Machine Learning
Data/Web/Text/Stream-Mining (TextGarden Suite of tools)
Personalisation
Modeling
Log files
Content mining
(Needs and preferences)
Adaptation
Towards personalisation @ videolectures.net
Enrycher (contextualisation of content objects)
Quintelligence Miner (user modeling and segmentation)
Recommender (content/user matching)
Content/learning object
User behavior
TEL environment (videolectures.net)
User profiling service (QMiner)
Ver1 – identifying segments: developed for NYT, Bloomberg
Ver2 – individual profiling: web service for videolectures.net
Analysing user logs and the content being accessed
Textual description – need for transcripts
Contextualisation – need for enriched content
Deep analytics
Modeling user behavior
Detecting SIGs – marketing groups, investors,…
Predicting and simulating user’s
Detecting trends in visits
Personalising content and methods
…
User profiling – identifying segments
Log files
User profiles
Videos articles
Advertisers
Segment Keywords
Machine learners
Text Mining, SVM, Link analysis, Learning, Modeling, Mining,…
Biologists - Arthropoda
Spiders, Mites, Ticks, Crab, Lobster, Shrimp….
… …
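The segment examples above pair a segment name with a set of keywords. How QMiner actually assigns users to segments is not described in the talk; the following is a toy sketch that assumes a plain keyword-overlap rule. The `SEGMENTS` table and the `best_segment` function are hypothetical names introduced for illustration, with keywords copied from the slide.

```python
# Toy segment assignment by keyword overlap (an assumption, not QMiner's method).
# Segment names and keywords are taken from the slide's two examples.
SEGMENTS = {
    "Machine learners": {
        "text mining", "svm", "link analysis", "learning", "modeling", "mining",
    },
    "Biologists - Arthropoda": {
        "spiders", "mites", "ticks", "crab", "lobster", "shrimp",
    },
}


def best_segment(viewed_keywords):
    """Return the segment sharing the most keywords with the user, or None."""
    viewed = {keyword.lower() for keyword in viewed_keywords}
    overlaps = {name: len(viewed & words) for name, words in SEGMENTS.items()}
    name, overlap = max(overlaps.items(), key=lambda item: item[1])
    return name if overlap > 0 else None


# Keywords that might be extracted from the videos or articles a user opened.
print(best_segment(["SVM", "Text Mining", "Crab"]))
print(best_segment(["cooking"]))
```

A production profiler would more plausibly use weighted features learned from log files rather than raw set overlap.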
QMiner system/services
Editors
Authors
Search fields, Search field values, Add state, Non-persistent Query, Get state, Get states, Update, Rename state, Delete state, Change Index, Exit
Recommendation service (Recommender)
Ver1: Developed and tested for videolectures.net
Ver2: Operating at Bloomberg.com also for textual documents
Each video is scored from three directions:
Collaborative filtering
Category – VL taxonomy and improved SVM module working on optimized categories
Content – matching video against the user group’s history using all the enriched features
All three scores are combined into final score using weights estimated from the collected training data
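The combination of the three scores can be sketched as a weighted sum. The talk states only that weights are estimated from collected training data; the linear form, the weight values, and the names `Scores`, `WEIGHTS`, `final_score` and `rank` below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    collaborative: float  # collaborative-filtering score
    category: float       # taxonomy/SVM category score
    content: float        # content match against the user group's history


# Hypothetical weights; in the described system these come from training data.
WEIGHTS = Scores(collaborative=0.5, category=0.2, content=0.3)


def final_score(scores, weights=WEIGHTS):
    """Combine the three scores linearly (the linear form is an assumption)."""
    return (
        weights.collaborative * scores.collaborative
        + weights.category * scores.category
        + weights.content * scores.content
    )


def rank(candidates):
    """Return video ids ordered from highest to lowest final score."""
    return sorted(candidates, key=lambda vid: final_score(candidates[vid]), reverse=True)


candidates = {
    "video_a": Scores(collaborative=0.9, category=0.1, content=0.2),
    "video_b": Scores(collaborative=0.2, category=0.9, content=0.9),
}
print(rank(candidates))
```

With these illustrative weights, a video that is weak on collaborative filtering can still rank first if it matches the user group's categories and content well.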
Demonstration
Content enrichment (Enrycher)
Providing wider context to the document… needed for efficient content mining and modeling
A set of Web services (http://enrycher.ijs.si)
Enriching a document with annotations presenting:
Extracted concepts known to the machine
Generated most descriptive sentences and dynamic abstracts
Semantic graph
Descriptions with existing ontologies
Links to external sources (Wikipedia, DMOZ, DBpedia, openlink data)
Demonstration
Transcription service (Transcriptor)
Prototype service with automatic rapid vocabulary training of the speech recognition engine using:
Lecture description
Slides information
Videolectures taxonomy
Enriched complementary content
Used for:
Transcription
Speech indexing
Video content search
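The vocabulary-training step draws on four text sources per lecture. How the Transcriptor prototype feeds these to the speech recognition engine is not described; the sketch below assumes the simplest possible preparation, a frequency-ranked word list. The function `build_vocabulary` and its parameters are hypothetical.

```python
import re
from collections import Counter


def build_vocabulary(description, slide_texts, taxonomy_labels, enriched_terms, top_n=50):
    """Collect the most frequent words across the four lecture-specific sources.

    This only prepares a candidate word list; adapting a real speech
    recogniser to it is engine-specific and not shown here.
    """
    counts = Counter()
    sources = [description, *slide_texts, *taxonomy_labels, *enriched_terms]
    for text in sources:
        # Lowercase alphabetic tokens of two or more characters, hyphens allowed.
        counts.update(re.findall(r"[a-z][a-z\-]+", text.lower()))
    return [word for word, _ in counts.most_common(top_n)]


vocabulary = build_vocabulary(
    description="Support vector machines",
    slide_texts=["Kernel methods", "support vectors"],
    taxonomy_labels=["Machine Learning"],
    enriched_terms=["kernel"],
    top_n=2,
)
print(vocabulary)
```

Restricting the list to lecture-specific terms is what would make such training rapid: the engine only needs to learn the domain words its general model lacks.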
Demonstration
OCWC on videolectures.NET
Videolectures.NET offers to organisations:
Low cost service and channel
Unlimited video preservation and fixed URLs
Organisation, project and personal videography pages
Access to the back-office editorial tools
Many innovative viewing and content management features
Sustainable innovation through research projects
Demonstration
Supporting OCWC
Video and courses content distribution through videolectures.net
User modeling and analytics … on a distributed network of OCWC sites
… common access to the analytics services
Opening existing services for independent use … transcription, categorisation, classification, content enrichment
OCWC website on videolectures.net: … crawling, enriching, structuring, categorising distributed materials
… common curriculum support
[email protected] – head of Center for knowledge transfer at JSI
[email protected] – head of videolectures.net service
[email protected] – main editor at videolectures.net
[email protected] – head of the KT research group at JSI
John Shawe-Taylor ([email protected]) – K4A director
Colin de la Higuera ([email protected]) – K4A director
Enrycher: http://enrycher.ijs.si
Recommender: http://videolectures.net
Contextual search: http://searchpoint.ijs.si
Support slides
A movement/competition …
Competitive advantage
Access to lecture rooms and the three most active communities
Videos + slides + comments
Viewing features
Semantically enriched functionalities
Curriculum building and management support
Efficient back-office
Low cost and efficient service from recording to hosting
Answering to challenges?
OpenCourseWare
MIT + >140 Universities
Curriculum, standards, quality of training
OpenCast: Berkeley, ETH + 40 top world universities
OS for video recording at Universities
VL as CDCs
Open CDN: Videolectures + JSI team
Using University Internet links and servers
Knowledge4All foundation
K4A founders
Europe – Pascal2 Network of Excellence:
University College London
Jozef Stefan Institute
University of Bristol
XEROX Research Centre Europe
ETH Zurich
CERN
US:
Berkeley + Opencast community
MIT + OCW consortium
Asia:
Korea University + Network of South Korean Universities
Africa:
Voices of Africa, Kenya + East Africa Universities
Kofi Annan Center for ICT and Development, Ghana + West Africa Universities
K4A - reach
Current development
OpenCDN – OSS/Collaborative Content Distribution Network
Automatic capturing, enriching, and synchronisation
Deep semantic search through videos
Accessibility, multilinguality
Knowledge extraction
Speech Indexing, Text Mining, Video mining,
Automatic ontology construction,
User Tracking and Profiling.
SCOPE proposal
Visitors
Knowledge 4 All
Expressed interest
Internet Society Central America - Mexico
Individual organisations: Trento, ULJ, Zagreb, Southampton, CNRS, VTT, Max Planck, TU Graz, TUB, Oxford, Carlos III de Madrid, UVA,…
Commercial organisations: Springer Verlag, Elsevier Science
Governmental bodies: Slovenia, European Commission
Development
Research
Operative
Free, open access, high quality, scientific content
Systems, standards, interoperability
Didactics, methodics, pedagogical models
Methods (individual, collaborative, business)
Added value (business) models
Emerging organisation models
Innovative tools
Projects
In preparation:
AI Research institute for West Africa: implications for infrastructure, summer schools, course definition, interaction software, etc.
Education kiosks in Africa
Journal SCI registration – also in discussion with Springer about possible publication
Virtual conference
Virtual university
Web 2.5 for learning: support for discussion groups, research communities
Long-term options
Innovation tube – industry/business use
Virtual universities and virtual programmes
Bottom-up, distributed, self-organised
Authoring services
Support content enrichment for the content creators
Services:
On-the-fly personalisation and recommendation
Video scene recognition, automatic annotation and categorisation
Semantic and multilingual search
Accessibility, Internationalization (subtitles, transcripts)
Advanced presentation services with direct user involvement
Textual, graphical, video (audio) content integration services and enrichment