DISTRIBUTED ENCODING ENVIRONMENT
BASED ON GRIDS AND IBP INFRASTRUCTURE
Petr Holub*‡ and Lukáš Hejtmánek*
*Faculty of Informatics and ‡Institute of Computer Science, Masaryk University, Brno and ‡CESNET, Prague
Czech Republic
TERENA Networking Conference 2004, Rhodes, Greece 2
Motivation
• Huge production of multimedia and esp. video content
  – education (lectures, educational movies), science, fun, etc.
• Need for transformation (transcoding) from source formats to formats suitable for downloading and streaming
  – very computationally demanding
• Problems with storage capacity
• BUT: We have great Grid infrastructure! :-)
Used Infrastructure
• MetaCenter Grid Infrastructure in Czech Rep.
  – PC clusters
    • more than 80 dual-processor (PIII and P4) nodes with 2 GB RAM and fast scratch disk
    • GE and Myrinet interconnection
    • Scheduling system: PBSPro
    • clusters are cheap and grow fast!
  – SGI machines, Alphas...
• Distributed Data Storage (DiDaS)
  – 15 TB of IBP based distributed storage
IBP Overview
• exNode - serialized XML metadata
  – collection of capabilities of allocated IBP arrays
  – essential for file access
• We use AFS for storing exNodes
Scheduling Model
• Selection of best hosts
  – based on Completion Time Estimate (CTE)
• Data location optimization
  – selection of best storage depots
  – prefetch support
• Simplified CTE
  – problem with network performance estimate b_{D,p}(t)
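The host and depot selection described on this slide can be illustrated with a minimal sketch. The simplified CTE formula itself is not reproduced in this text, so the estimate below (transfer time plus compute time, with a fixed bandwidth standing in for the time-dependent b_{D,p}(t)) is an assumption made for illustration. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    speed: float  # hypothetical: work units processed per second


def completion_time_estimate(chunk_bytes, work_units, host, bandwidth):
    """Assumed CTE: time to fetch the chunk from a depot plus time to transcode it.

    `bandwidth` (bytes/s) stands in for b_{D,p}(t); a real estimate would
    vary with time, which is the difficulty noted on the slide.
    """
    return chunk_bytes / bandwidth + work_units / host.speed


def select_best(chunk_bytes, work_units, hosts, bandwidths):
    """Return the (cte, host name, depot name) triple with the smallest CTE.

    `bandwidths` maps (depot name, host name) -> estimated bytes/s.
    """
    candidates = [
        (completion_time_estimate(chunk_bytes, work_units, host, bw), host.name, depot)
        for host in hosts
        for (depot, host_name), bw in bandwidths.items()
        if host_name == host.name
    ]
    return min(candidates)


# Illustrative data only: two hosts, two depots, a 1 GB chunk of 400 work units.
hosts = [Host("node-a", 2.0), Host("node-b", 4.0)]
bandwidths = {
    ("depot-1", "node-a"): 100_000_000,
    ("depot-1", "node-b"): 10_000_000,
    ("depot-2", "node-b"): 50_000_000,
}
best = select_best(1_000_000_000, 400, hosts, bandwidths)
print(best)  # (120.0, 'node-b', 'depot-2')
```

In this example the faster host wins only when paired with the depot it can reach at higher bandwidth, which is why host selection and data location are evaluated together.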
Scheduling Algorithm (1/2)
• General scheduling: NPO class
  – for uniform processors and jobs of different size
• Our greedy algorithm: PO class when processors and depots are connected via a complete graph
  – takes advantage of uniform task size
  – formal proof of correctness
  – for a general graph, the scheduling belongs to the NPO class again, as the greedy algorithm might prevent maximum utilization of depots
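As a rough illustration of why uniform task size helps, the sketch below assigns equal-sized tasks to processors of different speeds, always giving the next task to the processor that would finish it earliest. It is not the authors' algorithm: it ignores depots and data transfer entirely, and the function name and speed values are hypothetical.

```python
import heapq


def greedy_schedule(num_tasks, speeds):
    """Assign `num_tasks` equal-sized tasks to processors with the given speeds.

    Each task goes to the processor that would finish it earliest.
    Returns (makespan, tasks_per_processor).
    """
    # Heap entries: (finish time if the processor takes one more task, index).
    heap = [(1.0 / s, i) for i, s in enumerate(speeds)]
    heapq.heapify(heap)
    counts = [0] * len(speeds)
    makespan = 0.0
    for _ in range(num_tasks):
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, ((counts[i] + 1) / speeds[i], i))
    return makespan, counts


# Six unit-sized tasks on one slow and one twice-as-fast processor.
result = greedy_schedule(6, [1.0, 2.0])
print(result)  # (2.0, [2, 4])
```

With equal-sized tasks and no transfer costs, the possible finish times are simply k / speed for each processor, so picking the smallest ones greedily gives a minimal makespan. Once tasks differ in size or depot connectivity restricts placement, this simple argument no longer applies.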
Implementation
• Distributed Encoding Environment
  – for steering transcoding process
• libxio library
  – for enabling IBP in applications
• relies on transcode and HelixProducer for actual data transcoding
  – many input/output formats built into transcode: MPEG-1, MPEG-2, MPEG-4 (DivX, MS MPEG...), DV, RAW, etc.
  – RealMedia and others through external compression software (e.g. HelixProducer)
libxio library
• Provides equivalents for standard UNIX I/O functions
  – open, close, read, write, ftruncate, lseek, stat, fstat, and lstat
• IBP URI format
  – without lors:// prefix, local file is accessed
  – local_path/file specifies serialized metadata
  – short form lors:///local_path/file is available for reading
• IBP enabled transcode based on libxio
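The path rules on this slide can be sketched as a small dispatcher. This is not libxio code (libxio is a library of UNIX-style I/O equivalents); it is a hypothetical Python illustration of the two cases the slide states. The fields of the long-form URI are not given in this text, so the sketch rejects that form rather than guessing, and it assumes the path after `lors:///` is used as written.

```python
LORS_PREFIX = "lors://"


def classify_path(path):
    """Return ("local", path) or ("ibp", exnode_path) following the slide's rules.

    Only the short form lors:///local_path/file is handled; the long form's
    fields are not described in the slides, so it is rejected, not guessed.
    """
    if not path.startswith(LORS_PREFIX):
        # No lors:// prefix: an ordinary local file.
        return ("local", path)
    rest = path[len(LORS_PREFIX):]
    if rest.startswith("/"):
        # Short form: the remainder names the locally stored serialized exNode.
        return ("ibp", rest[1:])
    raise ValueError("long-form lors:// URIs are not covered by this sketch")


print(classify_path("lecture.dv"))                  # ('local', 'lecture.dv')
print(classify_path("lors:///exnodes/lecture.xml"))  # ('ibp', 'exnodes/lecture.xml')
```

Keeping the decision in one place is what lets an application such as transcode use the same open/read/write calls for both local files and IBP-backed data.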
Distributed Encoding Environment (1/3)
• lors tools are used for uploading from editing stations (Win32, MacOS X)
• remultiplexing for proper video/sound interleaving
Distributed Encoding Environment (2/3)
• image transformations are performed using transcode
  – image size reduction, de-interlacing, noise reduction, color corrections, audio resampling and cleaning
Distributed Encoding Environment (3/3)
• IBP-enabled servers
• IBP-enabled client applications
Pilot User Groups (1/2)
• Lecture recording @ Faculty of Informatics, MU
  – 20 hrs/week, new lecture halls with automatic video acquisition
    • HW conversion of analog signals to DV using Canopus ADVC-100 boxes
  – several target formats
    • high quality RealMedia (768×576 @ 25 fps, 3 Mbps)
    • low quality RealMedia (384×288 @ 15 fps, 56-768 kbps)
    • DivX (384×288 @ 25 fps, 1CD)
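To give a sense of the storage these targets imply, the following sketch converts the bitrates listed above into file sizes. It assumes constant bitrate, decimal units (1 Mbps = 1,000,000 bit/s) and no container overhead, so the figures are rough estimates rather than measured values.

```python
def stream_size_bytes(bitrate_bps, duration_s):
    """Size of a constant-bitrate stream: bits per second times seconds, over 8."""
    return bitrate_bps * duration_s // 8


HOUR = 3600  # seconds

# High quality RealMedia at 3 Mbps, per hour and for 20 recorded hours per week.
hq_per_hour = stream_size_bytes(3_000_000, HOUR)
hq_per_week = 20 * hq_per_hour

# Low quality RealMedia at the top of its 56-768 kbps range.
lq_per_hour = stream_size_bytes(768_000, HOUR)

print(hq_per_hour)  # 1350000000  (about 1.35 GB)
print(hq_per_week)  # 27000000000 (about 27 GB)
print(lq_per_hour)  # 345600000   (about 346 MB)
```

These numbers cover only one output format per recording; the DV sources and the additional target formats come on top of them.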
Pilot User Groups (2/2)
• Neurosurgery department at St. Anna University Hospital in Brno
  – large archives of operation recordings
  – they are willing to make them available to students of medicine
  – some editing is necessary: to select interesting pieces only and to anonymize the patient
  – publishing to CESNET RealMedia streaming server
Future Work
• Deployment of new scheduling systems
  – DataGrid/EGEE, GridLab, or something else?
• Network traffic prediction service
  – suitable for distributed data storage
  – support for regularly running jobs
  – support for in-advance bandwidth allocations
• GUI for DEE
Acknowledgements
• CESNET Development Foundation projects 017/2002 (DEE) and 018/2002 (DiDaS)
• CESNET Research Intent MSM 6383917201
• Miloš Liška, Luděk Matyska, Eva Hladká and MetaCenter staff