DISTRIBUTED ENCODING ENVIRONMENT

BASED ON GRIDS AND IBP INFRASTRUCTURE

Petr Holub*‡ and Lukáš Hejtmánek*

*Faculty of Informatics and ‡Institute of Computer Science, Masaryk University, Brno, and ‡CESNET, Prague

Czech Republic

TERENA Networking Conference 2004, Rhodes, Greece 2

Motivation

• Huge production of multimedia, especially video content
– education (lectures, educational movies), science, fun, etc.

• Need for transformation (transcoding) from source formats to formats suitable for downloading and streaming
– very computationally demanding

• Problems with storage capacity

• BUT: We have great Grid infrastructure! :-)

Used Infrastructure

• MetaCenter Grid Infrastructure in the Czech Republic
– PC clusters
    • more than 80 dual processor (PIII and P4) nodes with 2 GB RAM and fast scratch disk
    • Gigabit Ethernet (GE) and Myrinet interconnection
    • Scheduling system: PBSPro
    • clusters are cheap and grow fast!
– SGI machines, Alphas...

• Distributed Data Storage (DiDaS)
– 15 TB of IBP based distributed storage

MetaCenter, DiDaS, CESNET Network

IBP Overview

• exNode: serialized XML metadata
– collection of capabilities of allocated IBP arrays
– essential for file access

• We use AFS for storing exNodes

Scheduling Model

• Selection of best hosts
– based on Completion Time Estimate (CTE)

• Data location optimization
– selection of best storage depots
– prefetch support

• Simplified CTE
– problem with network performance estimate b_{D,p}(t)

Scheduling Algorithm (1/2)

• General scheduling: NPO class
– for uniform processors and jobs of different size

• Our greedy algorithm: PO class when processors and depots are connected via a complete graph
– takes advantage of uniform task size
– formal proof of correctness
– for a general graph, the scheduling belongs to the NPO class again, as the greedy algorithm might prevent maximum utilization of depots
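The greedy idea above can be illustrated with a minimal sketch. It is not the authors' algorithm, and the simplified CTE formula is not reproduced in these slides. The sketch assumes that the CTE of a chunk is the time the processor becomes free, plus the transfer time from a depot, plus a fixed per-chunk compute time, and that the bandwidth between depot D and processor p is a known constant rather than the time-dependent estimate b_{D,p}(t). All names (`greedy_schedule`, `Assignment`, the sample hosts) are hypothetical, and depot contention is ignored.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    chunk: int        # index of the uniform-size chunk
    processor: str    # processor chosen for the chunk
    depot: str        # depot the chunk is read from
    finish: float     # estimated completion time in seconds


def greedy_schedule(n_chunks, chunk_bytes, compute_seconds, bandwidth):
    """Assign uniform chunks one by one to the (depot, processor) pair
    with the smallest assumed completion time estimate.

    compute_seconds: {processor: seconds needed to transcode one chunk}
    bandwidth:       {(depot, processor): bytes per second, assumed constant}
    """
    ready = {proc: 0.0 for proc in compute_seconds}  # when each processor is free
    plan = []
    for chunk in range(n_chunks):
        best = None
        for (depot, proc), bytes_per_second in bandwidth.items():
            # Assumed CTE: wait for the processor, fetch the chunk, transcode it.
            cte = ready[proc] + chunk_bytes / bytes_per_second + compute_seconds[proc]
            if best is None or cte < best.finish:
                best = Assignment(chunk, proc, depot, cte)
        ready[best.processor] = best.finish
        plan.append(best)
    return plan


# Hypothetical example: two processors, one depot, three chunks.
example_plan = greedy_schedule(
    n_chunks=3,
    chunk_bytes=100,
    compute_seconds={"p1": 10.0, "p2": 10.0},
    bandwidth={("d1", "p1"): 100.0, ("d1", "p2"): 50.0},
)
```

Because every chunk has the same size, the order in which chunks are placed does not matter in this sketch, which is the property the slide refers to as taking advantage of uniform task size.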

Scheduling Algorithm (2/2)

Implementation

• Distributed Encoding Environment
– for steering transcoding process

• libxio library
– for enabling IBP in applications

• relies on transcode and HelixProducer for actual data transcoding
– many input/output formats built into transcode: MPEG-1, MPEG-2, MPEG-4 (DivX, MS MPEG...), DV, RAW, etc.
– RealMedia and others through external compression software (e.g. HelixProducer)

libxio library

• Provides equivalents for standard UNIX I/O functions
– open, close, read, write, ftruncate, lseek, stat, fstat, and lstat

• IBP URI format
– without lors:// prefix, local file is accessed
– local_path/file specifies serialized metadata (the exNode)
– short form lors:///local_path/file is available for reading

• IBP enabled transcode based on libxio
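The path rule described above (no lors:// prefix means a local file, otherwise the path part names the serialized metadata) can be sketched as a small dispatcher. This is an illustration in Python, not libxio code: the names `Target` and `classify_path` are hypothetical, and the sketch assumes that the short form is recognized by an empty authority part and that any non-empty authority is simply passed through without interpretation.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Target(NamedTuple):
    kind: str         # "local" or "ibp"
    path: str         # local file path, or local path of the serialized metadata
    authority: str    # part between "lors://" and the path; empty in the short form


def classify_path(name: str) -> Target:
    """Decide whether a file name refers to a local file or to IBP storage."""
    if not name.startswith("lors://"):
        # No lors:// prefix: an ordinary local file is accessed.
        return Target("local", name, "")
    parts = urlsplit(name)
    # With the prefix, the path component points at the serialized metadata.
    return Target("ibp", parts.path, parts.netloc)


local = classify_path("lecture.dv")
short_form = classify_path("lors:///local_path/file")
```

A wrapper for open could call such a function first and fall back to the ordinary system call whenever `kind` is "local", which is one way an application could stay unchanged for local files.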

Distributed Encoding Environment Overview

Distributed Encoding Environment (1/3)

• lors tools are used for uploading from editing stations (Win32, MacOS X)

• remultiplexing for proper video/sound interleaving

Distributed Encoding Environment (2/3)

• image transformations are performed using transcode
– image size reduction, de-interlacing, noise reduction, color corrections, audio resampling and cleaning

Distributed Encoding Environment (3/3)

• IBP-enabled servers

• IBP-enabled client applications

Pilot User Groups (1/2)

• Lecture recording @ Faculty of Informatics, MU
– 20 hrs/week, new lecturing halls with automatic video acquisition
    • HW conversion of analog signals to DV using Canopus ADVC-100 boxes
– several target formats
    • high quality RealMedia (768×576 @ 25 fps, 3 Mbps)
    • low quality RealMedia (384×288 @ 15 fps, 56-768 kbps)
    • DivX (384×288 @ 25 fps, 1 CD)
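A rough back-of-the-envelope calculation shows what these numbers mean for storage. It assumes the nominal DV stream rate of about 3.6 MB/s (a property of the DV format, not a figure from these slides), decimal units, and that all 20 hours per week are kept as DV source. The 3 Mbps target bitrate and the 15 TB DiDaS capacity are taken from the slides. The function name is hypothetical.

```python
DV_BYTES_PER_SECOND = 3.6e6            # assumption: nominal DV stream rate
HQ_REALMEDIA_BITS_PER_SECOND = 3e6     # high quality RealMedia target (from the slides)
HOURS_PER_WEEK = 20                    # lecture recording volume (from the slides)
DIDAS_CAPACITY_GB = 15_000             # 15 TB of IBP based storage (from the slides)


def gigabytes_per_hour(bytes_per_second: float) -> float:
    """Convert a stream rate in bytes per second to decimal gigabytes per hour."""
    return bytes_per_second * 3600 / 1e9


dv_gb_per_hour = gigabytes_per_hour(DV_BYTES_PER_SECOND)
dv_gb_per_week = dv_gb_per_hour * HOURS_PER_WEEK
hq_gb_per_hour = gigabytes_per_hour(HQ_REALMEDIA_BITS_PER_SECOND / 8)
weeks_to_fill_didas = DIDAS_CAPACITY_GB / dv_gb_per_week

print(f"DV source: {dv_gb_per_hour:.2f} GB/hour, {dv_gb_per_week:.1f} GB/week")
print(f"High quality RealMedia: {hq_gb_per_hour:.2f} GB/hour")
print(f"Weeks of DV source until 15 TB is full: {weeks_to_fill_didas:.1f}")
```

Under these assumptions the DV sources alone take roughly 13 GB per recorded hour, about ten times the size of the high quality RealMedia output, so the source material dominates the storage demand.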

Pilot User Groups (2/2)

• Neurosurgery department at St. Anna University Hospital in Brno
– large archives of operation recordings
– they are willing to make them available to medical students
– some editing is necessary: to select interesting pieces only and to anonymize the patient
– publishing to CESNET RealMedia streaming server

Future Work

• Deployment of new scheduling systems
– DataGrid/EGEE, GridLab, or something else?

• Network traffic prediction service
– suitable for distributed data storage
– support for regularly running jobs
– support for in-advance bandwidth allocations

• GUI for DEE

Acknowledgements

• CESNET Development Foundation projects 017/2002 (DEE) and 018/2002 (DiDaS)

• CESNET Research Intent MSM 6383917201

• Miloš Liška, Luděk Matyska, Eva Hladká and MetaCenter staff

Thank you for your attention!

Q/A?