Automated Real Time Sport Analysis and
Distribution with Computer Vision and
Mobile Phones
Ryan Sherlock
B.A. (Mod.) Computer Science
Final Year Project May 2004
Supervisor: Dr Kenneth Dawson-Howe
Abstract
“Automated Real Time Sport Analysis and Distribution with Computer Vision and
Mobile Phones” is a project in the area of both Computer Vision and Mobile
Communications. The field of Computer Vision attempts to emulate human vision
through the processing of digital images. The aim of this project is to recognize, track
and draw conclusions about the locations of the players and ball in a sports clip from a
static camera. The results of this step, in the form of player and ball locations, will
then be distributed to mobile communication devices, for example a mobile phone,
for display as low-bandwidth animated representations of what occurred in the sports
clip.
This report aims to outline the development of this project, from the background
research, to the successful end result.
Acknowledgements
I would like to thank my supervisor Kenneth Dawson-Howe for his advice throughout
the project.
I would also like to thank my family and friends for their support and encouragement.
Table of Contents

ABSTRACT.................................................................II
ACKNOWLEDGEMENTS........................................................III
1 INTRODUCTION............................................................1
1.1 PURPOSE...............................................................1
1.2 AIMS..................................................................1
1.3 MOTIVATION............................................................2
2 OVERVIEW OF PROBLEM.....................................................3
2.1 INTRODUCTION..........................................................3
2.2 PRINCIPAL PROJECT COMPONENTS..........................................3
2.3 PROJECT ASSUMPTIONS...................................................4
2.4 TECHNOLOGIES USED.....................................................5
2.5 READER'S GUIDE........................................................5
2.5.1 Background Research.................................................5
2.5.2 Video Preprocessing.................................................6
2.5.3 Player and Ball Recognition.........................................6
2.5.4 Tracking............................................................6
2.5.5 Geometric Transformations...........................................6
2.5.6 Animation Distribution with Mobile Devices..........................7
2.5.7 Sports Display Servlet..............................................7
2.5.8 Sports Display Mobile Application...................................7
2.5.9 Sample Results......................................................7
2.5.10 Conclusions........................................................7
3 BACKGROUND RESEARCH.....................................................8
3.1 PLAYER TRACKING AND RECOGNITION.......................................8
3.1.1 Colour Tracking.....................................................8
3.1.2 Active Contour Models...............................................8
3.1.3 Template Tracking...................................................9
4 PREPROCESSING AND BASIC VISION ALGORITHMS..............................11
4.1 INTRODUCTION.........................................................11
4.2 SMOOTHING (AVERAGING)................................................11
4.3 BACKGROUND DETECTION.................................................12
4.4 BINARY THRESHOLDING..................................................12
4.5 VIDEO STABILIZATION..................................................13
4.5.1 Introduction.......................................................13
4.5.2 Algorithm..........................................................14
5 PLAYER AND BALL INITIALIZATION.........................................17
5.1 INTRODUCTION.........................................................17
5.2 CONNECTED COMPONENT ANALYSIS.........................................17
5.2.1 Algorithm..........................................................18
5.2.2 Connected Component Analysis Example...............................19
5.3 INITIALIZING THE BALL................................................20
5.3.1 Algorithm..........................................................20
5.4 INITIALIZING PLAYERS.................................................20
5.4.1 Algorithm..........................................................21
5.4.2 Building a Model of the Player.....................................21
6 TRACKING...............................................................23
6.1 INTRODUCTION.........................................................23
6.2 WHICH TRACKING SHOULD BE USED?.......................................23
6.3 NON OCCLUSION TRACKING...............................................23
6.3.1 Algorithm..........................................................24
6.4 OCCLUSION TRACKING...................................................24
6.4.1 Algorithm..........................................................25
7 GEOMETRY CONVERSION....................................................26
7.1 INTRODUCTION.........................................................26
7.2 NECESSARY MATHEMATICS................................................26
7.2.1 Intersecting Point Between Two Lines...............................27
7.2.2 Distance Calculation...............................................27
7.2.3 External Divisor...................................................28
7.2.4 Point Inside a Polygon.............................................28
7.3 EXTENDING THE COURT..................................................29
7.3.1 Algorithm..........................................................29
7.4 PARTITIONING THE COURT...............................................30
7.4.1 Initial Partitioning of the Court..................................31
7.4.2 Subsequent Partitioning of the Court...............................31
7.5 CONVERTING PIXEL COORDINATES TO REAL COORDINATES.....................32
7.5.1 Algorithm..........................................................33
7.6 TIPS OUTPUT FILES (TOF FILES)........................................35
8 ANIMATION DISTRIBUTION WITH MOBILE DEVICES.............................36
8.1 INTRODUCTION.........................................................36
8.2 FULL SYSTEM ARCHITECTURE.............................................36
8.2.1 Overview of the Life of a Video Sequence...........................37
9 SPORTS DISPLAY SERVLET.................................................39
9.1 INTRODUCTION.........................................................39
9.2 FILE INFORMATION REQUESTS............................................39
9.3 FILE DOWNLOAD REQUESTS...............................................39
9.4 FILE DOWNLOAD RESPONSE FORMAT........................................40
9.5 THE COST OF DOWNLOAD.................................................41
10 SPORTS DISPLAY MOBILE APPLICATION.....................................42
10.1 DISPLAYING ANIMATIONS...............................................42
10.2 USER INTERFACE EXAMPLES.............................................43
11 SAMPLE RESULTS........................................................45
11.1 INTRODUCTION........................................................45
11.2 SAMPLE TRACKING RESULTS.............................................46
11.3 SAMPLE SPORTS DISPLAY RESULTS.......................................47
12 CONCLUSIONS AND ANALYSIS..............................................48
12.1 CONCLUSIONS.........................................................48
12.2 PROBLEMS ENCOUNTERED................................................48
12.3 IMPROVEMENTS........................................................49
12.4 COMMERCIAL PROSPECTS................................................50
13 REFERENCES............................................................51
APPENDIX A – EXAMPLE TIPS OUTPUT FILE....................................54
APPENDIX B – VIDEO SETUP FILE............................................56
Index of Figures

Figure 2-1 Example sports footage [Boyle03]...............................4
Figure 3-1 Variations in Player Model [Boyle03]...........................9
Figure 4-1 Before Smoothing..............................................11
Figure 4-2 After Smoothing...............................................11
Figure 4-3 Difference Image Creation.....................................12
Figure 4-4 Binary Image Creation.........................................13
Figure 4-5 Need for video stabilization..................................14
Figure 4-6 Least difference bin selection................................15
Figure 4-7 Background Image and Mask.....................................16
Figure 4-8 Rotating Mask Example.........................................16
Figure 5-1 Connected Component Analysis Example..........................18
Figure 5-2 Joining Player Parts..........................................18
Figure 5-3 CCA 1.........................................................19
Figure 5-4 CCA 2.........................................................19
Figure 5-5 CCA 3.........................................................19
Figure 5-6 CCA 4.........................................................19
Figure 5-7 Difference in Player Size and Elongatedness...................21
Figure 5-8 Player Model Example..........................................22
Figure 6-1 Tracking Occluded Players.....................................25
Figure 6-2 Player Model Representation...................................25
Figure 7-1 Pixel to Meter Conversion.....................................26
Figure 7-2 External Divisor..............................................28
Figure 7-3 Diagonal Intersection Lines...................................29
Figure 7-4 Extending with Ratios.........................................30
Figure 7-5 Initial Partitioning of the court.............................31
Figure 7-6 Subsequent Partitioning of the court..........................32
Figure 7-7 Converting a Location 1.......................................33
Figure 7-8 Converting a Location 2.......................................34
Figure 7-9 Coordinate Conversion.........................................34
Figure 8-1 Full System Architecture......................................37
Figure 10-1 Nokia 7210...................................................42
Figure 10-2 Animation Generation.........................................43
Figure 10-3 Phone Examples - Main Menu - Open File - About...............44
Figure 11-1 Sample TIPS results..........................................46
Figure 11-2 Sample Phone Results.........................................47
Figure 12-1 Sample Indoor Image..........................................49
1 Introduction
Computer Vision attempts to emulate the human vision system. This is an extremely
difficult problem and has not been solved in the general case. To fully solve Computer
Vision, a comprehensive understanding of the human brain would be necessary.
Experts believe that this understanding is still some time off.
Measurement of human motion has been available with the use of invasive sensors or
targets for some time. Recent developments in computer vision provide basic human
motion tracking. Since computer vision relies only on video cameras, computer
hardware and software, this type of tracking is unobtrusive, inexpensive and usually
works in existing unmodified environments.
Mobile Communications has expanded dramatically over the last ten years.
Convergence has meant that the mobile phone, originally a wireless extension of the
wired telephony infrastructure, is developing into a comprehensive media center used
for an expanding range of services.
The combination of Computer Vision, for content creation, and mobile
communication devices for content distribution can create interesting systems. This
project aims to explore one of these systems.
1.1 Purpose
The purpose of this project is to automatically create, and display on a mobile
communication device, an animated representation of a sports video clip in real time.
1.2 Aims
Create an application that will be able to track individual players in video clips in real
time and produce player waypoints¹ that will allow the measurement and display of the
players' in-game positions.

¹ Waypoint – a major point on a route to a destination; a 'snapshot' of the object's location.
Build a distribution infrastructure that will be used to drive a graphical representation
of the actual game footage on a mobile device (mobile phone or PDA).
Integrate the systems so that the full media creation and distribution pipeline is
automatic.
1.3 Motivation
Mobile phone penetration has reached more than 80% in the Irish market [Budde03].
In parallel with this explosion in penetration, mobile phone usage has also diversified.
SMS (Short Messaging Service), MMS (Multimedia Messaging Service), WAP
(Wireless Application Protocol) and mobile gaming have now become an important
part of an operator's revenue, with data services accounting for over 17% of operator
revenues in 2003 in the UK and Ireland. With multimedia services and mobile gaming
seen as the next major evolution of the mobile phone, the creation of low-bandwidth
multimedia content has become a lucrative market. [Telephony04]
Using Computer Vision for automated content creation, multimedia content suited to
mobile devices can be created and distributed with a very low “cost to market”.
2 Overview Of Problem
2.1 Introduction
This chapter is intended to give an overview of the project, identifying problem areas
that had to be resolved in order to provide a working solution. This chapter will also
act as a “reader's guide” for the rest of the document by outlining the purpose of each
chapter.
There has been a great deal of research in human tracking in recent years. Player
tracking is an example of a practical use for human tracking. Previous research into
sports player tracking had several motivations:
1. Professional trainers and sports scientists are interested in the amount of ground
covered during a game by each athlete and how quickly they move during the
game. This information can be used to specialize training regimes for players.
2. By tracking the main events that occur during a game, metadata can be added to
the video for future searching and referencing. [Dahyot04]
3. By tracking opponents during a game it is possible to draw conclusions on the
tactics and plays that they use in different situations. [Aaron02]
What I plan to do in this project is to add another practical use for player tracking to
this list.
2.2 Principal Project Components
Tracking sports players over a large playing area is an extremely challenging problem.
The players change direction quickly and have large variations in their form. The size
of the court (15 x 25 meters) means that the resolution of an object varies for different
parts of the image. When looking at the front of the court [Figure 2-1] each pixel
represents several centimeters, while at the rear of the court each pixel represents
10 to 20 centimeters. This means that, from this view, the size of a player can vary
from 30 pixels to 10 pixels in width.
In addition, the tracking of the ball has not previously been accomplished in a sports
analysis system.
In order to create an automated sports analysis and distribution system, this problem
had to be divided into smaller modular tasks that were easy to test. This increased the
efficiency of development.
The principal steps that were needed to automatically create and display mobile sports
content are as follows:
• Stabilize the video clip
• Identify the players and ball in the video sequence
• Track the players and ball throughout the sequence
• Convert the players' image positions (in pixels) into real-world positions (in meters)
• Make this information available on the Internet in an efficient form for
downloading/streaming
• Create a mobile display interface and connection protocol so that mobile devices
can download and display the animation
2.3 Project Assumptions
The following are the assumptions that were made in order to make this project
possible in the given time period. Given more research and development time, some
of these assumptions may be relaxed.

Figure 2-1 Example sports footage [Boyle03]
• Input video is from a static camera without zooming
• A background image for the input video is available
• Lighting conditions throughout the clip do not change dramatically
• The physical dimensions of the court are known
• The first frame of the video does not contain occluded² players or ball
• Players do not leave the playing area
2.4 Technologies Used
The Trinity Image Processing System (TIPS), developed by Kenneth Dawson-Howe,
Trinity College Dublin, is the platform built upon for content creation. This system,
written in C++, is expanded in order to create a new video stabilization method,
initialize and track the players and ball, and convert the relevant player information
into real world coordinates.
Java 2 Enterprise Edition (J2EE), and more specifically servlets, are used as the base
technology to convert and distribute the TIPS Output Files (.tof files) into mobile
phone optimized downloads that can be distributed to mobile devices using the
Internet and GPRS.
Java 2 Micro Edition (J2ME) is used as the platform for building a mobile device
display and communication system.
2.5 Reader's Guide
2.5.1 Background Research
Background research into the areas of player recognition and tracking.
² Occlusion occurs when an object obstructs the view of another object. That is, the object that is
behind is not fully visible.
2.5.2 Video Preprocessing
In order for the sports clip to be suitable for processing, the video must first be
stabilized against the background image. This section will discuss a new method that
was developed for video stabilization as well as give a brief introduction into the
other preprocessing techniques and basic vision algorithms that were used in the
project.
2.5.3 Player and Ball Recognition
To be able to track the players and ball successfully, their initial position in the image
must be found. This section will describe the techniques used during player and ball
initialization.
2.5.4 Tracking
Tracking is performed so that the location of the players and ball is known throughout
the video sequence.
A computationally fast algorithm will be presented that can track players when there
is no occlusion expected.
When objects that are being tracked occlude each other, which in team sports occurs
regularly, tracking becomes much more difficult. This chapter will also describe the
advanced tracking techniques that were developed to overcome this problem.
2.5.5 Geometric Transformations
When a player or ball is tracked successfully, there is still no “image world” to “real
world” mapping. This section aims to describe the process involved in converting an
object's location from pixels to meters.
2.5.6 Animation Distribution with Mobile Devices
Using the player and ball locations, waypoints are created. This section will examine
the full system architecture and how these waypoints are transported to a mobile
device.
2.5.7 Sports Display Servlet
The Sports Display Server is the connection point between the content creation stage
and the animation display stage. This section will give an introduction to this
connecting service.
2.5.8 Sports Display Mobile Application
The Sports Display Application is the only part of the system that the end user will
use. It allows the user to download new animations from the server, view animations,
and open or delete previously saved animations on their mobile phone. This section
will give an introduction to the design and operation of this application.
2.5.9 Sample Results
Sample images from various video clips and mobile phone animations are presented.
(It is advised that the attached CD-ROM be used for sample video results.)
2.5.10 Conclusions
This section will present the project conclusion, the problems encountered and the
commercial prospects of this system.
3 Background Research
3.1 Player Tracking And Recognition
This section introduces some of the background research into object recognition and
tracking that was needed for the successful completion of this project.
The segmentation of the players from the background was the most difficult step in
this project. Below are some examples of techniques that are used to initialize and
track movable objects, and in this case players, over a video sequence. The actual
algorithms used in this project draw from these research areas.
3.1.1 Colour Tracking
The algorithm searches for the pixel most similar to the recorded colour of the player.
The search is performed over a limited area (say, 10 pixels in each direction; this
encodes a finite acceleration/velocity assumption). The similarity measure is defined
as the Euclidean distance

S_colour(x, y) = sqrt( (I_R(x, y) - C_R)² + (I_G(x, y) - C_G)² + (I_B(x, y) - C_B)² )

where I is the image and C is the recorded colour of the player. R, G and B denote the
red, green and blue channels, respectively. The main advantage of this algorithm is
that it is highly reliable. It tracks the players even when the colour of the player
changes due to compression artifacts or changes in the lighting conditions. The main
disadvantage is that it can also lock onto background colours. Another disadvantage is
that this method can create a lot of jitter in the resultant player trajectory. This makes
it inappropriate for stand alone use. [Pers01]
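To make the measure concrete, the windowed colour search can be sketched in C++ (the language of the TIPS platform). The `RGB` and `Image` types and the function names below are illustrative stand-ins, not part of TIPS or of any published tracker:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Illustrative pixel and image types (not the actual TIPS classes).
struct RGB { double r, g, b; };

struct Image {
    int width, height;
    std::vector<RGB> pixels;  // row-major
    const RGB& at(int x, int y) const { return pixels[y * width + x]; }
};

// Euclidean distance in RGB space between an image pixel and the
// recorded player colour C: the similarity measure S_colour.
double colourDistance(const RGB& p, const RGB& c) {
    return std::sqrt((p.r - c.r) * (p.r - c.r) +
                     (p.g - c.g) * (p.g - c.g) +
                     (p.b - c.b) * (p.b - c.b));
}

// Search a limited window (radius pixels in each direction) around the
// player's last known position for the most similar pixel; the small
// window encodes the finite acceleration/velocity assumption.
std::pair<int, int> trackByColour(const Image& img, const RGB& recorded,
                                  int lastX, int lastY, int radius) {
    double best = 1e300;
    std::pair<int, int> bestPos(lastX, lastY);
    int x0 = std::max(0, lastX - radius), x1 = std::min(img.width - 1, lastX + radius);
    int y0 = std::max(0, lastY - radius), y1 = std::min(img.height - 1, lastY + radius);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            double d = colourDistance(img.at(x, y), recorded);
            if (d < best) { best = d; bestPos = std::make_pair(x, y); }
        }
    return bestPos;
}
```

Note that nothing in this search stops it locking onto a background pixel of a similar colour, which is exactly the weakness described above.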
3.1.2 Active Contour Models
The development of active contour models (snakes) results from the work of Kass,
Witkin and Terzopoulos [Sonka99]. Energy minimization is used to achieve image
segmentation and understanding. The snake, itself, contains two parts – a list of
control points and an energy function. Each control point is an (x, y) point in the
image plane that collectively represent the shape of the object being recognized. For
visualization purposes the control points are usually linked together on the screen
creating an outline of the investigated shape. The energy function is the mathematical
rule set that governs the changes possible in the location of the control points.
Therefore the energy function is critical to the active contour model’s success at being
able to lock onto particular types of object [Tabb03].
Whilst improvements have been made in making snakes suitable for particular tasks,
there are still several major weaknesses. Snakes cannot categorize shapes as there is
no inbuilt ‘knowledge’ of the object that they are detecting. This makes snakes less
suitable for tracking objects in complex backgrounds [Tabb03]. Also, since snakes
become ‘locked’ to an object's outline, if the object becomes occluded by another
object, the snake will no longer be able to follow the object. This is the fatal flaw with
snakes with regards to player tracking, which meant that they could not be used in this
project.
3.1.3 Template Tracking
Visual differences between the players and the background can be exploited in order
to track objects [Pers00]. Generally in tracking applications the objects that are being
tracked are similar in nature, for example, chocolates on a conveyor belt. However
with player tracking the variation in template can be quite dramatic. Figure 3-1 shows
some of the possible player variations [Boyle03].
Figure 3-1 Variations in Player Model [Boyle03]
As the players' shape and scale change dramatically throughout the clip, the number of
templates that would be needed for accurate tracking would be unmanageable. Simple
template tracking is therefore not appropriate for general player tracking. However, as
will be demonstrated in chapter 6, a large part of the solution to occlusion tracking
relies on concepts taken from template tracking.
4 Preprocessing and Basic Vision Algorithms
4.1 Introduction
During video preprocessing many simple algorithms are used to aid and initialize the
more complex tracking algorithms. This section will briefly go through these
algorithms.
In addition, a new method for stabilizing videos that was developed for this project
will be presented.
4.2 Smoothing (Averaging)
Compression artifacts and image noise are a major concern in later image processing
stages. Smoothing uses redundancy in the image data to suppress the noise. [Sonka99]
Given an input image, Figure 4-1³, the output image pixels were specified by the
average of the equivalent input pixel and its neighbouring pixels. The effect of this
operation is the blurring of the image [Figure 4-2]. As the objects of interest in the
image are large relative to the size of the neighbourhood used to create the output
pixel, none of the necessary detail is lost. The smoothing algorithm used in this
project is part of the TIPS platform.
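As a sketch (not the TIPS routine itself), a 3x3 box average over a grayscale image stored as a row-major vector might look as follows; border pixels simply average whichever neighbours exist:

```cpp
#include <cassert>
#include <vector>

// 3x3 box-average smoothing for a grayscale image of w x h pixels
// stored row-major.  Border pixels average only the neighbours that
// fall inside the image.
std::vector<double> smooth(const std::vector<double>& in, int w, int h) {
    std::vector<double> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double sum = 0.0;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
                        sum += in[ny * w + nx];
                        ++n;
                    }
                }
            out[y * w + x] = sum / n;  // average of the neighbourhood
        }
    return out;
}
```

A single noisy pixel is spread over its neighbourhood and heavily attenuated, while large objects keep their overall intensity, which is the behaviour relied upon here.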
³ The court in all images except Figure 2-1 is part of the Trinity College Dublin campus.
Figure 4-1 Before Smoothing
Figure 4-2 After Smoothing
4.3 Background Detection
Assuming a stationary camera and constant illumination, background detection is the
most straightforward approach to motion detection. Each pixel in each frame in the
video sequence (current image) is subtracted from the equivalent pixel in the
background image to yield a difference image D.
D = |R_B - R_C| + |G_B - G_C| + |B_B - B_C|

where R, G and B denote the red, green and blue components of a pixel, and the
subscripts B and C denote the background image and the current image respectively.
The resultant image [Figure 4-3] shows all pixels in the video that have changed from
the background image.
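Per pixel, the difference measure can be sketched as follows; the `RGB` type is an illustrative stand-in, and the formula assumed is the sum of absolute per-channel differences:

```cpp
#include <cassert>
#include <cmath>

// Illustrative RGB pixel type (not the actual TIPS class).
struct RGB { double r, g, b; };

// Difference between the background pixel and the current-frame pixel:
// the absolute differences of the R, G and B channels, summed.
double pixelDifference(const RGB& bg, const RGB& cur) {
    return std::fabs(bg.r - cur.r) +
           std::fabs(bg.g - cur.g) +
           std::fabs(bg.b - cur.b);
}
```

Applying this at every pixel of every frame yields the difference image D: unchanged pixels give values near zero, while moving players give large values.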
Figure 4-3 Difference Image Creation

4.4 Binary Thresholding

Binary thresholding is used as a simple form of image segmentation. Thresholding is
computationally inexpensive and fast. More accurately, thresholding is the
transformation of an input image f to a segmented output image g.
g(i, j) = 1 for f(i, j) ≥ T
g(i, j) = 0 for f(i, j) < T

where T is the threshold value, g(i, j) = 1 for elements of image objects and
g(i, j) = 0 for elements of the background. [Sonka99]
Figure 4-4 shows a sample difference image and the thresholded binary output. The
white areas are referred to as image objects, while the black area is the background.
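The transformation is a one-line decision per pixel; a sketch over a difference image stored as a vector of values (the function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary thresholding: g(i) = 1 where f(i) >= T (image object),
// g(i) = 0 otherwise (background).
std::vector<int> thresholdImage(const std::vector<double>& f, double t) {
    std::vector<int> g(f.size());
    for (std::size_t i = 0; i < f.size(); ++i)
        g[i] = (f[i] >= t) ? 1 : 0;
    return g;
}
```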
4.5 Video Stabilization
4.5.1 Introduction
Video stabilization is the first preprocessing technique that is performed on the video
sequence. It is introduced at this stage because an understanding of the other
preprocessing techniques is needed to understand why stabilization is necessary.
In the background subtraction step, it is vital that the background image is aligned
correctly with the video sequence that is being processed. Figure 4-5 shows the
resultant difference image from a frame in a video sequence that is offset by one pixel
from the background image⁴. In this image the outlines of the court markings and
surrounding buildings are clearly visible. If binary thresholding were then performed
on this image, the court's markings would become an image object that would prevent
the correct initialization of the player and ball tracking algorithm.

Figure 4-4 Binary Image Creation
This section will now present the new method that was developed for the stabilization
of a video sequence that has a maximum jitter of several pixels in any direction.
Briefly, stabilization is performed by finding features in the video frame that have
changed least in relation to the equivalent features in the reference image. Each
chosen feature is then moved around its neighbouring area and compared with the
corresponding feature in the background image, and a vote is held among the features.
The direction that wins this vote is the direction in which the video will be offset.
The stabilization of the video did not have to be performed on every video sequence
that was used for processing.
4.5.2 Algorithm
This method starts by breaking the current image (current frame from the video
sequence) into a grid of N x M ’bins’, where N is typically 8 and M is 12. [Figure 4-6]
⁴ This image has been exaggerated so that the errors in the difference image are visible in print.
Figure 4-5 Need for video stabilization
Every pixel in each bin is then subtracted from the equivalent pixel in the reference
image. The absolute pixel difference between all pixels in the current bin and the
equivalent reference bin is then stored with the current bin.
After this has been completed for each bin in the image, the bins that differ least from
their equivalent reference bins can be selected. These 'least difference' bins are the
bins with the lowest absolute difference count. X bins are then used for the next step,
where X is a natural number, typically 25% of N x M. In Figure 4-6, the selected
least difference bins are visible as the bins that are not artificially darkened. These
bins are the "features" that will be used for image realignment.
The X bins that were selected in the previous step are now used to find the optimum
feature matching offset. The pixels in each of the bins are offset by a certain amount
against the equivalent background pixels, and a difference count is then performed.
When all offsets have been calculated for each bin, each bin reports the offset that
best matched the background image (that is, the offset with the lowest absolute
difference count). Whichever offset gets the most votes is declared the winner, and
the full video is offset by that amount, for example, two pixels left and one pixel up.
Figure 4-6 Least difference bin selection
As a simple example, Figure 4-7 shows a background image in white and a bin in
gray. Each box in both images represents a pixel. Figure 4-8 shows the bin being
offset against the background in all positions that are at most one pixel away in each
direction. The central image shows the bin in its natural position, that is, with an
offset of zero columns and zero rows. The same principle is used in video
stabilization, except that the maximum distance to check for a match is user defined.
Figure 4-8 Rotating Mask Example
Figure 4-7 Background Image and Mask
5 Player and Ball Initialization
5.1 Introduction
In order to track the players and ball in the scene, their initial positions must first be
found. Below is a discussion of how the preprocessing techniques already described
are extended to initialize the player and ball tracking.
The first technique introduced in this section is Connected Component Analysis; the
initialization of both the players and the ball relies on the successful completion of
this step. Next, the steps behind initializing the ball will be described. Finally, a
discussion of how the players are initialized will be presented.
5.2 Connected Component Analysis
The binary image obtained from thresholding consists of objects and background. In
the left image in Figure 5-1 the objects are represented by the white pixels and the
background by the black pixels. There is now a need to label the objects in order to
perform player and ball recognition.
Although the TIPS platform already contains an implementation of Connected
Component Analysis, a new, faster and more specialized version was developed to
keep the application running as fast as possible. This new version will now be
presented.
Figure 5-1 shows the input and output of the connected component analysis
algorithm. In the output image each labeled region is visualized by giving each region
a different colour.
5.2.1 Algorithm
The first stage of the algorithm is to give each object in the binary image a label.
• Search the image row by row.
• If a pixel is an object pixel and not labeled, then give the pixel a label and
recursively search all connected object pixels, giving them the same label.
• If the resultant region is smaller than N5 pixels, delete the region.
This completes the first stage of the specialized connected component analysis. The
next stage joins and relabels large, close regions. Figure 5-2 illustrates the need for
this step. In the left hand image it is clear that the two objects represent a full player,
so the next step joins these two regions to yield the right hand image in Figure 5-2.
• Navigate around the edge of each labeled region, searching for other labeled
regions that are within several pixels.
• If such regions are found, artificially create a connection between both regions,
label both regions consistently and record the size, in pixels, of the complete
region.
5 N is typically 8 when using a video with a resolution of 360 x 288
Figure 5-1 Connected Component Analysis Example
Figure 5-2 Joining Player Parts
The final step is to relabel all regions, so that the first region found has a label of 1,
the second 2, and so on.
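The first stage of the labeling can be sketched as follows. This is an illustrative Java sketch, not the TIPS code: 4-connectivity is assumed, an explicit stack replaces the recursion described above, and the close-region joining and final relabeling stages are omitted for brevity.

```java
/** Sketch of the first stage of the specialized connected component analysis. */
class CCA {
    /** Labels object pixels (value 1) in place, starting from label 2.
     *  Regions smaller than minSize pixels are deleted, as in the report. */
    static void label(int[][] img, int minSize) {
        int next = 2;
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                if (img[y][x] == 1) {
                    java.util.List<int[]> px = fill(img, y, x, next);
                    if (px.size() < minSize) {
                        for (int[] p : px) img[p[0]][p[1]] = 0;  // delete small region
                    } else {
                        next++;
                    }
                }
    }

    /** Iterative flood fill (the report uses recursion); returns the region's pixels. */
    static java.util.List<int[]> fill(int[][] img, int y, int x, int lab) {
        java.util.List<int[]> px = new java.util.ArrayList<>();
        java.util.ArrayDeque<int[]> stack = new java.util.ArrayDeque<>();
        img[y][x] = lab;
        stack.push(new int[]{y, x});
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            px.add(p);
            int[][] nb = {{p[0] - 1, p[1]}, {p[0] + 1, p[1]},
                          {p[0], p[1] - 1}, {p[0], p[1] + 1}};
            for (int[] q : nb)
                if (q[0] >= 0 && q[0] < img.length && q[1] >= 0 && q[1] < img[0].length
                        && img[q[0]][q[1]] == 1) {
                    img[q[0]][q[1]] = lab;   // claim the neighbour before visiting it
                    stack.push(q);
                }
        }
        return px;
    }
}
```

Run on the grid of Figure 5-3, this yields the two labeled regions of Figure 5-5 (labels 2 and 3), before the joining stage merges them.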
5.2.2 Connected Component Analysis Example
Below is an example of the algorithm in operation on a simple test case.
Figure 5-3 is the input binary image to Connected Component Analysis (CCA). The
objects are shown as 1s and the background as 0s. Each box represents a pixel.
Figure 5-4 shows the result after the algorithm finds the first non-labeled object pixel.
All adjoining pixels are labeled with the same value, 2. One region has now been
created.
Figure 5-5 shows the algorithm continuing and finding the second non-labeled
region. It labels it 3.
Since regions 2 and 3 are close together, they are joined and the labels are
synchronized. Figure 5-6 shows the result of this operation.
Figure 5-3 CCA 1
0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 1 1 0 0 1 0 0
0 1 1 1 0 1 1 0
0 0 1 1 0 1 1 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-4 CCA 2
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 1 0 0
0 2 2 2 0 1 1 0
0 0 2 2 0 1 1 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-5 CCA 3
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 3 0 0
0 2 2 2 0 3 3 0
0 0 2 2 0 3 3 0
0 0 0 0 0 3 3 0
0 0 0 0 0 3 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-6 CCA 4
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 2 0 0
0 2 2 2 2 2 2 0
0 0 2 2 0 2 2 0
0 0 0 0 0 2 2 0
0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
5.3 Initializing the Ball
Initializing the ball in a low resolution image is an extraordinarily difficult problem.
Due to its size, the ball can appear in the video sequence as being only a few pixels
wide. This can lead to the region that the ball is in not being found, or to other objects
in the image, for example players' heads, being falsely initialized as the ball. To
increase the accuracy of the initialization of the ball, only regions of the correct size,
shape and colour are searched.
5.3.1 Algorithm
• For each labeled region in the image:
• Compare the size of the region with the expected size of the ball. If the region is
of the correct size, compare the elongatedness6 of the region with the expected
elongatedness.
• Compare the associated central pixel values in the current image with the
expected colour of the ball.
• If all of the above conditions are met, a ball candidate has been found.
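The three checks above can be sketched as a single predicate. This is an illustrative Java sketch: the tolerances (50% of the expected area, elongatedness below 1.5, colour within 30 grey levels) and all names are assumptions for demonstration, not values from the report, and a single greyscale centre value stands in for the colour comparison.

```java
/** Sketch of the ball-candidate test; thresholds are illustrative, not from the report. */
class BallInit {
    /** A region is a ball candidate if its size, shape and centre colour are plausible. */
    static boolean isBallCandidate(int area, int width, int height, int centreColour,
                                   int expectedArea, int expectedColour) {
        // Size check: area within 50% of the expected ball area (illustrative tolerance).
        if (Math.abs(area - expectedArea) > expectedArea / 2) return false;
        // Shape check: elongatedness (length / width) should be close to 1 for a ball.
        double elong = (double) Math.max(width, height) / Math.min(width, height);
        if (elong > 1.5) return false;
        // Colour check: the centre pixel must be close to the expected ball colour.
        return Math.abs(centreColour - expectedColour) < 30;
    }
}
```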
5.4 Initializing Players
Initializing the players is performed in much the same way as the initialization of the
ball. Since the assumption was made that in the first frame of every sports clip no
players occlude each other, it is possible to study the regions returned from connected
component analysis to find the players. Regions that are of the correct size and shape
are assumed to be players.
6 Elongatedness: this is the length divided by the width. For example, a square has an
elongatedness of 1. Since a ball is a sphere, its elongatedness should be
approximately 1, although the blurring of the ball in the video must also be considered.
5.4.1 Algorithm
• For each labeled region in the image:
• Compare the size of the region with the expected size of a player.
• If the region is of the correct size, compare the elongatedness of the region with
the expected elongatedness. The elongatedness of a player can vary
dramatically, so the allowable range must be quite large. Figure 5-7 is an
example of this wide variation.
• If successful, a player has been found. Using the height and width obtained
from CCA, the player is now surrounded by an artificial bounding box.
5.4.2 Building a Model of the Player
For occlusion tracking, described in the next section, it is necessary that a model of
the player's last appearance is available. During initialization and non occlusion
tracking this model is created and updated. The following steps are needed to create
this internal player model.
When a player is initialized for the first time, an object within the TIPS platform is
created to represent the player. This object contains information about the player's
width, height, location and size in pixels. There are two more attributes of the object
that are very important for the new method of occlusion tracking that was developed.
These are:
1. Information on which parts inside the bounding box actually belong to the player
2. Colour information on all parts of the player that are visible
Figure 5-7 Difference in Player Size and Elongatedness
This information is generated as follows.
• When a player has been found, an artificial bounding box is placed around the
player. An example of the bounding box is shown in Figure 5-7.
• All pixels within this bounding box are tested for membership of the region that
the player was initialized from. If a pixel is part of the region, it is recorded in the
player's object.
• If a pixel is part of the player, the colour at that location is also recorded.
For example, Figure 5-8 shows a simple representation of the three stages in this
process. Image A shows the original image of the player that is being used for
initialization. Image B shows the pixels within the bounding box that are part of the
player; 1 represents part of the player, while 0 represents the background. Image C
shows the colours that were saved to represent the player.
Figure 5-8 Player Model Example
6 Tracking
6.1 Introduction
The correct tracking of the players in the video sequence was the most difficult aspect
of this project. After initialization of the players and ball, the tracking algorithm must
lock on to the player or ball throughout the video sequence. For the tracking to be
useful in this application, it must also be capable of running in real time.
6.2 Which Tracking Should Be Used?
In order to keep the application running in real time, the occlusion tracking
techniques, which are computationally more expensive, should be used as little as
possible. To decide whether occlusion tracking is necessary, at the start of processing
of each frame in the video sequence the shortest distance between each pair of players
is calculated. If this distance is small enough that the players may occlude each other
in the next frame, then these players are 'tagged' and the special occlusion tracking
algorithm will be used to track them.
6.3 Non Occlusion Tracking
A new, computationally fast method for tracking non occluding players was
developed for this section. The method itself is quite simple to understand once the
process of player initialization is understood. The method relies on two assumptions.
1. No other players are within a few pixels of the outside of the bounding box
2. The player has a maximum velocity that they cannot exceed
Assumption 1 holds because, at the start of processing of each frame, each player is
checked to make sure no other players are close by. If there is a player close by, the
occlusion tracking algorithm will be used.
Assumption 2 holds because the fastest speed possible for a human without artificial
assistance is in the order of 10 meters per second. Depending on the location of the
player in the image, this can range from 1 to 5 pixels per frame of video.
6.3.1 Algorithm
1. Extend the bounding box surrounding the player by N pixels in every direction,
where N is the maximum distance per frame that the player can move given their
previous location and velocity.
2. Perform full connected component analysis on this region as described in Section
5.2.
3. Now perform player initialization on the resultant CCA image, expecting to find
one player.
4. Using the new player region, update all player object attributes.
6.4 Occlusion Tracking
The tracking of players through occlusion is accomplished by using the player colour
model generated in 5.4.2. Effectively, the player's colour and shape based model is
tracked through the sequence.
Figure 6-1 shows an example of two players being tracked through occlusion. In the
first image both players are surrounded by a red box; this signifies that both players
are being tracked by the fast non occlusion tracking algorithm. In the next three
images the players' bounding boxes are blue, which shows that the special occlusion
tracking algorithm is in use. This section will describe this occlusion tracking
algorithm.
6.4.1 Algorithm
• Due to the position of the player in the image, it is possible to find the maximum
distance in each direction that the player could have moved since the last frame;
let us refer to this distance in pixels as P. [Section 6.3]
• Using the colour model of the player's jersey [Figure 6-2], test how well the jersey
model matches the current frame in the player's neighbourhood, up to a maximum
of P pixels away.
• The location that best matches the player is the location that the player moved to.
Alter the player's coordinates appropriately. Previous player velocity information
can also be used to resolve ambiguities.
The tracking of the ball is accomplished in a similar manner to the occlusion tracking
of a player.
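The colour-model search described above can be sketched as follows. This is an illustrative Java sketch, not the TIPS implementation: greyscale values stand in for colours, the model is stored as a membership mask plus a colour grid as described in 5.4.2, and the penalty for pixels shifted outside the frame is an assumption.

```java
/** Sketch of the colour-model search used for occlusion tracking. */
class OcclusionTracker {
    /**
     * Slides the stored player model over the frame within +/- maxDist pixels of the
     * last known position (top, left) and returns the (dy, dx) that matches best.
     */
    static int[] bestMatch(int[][] frame, boolean[][] mask, int[][] colours,
                           int top, int left, int maxDist) {
        int bestDy = 0, bestDx = 0, best = Integer.MAX_VALUE;
        for (int dy = -maxDist; dy <= maxDist; dy++)
            for (int dx = -maxDist; dx <= maxDist; dx++) {
                int s = 0;
                for (int y = 0; y < mask.length; y++)
                    for (int x = 0; x < mask[0].length; x++) {
                        if (!mask[y][x]) continue;   // only pixels belonging to the player
                        int fy = top + y + dy, fx = left + x + dx;
                        if (fy < 0 || fx < 0 || fy >= frame.length || fx >= frame[0].length)
                            s += 255;                // penalise leaving the frame (assumption)
                        else
                            s += Math.abs(colours[y][x] - frame[fy][fx]);
                    }
                if (s < best) { best = s; bestDy = dy; bestDx = dx; }
            }
        return new int[]{bestDy, bestDx};
    }
}
```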
Figure 6-1 Tracking Occluded Players
Figure 6-2 Player Model Representation
7 Geometry Conversion
7.1 Introduction
With the tracking information that was obtained in the previous two sections, it is
possible to infer where the players are in pixel coordinates on the screen. For this
tracking information to be useful in the creation of generic animations, the pixel
coordinates must be converted into real world locations. There are two main parts to
this conversion process.
1. Extend the visible court
2. Convert pixel coordinates into real world coordinates
This section will give a detailed description of both of these steps, as well as of the
mathematics involved and of how court partitioning is implemented.
7.2 Necessary Mathematics
The methods involved in extending the court, partitioning the court and converting
the coordinates rely on several equations. In this section these equations are described.
Figure 7-1 Pixel to Meter Conversion
7.2.1 Intersecting Point Between Two Lines
The intersection point between two line segments was calculated by creating the
appropriate line equations and then solving these equations for x and y. The line
equations are of the form

a1·x + b1·y + m1 = 0
a2·x + b2·y + m2 = 0

Using (x0, y0) and (x1, y1) as the start and end points of a line, the following
equations are used to generate the line equation coefficients.

a1 = y0 − y1
b1 = x1 − x0
m1 = x0·(y1 − y0) − y0·(x1 − x0)

Then, to find (xi, yi), the intersection point between the two line segments, the
following equations are used.

xi = (b1·m2 − m1·b2) / (a1·b2 − b1·a2)
yi = (a1·m2 − m1·a2) / (b1·a2 − a1·b2)
7.2.2 Distance calculation
The distance between two points was calculated using the Euclidean distance formula

Distance = sqrt((x1 − x0)² + (y1 − y0)²)

where (x0, y0) and (x1, y1) are the two points we want the distance between.
7.2.3 External Divisor
In order to extend the court, a method for finding a point that is a certain distance
from the end of a line segment was needed. The external divisor equations were used
for this calculation.

xx = (m·x2 − n·x1) / (m − n)
yx = (m·y2 − n·y1) / (m − n)

where (xx, yx) is the new external point and (x1, y1) and (x2, y2) are the two points
on the line segment that will be extended. m is the distance between (x1, y1) and
(xx, yx), and n is the distance between (x2, y2) and (xx, yx); that is, (xx, yx) divides
the segment externally in the ratio m:n.
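The external divisor equations as code, in a small Java sketch (names are illustrative). With m and n as the distances from (x1, y1) and (x2, y2) to the external point, extending the segment (0,0)-(5,0) by a further 3 units gives m = 8, n = 3 and the external point (8, 0):

```java
/** The external divisor equations, used to extend a line segment past its end point. */
class ExternalDivisor {
    /** External point dividing (x1,y1)-(x2,y2) externally in the ratio m:n. */
    static double[] externalPoint(double x1, double y1, double x2, double y2,
                                  double m, double n) {
        double xx = (m * x2 - n * x1) / (m - n);
        double yx = (m * y2 - n * y1) / (m - n);
        return new double[]{xx, yx};
    }
}
```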
7.2.4 Point inside a Polygon
In this section a fast method was needed for determining if a point lies on or inside a
convex four sided polygon. The method used was based upon [Bourke87].
This method considers the polygon as a "path" from the first vertex. Moving around
the path anticlockwise, if the result D (from the equation below) is always greater
than or equal to zero, then the point is on or inside the polygon.

D = (y − y1)·(x1 − x0) − (x − x1)·(y1 − y0)

where (x, y) is the point that you want to check, (x0, y0) is the starting point of the
current line and (x1, y1) is the end point of the current line.
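The test is easily written as a loop over the polygon's edges. A Java sketch (names are illustrative; the vertices must be listed anticlockwise, as the text requires):

```java
/** Inside test based on the side-of-edge value D, per [Bourke87]. */
class PointInPolygon {
    /** poly lists the vertices anticlockwise; true if (x, y) is on or inside the polygon. */
    static boolean inside(double x, double y, double[][] poly) {
        for (int i = 0; i < poly.length; i++) {
            double x0 = poly[i][0], y0 = poly[i][1];
            double x1 = poly[(i + 1) % poly.length][0];
            double y1 = poly[(i + 1) % poly.length][1];
            double d = (y - y1) * (x1 - x0) - (x - x1) * (y1 - y0);
            if (d < 0) return false;   // point is on the wrong side of this edge
        }
        return true;
    }
}
```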
Figure 7-2 External Divisor
7.3 Extending the Court
In most video clips that were available for processing, sections of the court were not
visible due to limitations of the camera and camera location. To rectify this,
geometric properties of the court were used to extend the court. The main principle
that was used is that the intersection of the diagonals of a court viewed from any
angle still gives the center of the court.
This step was necessary so that every video sequence, no matter how much of the
court was visible, could be treated by the conversion method as if the full court was
visible.
7.3.1 Algorithm
Using the known pixel locations of half the court, draw the diagonal line connecting
opposite corners and extend the lines. [Figure 7-3 Diagonal Intersection Lines]
Figure 7-3 Diagonal Intersection Lines
Using the ratio 1:2, the diagonal can be extended to a point such that the distance
along the extended diagonal (3) is in the same ratio to the part that lies on the court
(1+2). [Figure 7-4] It is possible to find the length of 3, since 1:2 is the same ratio as
(1+2):3. Using the distances (1+2)+3 and 3 as m and n respectively, the external
point X [See Figure 7-4] can be found using the external divisor equations. Complete
this calculation with both diagonals and extend both side lines. Join the two new
points that were created from the external divisors. The intersections of this line with
the extended side lines complete the outline of the court.
7.4 Partitioning the court
This section will give a description of how the court can be partitioned into
quadrants, given limited amounts of information about the court. Partitioning of the
court is needed for the coordinate conversion stage.
There are two basic types of partitioning needed.
Figure 7-4 Extending with Ratios
1. Initial partitioning of the court into quadrants [Figure 7-5]
2. Subsequent partitioning of a quadrant into sub quadrants [Figure 7-6]
In Figure 7-5 and Figure 7-6 the blue dots represent points that are found by finding
the intersection point of the associated lines.
7.4.1 Initial Partitioning of the Court
Initially, six points on the court are known; these points are represented by the red
dots in image a of Figure 7-5. The purpose of this step is to end up with all the points
in image c.
1. By connecting the half court diagonals as shown in image a, the center of each
half court (the blue dots) can be calculated from the intersection points.
2. Join the two blue dots in image a and extend the line in both directions. The
intersections of this line with the two base lines give the midpoints of the baselines.
3. Connect the two points either side of the half court. The intersection point of this
line and the line created in step 2 is the center of the court.
7.4.2 Subsequent Partitioning of the Court
In the coordinate conversion process it is vital that, given a quadrant, it is possible to
break that quadrant into four more quadrants. For example, in image a of Figure 7-6
the dark gray quadrant is to be split into four quadrants. The result is image d in
Figure 7-6.
Figure 7-5 Initial Partitioning of the court
1. Join the diagonals of the quadrant that is to be partitioned, as well as those of both
quadrants that share a border line.
2. Calculate the intersection points of all these lines. (The blue dots in image b)
3. Create and extend lines between the points created in step 2, as shown in image c.
4. The intersection points of these lines with the borders of the quadrant that is to be
partitioned are the missing points that were needed. The result is image d.
7.5 Converting Pixel Coordinates to Real Coordinates
Given a pixel coordinate pair (xP, yP), it is necessary to convert it into its real world
location (xR, yR). Due to the location of the camera, a simple mapping cannot be
made from pixel to real world coordinates, so a new approach to solving this problem
was needed.
The conversion was accomplished by developing a method that subdivides the court
into quadrants, tests which quadrant the pixel coordinates are in, and recursively
subdivides the quadrant that contains the player.
Figure 7-6 Subsequent Partitioning of the court
7.5.1 Algorithm
This section will give a simple, step by step explanation on how the conversion is
accomplished.
1. Using the methods developed in 7.4, the court is partitioned into four equal (in the
real world) quadrants.
2. Each of these quadrants is given a label from zero to three.
3. Each quadrant is then tested to check whether the point that is being converted is
contained in that quadrant. For example, in Figure 7-7 the player at location
(190,260) is contained in quadrant 2.
4. If the point is inside a quadrant, that quadrant is appended onto a quadrant choice
list. The quadrant choice list keeps track of the quadrant that the player is in at
each recursive step.
5. The quadrant that contains the player, the current quadrant, is then partitioned into
four quadrants.
6. Go to step 1, using the current quadrant as the court. [Figure 7-8] This step should
be run eight times.
Figure 7-7 Converting a Location 1
After eight operations of this method on a court that is twenty five meters long,
enough information has been collected to return a result accurate to 0.049 of a meter.
By performing the operations on the real court that are described in the quadrant
choice list, it is possible to subdivide the real court until an accurate location of the
pixel point is available in "real world" coordinates.
For example, given a court that is 15 meters wide and 25 meters long and a small
quadrant choice list of {2, 1}, it is possible to give the position accurate to 3.125
meters.
Figure 7-8 Converting a Location 2
Figure 7-9 Coordinate Conversion
Figure 7-9 shows this operation being calculated. The origin is originally (0,0) and
the opposite corner is (15,25). The first number in the quadrant choice list is 2, which
means that the point is in quadrant 2. Therefore the origin is now (7.5,12.5) and the
opposite corner remains (15,25). The next number in the quadrant choice list is 1.
Therefore the origin is now at (11.25,12.5) and the opposite corner is now at
(15,18.75). Calculating the center point between these points gives a good
approximation of the real world location for the point, that is, (13.125,15.625).
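The decoding of a quadrant choice list into real coordinates can be sketched as follows. This is an illustrative Java sketch, not the project's code; the quadrant numbering (0 to 3 anticlockwise from the origin quadrant) is an assumption, chosen so that it is consistent with the worked example above.

```java
/** Sketch of turning a quadrant choice list back into real-world coordinates. */
class QuadrantDecoder {
    /** Returns the centre of the final quadrant selected by the choice list. */
    static double[] decode(double courtWidth, double courtLength, int[] choices) {
        double x0 = 0, y0 = 0, x1 = courtWidth, y1 = courtLength;
        for (int q : choices) {
            double mx = (x0 + x1) / 2, my = (y0 + y1) / 2;   // centre of current area
            switch (q) {                                      // numbering is an assumption
                case 0: x1 = mx; y1 = my; break;   // low x, low y
                case 1: x0 = mx; y1 = my; break;   // high x, low y
                case 2: x0 = mx; y0 = my; break;   // high x, high y
                case 3: x1 = mx; y0 = my; break;   // low x, high y
            }
        }
        return new double[]{(x0 + x1) / 2, (y0 + y1) / 2};
    }
}
```

With the 15 x 25 meter court and the choice list {2, 1} from the example, this returns (13.125, 15.625), matching the figure.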
7.6 TIPS Output Files (TOF Files)
In order to recreate the video sequence as an animation on a mobile device, certain
pieces of information are necessary. The number of players, the presence of a ball, the
size of the court and the location of all of these objects are vital for animation
creation. An output file structure was developed with this specification in mind.
After every F frames of the video sequence, the locations of all players and the ball
are converted to real coordinates and appended to the TOF file, where F is
approximately one third of the video sequence's frame rate. This TOF file will then
be used by the animation server to create mobile device specific animations.
There is an example TOF file in Appendix A.
8 Animation Distribution with Mobile Devices
8.1 Introduction
The result of the previous sections is a TIPS Output File. This file contains all known
information about the number of players, presence of the ball, location of all players
and ball at discrete time intervals and the velocity of every object. The next stage in
this project is to use this information to create animation representations of the video
sequence on mobile devices, for example, mobile phones.
This section is split into two parts.
1. A description of the high level view of the full system.
2. An end-to-end example of the system in use.
The following two chapters will present more detailed information on each of the
distribution components.
8.2 Full System Architecture
Three applications were developed to produce the full working system.
1. An extension of the TIPS platform to perform video stabilization, object
initialization, object tracking, position conversion and TOF file creation.
2. A servlet7 that runs on a web server, uses the TOF file and creates mobile
device specific byte streams.
3. A mobile device application that can connect to a web server, download or stream
the specialized byte streams, and display a stream as an animation to the user.
7 A servlet is a Java based application that runs on a web server. It is a powerful tool for the creation
of client-server applications.
Figure 8-1 gives an outline of how these parts link together. Each modular section is
separated by a light gray box.
The TIPS environment and web server may reside on the same machine. It is
necessary that the web server is accessible via the Internet and that the TIPS
environment saves the TOF files to a location that is accessible by the web server.
8.2.1 Overview of the Life of a Video Sequence
In this section, an introduction to the steps that are performed during an automatic
media creation session will be presented. This should be used as an overview for the
more detailed description that will be presented in later chapters.
Figure 8-1 Full System Architecture
1. When opening the TIPS environment, a background image, a video sequence and
an output location for the TOF file are required. The output location for the TOF
file should be accessible by the Sports Display servlet on the web server. The TIPS
environment will then automatically perform all tasks needed for the creation of a
TOF file.
2. When a user performs a "View Live" operation on the Sports Display Application
running on their mobile device, they are asked for the server's URL. Once entered,
a GPRS connection is created over the Internet to the web server at that URL. A
list of files that are available for download or streaming is returned by the Sports
Display servlet running on the web server.
3. The user decides which file to view and selects the file. This sends a request to the
servlet for that file. The servlet then reads the associated TOF file, creates a
specialized byte array and returns it to the mobile device.
4. The mobile device then displays the animation and saves the file for future
viewing.
9 Sports Display Servlet
9.1 Introduction
In the previous section the high level view of the full system was presented. In this
section a detailed description of the Sports Display Servlet will be given.
The Sports Display Servlet is the connection point between the content creation stage
and the animation display stage. There are two main operations that the Sports
Display Servlet performs.
1. On request, it returns to the Sports Display user a list of the files that are available
for download – this is a pull operation for the mobile device.
2. Given a file request from a Sports Display user, it converts the corresponding TOF
file into a download optimized, device specific byte stream and returns the stream
to the mobile device – this is also a pull operation for the mobile device.
9.2 File Information Requests
Before the Sports Display Application can request a particular file for download, it
must first know what files are available. When the mobile device connects to the
servlet, it appends "?info=files" to the URL. When the servlet sees this request, it
knows that the user wants a list of the files available for download. The servlet keeps
track of the available TOF files in its upload directory; a TOF file does not become
available unless it is in the correct format. The servlet then simply returns a list of the
files that are available for download.
9.3 File Download Requests
When the Sports Display Application makes a download request to the server it
appends several pieces of information to the URL.
1. The name of the file requested for download.
2. The width (resolution) of the viewable screen area. (If not given, defaults to 128)
3. The height (resolution) of the viewable screen area. (If not given, defaults to 75%
of the width)
For example, the URL may look like:
http://www.someaddress.com/servlet/DownloadServlet?file=Example.tof&width=128
The server can then use this information to tailor the response for the specific device.
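The parameter handling with its defaults can be sketched as follows. This is an illustrative Java sketch, not the project's servlet code: the class name is made up, and only the query string parsing is shown, outside of any servlet container.

```java
/** Sketch of the download request parameters, with the defaults described above. */
class DownloadRequest {
    final String file;
    final int width;
    final int height;

    DownloadRequest(String query) {
        String f = null;
        int w = -1, h = -1;
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) continue;
            if (kv[0].equals("file")) f = kv[1];
            else if (kv[0].equals("width")) w = Integer.parseInt(kv[1]);
            else if (kv[0].equals("height")) h = Integer.parseInt(kv[1]);
        }
        file = f;
        width = (w > 0) ? w : 128;             // default width: 128
        height = (h > 0) ? h : width * 3 / 4;  // default height: 75% of width
    }
}
```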
9.4 File Download Response Format
As the cost of downloading an animation to a mobile device is directly proportional
to the size of the file, creating an efficient file format was very important. In addition,
MIDP 1.0 devices (the version of Java that runs on the second generation of mobile
phones) do not support floating point arithmetic. Therefore it was important that little
or no processing would have to be performed on the response to make it viewable.
These two constraints were satisfied in the following way.
The response from the servlet consists of a header, which is 5 + N bytes long, and the
body, where N is the number of players. A single byte is used for each of the
following.
• Number of players
• Presence of the ball
• Number of waypoints
• Number of ticks between each key frame
• The sleep time between each tick
• Each player's team number
Obviously, it would be possible to make the header smaller by bit stuffing, but in this
case clarity is more important than saving several bytes of space.
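Building the header can be sketched as follows. This is an illustrative Java sketch: the report lists the header fields but not their order, so the byte ordering here is an assumption, as are all the names.

```java
/** Sketch of the 5 + N byte response header; the field order is an assumption. */
class TofHeader {
    static byte[] encode(int players, boolean ball, int waypoints,
                         int ticksPerKeyFrame, int sleepPerTick, byte[] teams) {
        byte[] header = new byte[5 + players];
        header[0] = (byte) players;
        header[1] = (byte) (ball ? 1 : 0);
        header[2] = (byte) waypoints;
        header[3] = (byte) ticksPerKeyFrame;
        header[4] = (byte) sleepPerTick;
        for (int i = 0; i < players; i++)
            header[5 + i] = teams[i];   // one team byte per player
        return header;
    }
}
```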
The size of the body of the message, in bytes, is

2 × (number of players (+ 1 if the ball is present)) × number of waypoints

For each tracked object, at every waypoint there is an update of the object's position.
The object's coordinate position is sent to the mobile device as the actual location on
the mobile device's screen at which the object should be displayed. Sending the
mobile device this information means that it does not have to calculate the object's
position on its screen relative to real world positions.
The web server that was used in this project is Jigsaw 2.2.2. Jigsaw is a free, Java
based server that is both easy to install and maintain.
For a more detailed look at the structure of the Sports Display Servlet, refer to the
associated javadocs that are available on the attached CD-ROM.
9.5 The Cost of Download
Using the above technique for animation distribution, a very competitive pricing
system can be developed. Currently, the cost per kilobyte downloaded from the
Internet via GPRS on a mobile phone is 0.02 Euro. [Vodafone04] Although this is
quite expensive per kilobyte, using the above byte format it is possible to show a
relatively long animation economically.
For example, consider a game with 10 players and 1 ball that lasts 10 minutes, with a
key frame refresh rate of two per second. (Which, when the full court is represented
on a screen that is typically 128 pixels wide, is fine.) The calculation is

file size = header + (number of trackable objects × refresh rate × length of clip × 2)
26405 = 5 + (11 × 2 × (60 × 10) × 2)

Rounding 26405 bytes up to 27 kilobytes, the total cost to download is 0.54 Euro.
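The worked example above can be expressed as code. A Java sketch (names are illustrative); it assumes a kilobyte is counted as 1000 bytes, which is what makes 26405 bytes round up to the 27 kilobytes and 0.54 Euro quoted in the text.

```java
/** Sketch of the download size and cost calculation from the worked example. */
class DownloadCost {
    /** File size in bytes: header plus two bytes per object per waypoint. */
    static int fileSizeBytes(int headerBytes, int trackableObjects,
                             int refreshRate, int clipLengthSeconds) {
        return headerBytes + trackableObjects * refreshRate * clipLengthSeconds * 2;
    }

    /** Cost in Euro, rounding up to whole kilobytes (1 kB = 1000 bytes assumed). */
    static double costEuro(int bytes, double euroPerKilobyte) {
        int kilobytes = (bytes + 999) / 1000;
        return kilobytes * euroPerKilobyte;
    }
}
```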
10 Sports Display Mobile Application
The Sports Display Application is the only part of the system that the end user will
see. It allows the user to download new animations, view animations, open previously
saved animations and delete previously saved animations. Figure 10-1 shows a Nokia
7210 displaying an animation. The blue dots represent players.
Since this application is built upon J2ME it will run successfully on all devices that
have J2ME installed, for example, most recent mobile phones. For simplicity, in this
section there will be specific references to the Sports Display Application running on
a Nokia 7210 mobile phone.
10.1 Displaying Animations
The format of the mobile device downloadable byte stream was discussed in the
previous chapter [9.4]. This section will describe how the Sports Display application
uses this byte stream to display an animation.
Figure 10-1 Nokia 7210
The byte stream contains all the information about the video sequence needed to
recreate an animation representation. It tells the Sports Display application the exact
location on its screen to place the objects. When the Sports Display Application
downloads the byte stream it creates an internal object for every movable object in the
animation; these objects are then initialized with all available waypoint information.
In order to decrease the size of the download needed to create an animation, several
optimizations were made. In the TIPS platform, waypoints are created three times
per second (this rate is configurable). If these positions were simply shown on the Sports
Display application, the animations would appear jittery. To solve this problem, the
application breaks the distance from the origin waypoint to the destination waypoint
into sub destinations. By displaying these sub destinations in turn, the movement
between waypoints becomes smooth. Figure 10-2 shows four images describing how
the animation is created between two points. The blue dots represent the waypoints
that the Sports Display application received from the animation server. The red dot
is what is actually displayed.
As time passes the displayed point for the player moves between sub destinations.
This gives the smooth motion of the player between waypoints that can be a
considerable distance apart.
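The smoothing described above can be illustrated with simple linear interpolation between consecutive waypoints. The report does not give the exact scheme used, so the class below is a hypothetical sketch: it divides the segment between two waypoints into a fixed number of evenly spaced sub destinations that the display can step through on each animation tick.

```java
// Hypothetical sketch of waypoint smoothing: linear interpolation of the
// segment between two waypoints into `steps` sub destinations.
public class WaypointSmoother {

    // Returns `steps` points along the line from (x0, y0) to (x1, y1),
    // ending exactly on the destination waypoint.
    static float[][] subDestinations(float x0, float y0,
                                     float x1, float y1, int steps) {
        float[][] points = new float[steps][2];
        for (int i = 1; i <= steps; i++) {
            float t = (float) i / steps;       // fraction of the segment covered
            points[i - 1][0] = x0 + t * (x1 - x0);
            points[i - 1][1] = y0 + t * (y1 - y0);
        }
        return points;
    }

    public static void main(String[] args) {
        // A waypoint jump of 10 pixels split into 5 smooth sub destinations.
        float[][] path = subDestinations(0f, 0f, 10f, 0f, 5);
        for (float[] p : path) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

On each display tick the red dot advances to the next sub destination, so by the time the next waypoint arrives from the server the dot has travelled the full segment smoothly.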
10.2 User Interface Examples
The user interface features several menus and the actual animation view. Figure 10-1
shows the view of an animation being run.

Figure 10-2 Animation Generation

Figure 10-3 shows three examples from the menu driven user interface. The first phone is the main menu that appears when
the application is opened. From here it is possible to download new files, open files,
delete files and read the copyright information. The second phone shows the listing of
saved animations on the phone. By selecting one of these files, the file would be
displayed. The last phone is the 'about' information for the application.
The Sports Display Application is quite sophisticated and cannot be fully described
here due to document size limitations. For a more detailed look at the
structure of the Sports Display Application, refer to the associated documentation that
is available on the attached CD-ROM.
Figure 10-3 Phone Examples - Main Menu - Open File - About
11 Sample Results
11.1 Introduction
This project shows that it is possible to create a fully automated, real time mobile
content creation system. This chapter presents sample results from the tracking that
was performed in the TIPS environment and the resultant animations that were
displayed on a mobile phone.
As this project deals with videos and animations, it is highly advised that the CD-
ROM be used to view the sample results. Also included on the CD-ROM is the application
that will run on any J2ME enabled mobile device. Installing this application will
present the sample results in a more natural manner.
The attached CD-ROM contains videos of the tracking and mobile application
running, source code and source documentation. It also contains the original project
presentation slides and this document in PDF format.
11.2 Sample Tracking Results
Figure 11-1 is an example taken from the visual output of TIPS. It shows an image
taken every two seconds. Red bounding boxes show players being tracked with the
faster non-occluding tracking algorithm, while the blue boxes show players being
tracked with the occlusion tracking algorithm.
Figure 11-1 Sample TIPS results
11.3 Sample Sports Display Results
The “Sports Display” is the application that runs on a mobile device and
displays the animations streamed or downloaded from the sports animation server. This
application allows for the downloading of new animations, saving downloaded clips,
opening previously downloaded clips and managing files that have been saved. Figure
10-1 shows the Sports Display running an animation on a Nokia 7210. Figure 11-2
shows example screen captures taken from an animation displaying the results
created by TIPS from the video clip in Section 11.2. Since it is difficult to display
an animation on paper, it is highly recommended to either use the attached
CD-ROM for the proper display of the results or install the application onto a J2ME
enabled mobile phone and view the demo animations.
The above images are taken from the application running on a Nokia 7210 with 128
by 128 pixel resolution. The application menu has been removed for clarity.
Figure 11-2 Sample Phone Results
12 Conclusions and Analysis
12.1 Conclusions
The original aim of this project was simply to track and draw conclusions on the
locations of the players and ball in a video sequence. This project, as it now stands,
far exceeds the original specification.
Throughout the design of the extensions to the TIPS platform, speed of execution was
always important. The design of the tracking algorithms reflects this design decision.
There are more advanced tracking algorithms available that would have performed
more precise tracking, however the precision of the tracking was not vital in this
application since the length of a full court will be represented on a screen that is
typically 128 pixels wide. The tracking algorithms that were developed suited these
restrictions well.
Personally, working on this project has been a lot of fun. Initially I underestimated the
size and difficulty of this project but perseverance and a lot of hard work has brought
it through to fruition. The end result is an interesting system that is entertaining to
demo and an excellent “proof of concept”.
12.2 Problems Encountered
There were several difficulties that arose during the development of this project.
At first, the conversion of coordinates from the image world to the real world seemed
like a simple problem. However, due to the location of the camera, the court was
distorted in the image, which meant that a simple mapping from the image world to
the real world was impossible. As a result, a new method for converting pixel
coordinates to real world coordinates was developed [Section 7].
Due to the resolution of the video sequences, ball initialization and tracking was very
difficult to achieve consistently. Several of the sample videos on the CD-ROM
show the ball being tracked consistently; however, in general it was almost
impossible. The main reason for this was the size of the ball as it appears in the video
sequence. As the video was filmed at 360x288 resolution, the ball was only a few
pixels wide. To be able to find this consistently is an extremely difficult problem.
Higher resolution video clips would help to solve this problem.
Obtaining footage that was suitable for testing was very difficult. Indoor footage was
typically unreliable for tracking and player initialization due to the artificial light that
'bleached' all colours. The lights also produced a large amount of shadows.
Figure 12-1 shows an image that was originally used for tracking.
In addition to this, obtaining footage of actual teams playing in proper team
uniforms would have been useful. During player initialization it is possible to assign
each player to a team depending on the colour of their jersey; as there was no 'proper'
footage available for this project, this does not take place in the sample videos.
12.3 Improvements
Although this project was a resounding success, there are still several areas that need
further development to make this project into a commercial product.
Figure 12-1 Sample Indoor Image
Many of the problems that were encountered in this project could be solved if a well
placed high resolution camera was available.
Most of the future code development would be spent making the tracking and
initialization of the players and ball more robust. Currently the tracking algorithm does
not deal correctly with shadows cast by the players or with dramatic changes in video
sequence brightness. Therefore a more sophisticated method for tracking and
initializing players, one that still runs in real time, will have to be developed.
Also, with a more sophisticated tracking algorithm and higher resolution video it may
be possible to infer player orientation and to recognize particular players.
In addition, the availability of proper 'team' footage would make demonstrations of the
project a lot more realistic, as the breakup between the different teams would be
noticeable in the Sports Display animation representation.
12.4 Commercial Prospects
As was mentioned in the introduction, mobile phone multimedia creation is a rapidly
expanding market. Since the mobile phone market has become saturated, mobile
phone operators are looking for new ways to increase revenue. The success of ring
tones and background downloads for mobile phones has shown that the public is
willing to spend money on personalizations. Downloadable animations of football,
basketball and other sports may be an appealing avenue for increasing revenue, as the
cost associated with both the creation and distribution of the animations is very low,
which would allow this service to be priced very competitively. This could also be
used as an easy first step
into getting people more comfortable with using the advanced features of their phones
which typically generate more revenue for the mobile phone operators.
With several months of tracking refinement, high resolution “bird's eye view” cameras
and a human operator to oversee the tracking results, this system could become a
viable commercial product.
13 References
[Aaron02] Aaron Bobick, Stephen Intille, Anthony Hui, “Computers watching football”, Last Accessed 26-4-04
http://www-white.media.mit.edu/vismod/demos/football/football.html
[Bodor] Robert Bodor, Bennet Jackson, Nikolaos Papanikolopoulos, “Vision-Based Human Tracking and Activity Recognition”, University of Minnesota, Last Accessed 26-4-04
http://mha.cs.umn.edu/Papers/Vision_Tracking_Recognition.pdf
[Bourke87] Paul Bourke, “Determining if a point lies on the interior of a polygon”, November 1987, Last Accessed 26-4-04
http://astronomy.swin.edu.au/~pbourke/geometry/insidepoly/
[Boyle03] Chris J. Needham and Roger D. Boyle, “Tracking multiple sports players through occlusion, congestion and scale”, The University of Leeds, England, Last Accessed 26-4-04
http://www.comp.leeds.ac.uk/chrisn/research/index.html
[Budde03] Paul Budde, “2004 Telecoms in Europe – UK and Ireland”, December 9th 2003, Last Accessed 26-4-04
http://www.marketresearch.com/product/display.asp?productid=946732
[Choi03] Sunghoon Choi, Yongduek Seo, Hyunwoo Kim, Ki-Sang Hong, “Where are the ball and players?: Soccer Game Analysis with Color-based Tracking and Image Mosaick”, Pohang University of Science and Technology, Republic of Korea, 1997, Last Accessed 26-4-04
http://citeseer.nj.nec.com/370333.html
[Dahyot04] Rozenn Dahyot, Niall Rea, Anil Kokaram, “Sport Video Shot Segmentation and Classification”, Electronic and Electrical Engineering Department, Trinity College Dublin
[Spengler03] Martin Spengler, Bernt Schiele, “Automatic Detection and Tracking of Abandoned Objects”, Proceedings of the 2003 Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2003, pp. 149-156
[Meier99] T. Meier, K. N. Ngan, “Segmentation and Tracking of Moving Objects for Content-based Video Coding”, IEE Proceedings – Vision, Image and Signal Processing, 1999
[Kang03] Jinman Kang, Isaac Cohen, Gerard Medioni, “Soccer Player Tracking across Uncalibrated Camera Streams”, Proceedings of the 2003 Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2003, pp. 172-179
[Kovacic03] Stanislav Kovacic, “Tracking Players in Sports Games”, Last Accessed 13-10-03
http://vision.fe.uni-lj.si/research/trackp/
[Pers00] Janez Pers, Stanislav Kovacic, “Computer Vision System for Tracking Players in Sports Games”, Faculty of Electrical Engineering, University of Ljubljana, Last Accessed 26-4-04
http://vision.fe.uni-lj.si/docs/janezp/iwispa2000-janez.pdf
[Pers01] Janez Pers, Stanislav Kovacic, “Tracking People in Sport: Making Use of Partially Controlled Environment”, University of Ljubljana, Last Accessed 26-4-04
http://avi5.fe.uni-lj.si/docs/janezp/caip2001.pdf
[Pers01b] Janez Pers, Goran Vuckovic, Stanislav Kovacic, Branko Dezman, “A Low-Cost Real-Time Tracker of Live Sport Events”, Faculty of Electrical Engineering, University of Ljubljana, Last Accessed 26-4-04
http://vision.fe.uni-lj.si/docs/janezp/pers-ispa2001.pdf
[Pers01c] Janez Pers, Marta Bon, Stanislav Kovacic, “Errors and Mistakes in Automated Player Tracking”, Proceedings of the Sixth Computer Vision Winter Workshop, February 2001, pp. 25-36
[Sonka99] Milan Sonka, Vaclav Hlavac, Roger Boyle, “Image Processing, Analysis, and Machine Vision”, PWS Publishing, Second Edition, 1999
[Stauffer03] Chris Stauffer, Kinh Tieu, Lily Lee, “Robust Automated Planar Normalization of Tracking Data”, Massachusetts Institute of Technology, VS-PETS 2003
[Tabb03] Ken Tabb, Neil Davey, Rod Adams, Stella George, “The Recognition and Analysis of Animate Objects using Neural Networks and Active Contour Models”, University of Hertfordshire
[Telephony04] Telephony World, “New Report Predicts Exponential Growth in Mobile Multimedia Services”, March 18th 2004, Last Accessed 26-4-04
http://www.telephonyworld.com/cgi-bin/news/viewnews.cgi?category=all&id=1079658444
[Yannone03] Ronald Yannone, “Some of my Experiences with Kalman Filters”, Last Accessed 26-4-04
http://www.polymath-systems.com/intel/hiqsocs/megasoc/noes138/kalman.html
[Vodafone04] Vodafone, “Our Tariffs”, April 2004, Last Accessed 26-4-04
http://www.vodafone.ie/dataservices/gprs/prices/index.jsp
Appendix A – Example TIPS Output File
Below is a section from a TIPS output file. Note that it also contains velocity
information, measured in pixels, for each player and the ball. At initialization each
player and the ball has a velocity of zero.
# File Type
0
# Number of Players
5
# Is the ball present (1 is true)
1
# Time to wait between ticks (in milliseconds)
100
# The width and height of the real court
15 25
# <playerNumber> <teamNumber> <column> <row>
# Update number 0
# Velocity (0.000000,0.000000)
0 -842150451 8.891602 3.637695
# Velocity (0.000000,0.000000)
1 -842150451 4.555664 13.745117
# Velocity (0.000000,0.000000)
2 -842150451 4.438477 16.528320
# Velocity (0.000000,0.000000)
3 -842150451 11.791992 11.352539
# Velocity (0.000000,0.000000)
4 -842150451 6.899414 16.137695
# Velocity (0.000000,0.000000)
6.870117 21.459961
# Update number 1
# Velocity (0.000000,-0.300000)
0 -842150451 8.833008 3.540039
# Velocity (0.550000,-0.400000)
1 -842150451 4.526367 13.500977
# Velocity (-1.150000,0.150000)
2 -842150451 3.881836 16.870117
# Velocity (-0.250000,0.950000)
3 -842150451 11.791992 11.791992
# Velocity (1.850000,-0.200000)
4 -842150451 7.309570 15.795898
6.870117 21.459961
# Update number 2
# Velocity (0.000000,0.200000)
0 -842150451 8.891602 3.637695
# Velocity (0.250000,-1.650000)
1 -842150451 3.295898 12.475586
# Velocity (-1.050000,-0.550000)
2 -842150451 3.676758 17.114258
# Velocity (-0.850000,-0.150000)
3 -842150451 11.791992 12.377930
# Velocity (1.050000,0.500000)
4 -842150451 7.749023 15.893555
# Velocity (0.000000,0.000000)
6.870117 21.459961
# Update number 3
# Velocity (0.000000,0.750000)
0 -842150451 8.891602 3.637695
# Velocity (0.500000,-0.850000)
1 -842150451 2.944336 11.791992
# Velocity (-0.300000,-1.650000)
2 -842150451 3.149414 16.821289
# Velocity (-0.750000,0.900000)
3 -842150451 12.407227 12.866211
# Velocity (2.150000,-1.050000)
4 -842150451 8.100586 15.209961
# Velocity (-0.050000,-0.050000)
6.870117 21.459961
# Update number 4
# Velocity (0.000000,-0.300000)
0 -842150451 8.891602 3.637695
# Velocity (1.050000,-0.450000)
1 -842150451 3.208008 11.401367
# Velocity (0.750000,-0.450000)
2 -842150451 3.149414 16.577148
# Velocity (-0.350000,0.250000)
3 -842150451 12.260742 13.159180
# Velocity (2.250000,-1.000000)
4 -842150451 8.598633 14.282227
# Velocity (0.200000,0.200000)
6.958008 21.459961
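A reader for this format can be sketched as follows. This is an illustrative parser, not the code used by the project: the field meanings are taken from the header comment "# <playerNumber> <teamNumber> <column> <row>", lines beginning with '#' are treated as comments, and two-field lines are assumed to be the ball position.

```java
// Illustrative parser for position lines in the TIPS output format above.
public class TipsLineParser {

    // Returns {column, row} for a player or ball line,
    // or null for comment and unrecognized lines.
    static double[] parsePosition(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;                              // comment line
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length == 4) {                      // player: number, team, column, row
            return new double[] { Double.parseDouble(parts[2]),
                                  Double.parseDouble(parts[3]) };
        }
        if (parts.length == 2) {                      // ball: column and row only
            return new double[] { Double.parseDouble(parts[0]),
                                  Double.parseDouble(parts[1]) };
        }
        return null;
    }

    public static void main(String[] args) {
        double[] p = parsePosition("0 -842150451 8.891602 3.637695");
        System.out.println("player at column " + p[0] + ", row " + p[1]);
    }
}
```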
Appendix B – Video Setup File
Below is a sample configuration file for a particular video clip. It describes the
location of the points, in pixel coordinates, around the visible half court. It also
describes the size, in meters, of the real court.
#
# This file is associated with the file Video_1.00.avi
#
# This is the type of setup file that we have
# 0 is a four point half court system where points are ordered such that
#
# 0 --- 1
# | |
# | |
# | |
# 3 --- 2
#
# 1 is a four point full court system where points are ordered as above...
0
#
# This is the list of image points x0 y0 x1 y1 x2 ... y3
198 89 356 117 284 217 80 150
# This is the list of real world points (full court)
0 0 15 0 15 25 0 25
# This is the polygon with points listed clockwise that is the area of
# interest in our video, movement outside this we don’t have to worry about...
200 65 360 95 200 429 -140 250
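The area-of-interest polygon on the last line can be used to discard movement outside the court, for example with the ray-crossing point-in-polygon test described in [Bourke87]. The sketch below is illustrative and uses that polygon's coordinates; the project's actual implementation may differ.

```java
// Ray-crossing test: a point is inside a polygon if a horizontal ray from
// the point crosses the polygon's edges an odd number of times ([Bourke87]).
public class AreaOfInterest {

    static boolean contains(int[] xs, int[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Count the edge if it straddles the ray and the crossing
            // point lies to the right of the test point.
            boolean crosses = (ys[i] > py) != (ys[j] > py)
                    && px < (double) (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Area-of-interest polygon from the setup file above, listed clockwise.
        int[] xs = { 200, 360, 200, -140 };
        int[] ys = { 65, 95, 429, 250 };
        System.out.println(contains(xs, ys, 250, 150));  // a point on the court
        System.out.println(contains(xs, ys, 0, 0));      // a point well outside
    }
}
```

A tracker can call this test on each detected object position and simply ignore any blob whose centre falls outside the polygon.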