Automated Real Time Sport Analysis and
Distribution with Computer Vision and
Mobile Phones
Ryan Sherlock
B.A. (Mod.) Computer Science
Final Year Project May 2004
Supervisor: Dr Kenneth Dawson-Howe
Abstract
“Automated Real Time Sport Analysis and Distribution with Computer Vision and
Mobile Phones” is a project in the area of both Computer Vision and Mobile
Communications. The field of Computer Vision attempts to emulate human vision
through the processing of digital images. The aim of this project is to recognize, track
and draw conclusions about the locations of the players and ball in a sports clip from a
static camera. The results of this step, in the form of player and ball locations, will
then be distributed to mobile communication devices, for example a mobile phone,
for display as low-bandwidth animated representations of what occurred in the sports
clip.
This report aims to outline the development of this project, from the background
research, to the successful end result.
Acknowledgements
I would like to thank my supervisor Kenneth Dawson-Howe for his advice throughout
the project.
I would also like to thank my family and friends for their support and encouragement.
Table of Contents

ABSTRACT.................................................................II
ACKNOWLEDGEMENTS........................................................III
1 INTRODUCTION............................................................1
1.1 PURPOSE...............................................................1
1.2 AIMS..................................................................1
1.3 MOTIVATION............................................................2
2 OVERVIEW OF PROBLEM.....................................................3
2.1 INTRODUCTION..........................................................3
2.2 PRINCIPAL PROJECT COMPONENTS..........................................3
2.3 PROJECT ASSUMPTIONS...................................................4
2.4 TECHNOLOGIES USED.....................................................5
2.5 READER'S GUIDE........................................................5
2.5.1 Background Research.................................................5
2.5.2 Video Preprocessing.................................................6
2.5.3 Player and Ball Recognition.........................................6
2.5.4 Tracking............................................................6
2.5.5 Geometric Transformations...........................................6
2.5.6 Animation Distribution with Mobile Devices..........................7
2.5.7 Sports Display Servlet..............................................7
2.5.8 Sports Display Mobile Application...................................7
2.5.9 Sample Results......................................................7
2.5.10 Conclusions........................................................7
3 BACKGROUND RESEARCH.....................................................8
3.1 PLAYER TRACKING AND RECOGNITION.......................................8
3.1.1 Colour Tracking.....................................................8
3.1.2 Active Contour Models...............................................8
3.1.3 Template Tracking...................................................9
4 PREPROCESSING AND BASIC VISION ALGORITHMS..............................11
4.1 INTRODUCTION.........................................................11
4.2 SMOOTHING (AVERAGING)................................................11
4.3 BACKGROUND DETECTION.................................................12
4.4 BINARY THRESHOLDING..................................................12
4.5 VIDEO STABILIZATION..................................................13
4.5.1 Introduction.......................................................13
4.5.2 Algorithm..........................................................14
5 PLAYER AND BALL INITIALIZATION.........................................17
5.1 INTRODUCTION.........................................................17
5.2 CONNECTED COMPONENT ANALYSIS.........................................17
5.2.1 Algorithm..........................................................18
5.2.2 Connected Component Analysis Example...............................19
5.3 INITIALIZING THE BALL................................................20
5.3.1 Algorithm..........................................................20
5.4 INITIALIZING PLAYERS.................................................20
5.4.1 Algorithm..........................................................21
5.4.2 Building a Model of the Player.....................................21
6 TRACKING...............................................................23
6.1 INTRODUCTION.........................................................23
6.2 WHICH TRACKING SHOULD BE USED?.......................................23
6.3 NON OCCLUSION TRACKING...............................................23
6.3.1 Algorithm..........................................................24
6.4 OCCLUSION TRACKING...................................................24
6.4.1 Algorithm..........................................................25
7 GEOMETRY CONVERSION....................................................26
7.1 INTRODUCTION.........................................................26
7.2 NECESSARY MATHEMATICS................................................26
7.2.1 Intersecting Point Between Two Lines...............................27
7.2.2 Distance Calculation...............................................27
7.2.3 External Divisor...................................................28
7.2.4 Point Inside a Polygon.............................................28
7.3 EXTENDING THE COURT..................................................29
7.3.1 Algorithm..........................................................29
7.4 PARTITIONING THE COURT...............................................30
7.4.1 Initial Partitioning of the Court..................................31
7.4.2 Subsequent Partitioning of the Court...............................31
7.5 CONVERTING PIXEL COORDINATES TO REAL COORDINATES.....................32
7.5.1 Algorithm..........................................................33
7.6 TIPS OUTPUT FILES (TOF FILES)........................................35
8 ANIMATION DISTRIBUTION WITH MOBILE DEVICES.............................36
8.1 INTRODUCTION.........................................................36
8.2 FULL SYSTEM ARCHITECTURE.............................................36
8.2.1 Overview of the Life of a Video Sequence...........................37
9 SPORTS DISPLAY SERVLET.................................................39
9.1 INTRODUCTION.........................................................39
9.2 FILE INFORMATION REQUESTS............................................39
9.3 FILE DOWNLOAD REQUESTS...............................................39
9.4 FILE DOWNLOAD RESPONSE FORMAT........................................40
9.5 THE COST OF DOWNLOAD.................................................41
10 SPORTS DISPLAY MOBILE APPLICATION.....................................42
10.1 DISPLAYING ANIMATIONS...............................................42
10.2 USER INTERFACE EXAMPLES.............................................43
11 SAMPLE RESULTS........................................................45
11.1 INTRODUCTION........................................................45
11.2 SAMPLE TRACKING RESULTS.............................................46
11.3 SAMPLE SPORTS DISPLAY RESULTS.......................................47
12 CONCLUSIONS AND ANALYSIS..............................................48
12.1 CONCLUSIONS.........................................................48
12.2 PROBLEMS ENCOUNTERED................................................48
12.3 IMPROVEMENTS........................................................49
12.4 COMMERCIAL PROSPECTS................................................50
13 REFERENCES............................................................51
APPENDIX A – EXAMPLE TIPS OUTPUT FILE....................................54
APPENDIX B – VIDEO SETUP FILE............................................56
Index of Figures

Figure 2-1 Example sports footage [Boyle03]...............................4
Figure 3-1 Variations in Player Model [Boyle03]...........................9
Figure 4-1 Before Smoothing..............................................11
Figure 4-2 After Smoothing...............................................11
Figure 4-3 Difference Image Creation.....................................12
Figure 4-4 Binary Image Creation.........................................13
Figure 4-5 Need for video stabilization..................................14
Figure 4-6 Least difference bin selection................................15
Figure 4-7 Background Image and Mask.....................................16
Figure 4-8 Rotating Mask Example.........................................16
Figure 5-1 Connected Component Analysis Example..........................18
Figure 5-2 Joining Player Parts..........................................18
Figure 5-3 CCA 1.........................................................19
Figure 5-4 CCA 2.........................................................19
Figure 5-5 CCA 3.........................................................19
Figure 5-6 CCA 4.........................................................19
Figure 5-7 Difference in Player Size and Elongatedness...................21
Figure 5-8 Player Model Example..........................................22
Figure 6-1 Tracking Occluded Players.....................................25
Figure 6-2 Player Model Representation...................................25
Figure 7-1 Pixel to Meter Conversion.....................................26
Figure 7-2 External Divisor..............................................28
Figure 7-3 Diagonal Intersection Lines...................................29
Figure 7-4 Extending with Ratios.........................................30
Figure 7-5 Initial Partitioning of the court.............................31
Figure 7-6 Subsequent Partitioning of the court..........................32
Figure 7-7 Converting a Location 1.......................................33
Figure 7-8 Converting a Location 2.......................................34
Figure 7-9 Coordinate Conversion.........................................34
Figure 8-1 Full System Architecture......................................37
Figure 10-1 Nokia 7210...................................................42
Figure 10-2 Animation Generation.........................................43
Figure 10-3 Phone Examples - Main Menu - Open File - About...............44
Figure 11-1 Sample TIPS results..........................................46
Figure 11-2 Sample Phone Results.........................................47
Figure 12-1 Sample Indoor Image..........................................49
1 Introduction
Computer Vision attempts to emulate the human vision system. This is an extremely
difficult problem and has not been solved in the general case. To fully solve Computer
Vision, a comprehensive understanding of the human brain would be necessary.
Experts believe that this understanding is still some time off.
Measurement of human motion has been available with the use of invasive sensors or
targets for some time. Recent developments in computer vision provide basic human
motion tracking. Since computer vision relies only on video cameras, computer
hardware and software, this type of tracking is unobtrusive, inexpensive and usually
works in existing unmodified environments.
Mobile Communications has expanded dramatically over the last ten years.
Convergence has meant that the mobile phone, originally a wireless extension of the
wired telephony infrastructure, is developing into a comprehensive media center used
for an expanding range of services.
The combination of Computer Vision, for content creation, and mobile
communication devices for content distribution can create interesting systems. This
project aims to explore one of these systems.
1.1 Purpose
The purpose of this project is to automatically create, and display on a mobile
communication device, an animated representation of a sports video clip in real time.
1.2 Aims
Create an application that will be able to track individual players in video clips in real
time and produce player waypoints¹ that will allow the measurement and display of the
players' in-game positions.

¹ Waypoint – a major point on a route to a destination; a 'snapshot' of the object's location.
Build a distribution infrastructure that will be used to drive a graphical representation
of the actual game footage on a mobile device (mobile phone or PDA).
Integrate the systems so that the full media creation and distribution pipeline is
automatic.
1.3 Motivation
Mobile phone penetration has reached more than 80% in the Irish market [Budde03].
In parallel with this explosion in penetration, mobile phone usage has also diversified.
SMS (Short Messaging Service), MMS (Multimedia Messaging Service), WAP
(Wireless Application Protocol) and mobile gaming have now become an important
part of an operator's revenue, with data services accounting for over 17% of operator
revenues in 2003 in the UK and Ireland. With multimedia services and mobile gaming
seen as the next major evolution of the mobile phone, the creation of low-bandwidth
multimedia content has become a lucrative market. [Telephony04]
Using Computer Vision for automated content creation, multimedia content suited to
mobile devices can be created and distributed with a very low “cost to market”.
2 Overview Of Problem
2.1 Introduction
This chapter is intended to give an overview of the project, identifying problem areas
that had to be resolved in order to provide a working solution. This chapter will also
act as a “reader's guide” for the rest of the document by outlining the purpose of each
chapter.
There has been a great deal of research in human tracking in recent years. Player
tracking is an example of a practical use for human tracking. Previous research into
sports player tracking had several motivations:
1. Professional trainers and sports scientists are interested in the amount of ground
covered during a game by each athlete and how quickly they move during the
game. This information can be used to specialize training regimes for players.
2. By tracking the main events that occur during a game, metadata can be added to
the video for future searching and referencing. [Dahyot04]
3. By tracking opponents during a game it is possible to draw conclusions on the
tactics and plays that they use in different situations. [Aaron02]
What I plan to do in this project is to add another practical use for player tracking to
this list.
2.2 Principal Project Components
Tracking sports players over a large playing area is an extremely challenging problem.
The players change direction quickly and have large variations in their form. The size
of the court (15 x 25 meters) means that the resolution of an object varies for different
parts of the image. When looking at the front of the court [Figure 2-1] each pixel
represents several centimeters, while at the rear of the court each pixel represents
10 to 20 centimeters. This means that, from this view, the size of a player can vary
from 30 pixels to 10 pixels in width.
In addition, the tracking of the ball has not previously been accomplished in a sports
analysis system.
In order to create an automated sports analysis and distribution system, this problem
had to be divided into smaller modular tasks that were easy to test. This increased the
efficiency of development.
The principal steps that were needed to automatically create and display mobile sports
content are as follows:
• Stabilize the video clip
• Identify the players and ball in the video sequence
• Track the players and ball throughout the sequence
• Convert the players' image positions (in pixels) into real-world positions (in meters)
• Make this information available on the Internet in an efficient form for
downloading/streaming
• Create a mobile display interface and connection protocol so that mobile devices
can download and display the animation
2.3 Project Assumptions
The following are the assumptions that were made in order to make this project
possible in the given time period. Given more research and development time, some
of these assumptions may be relaxed.

Figure 2-1 Example sports footage [Boyle03]
• Input video is from a static camera without zooming
• A background image for the input video is available
• Lighting conditions throughout the clip do not change dramatically
• The physical dimensions of the court are known
• The first frame of the video does not contain occluded² players or ball
• Players do not leave the playing area
2.4 Technologies Used
The Trinity Image Processing System (TIPS), developed by Kenneth Dawson-Howe,
Trinity College Dublin, is the platform built upon for content creation. This system,
written in C++, is expanded in order to create a new video stabilization method,
initialize and track the players and ball, and convert the relevant player information
into real world coordinates.
Java 2 Enterprise Edition (J2EE), and more specifically servlets, are used as the base
technology to convert and distribute the TIPS Output Files (.tof files) into mobile
phone optimized downloads that can be distributed to mobile devices using the
Internet and GPRS.
Java 2 Micro Edition (J2ME) is used as the platform for building a mobile device
display and communication system.
2.5 Reader's Guide
2.5.1 Background Research
Background research into the areas of player recognition and tracking.
² Occlusion occurs when an object obstructs the view of another object. That is, the object that is
behind is not fully visible.
2.5.2 Video Preprocessing
In order for the sports clip to be suitable for processing, the video must first be
stabilized against the background image. This section will discuss a new method that
was developed for video stabilization as well as give a brief introduction into the
other preprocessing techniques and basic vision algorithms that were used in the
project.
2.5.3 Player and Ball Recognition
To be able to track the players and ball successfully, their initial position in the image
must be found. This section will describe the techniques used during player and ball
initialization.
2.5.4 Tracking
Tracking is performed so that the location of the players and ball is known throughout
the video sequence.
A computationally fast algorithm will be presented that can track players when there
is no occlusion expected.
When objects that are being tracked occlude each other, which in team sports occurs
regularly, tracking becomes much more difficult. This chapter will also describe the
advanced tracking techniques that were developed to overcome this problem.
2.5.5 Geometric Transformations
When a player or ball is tracked successfully, there is still no “image world” to “real
world” mapping. This section aims to describe the process involved in converting an
object's location from pixels to meters.
2.5.6 Animation Distribution with Mobile Devices
Using the player and ball locations, waypoints are created. This section will examine
the full system architecture and how these waypoints are transported to a mobile
device.
2.5.7 Sports Display Servlet
The Sports Display Server is the connection point between the content creation stage
and the animation display stage. This section will give an introduction to this
connecting service.
2.5.8 Sports Display Mobile Application
The Sports Display Application is the only part of the system that the end user will
use. It allows the user to download new animations from the server, view animations,
and open or delete previously saved animations on their mobile phone. This section
will give an introduction to the design and operation of this application.
2.5.9 Sample Results
Sample images from various video clips and mobile phone animations are presented.
(It is advised that the attached CD-ROM be used for sample video results.)
2.5.10 Conclusions
This section will present the project conclusion, the problems encountered and the
commercial prospects of this system.
3 Background Research
3.1 Player Tracking And Recognition
This section introduces some of the background research into object recognition and
tracking that was needed for the successful completion of this project.
The segmentation of the players from the background was the most difficult step in
this project. Below are some examples of techniques that are used to initialize and
track movable objects, and in this case players, over a video sequence. The actual
algorithms used in this project draw from these research areas.
3.1.1 Colour Tracking
The algorithm searches for the pixel most similar to the recorded colour of the player.
The search is performed over a limited area (say, 10 pixels in each direction; this
encodes a finite acceleration/velocity assumption). The similarity measure is defined
as the Euclidean distance

S_colour(x, y) = sqrt( (I_R(x, y) - C_R)² + (I_G(x, y) - C_G)² + (I_B(x, y) - C_B)² )

where I is the image and C is the recorded colour of the player. R, G and B denote the
red, green and blue channels, respectively. The main advantage of this algorithm is
that it is highly reliable. It tracks the players even when the colour of the player
changes due to compression artifacts or changes in the lighting conditions. The main
disadvantage is that it can also lock onto background colours. Another disadvantage is
that this method can create a lot of jitter in the resultant player trajectory. This makes
it inappropriate for stand alone use. [Pers01]
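To make the measure concrete, the windowed colour search can be sketched in C++ (the language of the TIPS platform). The `RGB` and `Image` types and the function names below are illustrative stand-ins, not part of TIPS or of any published tracker:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Illustrative pixel and image types (not the actual TIPS classes).
struct RGB { double r, g, b; };

struct Image {
    int width, height;
    std::vector<RGB> pixels;  // row-major
    const RGB& at(int x, int y) const { return pixels[y * width + x]; }
};

// Euclidean distance in RGB space between an image pixel and the
// recorded player colour C: the similarity measure S_colour.
double colourDistance(const RGB& p, const RGB& c) {
    return std::sqrt((p.r - c.r) * (p.r - c.r) +
                     (p.g - c.g) * (p.g - c.g) +
                     (p.b - c.b) * (p.b - c.b));
}

// Search a limited window (radius pixels in each direction) around the
// player's last known position for the most similar pixel; the small
// window encodes the finite acceleration/velocity assumption.
std::pair<int, int> trackByColour(const Image& img, const RGB& recorded,
                                  int lastX, int lastY, int radius) {
    double best = 1e300;
    std::pair<int, int> bestPos(lastX, lastY);
    int x0 = std::max(0, lastX - radius), x1 = std::min(img.width - 1, lastX + radius);
    int y0 = std::max(0, lastY - radius), y1 = std::min(img.height - 1, lastY + radius);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            double d = colourDistance(img.at(x, y), recorded);
            if (d < best) { best = d; bestPos = std::make_pair(x, y); }
        }
    return bestPos;
}
```

Note that nothing in this search stops it locking onto a background pixel of a similar colour, which is exactly the weakness described above.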
3.1.2 Active Contour Models
The development of active contour models (snakes) results from the work of Kass,
Witkin and Terzopoulos [Sonka99]. Energy minimization is used to achieve image
segmentation and understanding. The snake, itself, contains two parts – a list of
control points and an energy function. Each control point is an (x, y) point in the
image plane that collectively represent the shape of the object being recognized. For
visualization purposes the control points are usually linked together on the screen
creating an outline of the investigated shape. The energy function is the mathematical
rule set that governs the changes possible in the location of the control points.
Therefore the energy function is critical to the active contour model’s success at being
able to lock onto particular types of object [Tabb03].
Whilst improvements have been made in making snakes suitable for particular tasks,
there are still several major weaknesses. Snakes cannot categorize shapes as there is
no inbuilt ‘knowledge’ of the object that they are detecting. This makes snakes less
suitable for tracking objects in complex backgrounds [Tabb03]. Also, since snakes
become ‘locked’ to an object's outline, if the object becomes occluded by another
object, the snake will no longer be able to follow the object. This is the fatal flaw with
snakes with regards to player tracking, which meant that they could not be used in this
project.
3.1.3 Template Tracking
Visual differences between the players and the background can be exploited in order
to track objects [Pers00]. Generally in tracking applications the objects that are being
tracked are similar in nature, for example, chocolates on a conveyor belt. However
with player tracking the variation in template can be quite dramatic. Figure 3-1 shows
some of the possible player variations [Boyle03].
Figure 3-1 Variations in Player Model [Boyle03]
As the players' shape and scale change dramatically throughout the clip, the number of
templates that would be needed for accurate tracking would be unmanageable. Simple
template tracking is therefore not appropriate for general player tracking. However, as
will be demonstrated in chapter 6, a large part of the solution to occlusion tracking
relies on concepts taken from template tracking.
4 Preprocessing and Basic Vision Algorithms
4.1 Introduction
During video preprocessing many simple algorithms are used to aid and initialize the
more complex tracking algorithms. This section will briefly go through these
algorithms.
In addition, a new method for stabilizing videos that was developed for this project
will be presented.
4.2 Smoothing (Averaging)
Compression artifacts and image noise are a major concern in later image processing
stages. Smoothing uses redundancy in the image data to suppress the noise. [Sonka99]
Given an input image, Figure 4-1³, the output image pixels were specified by the
average of the equivalent input pixel and its neighbouring pixels. The effect of this
operation is the blurring of the image [Figure 4-2]. As the objects of interest in the
image are large relative to the size of the neighbourhood used to create the output
pixel, none of the necessary detail is lost. The smoothing algorithm used in this
project is part of the TIPS platform.
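As a sketch (not the TIPS routine itself), a 3x3 box average over a grayscale image stored as a row-major vector might look as follows; border pixels simply average whichever neighbours exist:

```cpp
#include <cassert>
#include <vector>

// 3x3 box-average smoothing for a grayscale image of w x h pixels
// stored row-major.  Border pixels average only the neighbours that
// fall inside the image.
std::vector<double> smooth(const std::vector<double>& in, int w, int h) {
    std::vector<double> out(in.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double sum = 0.0;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
                        sum += in[ny * w + nx];
                        ++n;
                    }
                }
            out[y * w + x] = sum / n;  // average of the neighbourhood
        }
    return out;
}
```

A single noisy pixel is spread over its neighbourhood and heavily attenuated, while large objects keep their overall intensity, which is the behaviour relied upon here.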
³ The court in all images except Figure 2-1 is part of the Trinity College Dublin campus.
Figure 4-1 Before Smoothing
Figure 4-2 After Smoothing
4.3 Background Detection
Assuming a stationary camera and constant illumination, background detection is the
most straightforward approach to motion detection. Each pixel in each frame in the
video sequence (current image) is subtracted from the equivalent pixel in the
background image to yield a difference image D.
D = |R_B - R_C| + |G_B - G_C| + |B_B - B_C|

where R, G and B denote the red, green and blue components of a pixel, and the
subscripts B and C denote the background image and the current image respectively.
The resultant image [Figure 4-3] shows all pixels in the video that have changed from
the background image.
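Per pixel, the difference measure can be sketched as follows; the `RGB` type is an illustrative stand-in, and the formula assumed is the sum of absolute per-channel differences:

```cpp
#include <cassert>
#include <cmath>

// Illustrative RGB pixel type (not the actual TIPS class).
struct RGB { double r, g, b; };

// Difference between the background pixel and the current-frame pixel:
// the absolute differences of the R, G and B channels, summed.
double pixelDifference(const RGB& bg, const RGB& cur) {
    return std::fabs(bg.r - cur.r) +
           std::fabs(bg.g - cur.g) +
           std::fabs(bg.b - cur.b);
}
```

Applying this at every pixel of every frame yields the difference image D: unchanged pixels give values near zero, while moving players give large values.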
Figure 4-3 Difference Image Creation

4.4 Binary Thresholding

Binary thresholding is used as a simple form of image segmentation. Thresholding is
computationally inexpensive and fast. More accurately, thresholding is the
transformation of an input image f to a segmented output image g.
g(i, j) = 1 for f(i, j) ≥ T
g(i, j) = 0 for f(i, j) < T

where T is the threshold value, g(i, j) = 1 for elements of image objects and
g(i, j) = 0 for elements of the background. [Sonka99]
Figure 4-4 shows a sample difference image and the thresholded binary output. The
white areas are referred to as image objects, while the black area is the background.
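The transformation is a one-line decision per pixel; a sketch over a difference image stored as a vector of values (the function name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary thresholding: g(i) = 1 where f(i) >= T (image object),
// g(i) = 0 otherwise (background).
std::vector<int> thresholdImage(const std::vector<double>& f, double t) {
    std::vector<int> g(f.size());
    for (std::size_t i = 0; i < f.size(); ++i)
        g[i] = (f[i] >= t) ? 1 : 0;
    return g;
}
```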
4.5 Video Stabilization
4.5.1 Introduction
Video stabilization is the first preprocessing technique that is performed on the video
sequence. It is introduced at this stage because an understanding of the other
preprocessing techniques is needed to understand why stabilization is necessary.
In the background subtraction step, it is vital that the background image is aligned
correctly with the video sequence that is being processed. Figure 4-5 shows the
resultant difference image from a frame in a video sequence that is offset by one pixel
from the background image⁴. In this image the outlines of the court markings and
surrounding buildings are clearly visible. If binary thresholding were then performed
on this image, the court's markings would become an image object that would prevent
the correct initialization of the player and ball tracking algorithm.

Figure 4-4 Binary Image Creation
This section will now present the new method that was developed for the stabilization
of a video sequence that has a maximum jitter of several pixels in any direction.
Briefly, stabilization is performed by finding features in the video frame that have
changed least in relation to the equivalent features in the reference image. Each
chosen feature is then moved around its neighbouring area and compared with the
corresponding feature in the background image, and a vote is held among the features.
The direction that wins this vote is the direction in which the video will be offset.
The stabilization of the video did not have to be performed on every video sequence
that was used for processing.
4.5.2 Algorithm
This method starts by breaking the current image (current frame from the video
sequence) into a grid of N x M ’bins’, where N is typically 8 and M is 12. [Figure 4-6]
⁴ This image has been exaggerated so that the errors in the difference image are visible in print.
Figure 4-5 Need for video stabilization
Every pixel in each bin is then subtracted from the equivalent pixel in the reference
image. The absolute pixel difference between all pixels in the current bin and the
equivalent reference bin is then stored with the current bin.
After this has been completed for each bin in the image, the bins that differ least from
their equivalent reference bins can be selected. These 'least difference' bins are the
bins with the lowest absolute difference count. X bins are then used for the next step,
where X is a natural number, typically 25% of N x M. In Figure 4-6, the selected
least difference bins are visible as the bins that are not artificially darkened. These
bins are the "features" that will be used for image realignment.
The X bins that were selected in the previous step are now used to find the optimum
feature matching offset. The pixels in each of the bins are offset by a certain amount
against the equivalent background pixels, and a difference count is then performed.
When all offsets have been calculated for each bin, each bin reports the offset that
best matched the background image (that is, the offset with the lowest absolute
difference count). Whichever offset gets the most votes is declared the winner, and
the full video is offset by that amount, for example, two pixels left and one pixel up.
Figure 4-6 Least difference bin selection
As a simple example, Figure 4-7 shows a background image in white and a bin in
gray. Each box in both images represents a pixel. Figure 4-8 shows the bin being
offset against the background in all positions that are at most one pixel away in each
direction. The central image shows the bin in its natural position, that is, with an
offset of zero columns and zero rows. The same principle is used in video
stabilization, except that the maximum distance to check for a match is user defined.
Figure 4-8 Rotating Mask Example
Figure 4-7 Background Image and Mask
5 Player and Ball Initialization
5.1 Introduction
In order to track the players and ball in the scene, their initial positions must first be
found. Below is a discussion of how the preprocessing techniques already described
are extended to initialize the player and ball tracking.
The first technique introduced in this section is Connected Component Analysis; the
initialization of both the players and the ball relies on the successful completion of
this step. Next, the steps behind initializing the ball will be described. Finally, a
discussion of how the players are initialized will be presented.
5.2 Connected Component Analysis
The binary image obtained from thresholding consists of objects and background. In
the left image in Figure 5-1 the objects are represented by the white pixels and the
background by the black pixels. There is now a need to label the objects in order to
perform player and ball recognition.
Although the TIPS platform already contains an implementation of Connected
Component Analysis, a new, faster and more specialized version was developed to
keep the application running as fast as possible. This new version will now be
presented.
Figure 5-1 shows the input and output of the connected component analysis
algorithm. In the output image each labeled region is visualized by giving each region
a different colour.
5.2.1 Algorithm
The first stage of the algorithm is to give each object in the binary image a label.
• Search the image row by row.
• If a pixel is an object pixel and not labeled, then give the pixel a label and
recursively search all connected object pixels, giving them the same label.
• If the resultant region is smaller than N5 pixels, delete the region.
This completes the first stage of the specialized connected component analysis. The
next stage joins and relabels large, close regions. Figure 5-2 illustrates the need for
this step. In the left hand image it is clear that the two objects represent a full player,
so the next step joins these two regions to yield the right hand image in Figure 5-2.
• Navigate around the edge of each labeled region, searching for other labeled
regions that are within several pixels.
• If such regions are found, artificially create a connection between both regions,
label both regions consistently and record the size, in pixels, of the complete
region.
5 N is typically 8 when using a video with a resolution of 360 x 288
Figure 5-1 Connected Component Analysis Example
Figure 5-2 Joining Player Parts
The final step is to relabel all regions, so that the first region found has a label of 1,
the second 2, and so on.
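The first stage of the labeling can be sketched as follows. This is an illustrative Java sketch, not the TIPS code: 4-connectivity is assumed, an explicit stack replaces the recursion described above, and the close-region joining and final relabeling stages are omitted for brevity.

```java
/** Sketch of the first stage of the specialized connected component analysis. */
class CCA {
    /** Labels object pixels (value 1) in place, starting from label 2.
     *  Regions smaller than minSize pixels are deleted, as in the report. */
    static void label(int[][] img, int minSize) {
        int next = 2;
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                if (img[y][x] == 1) {
                    java.util.List<int[]> px = fill(img, y, x, next);
                    if (px.size() < minSize) {
                        for (int[] p : px) img[p[0]][p[1]] = 0;  // delete small region
                    } else {
                        next++;
                    }
                }
    }

    /** Iterative flood fill (the report uses recursion); returns the region's pixels. */
    static java.util.List<int[]> fill(int[][] img, int y, int x, int lab) {
        java.util.List<int[]> px = new java.util.ArrayList<>();
        java.util.ArrayDeque<int[]> stack = new java.util.ArrayDeque<>();
        img[y][x] = lab;
        stack.push(new int[]{y, x});
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            px.add(p);
            int[][] nb = {{p[0] - 1, p[1]}, {p[0] + 1, p[1]},
                          {p[0], p[1] - 1}, {p[0], p[1] + 1}};
            for (int[] q : nb)
                if (q[0] >= 0 && q[0] < img.length && q[1] >= 0 && q[1] < img[0].length
                        && img[q[0]][q[1]] == 1) {
                    img[q[0]][q[1]] = lab;   // claim the neighbour before visiting it
                    stack.push(q);
                }
        }
        return px;
    }
}
```

Run on the grid of Figure 5-3, this yields the two labeled regions of Figure 5-5 (labels 2 and 3), before the joining stage merges them.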
5.2.2 Connected Component Analysis Example
Below is an example of the algorithm in operation on a simple test case.
Figure 5-3 is the input binary image to Connected Component Analysis (CCA). The
objects are shown as 1s and the background as 0s. Each box represents a pixel.
Figure 5-4 shows the result after the algorithm finds the first non-labeled object pixel.
All adjoining pixels are labeled with the same value, 2. One region has now been
created.
Figure 5-5 shows the algorithm continuing and finding the second non-labeled
region. It labels it 3.
Since regions 2 and 3 are close together, they are joined and the labels are
synchronized. Figure 5-6 shows the result of this operation.
Figure 5-3 CCA 1
0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 1 1 0 0 1 0 0
0 1 1 1 0 1 1 0
0 0 1 1 0 1 1 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-4 CCA 2
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 1 0 0
0 2 2 2 0 1 1 0
0 0 2 2 0 1 1 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-5 CCA 3
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 3 0 0
0 2 2 2 0 3 3 0
0 0 2 2 0 3 3 0
0 0 0 0 0 3 3 0
0 0 0 0 0 3 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Figure 5-6 CCA 4
0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
0 2 2 0 0 2 0 0
0 2 2 2 2 2 2 0
0 0 2 2 0 2 2 0
0 0 0 0 0 2 2 0
0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
5.3 Initializing the Ball
Initializing the ball in a low resolution image is an extraordinarily difficult problem.
Due to its size, the ball can appear in the video sequence as being only a few pixels
wide. This can lead to the region that the ball is in not being found, or to other objects
in the image, for example players' heads, being falsely initialized as the ball. To
increase the accuracy of the initialization of the ball, only regions of the correct size,
shape and colour are searched.
5.3.1 Algorithm
• For each labeled region in the image:
• Compare the size of the region with the expected size of the ball. If the region is
of the correct size, compare the elongatedness6 of the region with the expected
elongatedness.
• Compare the associated central pixel values in the current image with the
expected colour of the ball.
• If all of the above conditions are met, a ball candidate has been found.
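The three checks above can be sketched as a single predicate. This is an illustrative Java sketch: the tolerances (50% of the expected area, elongatedness below 1.5, colour within 30 grey levels) and all names are assumptions for demonstration, not values from the report, and a single greyscale centre value stands in for the colour comparison.

```java
/** Sketch of the ball-candidate test; thresholds are illustrative, not from the report. */
class BallInit {
    /** A region is a ball candidate if its size, shape and centre colour are plausible. */
    static boolean isBallCandidate(int area, int width, int height, int centreColour,
                                   int expectedArea, int expectedColour) {
        // Size check: area within 50% of the expected ball area (illustrative tolerance).
        if (Math.abs(area - expectedArea) > expectedArea / 2) return false;
        // Shape check: elongatedness (length / width) should be close to 1 for a ball.
        double elong = (double) Math.max(width, height) / Math.min(width, height);
        if (elong > 1.5) return false;
        // Colour check: the centre pixel must be close to the expected ball colour.
        return Math.abs(centreColour - expectedColour) < 30;
    }
}
```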
5.4 Initializing Players
Initializing the players is performed in much the same way as the initialization of the
ball. Since the assumption was made that in the first frame of every sports clip no
players occlude each other, it is possible to study the regions returned from connected
component analysis to find the players. Regions that are of the correct size and shape
are assumed to be players.
6 Elongatedness: this is the length divided by the width. For example, a square has an
elongatedness of 1. Since a ball is a sphere, its elongatedness should be
approximately 1, although the blurring of the ball in the video must also be considered.
5.4.1 Algorithm
• For each labeled region in the image:
• Compare the size of the region with the expected size of a player.
• If the region is of the correct size, compare the elongatedness of the region with
the expected elongatedness. The elongatedness of a player can vary
dramatically, so the allowable range must be quite large. Figure 5-7 is an
example of this wide variation.
• If successful, a player has been found. Using the height and width obtained
from CCA, the player is now surrounded by an artificial bounding box.
5.4.2 Building a Model of the Player
For occlusion tracking, described in the next section, it is necessary that a model of
the player's last appearance is available. During initialization and non occlusion
tracking this model is created and updated. The following steps are needed to create
this internal player model.
When a player is initialized for the first time, an object within the TIPS platform is
created to represent the player. This object contains information about the player's
width, height, location and size in pixels. There are two more attributes of the object
that are very important for the new method of occlusion tracking that was developed.
These are:
1. Information on which parts inside the bounding box actually belong to the player
2. Colour information on all parts of the player that are visible
Figure 5-7 Difference in Player Size and Elongatedness
This information is generated as follows.
• When a player has been found, an artificial bounding box is placed around the
player. An example of the bounding box is shown in Figure 5-7.
• All pixels within this bounding box are tested for membership of the region that
the player was initialized from. If a pixel is part of the region, it is recorded in the
player's object.
• If a pixel is part of the player, the colour at that location is also recorded.
For example, Figure 5-8 shows a simple representation of the three stages in this
process. Image A shows the original image of the player that is being used for
initialization. Image B shows the pixels within the bounding box that are part of the
player; 1 represents part of the player, while 0 represents the background. Image C
shows the colours that were saved to represent the player.
Figure 5-8 Player Model Example
6 Tracking
6.1 Introduction
The correct tracking of the players in the video sequence was the most difficult aspect
of this project. After initialization of the players and ball, the tracking algorithm must
lock on to the player or ball throughout the video sequence. For the tracking to be
useful in this application, it must also be capable of running in real time.
6.2 Which Tracking Should Be Used?
In order to keep the application running in real time, the occlusion tracking
techniques, which are computationally more expensive, should be used as little as
possible. To decide whether occlusion tracking is necessary, at the start of processing
of each frame in the video sequence the shortest distance between each pair of players
is calculated. If this distance is small enough that the players may occlude each other
in the next frame, then these players are 'tagged' and the special occlusion tracking
algorithm will be used to track them.
6.3 Non Occlusion Tracking
A new, computationally fast method for tracking non occluding players was
developed for this section. The method itself is quite simple to understand once the
process of player initialization is understood. The method relies on two assumptions.
1. No other players are within a few pixels of the outside of the bounding box
2. The player has a maximum velocity that they cannot exceed
Assumption 1 holds because, at the start of processing of each frame, each player is
checked to make sure no other players are close by. If there is a player close by, the
occlusion tracking algorithm will be used.
Assumption 2 holds because the fastest speed possible for a human without artificial
assistance is in the order of 10 meters per second. Depending on the location of the
player in the image, this can range from 1 to 5 pixels per frame of video.
6.3.1 Algorithm
1. Extend the bounding box surrounding the player by N pixels in every direction,
where N is the maximum distance per frame that the player can move given their
previous location and velocity.
2. Perform full connected component analysis on this region as described in Section
5.2.
3. Now perform player initialization on the resultant CCA image, expecting to find
one player.
4. Using the new player region, update all player object attributes.
6.4 Occlusion Tracking
The tracking of players through occlusion is accomplished by using the player colour
model generated in 5.4.2. Effectively, the player's colour and shape based model is
tracked through the sequence.
Figure 6-1 shows an example of two players being tracked through occlusion. In the
first image both players are surrounded by a red box; this signifies that both players
are being tracked by the fast non occlusion tracking algorithm. In the next three
images the players' bounding boxes are blue, which shows that the special occlusion
tracking algorithm is in use. This section will describe this occlusion tracking
algorithm.
6.4.1 Algorithm
• Due to the position of the player in the image, it is possible to find the maximum
distance in each direction that the player could have moved since the last frame;
let us refer to this distance in pixels as P. [Section 6.3]
• Using the colour model of the player's jersey [Figure 6-2], test how well the jersey
model matches the current frame in the player's neighbourhood, up to a maximum
of P pixels away.
• The location that best matches the player is the location that the player moved to.
Alter the player's coordinates appropriately. Previous player velocity information
can also be used to resolve ambiguities.
The tracking of the ball is accomplished in a similar manner to the occlusion tracking
of a player.
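The colour-model search described above can be sketched as follows. This is an illustrative Java sketch, not the TIPS implementation: greyscale values stand in for colours, the model is stored as a membership mask plus a colour grid as described in 5.4.2, and the penalty for pixels shifted outside the frame is an assumption.

```java
/** Sketch of the colour-model search used for occlusion tracking. */
class OcclusionTracker {
    /**
     * Slides the stored player model over the frame within +/- maxDist pixels of the
     * last known position (top, left) and returns the (dy, dx) that matches best.
     */
    static int[] bestMatch(int[][] frame, boolean[][] mask, int[][] colours,
                           int top, int left, int maxDist) {
        int bestDy = 0, bestDx = 0, best = Integer.MAX_VALUE;
        for (int dy = -maxDist; dy <= maxDist; dy++)
            for (int dx = -maxDist; dx <= maxDist; dx++) {
                int s = 0;
                for (int y = 0; y < mask.length; y++)
                    for (int x = 0; x < mask[0].length; x++) {
                        if (!mask[y][x]) continue;   // only pixels belonging to the player
                        int fy = top + y + dy, fx = left + x + dx;
                        if (fy < 0 || fx < 0 || fy >= frame.length || fx >= frame[0].length)
                            s += 255;                // penalise leaving the frame (assumption)
                        else
                            s += Math.abs(colours[y][x] - frame[fy][fx]);
                    }
                if (s < best) { best = s; bestDy = dy; bestDx = dx; }
            }
        return new int[]{bestDy, bestDx};
    }
}
```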
Figure 6-1 Tracking Occluded Players
Figure 6-2 Player Model Representation
7 Geometry Conversion
7.1 Introduction
With the tracking information that was obtained in the previous two sections, it is
possible to infer where the players are in pixel coordinates on the screen. For this
tracking information to be useful in the creation of generic animations, the pixel
coordinates must be converted into real world locations. There are two main parts to
this conversion process.
1. Extend the visible court
2. Convert pixel coordinates into real world coordinates
This section will give a detailed description of both of these steps, as well as of the
mathematics involved and of how court partitioning is implemented.
7.2 Necessary Mathematics
The methods involved in extending the court, partitioning the court and converting
the coordinates rely on several equations. In this section these equations are described.
Figure 7-1 Pixel to Meter Conversion
7.2.1 Intersecting Point Between Two Lines
The intersection point between two line segments was calculated by creating the
appropriate line equations and then solving these equations for x and y. The line
equations are of the form

a1·x + b1·y + m1 = 0
a2·x + b2·y + m2 = 0

Using (x0, y0) and (x1, y1) as the start and end points of a line, the following
equations are used to generate the line equation coefficients.

a1 = y0 − y1
b1 = x1 − x0
m1 = x0·(y1 − y0) − y0·(x1 − x0)

Then, to find (xi, yi), the intersection point between the two line segments, the
following equations are used.

xi = (b1·m2 − m1·b2) / (a1·b2 − b1·a2)
yi = (a1·m2 − m1·a2) / (b1·a2 − a1·b2)
7.2.2 Distance calculation
The distance between two points was calculated using the Euclidean distance formula

Distance = sqrt((x1 − x0)² + (y1 − y0)²)

where (x0, y0) and (x1, y1) are the two points we want the distance between.
7.2.3 External Divisor
In order to extend the court, a method for finding a point that is a certain distance
from the end of a line segment was needed. The external divisor equations were used
for this calculation.

xx = (m·x2 − n·x1) / (m − n)
yx = (m·y2 − n·y1) / (m − n)

where (xx, yx) is the new external point and (x1, y1) and (x2, y2) are the two points
on the line segment that will be extended. m is the distance between (x1, y1) and
(xx, yx), and n is the distance between (x2, y2) and (xx, yx); that is, (xx, yx) divides
the segment externally in the ratio m:n.
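The external divisor equations as code, in a small Java sketch (names are illustrative). With m and n as the distances from (x1, y1) and (x2, y2) to the external point, extending the segment (0,0)-(5,0) by a further 3 units gives m = 8, n = 3 and the external point (8, 0):

```java
/** The external divisor equations, used to extend a line segment past its end point. */
class ExternalDivisor {
    /** External point dividing (x1,y1)-(x2,y2) externally in the ratio m:n. */
    static double[] externalPoint(double x1, double y1, double x2, double y2,
                                  double m, double n) {
        double xx = (m * x2 - n * x1) / (m - n);
        double yx = (m * y2 - n * y1) / (m - n);
        return new double[]{xx, yx};
    }
}
```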
7.2.4 Point inside a Polygon
In this section a fast method was needed for determining if a point lies on or inside a
convex four sided polygon. The method used was based upon [Bourke87].
This method considers the polygon as a "path" from the first vertex. Moving around
the path anticlockwise, if the result D (from the equation below) is always greater
than or equal to zero, then the point is on or inside the polygon.

D = (y − y1)·(x1 − x0) − (x − x1)·(y1 − y0)

where (x, y) is the point that you want to check, (x0, y0) is the starting point of the
current line and (x1, y1) is the end point of the current line.
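The test is easily written as a loop over the polygon's edges. A Java sketch (names are illustrative; the vertices must be listed anticlockwise, as the text requires):

```java
/** Inside test based on the side-of-edge value D, per [Bourke87]. */
class PointInPolygon {
    /** poly lists the vertices anticlockwise; true if (x, y) is on or inside the polygon. */
    static boolean inside(double x, double y, double[][] poly) {
        for (int i = 0; i < poly.length; i++) {
            double x0 = poly[i][0], y0 = poly[i][1];
            double x1 = poly[(i + 1) % poly.length][0];
            double y1 = poly[(i + 1) % poly.length][1];
            double d = (y - y1) * (x1 - x0) - (x - x1) * (y1 - y0);
            if (d < 0) return false;   // point is on the wrong side of this edge
        }
        return true;
    }
}
```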
Figure 7-2 External Divisor
7.3 Extending the Court
In most video clips that were available for processing, sections of the court were not
visible due to limitations of the camera and camera location. To rectify this,
geometric properties of the court were used to extend the court. The main principle
that was used is that the intersection of the diagonals of a court viewed from any
angle still gives the center of the court.
This step was necessary so that every video sequence, no matter how much of the
court was visible, could be treated by the conversion method as if the full court was
visible.
7.3.1 Algorithm
Using the known pixel locations of half the court, draw the diagonal line connecting
opposite corners and extend the lines. [Figure 7-3 Diagonal Intersection Lines]
Figure 7-3 Diagonal Intersection Lines
Using the ratio 1:2, the diagonal can be extended to a point such that the distance
along the extended diagonal (3) is in the same ratio to the part that lies on the court
(1+2). [Figure 7-4] It is possible to find the length of 3, since 1:2 is the same ratio as
(1+2):3. Using the distances (1+2)+3 and 3 as m and n respectively, the external
point X [See Figure 7-4] can be found using the external divisor equations. Complete
this calculation with both diagonals and extend both side lines. Join the two new
points that were created from the external divisors. The intersections of this line with
the extended side lines complete the outline of the court.
7.4 Partitioning the court
This section will give a description of how the court can be partitioned into
quadrants, given limited amounts of information about the court. Partitioning of the
court is needed for the coordinate conversion stage.
There are two basic types of partitioning needed.
Figure 7-4 Extending with Ratios
1. Initial partitioning of the court into quadrants [Figure 7-5]
2. Subsequent partitioning of a quadrant into sub quadrants [Figure 7-6]
In Figure 7-5 and Figure 7-6 the blue dots represent points that are found by finding
the intersection point of the associated lines.
7.4.1 Initial Partitioning of the Court
Initially, six points on the court are known; these points are represented by the red
dots in image a of Figure 7-5. The purpose of this step is to end up with all the points
in image c.
1. By connecting the half court diagonals as shown in image a, the center of each
half court (the blue dots) can be calculated from the intersection points.
2. Join the two blue dots in image a and extend the line in both directions. The
intersections of this line with the two base lines give the midpoints of the baselines.
3. Connect the two points either side of the half court. The intersection point of this
line and the line created in step 2 is the center of the court.
7.4.2 Subsequent Partitioning of the Court
In the coordinate conversion process it is vital that, given a quadrant, it is possible to
break that quadrant into four more quadrants. For example, in image a of Figure 7-6
the dark gray quadrant is to be split into four quadrants. The result is image d in
Figure 7-6.
Figure 7-5 Initial Partitioning of the court
1. Join the diagonals of the quadrant that is to be partitioned, as well as those of both
quadrants that share a border line.
2. Calculate the intersection points of all these lines. (The blue dots in image b)
3. Create and extend lines between the points created in step 2, as shown in image c.
4. The intersection points of these lines with the borders of the quadrant that is to be
partitioned are the missing points that were needed. The result is image d.
7.5 Converting Pixel Coordinates to Real Coordinates
Given a pixel coordinate pair (xP, yP), it is necessary to convert it into its real world
location (xR, yR). Due to the location of the camera, a simple mapping cannot be
made from pixel to real world coordinates, so a new approach to solving this problem
was needed.
The conversion was accomplished by developing a method that subdivides the court
into quadrants, tests which quadrant the pixel coordinates are in, and recursively
subdivides the quadrant that contains the player.
Figure 7-6 Subsequent Partitioning of the court
7.5.1 Algorithm
This section will give a simple, step by step explanation on how the conversion is
accomplished.
1. Using the methods developed in 7.4, the court is partitioned into four equal (in the
real world) quadrants.
2. Each of these quadrants is given a label from zero to three.
3. Each quadrant is then tested to check whether the point that is being converted is
contained in that quadrant. For example, in Figure 7-7 the player at location
(190,260) is contained in quadrant 2.
4. If the point is inside a quadrant, that quadrant is appended onto a quadrant choice
list. The quadrant choice list keeps track of the quadrant that the player is in at
each recursive step.
5. The quadrant that contains the player, the current quadrant, is then partitioned into
four quadrants.
6. Go to step 1, using the current quadrant as the court. [Figure 7-8] This step should
be run eight times.
Figure 7-7 Converting a Location 1
After eight operations of this method on a court that is twenty five meters long,
enough information has been collected to return a result accurate to 0.049 of a meter.
By performing the operations on the real court that are described in the quadrant
choice list, it is possible to subdivide the real court until an accurate location of the
pixel point is available in "real world" coordinates.
For example, given a court that is 15 meters wide and 25 meters long and a small
quadrant choice list of {2, 1}, it is possible to give the position accurate to 3.125
meters.
Figure 7-8 Converting a Location 2
Figure 7-9 Coordinate Conversion
Figure 7-9 shows this operation being calculated. The origin is originally (0,0) and
the opposite corner is (15,25). The first number in the quadrant choice list is 2, which
means that the point is in quadrant 2. Therefore the origin is now (7.5,12.5) and the
opposite corner remains (15,25). The next number in the quadrant choice list is 1.
Therefore the origin is now at (11.25,12.5) and the opposite corner is now at
(15,18.75). Calculating the center point between these points gives a good
approximation of the real world location for the point, that is, (13.125,15.625).
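The decoding of a quadrant choice list into real coordinates can be sketched as follows. This is an illustrative Java sketch, not the project's code; the quadrant numbering (0 to 3 anticlockwise from the origin quadrant) is an assumption, chosen so that it is consistent with the worked example above.

```java
/** Sketch of turning a quadrant choice list back into real-world coordinates. */
class QuadrantDecoder {
    /** Returns the centre of the final quadrant selected by the choice list. */
    static double[] decode(double courtWidth, double courtLength, int[] choices) {
        double x0 = 0, y0 = 0, x1 = courtWidth, y1 = courtLength;
        for (int q : choices) {
            double mx = (x0 + x1) / 2, my = (y0 + y1) / 2;   // centre of current area
            switch (q) {                                      // numbering is an assumption
                case 0: x1 = mx; y1 = my; break;   // low x, low y
                case 1: x0 = mx; y1 = my; break;   // high x, low y
                case 2: x0 = mx; y0 = my; break;   // high x, high y
                case 3: x1 = mx; y0 = my; break;   // low x, high y
            }
        }
        return new double[]{(x0 + x1) / 2, (y0 + y1) / 2};
    }
}
```

With the 15 x 25 meter court and the choice list {2, 1} from the example, this returns (13.125, 15.625), matching the figure.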
7.6 TIPS Output Files (TOF Files)
In order to recreate the video sequence as an animation on a mobile device, certain
pieces of information are necessary. The number of players, the presence of a ball, the
size of the court and the location of all of these objects are vital for animation
creation. An output file structure was developed with this specification in mind.
After every F frames of the video sequence, the locations of all players and the ball
are converted to real coordinates and appended to the TOF file, where F is
approximately one third of the video sequence's frame rate. This TOF file will then
be used by the animation server to create mobile device specific animations.
There is an example TOF file in Appendix A.
8 Animation Distribution with Mobile Devices
8.1 Introduction
The result of the previous sections is a TIPS Output File. This file contains all known
information about the number of players, presence of the ball, location of all players
and ball at discrete time intervals and the velocity of every object. The next stage in
this project is to use this information to create animation representations of the video
sequence on mobile devices, for example, mobile phones.
This section is split into two parts.
1. A description of the high level view of the full system.
2. An end-to-end example of the system in use.
The following two chapters will present more detailed information on each of the
distribution components.
8.2 Full System Architecture
Three applications were developed to produce the full working system.
1. An extension of the TIPS platform to perform video stabilization, object
initialization, object tracking, position conversion and TOF file creation.
2. A servlet7 that runs on a web server, uses the TOF file and creates mobile
device specific byte streams.
3. A mobile device application that can connect to a web server, download or stream
the specialized byte streams, and display a stream as an animation to the user.
7 A servlet is a Java based application that runs on a web server. It is a powerful tool for the creation
of client-server applications.
Figure 8-1 gives an outline of how these parts link together. Each modular section is
separated by a light gray box.
The TIPS environment and web server may reside on the same machine. It is
necessary that the web server is accessible via the Internet and that the TIPS
environment saves the TOF files to a location that is accessible by the web server.
8.2.1 Overview of the Life of a Video Sequence
In this section, an introduction to the steps that are performed during an automatic
media creation session will be presented. This should be used as an overview for the
more detailed description that will be presented in later chapters.
Figure 8-1 Full System Architecture
1. When opening the TIPS environment, a background image, a video sequence and
an output location for the TOF file are required. The output location for the TOF
file should be accessible by the Sports Display servlet on the web server. The TIPS
environment will then automatically perform all tasks needed for the creation of a
TOF file.
2. When a user performs a "View Live" operation on the Sports Display Application
running on their mobile device, they are asked for the server's URL. Once entered,
a GPRS connection is created over the Internet to the web server at that URL. A
list of files that are available for download or streaming is returned by the Sports
Display servlet running on the web server.
3. The user decides which file to view and selects the file. This sends a request to the
servlet for that file. The servlet then reads the associated TOF file, creates a
specialized byte array and returns it to the mobile device.
4. The mobile device then displays the animation and saves the file for future
viewing.
9 Sports Display Servlet
9.1 Introduction
In the previous section the high level view of the full system was presented. In this
section a detailed description of the Sports Display Servlet will be given.
The Sports Display Servlet is the connection point between the content creation stage
and the animation display stage. There are two main operations that the Sports
Display Servlet performs.
1. On request, it returns to the Sports Display user a list of the files that are available
for download – this is a pull operation for the mobile device.
2. Given a file request from a Sports Display user, it converts the corresponding TOF
file into a download optimized, device specific byte stream and returns the stream
to the mobile device – this is also a pull operation for the mobile device.
9.2 File Information Requests
Before the Sports Display Application can request a particular file for download, it
must first know what files are available. When the mobile device connects to the
servlet, it appends "?info=files" to the URL. When the servlet sees this request, it
knows that the user wants a list of the files available for download. The servlet keeps
track of the available TOF files in its upload directory; a TOF file does not become
available unless it is in the correct format. The servlet then simply returns a list of the
files that are available for download.
9.3 File Download Requests
When the Sports Display Application makes a download request to the server it
appends several pieces of information to the URL.
1. The name of the file requested for download.
2. The width (resolution) of the viewable screen area. (If not given, defaults to 128)
3. The height (resolution) of the viewable screen area. (If not given, defaults to 75%
of the width)
For example, the URL may look like:
http://www.someaddress.com/servlet/DownloadServlet?file=Example.tof&width=128
The server can then use this information to tailor the response for the specific device.
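The parameter handling with its defaults can be sketched as follows. This is an illustrative Java sketch, not the project's servlet code: the class name is made up, and only the query string parsing is shown, outside of any servlet container.

```java
/** Sketch of the download request parameters, with the defaults described above. */
class DownloadRequest {
    final String file;
    final int width;
    final int height;

    DownloadRequest(String query) {
        String f = null;
        int w = -1, h = -1;
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) continue;
            if (kv[0].equals("file")) f = kv[1];
            else if (kv[0].equals("width")) w = Integer.parseInt(kv[1]);
            else if (kv[0].equals("height")) h = Integer.parseInt(kv[1]);
        }
        file = f;
        width = (w > 0) ? w : 128;             // default width: 128
        height = (h > 0) ? h : width * 3 / 4;  // default height: 75% of width
    }
}
```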
9.4 File Download Response Format
As the cost of downloading an animation to a mobile device is directly proportional
to the size of the file, creating an efficient file format was very important. In addition,
MIDP 1.0 devices (the version of Java that runs on the second generation of mobile
phones) do not support floating point arithmetic. Therefore it was important that little
or no processing would have to be performed on the response to make it viewable.
These two constraints were satisfied in the following way.
The response from the servlet consists of a header, which is 5 + N bytes long, and the
body, where N is the number of players. A single byte is used for each of the
following.
• Number of players
• Presence of the ball
• Number of waypoints
• Number of ticks between each key frame
• The sleep time between each tick
• Each player's team number
Obviously, it would be possible to make the header smaller by bit stuffing, but in this
case clarity is more important than saving several bytes of space.
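Building the header can be sketched as follows. This is an illustrative Java sketch: the report lists the header fields but not their order, so the byte ordering here is an assumption, as are all the names.

```java
/** Sketch of the 5 + N byte response header; the field order is an assumption. */
class TofHeader {
    static byte[] encode(int players, boolean ball, int waypoints,
                         int ticksPerKeyFrame, int sleepPerTick, byte[] teams) {
        byte[] header = new byte[5 + players];
        header[0] = (byte) players;
        header[1] = (byte) (ball ? 1 : 0);
        header[2] = (byte) waypoints;
        header[3] = (byte) ticksPerKeyFrame;
        header[4] = (byte) sleepPerTick;
        for (int i = 0; i < players; i++)
            header[5 + i] = teams[i];   // one team byte per player
        return header;
    }
}
```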
The size of the body of the message, in bytes, is

2 × (number of players (+ 1 if the ball is present)) × number of waypoints

For each tracked object, at every waypoint there is an update of the object's position.
The object's coordinate position is sent to the mobile device as the actual location on
the mobile device's screen at which the object should be displayed. Sending the
mobile device this information means that it does not have to calculate the object's
position on its screen relative to real world positions.
The web server that was used in this project is Jigsaw 2.2.2. Jigsaw is a free, Java
based server that is both easy to install and maintain.
For a more detailed look at the structure of the Sports Display Servlet, refer to the
associated javadocs that are available on the attached CD-ROM.
9.5 The Cost of Download
Using the above technique for animation distribution, a very competitive pricing
system can be developed. Currently, the cost per kilobyte downloaded from the
Internet via GPRS on a mobile phone is 0.02 Euro. [Vodafone04] Although this is
quite expensive per kilobyte, using the above byte format it is possible to show a
relatively long animation economically.
For example, consider a game with 10 players and 1 ball that lasts 10 minutes, with a
key frame refresh rate of two per second. (Which, when the full court is represented
on a screen that is typically 128 pixels wide, is fine.) The calculation is

file size = header + (number of trackable objects × refresh rate × length of clip × 2)
26405 = 5 + (11 × 2 × (60 × 10) × 2)

Rounding 26405 bytes up to 27 kilobytes, the total cost to download is 0.54 Euro.
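The worked example above can be expressed as code. A Java sketch (names are illustrative); it assumes a kilobyte is counted as 1000 bytes, which is what makes 26405 bytes round up to the 27 kilobytes and 0.54 Euro quoted in the text.

```java
/** Sketch of the download size and cost calculation from the worked example. */
class DownloadCost {
    /** File size in bytes: header plus two bytes per object per waypoint. */
    static int fileSizeBytes(int headerBytes, int trackableObjects,
                             int refreshRate, int clipLengthSeconds) {
        return headerBytes + trackableObjects * refreshRate * clipLengthSeconds * 2;
    }

    /** Cost in Euro, rounding up to whole kilobytes (1 kB = 1000 bytes assumed). */
    static double costEuro(int bytes, double euroPerKilobyte) {
        int kilobytes = (bytes + 999) / 1000;
        return kilobytes * euroPerKilobyte;
    }
}
```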
10 Sports Display Mobile Application
The Sports Display Application is the only part of the system that the end user will
see. It allows the user to download new animations, view animations, open previously
saved animations and delete previously saved animations. Figure 10-1 shows a Nokia
7210 displaying an animation. The blue dots represent players.
Since this application is built upon J2ME it will run successfully on all devices that
have J2ME installed, for example, most recent mobile phones. For simplicity, in this
section there will be specific references to the Sports Display Application running on
a Nokia 7210 mobile phone.
10.1 Displaying Animations
The format of the mobile device downloadable byte stream was discussed in the
previous chapter [9.4]. This section will describe how the Sports Display application
uses this byte stream to display an animation.
Figure 10-1 Nokia 7210
The byte stream contains all the information about the video sequence needed to
recreate an animation representation. It tells the Sports Display application the exact
location on its screen to place the objects. When the Sports Display Application
downloads the byte stream it creates an internal object for every movable object in the
animation; these objects are then initialized with all available waypoint information.
In order to decrease the size of the download needed to create an animation, several
optimizations were made. In the TIPS platform, waypoints are created three times
per second (this rate is configurable). If these positions were simply shown on the Sports
Display application, the animations would appear jittery. To solve this problem, the
application breaks the distance from the origin waypoint to the destination waypoint
into sub destinations. By displaying these sub destinations in turn, the movement
between waypoints becomes smooth. Figure 10-2 shows four images describing how
the animation is created between two points. The blue dots represent the waypoints
that the Sports Display application received from the animation server. The red dot
is what is actually displayed.
As time passes the displayed point for the player moves between sub destinations.
This gives the smooth motion of the player between waypoints that can be a
considerable distance apart.
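The smoothing described above can be illustrated with simple linear interpolation between consecutive waypoints. The report does not give the exact scheme used, so the class below is a hypothetical sketch: it divides the segment between two waypoints into a fixed number of evenly spaced sub destinations that the display can step through on each animation tick.

```java
// Hypothetical sketch of waypoint smoothing: linear interpolation of the
// segment between two waypoints into `steps` sub destinations.
public class WaypointSmoother {

    // Returns `steps` points along the line from (x0, y0) to (x1, y1),
    // ending exactly on the destination waypoint.
    static float[][] subDestinations(float x0, float y0,
                                     float x1, float y1, int steps) {
        float[][] points = new float[steps][2];
        for (int i = 1; i <= steps; i++) {
            float t = (float) i / steps;       // fraction of the segment covered
            points[i - 1][0] = x0 + t * (x1 - x0);
            points[i - 1][1] = y0 + t * (y1 - y0);
        }
        return points;
    }

    public static void main(String[] args) {
        // A waypoint jump of 10 pixels split into 5 smooth sub destinations.
        float[][] path = subDestinations(0f, 0f, 10f, 0f, 5);
        for (float[] p : path) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

On each display tick the red dot advances to the next sub destination, so by the time the next waypoint arrives from the server the dot has travelled the full segment smoothly.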
10.2 User Interface Examples
The user interface features several menus and the actual animation view. Figure 10-1
shows the view of an animation being run.

Figure 10-2 Animation Generation

Figure 10-3 shows three examples from the menu driven user interface. The first phone is the main menu that appears when
the application is opened. From here it is possible to download new files, open files,
delete files and read the copyright information. The second phone shows the listing of
saved animations on the phone. By selecting one of these files, the file would be
displayed. The last phone is the 'about' information for the application.
The Sports Display Application is quite sophisticated and cannot be fully described
here due to document size limitations. For a more detailed look at the
structure of the Sports Display Application, refer to the associated documentation that
is available on the attached CD-ROM.
Figure 10-3 Phone Examples - Main Menu - Open File - About
11 Sample Results
11.1 Introduction
This project shows that it is possible to create a fully automated, real time mobile
content creation system. This chapter presents sample results from the tracking that
was performed in the TIPS environment and the resultant animations that were
displayed on a mobile phone.
As this project deals with videos and animations, it is highly advised that the CD-
ROM be used to view the sample results. Also included on the CD-ROM is the application
that will run on any J2ME enabled mobile device. Installing this application will
present the sample results in a more natural manner.
The attached CD-ROM contains videos of the tracking and mobile application
running, source code and source documentation. It also contains the original project
presentation slides and this document in PDF format.
11.2 Sample Tracking Results
Figure 11-1 is an example taken from the visual output of TIPS. It shows an image
taken every two seconds. Red bounding boxes show players being tracked with the
faster non-occluding tracking algorithm, while the blue boxes show players being
tracked with the occlusion tracking algorithm.
Figure 11-1 Sample TIPS results
11.3 Sample Sports Display Results
The “Sports Display” is the application that runs on a mobile device and
displays the animations streamed or downloaded from the sports animation server. This
application allows for the downloading of new animations, saving downloaded clips,
opening previously downloaded clips and managing files that have been saved. Figure
10-1 shows the Sports Display running an animation on a Nokia 7210. Figure 11-2
shows example screen captures taken from an animation displaying the results
created by TIPS from the video clip in Section 11.2. Since it is difficult to display
an animation on paper, it is highly recommended to either use the attached
CD-ROM for the proper display of the results or install the application onto a J2ME
enabled mobile phone and view the demo animations.
The above images are taken from the application running on a Nokia 7210 with 128
by 128 pixel resolution. The application menu has been removed for clarity.
Figure 11-2 Sample Phone Results
12 Conclusions and Analysis
12.1 Conclusions
The original aim of this project was simply to track and draw conclusions on the
locations of the players and ball in a video sequence. This project, as it now stands,
far exceeds the original specification.
Throughout the design of the extensions to the TIPS platform, speed of execution was
always important. The design of the tracking algorithms reflects this design decision.
There are more advanced tracking algorithms available that would have performed
more precise tracking, however the precision of the tracking was not vital in this
application since the length of a full court will be represented on a screen that is
typically 128 pixels wide. The tracking algorithms that were developed suited these
restrictions well.
Personally, working on this project has been a lot of fun. Initially I underestimated the
size and difficulty of this project but perseverance and a lot of hard work has brought
it through to fruition. The end result is an interesting system that is entertaining to
demo and an excellent “proof of concept”.
12.2 Problems Encountered
There were several difficulties that arose during the development of this project.
At first, the conversion of coordinates from the image world to the real world seemed
like a simple problem. However, due to the location of the camera, the court was
distorted in the image, which meant that a simple mapping from the image world to
the real world was impossible. As a result, a new method for converting pixel
coordinates to real world coordinates was developed [Section 7].
Due to the resolution of the video sequences, ball initialization and tracking was very
difficult to achieve consistently. Several of the sample videos on the CD-ROM
show the ball being tracked consistently; however, in general it was almost
impossible. The main reason for this was the size of the ball as it appears in the video
sequence. As the video was filmed at 360x288 resolution, the ball was only a few
pixels wide. To be able to find this consistently is an extremely difficult problem.
Higher resolution video clips would help to solve this problem.
Obtaining footage that was suitable for testing was very difficult. Indoor footage was
typically unreliable for tracking and player initialization due to the artificial light that
'bleached' all colours. The lights also produced a large amount of shadows.
Figure 12-1 shows an image that was originally used for tracking.
In addition to this, obtaining footage of actual teams playing in proper team
uniforms would have been useful. During player initialization it is possible to assign
each player to a team depending on the colour of their jersey; as there was no 'proper'
footage available for this project, this does not take place in the sample videos.
12.3 Improvements
Although this project was a resounding success, there are still several areas that need
further development to make this project into a commercial product.
Figure 12-1 Sample Indoor Image
Many of the problems that were encountered in this project could be solved if a well
placed high resolution camera was available.
Most of the future code development would be spent making the tracking and
initialization of the players and ball more robust. Currently the tracking algorithm does
not deal correctly with shadows cast by the players or with dramatic changes in video
sequence brightness. Therefore a more sophisticated method for tracking and
initializing players, one that still runs in real time, will have to be developed.
Also, with a more sophisticated tracking algorithm and higher resolution video it may
be possible to infer player orientation and to recognize particular players.
In addition, the availability of proper 'team' footage would make demonstrations of the
project a lot more realistic, as the breakup between the different teams would be
noticeable in the Sports Display animation representation.
12.4 Commercial Prospects
As was mentioned in the introduction, mobile phone multimedia creation is a rapidly
expanding market. Since the mobile phone market has become saturated, mobile
phone operators are looking for new ways to increase revenue. The success of ring
tones and background downloads for mobile phones has shown that the public is
willing to spend money on personalizations. Downloadable animations of football,
basketball and other sports may be an appealing avenue for increasing revenue, as the
cost associated with both the creation and distribution of the animations is very low,
which would allow this service to be priced very competitively. This could also be
used as an easy first step
into getting people more comfortable with using the advanced features of their phones
which typically generate more revenue for the mobile phone operators.
With several months of tracking refinement, high resolution “bird's eye view” cameras
and a human operator to oversee the tracking results, this system could become a
viable commercial product.
13 References
[Aaron02] Aaron Bobick, Stephen Intille, Anthony Hui, “Computers watching football”, Last Accessed 26-4-04
http://www-white.media.mit.edu/vismod/demos/football/football.html
[Bodor] Robert Bodor, Bennet Jackson, Nikolaos Papanikolopoulos, “Vision-Based Human Tracking and Activity Recognition”, University of Minnesota, Last Accessed 26-4-04
http://mha.cs.umn.edu/Papers/Vision_Tracking_Recognition.pdf
[Bourke87] Paul Bourke, “Determining if a point lies on the interior of a polygon”, November 1987, Last Accessed 26-4-04
http://astronomy.swin.edu.au/~pbourke/geometry/insidepoly/
[Boyle03] Chris J. Needham and Roger D. Boyle, “Tracking multiple sports players through occlusion, congestion and scale”, The University of Leeds, England, Last Accessed 26-4-04
http://www.comp.leeds.ac.uk/chrisn/research/index.html
[Budde03] Paul Budde, “2004 Telecoms in Europe – UK and Ireland”, December 9th 2003, Last Accessed 26-4-04
http://www.marketresearch.com/product/display.asp?productid=946732
[Choi03] Sunghoon Choi, Yongduek Seo, Hyunwoo Kim, Ki-Sang Hong, “Where are the ball and players?: Soccer Game Analysis with Color-based Tracking and Image Mosaick”, Pohang University of Science and Technology, Republic of Korea, 1997, Last Accessed 26-4-04
http://citeseer.nj.nec.com/370333.html
[Dahyot04] Rozenn Dahyot, Niall Rea, Anil Kokaram, “Sport Video Shot Segmentation and Classification”, Electronic and Electrical Engineering Department, Trinity College Dublin
[Spengler03] Martin Spengler, Bernt Schiele, “Automatic Detection and Tracking of Abandoned Objects”, Proceedings of the 2003 Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2003, pp. 149-156
[Meier99] T. Meier, K. N. Ngan, “Segmentation and Tracking of Moving Objects for Content-based Video Coding”, IEE Proceedings – Vision, Image and Signal Processing, 1999
[Kang03] Jinman Kang, Isaac Cohen, Gerard Medioni, “Soccer Player Tracking across Uncalibrated Camera Streams”, Proceedings of the 2003 Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2003, pp. 172-179
[Kovacic03] Stanislav Kovacic, “Tracking Players in Sports Games”, Last Accessed 13-10-03
http://vision.fe.uni-lj.si/research/trackp/
[Pers00] Janez Pers, Stanislav Kovacic, “Computer Vision System for Tracking Players in Sports Games”, Faculty of Electrical Engineering, University of Ljubljana, Last Accessed 26-4-04
http://vision.fe.uni-lj.si/docs/janezp/iwispa2000-janez.pdf
[Pers01] Janez Pers, Stanislav Kovacic, “Tracking People in Sport: Making Use of Partially Controlled Environment”, University of Ljubljana, Last Accessed 26-4-04
http://avi5.fe.uni-lj.si/docs/janezp/caip2001.pdf
[Pers01b] Janez Pers, Goran Vuckovic, Stanislav Kovacic, Branko Dezman, “A Low-Cost Real-Time Tracker of Live Sport Events”, Faculty of Electrical Engineering, University of Ljubljana, Last Accessed 26-4-04
http://vision.fe.uni-lj.si/docs/janezp/pers-ispa2001.pdf
[Pers01c] Janez Pers, Marta Bon, Stanislav Kovacic, “Errors and Mistakes in Automated Player Tracking”, Proceedings of the Sixth Computer Vision Winter Workshop, February 2001, pp. 25-36
[Sonka99] Milan Sonka, Vaclav Hlavac, Roger Boyle, “Image Processing, Analysis, and Machine Vision”, PWS Publishing, Second Edition, 1999
[Stauffer03] Chris Stauffer, Kinh Tieu, Lily Lee, “Robust Automated Planar Normalization of Tracking Data”, Massachusetts Institute of Technology, VS-PETS 2003
[Tabb03] Ken Tabb, Neil Davey, Rod Adams, Stella George, “The Recognition and Analysis of Animate Objects using Neural Networks and Active Contour Models”, University of Hertfordshire
[Telephony04] Telephony World, “New Report Predicts Exponential Growth in Mobile Multimedia Services”, March 18th 2004, Last Accessed 26-4-04
http://www.telephonyworld.com/cgi-bin/news/viewnews.cgi?category=all&id=1079658444
[Yannone03] Ronald Yannone, “Some of my Experiences with Kalman Filters”, Last Accessed 26-4-04
http://www.polymath-systems.com/intel/hiqsocs/megasoc/noes138/kalman.html
[Vodafone04] Vodafone, “Our Tariffs”, April 2004, Last Accessed 26-4-04
http://www.vodafone.ie/dataservices/gprs/prices/index.jsp
Appendix A – Example TIPS Output File
Below is a section from a TIPS output file. Note that it also contains velocity
information, measured in pixels, for each player and the ball. At initialization each
player and the ball has a velocity of zero.
# File Type
0
# Number of Players
5
# Is the ball present (1 is true)
1
# Time to wait between ticks (in milliseconds)
100
# The width and height of the real court
15 25
# <playerNumber> <teamNumber> <column> <row>
# Update number 0
# Velocity (0.000000,0.000000)
0 -842150451 8.891602 3.637695
# Velocity (0.000000,0.000000)
1 -842150451 4.555664 13.745117
# Velocity (0.000000,0.000000)
2 -842150451 4.438477 16.528320
# Velocity (0.000000,0.000000)
3 -842150451 11.791992 11.352539
# Velocity (0.000000,0.000000)
4 -842150451 6.899414 16.137695
# Velocity (0.000000,0.000000)
6.870117 21.459961
# Update number 1
# Velocity (0.000000,-0.300000)
0 -842150451 8.833008 3.540039
# Velocity (0.550000,-0.400000)
1 -842150451 4.526367 13.500977
# Velocity (-1.150000,0.150000)
2 -842150451 3.881836 16.870117
# Velocity (-0.250000,0.950000)
3 -842150451 11.791992 11.791992
# Velocity (1.850000,-0.200000)
4 -842150451 7.309570 15.795898
6.870117 21.459961
# Update number 2
# Velocity (0.000000,0.200000)
0 -842150451 8.891602 3.637695
# Velocity (0.250000,-1.650000)
1 -842150451 3.295898 12.475586
# Velocity (-1.050000,-0.550000)
2 -842150451 3.676758 17.114258
# Velocity (-0.850000,-0.150000)
3 -842150451 11.791992 12.377930
# Velocity (1.050000,0.500000)
4 -842150451 7.749023 15.893555
# Velocity (0.000000,0.000000)
6.870117 21.459961
# Update number 3
# Velocity (0.000000,0.750000)
0 -842150451 8.891602 3.637695
# Velocity (0.500000,-0.850000)
1 -842150451 2.944336 11.791992
# Velocity (-0.300000,-1.650000)
2 -842150451 3.149414 16.821289
# Velocity (-0.750000,0.900000)
3 -842150451 12.407227 12.866211
# Velocity (2.150000,-1.050000)
4 -842150451 8.100586 15.209961
# Velocity (-0.050000,-0.050000)
6.870117 21.459961
# Update number 4
# Velocity (0.000000,-0.300000)
0 -842150451 8.891602 3.637695
# Velocity (1.050000,-0.450000)
1 -842150451 3.208008 11.401367
# Velocity (0.750000,-0.450000)
2 -842150451 3.149414 16.577148
# Velocity (-0.350000,0.250000)
3 -842150451 12.260742 13.159180
# Velocity (2.250000,-1.000000)
4 -842150451 8.598633 14.282227
# Velocity (0.200000,0.200000)
6.958008 21.459961
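A reader for this format can be sketched as follows. This is an illustrative parser, not the code used by the project: the field meanings are taken from the header comment "# <playerNumber> <teamNumber> <column> <row>", lines beginning with '#' are treated as comments, and two-field lines are assumed to be the ball position.

```java
// Illustrative parser for position lines in the TIPS output format above.
public class TipsLineParser {

    // Returns {column, row} for a player or ball line,
    // or null for comment and unrecognized lines.
    static double[] parsePosition(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;                              // comment line
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length == 4) {                      // player: number, team, column, row
            return new double[] { Double.parseDouble(parts[2]),
                                  Double.parseDouble(parts[3]) };
        }
        if (parts.length == 2) {                      // ball: column and row only
            return new double[] { Double.parseDouble(parts[0]),
                                  Double.parseDouble(parts[1]) };
        }
        return null;
    }

    public static void main(String[] args) {
        double[] p = parsePosition("0 -842150451 8.891602 3.637695");
        System.out.println("player at column " + p[0] + ", row " + p[1]);
    }
}
```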
Appendix B – Video Setup File
Below is a sample configuration file for a particular video clip. It describes the
location of the points, in pixel coordinates, around the visible half court. It also
describes the size, in meters, of the real court.
#
# This file is associated with the file Video_1.00.avi
#
# This is the type of setup file that we have
# 0 is a four point half court system where points are ordered such that
#
# 0 --- 1
# | |
# | |
# | |
# 3 --- 2
#
# 1 is a four point full court system where points are ordered as above...
0
#
# This is the list of image points x0 y0 x1 y1 x2 ... y3
198 89 356 117 284 217 80 150
# This is the list of real world points (full court)
0 0 15 0 15 25 0 25
# This is the polygon with points listed clockwise that is the area of
# interest in our video, movement outside this we don’t have to worry about...
200 65 360 95 200 429 -140 250
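The area-of-interest polygon on the last line can be used to discard movement outside the court, for example with the ray-crossing point-in-polygon test described in [Bourke87]. The sketch below is illustrative and uses that polygon's coordinates; the project's actual implementation may differ.

```java
// Ray-crossing test: a point is inside a polygon if a horizontal ray from
// the point crosses the polygon's edges an odd number of times ([Bourke87]).
public class AreaOfInterest {

    static boolean contains(int[] xs, int[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Count the edge if it straddles the ray and the crossing
            // point lies to the right of the test point.
            boolean crosses = (ys[i] > py) != (ys[j] > py)
                    && px < (double) (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Area-of-interest polygon from the setup file above, listed clockwise.
        int[] xs = { 200, 360, 200, -140 };
        int[] ys = { 65, 95, 429, 250 };
        System.out.println(contains(xs, ys, 250, 150));  // a point on the court
        System.out.println(contains(xs, ys, 0, 0));      // a point well outside
    }
}
```

A tracker can call this test on each detected object position and simply ignore any blob whose centre falls outside the polygon.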