
Synchronizing 3D movements for quantitative comparison and simultaneous visualization of actions

Tobias Sielhorst Tobias Blum Nassir Navab

Technische Universität München, Boltzmannstr. 3, 85748 Garching b. München, Germany
E-mail: sielhors|blum|[email protected]

Abstract

In our poster presentation at ISMAR'04 [11], we proposed the idea of an AR training solution including capture and 3D replays of subtle movements. The crucial part missing for realizing such a training system was an appropriate way of synchronizing trajectories of similar movements with varying speed in order to simultaneously visualize the motion of experts and trainees, and to study trainees' performances quantitatively.

In this paper we review the research from different communities on synchronization problems of similar complexity. We give a detailed description of the two most applicable algorithms. We then present results using our AR based forceps delivery training system and thereby evaluate both methods for the synchronization of experts' and trainees' 3D movements. We also introduce the first concepts of an online synchronization system allowing the trainee to follow the movements of an expert and the experts to annotate 3D trajectories for the initiation of actions such as the display of timely information. A video demonstration provides an overview of the work and a visual idea of what users of the proposed system could observe through their video see-through HMD.

1. Introduction and related work

The presented work can be applied for different tasks of reproduction, synchronization and comparison of movements in AR. The environment of our research is medical education.

1.1 AR Birth Simulator

The “Klinik für Orthopädie und Sportorthopädie r.d. Isar” has developed a birth simulator consisting of a haptic device in a body phantom and software that simulates biomechanical and physiological functions [7]. The position of the 3D model is visualized on a screen and audio output is generated. While still under development, it already represents a delivery simulator that provides multimodal functionality (see figure 1). The long-term goal of the proposed delivery simulator is to offer a device for improved training in order to reduce the number of cesarean sections as well as the number of perinatal deaths.

As a first step we have combined augmented reality vision with the birth simulator. The user can see virtual images of what should happen inside the simulator. Images of the baby's head and the hip bone, which used to be shown on a screen next to the phantom, can now be seen inside the phantom (see figure 2).

The augmented reality system we use is the research system RAMP. It has been developed by Siemens Corporate Research (SCR) for real-time augmentation in medical procedures [10]. It is optimized for accurate augmentation with respect to relative errors between the real and the virtual scene. The system features a high-resolution video see-through HMD and infrared inside-out tracking. Its accuracy, high resolution, high update rate and low lag are currently state of the art.

Figure 1. Delivery simulator: virtual simulator on the screen, phantom head on the robot's arm, female body phantom


1.2 AR + Forceps issue

The first functional addition to the delivery simulator after the combination with the AR system was the visualization of the forceps. The forceps are an instrument used in real deliveries in the critical case of an arrested labor. The baby's head has to be pulled in order to assist the birth and prevent an undersupply of oxygen.

In order to visualize instruments inside the phantom and record the user's movements, the 6DOF pose of the forceps must be tracked. There are different options for this task. For two reasons we decided not to use the same tracking system as the AR system does. First, there is the line-of-sight problem. The targets for the single-camera tracking of the AR system are likely to be occluded, since about eight markers [12] are needed for sufficiently accurate, reliable and robust tracking, and these are needed for each of the two forceps parts. Second, the error function of targets of the single-camera tracking is unequally distributed in space [12]. The accuracy is lowest along the viewing direction. For real-time augmentation this is fine, because less accuracy is needed in the viewing direction for a satisfactory overlay that has a minimal mean error in the 2D image [4]. We intend to record the movements of the instrument and visualize them from potentially any direction. Therefore we have chosen an external tracking system that has a more equally distributed error function and uses a minimal set of markers per target to estimate position and orientation. In practice we use a multiple-camera infrared tracking system by A.R.T. which uses retro-reflective markers like the inside-out tracking of the AR system. This detail made it simple to register one tracking system to the other: both systems can be registered using a common reference target. Thus, we can provide the instrument's position in the coordinate system of the head tracking target that is tracked by both systems (see figure 3).

Figure 2. Augmented view of the delivery simulator

Figure 3. Inside-out and outside-in tracking

In order to transfer the coordinates of the forceps, which are provided by the outside-in tracking system, into the coordinate system of the marker frame, the simple equation

T_Frame2Forceps = T_Ext2Forceps · T_Ext2Frame⁻¹

can be applied, where T is the matrix notation of the translation and rotation in homogeneous coordinates. This is possible because both tracking systems use the same model data for the marker frame, and both systems generate the coordinate system in the same way from the model data. With this combination, the system is highly dynamic and robust. With this setup we can move the cameras of the external tracking system while the system is running, without any need for re-calibration.
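As an illustration, this registration step amounts to a single matrix product. The following minimal sketch (Python with NumPy; the function and variable names are ours, not from the original system) assumes both poses are given as 4x4 homogeneous transforms reported by the external tracking system:

import numpy as np

def frame_to_forceps(T_ext2forceps: np.ndarray, T_ext2frame: np.ndarray) -> np.ndarray:
    """Express the forceps pose in the marker-frame coordinate system.
    Both arguments are 4x4 homogeneous transforms (rotation + translation)
    reported by the external (outside-in) tracking system."""
    # T_Frame2Forceps = T_Ext2Forceps · T_Ext2Frame^-1
    return T_ext2forceps @ np.linalg.inv(T_ext2frame)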

1.3 Omnidirectional viewing

For optimal learning of complex spatial movements and tasks, the desirable procedure is an expert demonstrating actions and giving immediate feedback to the practicing students. In general, the schedule of experienced experts is tight. This makes it difficult to demonstrate the task to each individual student and provide him/her feedback. In addition, it would be desirable to allow the majority to learn from the best international experts in their field. We propose an AR system that learns from the expert by tracking his/her movements while he/she uses a simulator or performs a real (often complicated) task. This information is reproduced for demonstration to students in an enhanced simulator. By comparison of the expert's and students' performances, direct feedback is provided.

Dosis et al. [1] record kinematic and visual data of a surgery for the assessment of surgical skills. They use electromagnetic tracking with 3DOF to acquire the position of the instrument tip. For displaying the position of the instrument they use video capturing. We intend to visualize the expert's movements in the real world. We cannot use a video stream, because we want to allow the student to look from any direction.


Reproduction is, in AR systems, a particularly challenging task if the viewpoint is not fixed.

Cheok et al. [8] capture 3D real-time content for the insertion of dynamic content into mixed reality. They use 14 cameras to capture dynamic content. Views from viewpoints other than the cameras' are generated pixel-wise. For our solution we need additional quantitative data of the movement of the expert or his/her instruments in order to compare it with the movements of the student and to show and measure differences (see next section).

In order to acquire comparable data and visualize the movements in an AR system, we track the object and later visualize its 3D model.

The movements of the expert can be visualized while the students work with the simulator. This allows them to try to imitate the expert, and it provides permanent feedback on whether the action is correct or not.

Since the system acquires quantitative data of the performances of expert and student, it would be interesting to find out how to compare the performances electronically in order to have an objective measure.

1.4 Comparison of trajectories

In order to compare two performances of the same action, we want to

• be able to visually compare two trajectories, either performed by two professionals or by a student and a professional. For this purpose we want to replay two previously performed and recorded motions synchronously, using AR to have an omnidirectional view of both, which helps to identify and study subtle differences.

• get a similarity measure between two previously recorded trajectories to quantitatively measure and automatically judge the performance of a student who tried to reproduce the movement of a professional.

Figure 4(a) shows the movement of a tracked instrument when trying to perform the same motion twice. A straightforward approach to get a similarity measure between both would be to use the Euclidean distance at chosen points in time, which turns out to be inadequate. This can be seen in figure 4(b), which shows the x-movement over time. The motions were performed at different speeds, so this similarity measure would rate them as very dissimilar. Even scaling them to the same length, as done in figure 5, would not lead to a satisfying result, because the speed at which both tasks are performed will most likely change during execution. The same applies to our second objective: when simply starting a replay of both movements at the same time, we would not be able to see subtle differences, since they are shown at different speeds.

Figure 4. (a) shows the same motion performed twice, (b) shows the x-movement of the same trajectories over time

Likewise, scaling will not give us a replay where both motions are shown synchronously.

Another simple approach is to take every point of the first trajectory A, find the geometrically closest point in the second trajectory B, and base the similarity measure on these distances. Again this would not deliver a satisfying result, as the temporal order of the trajectories would be disregarded. To point out the problem with this method, consider measuring the similarity between the same motion performed once forwards and once backwards. This simple similarity measure would regard these two movements as equal. In general, this similarity measure responds strongly as soon as there exist similar parts within the trajectories.

Besides, this does not help us in our task of showing synchronized replays, where the temporal order must be preserved.
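A minimal sketch of this naive closest-point measure (Python/NumPy, our own illustrative code) makes the flaw explicit: since only the set of points enters the measure, reversing one trajectory in time leaves the score unchanged.

import numpy as np

def closest_point_score(A: np.ndarray, B: np.ndarray) -> float:
    """Mean distance from every point of A to its nearest point in B;
    the temporal order is completely ignored."""
    # pairwise distances between all points of A (m, 3) and B (n, 3)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# A trajectory and its time reversal are scored identically, although
# they are opposite movements:
# closest_point_score(A, B) == closest_point_score(A, B[::-1])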

So we need a method that on the one hand can handle speed variation without downgrading the similarity measure, and on the other hand accounts for the temporal order. Furthermore, we need a synchronization of both trajectories such that speed variations between them are removed and similar parts are shown at the same time, while the temporal order is of course preserved.


Figure 5. Performance time of trajectory B has been scaled down to fit A


1.5 Comparison of values

This can be posed as a problem where we have two trajectories A = (a_1, …, a_m) and B = (b_1, …, b_n), a_i being the data we got at a certain time, in our case containing a time stamp, location and rotation of the tracked object. We need a time-invariant similarity measure S(A, B) and a monotone mapping between the points of both,

w = ((i(1), j(1)), …, (i(K), j(K))),

such that one trajectory is synchronized to the other. The functions i and j define the mapping between the elements of the two series. This mapping w can also be seen as a warping function or warping path that is applied to the time axis of one trajectory and synchronizes it to the other.
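As an illustration of the data involved, a recorded sample and the warping path could be represented as follows (a hypothetical Python sketch; field names and units are our assumptions, not prescribed by the system):

from dataclasses import dataclass

@dataclass
class Sample:
    """One tracked pose a_i: time stamp plus pose of the instrument."""
    t: float                                  # time stamp (seconds)
    pos: tuple[float, float, float]           # location
    rot: tuple[float, float, float, float]    # rotation as a quaternion

# A trajectory is the recorded sequence A = (a_1, ..., a_m):
#   A: list[Sample]
# and the warping path w is a monotone list of index pairs (i, j)
# matching points of A to points of B:
#   w: list[tuple[int, int]]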

At this point we refrain from giving a mathematical definition of a time-invariant similarity, since different approaches use different definitions and we do not want to tie ourselves down to one.

1.6 How to match points

In order to quantitatively compare trajectories, we have to match points from one trajectory to another. This problem is similar to registration tasks in which only one dimension has to be registered. It cannot be solved using an approach analogous to rigid or affine registration, as such transformations cannot deal appropriately with the synchronization. Attempts to solve this kind of problem using landmark-based registration [2] suffered from having to determine the landmarks, which is a time-consuming and error-prone task. Some applications only need a time-invariant similarity measure and no mapping. Li et al. [5] suggest an algorithm for gesture recognition that is based on a similarity measure using SVD; however, it does not return a mapping between both trajectories. More promising for our requirements are non-rigid registration techniques, which have the drawback of being very slow. Other applications that require registration in only one dimension and can exploit constraints use methods that are similar to non-rigid registration but take less time to compute. These are Dynamic Time Warping (DTW), which is well known in speech analysis [9] and has been used in statistics [14] and signature verification [6], and the Longest Common Subsequence (LCSS), which has been used for similarity measures between mobile object trajectories [3]. These two are the appropriate ones for our application. The next two sections provide detailed descriptions of these methods.

1.6.1 Longest Common Subsequence (LCSS)

A subsequence S of A is a sequence of the form (a_{n_r}), r ∈ ℕ, where the n_r are strictly increasing. More intuitively, you can obtain the subsequence S by dropping some points of A. The LCSS is better known for obtaining a similarity measure between two strings by computing the longest common subsequence. The version for two trajectories shares the same idea and defines similarity as a high number of points that are common to both trajectories and occur in the same temporal order. When computing common subsequences for strings, we simply look for characters that are elements of both strings. With 3D points it is very unlikely that we find points that are included in both movements. For this reason we regard two points a_i and b_j as equivalent if their distance d(a_i, b_j) is below some chosen threshold ε. Let A_i = (a_1, …, a_i) and B_j = (b_1, …, b_j).

DEFINITION 1. Given a distance function d(x, y), an integer δ and a real number ε, LCSS_δ,ε(A, B) is defined as:

LCSS_δ,ε(A, B) =
    0,  if A or B is empty
    1 + LCSS_δ,ε(A_{m−1}, B_{n−1}),  if d(a_m, b_n) < ε and |m − n| ≤ δ
    max(LCSS_δ,ε(A_{m−1}, B), LCSS_δ,ε(A, B_{n−1})),  otherwise

where δ defines a matching window that limits how far in time we search for matching points. The output is the length of the longest common subsequence, i.e. the number of matchings that are possible.

The recursion can be explained as follows. The first case is the termination criterion. The second case can be interpreted as: if the last points of both trajectories, a_m and b_n, are closer than the matching threshold ε, we correlate these points, memorize the matching by increasing the result by one, and continue by computing the longest common subsequence of the rest of both trajectories. Otherwise we have to leave either a_m or b_n unmatched, depending on what gives us the higher result.


Figure 6. (a) shows two trajectories and the mapping between them obtained from LCSS, (b) shows the same trajectories and one synchronized to the other (the dotted trajectory is synchronized to the solid one; the synchronized one is dashed), (c) shows a warping path that synchronizes B to A

    T  R  A  I  N  I  N  G
R   0  1  1  1  1  1  1  1
I   0  1  1  2  2  2  2  2
N   0  1  1  2  3  3  3  3
G   0  1  1  2  3  3  3  4
I   0  1  1  2  3  4  4  4
N   0  1  1  2  3  4  5  5
G   0  1  1  2  3  4  5  6

Figure 7. Matrix filled up when computing the LCSS of two strings

This recursive definition is top-down, starting the computation on both complete trajectories and shortening them in each step. The problem can also be solved using a bottom-up dynamic programming approach with a computational complexity of O(δ(n+m)). The corresponding algorithm fills an n-by-m matrix row-wise or column-wise, where in each step (i, j) the LCSS(A_i, B_j) is computed, based on the results of LCSS(A_{i−1}, B_{j−1}), LCSS(A_i, B_{j−1}) and LCSS(A_{i−1}, B_j). For simplicity, the example in figure 7 shows the matrix used to compute the LCSS of two strings. The bold numbers show characters that are common to both strings. Each field (i, j) of the matrix contains LCSS(A_i, B_j). So LCSS('TRAINING', 'RING') would be 4 and LCSS('TRAIN', 'RING') is 3. If only a distance measure and no warping path is required, only the values from the previous and the current step must be kept in memory.

Since the value LCSS computes depends on the length of both trajectories, we need to normalize the output. We define the similarity function derived from LCSS as follows:

DEFINITION 2. The similarity function S1_δ,ε based on the LCSS between two trajectories A and B is defined as:

S1_δ,ε(A, B) = LCSS_δ,ε(A, B) / min(m, n)

After computing the LCSS, a mapping can be obtained that connects all points that are included in the longest common subsequence both trajectories share. To get this mapping, the matrix created during the computation must be backtracked, starting at the lower-right field. If the upper or the left neighbor contains the same value as the current field, you move to that neighbor. Otherwise you store this point as common and move to the upper-left neighbor. This is done until you reach the upper-left field of the matrix. In figure 7 this would give you the common subsequence 'RINING'; the corresponding algorithm for trajectories gives you the mapping between points.
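Continuing the sketch above, the backtracking that recovers the point correspondences from the matrix returned by lcss_table could look like this (again our own illustrative code):

import numpy as np

def lcss_mapping(L: np.ndarray) -> list[tuple[int, int]]:
    """Backtrack the matrix from lcss_table to recover the matched pairs."""
    w: list[tuple[int, int]] = []
    i, j = L.shape[0] - 1, L.shape[1] - 1   # start at the lower-right field
    while i > 0 and j > 0:
        if L[i - 1, j] == L[i, j]:          # same value above: move up
            i -= 1
        elif L[i, j - 1] == L[i, j]:        # same value to the left: move left
            j -= 1
        else:                               # a_i and b_j belong to the LCSS
            w.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    w.reverse()                             # restore temporal order
    return w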


Figure 8. Problems of LCSS when two trajectories have different update rates


Figure 6(a) shows two trajectories that were recorded by an optical tracking system while trying to perform the same movement twice. The lines connecting both trajectories represent the mapping we obtained from LCSS. Figure 6(b) shows the same trajectories with one synchronized to the other. Although only the movement in the x-direction is shown in this figure, the matching has been computed using the 3-dimensional distance, not including the rotation of the object. The warping path obtained from the point correspondences is shown in figure 6(c).

Since LCSS is based on points that are similar in both trajectories, we only get a mapping that includes similar points. Parts of the trajectories that are too distant are skipped. As we want to see the differences between both movements, we cannot just skip these parts in our replay. This problem can be overcome by interpolating between points that have been mapped.
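A possible interpolation scheme (our own sketch, assuming the index pairs w returned by the LCSS backtracking above) resamples B onto the time axis of A:

import numpy as np

def synchronize(B: np.ndarray, w: list[tuple[int, int]], m: int) -> np.ndarray:
    """Resample trajectory B (shape (n, 3)) onto the time axis of A
    (m samples) by linearly interpolating the LCSS correspondences w."""
    i_idx = np.array([i for i, _ in w], dtype=float)   # indices into A
    j_idx = np.array([j for _, j in w], dtype=float)   # matched indices into B
    # for every sample index of A, estimate the (fractional) index of B;
    # indices outside the matched range are clamped to the end points
    j_of_i = np.interp(np.arange(m), i_idx, j_idx)
    # sample B's coordinates at these fractional indices, per dimension
    return np.stack([np.interp(j_of_i, np.arange(len(B)), B[:, k])
                     for k in range(B.shape[1])], axis=1)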

As long as both trajectories have been recorded with similar update rates, LCSS performs well and provides us with a meaningful similarity measure as well as a quite accurate and very smooth synchronized replay. Due to the normalization applied to the similarity measure, different update rates will not affect this measure. But the synchronized trajectory then often tends to run ahead.

This happens because for LCSS it does not matter which points are mapped to each other, as long as the maximum possible number of points is matched. An example can be seen in figure 8, where trajectory A has a higher update rate and the earlier points are assigned to every point of trajectory B that is within range ε. As the LCSS by definition does not care how these points are matched, it is not possible to find a matching that is more reasonable for our case using only the similarity definition of LCSS.

Figure 9. Illustration of the matching of LCSS (left) and DTW (right)

1.6.2 Dynamic Time Warping (DTW)

In contrast to LCSS, DTW has to match every point with at least one point of the other trajectory; in particular, the first points of both trajectories must be matched to each other, just as the last points. The differences between both are illustrated in figure 9. All distances between points that are matched to each other are summed up. DTW computes the matching that has the lowest summed-up distance with respect to a given distance function d(x, y). This can be defined recursively as follows.

DEFINITION 3. Given a distance function d(x, y), DTW(A, B) is defined as:

DTW(A, B) = d(a_m, b_n) + min(DTW(A_{m−1}, B_{n−1}), DTW(A_{m−1}, B), DTW(A, B_{n−1}))

In each recursion step the last points of both trajectories have to be matched to each other. Either both points or only one of them is left out in the next step, depending on what produces the lowest result. DTW can also be computed using dynamic programming in a way very similar to LCSS; it also takes O(δ(n+m)) time when using a matching window of δ, plus another step with complexity O(n) afterwards to get the warping path w between A and B. Just as with LCSS, a matrix is filled up with the result of DTW(A_i, B_j) in field (i, j), which is acquired by computing d(a_i, b_j) and adding the minimum of the left, upper and upper-left fields. To get the matchings you start at the lower-right field. In each step you store the points a_i, b_j as corresponding and proceed with field (i−1, j), field (i, j−1) or field (i−1, j−1), whichever is smallest.
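A compact dynamic-programming sketch of DTW with backtracking (our own illustrative Python, not the authors' implementation) could look as follows; the optional max_dist cap corresponds to the robust measure discussed below:

import numpy as np

def dtw(A: np.ndarray, B: np.ndarray, max_dist: float = np.inf):
    """DTW between 3D point sequences A (m, 3) and B (n, 3).
    Returns the accumulated cost matrix D and the warping path w."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # capping the point distance keeps outliers from dominating
            cost = min(np.linalg.norm(A[i - 1] - B[j - 1]), max_dist)
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # backtrack from the lower-right field to collect the matchings
    w, i, j = [], m, n
    while i > 0 and j > 0:
        w.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    w.reverse()
    return D, w

# Normalized similarity from Definition 4:
# D, w = dtw(A, B); s2 = D[-1, -1] / len(w)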

Figure 10 shows the matchings we got, one trajectory synchronized to the other, and the warping function.

Since all points have to be matched, outliers could have too strong an impact on how points are matched. To deal with this, we used a robust measure that caps the maximum distance between two points.

Because the value DTW delivers depends on the number of correspondences, we have to define a normalized similarity function:


Figure 10. (a) shows the mapping computed with DTW, (b) shows both trajectories and one synchronized to the other (B dotted, A solid, synchronized trajectory dashed), (c) shows a warping path that synchronizes B to A

DEFINITION 4. The similarity function S2 based on the DTW between two trajectories A and B is defined as:

S2(A, B) = DTW(A, B) / |w|

where |w| is the number of matchings, which is slightly higher than max(m, n).

As with LCSS, we get problems when one trajectory has been recorded with a higher update rate. Since every point has to be matched at least once, we would have to correlate points from the shorter trajectory with multiple points from the longer one. This would lead to a jerky replay of the synchronized movement, but it can easily be dealt with: when one point has multiple corresponding points, we only take one of them into account and drop the others. When both trajectories have about the same size, points that are matched to several points are rare, so this does not cost us too many points. In general we can keep nearly min(m, n) pairs of points.
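This thinning step is simple to implement; a sketch (our own code, operating on the warping path w from the DTW sketch above):

def thin_matchings(w: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Drop correspondences that reuse a point, so the replay does not
    get stuck repeating a single pose of the shorter trajectory."""
    thinned: list[tuple[int, int]] = []
    last_i = last_j = -1
    for i, j in w:
        if i != last_i and j != last_j:   # keep pairs where both indices advance
            thinned.append((i, j))
            last_i, last_j = i, j
    return thinned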

2 Online synchronization

DTW can be computed while one of the two trajectories is being recorded; however, it will not give you the similarity measure until the algorithm has ended, and the warping path is acquired in a subsequent step. In our application it could be very useful to get a mapping that synchronizes a trajectory that is currently being recorded to a reference trajectory. This would allow us to show a reference motion while a student tries to imitate it; by synchronizing this reference motion to the student's motion, we could adapt the speed of the reference motion to the student's performance.

Another use would be to automatically execute defined actions at certain points of a workflow. We could, for example, turn augmentations on or off, or show certain information only for a period of time. This could be done by recording a reference workflow and assigning an action to a certain point in time on the reference trajectory. By synchronizing the actually performed motion, we could estimate when to carry out the action.

Within our video demonstration (see section 3) we show an exemplary illustration of this new concept. The user associates the action with a particular point of the reference 3D trajectory. The action is performed when the trainee's synchronized motion reaches the appropriate position within his/her online trajectory. To achieve such an online synchronization, we considered using the intermediate data of DTW to get a preliminary result. Figure 11 shows the matrix that is generated when computing the DTW. On the z-axis the result of DTW(A_x, B_y) is drawn. The path shown in the figure is the warping path that is required to synchronize both trajectories. It is obtained by backtracking the path from the upper-right corner to the origin.


Figure 11. Matrix with intermediate data from DTW

Figure 12. Warping path derived from the intermediate data DTW delivered (solid was computed online, dashed offline)

When trying to get a mapping while one trajectory is not yet finished, we cannot use backtracking, for the reason that we would not know at which field of the matrix to start. But as we can see in figure 11, the warping path tends to take a course close to the minimum of every row. Figure 12 shows, for another DTW matrix, the warping path obtained by backtracking and the path obtained by selecting the minimum of DTW(A_i, B_1), …, DTW(A_i, B_n) in each step i. This mapping, computed while recording one trajectory, will diverge somewhat from the one computed afterwards, especially when there are flat areas as in figure 12. But in general it delivers a synchronization that does not vary too much from the offline computed one.
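A sketch of this online variant (our own illustrative code): the DTW matrix is extended row by row as samples arrive, and the minimum of the current row serves as the preliminary synchronization estimate.

import numpy as np

def online_row(prev_row: np.ndarray, a_i: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Extend the DTW cost matrix by one row as a new sample a_i arrives."""
    n = len(B)
    row = np.empty(n + 1)
    row[0] = np.inf                      # column 0 is infinite for i >= 1
    for j in range(1, n + 1):
        cost = np.linalg.norm(a_i - B[j - 1])
        row[j] = cost + min(prev_row[j - 1], prev_row[j], row[j - 1])
    return row

# Usage: start with row = np.r_[0.0, np.full(len(B), np.inf)]; for each
# incoming sample do row = online_row(row, a_i, B). The preliminary
# estimate of the matched reference index is then row[1:].argmin(),
# i.e. the minimum of the current row (cf. figure 12).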

Figure 13 shows an online warping. In order to show comparable results in this figure, we did not synchronize a movement that was recorded during computation. Instead, we used the same two trajectories as for the offline synchronization.

Figure 13. (a) shows the online computed mapping between A and B, (b) shows the online synchronization, (c) shows a warping path


Figure 14. Replay of two forceps that have been synchronized

Although our first results using online synchronization are promising, it is not assured to work in every case, and it will surely be less reliable than offline synchronization. Further testing and presumably some modifications will be necessary.

3 Results

For a better understanding of our results we recommend having a look at the video on our website.1 We discovered that finding a time-invariant similarity measure between two 3D trajectories, and a synchronization of both in time, cannot be achieved using simple approaches like the one shown in figure 5. However, there are problems of a similar nature, such as speech recognition and signature verification, that can be solved using Dynamic Time Warping or the Longest Common Subsequence. We implemented both methods and evaluated them for our application. LCSS is able to provide a similarity measure and a mapping that synchronizes one trajectory to the other, as can be seen in figure 6. Problems arise when both movements have been recorded using different update rates (see figure 8). DTW is also capable of providing a similarity measure and a synchronization, as in figure 10; its problems with different update rates can be overcome by a minor change. Since we implemented it in the delivery simulator, we could test it in an application, where it has been shown to provide an appropriate synchronization. Figure 14 shows two forceps that have been synchronized using DTW. When performing the same action twice, one can see that the user follows a slightly different trajectory.

Real-time dynamic synchronization of a user's action with a previously recorded one is more challenging. We made a first step towards solving the problem: we computed the warping based on the intermediate data of DTW, which is shown in figure 12.

1 http://campar.in.tum.de/pub/Sielhorst2005Synchronizing3D/Sielhorst2005Synchronizing3D.video.avi

The results are not expected to be as good as with offline synchronization (see figure 13 and compare with figure 10), but the errors have been within a low range. These results are promising, but further work has to be done before this can be applied. The offline synchronization and comparison of movements is ready to be tested further in applications.

3.1 Possible development

In our implementation we only took into account the position of the tracked object and disregarded its orientation. The effect of including rotation, or the motion of multiple points on one object, should be examined.

Also, we used only a basic implementation to test the general ability of DTW to solve our problems. In statistics and speech analysis, DTW is widely known, and several modifications and extensions have been proposed. For instance, [13] and others proposed different cost functions that could improve the result and should be considered. Another common task in statistics is to build average curves based on a large number of sample curves, which can be solved by a method using DTW [14]. One could think of building an average trajectory and analyzing which parts of a workflow are common to all samples and where variations can be found. Further testing of online warping will be necessary before an application can be built on it.

4 Discussion/Conclusion

In this paper, we present systems and methods which allow us to synchronize and compare sequences of captured 3D movements. The method is applied to an AR system designed for the training of physicians and midwives. The DTW gives us a similarity measure that could be used to automatically rate the performance of a trainee trying to replicate a movement. Using only the position of a single point of the forceps is obviously not enough for a real application, but it helps in understanding the problem of matching object motions with varying velocities. Since the algorithm uses an arbitrary distance measure, we can easily involve other aspects of the action, such as the orientation of the object, velocity, or even biomechanical data (e.g. measured force) from the delivery simulator. Defining a meaningful distance measure is a non-trivial task. For instance, when involving orientation, the question arises how to represent the physical state of the object and how to define the distance such that it makes sense within our application. In the case of forceps, we believe that the use of multiple points, a minimum of three non-collinear ones, makes more sense than using position and orientation. In addition, the motions of various parts of the tool have different effects on the overall result of the action. Therefore, we aim at defining the distance as a weighted sum of distances of corresponding points.


For example, parts of the tool which move inside the patient will be weighted higher than the outside parts. This is one of the subjects of our current research and development. The synchronization process is essential because it not only provides an initial estimate of the users' performance but also enables us to measure the user's performance based on other, often more important, parameters. Force is one of the crucial parameters to take into account; speed, acceleration and angular velocity could also be considered when providing a measure of similarity/performance. Advanced visualization and HCI design could allow us to provide intuitive user interfaces to visualize these parameters and provide a detailed measure of success. This is another subject of our current research and development.
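A minimal sketch of such a weighted distance (our own illustrative Python; the point-based tool representation and the weight vector are assumptions, not the implemented system):

import numpy as np

def weighted_distance(p: np.ndarray, q: np.ndarray, weights: np.ndarray) -> float:
    """Distance between two tool poses, each given as k tracked points
    (shape (k, 3)); points that move inside the patient get larger weights."""
    return float(np.sum(weights * np.linalg.norm(p - q, axis=1)))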

In fact, even a system with a well-designed similarity measure cannot currently replace the supervisory comments of a professional. The whole action is very complex and has parts of different importance. An experienced supervisor knows the crucial parts and can include this knowledge in his/her judgment of a trainee's performance. This additional high-level knowledge is not yet included in our system. A possible approach for finding the importance of parts is to acquire many sequences performed by different experts and use the matching algorithm proposed in this paper to find statistically significant common parts of the action, which in general are the most crucial parts. Alternatively, the expert could annotate a reference sequence to label crucial parts. The automatic matching proposed here would then allow us to transfer this knowledge to trainees.

Nevertheless, the warping path that can be obtained when computing the similarity measure is very valuable, as it gives us the possibility to visualize both movements simultaneously and look at the differences between two actions. Since we use augmented reality to replay the synchronized movements, we are able to examine them from different viewpoints and analyze them in a much more tangible way than video recordings would offer. Note that the performance velocity is not lost: the warping path (fig. 10(c)) contains the relative velocity of both motions. It can be used for visualization as mentioned in the previous paragraphs. Our implementation also allows changing the speed at which the two trajectories are shown, stopping them, or rewinding the replay, through which a very detailed analysis of the trajectories is possible. In addition, an online synchronization would also be useful, as one could provide timely information or execute actions at predefined points of a workflow.

Since this is an Augmented Reality application, the best way to appreciate the current achievement is to visualize the results through the HMD. Our video demonstration presents some of the images taken by the HMD camera and provides some impression of what the final users observe.

Acknowledgments: We would like to thank Frank Sauer, Ali Khamene, and Sebastian Vogt from Siemens Corporate Research in Princeton, USA for the courtesy of providing us with the AR system RAMP. We would like to thank A.R.T. GmbH, Herrsching, Germany for the courtesy of their multiple-camera tracking system. Furthermore, we thank Rainer Burgkart and Tobias Obst, Klinikum Rechts der Isar, Munich, Germany for the courtesy of their delivery simulator.

References

[1] A. Dosis, F. Bello, K. Moorthy, Y. Munz, D. Gillies, and A. Darzi. Real-time synchronization of kinematic and video data for the comprehensive assessment of surgical skills. In Medicine Meets Virtual Reality (MMVR), 2004.

[2] T. Gasser, A. Kneip, A. Binding, A. Prader, and L. Molinari. The dynamics of linear growth in distance, velocity and acceleration. Annals of Human Biology, 18(3):187–205, 1991.

[3] D. Gunopoulos. Discovering similar multidimensional trajectories. In ICDE '02: Proceedings of the 18th International Conference on Data Engineering (ICDE'02), page 673. IEEE Computer Society, 2002.

[4] W. A. Hoff and T. L. Vincent. Analysis of head pose accuracy in augmented reality. IEEE Trans. Visualization and Computer Graphics, 6, 2000.

[5] C. Li, P. Zhai, S.-Q. Zheng, and B. Prabhakaran. Segmentation and recognition of multi-attribute motion sequences. In MULTIMEDIA '04: Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 836–843. ACM Press, 2004.

[6] M. E. Munich and P. Perona. Camera-based ID verification by signature tracking. In ECCV '98: Proceedings of the 5th European Conference on Computer Vision, Volume I, pages 782–796. Springer-Verlag, 1998.

[7] T. Obst, R. Burgkart, E. Ruckhaberle, and R. Riener. The delivery simulator: A new application of medical VR. In Proceedings of Medicine Meets Virtual Reality 12, 2004.

[8] S. Prince, A. D. Cheok, F. Farbiz, T. Williamson, N. Johnson, M. Billinghurst, and H. Kato. 3D Live: Real time captured content for mixed reality. In Proc. IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), 2002.

[9] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process., 26(1):43–49, 1978.

[10] F. Sauer, A. Khamene, and S. Vogt. An augmented reality navigation system with a single-camera tracker: System design and needle biopsy phantom trial. In Proc. Int'l Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), 2002.

[11] T. Sielhorst, J. Traub, and N. Navab. The AR apprenticeship: Replication and omnidirectional viewing of subtle movements. In Proc. IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), 2004.

[12] S. Vogt, A. Khamene, F. Sauer, and H. Niemann. Single camera tracking of marker clusters: Multiparameter cluster optimization and experimental verification. In IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 127–136, 2002.

[13] K. Wang and T. Gasser. Alignment of curves by dynamic time warping. Annals of Statistics, 25(3):1251–1276, 1997.

[14] K. Wang and T. Gasser. Synchronizing sample curves nonparametrically. Annals of Statistics, 27(2):439–460, 1999.
