


Frame-rate conversion detection based on periodicity of motion artifact

    Dae-Jin Jung1

    & Heung-Kyu Lee2

Received: 23 June 2016 / Revised: 23 December 2016 / Accepted: 16 February 2017
© Springer Science+Business Media New York 2017

Abstract With the advances in digital video technology, it is becoming easier to forge a digital video without introducing any artificial visual trace. The temporal domain of digital videos is one of the main targets of video tampering, and video frame-rate conversion is one of the common operations for temporal video tampering such as temporal splicing and video speed adjustment. This operation necessarily accommodates temporal interpolation, which introduces a periodic motion artifact on the motion trajectories. In this paper, a frame-rate converted video detection method is proposed based on this motion artifact. The experimental results demonstrate the performance of the proposed method through extensive experiments on 1300 original videos and 18,000 frame-rate converted videos in uncompressed and H.264/AVC formats. In particular, for the nearest neighbor and motion-based interpolation, the proposed method could detect more than 93.35% of the frame-rate up-converted videos while exhibiting a 0.01 false positive rate.

Keywords Digital forensics · Video forensics · Frame-rate conversion · Motion artifact

    1 Introduction

With highly sophisticated IT technology, various multimedia tools that improve the quality of our life have been developed. Among them, digital video cameras and video editing software are growing at an exponential rate. Consequently, high-quality digital videos can be found throughout our daily life. The greatest advantage of digital video, which provides a high-quality, immersive viewing environment to viewers compared to digital images, made the

Multimed Tools Appl, DOI 10.1007/s11042-017-4519-y

* Heung-Kyu Lee
[email protected]

Dae-Jin Jung
[email protected]

1 Agency for Defense Development, P.O. Box 35, Yusong-Gu, Daejeon, South Korea 34186
2 School of Computing, KAIST, 291 Daehak-Ro, Yusong-Gu, Daejeon, South Korea 34141


digital videos come into mainstream use in various fields. However, the increased use of digital videos has also resulted in many misuses. The most common misuse is to forge the frames in a video to fabricate the recorded scene. Highly experienced users equipped with sophisticated editing software that can doctor digital videos produce fabricated videos that look realistic. Furthermore, it is becoming easier even for ordinary users to edit video footage without introducing any artificial visual trace. Forged digital videos could have tremendous effects on various fields such as politics, the economy, law enforcement, and so on. One of the favorable solutions to this problem is multimedia digital forensics, which provides forensic information on how multimedia data was acquired and processed without any side information [12, 14].

The last few decades have seen considerable development in the area of digital image forensics [12, 14]. More recently, digital video forensics has become a major interest within multimedia digital forensics [20]; still, more attention needs to be paid to advancing techniques that detect various video tampering attacks. Chen et al. proposed a method to identify individual source digital camcorders by analyzing photo-response non-uniformity (PRNU), which has proven to be the most robust digital fingerprint of the imaging sensor [8]. They tried to mitigate the periodic artifact introduced by the lossy video compression process from the estimated PRNU. Wang and Farid proposed a method to detect doubly-compressed MPEG videos by analyzing the double quantization artifact, which is also used in double JPEG compression detection [18, 22], and the temporal statistical perturbations that are introduced by the addition/deletion of frames. A further study on forensic analysis of frame addition/deletion was undertaken by Stamm et al. [24]. They developed a theoretical model to design new forensic and anti-forensic techniques, and also proposed a set of methods to evaluate the performance of the new techniques.

Video frame-rate conversion is one of the common temporal operations in video tampering. When multiple original video clips are used to create a forged video, each original clip is often acquired at a different frame-rate. Therefore, the frame-rates of the original clips need to be unified to create a combined video clip. Frame-rate conversion can also be used to attack video watermarking systems by desynchronizing the temporal information [21]. Furthermore, the frame-rate conversion technique can be used to change the video playback speed while the frame-rate is preserved.

Frame-rate conversion necessarily involves temporal interpolation to create frames that fit the new temporal lattice. Conventional (nearest neighbor and bilinear) temporal interpolation methods have commonly been used for frame-rate conversion in commercial and free software; recently, more advanced frame-rate conversion techniques that utilize motion-based frame interpolation [1, 6, 9, 10] have also been studied and adopted in commercial and free software [3]. These three types of temporal interpolation methods perform linear interpolation operations on a per-frame or per-block basis. The linear interpolation operations result in a periodic artifact along the temporal axis.

In this work, we focus on the periodic motion artifact to detect frame-rate converted videos. The frame-rate conversion process introduces a traceable periodic artifact on the motion trajectories, which is caused by the temporal interpolation. We present an analysis of this periodic artifact and exploit it to build the proposed frame-rate conversion detection method. Extensive experiments were conducted on 1300 original videos and 18,000 frame-rate converted videos, and the results demonstrate the superiority of the proposed method. The average detection accuracy reached 93.35% on frame-rate converted videos in uncompressed and H.264 formats when the nearest neighbor and motion-based interpolation methods were used for frame-rate up-conversion.


Furthermore, the proposed method outperformed the other detection methods even when the bilinear interpolation method was used. The most outstanding point is that the proposed model can describe frame-rate down-conversion consisting of frame dropping. The remainder of this paper is organized as follows. Section 2 discusses the related work in frame-rate conversion detection. Section 3 describes the analysis of the periodicity caused by frame-rate conversion. In Section 4 the details of the proposed method based on the periodicity analysis are presented. Experimental results and the conclusion are presented in Sections 5 and 6, respectively.

    2 Related work

In spite of the recent tremendous focus on video forensics, only a few works have investigated the detection of frame-rate conversion [3, 4, 26, 27]. Wang and Farid proposed two different methods to detect frame-rate conversion [27]. Their first method identifies frame-rate converted interlaced video sequences by analyzing the motion ratio between inter-field motion and inter-frame motion. Under the assumption of constant motion over at least three sequential fields, the average value of the motion ratio between inter-field motion and inter-frame motion is almost 1 on an original video. On the other hand, if a given interlaced video is frame-rate converted, the average value would not be 1. However, this method can only be applied to interlaced video. In their second method, the expectation/maximization (EM) algorithm was employed to detect frame-rate up-conversion. Since only nearest neighbor and bilinear temporal interpolation methods were considered, the relationship between frames was modeled as a simple weighted summation of two temporal neighbors, and the weight factors were estimated using the EM algorithm. This method can be applied to both progressive and interlaced videos. However, it is not suitable for frame-rate down-conversion, which is done by frame dropping, because dropped frames cannot be distinguished from an original video by their relationship model.

Bian et al. proposed a scheme that targets the detection of frame-rate up-conversion only when the nearest neighbor interpolation method is used [4]. They exploited the periodic inter-frame similarity that is inevitably introduced by frame duplication. To measure the similarities between adjacent frames, the structural similarity index measure (SSIM) [26] was employed. They quantized the SSIM values to lower the false positive detection ratio before evaluating the periodicity. However, their method is dedicated to frame-rate up-conversion using the nearest neighbor interpolation method.

Bestagini et al. proposed a frame-rate conversion detection scheme that is designed for the motion-based interpolation method [3]. Motion-based interpolation involves block-based interpolation. They focused on the periodicity of the motion-based pixel value errors along the motion trajectory. Estimated motion vectors between two consecutive frames were used to calculate the motion-based errors, and the errors were analyzed in the frequency domain. If a peak exists in the frequency domain, the input video is determined to be a frame-rate converted video. However, when video compression is considered, their method is not robust to quality degradation. Furthermore, in their experiments, only three sequences of uncompressed images were used as the test set.

The studies listed above target only one or two temporal interpolation methods used for frame-rate conversion. Moreover, most of them do not consider frame-rate down-conversion. To detect frame-rate conversion, a new scheme that can be applied to all three types of temporal interpolation methods and to both frame-rate up- and down-conversion is needed. Therefore, we propose an approach explicitly devoted to the detection of frame-rate conversion.


3 Analysis on periodicity of frame-rate conversion

In this section, we analyze the periodicity introduced by frame-rate conversion. To accommodate the nearest neighbor, bilinear, and motion-based temporal interpolation methods, we built a simplified model. Let an original video sequence be X, whose frame-rate is fps_org, and let its frames be denoted as X(t), where t = 1, ..., T_X. The frame-rate converted video sequence is denoted as Y, whose frame-rate is fps_frc. The frame length T_Y of Y is determined by the temporal resampling factor ω = fps_org/fps_frc. The video sequence is up-sampled when ω < 1 and down-sampled when ω > 1.
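For instance, using frame-rates that appear in the experiments of Section 5 (a worked example added here for illustration):

$$ \omega_{15 \to 30} = \frac{15}{30} = 0.5 < 1 \;\;(\text{up-sampling}), \qquad \omega_{30 \to 20} = \frac{30}{20} = 1.5 > 1 \;\;(\text{down-sampling}). $$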

We suppose a temporal interpolation function h(x), x ∈ ℝ, whose elements sum to one. Then the temporal interpolation model is described as below:

$$ Y^{B}_{pos_{frc}}(x) = \sum_{k=-\infty}^{\infty} X^{B}_{pos_{org}}(k)\, h\!\left(\frac{x}{\Delta} - k\right) \qquad (1) $$

where Y^B, X^B, Δ, pos_frc, and pos_org denote the block of the interpolated frame, the block of the original frame, the sampling step 1/ω, and the spatial indexes in the interpolated frame and the original frame, respectively. Furthermore, ω (= 1/Δ) determines the cycle of the change of the interpolation function. For the conventional temporal interpolation methods, pos_frc = pos_org. The nearest neighbor temporal interpolation method requires only one non-zero element of h(x); that is, only a single original frame is required for the temporal interpolation. On the other hand, the bilinear and motion-based temporal interpolation methods use at most two non-zero elements of h(x). The motion compensation of the motion-based frame-rate conversion is omitted for simplicity.
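For concreteness, standard choices of kernels consistent with this description (stated here as an illustration, not taken from the paper) are

$$ h_{nn}(x) = \begin{cases} 1, & |x| \le \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases} \qquad\qquad h_{bl}(x) = \begin{cases} 1 - |x|, & |x| \le 1 \\ 0, & \text{otherwise} \end{cases} $$

so the nearest neighbor kernel has a single non-zero tap at any sampling position, while the bilinear (triangular) kernel has at most two, matching the statement above.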

From Eq. 1, we can derive the equation for the detection of the interpolated video sequence. If the original signal satisfies the stationarity requirement, the periodicity of the interpolated signal can be detected based on the n-th derivative [15, 17, 19]. The n-th derivative of the interpolated video sequence is defined as

$$ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x) = \frac{\partial^{n} Y^{B}_{pos_{frc}}(x)}{\partial x^{n}}, \quad \text{for } n > 0. \qquad (2) $$

Thus, the n-th derivative of the interpolated signal can be written as:

$$ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x) = \sum_{k=-\infty}^{\infty} X^{B}_{pos_{org}}(k)\, D^{n}\{h\}\!\left(\frac{x}{\Delta} - k\right). \qquad (3) $$

When the video sequence has the stationary property with variance σ², the variance of D^n{Y^B_{pos_frc}}(x) as a function of the temporal index x can be represented by

$$ \mathrm{var}\left\{ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x) \right\} = \sigma^{2} \sum_{k=-\infty}^{\infty} \left( D^{n}\{h\}\!\left(\frac{x}{\Delta} - k\right) \right)^{2} \qquad (4) $$

From the above equation, the following relation for γ ∈ ℤ is derived:

$$ \mathrm{var}\left\{ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x + \gamma\Delta) \right\} = \sigma^{2} \sum_{k=-\infty}^{\infty} \left( D^{n}\{h\}\!\left(\frac{x}{\Delta} - (k - \gamma)\right) \right)^{2}. \qquad (5) $$

Therefore, the variance is periodic over the temporal index x with period Δ [20]:

$$ \mathrm{var}\left\{ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x) \right\} = \mathrm{var}\left\{ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\}(x + \gamma\Delta) \right\}, \quad \gamma \in \mathbb{Z}. \qquad (6) $$

Afterwards, the periodicity of the video sequence Y^B_{pos_frc} can be computed by locating a peak in the magnitude of the variance signal in the frequency domain [15]:

$$ p = \mathrm{peak}\left( \left| \mathrm{FFT}\left( \mathrm{var}\left\{ D^{n}\left\{ Y^{B}_{pos_{frc}} \right\} \right\} \right) \right| \right). \qquad (7) $$
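The following minimal Python sketch (our illustration, not code from the paper) simulates Eqs. (4)-(7): a stationary random signal is linearly interpolated onto a denser temporal lattice (24 fps to 30 fps, so ω = 0.8), the per-position variance of its second difference is estimated over many realizations, and its spectrum exhibits a peak tied to the resampling period.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = 24 / 30                      # resampling factor fps_org / fps_frc
T_org, T_frc, runs = 250, 300, 200

energy = np.zeros(T_frc - 2)
for _ in range(runs):
    x = rng.normal(size=T_org)                    # stationary original signal
    t = np.arange(T_frc) * omega                  # sampling positions x / Delta
    y = np.interp(t, np.arange(T_org), x)         # linear temporal interpolation
    energy += np.diff(y, n=2) ** 2                # squared second derivative (n = 2)

var_sig = energy / runs                           # variance vs. temporal index (Eq. 4)
spec = np.abs(np.fft.rfft(var_sig - var_sig.mean()))
peak = np.argmax(spec[1:]) + 1                    # strongest non-DC bin (Eq. 7)
print("peak at normalized frequency %.3f" % (peak / len(var_sig)))
```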

The most common value of n is two; thus, the second derivative is used to detect the periodicity. However, the above equations use the video frame pixel values, which are vulnerable to modification by video compression. Thus, another periodic property, one that is resilient to video compression, is required to detect the temporal interpolation. We propose to use motion vectors as a substitute for the frame pixel values in detecting the temporal interpolation. Under the assumption of the stationary property with variance σ², Eq. 1 can be modified as below:

$$ POS_{frc}(x) = \sum_{k=-\infty}^{\infty} POS_{org}(k)\, h\!\left(\frac{x}{\Delta} - k\right) \qquad (8) $$

where POS_frc and POS_org denote the positions of matched pixel blocks in the converted and original sequences, respectively. With the identical process presented above, it can be shown that the derivative of the block position exhibits periodicity with period Δ. The most important point is that even frame-rate down-conversion consisting of frame dropping can be described by this model.

    4 Proposed method

In Section 3, we showed that frame-rate conversion introduces a detectable periodic artifact regardless of the type of temporal interpolation. The periodic artifact can be estimated using pairs of pixel block positions, and the motion vector is the most suitable candidate for matching these pairs. Therefore, we begin by assuming that the magnitude of each motion along a motion trajectory is almost constant across a small group of sequential frames in the original video (stationary property). Figure 1 depicts the periodic motion artifact introduced by frame-rate conversion that uses nearest neighbor interpolation. To detect the motion artifact, we propose a method composed of three steps. First, two motions MV1(i, j, t) and MV2(i, j, t) are estimated at each time t. Then, the motions that are not suitable for periodicity detection are removed by a motion pruning process. Finally, the periodicity of the motion artifact measurement MA(t) is measured in the frequency domain. The details of the proposed method are described in the following subsections.

Fig. 1 The periodic motion artifact introduced by temporal up/down resampling that uses nearest neighbor interpolation (the t axis represents the time index in the resampled video sequence): (a) frame-rate up-conversion (fps_org = 15, fps_frc = 30): zero motions appear due to frame duplication (dashed frames are interpolated frames using the nearest neighbor method); (b) frame-rate down-conversion (fps_org = 15, fps_frc = 10): motion jitters (big motions) appear due to frame dropping (dashed frames are dropped frames using the nearest neighbor method)

4.1 Motion estimation

At each time t, two different motions MV1(i, j, t) and MV2(i, j, t) are estimated. MV1(i, j, t) stores the motion between the frames at times t − 1 and t, and MV2(i, j, t) stores the motion between the frames at times t and t + 1. To estimate the motions, each frame, whose resolution is M × N, is divided into B × B blocks. Then, the motion of each block centered at (i · round(B/2), j · round(B/2)), where i = 1, …, floor(N/B) and j = 1, …, floor(M/B), is estimated. That is, the resolutions of MV1(i, j, t) and MV2(i, j, t) are floor(M/B) × floor(N/B).

To estimate the motion between frames, the classic optical flow method is considered [2, 16]. The motion between frames is modeled as a two-parameter translation (2D motion). To satisfy the brightness constancy assumption and for robustness to video compression, each video frame is converted to the YCbCr color space, and only the Y channel is used for motion estimation. Let f(x, y, t) be the Y channel of the given video sequence; then, the motion MV(i, j, t) = (Δx_t, Δy_t)^T between two consecutive frames is described as follows:

$$ f(x, y, t+1) \approx f(x + \Delta x_t,\; y + \Delta y_t,\; t) \qquad (9) $$

where Δx_t and Δy_t denote the change in pixel position at time t. We can obtain the optimal solution (Δx_t, Δy_t) by solving an error minimization problem. For more accurate motion estimation, Wang and Farid's method is employed [13, 26, 27]. Since motion estimation using the optical flow method is limited to small motions due to the Taylor approximation, a 3-level image pyramid is used to estimate motions of large magnitude [5, 23].
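As a concrete illustration (not the authors' implementation), pyramidal Lucas-Kanade flow evaluated at the block centers can be computed with OpenCV as follows; the block size B, the use of the Y channel, and the 3-level pyramid follow the description above, while the exact block-center convention is our assumption.

```python
import cv2
import numpy as np

def block_motion(prev_y, next_y, B=16):
    """Estimate one 2D motion vector per B x B block between two uint8
    Y-channel frames using pyramidal Lucas-Kanade flow at the block centers."""
    M, N = prev_y.shape
    rows, cols = M // B, N // B
    centers = np.array([[(j + 0.5) * B, (i + 0.5) * B]        # (x, y) block centers
                        for i in range(rows) for j in range(cols)],
                       dtype=np.float32).reshape(-1, 1, 2)
    moved, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_y, next_y, centers, None,
        winSize=(B, B), maxLevel=2)                            # 3 pyramid levels
    mv = (moved - centers).reshape(rows, cols, 2)
    mv[status.reshape(rows, cols) == 0] = 0                    # zero out lost tracks
    return mv

# usage at time t:  mv1 = block_motion(frame[t - 1], frame[t])
#                   mv2 = block_motion(frame[t], frame[t + 1])
```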

    4.2 Motion pruning

The estimated motion vectors are validated before measuring the periodicity of the motion artifact. Evaluating three criteria results in the trajectory map (TM), which stores the motion validity information.

    4.2.1 Sum of absolute difference (SAD)

For each motion vector, the motion estimation error is tested using the SAD measure. Since the estimated motion vector has sub-pixel precision, the SAD is calculated as follows:

$$ SAD = \sum_{x, y \in \Omega} \left| f(x, y, t) - f\big(x + \mathrm{round}(\Delta x_t),\; y + \mathrm{round}(\Delta y_t),\; t + \Delta t\big) \right| \qquad (10) $$

where f(·), (Δx_t, Δy_t), Δt, and Ω denote the Y-channel frame, the motion vector, the time shift, and the motion estimation block region, respectively. By thresholding the SAD value, each motion is assessed as valid or not. If the thresholded SAD value is one (i.e., the SAD value is greater than a given threshold), the motion vector is not valid. To validate whether the motions MV1(i, j, t) and MV2(i, j, t) lie on the motion trajectory, the thresholded SAD value map (TSVM) is constructed as follows:

$$ TSVM(i, j, t) = TSVM_1(i, j, t) \vee TSVM_2(i, j, t) \qquad (11) $$

where TSVM_1(i, j, t) and TSVM_2(i, j, t) represent the thresholded SAD values calculated using MV1(i, j, t) and MV2(i, j, t), respectively. The corresponding threshold is set to 10 in the proposed method.
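A minimal sketch of this check (our illustration; the threshold value of 10 follows the text, while applying it to the per-pixel mean absolute difference is our assumption):

```python
import numpy as np

def thresholded_sad(frame_a, frame_b, topleft, mv, B=16, tau_sad=10):
    """Return 1 if the block's mean absolute difference after motion compensation
    exceeds the threshold (motion judged invalid, feeding Eq. 11), else 0."""
    x0, y0 = topleft                               # block top-left corner in frame_a
    dx, dy = int(round(mv[0])), int(round(mv[1]))  # rounded sub-pixel motion
    if min(x0 + dx, y0 + dy) < 0:                  # motion pushes block off-frame
        return 1
    a = frame_a[y0:y0 + B, x0:x0 + B].astype(np.int32)
    b = frame_b[y0 + dy:y0 + dy + B, x0 + dx:x0 + dx + B].astype(np.int32)
    if a.shape != b.shape:
        return 1
    return int(np.abs(a - b).mean() > tau_sad)

# TSVM(i, j, t) = thresholded_sad(... MV1 ...) | thresholded_sad(... MV2 ...)
```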


4.2.2 Motion direction

If the stationary property is assumed, consecutive motions on the motion trajectory should present only a small angle difference. Thus, the motion direction map (MDM) is constructed as follows:

$$ MDM(i, j, t) = \begin{cases} 1, & ADA(i, j, t) > \tau_{angle} \\ 0, & ADA(i, j, t) \le \tau_{angle} \end{cases} \qquad (12) $$

where ADA(i, j, t) denotes the absolute difference between ∠MV1(i, j, t) and ∠MV2(i, j, t). To tolerate small variations in the motion angle, τ_angle is set to 45°.

    4.2.3 Background

Since the background content does not contribute to estimating the motion artifact, the related motions need to be pruned. An object that does not move over three consecutive frames is assumed to be background content. The background map (BM) is constructed as follows:

$$ BM(i, j, t) = \begin{cases} 1, & \text{if } MV1(i, j, t) = 0 \text{ and } MV2(i, j, t) = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (13) $$

After the above three criteria are tested, TM is constructed as follows:

$$ TM(i, j, t) = \neg\big( TSVM(i, j, t) \vee MDM(i, j, t) \vee BM(i, j, t) \big) \qquad (14) $$

where ¬ denotes the bitwise negation operation. TM stores the motion validity information. If TM(i, j, t) is non-zero, the corresponding motions are assumed to lie on the motion trajectory; otherwise, the corresponding motions are excluded from the periodicity calculation.
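Putting the three criteria together, a sketch of the pruning step might look as follows (our illustration; mv1 and mv2 are the block motion fields of Section 4.1 and tsvm is the binary map from Eq. (11)):

```python
import numpy as np

def trajectory_map(mv1, mv2, tsvm, tau_angle_deg=45.0):
    """Combine the SAD, direction, and background criteria (Eqs. 11-14) into the
    trajectory map TM.  mv1, mv2: (rows, cols, 2) motion fields; tsvm: binary map."""
    ang1 = np.arctan2(mv1[..., 1], mv1[..., 0])
    ang2 = np.arctan2(mv2[..., 1], mv2[..., 0])
    ada = np.abs(np.angle(np.exp(1j * (ang1 - ang2))))             # wrapped |angle diff|
    mdm = (ada > np.deg2rad(tau_angle_deg)).astype(np.uint8)       # Eq. (12)
    bm = ((np.linalg.norm(mv1, axis=-1) == 0) &
          (np.linalg.norm(mv2, axis=-1) == 0)).astype(np.uint8)    # Eq. (13)
    return (1 - (tsvm.astype(np.uint8) | mdm | bm)).astype(np.uint8)  # Eq. (14)
```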

    4.3 Periodicity detection

Based on the analysis in Section 3, the second derivative of the pixel block position is calculated at each time index t. To simplify the calculation, only the scalar value (magnitude) of the motion vector is used. Since the magnitude of each motion vector is the first derivative, the second derivative is simply defined as the error between |MV2(i, j, t)| and |MV1(i, j, t)|. Furthermore, to enhance this error value, we calculate the normalized error E(i, j, t) as

$$ E(i, j, t) = \begin{cases} \dfrac{|MV2(i, j, t)| - |MV1(i, j, t)|}{|MV1(i, j, t)|}, & \text{if } |MV1(i, j, t)| \neq 0 \\[1ex] 0, & \text{if } |MV1(i, j, t)| = 0 \end{cases} \qquad (15) $$

Afterwards, with the motion validity information TM(i, j, t), the proposed motion artifact measurement MA(t) is calculated as below:

$$ MA(t) = \frac{1}{\sum_{i,j} TM(i, j, t)} \sum_{i,j} E(i, j, t) \cdot TM(i, j, t) \qquad (16) $$
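In code, Eqs. (15)-(16) reduce to a few array operations (again an illustrative sketch under the data layout assumed in the earlier snippets):

```python
import numpy as np

def motion_artifact(mv1, mv2, tm):
    """Compute MA(t) from the two block motion fields and the trajectory map TM."""
    m1 = np.linalg.norm(mv1, axis=-1)
    m2 = np.linalg.norm(mv2, axis=-1)
    safe = np.where(m1 != 0, m1, 1.0)                  # avoid division by zero
    e = np.where(m1 != 0, (m2 - m1) / safe, 0.0)       # normalized error, Eq. (15)
    n_valid = tm.sum()
    return float((e * tm).sum() / n_valid) if n_valid > 0 else 0.0   # Eq. (16)
```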


After the calculation at each time index t, the one-dimensional array MA(t) is obtained. If an original video is given as input, the values of MA(t) would be around zero (by the stationary property assumption) and non-periodic. On the other hand, if MA(t) is obtained from a frame-rate converted video, it would be periodic and the values of MA(t) that create the periodicity would not be around zero.

The periodicity of MA(t) can be analyzed in the frequency domain. MA(t) is transformed into the frequency domain using the Discrete Fourier Transform (DFT). Let the transformed signal be FMA(ξ); then, the peak location is searched. If the magnitude of the peak is at least τ_mag times greater than the average spectrum magnitude, the input video is declared a frame-rate converted video. However, to avoid false positive errors, a certain proportion (two percent) from the top and bottom of the normalized frequency range is trimmed before the peak validation. Figures 2 and 3 present examples for original and frame-rate converted (nearest neighbor interpolation) videos in the temporal and frequency domains, respectively.
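A sketch of the peak validation described above (our illustration; the 2% trimming and the τ_mag comparison follow the text, while details such as removing the DC component are our assumptions):

```python
import numpy as np

def is_frame_rate_converted(ma, tau_mag=4.31, trim=0.02):
    """Decide from the MA(t) sequence whether a periodic artifact is present."""
    spec = np.abs(np.fft.rfft(np.asarray(ma) - np.mean(ma)))   # DFT magnitude
    cut = max(1, int(len(spec) * trim))                        # trim 2% at each end
    band = spec[cut:len(spec) - cut]
    if band.size == 0 or band.mean() == 0:
        return False
    return bool(band.max() >= tau_mag * band.mean())           # peak vs. mean spectrum
```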

Once the periodicity is estimated, the original frame-rate (fps_org) can be directly estimated using the position of the peak in the spectrum and the interpolation factor ω = fps_org/fps_frc [17]:

$$ \omega = n \times \Delta f \qquad (17) $$

where Δf and n denote the peak position in the normalized frequency domain and the order of the derivative (the n-th derivative), respectively. However, for frame-rate down-conversion, the periodic artifacts coincide with those of some frame-rate up-conversions due to aliasing [17]. Thus, the estimation of fps_org might not be accurate.

    5 Experimental results

In this section, the performance of the proposed method is evaluated. The specific settings for the experiments are provided, and the results are presented and analyzed. For the experiments, 50 original video sequences in uncompressed YUV format were collected [7, 11, 25]. The original video set included various types of content such as sports, news, animation, surveillance, and so on. The resolutions vary from 176 × 144 (CIF) to 1920 × 1080 (full-HD). Since the proposed method is based on motion vectors, fast-motion video sequences were also included. Their frame lengths were equal to or less than 300. For the frame-rate conversion, the original video sequences were saved using five different frame-rates: 15, 20, 24, 25, and 30. Furthermore, each original video was re-saved using the H.264/AVC codec. For the robustness test against video compression, which is commonly performed before saving videos, five compression factors (100, 90, 80, 70, and 60) were selected and the key frame interval was 20. When the frame-rates were different, the original videos showed different pixel values even when they were compressed using the identical compression factor, except for the uncompressed ones. Therefore, 1300 (50 × 5 × 5 + 50) original videos were prepared. Frame-rate conversion was performed for all pairs (20 combinations) that can be made from fps_org and fps_frc. Furthermore, three types of temporal interpolation methods, namely 'nearest neighbor interpolation', 'bilinear interpolation', and 'motion-based interpolation' [9], were used in the frame-rate conversion. As a result, 18,000 (50 × 6 × 20 × 3) frame-rate (up/down) converted videos were created for the experiments. The experiments were run on a PC equipped with an Intel i7-2600 (3.4 GHz) CPU, 8 GB of RAM, and MATLAB 2014a.

In the experiments, in order to demonstrate the performance of the proposed method, we compared it with three different frame-rate conversion detection methods [3, 4, 27]. To provide objective test results, true positive ratios were checked at three different false positive ratios (0, 0.01, and 0.03) for each method. For this purpose, the parameter τ_mag of the proposed method was adaptively chosen by varying the false positive error. At a 0.01 false positive error ratio, τ_mag was about 4.31. To measure the performance of Wang and Farid's method, their second method was used because their first method cannot be applied to progressive video [27]. For the test results, we applied our peak validation to the result (the array of probabilities) of the EM

Fig. 2 Illustrations of the motion artifact MA(t) for original and frame-rate converted videos: (a) original video ('Bus') at 30 fps; (b) the frame-rate up-converted video from 24 fps to 30 fps using nearest neighbor interpolation; (c) the frame-rate down-converted video from 30 fps to 24 fps using nearest neighbor interpolation

algorithm, since the authors did not provide a specific measuring tool for the periodicity check. For Bian's method [4], the first parameter τ1 was set as the authors described. However, the second parameter τ2 was adaptively selected to present the true positive ratio at the three different false positive ratios. For Bestagini's method, the maximum peak in the frequency domain is selected [3]. Since this decision making is not threshold-based, our peak validation was used to provide the pairs of false positive error rates and corresponding true positive rates.

5.1 Frame-rate up-conversion test

In this section, the experimental results for frame-rate up-conversion are presented. Uncompressed videos and H.264/AVC compressed videos, which were encoded using five different compression factors (Q100, Q90, Q80, Q70, and Q60), were used in the extensive experiments. Table 1 exhibits the detection results for videos that were frame-rate up-converted using nearest neighbor interpolation. The test results in Table 1 demonstrate that the proposed method outperforms the other methods

Fig. 3 Illustrations of the Fourier spectrum for 'Bus' in Fig. 2. The horizontal axis indicates normalized frequencies

except for the case of converting from 24 fps to 25 fps. Most detection methods exhibited fine detection results. In particular, in the case of converting from 15 fps to 30 fps, most detection methods achieved their best detection results compared to the other frame-rate conversion configurations. Every duplicated frame, which was positioned right after the original frame, created a strong and periodic artifact signal, which resulted in the highest detection results. However, Bestagini's method exhibited a low detection result in this case. Since the sign information is lost in the calculation of the squared error used in Bestagini's method, it is difficult to detect the periodicity in this case. For the case of converting from 24 fps to 25 fps, where the interpolation factor ω is close to 1, every detection method exhibited its most degraded performance. Although most detection methods exhibited fine detection results, Bestagini's method exhibited a comparatively low detection rate because of imprecise motion estimation and the lack of robustness to video compression.

Table 2 presents the performance for frame-rate up-conversion that uses the bilinear interpolation method. Compared to Table 1, the overall test results were degraded. The pixel value changes caused by interpolation made the motion estimation less accurate, which led to the performance degradation of the proposed method. However, it still exhibited an acceptable detection rate and outperformed the other methods. Bian's method exhibited poor performance

Table 1  The compared test results for frame-rate up-conversion using 'nearest neighbor interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
15       20       0.00  0.90          1.00         0.73              1.00
15       20       0.01  0.90          1.00         0.73              1.00
15       20       0.03  0.92          1.00         0.74              1.00
15       24       0.00  0.94          0.99         0.73              1.00
15       24       0.01  0.94          1.00         0.74              1.00
15       24       0.03  0.94          1.00         0.75              1.00
15       25       0.00  1.00          1.00         0.68              1.00
15       25       0.01  1.00          1.00         0.68              1.00
15       25       0.03  1.00          1.00         0.71              1.00
15       30       0.00  1.00          1.00         0.16              1.00
15       30       0.01  1.00          1.00         0.16              1.00
15       30       0.03  1.00          1.00         0.17              1.00
20       24       0.00  0.66          0.97         0.64              1.00
20       24       0.01  0.68          0.97         0.64              1.00
20       24       0.03  0.72          0.97         0.64              1.00
20       25       0.00  0.76          0.97         0.67              1.00
20       25       0.01  0.76          0.97         0.67              1.00
20       25       0.03  0.78          0.99         0.69              1.00
20       30       0.00  0.96          1.00         0.75              1.00
20       30       0.01  0.96          1.00         0.75              1.00
20       30       0.03  0.96          1.00         0.76              1.00
24       25       0.00  0.00          0.78         0.15              0.67
24       25       0.01  0.00          0.78         0.15              0.69
24       25       0.03  0.00          0.83         0.16              0.73
24       30       0.00  0.78          0.97         0.67              1.00
24       30       0.01  0.78          0.97         0.67              1.00
24       30       0.03  0.82          0.98         0.68              1.00
25       30       0.00  0.66          0.96         0.64              1.00
25       30       0.01  0.68          0.96         0.64              1.00
25       30       0.03  0.72          0.97         0.64              1.00

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding

for the bilinear interpolation because its quantization process prevented it from detecting the periodicity of the SSIM sequence.

Table 3 presents the performance for frame-rate up-conversion that uses the motion-based interpolation method. Compared to Tables 1 and 2, every detection method excluding Bestagini's method and the proposed method exhibited severely low detection results. The basic mechanism of those methods is not appropriate for motion-based interpolation. Bestagini's method exhibited low detection results considering that it was designed to detect motion-based interpolated video sequences. The motion compensation of the motion-based frame-rate conversion and the lack of consideration for wrongly estimated motions lowered its performance. The proposed method was less affected by motion compensation since motion compensation changes pixel values but not pixel locations; it outperformed the other methods. However, since the periodic signal was not strong enough, the proposed method required more frames to produce proper results compared to the other detection methods.

Figure 4 compares the test results at each level of the compression factor. The detection results depicted in Fig. 4 were sampled when the FPR was 0.01. The

Table 2  The compared test results for frame-rate up-conversion using 'bilinear interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
15       20       0.00  0.64          0.06         0.70              0.85
15       20       0.01  0.64          0.08         0.70              0.86
15       20       0.03  0.68          0.08         0.70              0.86
15       24       0.00  0.78          0.12         0.70              0.98
15       24       0.01  0.78          0.14         0.71              0.98
15       24       0.03  0.80          0.14         0.72              0.98
15       25       0.00  0.80          0.16         0.70              1.00
15       25       0.01  0.80          0.16         0.70              1.00
15       25       0.03  0.80          0.18         0.71              1.00
15       30       0.00  0.94          0.12         0.77              0.77
15       30       0.01  0.94          0.12         0.77              0.78
15       30       0.03  0.94          0.14         0.78              0.82
20       24       0.00  0.64          0.08         0.73              0.68
20       24       0.01  0.64          0.08         0.73              0.69
20       24       0.03  0.72          0.10         0.74              0.70
20       25       0.00  0.68          0.08         0.72              0.76
20       25       0.01  0.68          0.10         0.73              0.76
20       25       0.03  0.70          0.12         0.74              0.77
20       30       0.00  0.78          0.12         0.75              0.95
20       30       0.01  0.78          0.12         0.76              0.95
20       30       0.03  0.80          0.14         0.77              0.96
24       25       0.00  0.34          0.02         0.16              0.32
24       25       0.01  0.36          0.02         0.16              0.34
24       25       0.03  0.42          0.02         0.17              0.36
24       30       0.00  0.68          0.08         0.72              0.76
24       30       0.01  0.68          0.10         0.72              0.76
24       30       0.03  0.70          0.12         0.74              0.77
25       30       0.00  0.64          0.08         0.72              0.67
25       30       0.01  0.66          0.08         0.73              0.69
25       30       0.03  0.72          0.10         0.73              0.70

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding

average detection rate of the proposed method was about 88% (97%, 78%, and 90% for the nearest neighbor, bilinear, and motion-based interpolation methods, respectively). Bian's method and the proposed method exhibited robustness against H.264/AVC compression when the nearest neighbor interpolation was used to convert the frame-rates. However, Wang and Farid's method and Bestagini's method presented lower detection results as the compression factor decreases. In the case of bilinear interpolation, every detection method was affected by the change of the compression factor. This is because the bilinear interpolation process creates high-frequency content in the interpolated frames, which is easily quantized away by lossy video compression.

    5.2 Frame-rate down-conversion test

In this section, the experimental results for frame-rate down-conversion are presented. Tables 4, 5 and 6 present the performance for frame-rate down-converted videos. Before analyzing the test results, the case of frame-rate conversion from 30 fps to

Table 3  The compared test results for frame-rate up-conversion using 'motion-based interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
15       20       0.00  0.19          0.03         0.30              0.97
15       20       0.01  0.20          0.05         0.31              0.98
15       20       0.03  0.21          0.08         0.35              0.98
15       24       0.00  0.21          0.00         0.45              0.96
15       24       0.01  0.22          0.00         0.45              0.96
15       24       0.03  0.22          0.00         0.45              0.96
15       25       0.00  0.23          0.00         0.44              0.95
15       25       0.01  0.24          0.00         0.44              0.97
15       25       0.03  0.24          0.00         0.46              0.97
15       30       0.00  0.25          0.12         0.48              0.94
15       30       0.01  0.25          0.12         0.49              0.94
15       30       0.03  0.25          0.13         0.49              0.94
20       24       0.00  0.06          0.00         0.44              0.91
20       24       0.01  0.06          0.00         0.44              0.91
20       24       0.03  0.07          0.00         0.45              0.92
20       25       0.00  0.11          0.01         0.44              0.94
20       25       0.01  0.11          0.01         0.46              0.94
20       25       0.03  0.12          0.03         0.46              0.94
20       30       0.00  0.23          0.01         0.41              0.99
20       30       0.01  0.23          0.02         0.41              0.99
20       30       0.03  0.23          0.03         0.43              0.99
24       25       0.00  0.00          0.00         0.07              0.44
24       25       0.01  0.00          0.00         0.07              0.44
24       25       0.03  0.00          0.00         0.07              0.49
24       30       0.00  0.11          0.01         0.40              0.94
24       30       0.01  0.11          0.01         0.40              0.94
24       30       0.03  0.11          0.03         0.40              0.94
25       30       0.00  0.06          0.05         0.44              0.91
25       30       0.01  0.06          0.06         0.44              0.91
25       30       0.03  0.06          0.07         0.45              0.92

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding

15 fps is considered. This frame-rate conversion does not introduce any temporal artifact. More specifically, when the interpolation factor ω is an integer, no further

Fig. 4 Detection accuracies for frame-rate up-converted videos at the different compression factors (the corresponding false positive rate is 0.01)

interpolation process is performed; only frame dropping is done. As a result, the detector cannot distinguish the frame-rate converted videos from the original videos. Indeed, the corresponding test results in Tables 4, 5, and 6 exhibit zero detection rates at a zero false positive ratio.

As analyzed in the previous subsection, frame-rate conversion that uses nearest neighbor interpolation yields the higher detection rates for the proposed method. The remarkable point in Table 4 is that only the proposed method exhibited acceptable test results (over 95%). For the bilinear interpolation test, Bestagini's method exhibited the best detection results. The designs of the other methods are not suitable for detecting frame-rate down-conversion because the dropped frames (including 'bilinear interpolation') do not introduce correlations between adjacent frames. Their test results are considerably degraded compared with those of the frame-rate up-conversion. The proposed method also exhibited degraded performance for bilinear interpolation.

Table 4  The compared test results for frame-rate down-conversion using 'nearest neighbor interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
30       25       0.00  0.00          0.00         0.54              0.98
30       25       0.01  0.00          0.00         0.54              0.98
30       25       0.03  0.00          0.00         0.54              0.99
30       24       0.00  0.00          0.00         0.45              0.99
30       24       0.01  0.00          0.00         0.46              0.99
30       24       0.03  0.00          0.00         0.46              1.00
30       20       0.00  0.00          0.00         0.10              1.00
30       20       0.01  0.00          0.00         0.12              1.00
30       20       0.03  0.00          0.00         0.12              1.00
30       15       0.00  0.00          0.00         0.00              0.00
30       15       0.01  0.00          0.00         0.00              0.00
30       15       0.03  0.00          0.00         0.00              0.00
25       24       0.00  0.00          0.00         0.12              0.67
25       24       0.01  0.00          0.00         0.13              0.68
25       24       0.03  0.00          0.02         0.15              0.70
25       20       0.00  0.00          0.00         0.45              0.99
25       20       0.01  0.00          0.00         0.46              0.99
25       20       0.03  0.00          0.02         0.46              0.99
25       15       0.00  0.00          0.00         0.46              0.99
25       15       0.01  0.00          0.00         0.46              0.99
25       15       0.03  0.00          0.02         0.49              0.99
24       20       0.00  0.00          0.00         0.54              0.98
24       20       0.01  0.00          0.00         0.54              0.98
24       20       0.03  0.02          0.00         0.54              0.98
24       15       0.00  0.00          0.00         0.16              0.99
24       15       0.01  0.00          0.00         0.16              0.99
24       15       0.03  0.00          0.00         0.16              0.99
20       15       0.00  0.00          0.00         0.43              1.00
20       15       0.01  0.00          0.00         0.43              1.00
20       15       0.03  0.00          0.00         0.43              1.00

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding. The frame-rate down-conversion from 30 fps to 15 fps does not leave any temporal interpolation artifact

Figure 5 compares the test results at each level of the compression factor. The detection results depicted in Fig. 5 were sampled when the FPR was 0.01. Bian's method and the proposed method exhibited robustness against H.264/AVC compression when the nearest neighbor interpolation was used to convert the frame-rates. However, Wang and Farid's method and Bestagini's method presented lower detection results as the compression factor decreases.

It is also interesting to analyze how the frame length affects the detection results. Figure 6 exhibits the average detection rate as the frame length increases for each detection method when the original fps was 20. In most cases, the proposed method outperformed the other methods and required the minimum number of frames to reach the maximum detection ratio. Although the proposed model accommodates frame dropping, the frame-rate down-conversion test required more frames than the frame-rate up-conversion test to reach the maximum detection ratio. For nearest neighbor frame-rate conversion, most detection methods were able to reach their maximum detection results with a small number of frames. In the case of bilinear frame-rate conversion, more frames were required to reach the maximum detection ratio.

Table 5  The compared test results for frame-rate down-conversion using 'bilinear interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
30       25       0.00  0.36          0.03         0.42              0.44
30       25       0.01  0.36          0.03         0.43              0.44
30       25       0.03  0.38          0.06         0.43              0.44
30       24       0.00  0.24          0.03         0.34              0.44
30       24       0.01  0.24          0.03         0.34              0.45
30       24       0.03  0.26          0.03         0.35              0.46
30       20       0.00  0.36          0.01         0.55              0.31
30       20       0.01  0.36          0.01         0.55              0.32
30       20       0.03  0.38          0.01         0.57              0.32
30       15       0.00  0.00          0.00         0.00              0.00
30       15       0.01  0.00          0.00         0.00              0.00
30       15       0.03  0.00          0.00         0.00              0.00
25       24       0.00  0.32          0.02         0.48              0.15
25       24       0.01  0.32          0.02         0.50              0.15
25       24       0.03  0.34          0.02         0.51              0.20
25       20       0.00  0.24          0.03         0.34              0.43
25       20       0.01  0.24          0.03         0.34              0.44
25       20       0.03  0.26          0.03         0.34              0.46
25       15       0.00  0.14          0.01         0.41              0.50
25       15       0.01  0.14          0.01         0.41              0.50
25       15       0.03  0.16          0.02         0.41              0.51
24       20       0.00  0.36          0.03         0.42              0.44
24       20       0.01  0.36          0.03         0.43              0.44
24       20       0.03  0.38          0.05         0.43              0.44
24       15       0.00  0.14          0.01         0.32              0.62
24       15       0.01  0.14          0.01         0.32              0.62
24       15       0.03  0.14          0.01         0.33              0.63
20       15       0.00  0.26          0.01         0.46              0.55
20       15       0.01  0.28          0.01         0.47              0.56
20       15       0.03  0.30          0.02         0.47              0.64

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding. The frame-rate down-conversion from 30 fps to 15 fps does not leave any temporal interpolation artifact

5.3 Processing time test

In this subsection, the processing time of each method is compared. Table 7 lists the processing time of each method. To avoid the slow loop operations of MATLAB, most of the core algorithms were implemented in C++ and MEX-compiled. Wang and Farid's method was the fastest among the compared methods considering only the algorithm computation time. However, their method required a considerable amount of memory since every frame must be loaded for the computation. This made their method operate slowest when the whole process, including video frame loading, was considered for full-HD resolution video. The proposed method spent much of its time estimating the motions between frames. To reduce the processing time, frames whose resolution was larger than 4CIF (704 x 480) were resized to CIF size. Since the proposed method uses motion vectors, the detection rate degradation due to resizing was not considerable: about 4% of the true positive ratio at a 0.01 false positive rate was lost. Since SSIM is the most time-consuming part of Bian's method,

Table 6  The compared test results for frame-rate down-conversion using 'motion-based interpolation' (all detector columns report TPR)

fps_org  fps_frc  FPR   Wang & Farid  Bian et al.  Bestagini et al.  Proposed
30       25       0.00  0.08          0.00         0.32              0.92
30       25       0.01  0.12          0.00         0.33              0.92
30       25       0.03  0.12          0.00         0.33              0.93
30       24       0.00  0.08          0.00         0.32              0.92
30       24       0.01  0.12          0.00         0.33              0.92
30       24       0.03  0.12          0.00         0.33              0.93
30       20       0.00  0.00          0.00         0.15              0.72
30       20       0.01  0.00          0.00         0.15              0.72
30       20       0.03  0.00          0.00         0.16              0.72
30       15       0.00  0.00          0.00         0.00              0.00
30       15       0.01  0.00          0.00         0.00              0.00
30       15       0.03  0.00          0.00         0.00              0.00
25       24       0.00  0.00          0.00         0.34              0.56
25       24       0.01  0.00          0.00         0.34              0.56
25       24       0.03  0.00          0.00         0.35              0.61
25       20       0.00  0.09          0.00         0.33              0.92
25       20       0.01  0.12          0.00         0.33              0.92
25       20       0.03  0.12          0.00         0.33              0.93
25       15       0.00  0.00          0.01         0.21              0.73
25       15       0.01  0.00          0.01         0.21              0.73
25       15       0.03  0.00          0.02         0.22              0.74
24       20       0.00  0.00          0.00         0.28              0.92
24       20       0.01  0.00          0.00         0.29              0.92
24       20       0.03  0.02          0.00         0.30              0.92
24       15       0.00  0.00          0.00         0.17              0.82
24       15       0.01  0.00          0.00         0.17              0.82
24       15       0.03  0.00          0.00         0.18              0.83
20       15       0.00  0.00          0.00         0.06              0.84
20       15       0.01  0.00          0.00         0.06              0.85
20       15       0.03  0.00          0.00         0.07              0.88

Six compression settings ('Uncompressed', 'Q100', 'Q90', 'Q80', 'Q70', 'Q60') are used in video encoding. The frame-rate down-conversion from 30 fps to 15 fps does not leave any temporal interpolation artifact

the input video resolution affected its processing time. Since Bestagini's method also uses motion vectors, its processing time was comparable with that of the proposed method.
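As a small illustration of the resolution cap mentioned above (our sketch, assuming OpenCV; CIF is taken as 352 x 288 and the 4CIF limit as 704 x 480, following the text):

```python
import cv2

CIF = (352, 288)            # (width, height) assumed CIF target size
MAX_PIXELS = 704 * 480      # 4CIF pixel-count limit quoted in the text

def cap_resolution(frame):
    """Downscale frames larger than 4CIF to CIF before motion estimation."""
    h, w = frame.shape[:2]
    if w * h > MAX_PIXELS:
        return cv2.resize(frame, CIF, interpolation=cv2.INTER_AREA)
    return frame
```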

Fig. 5 Detection accuracies for frame-rate down-converted videos at the different compression factors (the corresponding false positive rate is 0.01 and the case of converting the frame-rate from 30 fps to 15 fps is excluded)

6 Conclusion

In this paper, we presented a method to assess whether a video has been frame-rate converted. Video frame-rate conversion is one of the most common temporal domain operations in video tampering. To accommodate the three types of temporal interpolation methods used for frame-rate conversion, we proposed a model and analyzed the periodic artifact on the motion trajectories. The proposed method estimates the motion vectors along the motion trajectories. Afterwards, the periodicity of the estimated motion artifacts is assessed. The proposed method demonstrated its performance through extensive experiments on frame-rate converted videos. Furthermore, by comparing the test results with other frame-rate conversion detection methods, the superiority of the

Fig. 6 Detection accuracy analysis for different temporal window sizes

proposed method was exhibited. In particular, when the frame-rate is down-converted using nearest neighbor interpolation, only the proposed method presented a detection rate over 95%. Moreover, the test results showed that the proposed method remains valid even with a small number of frames, which allows it to be used as a possible tool for detecting frame copy-and-paste forgery. However, further consideration of bilinear interpolation in the temporal domain is required. Our future work will include further investigation of the robustness for frame-rate down-conversion and bilinear interpolation.

Acknowledgements This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2016R1A2B2009595), and by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korean government (MSIP) (No. R0126-16-1024, Managerial Technology Development and Digital Contents Security of 3D Printing based on Micro Licensing Technology).

    References

1. Ascenso J, Brites C, Pereira F (2005) Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. In: 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, pp. 1–6. Citeseer
2. Barron JL, Fleet DJ, Beauchemin SS (1994) Performance of optical flow techniques. Int J Comput Vis 12(1):43–77
3. Bestagini P, Battaglia S, Milani S, Tagliasacchi M, Tubaro S (2013) Detection of temporal interpolation in video sequences. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3033–3037. doi:10.1109/ICASSP.2013.6638215
4. Bian S, Luo W, Huang J (2014) Detecting video frame-rate up-conversion based on periodic properties of inter-frame similarity. Multimed Tools Appl 72(1):437–451
5. Bouguet JY (2013) Pyramidal implementation of the Lucas Kanade feature tracker: description of the algorithm
6. Castagno R, Haavisto P, Ramponi G (1996) A method for motion adaptive frame rate up-conversion. Circ Syst Video Technol, IEEE Trans 6(5):436–446
7. Center for Image Processing Research sequences. URL http://www.cipr.rpi.edu/resource/sequences/ (last access: Oct. 2015)
8. Chen M, Fridrich J, Goljan M, Lukáš J (2007) Source digital camcorder identification using sensor photo response non-uniformity. In: Electronic Imaging 2007, pp. 65051G–65051G. International Society for Optics and Photonics
9. Choi BD, Han JW, Kim CS, Ko SJ (2007) Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation. Circ Syst Video Technol, IEEE Trans 17(4):407–416

Table 7  Processing time (in seconds) comparison (100 frames were used and frame loading was excluded)

                      Wang and Farid   Bian et al.   Bestagini et al.   Proposed
4CIF resolution       2.4              4.9           26.1               29.1
Full-HD resolution    5.9              126.0         323.1              331.1

10. Choi BT, Lee SH, Ko SJ (2000) New frame rate up-conversion using bi-directional motion estimation. Consumer Electron, IEEE Trans 46(3):603–609
11. DASH dataset at ITEC/Alpen-Adria-Universität Klagenfurt. URL http://www-itec.uni-klu.ac.at/dash/?page_id=207 (last access: Oct. 2015)
12. Farid H (2009) Image forgery detection. Sign Process Mag, IEEE 26(2):16–25
13. Farid H, Simoncelli EP (2004) Differentiation of discrete multidimensional signals. Imag Process, IEEE Trans 13(4):496–508
14. Fridrich J (2009) Digital image forensics. Sign Process Mag, IEEE 26(2):26–37
15. Gallagher AC (2005) Detection of linear and cubic interpolation in JPEG compressed images. In: Computer and Robot Vision, 2005. Proc 2nd Can Conf, pp. 65–72. IEEE
16. Horn B (1986) Robot vision. MIT Press
17. Kirchner M (2008) Fast and reliable resampling detection by spectral analysis of fixed linear predictor residue. In: Proc 10th ACM Workshop on Multimedia and Security, pp. 11–20. ACM
18. Lukáš J, Fridrich J (2003) Estimation of primary quantization matrix in double compressed JPEG images. In: Proc Digital Forensic Research Workshop, pp. 5–8
19. Mahdian B, Saic S (2007) On periodic properties of interpolation and their application to image authentication. In: Information Assurance and Security, 2007. IAS 2007. Third Int Symp, pp. 439–446. IEEE
20. Milani S, Fontani M, Bestagini P, Barni M, Piva A, Tagliasacchi M, Tubaro S (2012) An overview on video forensics. APSIPA Trans Sign Inform Process 1:e2
21. Paul RT (2011) Review of robust video watermarking techniques. IJCA Special Issue Computat Sci 3:90–95
22. Popescu AC, Farid H (2005) Statistical tools for digital forensics. In: Information Hiding, pp. 128–147. Springer
23. Simoncelli EP (1999) Bayesian multi-scale differential optical flow
24. Stamm MC, Lin WS, Liu K (2012) Temporal forensics and anti-forensics for motion compensated video. Inform Forensics Sec, IEEE Trans 7(4):1315–1329
25. Video Trace Library YUV video sequences. URL http://trace.eas.asu.edu/yuv/ (last access: Oct. 2015)
26. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. Imag Process, IEEE Trans 13(4):600–612
27. Wang W, Farid H (2007) Exposing digital forgeries in interlaced and deinterlaced video. Inform Forensics Sec, IEEE Trans 2(3):438–449

Dae-Jin Jung received his BS degree in information and computer engineering from Ajou University, Korea, in 2010, and MS and PhD degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2012 and 2016, respectively. Since 2016, he has been a senior researcher in the Agency for Defense Development, Korea. His research interests include digital multimedia forensics, image/video watermarking, information security, and multimedia signal processing.


Heung-Kyu Lee received a BS degree in electronics engineering from Seoul National University, Seoul, Korea, in 1978, and MS and PhD degrees in computer science from Korea Advanced Institute of Science and Technology, Korea, in 1981 and 1984, respectively. Since 1986 he has been a professor in the School of Computing, KAIST. He has authored/coauthored over 100 international journal and conference papers. He has been a reviewer of many international journals, including Journal of Electronic Imaging, Real-Time Imaging, and IEEE Trans. on Circuits and Systems for Video Technology. His major interests are digital watermarking, digital fingerprinting, and digital rights management.
