Social Media Investigations Using Shared Photos

Flavio Bertini, Rajesh Sharma, Andrea Iannì, Danilo [email protected]

Department of Computer Science and Engineering, Mura Anteo Zamboni, 7

University of Bologna, Italy

ABSTRACT

In recent years, social networks (SNs) have revolutionized many aspects of society. The wide variety of available platforms meets almost every kind of user need: socialization, professional connections and image sharing, to name a few. Smartphones have brought a vital change in users' behaviour towards sharing multimedia content on online SNs. One noticeable behaviour is taking pictures with the smartphone's camera and sharing them with contacts through online social platforms. On the flip side, SNs have contributed to the growth of cyber crime, and one of the main aspects of social network security is the identification of fake profiles. In this paper, we present a method to verify and resolve user profiles in SNs using the images posted on them, assuming they were taken with the user's smartphone. By creating a sort of fingerprint of the user profile, the method is able to resolve profiles even though the images are degraded during the uploading/downloading process. Moreover, the method can compare images belonging to different SNs without using the original images. To evaluate our method, we use three online SNs and five different smartphones. The results on a real dataset show that, on average, profile verification reaches 96.48% and profile resolution reaches 99.49%, which demonstrates the effectiveness of our approach.

KEYWORDS

Social Network Security, Profile Verification, Profile Resolution, User Profile Fingerprint, Pattern Noise.

1 INTRODUCTION

In the last decade, various types of social platforms have been introduced on the web. These networks cater to specific user needs: for example, LinkedIn for professional networking, Facebook for social interaction, Instagram for photo sharing and WhatsApp for instant messaging on mobile devices, to name a few. Smartphones have played an important role in shaping users' behaviour with respect to posting and sharing multimedia content on social platforms [18].

A recent article has revealed that smartphones are the usual gateway to social media, especially for teenagers [22]. While SNs provide opportunities to socialize and share interests (for example, across political, economic and geographic borders), they also offer a medium for fraudulent activities.

This paper addresses the problem of fake profiles, which is an important problem in social network security. Our solution can therefore also be useful to detect and prevent cyber crime on SNs. In particular, we provide a solution for user profile verification and resolution. By user profile verification we mean correctly verifying the person behind a user profile; in other words, verifying that the profile indeed belongs to the person it claims to represent. User profile resolution is the task of verifying whether two different user profiles (for example, two nicknames) belong to the same user. This can be performed within the same network (intralayer) or across different social networks (interlayer).

Consider a multilayer network consisting of Facebook, Google+ and WhatsApp (Figure 1). Intralayer profile resolution aims at matching user profiles on the same layer, since a user can create two profiles, one of which could be fake. For example, in Figure 1 (Facebook layer, dotted line), one could be interested in determining whether the girl on the left (brown hair) is the same person as the girl on the right (blonde hair). Interlayer profile resolution aims at matching user profiles on different layers. For example, in Figure 1 (Facebook and Google+ layers, dash-dot-dash line), one could be interested in understanding whether the boy in the Facebook layer is the same person as the boy in the Google+ layer.

ISBN: 978-1-941968-31-4 ©2016 SDIWC 47

Figure 1. User profile resolution problem: intralayer (dotted line) and interlayer (dash-dot-dash line).

The problems of user profile verification and resolution are important for two reasons. Firstly, they correspond to one of the many kinds of missing data problem [23], [24]. Secondly, and most importantly, they are related to a significant increase in the number of fake profiles. For example, in a recent annual report [10], Facebook estimated that 8.35% of its total active profiles can be categorized as fake. The company classifies such fake profiles into three groups:

• 75 million (6.1%) accounts that users maintain in addition to their principal account;

• 18 million (1.45%) business, organization or pet accounts;

• 10 million (0.8%) undesirable accounts that have been set up to annoy other Facebook users.

Thus, the problems of user profile verification and resolution are particularly relevant for resolving different profiles in digital forensics and criminal investigations.

1.1 Background

Most of the multimedia content posted on social networks is uploaded from smartphones [18]. This suggests fingerprinting smartphones by exploiting their sensors' imperfections in order to identify users (in other words, their smartphones). The intuition behind using a smartphone's fingerprint for digital forensic activities is driven by two reasons. Firstly, because of binding phone contracts, smartphones are considered more personal than other digital devices such as laptops or desktops. Secondly, and more importantly, it has been shown that the sensors in smartphones (for example, GPS, microphone and speaker, camera, etc.) can be used to create a fingerprint [3], [8], [9].

We propose a solution for detecting fake user profiles by creating a unique fingerprint of a smartphone, assuming that the pictures published on SNs were captured by the smartphone itself. The approach is partially inspired by the social trend of image sharing, which is one of the main activities on social network platforms [21]. In practice, we use defects in the images to create a unique fingerprint of the smartphone, and we then use this fingerprint to match different user profiles. Several methods have been proposed to detect defects due to the manufacturing process of the camera [26], and it is possible to distinguish between defects introduced by the hardware components of the device (steps a, b and c in Figure 2) and those introduced by the software components (steps d and e in Figure 2).

Our proposed method is based on hardware imperfections, as hardware-based techniques provide better results [16]. Specifically, we extract the noise component (characteristic noise) systematically present in each image and use it to create the unique fingerprint of the smartphone. Moreover, we are able to perform this kind of analysis on original images as well as on images downloaded from social networks (see Figure 2), without any tampering with the smartphone's hardware.

Figure 2. Consumer-level digital camera pipeline. Step (a): the lens collects the light from the scene; step (b): several hardware filters improve the acquisition; step (c): the sensor converts the light into an electrical signal; step (d): the processing produces a visually pleasing image; step (e): the digital image is stored.

Our aim is to identify the specific source device, not the model, used to take a given picture; our problem, source camera identification, should therefore not be confused with the more general digital camera model identification. The idea behind finding the source is that the unique identification of the camera maps directly to the smartphone. The fingerprint of a smartphone thus transitively maps to the user who is the registered owner of that device. As a result, if the images of two profiles map to the same smartphone camera, we can assume with high probability that the two profiles belong to the same user, which helps in user verification and resolution.

1.2 Problem Statement

We explore the possibility of uniquely matching two different user profiles (on SNs) based on the pictures posted on them, assuming they were taken by the same smartphone camera. A smartphone can be uniquely identified by exploiting the characteristic noise present in the images it takes.

Let $UP = \{U_1, U_2, \ldots, U_n\}$ be the set of user profiles across social platforms. Let $I_{U_i} = \{I_{i1}, I_{i2}, \ldots, I_{im}\}$ be the set of images posted by the user $U_i \in UP$ on his or her profile. We try to verify whether any two specific user profiles $U_x, U_y \in UP$ actually belong to the same user. To resolve this, our approach uses the sets of images belonging to the user profiles under investigation, that is, $I_{U_x}$ and $I_{U_y}$.

Let $f$ be the function that extracts the characteristic noise from an image $I_{ij} \in I_{U_i}$. The characteristic noise of a particular user profile $U_i$ is obtained by averaging the characteristic noise of all the images belonging to that profile:

$$\frac{1}{m} \sum_{j=1}^{m} f(I_{ij}) = CN(I_{U_i}) \equiv CN(U_i) \qquad (1)$$

We then use this outcome, which can be considered the characteristic noise $CN(U_i)$ of the user profile, to decide whether two user profiles belong to the same user. We define the function $g$, which takes $CN(U_x)$ and $CN(U_y)$ as input and decides whether $U_x$ and $U_y$ belong to the same user:

$$g(CN(U_x), CN(U_y)) = \begin{cases} 1 & \text{if } U_x \text{ and } U_y \text{ belong to the same user} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$


1.3 Contribution

We propose a method that exploits the smartphone camera's fingerprint to solve the fake profile problem. Our method extracts the fingerprint from images based on the characteristic noise present in them. The approach also copes with the compression techniques used by various SNs.

In our analysis we have compared the original, unprocessed pictures taken by smartphones with the pictures processed by SNs. The method is able to match images from different SNs. Moreover, the method does not require the original images for verification and resolution, since in some cases, for reasons such as privacy, the original images might not be available.

The rest of the paper is organized as follows. In Section 2, we review the literature related to smartphone fingerprinting techniques and forensic investigations on SNs. In Section 3, we describe our methodology. The results of our evaluation, along with their analysis, are discussed in Section 4. We conclude the article with future directions.

2 RELATED WORKS

In this section, we review the literature from three different domains, at the intersection of which our work lies. Firstly, in Section 2.1, we explain techniques to fingerprint smartphone devices using various built-in sensors. Then, in Section 2.2, we discuss the source camera identification approaches proposed in the literature. Finally, in Section 2.3, we present methods to extract useful information from user profiles on SNs for investigation activities.

2.1 Fingerprinting the Smartphones

Various techniques for creating smartphone fingerprints have been proposed recently. These methods are based on the built-in sensors present in smartphones. For example, a technique that uses the microphones and speakers of smartphones to create a unique fingerprint is described in [8]. A method to track users through their mobile phones using smartphone accelerometers is proposed in [9]. A hybrid approach is proposed in [3], where the authors exploit both (i) the speakerphone-microphone system and (ii) the accelerometer to create a unique fingerprint for the smartphone. We exploit the same phenomenon, namely that sensor imperfections yield a device fingerprint, to verify user profiles and resolve multiple profiles in SNs, which is the main focus of our study. The smartphone camera can be considered a built-in sensor, and in the next section we present works related to source camera identification. This is important because we use images captured by smartphone cameras for our investigations.

2.2 Fingerprinting Source Camera Identification

Researchers have also proposed various techniques for source camera identification based on hardware and software defects, which are introduced during the manufacturing process. A technique based on chromatic aberration is presented in [25], while in [19] a method to recognize a counterfeit image using the filters is presented. The main filter involved is the Colour Filter Array (CFA) which, combined with a monochrome sensor, makes it possible to capture multiple colours with a grid of pixels. However, methods based on the footprints left by the sensor are more suitable for the source camera identification problem. In [16], the authors showed that Photo-Response Non-Uniformity (PRNU) is a unique feature of the sensor that is able to distinguish two cameras. However, this method only works with unscaled photos. This limitation was overcome in [11], which proposes a method that can work with images of different sizes. In another work [5], the authors demonstrate that the PRNU-based method is robust enough to overcome the changes made to the images by the various SNs, which commonly cause a loss of effectiveness. In contrast to all these approaches to source camera identification, we identify the source camera of smartphones using the PRNU-based method with the aim of resolving user profiles on SNs.


2.3 User Profiles Investigation on SNs

SNs often include strict rules in their terms of service: “You will not provide any false personal information on Facebook” or “You will not create more than one personal account” (from the “Statement of Rights and Responsibilities” of the Facebook Legal Terms). However, the large amount of data uploaded on SNs often does not follow these rules strictly [28].

In [1], the authors analysed user activities using the logs found in smartphone devices, which can be used to match user profiles. However, unlike BlackBerry, not all devices retain this information. In comparison, our method is device independent. An interesting solution has been proposed in [13] by exploiting the multimedia content on SNs: the authors combine user IDs and their tags to identify users across social tagging systems.

In [27] and [15], the authors analysed users' information from different SNs to identify and match two different user profiles. However, the method fails in the presence of false information, which is usually the case with fake profiles. Works like [20] and [2] address the matching of user profiles by providing a weight-based framework, while in [14] the authors combine content and network attributes with profile attributes to improve a traditional identity search algorithm. A machine learning technique is proposed in [17] for solving the user profile matching problem across multiple SNs. It uses 27 features grouped into three types: name-based features, general user information features and SN topology features. In [6], a weighted ontology-based user profile resolution technique is proposed for profile matching. Generally, these techniques assume that the two user profiles under investigation intentionally share attributes and features. In comparison to these methods, our approach exploits the smartphone's camera to verify and match user profiles, in order to uncover fake profiles.

3 METHODOLOGY

In this section, we first present a brief background in image processing. This is important as it helps in understanding the reasons behind the selection of a particular method to extract the noise. Next, we describe the procedure followed to extract the characteristic noise from an image and the process to verify the image source. Finally, we describe the experimental setting used to evaluate our approach with various SNs and smartphones.

Images captured by cameras¹ have two components, namely signal and noise. Technically, the signal represents the light that hits the camera sensor; in simple words, it represents the image being captured. However, there is always an unavoidable portion of noise present in the images.

The noise itself consists of two components. The first is called shot noise (or photonic noise) and is introduced by external factors such as temperature, brightness or humidity. The second, pattern noise, is systematic and regular: systematic means that it is present in each image, and regular means that, in all the images taken by the same source, it is present at almost the same locations.

3.1 Pattern Noise Extraction

In this section, we define the function $f$ used to extract the smartphone camera's fingerprint from a set of images. We exploit a PRNU-based method [16] to extract the pattern noise systematically introduced by the sensor (step c in Figure 2), as it is the most suitable for the problem of source camera verification.

The pattern noise can be estimated as an average over a large number of images. Denoising algorithms [4] can be employed to remove the representative (signal) component, so that only the noise component is left in the image. Let $I$ be the original image, $N$ the residual noise and $d$ the denoising function; then $N$ can be formally written as:

N = d(I) (3)

¹ This also applies to images taken by smartphones.


The fingerprint $FP_i$ of the camera $i$ (that is, its pattern noise) can be approximated as the average residual noise of $n$ images taken by the camera $i$ [16]:

$$FP_i = \frac{1}{n} \sum_{j=1}^{n} N_j \qquad (4)$$
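As an illustration of equation (4), the following minimal sketch averages residual noise pixel by pixel. It assumes that each residual $N_j$ has already been extracted and flattened into a list of floats of equal length; the function name `fingerprint` and the toy values are hypothetical and are not taken from the authors' implementation.

```python
def fingerprint(residuals):
    """Return the per-pixel mean of the residuals N_1..N_n (equation 4)."""
    if not residuals:
        raise ValueError("at least one residual is required")
    n = len(residuals)
    size = len(residuals[0])
    if any(len(r) != size for r in residuals):
        raise ValueError("all residuals must have the same number of pixels")
    # FP[p] = (1/n) * sum_j N_j[p] for every pixel position p
    return [sum(r[p] for r in residuals) / n for p in range(size)]


# Toy residuals of three 3-pixel "images" (illustrative values only).
residuals = [
    [0.2, -0.1, 0.4],
    [0.0, -0.3, 0.2],
    [0.4, -0.2, 0.0],
]
fp = fingerprint(residuals)
```

With real images the residuals would come from the denoising step of equation (3), and averaging more of them would attenuate the image content that leaks into each single residual.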

The choice of the denoising algorithm can affect the pattern noise computation because of high-frequency details, which generally belong to the signal component of the image. Increasing the number of samples reduces these errors; however, acquiring new samples is difficult and in several cases impossible. We use the Block Matching 3D denoising algorithm ($d \equiv d_{BM3D}$) [7], which is effective at distinguishing the high frequencies of the noise from the high frequencies of the image content [12].

3.2 Source Verification

Let $S_i$ be the whole set of images taken by the source $i$. Then:

• $P_i \subset S_i$ is the subset of images used to generate $FP_i$ according to (4). $P_i$ represents the training set of our method;

• $C_i \subset S_i$ is the subset of images to be classified. $C_i$ represents the test set of the camera $i$;

• $P_i \cup C_i = S_i$;

• $P_i \cap C_i = \emptyset$;

• $V = \bigcup_i C_i$ is the whole test set of $m$ images to be correlated with each $FP_i$.
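The partition above can be sketched in a few lines. This is only an illustration under stated assumptions: the file names are hypothetical, the split simply takes the first images of each source as $P_i$, and the sizes (200 images per source, 100 used for training) follow the experimental setting described later in the paper.

```python
def split_source(images, training_size):
    """Split the images S_i of one source into P_i (training) and C_i (test)."""
    if not 0 < training_size < len(images):
        raise ValueError("training_size must leave both subsets non-empty")
    return images[:training_size], images[training_size:]


# Hypothetical file names: 5 sources with 200 images each.
sources = {i: [f"device{i}_{j:03d}.jpg" for j in range(200)] for i in range(1, 6)}

training_sets, test_sets = {}, {}
for source_id, images in sources.items():
    training_sets[source_id], test_sets[source_id] = split_source(images, 100)

# V is the union of all the test sets C_i.
V = [image for c_i in test_sets.values() for image in c_i]
print(len(V))  # 500
```

Because $P_i$ and $C_i$ are disjoint, no image used to build a fingerprint is later classified against it.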

The goal is to determine, for each image $I \in V$, whether it belongs to the source $i$. In practice, we are defining the function $g$ of (2), which decides whether two user profiles $U_x$ and $U_y$ belong to the same user by exploiting their images. In the first step, according to (3), we extract the residual noise $N_k$ from each image $I_k \in V$ of the unknown source. Then we compute the standard normalized correlation $corr(N_k, FP_i)$ between $N_k$ and $FP_i$, as done in [16]:

$$corr(N_k, FP_i) = \frac{(N_k - \overline{N}_k) \cdot (FP_i - \overline{FP}_i)}{\|N_k - \overline{N}_k\| \, \|FP_i - \overline{FP}_i\|} \qquad (5)$$

where the overline denotes the mean value.
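The normalized correlation of equation (5) can be sketched as follows. The sketch assumes that the residual and the fingerprint are flattened lists of floats of equal length; the function name `corr` mirrors the notation of the text, while the choice of returning 0.0 for a constant input is an assumption made here to avoid a division by zero.

```python
import math


def corr(noise, fingerprint):
    """Normalized correlation between a residual N_k and a fingerprint FP_i."""
    if not noise or len(noise) != len(fingerprint):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_n = sum(noise) / len(noise)
    mean_f = sum(fingerprint) / len(fingerprint)
    # Subtract the mean from each vector before correlating.
    dn = [x - mean_n for x in noise]
    df = [x - mean_f for x in fingerprint]
    numerator = sum(a * b for a, b in zip(dn, df))
    denominator = math.sqrt(sum(a * a for a in dn)) * math.sqrt(sum(b * b for b in df))
    if denominator == 0:
        return 0.0  # assumption: a constant vector carries no pattern
    return numerator / denominator
```

The score lies in the interval [-1, 1]: values near 1 indicate that the residual follows the same pattern as the fingerprint, values near 0 indicate no relation.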

This similarity score is computed between each residual noise $N_k$ of the images $I_k \in V$, which belong to the unknown source, and the $FP_i$ of each source $i$. Let $\mu_i$ be the mean correlation value for the source $i$, and $\sigma_i$ its standard deviation:

$$\mu_i = \frac{1}{m} \sum_{k=1}^{m} corr(N_k, FP_i) \qquad (6)$$

$$\sigma_i = \sqrt{\frac{1}{m} \sum_{k=1}^{m} \left( corr(N_k, FP_i) - \mu_i \right)^2} \qquad (7)$$

Figure 3. Two examples of the distribution of the correlation values. In (a) the threshold value is computed for the iPhone 4S (device 1), while in (b) it is computed for the iPhone 5 (device 3) (see Section 4).


In the second step, we define a threshold $T_i$ for each source $i$ as:

$$T_i = \mu_i + \sigma_i \qquad (8)$$

All the images in $V$ whose correlation value exceeds the threshold $T_i$ are identified as taken by the camera $i$. In other words, this is the function $g$ that determines whether the owner of the set $P_i$ is the same user who owns the images in $V$. Each time a new $P_i$ is defined, a new threshold value must be computed. We define the threshold as the mean correlation value plus one standard deviation; this choice is motivated by the distribution of the correlation values shown in Figure 3.
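The thresholding rule of equations (6) to (8) can be illustrated with a short sketch. It assumes that the correlation of every image in $V$ with one fingerprint $FP_i$ has already been computed; the function names and the toy scores are hypothetical.

```python
import math


def threshold(correlations):
    """T_i = mu_i + sigma_i, using the 1/m (population) standard deviation."""
    m = len(correlations)
    if m == 0:
        raise ValueError("at least one correlation value is required")
    mu = sum(correlations) / m
    sigma = math.sqrt(sum((c - mu) ** 2 for c in correlations) / m)
    return mu + sigma


def attributed_to_source(correlations):
    """Flag the images of V whose correlation exceeds the threshold T_i."""
    t = threshold(correlations)
    return [c > t for c in correlations]


# Toy scores: four unrelated images and two images from the source under test.
scores = [0.01, 0.02, 0.00, 0.03, 0.45, 0.50]
flags = attributed_to_source(scores)
```

In this toy case only the last two images exceed the threshold. The rule relies on most images in $V$ coming from other sources, so that the few matching images stand out from the bulk of low correlation values.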

3.3 Experimental Setting

We performed our evaluation on five smartphones from two different brands. Table 1 summarizes the characteristics of these devices. We collected 200 high-resolution photographs with each of these phones, taken under different conditions. As proposed in [5], we resize the images to normalize their dimensions and, as suggested in [16], we crop a 512x512 central window, small enough to cut out any lens aberrations.
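The central crop can be sketched as below. The sketch only covers the crop: the resizing step is omitted because the target dimensions are not stated here, and the 960x720 example simply reuses the Facebook resolution listed in Table 2. The function names are hypothetical, and an image is modelled as a list of pixel rows.

```python
def central_window(width, height, size=512):
    """Return (left, top, right, bottom) of the centred size x size window."""
    if width < size or height < size:
        raise ValueError("image is smaller than the requested window")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def crop(pixels, box):
    """Crop an image stored as a list of rows to the given box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


box = central_window(960, 720)
print(box)  # (224, 104, 736, 616)
```

Keeping only the centre of the frame discards the borders, where lens aberrations are typically strongest.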

Table 1. Features of the smartphones chosen.

ID  Brand    Model      Resolution
1   Apple    iPhone 4s  3264x2448
2   Apple    iPhone 4s  3264x2448
3   Apple    iPhone 5   3264x2448
4   Samsung  Galaxy S4  4128x3096
5   Samsung  Galaxy S4  4128x3096

Table 2. Characteristics of the SNs.

Service   Resolution  Quality
Facebook  960x720     medium
Google+   2048x1536   high
WhatsApp  800x600     low

For our analysis, we select the Facebook, Google+ and WhatsApp SNs. Each of these SNs applies different compression to the images, as summarized in Table 2. In contrast to Facebook and Google+, WhatsApp is an application that was initially conceived for mobile users only.

The validation phase is divided into the following three tests.

Test 1: Original-By-Original. Before applying the method to downloaded images, we need to verify that it works on the original ones. Thus, the goal of this test is to validate the proposed method by applying it to the original images taken by the smartphones. In practice, we try to verify the source camera of a set of images that have not been uploaded to any SN.

The test is carried out by systematically varying the cardinality of each $P_i$, in order to establish the minimum number of images needed to define $FP_i$. The threshold $T_i$ is computed each time using the $FP_i$ extracted from $P_i$, and is used to classify the images in $V$.

Test 2: Social-By-Social. In this test, the goal is to check the robustness of our approach for images whose resolution is affected by the uploading and downloading process on the SNs. This is important in order to verify which smartphone has been used for capturing and uploading the images. From an investigative and forensic point of view, this test allows the mapping between a device and the images published on an SN. In general, it is possible to match a user profile to a physical device and, transitively, to the user who is the registered owner of the smartphone.

The classification procedure is the same as in the previous test; however, in this case, both image sets $P_i$ and $V$ have previously been uploaded to and downloaded from each of the chosen SNs, as shown in Figure 4.

Figure 4. Social-by-social test. Steps (a) and (b): the 5 $P_i$ subsets, related to the 5 sources, have been uploaded to and downloaded from each SN (Facebook, Google+ and WhatsApp). Step (c): the set $V$ has been composed using the sets $C_i$ of images downloaded from the user profiles on the SNs. Step (d): the 5 different fingerprints have been extracted using the downloaded images $P_i$, to pair each source with the right user profile.

Test 3: Cross-Social. The aim of this test is to demonstrate that our approach makes it possible to verify two different user profiles, from two different SNs or from the same SN, using the images posted on them (Figure 5). This test can also be used to verify the source of images posted on an SN by directly comparing the original images and the shared ones.

In the former case, the significant outcome is the ability to match two different user profiles belonging to different SNs without using the smartphone. This is particularly useful when the smartphone is not available, since the verification of a subject can be done using another SN account. In this case too, the sets $P_i$ and $V$ have previously been uploaded to and downloaded from each SN. However, we use the subset $P_i$ downloaded from one SN to extract the $FP_i$ for each source $i$, and then classify the images in $V$ that have been downloaded from another SN. For example, a Facebook profile is linked to a Google+ profile by exploiting their respective shared photos, as shown in Figure 5.

Figure 5. Cross-social test: Facebook-Google+ case. Step (a): the 5 $P_i$ subsets, related to the 5 sources, have been downloaded from Facebook and $FP_i$ is defined. Step (b): the set $V$ has been composed using the sets $C_i$ of images downloaded from Google+. Step (c): the 5 different fingerprints have been used to classify the images in $V$, to match each Facebook user profile to the right Google+ user profile.

The latter case is useful for another type of investigative activity, namely verifying the source (smartphone) of the published images. This is particularly useful for matching a (fake) user profile with a smartphone and, transitively, with a user. In this case, $V$ represents the set of images uploaded to and downloaded from each SN, and $P_i$ the set of posted images. We again use the subsets $P_i$ to extract the $FP_i$ for each source $i$, so as to classify the original images in $V$.

For this purpose we perform all the possible combinations of the chosen SNs, as shown in Figure 6.

4 EVALUATION

In Section 3.3, we described three scenarios to evaluate the capabilities of our method. For each of them, namely (i) Original-By-Original, (ii) Social-By-Social and (iii) Cross-Social, we performed a specific test whose results are presented in this section. In particular, we use two well-known statistical indexes to assess the classification process, namely sensitivity (SEN) and specificity (SPE). In our specific context, sensitivity is the capability of the method to correctly recognize the source of the given images, while specificity is the capability to reject the images that do not belong to the source under evaluation.

Figure 6. Cross-social test. Each arrow from a point A to a point B means that we use the subset $P_i$ of A to extract $FP_i$ and classify the images in $V$ of B; for example, the arrow (b2) means that we use the subset $P_i$ downloaded from Facebook to classify the images in $V$ downloaded from Google+. We perform all twelve combinations.

Out of all the images $C_i$ to be classified as belonging to the source $i$, let $C_i^+$ be the subset of images that our algorithm has successfully assigned. We define sensitivity as:

$$SEN = \frac{|C_i^+|}{|C_i|} \qquad (9)$$

Out of all the images $\overline{C}_i = V \setminus C_i$ to be classified as not belonging to the source $i$, let $C_i^-$ be the subset of images that our algorithm has successfully rejected. We define specificity as:

$$SPE = \frac{|C_i^-|}{|\overline{C}_i|} \qquad (10)$$
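The two indexes can be computed as in the following sketch. It assumes that, for each image in $V$, we know its true source and whether the classifier attributed it to the source under evaluation; the function name, the argument layout and the toy data are hypothetical.

```python
def sensitivity_specificity(true_sources, accepted, source):
    """Return (SEN, SPE) for one source, following equations (9) and (10).

    true_sources[k] is the real source of image k in V;
    accepted[k] is True when image k was attributed to `source`.
    """
    own = [a for s, a in zip(true_sources, accepted) if s == source]
    others = [a for s, a in zip(true_sources, accepted) if s != source]
    if not own or not others:
        raise ValueError("V must contain images from this source and from others")
    sen = sum(own) / len(own)                          # correctly assigned
    spe = sum(not a for a in others) / len(others)     # correctly rejected
    return sen, spe


# Toy example: 4 images from source 1, 6 images from other sources.
true_sources = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
accepted = [True, True, True, False, False, True, False, False, False, False]
sen, spe = sensitivity_specificity(true_sources, accepted, source=1)
```

Here three of the four images of source 1 are accepted (SEN = 0.75) and five of the six foreign images are rejected (SPE of about 0.83).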

4.1 Test 1: Original-By-Original

We perform this first test to determine the minimum cardinality of each set $P_i$ with which we can correctly extract the pattern noise. We start with a single image, and then the cardinality is increased in the sequence 2, 3, 4, 5, 10, 15, 20, 40, 60, 80, 100, 120 and 140, where each cardinality defines a subtest, with its own $P_i$, for each device. We fix the limit at 140 because of resource constraints. Each cardinality is evaluated in a two-step process: in the first step, we extract the pattern noise and calculate the threshold; in the second step, we classify the images and evaluate SEN and SPE. For each cardinality under evaluation, this procedure is carried out for all the devices.

Figure 7 shows the mean of the sensitivity and specificity values as the cardinality increases, for each smartphone. The results suggest fixing the cardinality of the set $P_i$ at 100 images. The choice of $|P_i| = 100$ is motivated by two reasons: (i) starting from this value, the mean of the sensitivity and specificity indexes for each source remains stable above 97%; (ii) a larger cardinality helps the BM3D function in discriminating the high frequencies. Using 100 of the 200 images of each smartphone to define the fingerprint $FP_i$, the set $V$ is composed of the remaining 500 images.

Figure 7. The graph shows the mean of sensitivity and specificity for each smartphone, obtained by changing the cardinality of the training set $P_i$.

With the cardinality fixed at 100 images, we compute the SEN and SPE indexes for each smartphone. We represent the sensitivity using a confusion matrix (see Table 3), which is a standard representation for statistical classifiers that highlights the percentage error. Each column and row is identified by a device ID from Table 1. The main diagonal shows the percentage of images correctly assigned, while the upper and lower triangles of the matrix show the incorrect assignments. We augment the matrix with two columns that show the SEN and SPE values for each smartphone. As shown in the table, the method reaches 100% sensitivity for each device; in other words, it has verified the source of every image in the original-by-original test. The specificity is at least 95% for every device, meaning that the method correctly discards at least 95% of the images that belong to a different source.

Table 3. The original-by-original confusion matrix.

ID  1       2       3       4       5       SEN     SPE
1   100.00  0.00    0.00    0.00    0.00    100.00  95.47
2   0.00    100.00  0.00    0.00    0.00    100.00  96.62
3   0.00    0.00    100.00  0.00    0.00    100.00  95.47
4   0.00    0.00    0.00    100.00  0.00    100.00  97.80
5   0.00    0.00    0.00    0.00    100.00  100.00  95.01

4.2 Test 2: Social-By-Social

In the second test we classify the images downloaded from the three different SNs using the threshold computed after the downloading operation. The results obtained for Facebook, Google+ and WhatsApp are shown in Tables 4, 5 and 6 respectively. As in the original-by-original case, we present the confusion matrix along with the SEN and SPE values.

Compared to the other two SNs, Google+ applies the least compression to the images (see Table 2). This explains why Google+ obtains the highest average sensitivity and specificity over the smartphones, 100% and 97.56% respectively. Facebook compresses the images more than Google+; nevertheless, the method obtains good average sensitivity and specificity, 96.92% and 92.92% respectively. WhatsApp is a mobile application mainly conceived for smartphones. For this reason, WhatsApp applies a higher compression level than Facebook and Google+, which are mainly used on large-screen computers with medium/high definition. This higher compression level significantly alters the information content. As a result, WhatsApp yields the worst average sensitivity and specificity, 92.52% and 90.84% respectively.

Table 4. The Facebook confusion matrix.

ID  1       2       3       4       5       SEN     SPE
1   100.00  0.00    0.00    0.00    0.00    100.00  95.01
2   0.00    100.00  0.00    0.00    0.00    100.00  93.46
3   0.00    1.33    98.67   0.00    0.00    98.67   93.88
4   4.23    2.82    2.82    85.92   4.23    85.92   90.91
5   0.00    0.00    0.00    0.00    100.00  100.00  91.32

Table 5. The Google+ confusion matrix.

ID  1       2       3       4       5       SEN     SPE
1   100.00  0.00    0.00    0.00    0.00    100.00  99.01
2   0.00    100.00  0.00    0.00    0.00    100.00  99.50
3   0.00    0.00    100.00  0.00    0.00    100.00  97.32
4   0.00    0.00    0.00    100.00  0.00    100.00  98.52
5   0.00    0.00    0.00    0.00    100.00  100.00  93.46

Table 6. The WhatsApp confusion matrix.

ID  1       2       3       4       5       SEN     SPE
1   85.33   2.67    4.00    8.00    0.00    85.33   91.53
2   5.19    94.81   0.00    0.00    0.00    94.81   93.62
3   0.00    3.23    96.77   0.00    0.00    96.77   90.87
4   4.76    3.17    6.35    85.71   0.00    85.71   89.47
5   0.00    0.00    0.00    0.00    100.00  100.00  88.69

Tables 4, 5 and 6 provide detailed information about sensitivity and specificity for the social-by-social test. We also plot the SEN and SPE values of the original images together with those of the downloaded ones in the same 3D graphs: Figure 8 compares original-by-original with social-by-social for each smartphone. On average, the method is able to verify which device has taken and uploaded the images on the SNs with 96.48% sensitivity and 93.77% specificity.
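The overall averages can be reproduced from the SEN and SPE columns of Tables 4, 5 and 6. The sketch below assumes that the overall figure is the mean over the three SNs of the per-SN mean over the five devices; this aggregation is an assumption, although it matches the reported values.

```python
# SEN and SPE columns copied from Tables 4, 5 and 6 (devices 1 to 5).
SEN = {
    "Facebook": [100.00, 100.00, 98.67, 85.92, 100.00],
    "Google+": [100.00, 100.00, 100.00, 100.00, 100.00],
    "WhatsApp": [85.33, 94.81, 96.77, 85.71, 100.00],
}
SPE = {
    "Facebook": [95.01, 93.46, 93.88, 90.91, 91.32],
    "Google+": [99.01, 99.50, 97.32, 98.52, 93.46],
    "WhatsApp": [91.53, 93.62, 90.87, 89.47, 88.69],
}


def mean(values):
    return sum(values) / len(values)


# Per-SN averages over the five devices, then the average over the SNs.
sen_by_sn = {name: mean(column) for name, column in SEN.items()}
spe_by_sn = {name: mean(column) for name, column in SPE.items()}
overall_sen = mean(list(sen_by_sn.values()))
overall_spe = mean(list(spe_by_sn.values()))

print(round(overall_sen, 2), round(overall_spe, 2))  # 96.48 93.77
```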

Figure 8. The 3D graphs present the sensitivity (a) and specificity (b), comparing the classification results of the downloaded images with those of the original ones.

4.3 Test 3: Cross-Social

The last test is the most interesting one. In this case the pattern noise, characteristic of the source, has been extracted from subsets Pi that do not belong to the same category of images (original or published on an SN) as the set V. Following the schema defined in Figure 6, we perform the tests for all possible combinations of the images downloaded from the SNs and the original ones. Figures 9, 10, 11 and 12 present the SEN and SPE results obtained from each combination. In each triangular histogram the icon in the middle identifies the category selected for the subset Pi, while the three icons on the sides represent the categories that contributed to the set V of images to be classified.

As discussed above, intuitively, the method obtains the best results when FPi and Ti are computed using higher quality images. That is the case for the original images and those downloaded from Google+, as shown in Figure 9 and Figure 11.

Although Facebook compresses the images more heavily than Google+, the results are still good. The sensitivity remains high; however, the average specificity decreases (see Figure 10).

WhatsApp has the highest compression level among the three SNs; nevertheless, our method correctly resolved WhatsApp user profiles against the profiles of the other two SNs, with an average sensitivity of 98.78%, as shown in Figure 12.

The overall sensitivity values are very close to 100%, except in the Facebook–WhatsApp case. In this particular combination the sensitivity deteriorates, due to the medium/low image quality of both SNs. The average specificity is over 92% in all categories (original 95.80%, Facebook 92.39%, Google+ 95.09% and WhatsApp 94.04%). Based on the results of this test, we can conclude that the proposed method has a success rate of over 90% for matching profiles across SNs.
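The cross-social test pairs each category used for the subset Pi with each of the three remaining categories for the set V, which gives 4 x 3 = 12 combinations, three per figure in Figures 9 to 12. The following Python sketch enumerates these pairs and shows how SEN and SPE could be derived from a threshold decision. It assumes that an image is attributed to a device when its similarity score against FPi is at or above Ti. The score values, the threshold and the function names are hypothetical, and the computation of the score itself is not shown.

```python
from itertools import permutations

CATEGORIES = ["original", "Facebook", "Google+", "WhatsApp"]


def cross_social_pairs(categories=CATEGORIES):
    """All ordered (reference, test) pairs drawn from different categories."""
    return list(permutations(categories, 2))


def sen_spe(scores_same, scores_other, threshold):
    """Sensitivity and specificity (in %) of a 'score >= threshold' decision.

    scores_same:  scores of images taken by the fingerprinted device
    scores_other: scores of images taken by any other device
    """
    true_positives = sum(s >= threshold for s in scores_same)
    true_negatives = sum(s < threshold for s in scores_other)
    sensitivity = 100.0 * true_positives / len(scores_same)
    specificity = 100.0 * true_negatives / len(scores_other)
    return sensitivity, specificity


pairs = cross_social_pairs()
for reference, test in pairs:
    print(f"fingerprint and threshold from {reference}, images from {test}")

# Hypothetical similarity scores, invented for illustration only.
same_device = [0.09, 0.12, 0.03, 0.15]
other_devices = [0.01, 0.02, 0.06, 0.00, 0.01]
sen, spe = sen_spe(same_device, other_devices, threshold=0.05)
print(f"SEN {sen:.2f}%  SPE {spe:.2f}%")
```

Raising the threshold in such a scheme trades sensitivity for specificity, which is one way to read the lower specificity observed when the threshold comes from more heavily compressed images.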

Figure 9. The threshold is computed using the subset Pi composed of the original images. It is used to classify the images downloaded from WhatsApp (top left), Facebook (top right) and Google+ (bottom), respectively (a2), (a1) and (a3) in Figure 6.

Figure 10. The threshold is computed using the subset Pi downloaded from Facebook. It is used to classify the images downloaded from Google+ (top left) and WhatsApp (bottom) and the original images (top right), respectively (b2), (b3) and (b1) in Figure 6.

Figure 11. The threshold is computed using the subset Pi downloaded from Google+. It is used to classify the images downloaded from WhatsApp (top left) and Facebook (top right) and the original images (bottom), respectively (c3), (c2) and (c1) in Figure 6.

Figure 12. The threshold is computed using the subset Pi downloaded from WhatsApp. It is used to classify the images downloaded from Google+ (top left) and Facebook (top right) and the original images (bottom), respectively (d3), (d2) and (d1) in Figure 6.

5 CONCLUSIONS AND FUTURE WORK

Fake profile identification is an important research problem in the domain of cyber security. It is particularly relevant for crime prevention and detection activities involving smartphones [1]. In this paper, we have presented a method for matching SN user profiles using the images shared on these platforms, in order to address the fake profile problem. We chose five smartphones of two different brands, with two pairs of identical models, and three SNs with different image compression characteristics. We have shown that it is possible to verify the device which has taken and published the images on a specific user profile on Facebook, Google+ or WhatsApp. We used only images from the SNs to extract the pattern noise and verify the source camera. Finally, we have demonstrated how distinct user profiles belonging to different SNs can be mapped using the images from the SNs. In practice, we are able to create a fingerprint of the user profile.

The method proposed in this article may fail as the number of devices, user profiles and SNs with different characteristics increases. To overcome this, we plan to explore a more general-purpose methodology that decides whether two user profiles Ux and Uy belong to the same user. As future work, we plan to include a clustering algorithm in our existing approach. In this way, we intend to solve several other problems, for example images of the same user taken from different sources and the classification of images with different resolutions taken from the same source (front and rear camera). Future work also includes testing our approach on a large set of heterogeneous devices. As video sharing is also a common behaviour on SNs, we would also like to test our method on frames extracted from videos.

6 ACKNOWLEDGMENTS

This work has been supported in part by the Italian Ministry of Education, Universities and Research IMPACT project (RBFR107725) and OPLON project (SCN 00176).

REFERENCES

[1] N. Al Mutawa, I. Baggili, and A. Marrington. Forensic analysis of social networking applications on mobile devices. Digital Investigation, 9:S24–S33, 2012.

[2] S. Bartunov, A. Korshunov, S.-T. Park, W. Ryu, and H. Lee. Joint link-attribute user identity resolution in online social networks. In Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining, Workshop on Social Network Mining and Analysis. ACM, 2012.

[3] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh. Mobile device identification via sensor fingerprinting. CoRR, abs/1408.1416, 2014.

[4] A. Buades, B. Coll, and J. M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490–530, 2005.

[5] A. Castiglione, G. Cattaneo, M. Cembalo, and U. F. Petrillo. Experimentations with source camera identification and online social networks. Journal of Ambient Intelligence and Humanized Computing, 4(2):265–274, 2013.

[6] K. Cortis, S. Scerri, I. Rivera, and S. Handschuh. An ontology-based technique for online profile resolution. In Social Informatics, pages 284–298. Springer, 2013.

[7] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising with block-matching and 3D filtering. In Electronic Imaging 2006, pages 606414–606414. International Society for Optics and Photonics, 2006.

[8] A. Das, N. Borisov, and M. Caesar. Do you hear what I hear?: Fingerprinting smart devices through embedded acoustic components. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, pages 441–452, New York, NY, USA, 2014. ACM.

[9] S. Dey, N. Roy, W. Xu, R. R. Choudhury, and S. Nelakuditi. Accelprint: Imperfections of accelerometers make smartphones trackable. In 21st Annual Network and Distributed System Security Symposium, NDSS 2014, San Diego, California, USA, February 23-26, 2014.

[10] Facebook Inc. Form 10-K Annual Report. Technical Report 001-35551, Securities and Exchange Commission, December 2013.

[11] M. Goljan and J. Fridrich. Camera identification from cropped and scaled images. In Electronic Imaging 2008, pages 68190E–1–68190E–13. International Society for Optics and Photonics, 2008.

[12] A. Iannì. Creazione e analisi del pattern noise di una fotocamera digitale. Bachelor’s degree thesis, University of Palermo, 2013.

[13] T. Iofciu, P. Fankhauser, F. Abel, and K. Bischoff. Identifying users across social tagging systems. In ICWSM, pages 522–525, 2011.

[14] P. Jain, P. Kumaraguru, and A. Joshi. @i seek ’fb.me’: identifying users across multiple online social networks. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 1259–1268. International World Wide Web Conferences Steering Committee, 2013.

[15] X. Liang, X. Li, K. Zhang, R. Lu, X. Lin, and X. Shen. Fully anonymous profile matching in mobile social networks. IEEE J. Select. Areas Commun., 31:641–655, 2013.

[16] J. Lukas, J. Fridrich, and M. Goljan. Digital camera identification from sensor pattern noise. Information Forensics and Security, IEEE Transactions on, 1(2):205–214, 2006.

[17] O. Peled, M. Fire, L. Rokach, and Y. Elovici. Entity matching in online social networks. In Social Computing (SocialCom), 2013 International Conference on, pages 339–344. IEEE, 2013.

[18] R. Poisel and S. Tjoa. Forensics investigations of multimedia data: A review of the state-of-the-art. In IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on, pages 48–61. IEEE, 2011.

[19] A. C. Popescu and H. Farid. Exposing digital forgeries in color filter array interpolated images. Signal Processing, IEEE Transactions on, 53(10):3948–3959, 2005.


[20] E. Raad, R. Chbeir, and A. Dipanda. User profile matching in social networks. In Network-Based Information Systems (NBiS), 2010 13th International Conference on, pages 297–304. IEEE, 2010.

[21] P. Ross. Photos are still king on Facebook. http://www.socialbakers.com/blog/2149-photos-are-still-king-on-facebook, 2014.

[22] L. Santhanam. Teens often turn to smartphones as gateway to social media. http://www.pbs.org/newshour/rundown/teens-often-turn-smartphones-gateway-social-media/, 2015.

[23] R. Sharma, M. Magnani, and D. Montesi. Missing data in multiplex networks: a preliminary study. Signal-Image Technology and Internet-Based Systems (SITIS), 2014 Tenth International Conference on, pages 401–407, 2014.

[24] R. Sharma, M. Magnani, and D. Montesi. Investigating the types and effects of missing data in multilayer networks. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, pages 392–399, 2015.

[25] L. T. Van, S. Emmanuel, and M. S. Kankanhalli. Identifying source cell phone using chromatic aberration. In Multimedia and Expo, 2007 IEEE International Conference on, pages 883–886. IEEE, 2007.

[26] T. Van Lanh, K.-S. Chong, S. Emmanuel, and M. S. Kankanhalli. A survey on digital camera image forensic methods. In Multimedia and Expo, 2007 IEEE International Conference on, pages 16–19. IEEE, 2007.

[27] J. Vosecky, D. Hong, and V. Y. Shen. User identification across multiple social networks. In Networked Digital Technologies, 2009. NDT ’09. First International Conference on, pages 360–365. IEEE, 2009.

[28] M. L. Ybarra and K. J. Mitchell. How risky are social networking sites? A comparison of places online where youth sexual solicitation and harassment occurs. Pediatrics, 121(2):e350–e357, 2008.
