23
OSSPolice - Identifying Open-Source License Violation and 1-day Security Risk at Large Scale Ruian Duan, Ashish Bijlani, Meng Xu Taesoo Kim, Wenke Lee ACM CCS 2017 1

OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

OSSPolice - IdentifyingOpen-SourceLicenseViolationand1-daySecurityRiskatLargeScale

Ruian Duan,AshishBijlani,Meng XuTaesoo Kim,Wenke Lee

ACMCCS2017

1

Page 2: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Background

• OpenSourceSoftware(OSS)isgainingpopularity,e.g.GitHubreported20Musersand57Mrepos

• Mobileappmarketgrowsfastwithover2MappsonPlayStore

• DevelopersreuseOSSasisforlotsofbenefits

• Legalrisksandsecurityrisksarise

2

Page 3: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

RisksinOSSuse

• OSSlicenseshaveconstraints(e.g.GNUGPLrequiresderivativeworkstoopensource)

• 1-dayvulnerabilitiesinstaleOSSversionsareexploitedbyhackers

3

Fornow,GNUGPLisanenforceablecontract,saysUSfederaljudge!

Artifex SlapsPalmwithPDFReaderCopyrightSuit

Equifaxblamesopen-sourcesoftwareforitsrecord-breakingsecuritybreach

CommunityHealthSystemsBreachPossibleduetoHeartbleedVulnerability

Page 4: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Goal

• Designatool,OSSPolice,toanalyzeAndroidappsforopen-sourcelicenseviolationand1-daysecurityriskbydetectingreuseofOSSandtheirversionsatlargescale

• Requirements• AccuratedetectionforhundredsofthousandsofOSS• Accurateversionpinpointing• Efficientresourceusage• FastsearchtosupportvettingalargenumberofAndroidapps

4

Page 5: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Overviewandchallenges

• Featureselection• Sourcevsbinary:automaticallybuildingsourcecodeishard,duetodependencies,variousbuildconfigs etc.

• CompareAppagainstOSS• Fusedappbinaries:multipleOSScanbelinkedorcompiledintoasinglefile• Partialbuildsandinternalcodeclones:notallOSSfeaturesarebuiltintolibrariesandOSSreusesotherOSS

• IdentifyOSSversions• Cross-matchofuniqueversionfeatures:fusedappbinariesandinternalcodeclonescanconfusetheprovenanceofuniquefeatures

5

Page 6: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Sourcevsbinary

• C/C++OSSarebuiltintostrippednativesharedlibraries(sofiles)

• JavaOSSarebuiltintoobfuscateddalvik executables(dex files)

6

SourceCode SharedLibrary StrippedSharedLibraryFoo.c

voidfoo(){w=“hello”…}

.text.dynsym

.rodata.symtab

.debug_info

Bar.cstaticbar(){w=“world”}

.text.dynsym

.rodata

Sourcecode Dalvik Bytecode ObfuscatedDalvikBytecode.classedu/gatech/Foo

.methodbarconst-stringv1,"HelloWorld”invoke-virtual{v0,v1},println

packageedu.gatech;classFoo{bar(){println(“helloworld”)};}

.classa .methodaconst-stringv1,"HelloWorld”invoke-virtual{v0,v1},println

Page 7: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Featureselection

• C/C++OSSvssofiles• Stringliteral

• Clang-basedlexer forOSSand.rodata forlibraries• Exportedfunction

• Clang-basedparserforOSSand.dynsym forlibraries

• JavaOSSvsdex files• Stringconstant• Normalizedclass

• Capturesinteractionwithframework• Functioncentroid

• Capturesintra-proceduralcontrolflow 7

Page 8: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Fusedappbinaries

• AnappusesmultipleOSS• !"#∩%&&

!"#

• %&&∩!"#%&&

• Iterate𝑁 OSShas𝑂(𝑁) timecomplexity

• FlagallOSSbeingusedatthesametime• IndexOSSandtheirversions!

8

edu.gatech.example

MuPDFOpenCV

OpenSSL OkHttpMoPubLog4j

Page 9: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Flatindexingandmatching

• Indexing:MapsfeaturestoOSS• Matching:Lookupfeature->OSSmappingtoidentifyOSSreuse

• Flatindexingblowuptableto90Gafterindexing7KOSS• IndexingmultipleversionsofOSSfurtheraddstotheproblem• Given𝑁 OSSwith𝐹 featuresand𝑉 versions,𝑂(𝑁𝐹𝑉) spacecomplexity

9

feature1

feature2

feature3

MuPDF

OpenCVedu.gatech.example

Page 10: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Partialbuildsandinternalcodeclones

10

repodir file

LibJPEG LibPNG

MuPDF OpenCV

source thirdparty 3rdparty modules/core

test-dev.cpppdf-lex.c opengl.cpp test-io.cpp

pdf fitz testsrc

jpeglib.hpngtest.c

png.c…… … … …

Internalcodeclonesconfusesthird-partywithcoreandrequires

highmatchratiotofilter

Partialbuilds (e.g.examples,tests)causesthematchratio

tobelow

Page 11: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Hierarchicalindexingandmatching

• HierarchicalIndexing• Recordssourcehierarchytotrackinternalclones• UsesSimhash algorithmtogenerateidsfornon-leafnodesfordeduplication• Recorduniquefeaturesacrossversionsviaseparatelists

• HierarchicalMatching• NormScore (TF-IDFbased)topromoteuniquepartswhencomputingmatchingratioofanode• Allow partialbuildsbyskippingnodeswithlowratio• Drop internalcodeclonesbyskippingnodeslikelytobethird-party

11

feature1

feature2

feature3

file1

file2

file3

dir 1

dir 2

dir 3

dir 4

dir 5MuPDFOpenCVLibPNG

edu.gatech.example

Page 12: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Cross-matchofuniqueversionfeatures

12

1.5.0

1.6.0

1.2.46

foo_string

int bar_func()

MuPDFV 1.5

V1.6

LibPNGV 1.2.46

V1.6.0

edu.gatech.exampleMuPDF V1.6

LibPNG V1.2.46

Page 13: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Context-basedfiltering

• Leveragecontextinformationinhierarchicalindexingtable• UseNormScore toassigndifferentweightstofeatures

13

MuPDF V1.6

LibPNG V1.6.0

pdf.c

1.6.0

int pdf_read()

png.c

1.6.0

int png_read()

edu.gatech.exampleMuPDF V1.6

LibPNG V1.2.46

Page 14: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Evaluation

• FDroid Apps• 4,469apps,579withnativelibraries• 295C/C++OSSuses,7,055JavaOSSuses

• BAT:internalcodeclones• LibScout:partialbuilds(coderemoval)

14

55matches

020406080100

Precision (%) Recall (%) VersionPrecision(%)

C/C++OSSEvaluationResults

OSSPolice BAT

478matches

295matches

020406080100

Precision (%) Recall (%) VersionPrecision(%)

JavaOSSEvaluationResults

OSSPolice LibScout

Page 15: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Dataset

• C/C++OSSfromGitHub• 3,119popularreposand60,450OSSversions• 29%reposareGPL/AGPL• 11%reposarevulnerablewith5,611severeCVEs(𝐶𝑉𝑆𝑆 ≥ 4.0)

• JavaOSSfromMavenandJCenter• 4,777popularartifacts,77,308artifactversions• 2.3%artifactsareGPL/AGPL• 1.7%artifactsarevulnerablewith452severeCVEids

• AndroidAppsfromGooglePlay• 1.6Mapps,515,812withnativelibraries

15

Page 16: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

PerformanceandScalability

• Indexing• 60,450C/C++repos and 77,308Javarepos• Timecost is 1000svs.40sonaverage• Memorygrows sublinearly to 30GBand 9GB

• Matching• Sampled10,000GooglePlayapps• 80%ofdex andsofilesfinishwithin100sand200s

16

0 10 20 30 40 50 60 70 80Number of indexed repos(Thousands)

0.004.669.31

13.9718.6323.2827.9432.6037.25

Mem

ory

usag

e(G

B)

C/C++ Memory UsageJava Memory Usage

Page 17: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Popularlibraries

• Long-taileddistributionofOSSuses

17

050000100000150000

Top10detectedJavaOSSexcludingAndroidandGoogleOSS

#Usesaggregatedbytypes

Utils Network Social Image Codec

020,00040,00060,00080,000100,000

Top10detectedC/C++OSS #Usesaggregatedbytypes

Codec Game Font

Network Audio Viewer

Page 18: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

LegalRisks

• Morethan40KpotentialGPLviolators• MoreviolatorsusingC/C++thanJavaandencodinglibrariesdominate

18

0

500

1000

1500

iTextPDF JavaConnector

GreenDAOGenerator

Proguard Weka-Dev

Top5offendedJavaOSS

010000200003000040000

MuPDF FFmpeg PJSIP VLCandX264

BZRTP

Top5offendedC/C++OSS

#Usesaggregatedbytypes

Codec Utils Compiler

#Usesaggregatedbytypes

Codec Communication

Page 19: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

LegalRisks

• WhyviolatingGPL/AGPL?• MuPDF andiTextPDF areusedduetolackoffreealternatives

• OSSdevelopersresponses• MuPDF gotnewcustomersJ• FFmpeg andVideoLANhaveinterest,butFFmpeg cannotenforceJ• PJSIPnotinterestedduetoNDA,iText didnotreplyL

• AwarenessofOSSlicensingterms• NoneoftheappdevelopersprovidedsourcecodeyetL

19

Page 20: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

SecurityRisks

• Morethan100KappsusingvulnerableOSSversions• MoreappsusingvulnerableC/C++OSSthanJava

20

050001000015000200002500030000350004000045000

Top6C/C++and4JavavulnerableOSS

C/C++ Java

1,244LibPNG and4,919OpenSSLusesarenotdetectedbyAppSecurityImprovementProgram(ASIP)

Page 21: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

SecurityRisks

• WhichversionsofOSSdoappdeveloperschoose?• BothvulnerableandpatchedOSSarebeingused

• WhatcausestheupdateofOSSversions?• ASIPmitigatesvulnerableOSSusage,butstillremainsaproblem

21

0250500750

MoP

ub

0200400600800

Ope

nSSL

0800

16002400

OkH

ttp

2013-05-122013-11-28

2014-06-162015-01-02

2015-07-212016-02-06

2016-08-24

Date

080

160240

FFm

peg

# Vuln. Usage# Patched Usage

ASIP DeadlineASIP Notification

TimelineofOSSusageforthetop10Kapps,300Kappversions

Page 22: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Discussion

• Checkinglicensecompliancerequiresmanualefforts

• Obfuscationandoptimization• Stringencryptionindex files• Functionhidinginsofiles

• Versionpinpointing• Notallversionscanbeuniquelyidentified

• Moreprogramminglanguages(i.e.JS,Python)andplatforms(i.e.iOS)22

Page 23: OSSPolice-Identifying Open-Source License Violation and 1-day … · 2017-10-31 · GreenDAO Generator Proguard Weka-Dev Top 5 offended Java OSS 0 10000 20000 30000 40000 MuPDF FFmpeg

Conclusion

• OSSPolice:anaccurateandscalabletooltoidentifylicenseviolationsand1-daysecurityrisks• Hierarchicalindexingandmatchingscheme• Context-baseduniquefeaturefiltering

• Alargescalemeasurement• 1.6MfreeGooglePlayStoreapps• 40KcasesofpotentialGPL/AGPLviolationsand100KappsusingvulnerableOSS

• Interestinginsights• DevelopersviolateGPL/AGPLduetolackoffreealternatives• AppdevelopersusevulnerableOSSversionsdespiteeffortsfromGoogle

23