Empirical Study of Vulnerability Scanning Tools for JavaScript

Tiago Brito, GSD Meeting - 30/07/2020


Tiago Brito, Nuno Santos, José Fragoso

INESC-ID

Lisbon 2020

Work In Progress


Purpose of this WIP presentation

● Current work is to be submitted this year

● Goal: gather feedback on work so far

● Focus on presenting the approach and preliminary results



Motivation

● JavaScript is hugely popular for web development

○ For both client-side and server-side (NodeJS) development

● There are many critical vulnerabilities reported for software developed using NodeJS

○ Remote Code Executions (Staicu NDSS’18)

○ Denial of Service (Staicu Sec’18)

○ Small number of packages, big impact (Zimmermann Sec’19)

● Developers need tools to help them detect problems

○ They are pressured to focus on delivering features



Problem

Previous work focused on:

● Tools for vulnerability analysis in Java or PHP code (e.g. Alhuzali Sec’18)

● Studying very specific vulnerabilities in Server-side JavaScript

○ ReDoS, Command Injections (Staicu NDSS’18 and Staicu Sec’18)

● Studying vulnerability reports on the NodeJS ecosystem (Zimmermann Sec’19)

So, it is still unknown which, and how many, of the available tools can effectively detect vulnerabilities in modern JavaScript.


Goal

Our goal is to assess the effectiveness of state-of-the-art vulnerability detection tools for JavaScript code by performing a comprehensive empirical study.



Research Questions

1. [Tools] Which tools exist for JavaScript vulnerability detection?

2. [Approach] What approach do these tools use, and what are their main challenges in detecting vulnerabilities?

3. [Effectiveness] How effective are these tools at detecting vulnerabilities?



Expected Contributions

1. Qualitative evaluation of JS vulnerability analysis tools on full-blown, known-vulnerable web applications (RQ2)

2. Quantitative evaluation of JS vulnerability analysis tools against real-world vulnerabilities in JavaScript packages (RQ3)

3. Annotated dataset of JavaScript code with known vulnerabilities (RQ3)




Empirical Study - 2 Steps

● [Study 1] - How do they do it? (Approach)

● [Study 2] - Do they work? (Effectiveness)



Study 1 - Our approach

● Collect a set of analysis tools

○ Criteria: 1) Available, 2) CLI, 3) Code Analysis, 4) Vulnerability Detection

○ Academic tools, popular open-source tools, commercial tools, etc.

● Collect a set of Known Vulnerable Applications

○ Web applications written in NodeJS that have known vulnerabilities

○ Purposely built to teach web security and used as benchmarks in some previous work

● Run all collected tools against all collected applications



Study 1 - Our approach

● Tools:

○ NodeJsScan/njsscan/SemGrep

○ GitHub’s CodeQL

○ Other tools exist, but we have not tested them yet

● Applications with known vulnerabilities

○ We collected 7 different applications

○ Most popular:

■ Damn Vulnerable Node Application (DVNA)

■ OWASP NodeGoat

■ VulnerableNode

■ ...




Study 1 - How do they do it?

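| DVNA | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | Others | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # Vulns | 2 | 2 | 2 | 1 | 2 | 2 | 3 | 1 | 1 | NA | 2 | 18 |
| NodeJsScan | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | NA | 1 | 5 (28%) |
| CodeQL | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NA | 2 | 5 (28%) |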

● OWASP Top 10:

○ A1 - Injection

○ A2 - Broken Auth

○ A3 - Data Exposure

○ A4 - XXE

○ A5 - Broken Access

○ A6 - Security Misconfiguration

○ A7 - XSS

○ A8 - Deserialization

○ A9 - Known Vulnerable Component

○ A10 - No Logging



Study 1 - How do they do it?

● NodeJsScan is rule-based

● CodeQL models code into graphs and performs graph queries on it

● Rules/graph queries describe flow conditions seen in previous vulnerabilities

○ Matches new vulnerabilities with similar flow patterns from specific sources to specific sinks

There are 5 main approach takeaways:

1. Correctly implemented rules

2. Over-specific (overfitting) rules

3. Unmodelled Sources, Sinks and Dependencies

4. Unmodelled Context

5. Unmodelled Languages/Interactions




Study 1 - Correctly implemented rules

Vulnerable code:

    var query = "SELECT name FROM Users WHERE login='" + req.body.login + "'";
    db.sequelize.query(query, { model: db.User }).then(user => { … });

NodeJsScan SQL injection rule:

    rules:
      - id: node_sqli_injection
        patterns:
          - pattern-either:
              - pattern: |
                  $CON.query(<... $REQ.$QUERY.$VAR ...>, ...)
              - pattern: |
                  $CON.query(<... $REQ.$QUERY ...>, ...)
              - pattern: |
                  var $SQL = <... $REQ.$QUERY.$VAR ...>;
                  ...
                  $CON.query(<... $SQL ...>, ...);
              - pattern: |
                  var $SQL = <... $REQ.$QUERY ...>;
                  ...
                  $CON.query(<... $SQL ...>, ...);
    (...)
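To see why the rule flags this snippet, the following minimal sketch rebuilds the same query string without any database. The helper name buildUserQuery and the sample inputs are hypothetical; only the concatenation pattern comes from the snippet above.

```javascript
// Hypothetical helper reproducing the string concatenation from the snippet.
function buildUserQuery(login) {
  return "SELECT name FROM Users WHERE login='" + login + "'";
}

// A benign login stays inside the quoted SQL literal.
const benign = buildUserQuery("alice");

// An attacker-controlled login closes the literal and appends its own condition.
const malicious = buildUserQuery("' OR '1'='1");

console.log(benign);
console.log(malicious);
```

The second string no longer filters on a single login, which is the source-to-sink flow the rule looks for. Sequelize's query method accepts a replacements option for bound parameters, which keeps user input out of the SQL text.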



Study 1 - Over-specific rules (overfitting)

Vulnerable code (not matched by the rule below, since exec is called directly):

    const exec = require('child_process').exec;
    exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => { … });

NodeJsScan Command injection rule:

    rules:
      - id: generic_os_command_exec
        patterns:
          - pattern-inside: |
              var $EXEC = require('child_process');
              ...
          - pattern-inside: |
              $APP.$METHOD(..., function $FUNC($REQ, $RES, ...){ ... });
          - pattern: |
              $EXEC.exec(..., <... $REQ.$QUERY ...>, ...)
    (...)

Code shape the rule matches:

    const express = require('express');
    const app = express();

    // Routing
    app.use('/ping', function (req, res) {
      const execP = require('child_process');
      execP.exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => { … });
    });
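The overfitting problem can be illustrated with a small sketch. The two regular expressions below are a crude, hypothetical stand-in for syntactic rules; they are not Semgrep patterns and not how NodeJsScan is implemented. The names direct, viaModule, narrow and broad are introduced only for this example.

```javascript
// Two call shapes for the same command injection (callback shortened to `cb`).
const direct =
  "const exec = require('child_process').exec;\n" +
  "exec('ping -c 2 ' + req.body.address, cb);";
const viaModule =
  "const execP = require('child_process');\n" +
  "execP.exec('ping -c 2 ' + req.body.address, cb);";

// Narrow check (assumption): only `<object>.exec(...)` calls that receive `req.<field>`.
const narrow = /\b\w+\.exec\([^)]*\breq\.\w+/;

// Broader check (assumption): also accepts a bare `exec(...)` call.
const broad = /(?:\b\w+\.)?\bexec\([^)]*\breq\.\w+/;

function matches(rule, code) {
  return rule.test(code);
}

console.log(matches(narrow, direct), matches(narrow, viaModule));
console.log(matches(broad, direct), matches(broad, viaModule));
```

The narrow check only fires on the module-qualified call, while the broader one covers both shapes. A broader pattern is also more likely to produce false positives, so this is a trade-off rather than a fix.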



Study 1 - Unmodelled Sources, Sinks and Dependencies

Vulnerable code:

    function (req, res) {
      if (req.files.products) {
        var products = serialize.unserialize(req.files.products.data.toString('utf8'))
        (...)

NodeJsScan deserialization rules:

    rules:
      - id: node_deserialize
        patterns:
          - pattern-inside: |
              require('node-serialize');
              ...
          - pattern: |
              $X.unserialize(...)
      - id: yaml_deserialize
        patterns:
          - pattern-inside: |
              require('js-yaml');
              ...
          - pattern: |
              $X.load(...)

CodeQL only models the js-yaml package. Thus it misses this particular vulnerable snippet.
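The effect of an unmodelled sink can be sketched with a lookup table. This is a deliberate simplification and not how either tool stores its models: modelA mirrors the two rules shown above, modelB mirrors the statement that only js-yaml is modelled, and all names are hypothetical.

```javascript
// Hypothetical sink models: package name -> function names treated as dangerous.
const modelA = { 'node-serialize': ['unserialize'], 'js-yaml': ['load'] };
const modelB = { 'js-yaml': ['load'] };

// Returns true when calling `fn` from package `pkg` is a modelled sink.
function isModelledSink(model, pkg, fn) {
  return Object.prototype.hasOwnProperty.call(model, pkg) && model[pkg].includes(fn);
}

// The vulnerable snippet calls unserialize from node-serialize.
console.log(isModelledSink(modelA, 'node-serialize', 'unserialize'));
console.log(isModelledSink(modelB, 'node-serialize', 'unserialize'));
```

A tool can only report flows into sinks it knows about, so every package missing from the model is a blind spot regardless of how precise the flow analysis is.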



Study 1 - Unmodelled Context

Vulnerable endpoint:

    db.User.findAll({}).then(users => {
      res.status(200).json({ success: true, users: users });
    });

● Vulnerabilities exist even in ‘correct’ code; Tools miss them without proper context

● The users structure contains sensitive data accessible to everybody at this endpoint

● Definitely possible to detect these vulnerabilities using taint tracking

● Tools need to know which resources can be accessible and which data is sensitive

Restricted to non-sensitive attributes:

    db.User.findAll({ attributes: ['id', 'name', 'email'] }).then(users => {
      res.status(200).json({ success: true, users: users });
    });
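The same idea can be expressed without a database as an allow-list applied before serialization. This is a minimal sketch: PUBLIC_USER_FIELDS, toPublicUser and the sample record (including its password and role fields) are hypothetical and only illustrate what "knowing which data is sensitive" could look like.

```javascript
// Hypothetical allow-list of fields that are safe to expose at this endpoint.
const PUBLIC_USER_FIELDS = ['id', 'name', 'email'];

// Copies only allow-listed fields; anything else is dropped.
function toPublicUser(user) {
  const out = {};
  for (const field of PUBLIC_USER_FIELDS) {
    if (field in user) out[field] = user[field];
  }
  return out;
}

// Hypothetical record as it might come back from the database.
const users = [
  { id: 1, name: 'Ana', email: 'ana@example.com', password: 'hash', role: 'admin' },
];

const body = { success: true, users: users.map(toPublicUser) };
console.log(JSON.stringify(body));
```

An explicit list like this is also the kind of context a detection tool would need in order to tell the vulnerable endpoint from the restricted one.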



Study 1 - Main Takeaways

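| DVNA | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | Others | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # Vulns | 2 | 2 | 2 | 1 | 2 | 2 | 3 | 1 | 1 | NA | 2 | 18 |
| NodeJsScan | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | NA | 1 | 5 (28%) |
| CodeQL | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NA | 2 | 5 (28%) |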


Failure category legend:

Over-specific (overfitting) rules

Unmodelled Sources, Sinks and Dependencies

Unmodelled Context

Unmodelled Languages/Interactions



Study 2 - Our approach to study effectiveness

● Build a curated dataset of NodeJS vulnerabilities

○ Collect all vulnerable versions of packages in npm security reports

○ Create a dataset of annotated vulnerabilities using the vulnerable code snippets from those packages

● Run all collected tools against each snippet

○ Check if the results include the reported vulnerability

○ Assess detection rates (TP/FP/FN)
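The detection-rate step can be sketched as a comparison between annotated vulnerability locations and a tool's findings. The location strings, the exact-match criterion and the score function are assumptions made for illustration; the study's actual matching procedure is not described here.

```javascript
// Hypothetical annotation: vulnerability locations known from advisories.
const annotated = ['lib/a.js:10', 'lib/b.js:42', 'lib/c.js:7'];

// Hypothetical tool output for the same package.
const reported = ['lib/a.js:10', 'lib/c.js:7', 'lib/d.js:3'];

// Counts true positives, false positives and false negatives by exact location match.
function score(truthList, foundList) {
  const truth = new Set(truthList);
  const found = new Set(foundList);
  const tp = [...found].filter((loc) => truth.has(loc)).length;
  const fp = found.size - tp;
  const fn = truth.size - tp;
  return {
    tp,
    fp,
    fn,
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

const result = score(annotated, reported);
console.log(result);
```

Exact location matching is the strictest option; a real evaluation may need a looser criterion (same file, same function, or same vulnerability class), since tools often report a different line than the advisory does.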




Study 2 - Effectiveness (Curated Dataset)

● There are 1550+ advisories for npm

○ Of which 1350 have available code

○ There are other vulnerability DBs for npm we may look at, such as Snyk’s Vulnerability DB, NVD/CVE

● Challenges with looking at npm advisories:

○ Advisories lack information on the vulnerable code

○ External references do not follow a particular structure

○ Analysis has to be done manually




Study 2 - Effectiveness (Preliminary Results)

| CWE | # Advisories | # NodeJsScan | # CodeQL | Percentage (Max) |
| --- | --- | --- | --- | --- |
| CWE-506 - Embedded Malicious Code | 405 | 0 | 0 | 0.0 % |
| CWE-22 - Path Traversal | 156 | 0 | 109 | 69.9 % |
| CWE-79 - Cross-site Scripting | 127 | 11 | 34 | 26.8 % |
| CWE-400 - Uncontrolled Resource Consumption | 77 | 3 | 0 | 3.9 % |
| CWE-471 - Modification of Assumed-Immutable Data (MAID) | 60 | 0 | 23 | 38.3 % |
| CWE-78 - OS Command Injection | 43 | 1 | 33 | 76.7 % |
| CWE-94 - Code Injection | 34 | 0 | 0 | 0.0 % |
| CWE-20 - Improper Input Validation | 26 | 0 | 1 | 3.8 % |
| CWE-200 - Exposure of Sensitive Information to an Unauthorized Actor | 22 | 0 | 0 | 0.0 % |
| CWE-89 - SQL Injection | 20 | 15 | 0 | 75.0 % |
| Other CWEs | 380 | 8 | 152 | 40.0 % |
| Total | 1350 | 38 (2.8 %) | 352 (26.1 %) | 26.1 % |
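The last column can be reproduced from the counts, assuming that "Percentage (Max)" means the better of the two tools divided by the number of advisories for that CWE. That reading is an inference from the numbers, not something the table states. The sketch below checks it for four rows copied from the table; maxDetectionRate is a hypothetical helper.

```javascript
// Rows copied from the table: [CWE, advisories, NodeJsScan detections, CodeQL detections].
const rows = [
  ['CWE-22', 156, 0, 109],
  ['CWE-79', 127, 11, 34],
  ['CWE-78', 43, 1, 33],
  ['CWE-89', 20, 15, 0],
];

// Best single-tool detection rate for a row, as a percentage rounded to one decimal.
function maxDetectionRate([, advisories, nodeJsScan, codeQl]) {
  const best = Math.max(nodeJsScan, codeQl);
  return Math.round((best / advisories) * 1000) / 10;
}

const rates = rows.map((row) => [row[0], maxDetectionRate(row)]);
console.log(rates);
```

Note that this per-row maximum is not the same as running both tools together: a combined rate depends on how much the two tools' detections overlap, which the per-CWE counts alone do not reveal.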




Using both tools: 28.1 %




NodeJsScan Top 5 detections by CWE (percentage of detections):

| CWE | # Advisories | # NodeJsScan | Percentage |
| --- | --- | --- | --- |
| CWE-89 - SQL Injection | 20 | 15 | 75.0 % |
| CWE-185 - Incorrect Regular Expression | 3 | 2 | 66.7 % |
| CWE-295 - Improper Certificate Validation | 2 | 1 | 50.0 % |
| CWE-918 - Server-Side Request Forgery (SSRF) | 2 | 1 | 50.0 % |
| CWE-943 - Improper Neutralization of Special Elements in Data Query Logic | 5 | 2 | 40.0 % |
| Total | 32 (2.4 % total) | 21 | 65.6 % |




CodeQL Top 5 detections by CWE (percentage of detections):

| CWE | # Advisories | # CodeQL | Percentage |
| --- | --- | --- | --- |
| CWE-26 - Path Traversal: '/dir/../filename' | 10 | 10 | 100.0 % |
| CWE-80 - Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS) | 6 | 6 | 100.0 % |
| CWE-23 - Relative Path Traversal | 6 | 5 | 83.3 % |
| CWE-25 - Path Traversal: '/../filedir' | 13 | 10 | 76.9 % |
| CWE-78 - OS Command Injection | 43 | 33 | 76.7 % |
| Total | 78 (5.8 % total) | 64 | 82.1 % |



Study 2 - Main Takeaways

● Current JS vulnerability detection tools are ineffective

○ Low detection rates => Open research problem / lots of opportunities for further research

● Detected vulnerabilities in line with results from Study 1

○ Injection, XSS, Path Traversals, etc. (classical taint tracking vulnerabilities) are detected

○ More complex vulnerabilities are not detected: Broken Access, Data Exposure, etc.

● CodeQL performs better than NodeJsScan

○ Modelling code into graph structures is better/more flexible than rule/regex based matching




Conclusion

● What have we done here?

○ Explained the approach and deficiencies of two of the most popular tools

○ Showed preliminary results on the effectiveness of selected tools

● So what?

○ Current tools are not adequate to detect a large number of vulnerabilities in JS code

○ There is insufficient previous work on JS vulnerability detection

○ There’s a lot of work to do in this area

■ Especially in detecting more complex vulnerabilities



What’s next?


● Finish this current work:

○ Study 1 - Do this analysis for more tools and applications - increase understanding of limitations

○ Study 2

■ Finish manually analyzing/building the curated dataset

■ In depth analysis of tools’ results to assess effectiveness

● Improve tools

○ Add contextual information to CodeQL’s taint tracking queries for specific system resources

○ Inspect npm packages and model their Node API calls to detect more dangerous sinks

Thank You! Questions?



Study 1 - Unmodelled Sources, Sinks and Dependencies


● Mathjs package contains a dangerous function that can lead to RCE

● Tools do not inspect dependency code, thus ignore possible transitive vulnerabilities

● Other tools detect dependency versions and compare them to vulnerability DBs (npm audit, GitHub Security Advisories/Dependabot, Snyk, etc.)

● Ideally, a tool could also analyze dependency code (impractical / not scalable).

    if (req.body.eqn) {
      req.flash('result', mathjs.eval(req.body.eqn));
      res.render('app/calc');
    }



Study 1 - Unmodelled Languages/Interactions


● Vulnerabilities often occur when different components interact (PL, stacks, etc.)

● This EJS (JavaScript templating) file contains an XSS vulnerability

● NodeJsScan detects it because it models EJS files, but CodeQL does not

● EJS is not pure JavaScript, but directly interacts with it

    <% if (output && output.searchTerm) { %>
      <p class="bg-success">
        Listing products with <strong>search query: </strong>
        <%- output.searchTerm %>
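In EJS, the `<%-` tag writes the value into the page unescaped, whereas `<%=` HTML-escapes it first. The sketch below imitates that difference in plain JavaScript; escapeHtml is a simplified, hypothetical stand-in and not EJS's own implementation, and the sample search term is made up.

```javascript
// Minimal HTML escaper: a simplified stand-in for what an escaping output tag does.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical attacker-controlled search term.
const searchTerm = '<script>alert(1)</script>';

// Unescaped output, comparable to `<%- output.searchTerm %>`.
const raw = '<strong>search query: </strong>' + searchTerm;

// Escaped output, comparable to `<%= output.searchTerm %>`.
const escaped = '<strong>search query: </strong>' + escapeHtml(searchTerm);

console.log(raw);
console.log(escaped);
```

Detecting this class of issue requires a tool to understand the template syntax as well as the JavaScript that feeds it, which is the cross-language interaction this slide refers to.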




NodeJsScan Top 5 detections by CWE (absolute number of detections):

| CWE | # Advisories | # NodeJsScan | Percentage |
| --- | --- | --- | --- |
| CWE-89 - SQL Injection | 20 | 15 | 75.0 % |
| CWE-79 - Cross-site Scripting | 127 | 11 | 8.7 % |
| CWE-400 - Uncontrolled Resource Consumption | 77 | 3 | 3.9 % |
| CWE-185 - Incorrect Regular Expression | 3 | 2 | 66.7 % |
| CWE-943 - Improper Neutralization of Special Elements in Data Query Logic | 5 | 2 | 40.0 % |
| Total | 232 (17.2 % total) | 33 | 14.2 % |




CodeQL Top 5 detections by CWE (absolute number of detections):

| CWE | # Advisories | # CodeQL | Percentage |
| --- | --- | --- | --- |
| CWE-22 - Path Traversal | 156 | 109 | 69.9 % |
| CWE-79 - Cross-site Scripting | 127 | 34 | 26.8 % |
| CWE-78 - OS Command Injection | 43 | 33 | 76.7 % |
| CWE-471 - Modification of Assumed-Immutable Data (MAID) | 60 | 23 | 38.3 % |
| CWE-25 - Path Traversal: '/../filedir' | 13 | 10 | 76.9 % |
| Total | 399 (29.6 % total) | 209 | 52.4 % |



Tools


Aether - https://github.com/codecombat/aether

SemGrep - https://github.com/returntocorp/semgrep

NodeJsScan - https://github.com/ajinabraham/NodeJsScan

ESLint Security Plugin - https://github.com/nodesecurity/eslint-plugin-security

ESLint Mozilla ScanJS - https://github.com/mozfreddyb/eslint-config-scanjs

JSHint - https://github.com/jshint/jshint

JsPrime - https://github.com/dpnishant/jsprime

ApplicationInspector - https://github.com/microsoft/ApplicationInspector

Coala - https://github.com/coala/coala

Codeburner - https://github.com/groupon/codeburner

Insider - https://github.com/insidersec/insider

PMD - https://github.com/pmd/pmd

WALA - http://wala.sourceforge.net/wiki/index.php/Main_Page


CodeQL - https://github.com/github/codeql

Graudit - https://github.com/wireghoul/graudit/

Yasca - https://github.com/scovetta/yasca

Google Closure Compiler - https://github.com/google/closure-compiler

SonarQube - https://github.com/SonarSource/sonarqube

ESLint Security Scanner Configs - https://github.com/Greenwolf/eslint-security-scanner-configs

Synode (academic) - https://github.com/sola-da/Synode



Vulnerable Applications


DVNA - https://github.com/appsecco/dvna

Snyk's Goof - https://github.com/snyk/goof

NodeGoat - https://github.com/OWASP/NodeGoat

VulnerableNode - https://github.com/cr0hn/vulnerable-node

Juice Shop - https://github.com/bkimminich/juice-shop

Appsec JavaScript Security - https://github.com/tlaskowsky/appsec_javascript_security

Vulnerable Web Application - https://github.com/psmorrow/vulnerable-web-application

