Tiago Brito, GSD Meeting - 30/07/2020
Empirical Study of Vulnerability Scanning Tools for JavaScript
Tiago Brito, Nuno Santos, José Fragoso
INESC-ID
Lisbon 2020
Work In Progress
Purpose of this WIP presentation
● Current work is to be submitted this year
● Goal: gather feedback on work so far
● Focus on presenting the approach and preliminary results
Motivation
● JavaScript is hugely popular for web development
○ For both client and server-side (NodeJS) development
● There are many critical vulnerabilities reported for
software developed using NodeJS
○ Remote Code Executions (Staicu NDSS’18)
○ Denial of Service (Staicu Sec’18)
○ Small number of packages, big impact (Zimmermann Sec’19)
● Developers need tools to help them detect problems
○ They are pressured to focus on delivering features
Problem
Previous work focused on:
● Tools for vulnerability analysis in Java or PHP code (e.g. Alhuzali Sec’18)
● Studying very specific vulnerabilities in Server-side JavaScript
○ ReDoS, Command Injections (Staicu NDSS’18 and Staicu Sec’18)
● Studying vulnerability reports on the NodeJS ecosystem (Zimmermann Sec’19)
So it is still unknown which, and how many, of the available tools can effectively
detect vulnerabilities in modern JavaScript.
Goal
Our goal is to assess the effectiveness of state-of-the-art
vulnerability detection tools for JavaScript code by
performing a comprehensive empirical study.
Research Questions
1. [Tools] Which tools exist for JavaScript vulnerability detection?
2. [Approach] What approach do these tools use, and what are their main challenges in
detecting vulnerabilities?
3. [Effectiveness] What is the effectiveness of these tools in detecting vulnerabilities?
Expected Contributions
1. Qualitative evaluation of JS vulnerability analysis tools on full-blown, known-vulnerable
web applications (RQ2)
2. Qualitative evaluation of JS vulnerability analysis tools against real-world
vulnerabilities in JavaScript packages (RQ3)
3. Annotated dataset of JavaScript code with known vulnerabilities (RQ3)
Empirical Study - 2 Steps
● [Study 1] - How do they do it? (Approach)
● [Study 2] - Do they work? (Effectiveness)
Study 1 - Our approach
● Collect a set of analysis tools
○ Criteria: 1) Available, 2) CLI, 3) Code Analysis, 4) Vulnerability Detection
○ Academic tools, Open-source Popular tools, Commercial tools, etc.
● Collect a set of Known Vulnerable Applications
○ Web applications written in NodeJS that have known vulnerabilities
○ Intended for teaching web security and used as benchmarks in some previous work
● Run all collected tools against all collected applications
Study 1 - Our approach
● Tools:
○ NodeJsScan/njsscan/SemGrep
○ GitHub’s CodeQL
○ Other tools exist, but we have not tested them yet
● Applications with known vulnerabilities
○ We collected 7 different applications
○ Most popular:
■ Damn Vulnerable Node Application (DVNA)
■ OWASP NodeGoat
■ VulnerableNode
■ ...
Study 1 - How do they do it?
DVNA        A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  Others  Total
# Vulns     2   2   2   1   2   2   3   1   1   NA   2       18
NodeJsScan  1   0   0   0   0   0   2   1   0   NA   1       5 (28%)
CodeQL      2   1   0   0   0   0   0   0   0   NA   2       5 (28%)
● OWASP Top 10:
○ A1 - Injection
○ A2 - Broken Auth
○ A3 - Data Exposure
○ A4 - XXE
○ A5 - Broken Access
○ A6 - Security Misconfiguration
○ A7 - XSS
○ A8 - Deserialization
○ A9 - Known Vulnerable Component
○ A10 - No Logging
Study 1 - How do they do it?
● NodeJsScan is rule-based
● CodeQL models code into graphs and performs graph queries on it
● Rules/graph queries describe flow conditions seen in previous vulnerabilities
○ Matches new vulnerabilities with similar flow patterns from specific sources to specific sinks
There are 5 main approach takeaways:
1. Correctly implemented rules
2. Overly specific (overfitting) rules
3. Unmodelled Sources, Sinks and Dependencies
4. Unmodelled Context
5. Unmodelled Languages/Interactions
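The source-to-sink idea that both tools build on can be illustrated with a toy taint tracker. This is only a minimal sketch, not how NodeJsScan or CodeQL are implemented; the statement format and the names `SOURCES`, `SINKS`, and `findTaintedSinks` are hypothetical and exist only for this illustration.

```javascript
// Toy taint tracker over a straight-line list of statements.
// A statement is either { kind: 'assign', target, uses }
// or { kind: 'call', callee, args }. This format is hypothetical.
const SOURCES = new Set(['req.body', 'req.query']);         // assumed untrusted inputs
const SINKS = new Set(['db.query', 'child_process.exec']);  // assumed dangerous calls

function findTaintedSinks(statements) {
  const tainted = new Set(SOURCES);
  const findings = [];
  for (const stmt of statements) {
    if (stmt.kind === 'assign') {
      // A variable becomes tainted if any value it is built from is tainted.
      if (stmt.uses.some((name) => tainted.has(name))) tainted.add(stmt.target);
    } else if (stmt.kind === 'call' && SINKS.has(stmt.callee)) {
      // Report a sink call when at least one argument is tainted.
      const bad = stmt.args.filter((name) => tainted.has(name));
      if (bad.length > 0) findings.push({ sink: stmt.callee, taintedArgs: bad });
    }
  }
  return findings;
}

const program = [
  { kind: 'assign', target: 'query', uses: ['req.body'] },
  { kind: 'call', callee: 'db.query', args: ['query'] },       // reported
  { kind: 'call', callee: 'db.query', args: ['constantSql'] }, // not reported
];
console.log(findTaintedSinks(program));
```

Anything that is missing from `SOURCES` or `SINKS` is invisible to this tracker, which is the essence of takeaway 3.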
Study 1 - Correctly implemented rules
NodeJsScan SQL injection rule:
    var query = "SELECT name FROM Users WHERE login='" + req.body.login + "'";
    db.sequelize.query(query, { model: db.User }).then(user => { … });

    rules:
      - id: node_sqli_injection
        patterns:
          - pattern-either:
              - pattern: |
                  $CON.query(<... $REQ.$QUERY.$VAR ...>, ...)
              - pattern: |
                  $CON.query(<... $REQ.$QUERY ...>, ...)
              - pattern: |
                  var $SQL = <... $REQ.$QUERY.$VAR ...>;
                  ...
                  $CON.query(<... $SQL ...>, ...);
              - pattern: |
                  var $SQL = <... $REQ.$QUERY ...>;
                  ...
                  $CON.query(<... $SQL ...>, ...);
(...)
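For contrast with the concatenated query above, a parameterized variant keeps the request value out of the SQL string. The sketch below only builds the arguments so that it runs without a database; `buildLoginQuery` is a hypothetical helper, and it assumes Sequelize's `replacements` option for named parameters.

```javascript
// Builds the arguments for a parameterized query instead of concatenating input.
// buildLoginQuery is a hypothetical helper used only for this illustration.
function buildLoginQuery(login) {
  return {
    sql: 'SELECT name FROM Users WHERE login = :login',
    options: { replacements: { login } }, // the value travels separately from the SQL text
  };
}

const { sql, options } = buildLoginQuery("x' OR '1'='1");
console.log(sql);                        // the SQL text never contains the user input
console.log(options.replacements.login); // the input is passed as a bound value

// With a real Sequelize instance (assumed to be db.sequelize, as in the snippet above):
// db.sequelize.query(sql, { ...options, model: db.User }).then(user => { /* ... */ });
```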
Study 1 - Overly specific rules (overfitting)
NodeJsScan Command injection rule:
    const exec = require('child_process').exec;
    exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => { … });

    rules:
      - id: generic_os_command_exec
        patterns:
          - pattern-inside: |
              var $EXEC = require('child_process');
              ...
          - pattern-inside: |
              $APP.$METHOD(..., function $FUNC($REQ, $RES, ...){ ... });
          - pattern: |
              $EXEC.exec(..., <... $REQ.$QUERY ...>, ...)
(...)
    const app = express();
    // Routing
    app.use('/ping', function (req, res) {
        const execP = require('child_process');
        execP.exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => { … });
    });
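The rule appears to expect the `child_process` module to be bound to a variable and `exec` to be called on that variable, so the first snippet (which imports `exec` directly) slips through, while code shaped like the second snippet matches. The sketch below reproduces that effect with a regular expression as a crude stand-in for the pattern. It is not how Semgrep matches code, it ignores the route-handler condition, and `matchesModuleExecRule` is a hypothetical name.

```javascript
// Crude stand-in for the rule: a variable bound to require('child_process'),
// later used as <variable>.exec(...) with a req.<something> value in the call.
const RULE = /(\w+)\s*=\s*require\('child_process'\);[\s\S]*\b\1\.exec\([^)]*\breq\.\w+/;

function matchesModuleExecRule(source) {
  return RULE.test(source);
}

// Shape of the first snippet: exec is imported directly from the module.
const missed = `
const exec = require('child_process').exec;
exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => {});
`;

// Shape of the second snippet: the module object is bound, then .exec is called on it.
const caught = `
const execP = require('child_process');
execP.exec('ping -c 2 ' + req.body.address, (err, stdout, stderr) => {});
`;

console.log(matchesModuleExecRule(missed)); // false
console.log(matchesModuleExecRule(caught)); // true
```

Both snippets are equally vulnerable; only their syntactic shape differs.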
Study 1 - Unmodelled Sources, Sinks and Dependencies
NodeJsScan deserialization rules:

    function (req, res) {
        if (req.files.products) {
            var products = serialize.unserialize(req.files.products.data.toString('utf8'))
            (...)

    rules:
      - id: node_deserialize
        patterns:
          - pattern-inside: |
              require('node-serialize');
              ...
          - pattern: |
              $X.unserialize(...)
      - id: yaml_deserialize
        patterns:
          - pattern-inside: |
              require('js-yaml');
              ...
          - pattern: |
              $X.load(...)

CodeQL only models the js-yaml package, so it misses this particular
vulnerable snippet.
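The effect of an incomplete sink model can be sketched with two lookup tables. They are illustrative only and do not reflect either tool's actual configuration; `nodeJsScanLike`, `codeqlLike`, and `isModelledSink` are hypothetical names that mirror what the rules above and the CodeQL remark describe.

```javascript
// Toy "sink models": which package methods a tool treats as dangerous.
// Illustrative tables only, not the tools' real configuration.
const nodeJsScanLike = { 'node-serialize': ['unserialize'], 'js-yaml': ['load'] };
const codeqlLike = { 'js-yaml': ['load'] };

// A call can only be flagged if its package and method appear in the model.
function isModelledSink(model, pkg, method) {
  return (model[pkg] || []).includes(method);
}

console.log(isModelledSink(nodeJsScanLike, 'node-serialize', 'unserialize')); // true
console.log(isModelledSink(codeqlLike, 'node-serialize', 'unserialize'));     // false
```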
Study 1 - Unmodelled Context
    db.User.findAll({}).then(users => {
        res.status(200).json({ success: true, users: users });
    });

● Vulnerabilities exist even in ‘correct’ code; tools miss them without proper context
● The users structure contains sensitive data accessible to everybody at this endpoint
● Definitely possible to detect these vulnerabilities using taint tracking
● Tools need to know which resources can be accessible and which data is sensitive

Restricting the returned attributes avoids the exposure:

    db.User.findAll({ attributes: ['id', 'name', 'email'] }).then(users => {
        res.status(200).json({ success: true, users: users });
    });
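The kind of context a tool would need can be sketched as a list of sensitive fields checked against the data that leaves an endpoint. The field names in `SENSITIVE_FIELDS` and the sample records are hypothetical; the actual columns of the User model are not shown on the slide.

```javascript
// Hypothetical list of attributes that should never be returned to clients.
const SENSITIVE_FIELDS = ['password', 'role'];

// Returns the sensitive field names that appear in any of the given records.
function exposedSensitiveFields(records, sensitive = SENSITIVE_FIELDS) {
  const exposed = new Set();
  for (const record of records) {
    for (const field of Object.keys(record)) {
      if (sensitive.includes(field)) exposed.add(field);
    }
  }
  return [...exposed];
}

// Hypothetical rows: first as returned by an unrestricted query, then projected.
const fullRows = [{ id: 1, name: 'alice', email: 'alice@example.com', password: 'hash' }];
const projectedRows = [{ id: 1, name: 'alice', email: 'alice@example.com' }];

console.log(exposedSensitiveFields(fullRows));      // [ 'password' ]
console.log(exposedSensitiveFields(projectedRows)); // []
```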
Study 1 - Main Takeaways
DVNA        A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  Others  Total
# Vulns     2   2   2   1   2   2   3   1   1   NA   2       18
NodeJsScan  1   0   0   0   0   0   2   1   0   NA   1       5 (28%)
CodeQL      2   1   0   0   0   0   0   0   0   NA   2       5 (28%)
Failure category legend:
Overly specific (overfitting) rules
Unmodelled Sources, Sinks and Dependencies
Unmodelled Context
Unmodelled Languages/Interactions
Study 2 - Our approach to study effectiveness
● Build a curated dataset of NodeJS vulnerabilities
○ Collect all vulnerable versions of packages in npm security reports
○ Create a dataset of annotated vulnerabilities using the snippets
● Run all collected tools against each snippet
○ Check if the results include the reported vulnerability
○ Assess detection rates (TP/FP/FN)
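One common way to turn TP/FP/FN counts into detection rates is precision and recall. The slide does not say which metrics will be reported, so the sketch below is an assumption; `detectionMetrics` is a hypothetical helper, and the false-positive count in the example is made up.

```javascript
// Computes precision, recall and F1 from true positives, false positives
// and false negatives. Returns 0 for a metric whose denominator is 0.
function detectionMetrics({ tp, fp, fn }) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Example: 5 of 18 known vulnerabilities found (tp = 5, fn = 13);
// the 3 false positives are a hypothetical value for illustration.
const metrics = detectionMetrics({ tp: 5, fp: 3, fn: 13 });
console.log(metrics.precision.toFixed(3)); // 0.625
console.log(metrics.recall.toFixed(3));    // 0.278
```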
Study 2 - Effectiveness (Curated Dataset)
● There are 1550+ advisories for npm
○ Of which 1350 have available code
○ There are other vulnerability DBs for npm we may
look at, such as Snyk’s Vulnerability DB, NVD/CVE
● Challenges with looking at npm advisories:
○ Advisories lack information on the vulnerable code
○ External references do not follow a particular structure
○ Analysis has to be done manually
Study 2 - Effectiveness (Preliminary Results)
CWE                                                                   # Advisories  # NodeJsScan  # CodeQL      Percentage (Max)
CWE-506 - Embedded Malicious Code                                     405           0             0             0.0 %
CWE-22 - Path Traversal                                               156           0             109           69.9 %
CWE-79 - Cross-site Scripting                                         127           11            34            26.8 %
CWE-400 - Uncontrolled Resource Consumption                           77            3             0             3.9 %
CWE-471 - Modification of Assumed-Immutable Data (MAID)               60            0             23            38.3 %
CWE-78 - OS Command Injection                                         43            1             33            76.7 %
CWE-94 - Code Injection                                               34            0             0             0.0 %
CWE-20 - Improper Input Validation                                    26            0             1             3.8 %
CWE-200 - Exposure of Sensitive Information to an Unauthorized Actor  22            0             0             0.0 %
CWE-89 - SQL Injection                                                20            15            0             75.0 %
Other CWEs                                                            380           8             152           40.0 %
Total                                                                 1350          38 (2.8 %)    352 (26.1 %)  26.1 %
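The "Percentage (Max)" column appears to be the higher of the two tools' detection counts divided by the number of advisories. That reading is an assumption, but it reproduces the values in the table, as this small sketch shows for three rows (`maxDetectionPercent` is a hypothetical helper).

```javascript
// Three rows copied from the table above.
const rows = [
  { cwe: 'CWE-22', advisories: 156, nodeJsScan: 0, codeql: 109 },
  { cwe: 'CWE-79', advisories: 127, nodeJsScan: 11, codeql: 34 },
  { cwe: 'CWE-89', advisories: 20, nodeJsScan: 15, codeql: 0 },
];

// Best single-tool detection count as a percentage of advisories, one decimal.
function maxDetectionPercent({ advisories, nodeJsScan, codeql }) {
  return ((Math.max(nodeJsScan, codeql) / advisories) * 100).toFixed(1);
}

for (const row of rows) {
  console.log(`${row.cwe}: ${maxDetectionPercent(row)} %`);
}
// CWE-22: 69.9 %
// CWE-79: 26.8 %
// CWE-89: 75.0 %
```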
Study 2 - Effectiveness (Preliminary Results)
Using both: 28.1 %
Study 2 - Effectiveness (Preliminary Results)
NodeJsScan Top 5 detections by CWE (percentage of detections):

CWE                                                                        # Advisories      # NodeJsScan  Percentage (Max)
CWE-89 - SQL Injection                                                     20                15            75.0 %
CWE-185 - Incorrect Regular Expression                                     3                 2             66.7 %
CWE-295 - Improper Certificate Validation                                  2                 1             50.0 %
CWE-918 - Server-Side Request Forgery (SSRF)                               2                 1             50.0 %
CWE-943 - Improper Neutralization of Special Elements in Data Query Logic  5                 2             40.0 %
Total                                                                      32 (2.4 % total)  21            65.6 %
Study 2 - Effectiveness (Preliminary Results)
CodeQL Top 5 detections by CWE (percentage of detections):

CWE                                                                                     # Advisories      # CodeQL  Percentage (Max)
CWE-26 - Path Traversal: '/dir/../filename'                                             10                10        100.0 %
CWE-80 - Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS)  6                 6         100.0 %
CWE-23 - Relative Path Traversal                                                        6                 5         83.3 %
CWE-25 - Path Traversal: '/../filedir'                                                  13                10        76.9 %
CWE-78 - OS Command Injection                                                           43                33        76.7 %
Total                                                                                   78 (5.8 % total)  64        82.1 %
Study 2 - Main Takeaways
● Current JS vulnerability detection tools are ineffective
○ Low detection rates => Open research problem / lots of opportunities for further research
● Detected vulnerabilities in line with results from Study 1
○ Injection, XSS, Path Traversals, etc. (classical taint tracking vulnerabilities) are detected
○ More complex vulnerabilities are not detected: Broken Access, Data Exposure, etc.
● CodeQL performs better than NodeJsScan
○ Modelling code into graph structures is better/more flexible than rule/regex based matching
Conclusion
● What have we done here?
○ Explained the approach and deficiencies of two of the most popular tools
○ Showed preliminary results on the effectiveness of selected tools
● So what?
○ Current tools are not adequate to detect a large number of vulnerabilities in JS code
○ There is not enough previous work on JS vulnerability detection
○ There’s a lot of work to do in this area
■ Especially in detecting more complex vulnerabilities
What’s next?
● Finish this current work:
○ Study 1 - Do this analysis for more tools and applications - increase understanding of limitations
○ Study 2
■ Finish manually analyzing/building the curated dataset
■ In-depth analysis of tools’ results to assess effectiveness
● Improve tools
○ Add contextual information to CodeQL’s taint tracking queries for specific system resources
○ Inspect npm packages and model their Node API calls to detect more dangerous sinks
Thank You! Questions?
Study 1 - Unmodelled Sources, Sinks and Dependencies
● The mathjs package contains a dangerous function that can lead to RCE
● Tools do not inspect dependency code, thus ignore possible transitive vulnerabilities
● Other tools detect dependency versions and compare them to vulnerability DBs
(npm audit, github security advisory/dependabot, snyk, etc.)
● Ideally, a tool could also analyze dependency code (impractical / not scalable).
    if (req.body.eqn) {
        req.flash('result', mathjs.eval(req.body.eqn));
        res.render('app/calc');
    }
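The version-comparison approach used by dependency auditing tools can be sketched as follows. This is a simplified illustration and not how npm audit or similar tools are implemented (they work with full semver ranges); the package names, the advisory entry, and the helpers `compareVersions` and `findVulnerableDeps` are hypothetical.

```javascript
// Compares dotted numeric versions such as "3.10.1"; returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Hypothetical advisory list: a package is affected in versions below `fixedIn`.
const advisories = [{ name: 'example-math-lib', fixedIn: '2.0.0' }];

// Returns the names of installed packages that fall under an advisory.
function findVulnerableDeps(installed, advisoryList) {
  return Object.entries(installed)
    .filter(([name, version]) =>
      advisoryList.some((adv) => adv.name === name && compareVersions(version, adv.fixedIn) < 0))
    .map(([name]) => name);
}

// Hypothetical installed dependencies.
const installed = { 'example-math-lib': '1.4.2', 'other-lib': '5.0.0' };
console.log(findVulnerableDeps(installed, advisories)); // [ 'example-math-lib' ]
```

A check like this flags the dependency as a whole; it does not tell whether the application actually reaches the dangerous function.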
Study 1 - Unmodelled Languages/Interactions
● Vulnerabilities often occur when different components interact (programming languages, stacks, etc.)
● This EJS file (a JavaScript templating format) contains an XSS vulnerability
● NodeJsScan detects it because it models EJS files, but CodeQL does not
● EJS is not pure JavaScript, but directly interacts with it
    <% if (output && output.searchTerm) { %>
        <p class="bg-success">
            Listing products with <strong>search query: </strong>
            <%- output.searchTerm %>
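In EJS, `<%-` writes a value into the page unescaped, while `<%=` HTML-escapes it first. The sketch below imitates that difference in plain JavaScript so that it runs without EJS; `escapeHtml` is a hypothetical stand-in for the escaping step, not EJS's own implementation.

```javascript
// Minimal HTML escaper, a stand-in for what an escaping template tag does.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const searchTerm = '<script>alert(1)</script>';

// Like <%- output.searchTerm %>: the markup reaches the browser as-is.
const unescaped = `<strong>${searchTerm}</strong>`;
// Like <%= output.searchTerm %>: the markup is rendered as harmless text.
const escaped = `<strong>${escapeHtml(searchTerm)}</strong>`;

console.log(unescaped);
console.log(escaped);
```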
Study 2 - Effectiveness (Preliminary Results)
NodeJsScan Top 5 detections by CWE (absolute number of detections):

CWE                                                                        # Advisories        # NodeJsScan  Percentage (Max)
CWE-89 - SQL Injection                                                     20                  15            75.0 %
CWE-79 - Cross-site Scripting                                              127                 11            8.7 %
CWE-400 - Uncontrolled Resource Consumption                                77                  3             3.9 %
CWE-185 - Incorrect Regular Expression                                     3                   2             66.7 %
CWE-943 - Improper Neutralization of Special Elements in Data Query Logic  5                   2             40.0 %
Total                                                                      232 (17.2 % total)  33            14.2 %
Study 2 - Effectiveness (Preliminary Results)
CodeQL Top 5 detections by CWE (absolute number of detections):

CWE                                                      # Advisories        # CodeQL  Percentage (Max)
CWE-22 - Path Traversal                                  156                 109       69.9 %
CWE-79 - Cross-site Scripting                            127                 34        26.8 %
CWE-78 - OS Command Injection                            43                  33        76.7 %
CWE-471 - Modification of Assumed-Immutable Data (MAID)  60                  23        38.3 %
CWE-25 - Path Traversal: '/../filedir'                   13                  10        76.9 %
Total                                                    399 (29.6 % total)  209       52.4 %
Tools
Aether - https://github.com/codecombat/aether
SemGrep - https://github.com/returntocorp/semgrep
NodeJsScan - https://github.com/ajinabraham/NodeJsScan
EsLint Security Plugin - https://github.com/nodesecurity/eslint-plugin-security
EsLint Mozilla ScanJS - https://github.com/mozfreddyb/eslint-config-scanjs
JsHint - https://github.com/jshint/jshint
JsPrime - https://github.com/dpnishant/jsprime
ApplicationInspector - https://github.com/microsoft/ApplicationInspector
Coala - https://github.com/coala/coala
Codeburner - https://github.com/groupon/codeburner
Insider - https://github.com/insidersec/insider
PMD - https://github.com/pmd/pmd
WALA - http://wala.sourceforge.net/wiki/index.php/Main_Page
CodeQL - https://github.com/github/codeql
Graudit - https://github.com/wireghoul/graudit/
Yasca - https://github.com/scovetta/yasca
Google Closure Compiler - https://github.com/google/closure-compiler
Sonarqube - https://github.com/SonarSource/sonarqube
EsLint Security Scanner Configs - https://github.com/Greenwolf/eslint-security-scanner-configs
Synode (academic) - https://github.com/sola-da/Synode
Vulnerable Applications
DVNA - https://github.com/appsecco/dvna
Snyk's Goof - https://github.com/snyk/goof
NodeGoat - https://github.com/OWASP/NodeGoat
VulnerableNode - https://github.com/cr0hn/vulnerable-node
Juice Shop - https://github.com/bkimminich/juice-shop
Appsec JavaScript Security - https://github.com/tlaskowsky/appsec_javascript_security
Vulnerable Web Application - https://github.com/psmorrow/vulnerable-web-application