String Analysis
Introduction
String analysis was first developed by A. S.
Christensen et al. in 2003 to predict the
possible values of a string variable. Minamide
enhanced string analysis in 2005 by adding
finite state transducers (FSTs).
Input: code and a string variable V whose
possible values are requested
Output: a grammar whose language
approximates the set of possible values of V
From Code to SSA Form
Code
String x = "abc";
for (int i = 0; i < n; i++)
    x = "0"+x+"1";
String s = x.replace("00","0");
System.out.print(s);
SSA Form
x = "abc"
for (i = 0; i < n; i++)
    x1 = "0".φ(x1, x)."1";
x2 = φ(x1, x);
s = x2.replace("00","0");
System.out.print(s);
SSA Form to Extended CFG
Rules:
x = expression => x -> expression
φ(x1, x2) => x1 | x2
x1 + x2 => x1 x2
String operations => add the invoking object as an argument
CFG
X -> abc
X1 -> 0 X 1 | 0 X1 1
X2 -> X1 | X
S -> str_replace(“00”, “0”, X2)
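To make the extended CFG concrete, the following sketch enumerates the strings the grammar derives when the recursion through X1 is unrolled a bounded number of times. The class name `GrammarSample` and the `depth` bound are illustrative assumptions, not part of the analysis itself, which works on the grammar symbolically.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class GrammarSample {
    // X -> abc
    static Set<String> x() {
        Set<String> out = new LinkedHashSet<>();
        out.add("abc");
        return out;
    }

    // X1 -> 0 X 1 | 0 X1 1, with the recursion unrolled at most `depth` times.
    static Set<String> x1(int depth) {
        Set<String> out = new LinkedHashSet<>();
        if (depth <= 0) return out;
        for (String s : x()) out.add("0" + s + "1");
        for (String s : x1(depth - 1)) out.add("0" + s + "1");
        return out;
    }

    // X2 -> X1 | X
    static Set<String> x2(int depth) {
        Set<String> out = new LinkedHashSet<>(x());
        out.addAll(x1(depth));
        return out;
    }

    // S -> str_replace("00", "0", X2), applied here to each sampled string.
    static Set<String> s(int depth) {
        Set<String> out = new LinkedHashSet<>();
        for (String v : x2(depth)) out.add(v.replace("00", "0"));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(x2(3));
        System.out.println(s(3));
    }
}
```

The sample only shows a finite slice of the language; the grammar itself describes all values of x for every loop count n.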
Finite State Transducer
A finite state transducer (FST) is like a finite
automaton, but it does more than accept input
strings: it also produces an output string for
each input string it accepts.
[Figure: an FST that simulates the function str_replace(“00”, “0”, x)]
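A transducer for str_replace(“00”, “0”, x) can be sketched with two states. The state numbering, the class name `ReplaceFst`, and the flush of a pending character at end of input (modelled here as a final output step) are assumptions of this sketch, not details taken from the slide's diagram.

```java
final class ReplaceFst {
    // State 0: nothing pending. State 1: one '0' has been read but not yet emitted.
    static String apply(String input) {
        StringBuilder out = new StringBuilder();
        int state = 0;
        for (char c : input.toCharArray()) {
            if (state == 0) {
                if (c == '0') {
                    state = 1;                 // hold the '0' until the next symbol is known
                } else {
                    out.append(c);             // copy any other symbol unchanged
                }
            } else {
                if (c == '0') {
                    out.append('0');           // read "00": emit a single "0"
                } else {
                    out.append('0').append(c); // lone '0': emit it, then the new symbol
                }
                state = 0;
            }
        }
        if (state == 1) out.append('0');       // flush a pending '0' at end of input
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "0abc1", "00abc11", "000abc111"}) {
            System.out.println(s + " -> " + apply(s));
        }
    }
}
```

On these inputs the transducer agrees with Java's `String.replace("00", "0")`, which also replaces non-overlapping matches from left to right.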
Extended-CFG to CFG
The set of output strings produced by passing the language of a CFG through an FST is again the language of a CFG
The algorithm that computes the output CFG is similar
to the one that computes the intersection of a CFG and a DFA
Algorithm:
1. Convert the CFG to a PDA
2. PDA’ = PDA × (the FA corresponding to the FST)
3. Convert PDA’ to CFG’; when converting the transitions
of PDA’ to productions of CFG’, use the output
terminal of the FST instead of the input terminal
String Taint Analysis
Developed by Wassermann and Su in 2007.
Add a tag to unsafe terminals and propagate
the tags through the CFG to predict whether a
string variable’s values come from an unsafe source
Basic idea:
for S->BC...
if (B has tag | C has tag|...){
    add tag to S;
}
During the conversion from extended CFG to CFG using
FSTs, every newly added non-terminal that is derived from
a tagged non-terminal is tagged as well
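The tag propagation rule can be sketched as a fixed-point iteration over the productions. The grammar representation (a map from each non-terminal to its alternatives) and the names `TaintPropagation`, `safe`, and `userInput` are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaintPropagation {
    // Returns every symbol that carries the tag once propagation has stabilized.
    static Set<String> propagate(Map<String, List<List<String>>> productions,
                                 Set<String> taggedTerminals) {
        Set<String> tagged = new HashSet<>(taggedTerminals);
        boolean changed = true;
        while (changed) {                        // repeat until no new tag is added
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : productions.entrySet()) {
                if (tagged.contains(e.getKey())) continue;
                for (List<String> rhs : e.getValue()) {
                    boolean anyTagged = false;
                    for (String sym : rhs) anyTagged |= tagged.contains(sym);
                    if (anyTagged) {             // S -> B C ... with a tagged symbol: tag S
                        tagged.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return tagged;
    }

    // Hypothetical grammar: S -> B C, B -> safe, C -> userInput (tagged terminal).
    static Set<String> example() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", List.of(List.of("B", "C")));
        g.put("B", List.of(List.of("safe")));
        g.put("C", List.of(List.of("userInput")));
        return propagate(g, Set.of("userInput"));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In the example, C and S end up tagged while B does not, because only C derives the tagged terminal.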
Some Applications of String Analysis
SQL Injection Detection
Cross-site Scripting Detection
Impact Analysis of Database Schema
Application of String Taint Analysis in
Software Internationalization
Introduction
Example
Approach
Experiments
Globalization Process
Two Steps:
Internationalization (I18n)
Localization (L10n)
[Figure: Developer, One-language Version, I18n, Internationalized Version, L10n, English Property, German Property, Chinese Property]
All language-specific
code elements are
externalized to
property files
I18n Conducted for
• Old software projects
• New projects with no globalization plan at first
• Projects using old components
Language Specific Code
Elements
• Constant Strings
• Date/Number Formats
• Currency/Measures
• Writing Direction
• Color/Culture related elements
• …
Constant strings are the most numerous, and some of
them are very hard to locate.
Motivation of our work
There are a lot of constant strings
We should not translate all of them
It is sometimes hard to decide whether a string is
need-to-translate
Application/Version            #LOC  #Constant Strings  #Need-to-Translate Strings (Not externalized in the subsequent version)
RText 0.8.6.9 (Core Package)   17k   1252               408 (121)
Risk 1.0.7.5                   19k   1510               509 (55)
ArtOfIllusion 1.1              71k   2889               1221 (816)
Megamek 0.29.72                110k  10464              1734 (678)
Example (1)
Risk project: Risk.java and RiskGame.java
public class Risk{
    public void GameParser(String mem){
        message=mem; // (5)
        StringTokenizer StringT = new StringTokenizer(message," "); // (4)
        String addr = StringT.nextToken(); // (4-1)
        ...
        if(addr.equals("CARD")){
            if(StringT.hasMoreTokens()){
                String name = StringT.nextToken(); // (3)
                String cardName;
                ...
                if(name.equals("wildcard"))
                    cardName = name; // (2)
                ...
                gui.sendMessage("You got a new card:\""
                    + cardName + "\"", false , false); // (1)
            } ...
        }
    }
Example (2)
    public void DoEndGo(String mem){
        ...
        GameParser("CARD "+game.getDeservedCard()); // (6)
        ...
    }
}
public class RiskGame{
    public String getDesrvedCard(){
        Card c = cards.elementAt(r.nextInt(cards.size()));
        if(c.getCountry() == null)
            return "wildcard"; // (7)
        else
            return c.getCountry().getName();
        ...
    }
}
Basic Idea
We assume that all need-to-translate strings are those
strings that are sent to the GUI
[Figure: Constant Strings, String Variables/Expressions, GUI]
Challenges
String operations (concatenate, tokenize, substring, etc.)
[Figure: String1, String1:part1, String1:part2, GUI]
String transmissions:
[Figure: Client GUI, network, Server, Client GUI]
String comparisons:
[Figure: String1, String2, Comparison, GUI]
Trivial strings: “123”, “ ”, “Risk”, …
Approach
Collect output API methods
Locate initial output strings
Adapted String Taint Analysis
String Transmission Analysis
String Comparison Analysis
Filtering
Output API Methods
Output API Methods are methods that pass at least one of their parameters to the GUI
Example
java.awt.Graphics2D.drawString(java.lang.String, int, int) drawString 1 false 0
Initial Output Strings are the arguments sent to Output API Methods
g.drawString (weaponMessage, 30,20)
We locate the strings using the Eclipse API Search Engine
String Analysis
Determine the possible values of a string variable in the
code as CFGs and DFAs
return1 → wildcard
return2 → &FileInput
return3 → return1|return2
parseCard → CARD return3
message → parseCard|...
StringT → message
addr → nextToken(StringT, " ")
StringT1 → reduceToken(StringT, " ")
name → nextToken(StringT1, " ")
StringT2 → reduceToken(StringT1," ")
output → You got a new card: name (Start)
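The grammar above models the tokenizer with two operations, nextToken and reduceToken. The sketch below shows what they are assumed to mean on one concrete string: nextToken yields the first token, and reduceToken yields what remains after that token is consumed. The class `TokenOps` and its handling of the delimiter as a plain string are simplifications made for this illustration; the analysis applies the operations to grammars, not to single strings.

```java
final class TokenOps {
    // Assumed meaning of nextToken(s, delim): the first token of s.
    static String nextToken(String s, String delim) {
        String t = stripLeading(s, delim);
        int i = t.indexOf(delim);
        return i < 0 ? t : t.substring(0, i);
    }

    // Assumed meaning of reduceToken(s, delim): the rest of s after the first token.
    static String reduceToken(String s, String delim) {
        String t = stripLeading(s, delim);
        int i = t.indexOf(delim);
        return i < 0 ? "" : t.substring(i + delim.length());
    }

    // Skips delimiters at the start of s, as a tokenizer would.
    private static String stripLeading(String s, String delim) {
        while (!delim.isEmpty() && s.startsWith(delim)) s = s.substring(delim.length());
        return s;
    }

    public static void main(String[] args) {
        String message = "CARD wildcard";                 // one value of: message
        String addr = nextToken(message, " ");            // addr
        String rest = reduceToken(message, " ");          // StringT1
        String name = nextToken(rest, " ");               // name
        System.out.println(addr + " / " + name);
        System.out.println("You got a new card:\"" + name + "\"");
    }
}
```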
String Taint Analysis
Determine whether a part of a string comes from an unsafe
source
return1 → wildcard
return2 → &FileInput
return3 → return1|return2
parseCard → CARD return3
message → parseCard|...
StringT → message
addr → nextToken(StringT, " ")
StringT1 → reduceToken(StringT, " ")
name → nextToken(StringT1, " ")
StringT2 → reduceToken(StringT1," ")
output → You got a new card: name (Start)
Adapted String Taint Analysis
Propagate lists of originating positions as the
tags of the non-terminals
parseCard → CARD return3
[Figure: position lists attached to the symbols of this production. Lists shown: “Risk:8922”; “RiskGame:6767, extern”; “Risk:6767, extern”; “1-5:Risk:8922”]
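Propagating position lists instead of a single tag can be sketched as a fixed-point union over the productions. Which position belongs to which terminal is an assumption here (the constant "CARD " at Risk:8922, "wildcard" at RiskGame:6767, and file input marked extern), and the sketch tracks only the set of positions, not the character ranges such as 1-5.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PositionPropagation {
    // Each symbol ends up with the union of the positions of everything it can derive.
    static Map<String, Set<String>> propagate(Map<String, List<List<String>>> productions,
                                              Map<String, Set<String>> terminalPositions) {
        Map<String, Set<String>> positions = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : terminalPositions.entrySet()) {
            positions.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {                               // repeat until no list grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : productions.entrySet()) {
                Set<String> target = positions.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
                for (List<String> rhs : e.getValue()) {
                    for (String sym : rhs) {
                        Set<String> source = positions.get(sym);
                        if (source != null && source != target) {
                            changed |= target.addAll(source);
                        }
                    }
                }
            }
        }
        return positions;
    }

    // Part of the grammar from the slides, with assumed positions on the terminals.
    static Map<String, Set<String>> example() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("return1", List.of(List.of("wildcard")));
        g.put("return2", List.of(List.of("&FileInput")));
        g.put("return3", List.of(List.of("return1"), List.of("return2")));
        g.put("parseCard", List.of(List.of("CARD ", "return3")));
        Map<String, Set<String>> t = new HashMap<>();
        t.put("wildcard", Set.of("RiskGame:6767"));
        t.put("&FileInput", Set.of("extern"));
        t.put("CARD ", Set.of("Risk:8922"));
        return propagate(g, t);
    }

    public static void main(String[] args) {
        System.out.println(example().get("parseCard"));
    }
}
```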
String Transmission Analysis
Scenario
[Figure: Server Side and Client Side connected by a Socket; on the client side, a Comparison on the Labels leads to GUI, Control, Logs, …]
Packet1: Label: Info, “Client A kills you”, Other Fields
Packet2: Label: Command, “quit”, Other Fields
String Comparison Analysis
Locate all string comparison operations: String.equals(), String.startsWith(), String.endsWith(),
String.compareTo(), etc.
Run string taint analysis on both sides of the
operations
If one side contains a need-to-translate string,
mark the constant strings on the other side as
need-to-translate
[Figure: String1, String2, Comparison, GUI]
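The marking rule can be sketched as follows, assuming that taint analysis has already summarized each side of a comparison as the set of constant strings that may reach it. The `Comparison` type and the labels used to identify constants are hypothetical, and the sketch makes a single pass over the comparisons.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ComparisonAnalysis {
    // One comparison operation; each side lists the constants that may flow into it.
    static final class Comparison {
        final Set<String> left;
        final Set<String> right;
        Comparison(Set<String> left, Set<String> right) {
            this.left = left;
            this.right = right;
        }
    }

    // If one side holds a need-to-translate constant, mark the constants on the other side.
    static Set<String> mark(List<Comparison> comparisons, Set<String> needToTranslate) {
        Set<String> marked = new LinkedHashSet<>(needToTranslate);
        for (Comparison c : comparisons) {
            if (intersects(c.left, needToTranslate)) marked.addAll(c.right);
            if (intersects(c.right, needToTranslate)) marked.addAll(c.left);
        }
        return marked;
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String s : a) if (b.contains(s)) return true;
        return false;
    }

    // Hypothetical summary of name.equals("wildcard") from the Risk example.
    static Set<String> example() {
        String returned = "RiskGame.java: \"wildcard\""; // reaches the GUI via cardName
        String compared = "Risk.java: \"wildcard\"";     // constant used in the comparison
        Comparison c = new Comparison(Set.of(returned), Set.of(compared));
        return mark(List.of(c), Set.of(returned));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```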
Experimental subjects
RText : Simple Editor
Risk : Board Game
ArtOfIllusion : Graph Drawing Project
Megamek : Big Real Time Strategy Game
Application/Version  Starting Month  #Developers  #LOC  #Files  #Constant Strings
RText 0.8.6.9        11/2003         16           17k   55      1252
Risk 1.0.7.5         05/2004         4            19k   38      1510
AOI 1.1              11/2000         2            71k   258     2889
Megamek 0.29.72      02/2002         33           110k  338     10464
Experimental Results
Best Results
App      Need-to-Trans (Not Externalized in subsequent version)  Located  FN  FP
RText    408 (121)                                                445      0   37
Risk     509 (55)                                                 498      18  7
AOI      1221 (816)                                               1280     6   65
Megamek  1734 (678)                                               1765     10  41
Turning on and off String Transmission Analysis
App           Need-to-trans  Located  FN   FP
Megamek       1734           1765     10   41
Megamek(NT)   1734           1188     585  39
Megamek(ALL)  1734           1777     10   53
Reduce FN significantly and reduce some FP
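The columns of the best-results table can be cross-checked with a few lines of arithmetic: the number of correctly located strings must be the same whether computed as Located minus FP or as Need-to-Trans minus FN. The sketch below performs that check and derives precision and recall, two standard metrics that the slides do not report and that are added here only as an illustration.

```java
final class ResultCheck {
    static int truePositives(int located, int fp) {
        return located - fp;
    }

    static boolean consistent(int need, int located, int fn, int fp) {
        return located - fp == need - fn;   // same count from both directions
    }

    static double precision(int located, int fp) {
        return (double) truePositives(located, fp) / located;
    }

    static double recall(int need, int fn) {
        return (double) (need - fn) / need;
    }

    public static void main(String[] args) {
        String[] apps = {"RText", "Risk", "AOI", "Megamek"};
        // Columns: Need-to-Trans, Located, FN, FP (best results table)
        int[][] rows = {{408, 445, 0, 37}, {509, 498, 18, 7}, {1221, 1280, 6, 65}, {1734, 1765, 10, 41}};
        for (int i = 0; i < apps.length; i++) {
            int need = rows[i][0], located = rows[i][1], fn = rows[i][2], fp = rows[i][3];
            System.out.printf("%s precision=%.3f recall=%.3f consistent=%b%n",
                    apps[i], precision(located, fp), recall(need, fn),
                    consistent(need, located, fn, fp));
        }
    }
}
```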
Experimental Results cont.
Turning on and off String comparison analysis
App          Located  FN  FP
RText        445      0   37
RText(NC)    445      0   37
Risk         498      18  7
Risk(NC)     474      42  7
AOI          1280     6   65
AOI(NC)      1280     6   65
Megamek      1765     10  41
Megamek(NC)  1730     36  32
Reduce some FN, but very important FN
Turning on and off filter
App          Located  FN  FP
RText        445      0   37
RText(NC)    581      0   173
Risk         498      18  7
Risk(NC)     532      18  41
AOI          1280     6   65
AOI(NC)      1487     6   272
Megamek      1765     10  41
Megamek(NC)  2080     10  356
Significantly reduce FP
Bugs found
We found 17 not-externalized need-to-translate
strings in the latest version of Megamek and
reported them as report 2085049. The
developers confirmed and externalized them.