Understanding the Chinese Firewall (continued)

Dr. Crandall, Leif, Tony, Ronnie, Veronika

Review, Chinese Firewall
Maximum Entropy
Point Feature Comparison with Singular Value Decomposition

Chinese Firewall

We want to monitor which words and phrases are being censored in China.

We find out which words are being filtered by probing the "Chinese Firewall" with words that are likely to be censored.

Our main problem is finding the words that are likely to be censored.

Challenge: we are dealing with Chinese text, and Chinese characters are not like English letters.

Ex: 馬

Maximum Entropy

Used for named entity extraction.

Ex: "Chinese government passes new law"
[Beginning of Named Entity] [End of Named Entity] [other] [other] [Unique Named Entity]

Build a model from a training set: our training set is the Chinese Wikipedia.

The training set needs to have a specific format:
    Assign each word a set of features.
    Label each word as a [unique named entity], [other], etc.

Using Maximum Entropy, we can assign a probability P(named entity) to new words based on features describing those words.
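
The idea of assigning P(named entity) from word features can be sketched as a small log-linear (maximum entropy) classifier. This is only a toy illustration, not the program used in this project: the two-label tag set ("NE", "other"), the `features` function, the English toy sentences in `TRAIN`, and training by plain gradient ascent are all assumptions made for the example.

```python
import math
from collections import defaultdict

LABELS = ["NE", "other"]  # simplified, hypothetical tag set

def features(word, prev_word):
    """Hypothetical binary features describing one token."""
    return [
        f"word={word}",
        f"prev={prev_word}",
        # Capitalization only makes sense for this English toy data.
        f"is_capitalized={word[:1].isupper()}",
    ]

def predict(weights, feats):
    """Return P(label | feats) under the log-linear model."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: v / z for y, v in exp_scores.items()}

def train(examples, epochs=200, lr=0.5):
    """Fit weights by gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict(weights, feats)
            for f in feats:
                for y in LABELS:
                    # observed feature count minus expected feature count
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights

def make_examples(sentences):
    """Turn labeled sentences into (feature list, label) training pairs."""
    examples = []
    for words, tags in sentences:
        for i, (word, tag) in enumerate(zip(words, tags)):
            prev_word = words[i - 1] if i > 0 else "<s>"
            examples.append((features(word, prev_word), tag))
    return examples

# Toy labeled data standing in for a labeled training corpus.
TRAIN = [
    (["Beijing", "announces", "new", "policy"], ["NE", "other", "other", "other"]),
    (["officials", "visit", "Shanghai", "today"], ["other", "other", "NE", "other"]),
    (["Xinhua", "reports", "new", "law"], ["NE", "other", "other", "other"]),
]

weights = train(make_examples(TRAIN))
unseen = predict(weights, features("Guangzhou", "visit"))
print(unseen)  # probabilities for a word that never appeared in TRAIN
```

The unseen word still receives a probability because the model scores its features (previous word, capitalization) rather than the word itself, which is the property the slides rely on for new words.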

Once we extract named entities from news sources, we can test whether new words have been added to the "blacklist".

Problem: Chinese text that is similar to, but not exactly the same as, the keyword we want to test.

Ex: 法轮功 法十轮十功
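
To make the problem concrete, the short sketch below shows that an exact substring test misses the padded variant, while a looser in-order character check catches it. The helper `is_subsequence` is a hypothetical name introduced here for illustration; it is not the comparison method proposed in these slides, and on long texts such a check would produce many false positives.

```python
def is_subsequence(keyword, text):
    """True if the characters of keyword appear in text in order, gaps allowed."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in keyword)

keyword = "法轮功"
variant = "法十轮十功"  # the same keyword with a filler character inserted

exact_hit = keyword in variant                  # plain substring match
loose_hit = is_subsequence(keyword, variant)    # in-order match with gaps

print(exact_hit, loose_hit)
```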

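Feature Correspondence by Singular Value Decomposition

Point features, 1:1 mapping, SVD.

Given the point features in two images I and J, build a proximity matrix G: $G_{ij} = \exp(-r_{ij}^2 / 2\sigma^2)$, where $r_{ij}$ is the distance between feature $I_i$ and feature $J_j$.

SVD of G: $G = T D U'$.

Replace each singular value in D with 1 to obtain E, then form $P = T E U'$.

$P_{ij}$ determines whether $I_i$ maps to $J_j$: the pair is accepted when $P_{ij}$ is the largest element in both its row and its column.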
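
The pairing procedure can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not this project's code: the function name `svd_correspondence`, the toy coordinates, and the choice `sigma=0.5` are made up for the example, and the proximity uses the squared distance as in the Scott and Longuet-Higgins formulation.

```python
import numpy as np

def svd_correspondence(points_i, points_j, sigma=1.0):
    """Pair point features of two images via the SVD of a proximity matrix."""
    I = np.asarray(points_i, dtype=float)
    J = np.asarray(points_j, dtype=float)

    # Squared distance between every feature in I and every feature in J.
    diff = I[:, None, :] - J[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)

    # Gaussian-weighted proximity matrix G.
    G = np.exp(-r2 / (2.0 * sigma ** 2))

    # G = T D U'; dropping D (i.e. replacing it with E) gives P = T E U'.
    T, _, Ut = np.linalg.svd(G, full_matrices=False)
    P = T @ Ut

    # Accept (i, j) only if P[i, j] dominates both its row and its column.
    matches = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        if int(np.argmax(P[:, j])) == i:
            matches.append((i, j))
    return matches, P

# Toy data: the second point set is the first one shifted slightly and reordered.
points_i = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
points_j = [(0.1, 2.1), (0.1, 0.1), (1.1, 0.1)]

matches, P = svd_correspondence(points_i, points_j, sigma=0.5)
print(matches)
```

The value of sigma controls how quickly proximity falls off with distance, so it should be chosen relative to the typical spacing between features.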

Current Status

We are almost done labeling Chinese Wikipedia to use as a training set for our maximum entropy program

Chinese character images

Point feature extraction

(Near) Future Work

Finish and test our maximum entropy model

Point feature extraction

Ideas: zip files, relaxation-based pattern matching, segmentation

Questions?

Scott, Guy L. and Longuet-Higgins, H. Christopher. (1991) An Algorithm for Associating the Features of Two Images. Proc. R. Soc. Lond. B 244, 21-26. doi: 10.1098/rspb.1991.0045

Pilu, Maurizio. (1997) Uncalibrated Stereo Correspondence by Singular Value Decomposition. HP Laboratories Bristol, Digital Media Department, HPL-97-96, August 1997.

Nagasaki, Takeshi, Yanagida, Tadashi, Nakagawa, Masaki. () Relaxation-Based Pattern Matching Using Automatic Differentiation for Off-line Character Recognition.

Borthwick, Andrew, Sterling, John, Agichtein, Eugene, Grishman, Ralph. () Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. New York University.