Efficient Inference on Sequence Segmentation Models
Sunita Sarawagi, IIT Bombay

Sequence segmentation models
Flexible & accurate models for many applications
  Speech segmentation on phonemes
  Syntactic chunking
  Protein/Gene finding
  Information extraction with entity-level features
    Whole entity match with database of entities
    Length of entity between 3 and 8 words
    Third or fourth token of entity a “-”
    Last three tokens are digits
Figure from Keshet et al. ’05, NIPS workshop
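
The entity-level features listed above can be written as functions of a whole candidate segment rather than of a single token. The following is a minimal sketch under assumptions: the function name `entity_features`, the feature names, and the `lexicon` set standing in for the database of entities are all hypothetical and are not taken from the talk.

```python
def entity_features(tokens, lexicon):
    """Entity-level features of one candidate segment (a list of tokens).

    `lexicon` is a hypothetical stand-in for the database of entities:
    a set of lower-cased entity strings.
    """
    text = " ".join(tokens).lower()
    return {
        # Whole entity match with database of entities
        "whole_entity_in_db": text in lexicon,
        # Length of entity between 3 and 8 words
        "length_3_to_8": 3 <= len(tokens) <= 8,
        # Third or fourth token of entity is "-"
        "dash_at_3rd_or_4th": any(tok == "-" for tok in tokens[2:4]),
        # Last three tokens are digits
        "last_three_digits": len(tokens) >= 3
        and all(tok.isdigit() for tok in tokens[-3:]),
    }


title = entity_features(["Belief", "Awareness", "Reasoning"],
                        {"belief awareness reasoning"})
synthetic = entity_features(["A", "B", "-", "1", "2", "3"], set())
```

None of these features can be evaluated from one token alone, which is why a sequence model cannot express them directly.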

Sequence vs. segmentation models

Sequence model: features describe the single word “Fagin”
  t   1       2       3      4       5        6       7          8
  x   R.      Fagin   and    J.      Halpern  Belief  Awareness  Reasoning
  y   Author  Author  Other  Author  Author   Title   Title      Title

Segmentation model: features describe the full entity
Similarity to author’s column in database
  l,u  l1=1, u1=2  l2=u2=3  l3=4, u3=5  l4=6, u4=8
  x    R. Fagin    and      J. Halpern  Belief Awareness Reasoning
  y    Author      Other    Author      Title
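
To make the relation between the two views concrete, here is a small sketch that collapses the per-token labels of the sequence view into the (start, end, label) segments of the segmentation view. The helper name `labels_to_segments` is hypothetical, and the sketch assumes that adjacent tokens with the same label always belong to one segment.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-token labels into (start, end, label) triples.

    Positions are 1-indexed and inclusive, as on the slide.
    Assumption: a run of identical labels is a single segment.
    """
    segments, pos = [], 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((pos, pos + length - 1, label))
        pos += length
    return segments


labels = ["Author", "Author", "Other", "Author", "Author",
          "Title", "Title", "Title"]
segments = labels_to_segments(labels)
# [(1, 2, 'Author'), (3, 3, 'Other'), (4, 5, 'Author'), (6, 8, 'Title')]
```

The assumption fails when two entities of the same type are adjacent, which is one reason sequence models are often given richer label encodings.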

Segmentation models
Input: sequence x = x1, x2, ..., xn; label set Y
Output: segmentation s = s1, s2, ..., sp
  sj = (start position, end position, label) = (tj, uj, yj)
Score: F(x,s) = sum, over the segments of s, of
  Transition potentials: segment starting at i has label y and previous label is y’
  Segment potentials: segment starting at i’, ending at i, and with label y. All positions from i’ to i get the same label.
Inference
  Most likely segmentation (max-margin trainers)
  Marginals around segments (likelihood-based & exponentiated-gradient trainers)
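
The score and the "most likely segmentation" inference can be sketched as follows. This is the standard semi-Markov Viterbi recursion with a hard length limit L, not the optimized algorithm of the talk. The function names and the potential signatures `seg_pot(start, end, label)` and `trans_pot(prev_label, label, start)` are assumptions made for illustration.

```python
def score(segments, seg_pot, trans_pot):
    """F(x, s): sum of segment and transition potentials over s."""
    total, prev = 0.0, None
    for start, end, label in segments:
        total += seg_pot(start, end, label)
        if prev is not None:
            total += trans_pot(prev, label, start)
        prev = label
    return total


def best_segmentation(n, labels, seg_pot, trans_pot, L):
    """Highest-scoring segmentation of positions 1..n, segment length <= L."""
    # best[i][y] = (score, backpointer) of the best segmentation of 1..i
    # whose last segment has label y; backpointer is (end of previous
    # segment, its label), or None when the segment starts at position 1.
    best = [dict() for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            cands = []
            for d in range(1, min(L, i) + 1):
                start = i - d + 1
                s = seg_pot(start, i, y)
                if start == 1:
                    cands.append((s, None))
                else:
                    for yp in labels:
                        prev_score = best[start - 1][yp][0]
                        cands.append((prev_score + trans_pot(yp, y, start) + s,
                                      (start - 1, yp)))
            best[i][y] = max(cands, key=lambda c: c[0])
    # Backtrack from the best label at position n.
    y = max(labels, key=lambda lab: best[n][lab][0])
    i, segs = n, []
    while True:
        _, back = best[i][y]
        start = 1 if back is None else back[0] + 1
        segs.append((start, i, y))
        if back is None:
            break
        i, y = back
    return segs[::-1]


# Toy potentials (made up): reward segment (1,2,"A") and segment (3,3,"B").
toy_seg = lambda s, e, y: {(1, 2, "A"): 5.0, (3, 3, "B"): 2.0}.get((s, e, y), 0.0)
toy_trans = lambda prev, y, start: 0.0
best = best_segmentation(3, ["A", "B"], toy_seg, toy_trans, L=3)
```

The inner loop over segment lengths d is where the dependence on L enters, which is the cost the rest of the talk sets out to remove.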

Inference: marginal for a segment
Forward messages (L = max segment length)
Matrix notation: for L = n
Segment marginal
O(n L^2)
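
The equations on this slide did not survive extraction. As a stand-in, here is a sketch of the textbook semi-Markov forward and backward recursions and the segment marginal they yield. It is an assumed reconstruction of the general technique, not the slide's exact notation. The potential signatures are hypothetical, and plain exponentials are used instead of log-space arithmetic to keep the sketch short.

```python
import math


def segment_marginals(n, labels, seg_pot, trans_pot, L):
    """Return (marginal, Z) for segmentations of 1..n with length <= L."""

    def weight(prev, start, end, y):
        # exp of the potentials contributed by one segment
        w = seg_pot(start, end, y)
        if prev is not None:
            w += trans_pot(prev, y, start)
        return math.exp(w)

    # alpha[i][y]: total weight of segmentations of 1..i ending in label y
    alpha = [{y: 0.0 for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for d in range(1, min(L, i) + 1):
                start = i - d + 1
                if start == 1:
                    alpha[i][y] += weight(None, 1, i, y)
                else:
                    alpha[i][y] += sum(alpha[start - 1][yp] * weight(yp, start, i, y)
                                       for yp in labels)

    # beta[i][y]: total weight of segmentations of i+1..n, given that a
    # segment labelled y ends at position i
    beta = [{y: 0.0 for y in labels} for _ in range(n + 1)]
    for y in labels:
        beta[n][y] = 1.0
    for i in range(n - 1, 0, -1):
        for y in labels:
            for d in range(1, min(L, n - i) + 1):
                end = i + d
                beta[i][y] += sum(weight(y, i + 1, end, yn) * beta[end][yn]
                                  for yn in labels)

    Z = sum(alpha[n][y] for y in labels)

    def marginal(start, end, y):
        """Probability that the segment (start, end, y) is part of s."""
        if end - start + 1 > L:
            return 0.0
        if start == 1:
            inside = weight(None, 1, end, y)
        else:
            inside = sum(alpha[start - 1][yp] * weight(yp, start, end, y)
                         for yp in labels)
        return inside * beta[end][y] / Z

    return marginal, Z


# Made-up potentials, only to exercise the recursions.
demo_seg = lambda s, e, y: 0.3 * (e - s) + (0.5 if y == "A" else 0.0)
demo_trans = lambda prev, y, start: 0.2 if prev != y else -0.1
marg, Z = segment_marginals(4, ["A", "B"], demo_seg, demo_trans, L=3)
```

A quick sanity check is that exactly one segment starts at position 1, so the marginals of all segments starting there should sum to one.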

Goal
Speed up segmentation models
  Currently 3 to 8 times slower than sequence models
  Eliminate L, the hard limit on segment length
  Efficiently handle a mix of potentials spanning varying numbers of tokens
  Pay the penalty of segmentation models only for the longer entity-level features instead of all of them
Empirical results on extraction tasks
  Segmentation models with few entity features: higher accuracy at the same cost as sequence models

Succinct potentials
Key insight
  Compactly represent features on overlapping segments
Main challenge
  Inference algorithms on compact potentials whose cost is independent of the number of segments a potential applies to
Four kinds of potentials

Applications with mixed potentials
  Named Entity Recognition
  Speech segmentation on phonemes

Efficient inference: forward pass
Sharing computation
  Split potentials into two parts:
    Common to all segments starting before i-m
    Common to all segments ending after i-1
  Maximum gap between boundary of any
  different

Two sets of modified forward messages
Similar two sets of backward messages
Same strategy for max-product inference
Optimized forward pass: O(n m^2)

Marginals around potentials
  Direct computation of marginals is O(n^2)
  Reduced to O(1) by two tricks:
    Decomposing potentials as in the forward and backward messages
    Sharing computations across adjacent potentials
  Direct
  Optimized (a bit more tricky)

Complexity and data structures
Complexity of computing marginals
  Optimized: O(nm + H); original: O(nL + G)
  H = number of features in succinct form
  G = O(L^2 H) (in real data, |G| = 5 to 10 times |H|)
Achieved via
  Incremental computation
  Special data structure to compute i’:i in O(1) time from the previous i’:i-1
  Marginals computed in sorted order: increasing start boundary, decreasing end boundary
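
The idea of obtaining the value for segment i’:i in O(1) time from the value for i’:i-1 can be illustrated with a running sum. This is only a generic sketch: the quantity the talk actually maintains did not survive extraction, so `token_scores` and the helper `segment_sums` are hypothetical.

```python
def segment_sums(token_scores, max_len=None):
    """Yield (start, end, total) for every segment, 1-indexed inclusive.

    The total for start:end is obtained from start:end-1 with a single
    addition, so each segment costs O(1) instead of O(end - start).
    """
    n = len(token_scores)
    for start in range(1, n + 1):
        total = 0.0
        last = n if max_len is None else min(n, start + max_len - 1)
        for end in range(start, last + 1):
            total += token_scores[end - 1]  # O(1) update from start:end-1
            yield start, end, total


sums = {(s, e): t for s, e, t in segment_sums([1.0, 2.0, 3.0])}
```

Enumerating segments in a fixed boundary order is what makes this kind of reuse possible, since each value is needed immediately after the one it is derived from.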

Empirical evaluation
Tasks
  Citations: Cora, articles (L=20)
  Address: Indian address (L=7)
Features
  Token-level: orthographic properties/lexicon match of words at the start, end, middle, left, right of segment
  Entity-level: TFIDF match with lexicon, entity length
Methods
  Sequence-BCEU: Begin-Continue-End-Unique labels
  Segment: original un-optimized algorithm
  Segment-Opt: optimized inference with compact potentials
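
For the Sequence-BCEU baseline, each segment is turned into per-token Begin, Continue, End and Unique labels. A minimal sketch of that encoding follows. The tag spelling `label-B` and the helper name `segments_to_bceu` are assumptions, not taken from the talk or its code.

```python
def segments_to_bceu(segments):
    """Encode (start, end, label) segments as per-token BCEU tags."""
    tags = []
    for start, end, label in segments:
        length = end - start + 1
        if length == 1:
            tags.append(f"{label}-U")  # Unique: single-token segment
        else:
            tags.append(f"{label}-B")  # Begin
            tags.extend([f"{label}-C"] * (length - 2))  # Continue
            tags.append(f"{label}-E")  # End
    return tags


citation = [(1, 2, "Author"), (3, 3, "Other"), (4, 5, "Author"), (6, 8, "Title")]
tags = segments_to_bceu(citation)
# ['Author-B', 'Author-E', 'Other-U', 'Author-B', 'Author-E',
#  'Title-B', 'Title-C', 'Title-E']
```

This lets a sequence model recover segment boundaries, though its features still describe one token at a time.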

Running time and accuracy
[Figure: running time (seconds) against training % on Address and Cora for Sequence-BCEU, SegmentOpt and Segment; F1 accuracy on Address and Cora articles for Sequence-BCEU and Segment]

Limit on segment length (L)
L (hard limit on segment length)
  Too small: reduced accuracy (90 to 81)
  Too large: increased running time (30 minutes to 1 hour)
m (maximum entity-level features)
  Reduced by half: accuracy still 3% higher than Sequence
  Too large: running time increases by only 30%

Concluding remarks
Segmentation models: natural, flexible, accurate
Main limitation: inference expensive
Solved via
  A compact design of shared potentials
  New efficient inference algorithms
    Pays penalty of entity-level features only when needed
    Running time comparable to sequence models
    No hard limit on segment length
Future work
  Features that are functions of distance from boundary
  Other models: 2-D segmentation?
Code: http://crf.sourceforge.net