Efficient Inference on Sequence Segmentation Models


Sunita Sarawagi

IIT Bombay

[email protected]


Sequence segmentation models

Flexible & accurate models for many applications
- Speech segmentation on phonemes
- Syntactic chunking
- Protein/gene finding
- Information extraction with entity-level features
  - Whole entity match with database of entities
  - Length of entity between 3 and 8 words
  - Third or fourth token of entity a “-”
  - Last three tokens are digits

(Figure from Keshet et al. ’05, NIPS workshop)


Sequence vs. segmentation models

Sequence model: one label y1 ... y8 per token. Features describe the single word “Fagin”.

t:  1       2       3      4       5        6       7          8
x:  R.      Fagin   and    J.      Halpern  Belief  Awareness  Reasoning
y:  Author  Author  Other  Author  Author   Title   Title      Title

Segmentation model: one label per segment. Features describe the full entity, e.g. similarity to the author’s column in a database.

l,u:  l1=1, u1=2   l2=u2=3   l3=4, u3=5   l4=6, u4=8
x:    R. Fagin     and       J. Halpern   Belief Awareness Reasoning
y:    Author       Other     Author       Title
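The two views of the citation example can be related with a few lines of code. The following is a minimal sketch, not part of the original slides: the helper name `labels_to_segments` is hypothetical, and it assumes that adjacent tokens with the same label always belong to the same entity (which is not true in general, and is one reason sequence models need richer label encodings).

```python
from itertools import groupby

def labels_to_segments(labels):
    """Collapse per-token labels into (start, end, label) segments.

    Positions are 1-indexed and inclusive, as in the example.
    Assumption: a run of identical labels forms exactly one segment.
    """
    segments = []
    pos = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((pos, pos + length - 1, label))
        pos += length
    return segments

tokens = "R. Fagin and J. Halpern Belief Awareness Reasoning".split()
labels = ["Author", "Author", "Other", "Author", "Author",
          "Title", "Title", "Title"]
segments = labels_to_segments(labels)
for start, end, label in segments:
    print(start, end, label, " ".join(tokens[start - 1:end]))
```

Each printed row corresponds to one segment with its boundaries and label, which is the unit that entity-level features are computed on.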


Segmentation models

- Input: sequence x = x1, x2, ..., xn; label set Y
- Output: segmentation s = s1, s2, ..., sp, where sj = (start position, end position, label) = (tj, uj, yj)
- Score: F(x,s) = sum over the segments of s of two kinds of potentials
  - Transition potentials: segment starting at i has label y and previous label is y’
  - Segment potentials: segment starting at i’, ending at i, and with label y. All positions from i’ to i get the same label.
- Inference
  - Most likely segmentation (max-margin trainers)
  - Marginal around segments (likelihood-based & exponentiated-gradient trainers)
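To make the scoring and the "most likely segmentation" task concrete, here is a minimal sketch of a standard semi-Markov Viterbi recursion. It is an illustration under stated assumptions, not the author's implementation: the function names, the callable signatures `transition(prev, y, start)` and `segment_potential(start, end, y)`, and the toy potential values are all hypothetical.

```python
def score(segments, transition, segment_potential):
    """F(x, s): sum of transition and segment potentials over the segments."""
    total, prev = 0.0, None
    for start, end, y in segments:
        total += transition(prev, y, start) + segment_potential(start, end, y)
        prev = y
    return total

def best_segmentation(n, labels, transition, segment_potential, L):
    """Max-product search over segmentations with segment length <= L."""
    # best[i][y]: best score of a segmentation of 1..i whose last label is y
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            cands = []
            for d in range(1, min(L, i) + 1):
                start = i - d + 1
                seg = segment_potential(start, i, y)
                if start == 1:
                    cands.append((transition(None, y, 1) + seg, (0, None)))
                else:
                    for yp in labels:
                        s = best[start - 1][yp] + transition(yp, y, start) + seg
                        cands.append((s, (start - 1, yp)))
            best[i][y], back[i][y] = max(cands, key=lambda c: c[0])
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[n][lab])
    top, i, out = best[n][y], n, []
    while i > 0:
        j, yp = back[i][y]
        out.append((j + 1, i, y))
        i, y = j, yp
    return top, out[::-1]

# Toy potentials (hypothetical values, for illustration only).
def transition(prev, y, start):
    return -0.5 if prev == y else 0.0

def segment_potential(start, end, y):
    table = {(1, 2, "A"): 2.0, (3, 3, "O"): 1.0, (4, 4, "A"): 1.5}
    return table.get((start, end, y), 0.0)

top, segs = best_segmentation(4, ["A", "O"], transition, segment_potential, L=2)
print(top, segs)
```

The three nested loops over positions, segment lengths, and previous labels are where the dependence on L comes from; every extra unit of allowed segment length adds one more candidate start per position.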


Inference: marginal for a segment

- Forward messages (L = max segment length)
- Matrix notation: for L = n
- Segment marginal
- Complexity: O(nL^2)
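The slide's formulas did not survive extraction, so the following is a sketch of the standard semi-Markov forward and backward messages and the resulting segment marginal, not a reconstruction of the slide. All names are hypothetical, and the single callable `log_pot(start, end, y, prev)` is assumed to return the combined transition and segment log-potential.

```python
import math

def incoming(alpha, labels, log_pot, start, end, y):
    """Weight of all prefixes 1..start-1 extended by segment (start, end, y)."""
    if start == 1:
        return math.exp(log_pot(1, end, y, None))
    return sum(alpha[start - 1][yp] * math.exp(log_pot(start, end, y, yp))
               for yp in labels)

def forward_backward(n, labels, log_pot, L):
    # alpha[i][y]: total weight of segmentations of 1..i ending with label y
    alpha = [dict.fromkeys(labels, 0.0) for _ in range(n + 1)]
    # beta[i][y]: total weight of completions of i+1..n after a segment
    # that ends at i with label y
    beta = [dict.fromkeys(labels, 0.0) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for start in range(max(1, i - L + 1), i + 1):
                alpha[i][y] += incoming(alpha, labels, log_pot, start, i, y)
    for y in labels:
        beta[n][y] = 1.0
    for i in range(n - 1, 0, -1):
        for y in labels:
            for end in range(i + 1, min(n, i + L) + 1):
                for y2 in labels:
                    beta[i][y] += math.exp(log_pot(i + 1, end, y2, y)) * beta[end][y2]
    z = sum(alpha[n].values())  # partition function
    return alpha, beta, z

def segment_marginal(alpha, beta, z, labels, log_pot, start, end, y):
    """Probability that (start, end, y) is a segment; assumes end-start+1 <= L."""
    return incoming(alpha, labels, log_pot, start, end, y) * beta[end][y] / z

# Toy log-potential (hypothetical values, for illustration only).
def log_pot(start, end, y, prev):
    s = 0.3 * (end - start)
    if y == "A" and start == 2:
        s += 1.0
    if prev == y:
        s -= 0.7
    return s

LABELS, N, MAXLEN = ["A", "O"], 4, 3
alpha, beta, z = forward_backward(N, LABELS, log_pot, MAXLEN)
print(segment_marginal(alpha, beta, z, LABELS, log_pot, 2, 3, "A"))
```

A useful sanity check on any such implementation is that the marginals of all segments covering a fixed position sum to one.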


Goal

- Speed up segmentation models
  - Currently 3 to 8 times slower than sequence models
  - Eliminate L, the hard limit on segment length
  - Efficiently handle a mix of potentials spanning varying numbers of tokens
  - Pay the penalty of segmentation models only for longer entity-level features instead of all of them
- Empirical results on extraction tasks
  - Segmentation models with few entity features: higher accuracy at the same cost as sequence models


Succinct potentials

- Key insight: compactly represent features on overlapping segments
- Main challenge: inference algorithms on compact potentials whose cost is independent of the number of segments a potential applies to
- Four kinds of potentials


Applications with mixed potentials

- Named Entity Recognition
- Speech segmentation on phonemes


Efficient inference: forward pass

Sharing computation
- Split potentials into two parts:
  - Common to all segments starting before i-m
  - Common to all segments ending after i-1

Maximum gap between boundary of any

different


Optimized forward pass

- Two sets of modified forward messages
- Similar two sets of backward messages
- Same strategy for max-product inference
- Complexity: O(nm^2)


Marginals around potentials

- Direct computation of marginals is O(n^2)
- Reduced to O(1) by two tricks
  - Decomposing potentials as in the forward pass
  - Sharing computations across adjacent potentials
- Direct vs. optimized computation: the optimized form is a bit more tricky


Complexity and data structures

- Complexity of computing marginals
  - Optimized: O(nm + H); original: O(nL + G)
  - H = number of features in succinct form
  - G = O(L^2 H) (in real data, |G| = 5 to 10 times |H|)
- Achieved via
  - Incremental computation
  - Special data structure to compute the value for segment i’:i in O(1) time from the previous segment i’:i-1
  - Marginals computed in sorted order: increasing start boundary, decreasing end boundary
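Two of the points above can be illustrated with a small sketch. It is not the data structure from the talk: it assumes the simplest compact potential, a per-token score shared by every segment that contains the token, and all function names are hypothetical. The first function shows why expanding one such feature explicitly touches on the order of L^2 segments; the other two show a prefix-sum table that yields the score of i’:i in O(1), or equivalently from i’:i-1 by adding one term.

```python
from itertools import accumulate

def segments_containing(i, n, L):
    """All (start, end) with start <= i <= end, length <= L, inside 1..n."""
    return [(s, e)
            for s in range(max(1, i - L + 1), i + 1)
            for e in range(i, min(n, s + L - 1) + 1)]

def prefix_sums(token_scores):
    """prefix[k] = sum of the scores of tokens 1..k (prefix[0] = 0)."""
    return [0.0] + list(accumulate(token_scores))

def segment_score(prefix, start, end):
    """Sum of token scores over start..end (1-indexed, inclusive) in O(1)."""
    return prefix[end] - prefix[start - 1]

# A token away from the sequence ends lies in L(L+1)/2 segments of length <= L.
covering = segments_containing(5, n=10, L=3)
print(len(covering), covering)

# Hypothetical token scores, for illustration only.
token_scores = [1.0, 2.0, 0.5, 4.0]
prefix = prefix_sums(token_scores)
print(segment_score(prefix, 2, 4))
# Incremental form: score(i':i) = score(i':i-1) + score of token i.
print(segment_score(prefix, 2, 3) + token_scores[3])
```

With this layout the feature is stored once per token (contributing to H) rather than once per covering segment (contributing to G).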


Empirical evaluation

- Tasks
  - Citations: Cora, articles (L=20)
  - Address: Indian address (L=7)
- Features
  - Token-level: orthographic properties/lexicon match of words at the start, end, middle, left, right of segment
  - Entity-level: TFIDF match with lexicon, entity length
- Methods
  - Sequence-BCEU: Begin-Continue-End-Unique labels
  - Segment: original un-optimized algorithm
  - Segment-Opt: optimized inference with compact potentials
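For readers unfamiliar with the Begin-Continue-End-Unique encoding used by the sequence baseline, here is a minimal sketch of how a segmentation could be turned into per-token BCEU labels. The function name and the `label-B` style tag format are assumptions made for illustration; the slides do not specify the exact tag strings.

```python
def segments_to_bceu(segments):
    """Expand (start, end, label) segments into one BCEU tag per token.

    Positions are 1-indexed and inclusive. Tag strings such as
    "Author-B" are a hypothetical naming convention.
    """
    tags = []
    for start, end, label in segments:
        length = end - start + 1
        if length == 1:
            tags.append(f"{label}-U")            # single-token entity
        else:
            tags.append(f"{label}-B")            # first token
            tags.extend([f"{label}-C"] * (length - 2))  # interior tokens
            tags.append(f"{label}-E")            # last token
    return tags

citation = [(1, 2, "Author"), (3, 3, "Other"), (4, 5, "Author"), (6, 8, "Title")]
bceu = segments_to_bceu(citation)
print(bceu)
```

The encoding lets a sequence model recover segment boundaries from token labels, at the price of multiplying the label set by four.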


Running time and accuracy

[Figure: training time in seconds vs. training % on Address and Cora, comparing Sequence-BCEU, SegmentOpt, and Segment]

[Figure: F1 accuracy on Address and Cora Articles, comparing Sequence-BCEU and Segment]


Limit on segment length (L)

- L (hard limit on segment length)
  - Too small: reduced accuracy (90 to 81)
  - Too large: increased running time (30 minutes to 1 hour)
- m (maximum entity-level features)
  - Reduced by half: accuracy still 3% higher than Sequence
  - Too large: running time increases by only 30%


Concluding remarks

- Segmentation models: natural, flexible, accurate
- Main limitation: inference expensive
- Solved via
  - A compact design of shared potentials
  - New efficient inference algorithms
- Pays penalty of entity-level features only when needed
  - Running time comparable to sequence models
  - No hard limit on segment length
- Future work
  - Features that are functions of distance from boundary
  - Other models: 2-D segmentation?

Code: http://crf.sourceforge.net
