Parallel Implementation of Word Alignment Model: IBM Model 1. Professor: Dr. Azimi. Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian. Dept. of Electrical and Computer Engineering, Shiraz University. General-purpose Programming of Massively Parallel Graphics Processors.

Machine Translation


Page 1: Machine Translation

Parallel Implementation of Word Alignment Model: IBM Model 1

Professor: Dr. Azimi

Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian

Dept. of Electrical and Computer Engineering, Shiraz University

General-purpose Programming of Massively Parallel Graphics Processors

Page 2: Machine Translation

Machine Translation

Suppose we are asked to translate a foreign sentence f into an English sentence e:

f : f1 … fm

e : e1 … el

What should we do?

For each word in the foreign sentence f, we find the most suitable English word.

Based on our knowledge of the English language, we change the order of the generated English words.

We might also need to change the words themselves.

f1 f2 f3 … fm

e1 e2 e3 … em

e1 e3 e2 em+1 … el

Page 3: Machine Translation

Example

امروز صبح به مدرسه رفتم

Translation Model: finding the most suitable English word for each word

went school to morning today

Language Model: reordering and changing the words

today morning went to school

Translation: I went to school this morning

Page 4: Machine Translation

Statistical Translation Models

امروز صبح به مدرسه رفتم

went school to morning today

Translation Model: finding the most suitable English word for each word

t(go | رفتم) > t(x | رفتم) for every other English word x

The machine must know t(e|f) for all possible e and f to find the maximum. The machine should be trained:

IBM Models 1-5 calculate t(f|e).

Page 5: Machine Translation

IBM Model 1 (Brown et al. [1993])

Corpus (large body of text) → Model 1 → t(f|e) for all e and f that are in the corpus

Page 6: Machine Translation

IBM Model 1 (Brown et al. [1993])

Choose an initial value for t(f|e) for all f and e, then repeat the following steps until convergence:

Page 7: Machine Translation

IBM Model 1 (Brown et al. [1993])

t(f|e): a matrix with one row per English word ei and one column per foreign word fj

The problem is to find t(f|e) for all e and f.

Each entry states how probable it is that fj is the translation of ei.

Page 8: Machine Translation

IBM Model 1 (Brown et al. [1993])

t(f|e): a matrix with one row per English word ei and one column per foreign word fj. Initialize.

c(f|e): a matrix of the same shape as t(f|e). Initialize to zero.

total(e): a vector with one entry per English word ei, holding the sum (∑) of each row of c(f|e). Initialize to zero.

Page 9: Machine Translation

IBM Model 1 (Brown et al. [1993])

In each sentence pair, for each f in the foreign sentence, we calculate ∑ t(f|e) over all e in the English sentence, called totals. Suppose we are given:

<f(s), e(s)>: <(f1 f2 f3), (e1 e2 e3 e4)>

Then, for f2 and e1 (indices are written [e, f]):

totals[2] = t(f|e)[1,2] + t(f|e)[2,2] + t(f|e)[3,2] + t(f|e)[4,2]

c(f|e)[1,2] += t(f|e)[1,2] / totals[2]

total_e[1] += t(f|e)[1,2] / totals[2]
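The accumulation for one sentence pair can be sketched sequentially in C. This is only an illustrative sketch, not the authors' code: the function name `process_pair`, the `e_ids`/`f_ids` index arrays, the row-major layout (row = English word) and the small sizes `NUM_E` and `NUM_F` are assumptions, and indices are 0-based here rather than 1-based as on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vocabulary sizes, chosen only for illustration. */
enum { NUM_E = 4, NUM_F = 3 };

/* Accumulate the expected counts contributed by one sentence pair.
 * t, c    : NUM_E x NUM_F matrices in row-major order (row = English word).
 * total_e : one entry per English word.
 * e_ids, f_ids : vocabulary indices of the words in the sentence pair. */
void process_pair(const float *t, float *c, float *total_e,
                  const int *e_ids, size_t len_e,
                  const int *f_ids, size_t len_f)
{
    for (size_t j = 0; j < len_f; ++j) {
        int f = f_ids[j];

        /* totals for this f: sum of t(f|e) over all e in the English sentence. */
        float totals = 0.0f;
        for (size_t i = 0; i < len_e; ++i)
            totals += t[e_ids[i] * NUM_F + f];
        assert(totals > 0.0f);

        /* Distribute one unit of count for f over the English words. */
        for (size_t i = 0; i < len_e; ++i) {
            int e = e_ids[i];
            float delta = t[e * NUM_F + f] / totals;
            c[e * NUM_F + f] += delta;
            total_e[e] += delta;
        }
    }
}
```

With a uniform t(f|e) and the pair <(f1 f2 f3), (e1 e2 e3 e4)>, every English word receives a quarter of each foreign word's count, so each touched entry of c(f|e) grows by 0.25 and each touched entry of total_e grows by 0.75.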

Page 10: Machine Translation

IBM Model 1 (Brown et al. [1993])

Start processing the sentence pairs, calculating c(f|e) and total(e) using t(f|e).

After processing all sentence pairs in the corpus, update the value of t(f|e) for all e and f: t(f|e)[i,j] = c(f|e)[i,j] / total(e)[i]

Repeat the process until the values of t(f|e) have converged.
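The update step t(f|e)[i,j] = c(f|e)[i,j] / total(e)[i] can be sketched in sequential C as follows. The function name `update_t` and the row-major layout (row = English word) are assumptions made for illustration; skipping rows whose total is zero is also an assumption, added only to avoid dividing by zero.

```c
#include <assert.h>
#include <stddef.h>

/* Re-estimate t(f|e) = c(f|e) / total(e) for every entry.
 * t and c are num_e x num_f matrices in row-major order (row = English word). */
void update_t(float *t, const float *c, const float *total_e,
              size_t num_e, size_t num_f)
{
    assert(t != NULL && c != NULL && total_e != NULL);
    for (size_t e = 0; e < num_e; ++e) {
        /* Assumption: a word that collected no counts keeps its old values. */
        if (total_e[e] <= 0.0f)
            continue;
        for (size_t f = 0; f < num_f; ++f)
            t[e * num_f + f] = c[e * num_f + f] / total_e[e];
    }
}
```

Every entry is computed from its own count and its row total only, so no entry depends on another entry's result.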

Page 11: Machine Translation

IBM Model 1 (Pseudocode)

initialize t(f|e)
do until converge
    c(f|e) = 0 for all e and f
    total(e) = 0 for all e
    for all sentence pairs do
        total(s,f) = 0 for all f
        for all f in f(s) do
            for all e in e(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(e) += t(f|e) / total(s,f)
    for all e do
        for all f do
            t(f|e) = c(f|e) / total(e)

The steps, in order:

Initialization of t(f|e)

Initializing c(f|e) and total(e) to zero

Calculating totals for each f in f(s)

Calculating c(f|e) and total(e)

Updating t(f|e) using c(f|e) and total(e)

Page 12: Machine Translation

Parallelizing IBM Model 1

initialize t(f|e)
do until converge
    c(f|e) = 0 for all e and f
    total(f) = 0 for all f
    for all sentence pairs do
        total(s,f) = 0 for all f
        for all e in e(s) do
            for all f in f(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(f) += t(f|e) / total(s,f)
    for all e do
        for all f do
            t(f|e) = c(f|e) / total(f)

Initialization and reset steps: each (f, e) entry is independent of the others.

The processing of each sentence pair is independent of the others.

Updating t(f|e): the value for each e and f is independent of the others.

Page 13: Machine Translation

Initialize t(f|e)

Each thread initializes one entry of t(f|e) to a specified value:

__global__ void initialize(float* device_t_f_e) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    device_t_f_e[pos] = 1.0f / NUM_F;
}

Underflow is possible, so the values are scaled by 100000:

__global__ void initialize(float* device_t_f_e) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    // 100000.0f forces floating-point division;
    // 100000 / NUM_F would truncate if NUM_F is an integer constant.
    device_t_f_e[pos] = 100000.0f / NUM_F;
}
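A sequential C sketch of the scaled initialization is given here to make the arithmetic concrete. The function name `init_t` and its parameters are hypothetical, and the sketch assumes that NUM_F is an integer constant such as 2048; in that case the expression `100000/NUM_F` is an integer division in C and yields 48 rather than 48.828125, so the division is forced to floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Fill every entry of t(f|e) with scale / num_f.
 * The cast forces floating-point division: with two integer operands,
 * 100000 / 2048 would be truncated to 48. */
void init_t(float *t, size_t entries, int num_f, float scale)
{
    assert(num_f > 0);
    for (size_t pos = 0; pos < entries; ++pos)
        t[pos] = scale / (float)num_f;
}
```

Calling `init_t(t, entries, 2048, 100000.0f)` stores 48.828125 in every entry. Since each count is computed as a ratio t(f|e) / total(s,f), a constant scale factor applied to all entries of t(f|e) cancels out in that step.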

Page 14: Machine Translation

Process of Each Sentence Pair

for all sentence pairs do
    total(s,f) = 0 for all f
    for all e in e(s) do
        for all f in f(s) do
            total(s,f) += t(f|e)
    for all e in e(s) do
        for all f in f(s) do
            c(f|e) += t(f|e) / total(s,f)
            total(f) += t(f|e) / total(s,f)

Each thread processes one sentence pair.

Using shared memory.

No use of reduction. Why? It is data dependent.

Use atomicAdd(), as it is possible that two or more threads add a value to c(f|e) or total(f) simultaneously.

Page 15: Machine Translation

Updating t(f|e)

Each thread updates one entry of t(f|e) and sets one entry of c(f|e) to zero for the next iteration:

__global__ void update(float* device_t_f_e, float* device_count_f_e,
                       float* device_total_f, int block_size, int Col)
{
    int pos = blockIdx.x * block_size + threadIdx.x;
    float total = device_total_f[pos / Col];   // total of the row this entry belongs to
    float count = device_count_f_e[pos];
    device_t_f_e[pos] = (100000 * count / total);
    device_count_f_e[pos] = 0;                 // reset the count for the next iteration
}

Here, it is not possible to set total(f) to zero, as there is no synchronization between threads of different blocks.

Page 16: Machine Translation

Setting total(f) to Zero

Each thread sets one entry of total(f) to zero:

__global__ void total(float* device_total_f) {
    int pos = threadIdx.x + blockDim.x * blockIdx.x;
    device_total_f[pos] = 0;
}

Page 17: Machine Translation

Results

NUM_F   NUM_E   #SENTPAIR   CPU-Time   GPU-Time   Speed-Up
2048    2048    512         0.452049   0.061639   7.33
4096    4096    1024        1.736251   0.157878   10.99
4096    4096    2048        1.857686   0.157961   11.76

Page 18: Machine Translation

Future Goals

Convergence condition: we currently repeat the iterations of calculating c(f|e) and t(f|e) 5 times, but the stopping condition should be derived from the value of t(f|e). We wish to add it to our code, as it can be parallelized.

This is just one of IBM Models 1-5, which are implemented in the GIZA++ package. We wish to parallelize the 4 other models.
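One possible way to derive a convergence condition from the value of t(f|e) is to compare two successive iterations and stop when no entry has changed by more than a threshold. The sketch below is an assumption, not the authors' planned criterion: the function names `max_abs_change` and `has_converged` and the threshold `epsilon` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute change of any entry of t(f|e) between two iterations. */
float max_abs_change(const float *t_old, const float *t_new, size_t entries)
{
    float max_diff = 0.0f;
    for (size_t pos = 0; pos < entries; ++pos) {
        float diff = t_new[pos] - t_old[pos];
        if (diff < 0.0f)
            diff = -diff;
        if (diff > max_diff)
            max_diff = diff;
    }
    return max_diff;
}

/* Returns 1 when every entry changed by less than epsilon, 0 otherwise. */
int has_converged(const float *t_old, const float *t_new,
                  size_t entries, float epsilon)
{
    assert(epsilon > 0.0f);
    return max_abs_change(t_old, t_new, entries) < epsilon;
}
```

Each per-entry difference is independent of the others, and combining them is a maximum over all entries, which is the part that would have to be expressed as a parallel reduction on the GPU.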

Page 19: Machine Translation

We Want to Express Our Appreciation to:

Dr. Fazly, for her useful comments and valuable notifications.

Dr. Azimi, for his kindness and full support.
