Machine Translation

Parallel Implementation of a Word Alignment Model: IBM Model 1

Professor: Dr. Azimi

Fateme Ahmadi-Fakhr, Afshin Arefi, Saba Jamalian

Dept. of Electrical and Computer Engineering, Shiraz University

General-purpose Programming of Massively Parallel Graphics Processors

Machine Translation

Suppose we are asked to translate a foreign sentence f into an English sentence e:

f : f1 … fm

e : e1 … el

What should we do?

For each word in the foreign sentence f, we find its most suitable English word.

Based on our knowledge of English, we then change the order of the generated English words. We might also need to change the words themselves.

f1 f2 f3 … fm (foreign sentence)

e1 e2 e3 … em (word-by-word translation)

e1 e3 e2 em+1 … el (after reordering and changing the words)

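Example

Source sentence (Persian): امروز صبح به مدرسه رفتم ("This morning I went to school")

Step 1, Translation Model: find the most suitable English word for each source word.

went school to morning today (each word written under its right-to-left Persian counterpart)

Step 2, Language Model: reorder and change the words.

today morning went to school

This morning I went to school

Translation Model + Language Model → Translation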

Statistical Translation Models

امروز صبح به مدرسه رفتم

went school to morning today

The Translation Model finds the most suitable English word for each source word. For example:

t(go | رفتم) > t(x | رفتم) for every other English word x

The machine must know t(e|f) for all possible e and f to find the maximum, so it has to be trained: IBM Models 1-5 calculate t(f|e).

IBM Model 1 (Brown et al. [1993])

Corpus (large body of text) → Model 1 → t(f|e) for all e and f that occur in the corpus

Choose an initial value for t(f|e) for all f and e, then repeat the following steps until convergence:

t(f|e): a table with one row per English word ei and one column per foreign word fj.

The problem is to find t(f|e) for all e and f.

The entry t(f|e)[i,j] states how probable it is that fj is the translation of ei.

Three tables are kept:

t(f|e): one row per ei and one column per fj. Initialized to a starting value.

c(f|e): same shape as t(f|e). Initialized to zero.

total(e): one entry per ei, holding the sum of the corresponding row of c(f|e). Initialized to zero.

In each sentence pair, for each f in the foreign sentence, we calculate ∑ t(f|e) over all e in the English sentence. These sums are called totals. Suppose we are given:

<f(s), e(s)> : < (f1 f2 f3), (e1 e2 e3 e4) >

Then, for f2:

totals[2] = t(f|e)[1,2] + t(f|e)[2,2] + t(f|e)[3,2] + t(f|e)[4,2]

and, for the pair (e1, f2):

c(f|e)[1,2] += t(f|e)[1,2] / totals[2]

total(e)[1] += t(f|e)[1,2] / totals[2]
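As a quick check of the arithmetic, the following worked example assumes (purely for illustration) that every entry of t(f|e) involved in this sentence pair equals 0.25:

```latex
\[
\text{totals}[2] = \sum_{i=1}^{4} t(f|e)[i,2] = 4 \times 0.25 = 1,
\qquad
\frac{t(f|e)[1,2]}{\text{totals}[2]} = \frac{0.25}{1} = 0.25
\]
```

So c(f|e)[1,2] and total(e)[1] each grow by 0.25 for f2. Under the same assumption, f1 and f3 contribute 0.25 each as well, so this sentence pair adds 3 × 0.25 = 0.75 to total(e)[1].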

Start processing the sentence pairs, calculating c(f|e) and total(e) using the current t(f|e).

After processing all sentence pairs in the corpus, update the value of t(f|e) for all e and f: t(f|e)[i,j] = c(f|e)[i,j] / total(e)[i]

Repeat the process until the values of t(f|e) have converged.
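The two steps can be written compactly as follows. The symbol \(\delta_s(i,j)\) is introduced here only as shorthand for the fraction that is added to the counts; it does not appear in the slides.

```latex
% Collecting counts, for every sentence pair s, every f_j in f(s) and every e_i in e(s):
\[
\delta_s(i,j) = \frac{t(f_j \mid e_i)}{\sum_{e_{i'} \in e(s)} t(f_j \mid e_{i'})},
\qquad
c(f_j \mid e_i) \mathrel{+}= \delta_s(i,j),
\qquad
\text{total}(e_i) \mathrel{+}= \delta_s(i,j)
\]
% Update, after all sentence pairs have been processed:
\[
t(f \mid e) = \frac{c(f \mid e)}{\text{total}(e)}
\]
```

Because total(e) receives exactly the same increments as the row of c(f|e) belonging to e, the updated values t(f|e) sum to 1 over f for every e that occurred in the corpus.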

IBM Model 1 (Pseudocode)

initialize t(f|e)                                  // initialization
do until convergence
    c(f|e) = 0 for all e and f                     // initialize to zero
    total(e) = 0 for all e
    for all sentence pairs <f(s), e(s)> do
        total(s,f) = 0 for all f
        for all f in f(s) do                       // calculating totals for each f in f(s)
            for all e in e(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do                       // calculating c(f|e) and total(e)
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(e) += t(f|e) / total(s,f)
    for all e do                                   // updating t(f|e) using c(f|e) and total(e)
        for all f do
            t(f|e) = c(f|e) / total(e)
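For comparison with the GPU version, one iteration of the pseudocode can be sketched as plain host code. This is only a minimal sketch, not the authors' CPU implementation: the `SentencePair` struct, the function name `em_iteration_host`, and the row-major layout `t[e * num_f + f]` are assumptions made for the example.

```cuda
#include <vector>

// Hypothetical container for one sentence pair; word ids index into the vocabularies.
struct SentencePair {
    std::vector<int> f;  // foreign word ids, each in [0, num_f)
    std::vector<int> e;  // English word ids, each in [0, num_e)
};

// One iteration on the host. Assumed layout: t[e * num_f + f] holds t(f|e).
void em_iteration_host(std::vector<float>& t, const std::vector<SentencePair>& corpus,
                       int num_e, int num_f) {
    std::vector<float> c(t.size(), 0.0f);     // c(f|e) = 0 for all e and f
    std::vector<float> total_e(num_e, 0.0f);  // total(e) = 0 for all e

    for (const SentencePair& s : corpus) {
        for (int f : s.f) {
            // total(s,f): sum of t(f|e) over the English words of this sentence
            float total_sf = 0.0f;
            for (int e : s.e) total_sf += t[e * num_f + f];
            if (total_sf <= 0.0f) continue;   // nothing to distribute
            // distribute one count for f over the English words
            for (int e : s.e) {
                float p = t[e * num_f + f] / total_sf;
                c[e * num_f + f] += p;
                total_e[e] += p;
            }
        }
    }

    // t(f|e) = c(f|e) / total(e); rows of unseen English words are left unchanged
    for (int e = 0; e < num_e; ++e) {
        if (total_e[e] <= 0.0f) continue;
        for (int f = 0; f < num_f; ++f)
            t[e * num_f + f] = c[e * num_f + f] / total_e[e];
    }
}
```

The totals for one foreign word are computed immediately before they are used, which gives the same result as the two separate loops in the pseudocode. Unlike the kernels in these slides, this sketch does not scale the values by 100000.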

Parallelizing IBM Model 1

initialize t(f|e)                                  // each (f,e) entry is independent of the others
do until convergence
    c(f|e) = 0 for all e and f                     // each (f,e) entry is independent of the others
    total(e) = 0 for all e
    for all sentence pairs do                      // the processing of each sentence pair is independent of the others
        total(s,f) = 0 for all f
        for all f in f(s) do
            for all e in e(s) do
                total(s,f) += t(f|e)
        for all e in e(s) do
            for all f in f(s) do
                c(f|e) += t(f|e) / total(s,f)
                total(e) += t(f|e) / total(s,f)
    for all e do                                   // updating each t(f|e), for all e and f, is independent of the others
        for all f do
            t(f|e) = c(f|e) / total(e)

Initialize t(f|e)

Each thread initializes one entry of t(f|e) to a specified value:

__global__ void initialize(float* device_t_f_e) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    device_t_f_e[pos] = 1.0f / NUM_F;
}

Underflow is possible, so the value is scaled by 100000:

__global__ void initialize(float* device_t_f_e) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    // 100000.0f forces floating-point division; 100000 / NUM_F truncates if NUM_F is an integer constant
    device_t_f_e[pos] = 100000.0f / NUM_F;
}
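The kernels above rely on the launch covering exactly the entries of the table. The following is a minimal sketch of a bounds-checked variant together with a host-side launch; the names `initialize_checked` and `allocate_and_initialize`, the extra parameters, and the block size of 256 are assumptions for illustration, not part of the original code.

```cuda
#include <cuda_runtime.h>

// Bounds-checked variant: n is the number of t(f|e) entries (NUM_E * NUM_F).
__global__ void initialize_checked(float* device_t_f_e, int n, float value) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < n) device_t_f_e[pos] = value;  // surplus threads of the last block do nothing
}

// Hypothetical host helper: allocates the table on the device and fills every
// entry with scale / num_f. Returns nullptr on any CUDA error.
float* allocate_and_initialize(int num_e, int num_f, float scale) {
    int n = num_e * num_f;
    float* device_t_f_e = nullptr;
    if (cudaMalloc(&device_t_f_e, n * sizeof(float)) != cudaSuccess) return nullptr;

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // round up so all n entries are covered
    initialize_checked<<<blocks, threads>>>(device_t_f_e, n, scale / num_f);

    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(device_t_f_e);
        return nullptr;
    }
    return device_t_f_e;
}
```

Calling `allocate_and_initialize(NUM_E, NUM_F, 100000.0f)` would reproduce the scaled initialization, and passing `1.0f` as the scale gives the unscaled one.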

Process of Each Sentence Pair

Each thread processes one sentence pair:

for all sentence pairs do
    total(s,f) = 0 for all f
    for all f in f(s) do
        for all e in e(s) do
            total(s,f) += t(f|e)
    for all e in e(s) do
        for all f in f(s) do
            c(f|e) += t(f|e) / total(s,f)
            total(e) += t(f|e) / total(s,f)

Shared memory is used.

No reduction is used. Why? Which entries of c(f|e) and total(e) a sentence pair contributes to depends on the words it contains, so the access pattern is data dependent.

Instead, atomicAdd() is used, as two or more threads may add a value to the same entry of c(f|e) or total(e) simultaneously.
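The slides give no kernel for this step, so the following is only a sketch of how one thread per sentence pair could accumulate its counts with atomicAdd(). Everything about the data layout is an assumption: sentences are padded to `MAX_LEN` word ids, `f_len` and `e_len` hold the real lengths, and the tables are row-major with one row per English word. For brevity the per-word total is kept in a register rather than in shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define MAX_LEN 32  // assumed upper bound on sentence length (hypothetical)

// One thread per sentence pair. t and c are row-major: index = e * num_f + f.
__global__ void process_pairs(const float* t, float* c, float* total_e,
                              const int* f_words, const int* e_words,
                              const int* f_len, const int* e_len,
                              int num_pairs, int num_f) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= num_pairs) return;
    const int* fs = f_words + s * MAX_LEN;
    const int* es = e_words + s * MAX_LEN;
    for (int j = 0; j < f_len[s]; ++j) {
        float total_sf = 0.0f;  // total(s,f) for the current foreign word
        for (int i = 0; i < e_len[s]; ++i) total_sf += t[es[i] * num_f + fs[j]];
        if (total_sf <= 0.0f) continue;
        for (int i = 0; i < e_len[s]; ++i) {
            int idx = es[i] * num_f + fs[j];
            float p = t[idx] / total_sf;
            atomicAdd(&c[idx], p);          // other threads may hit the same entry
            atomicAdd(&total_e[es[i]], p);
        }
    }
}

// Hypothetical helper: copies a host vector to newly allocated device memory.
template <typename T>
T* to_device(const std::vector<T>& h) {
    T* d = nullptr;
    cudaMalloc(&d, h.size() * sizeof(T));
    cudaMemcpy(d, h.data(), h.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Hypothetical host wrapper: runs the kernel and copies c and total_e back.
bool run_process_pairs(const std::vector<float>& t, std::vector<float>& c,
                       std::vector<float>& total_e,
                       const std::vector<int>& f_words, const std::vector<int>& e_words,
                       const std::vector<int>& f_len, const std::vector<int>& e_len,
                       int num_f) {
    int num_pairs = static_cast<int>(f_len.size());
    float* d_t = to_device(t);
    float* d_c = to_device(c);
    float* d_total = to_device(total_e);
    int* d_fw = to_device(f_words);
    int* d_ew = to_device(e_words);
    int* d_fl = to_device(f_len);
    int* d_el = to_device(e_len);
    int threads = 128;  // assumed block size
    int blocks = (num_pairs + threads - 1) / threads;
    process_pairs<<<blocks, threads>>>(d_t, d_c, d_total, d_fw, d_ew, d_fl, d_el,
                                       num_pairs, num_f);
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(c.data(), d_c, c.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(total_e.data(), d_total, total_e.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_t); cudaFree(d_c); cudaFree(d_total);
    cudaFree(d_fw); cudaFree(d_ew); cudaFree(d_fl); cudaFree(d_el);
    return ok;
}
```

Since floating-point addition is not associative, the order in which the atomic additions land can change the low-order bits of the sums from run to run. A single-precision atomicAdd() requires a device of compute capability 2.0 or higher.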

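Updating t(f|e)

Each thread updates one entry of t(f|e) and sets one entry of c(f|e) to zero for the next iteration:

__global__ void update(float* device_t_f_e, float* device_count_f_e,
                       float* device_total_f, int block_size, int Col)
{
    int pos = blockIdx.x * block_size + threadIdx.x;
    float total = device_total_f[pos / Col];   // total of this entry's row (pos / Col is the row index)
    float count = device_count_f_e[pos];
    // guard against 0/0 (NaN) for rows that received no counts
    device_t_f_e[pos] = (total > 0.0f) ? (100000 * count / total) : 0.0f;
    device_count_f_e[pos] = 0;                 // reset c(f|e) for the next iteration
}

Here it is not possible to set total(e), stored in device_total_f, to zero: other threads, possibly in other blocks, still have to read the same entry, and there is no synchronization between threads of different blocks.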

Setting total(e) to Zero

Each thread sets one entry of total(e), stored in device_total_f, to zero:

__global__ void total(float* device_total_f) {
    int pos = threadIdx.x + blockDim.x * blockIdx.x;
    device_total_f[pos] = 0;
}
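As an alternative to a dedicated kernel, the same reset could be done from the host with cudaMemset(). This is a swapped-in technique, not what the slides use; the helper name `reset_totals` and its parameters are assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: clears num_entries floats starting at device_total_f.
// An all-zero bit pattern is 0.0f in IEEE 754, so a byte-wise memset is sufficient.
bool reset_totals(float* device_total_f, int num_entries) {
    return cudaMemset(device_total_f, 0, num_entries * sizeof(float)) == cudaSuccess;
}
```

Issued on the default stream, the call is ordered after any previously launched kernel, so it cannot clear the totals while the update kernel is still reading them.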

Results

NUM_F   NUM_E   #SENTPAIR   CPU-Time   GPU-Time   Speed-Up
2048    2048    512         0.452049   0.061639   7.33
4096    4096    1024        1.736251   0.157878   10.99
4096    4096    2048        1.857686   0.157961   11.76

Future Goals

Convergence condition: we currently repeat the iterations of calculating c(f|e) and t(f|e) a fixed 5 times. The stopping condition should instead be derived from the values of t(f|e). We wish to add this to our code, as it can also be parallelized.
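One possible way to derive a stopping condition from t(f|e) is to stop once no entry changes by more than a small threshold between two iterations. The sketch below is not the authors' design. The names `max_abs_diff` and `has_converged`, the threshold `epsilon`, and the need to keep a copy of the previous table are all assumptions.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Each thread compares one entry of the old and new table and folds the absolute
// difference into a global maximum. For non-negative floats the IEEE 754 bit
// pattern orders the same way as the value, so an integer atomicMax suffices.
__global__ void max_abs_diff(const float* t_old, const float* t_new, int n, int* max_bits) {
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < n) {
        float d = fabsf(t_new[pos] - t_old[pos]);
        atomicMax(max_bits, __float_as_int(d));
    }
}

// Hypothetical host helper: t_old and t_new are device pointers to n entries.
// Returns true when no entry changed by more than epsilon.
bool has_converged(const float* t_old, const float* t_new, int n, float epsilon) {
    int* d_max = nullptr;
    if (cudaMalloc(&d_max, sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_max, 0, sizeof(int));         // bit pattern of 0.0f

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;
    max_abs_diff<<<blocks, threads>>>(t_old, t_new, n, d_max);

    int bits = 0;
    // the copy runs on the default stream, so it waits for the kernel to finish
    cudaError_t err = cudaMemcpy(&bits, d_max, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_max);
    if (err != cudaSuccess) return false;

    float max_diff;
    std::memcpy(&max_diff, &bits, sizeof(float));  // reinterpret the bits as a float
    return max_diff <= epsilon;
}
```

If the tables hold values scaled by 100000, as in the kernels of these slides, `epsilon` has to be chosen on the same scale.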

This is just one of IBM Models 1-5, which are implemented in the GIZA++ package. We wish to parallelize the other 4 models.

We Want to Express Our Appreciation to:

Dr. Fazly, for her useful comments and valuable notifications.

Dr. Azimi, for his kindness and full support.
