41
1 Mixture Models for Diverse Machine Translation: Tricks of the Trade Tianxiao Shen* Myle Ott* Michael Auli Marc’Aurelio Ranzato [email protected] (*: Equal contribution) ICML 2019

Mixture Models for Diverse Machine Translation: Tricks of ...people.csail.mit.edu/tianxiao/papers/icml19_mixture_models_diverse… · !1 Mixture Models for Diverse Machine Translation:

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

  • !1

    Mixture Models for Diverse Machine Translation: Tricks of the Trade

    Tianxiao Shen* Myle Ott* Michael Auli Marc’Aurelio Ranzato [email protected] (*: Equal contribution)

    ICML 2019

  • !2

    Translation Is One-To-Many

    danke

    thank you

    thanks

    thank you very much

    German

    English

    is multi-modal, a sentence can have different translationsp(y|x)AAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7W

    sie brauchen zeit

    you need time

    they need time

    it takes time

  • !3

    Translation Is One-To-Many

    danke

    thank you

    thanks

    thank you very much

    German

    English

    is multi-modal, a sentence can have different translationsp(y|x)AAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7WAAAB7XicbVBNSwMxEJ2tX7V+VT16CRahXsquFPRY8OKxgv2AdinZNNvGZpMlyYrL2v/gxYMiXv0/3vw3pu0etPXBwOO9GWbmBTFn2rjut1NYW9/Y3Cpul3Z29/YPyodHbS0TRWiLSC5VN8CaciZoyzDDaTdWFEcBp51gcj3zOw9UaSbFnUlj6kd4JFjICDZWasfV9OnxfFCuuDV3DrRKvJxUIEdzUP7qDyVJIioM4VjrnufGxs+wMoxwOi31E01jTCZ4RHuWChxR7Wfza6fozCpDFEplSxg0V39PZDjSOo0C2xlhM9bL3kz8z+slJrzyMybixFBBFovChCMj0ex1NGSKEsNTSzBRzN6KyBgrTIwNqGRD8JZfXiXti5rn1rzbeqVRz+MowgmcQhU8uIQG3EATWkDgHp7hFd4c6bw4787HorXg5DPH8AfO5w84So7W

    sie brauchen zeit

    you need time

    they need time

    it takes time

    Goal: efficiently decode a diverse set of hypotheses

  • !4

    Neural Machine Translation

    Input: source sentence Output: target translation

    x = x1, · · · , xLAAAB+XicbVBNS8NAEN3Ur1q/oh69LBbBQymJFPQiFLx48FDBfkAbwmazbZduNmF3UlpC/4kXD4p49Z9489+4bXPQ1gcDj/dmmJkXJIJrcJxvq7CxubW9U9wt7e0fHB7ZxyctHaeKsiaNRaw6AdFMcMmawEGwTqIYiQLB2sHobu63x0xpHssnmCbMi8hA8j6nBIzk2/bkduK7lR4NY9CVif/g22Wn6iyA14mbkzLK0fDtr14Y0zRiEqggWnddJwEvIwo4FWxW6qWaJYSOyIB1DZUkYtrLFpfP8IVRQtyPlSkJeKH+nshIpPU0CkxnRGCoV725+J/XTaF/42VcJikwSZeL+qnAEON5DDjkilEQU0MIVdzciumQKELBhFUyIbirL6+T1lXVdaruY61cr+VxFNEZOkeXyEXXqI7uUQM1EUVj9Ixe0ZuVWS/Wu/WxbC1Y+cwp+gPr8weZWJLuAAAB+XicbVBNS8NAEN3Ur1q/oh69LBbBQymJFPQiFLx48FDBfkAbwmazbZduNmF3UlpC/4kXD4p49Z9489+4bXPQ1gcDj/dmmJkXJIJrcJxvq7CxubW9U9wt7e0fHB7ZxyctHaeKsiaNRaw6AdFMcMmawEGwTqIYiQLB2sHobu63x0xpHssnmCbMi8hA8j6nBIzk2/bkduK7lR4NY9CVif/g22Wn6iyA14mbkzLK0fDtr14Y0zRiEqggWnddJwEvIwo4FWxW6qWaJYSOyIB1DZUkYtrLFpfP8IVRQtyPlSkJeKH+nshIpPU0CkxnRGCoV725+J/XTaF/42VcJikwSZeL+qnAEON5DDjkilEQU0MIVdzciumQKELBhFUyIbirL6+T1lXVdaruY61cr+VxFNEZOkeXyEXXqI7uUQM1EUVj9Ixe0ZuVWS/Wu/WxbC1Y+cwp+gPr8weZWJLuAAAB+XicbVBNS8NAEN3Ur1q/oh69LBbBQymJFPQiFLx48FDBfkAbwmazbZduNmF3UlpC/4kXD4p49Z9489+4bXPQ1gcDj/dmmJkXJIJrcJxvq7CxubW9U9wt7e0fHB7ZxyctHaeKsiaNRaw6AdFMcMmawEGwTqIYiQLB2sHobu63x0xpHssnmCbMi8hA8j6nBIzk2/bkduK7lR4NY9CVif/g22Wn6iyA14mbkzLK0fDtr14Y0zRiEqggWnddJwEvIwo4FWxW6qWaJYSOyIB1DZUkYtrLFpfP8IVRQtyPlSkJeKH+nshIpPU0CkxnRGCoV725+J/XTaF/42VcJikwSZeL+qnAEON5DDjkilEQU0MIVdzciumQKELBhFUyIbirL6+T1lXVdaruY61cr+VxFNEZOkeXyEXXqI7uUQM1EUVj9Ixe0ZuVWS/Wu/WxbC1Y+cwp+gPr8weZWJLuAAAB+XicbVBNS8NAEN3Ur1q/oh69LBbBQymJFPQiFLx48FDBfkAbwmazbZduNmF3UlpC/4kXD4p49Z9489+4bXPQ1gcDj/dmmJkXJIJrcJxvq7CxubW9U9wt7e0fHB7ZxyctHaeKsiaNRaw6AdFMcMmawEGwTqIYiQLB2sHobu63x0xpHssnmCbMi8hA8j6nBIzk2/bkduK7lR4NY9CVif/g22Wn6iyA14mbkzLK0fDtr14Y0zRiEqggWnddJwEvIwo4FWxW6qWaJYSOyIB1DZUkYtrLFpfP8IVRQtyPlSkJeKH+nshIpPU0CkxnRGCoV725+J/XTaF/42VcJikwSZeL+qnAEON5DDjkilEQU0MIVdzciumQKELBhFUyIbirL6+T1lXVdaruY61cr+VxFNEZOkeXyEXXqI7uUQM1EUVj9Ixe0ZuVWS/Wu/WxbC1Y+cwp+gPr8weZWJLu

    y = y1, · · · , yTAAAB+XicbVBNS8NAEJ3Ur1q/oh69LBbBQymJFPQiFLx4rNAvaEPYbLbt0s0m7G4KIfSfePGgiFf/iTf/jds2B219MPB4b4aZeUHCmdKO822VtrZ3dvfK+5WDw6PjE/v0rKviVBLaITGPZT/AinImaEczzWk/kRRHAae9YPqw8HszKhWLRVtnCfUiPBZsxAjWRvJtO7vPfLc2JGGsVS3z275dderOEmiTuAWpQoGWb38Nw5ikERWacKzUwHUS7eVYakY4nVeGqaIJJlM8pgNDBY6o8vLl5XN0ZZQQjWJpSmi0VH9P5DhSKosC0xlhPVHr3kL8zxukenTn5UwkqaaCrBaNUo50jBYxoJBJSjTPDMFEMnMrIhMsMdEmrIoJwV1/eZN0b+quU3efGtVmo4ijDBdwCdfgwi004RFa0AECM3iGV3izcuvFerc+Vq0lq5g5hz+wPn8AqiKS+Q==AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBbBQymJFPQiFLx4rNAvaEPYbLbt0s0m7G4KIfSfePGgiFf/iTf/jds2B219MPB4b4aZeUHCmdKO822VtrZ3dvfK+5WDw6PjE/v0rKviVBLaITGPZT/AinImaEczzWk/kRRHAae9YPqw8HszKhWLRVtnCfUiPBZsxAjWRvJtO7vPfLc2JGGsVS3z275dderOEmiTuAWpQoGWb38Nw5ikERWacKzUwHUS7eVYakY4nVeGqaIJJlM8pgNDBY6o8vLl5XN0ZZQQjWJpSmi0VH9P5DhSKosC0xlhPVHr3kL8zxukenTn5UwkqaaCrBaNUo50jBYxoJBJSjTPDMFEMnMrIhMsMdEmrIoJwV1/eZN0b+quU3efGtVmo4ijDBdwCdfgwi004RFa0AECM3iGV3izcuvFerc+Vq0lq5g5hz+wPn8AqiKS+Q==AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBbBQymJFPQiFLx4rNAvaEPYbLbt0s0m7G4KIfSfePGgiFf/iTf/jds2B219MPB4b4aZeUHCmdKO822VtrZ3dvfK+5WDw6PjE/v0rKviVBLaITGPZT/AinImaEczzWk/kRRHAae9YPqw8HszKhWLRVtnCfUiPBZsxAjWRvJtO7vPfLc2JGGsVS3z275dderOEmiTuAWpQoGWb38Nw5ikERWacKzUwHUS7eVYakY4nVeGqaIJJlM8pgNDBY6o8vLl5XN0ZZQQjWJpSmi0VH9P5DhSKosC0xlhPVHr3kL8zxukenTn5UwkqaaCrBaNUo50jBYxoJBJSjTPDMFEMnMrIhMsMdEmrIoJwV1/eZN0b+quU3efGtVmo4ijDBdwCdfgwi004RFa0AECM3iGV3izcuvFerc+Vq0lq5g5hz+wPn8AqiKS+Q==AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBbBQymJFPQiFLx4rNAvaEPYbLbt0s0m7G4KIfSfePGgiFf/iTf/jds2B219MPB4b4aZeUHCmdKO822VtrZ3dvfK+5WDw6PjE/v0rKviVBLaITGPZT/AinImaEczzWk/kRRHAae9YPqw8HszKhWLRVtnCfUiPBZsxAjWRvJtO7vPfLc2JGGsVS3z275dderOEmiTuAWpQoGWb38Nw5ikERWacKzUwHUS7eVYakY4nVeGqaIJJlM8pgNDBY6o8vLl5XN0ZZQQjWJpSmi0VH9P5DhSKosC0xlhPVHr3kL8zxukenTn5UwkqaaCrBaNUo50jBYxoJBJSjTPDMFEMnMrIhMsMdEmrIoJwV1/eZN0b+quU3efGtVmo4ijDBdwCdfgwi004RFa0AECM3iGV3izcuvFerc+Vq0lq5g5hz+wPn8AqiKS+Q==

    p(y|x; ✓) =TY

    t=1

    p(yt|y1:t�1, x; ✓)AAACHnicbVDLSgMxFM34rPU16tJNsAgVtExEUZRCwY3LCtYKbR0yadqGZh4kd8RhnC9x46+4caGI4Er/xrQW0eqBwOGcc7m5x4uk0OA4H9bE5NT0zGxuLj+/sLi0bK+sXugwVozXWChDdelRzaUIeA0ESH4ZKU59T/K61z8Z+PVrrrQIg3NIIt7yaTcQHcEoGMm196Nicntz3IQeB7pVbkYqbLsplEl2dY6N58Jt4qbkCHZItv2dc+2CU3KGwH8JGZECGqHq2m/NdshinwfAJNW6QZwIWilVIJjkWb4Zax5R1qdd3jA0oD7XrXR4XoY3jdLGnVCZFwAeqj8nUuprnfieSfoUenrcG4j/eY0YOoetVARRDDxgX4s6scQQ4kFXuC0UZyATQyhTwvwVsx5VlIFpNG9KIOMn/yUXuyXilMjZXqGyN6ojh9bRBioigg5QBZ2iKqohhu7QA3pCz9a99Wi9WK9f0QlrNLOGfsF6/wQjOaHKAAACHnicbVDLSgMxFM34rPU16tJNsAgVtExEUZRCwY3LCtYKbR0yadqGZh4kd8RhnC9x46+4caGI4Er/xrQW0eqBwOGcc7m5x4uk0OA4H9bE5NT0zGxuLj+/sLi0bK+sXugwVozXWChDdelRzaUIeA0ESH4ZKU59T/K61z8Z+PVrrrQIg3NIIt7yaTcQHcEoGMm196Nicntz3IQeB7pVbkYqbLsplEl2dY6N58Jt4qbkCHZItv2dc+2CU3KGwH8JGZECGqHq2m/NdshinwfAJNW6QZwIWilVIJjkWb4Zax5R1qdd3jA0oD7XrXR4XoY3jdLGnVCZFwAeqj8nUuprnfieSfoUenrcG4j/eY0YOoetVARRDDxgX4s6scQQ4kFXuC0UZyATQyhTwvwVsx5VlIFpNG9KIOMn/yUXuyXilMjZXqGyN6ojh9bRBioigg5QBZ2iKqohhu7QA3pCz9a99Wi9WK9f0QlrNLOGfsF6/wQjOaHKAAACHnicbVDLSgMxFM34rPU16tJNsAgVtExEUZRCwY3LCtYKbR0yadqGZh4kd8RhnC9x46+4caGI4Er/xrQW0eqBwOGcc7m5x4uk0OA4H9bE5NT0zGxuLj+/sLi0bK+sXugwVozXWChDdelRzaUIeA0ESH4ZKU59T/K61z8Z+PVrrrQIg3NIIt7yaTcQHcEoGMm196Nicntz3IQeB7pVbkYqbLsplEl2dY6N58Jt4qbkCHZItv2dc+2CU3KGwH8JGZECGqHq2m/NdshinwfAJNW6QZwIWilVIJjkWb4Zax5R1qdd3jA0oD7XrXR4XoY3jdLGnVCZFwAeqj8nUuprnfieSfoUenrcG4j/eY0YOoetVARRDDxgX4s6scQQ4kFXuC0UZyATQyhTwvwVsx5VlIFpNG9KIOMn/yUXuyXilMjZXqGyN6ojh9bRBioigg5QBZ2iKqohhu7QA3pCz9a99Wi9WK9f0QlrNLOGfsF6/wQjOaHKAAACHnicbVDLSgMxFM34rPU16tJNsAgVtExEUZRCwY3LCtYKbR0yadqGZh4kd8RhnC9x46+4caGI4Er/xrQW0eqBwOGcc7m5x4uk0OA4H9bE5NT0zGxuLj+/sLi0bK+sXugwVozXWChDdelRzaUIeA0ESH4ZKU59T/K61z8Z+PVrrrQIg3NIIt7yaTcQHcEoGMm196Nicntz3IQeB7pVbkYqbLsplEl2dY6N58Jt4qbkCHZItv2dc+2CU3KGwH8JGZECGqHq2m/NdshinwfAJNW6QZwIWilVIJjkWb4Zax5R1qdd3jA0oD7XrXR4XoY3jdLGnVCZFwAeqj8nUuprnfieSfoUenrcG4j/eY0YOoetVARRDDxgX4s6scQQ4kFXuC0UZyATQyhTwvwVsx5VlIFpNG9KIOMn/yUXuyXilMjZXqGyN6ojh9bRBioigg5QBZ2iKqohhu7QA3pCz9a99Wi9WK9f0QlrNLOGfsF6/wQjOaHK

    (opennmt.net)

  • !5

    Search for Multiple Modes Is Difficult…

    参与投票的成员中,58% 反对该合同交易易。Source

    Of

    Fifty-eight per cent of

    It

    those

    the voting members

    opposed the contract dealvoting

    transaction

    the

    was

    opposed the contract deal

    argmaxy1,··· ,yT

    TY

    t=1

    p(yt|y1:t�1, x; ✓)AAACfXicfZBdaxQxFIaz41cdv7Z66U1wWagyLjO1UKkIC3rhjVhhty3srEMmc3Y3NJOE5EzpMM7f8dd4q+Cv0ex2BG3FA4GH97wnyXlzI4XDOP7RC65dv3Hz1tbt8M7de/cf9LcfHjldWQ5TrqW2JzlzIIWCKQqUcGIssDKXcJyfvln3j8/AOqHVBGsD85ItlVgIztBLWX+cMrss2XnW1FkSpbzQ6KI6m7Q0NVYXWYOvk/bThJqdOsPPddYkB/g8aaPzVymuANnTrD+IR/Gm6FVIOhiQrg6z7d4wLTSvSlDIJXNulsQG5w2zKLiENkwrB4bxU7aEmUfFSnDzZrNqS4deKehCW38U0o3650TDSufqMvfOkuHKXe6txX/1ZhUuXs4boUyFoPjFQ4tKUtR0nRsthAWOsvbAuBX+r5SvmGUcfbph+hb8Lhbe+3s/GLAMtX3WdNm2frdlGq3pf0ahfhs9haEPNrkc41U42h0lL0a7H/cG470u4i3ymDwhOyQh+2RM3pFDMiWcfCFfyTfyvfczGAZRMLqwBr1u5hH5q4L9X9h3w1Q=

    Beam search can effectively find one likely but cannot explore multiple modes

    yAAACh3icfVBraxNBFJ1sfbTro2n96JfBEIgS1p1NYwNSqOgHsYgVTFvIxDA7uUmG7ouZu9LNuv/JXyP4SX+Ks2nqC/XCwJlzzr0z94RZpAz6/peGs3Ht+o2bm1vurdt37m43d3ZPTJprCUOZRqk+C4WBSCUwRIURnGUaRBxGcBqeP6/10w+gjUqTd1hkMI7FPFEzJQVaatJ81c46xceLpxwXgOLhATd5PCmXB6x6f0SzzvKnVPuW3R9Xt81L1uVymqLpHvHKLSbNlu/t94PB3oD6nt9jPmM1CPp91qPM81fVIus6nuw02nyayjyGBGUkjBkxP8NxKTQqGUHl8txAJuS5mMPIwkTEYMblaumKti0zpbNU25MgXbG/dpQiNqaIQ+uMBS7Mn1pN/k0b5TgbjEuVZDlCIi8fmuURxZTWCdKp0iAxKiwQUiv7VyoXQguJNmeXvwC7i4bXdu6bDLTAVD8qudDzWFxUdrc579bof0aVXBktcl0b7FV69N/gJPBYzwve7rUOg3XEm+Q+eUA6hJF9ckhekmMyJJJ8Ip/JV/LN2XIeO0+cwaXVaax77pHfynn2HUeVxZU=

    It was rejected by 58 % of its members who voted in the ballot . 
Of the members who voted , 58 % opposed the contract transaction . 
Of the members who participated in the vote , 58 % opposed the contract .

    References

  • !6

    Explicitly Model Uncertainty

    Introduce a latent variable to capture different translation modes

    Better explore the search space, decode different from different

    zAAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoMeCF48t2A9oQ9lsJ+3azSbsboQa+gu8eFDEqz/Jm//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKLSPJb3ZpqgH9GR5CFn1Fip+TQoV9yquwBZJ15OKpCjMSh/9YcxSyOUhgmqdc9zE+NnVBnOBM5K/VRjQtmEjrBnqaQRaj9bHDojF1YZkjBWtqQhC/X3REYjradRYDsjasZ61ZuL/3m91IQ3fsZlkhqUbLkoTAUxMZl/TYZcITNiagllittbCRtTRZmx2ZRsCN7qy+ukfVX13KrXrFXqtTyOIpzBOVyCB9dQhztoQAsYIDzDK7w5D86L8+58LFsLTj5zCn/gfP4A5QuM8A==AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoMeCF48t2A9oQ9lsJ+3azSbsboQa+gu8eFDEqz/Jm//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKLSPJb3ZpqgH9GR5CFn1Fip+TQoV9yquwBZJ15OKpCjMSh/9YcxSyOUhgmqdc9zE+NnVBnOBM5K/VRjQtmEjrBnqaQRaj9bHDojF1YZkjBWtqQhC/X3REYjradRYDsjasZ61ZuL/3m91IQ3fsZlkhqUbLkoTAUxMZl/TYZcITNiagllittbCRtTRZmx2ZRsCN7qy+ukfVX13KrXrFXqtTyOIpzBOVyCB9dQhztoQAsYIDzDK7w5D86L8+58LFsLTj5zCn/gfP4A5QuM8A==AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoMeCF48t2A9oQ9lsJ+3azSbsboQa+gu8eFDEqz/Jm//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKLSPJb3ZpqgH9GR5CFn1Fip+TQoV9yquwBZJ15OKpCjMSh/9YcxSyOUhgmqdc9zE+NnVBnOBM5K/VRjQtmEjrBnqaQRaj9bHDojF1YZkjBWtqQhC/X3REYjradRYDsjasZ61ZuL/3m91IQ3fsZlkhqUbLkoTAUxMZl/TYZcITNiagllittbCRtTRZmx2ZRsCN7qy+ukfVX13KrXrFXqtTyOIpzBOVyCB9dQhztoQAsYIDzDK7w5D86L8+58LFsLTj5zCn/gfP4A5QuM8A==AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoMeCF48t2A9oQ9lsJ+3azSbsboQa+gu8eFDEqz/Jm//GbZuDtj4YeLw3w8y8IBFcG9f9dgobm1vbO8Xd0t7+weFR+fikreNUMWyxWMSqG1CNgktsGW4EdhOFNAoEdoLJ7dzvPKLSPJb3ZpqgH9GR5CFn1Fip+TQoV9yquwBZJ15OKpCjMSh/9YcxSyOUhgmqdc9zE+NnVBnOBM5K/VRjQtmEjrBnqaQRaj9bHDojF1YZkjBWtqQhC/X3REYjradRYDsjasZ61ZuL/3m91IQ3fsZlkhqUbLkoTAUxMZl/TYZcITNiagllittbCRtTRZmx2ZRsCN7qy+ukfVX13KrXrFXqtTyOIpzBOVyCB9dQhztoQAsYIDzDK7w5D86L8+58LFsLTj5zCn/gfP4A5QuM8A==

    zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw==

    yAAACPHicfZDLSgMxFIYzXut4a3XpZrAIIlJmakGXBV24EVuwF+iUciY9bYOZzJBkxDL0Cdzq6/ge7t2JW9emF0GteCDw8Z8/yTl/EHOmtOu+WAuLS8srq5k1e31jc2s7m9upqyiRFGs04pFsBqCQM4E1zTTHZiwRwoBjI7g9H/cbdygVi8SNHsbYDqEvWI9R0EaqDjvZvFtwJ+XMgzeDPJlVpZOzDvxuRJMQhaYclGp5bqzbKUjNKMeR7ScKY6C30MeWQQEhqnY6mXTkHBil6/QiaY7QzkT9fiOFUKlhGBhnCHqgfvfG4l+9VqJ7Z+2UiTjRKOj0o17CHR0547WdLpNINR8aACqZmdWhA5BAtQnH9i/Q7CLxyrx7HaMEHcmj1AfZD+F+ZHbr+8dj+s/IxJfRkG2bYL3fMc5DvVjwTgrFailfLs0izpA9sk8OiUdOSZlckgqpEUqQPJBH8mQ9W6/Wm/U+tS5Yszu75EdZH59aOq3G

    z1AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mqoMeCF48V7Qe0oWy2k3bpZhN2N0IN/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4Zua3H1FpHssHM0nQj+hQ8pAzaqx0/9T3+uWKW3XnIKvEy0kFcjT65a/eIGZphNIwQbXuem5i/Iwqw5nAaamXakwoG9Mhdi2VNELtZ/NTp+TMKgMSxsqWNGSu/p7IaKT1JApsZ0TNSC97M/E/r5ua8NrPuExSg5ItFoWpICYms7/JgCtkRkwsoUxxeythI6ooMzadkg3BW355lbRqVe+iWru7rNRreRxFOIFTOAcPrqAOt9CAJjAYwjO8wpsjnBfn3flYtBacfOYY/sD5/AELOo2W

    z2AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mioMeCF48V7Qe0oWy2k3bpZhN2N0It/QlePCji1V/kzX/jts1BWx8MPN6bYWZemAqujet+O4W19Y3NreJ2aWd3b/+gfHjU1EmmGDZYIhLVDqlGwSU2DDcC26lCGocCW+HoZua3HlFpnsgHM04xiOlA8ogzaqx0/9Tze+WKW3XnIKvEy0kFctR75a9uP2FZjNIwQbXueG5qgglVhjOB01I305hSNqID7FgqaYw6mMxPnZIzq/RJlChb0pC5+ntiQmOtx3FoO2NqhnrZm4n/eZ3MRNfBhMs0MyjZYlGUCWISMvub9LlCZsTYEsoUt7cSNqSKMmPTKdkQvOWXV0nTr3oXVf/uslLz8ziKcAKncA4eXEENbqEODWAwgGd4hTdHOC/Ou/OxaC04+cwx/IHz+QMMvo2X

    z3AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0laQY8FLx4r2lZoQ9lsN+3SzSbsToQa+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IJHCoOt+O4W19Y3NreJ2aWd3b/+gfHjUNnGqGW+xWMb6IaCGS6F4CwVK/pBoTqNA8k4wvp75nUeujYjVPU4S7kd0qEQoGEUr3T316/1yxa26c5BV4uWkAjma/fJXbxCzNOIKmaTGdD03QT+jGgWTfFrqpYYnlI3pkHctVTTixs/mp07JmVUGJIy1LYVkrv6eyGhkzCQKbGdEcWSWvZn4n9dNMbzyM6GSFLlii0VhKgnGZPY3GQjNGcqJJZRpYW8lbEQ1ZWjTKdkQvOWXV0m7VvXq1drtRaVRy+Mowgmcwjl4cAkNuIEmtIDBEJ7hFd4c6bw4787HorXg5DPH8AfO5w8OQo2Y

    It was rejected by 58 % of its members who voted in the ballot .

    Of the members who voted , 58 % opposed the contract transaction .

    Of the members who participated in the vote , 58 % opposed the contract .

    p(y|z, x)AAACoXicfVFdb9MwFHXD1whfHTzyYlEVbahUSUHaBEKaBA/wgCio3SY1xXKc29ZaEkf2DWrm5QfyE/gVvMIbThck2BBXsnR07rnHvsdxkUqDQfCt4125eu36ja2b/q3bd+7e627fPzSq1AKmQqVKH8fcQCpzmKLEFI4LDTyLUziKT143/aMvoI1U+QSrAuYZX+ZyIQVHR7Gu6Bc71dl61+9HXC8zvma2YuEgEolCM6jYpKaRkQkYQFtbZvFVWH+e1FGhVULdKMOzitnwBT4N68H6ZYQrQO7cTtnIb5xPB+td1u0Fw2BT9DIIW9AjbY3ZdudxlChRZpCjSLkxszAocG65RilSqP2oNFBwccKXMHMw5xmYud2kUdO+YxK6UNqdHOmG/XPC8syYKoudMuO4Mhd7Dfmv3qzExf7cyrwoEXJxftGiTCkq2kRLE6lBYFo5wIWW7q1UrLjmAt0H+H70BtwyGt474w8FaI5KP7Ft7LVbbhkNGvQ/ocx/Cx3yfZdseDHHy+BwNAyfDUcfn/cORm3GW+QheUR2SEj2yAF5S8ZkSgT5Sr6TH+Sn1/PeeWPv07nU67QzD8hf5c1+AUsgz5g=

  • !7

    Previous Attempt: Conditional VAE

    log p(y|x; ✓) � Eq(z|x,y;�)[log p(y|z, x; ✓)]�DKL(q(z|x, y;�)kp(z|x; ✓))AAACXXicbVFda9swFJW9dmuzrs22hz30RSwMEuiCnQ026EthHQzWhxaWthAZIys3iahsa9L1iOv4T/Zte9lfmZympR+7IDg65x6u7lGilbQYBL89/8na+tNnG5ut51svtnfaL1+d2rwwAoYiV7k5T7gFJTMYokQF59oATxMFZ8nFl0Y/+wXGyjz7gaWGKOXTTE6k4OiouI1M5VOqu+Vivs9wBsh7lE2BspTjLEno17j62b1czPfKfaZnslePbg2Xe7eWiL6nh3HFEOZYfT+q6+49E1vo5nrT3YvbnaAfLIs+BuEKdMiqjuP2FRvnokghQ6G4taMw0BhV3KAUCuoWKyxoLi74FEYOZjwFG1XLdGr6zjFjOsmNOxnSJXvXUfHU2jJNXGeztH2oNeT/tFGBk89RJTNdIGTietCkUBRz2kRNx9KAQFU6wIWR7q1UzLjhAt2HtFwI4cOVH4PTQT/80B+cfOwcDFZxbJBd8pZ0SUg+kQPyjRyTIRHkj0e8Ta/l/fXX/S1/+7rV91ae1+Re+W/+Aeq4s8M=

    (Kingma & Welling, 2014; Zhang et al., 2016)

    “Posterior collapse” in language modeling, the latent variable is ignored (Bowman et al., 2016)

    Gaussian , zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw==

    p(y|x; ✓) =Z

    zp(z|x; ✓)p(y|z, x; ✓)

    AAACGnicbVDLSgMxFM34rPVVdekmWIQKUmaqoCBCwY3LCvYBbRkyadqGZjJDckec1n6HG3/FjQtF3Ikb/8a0HXy0HgicnHMuyT1eKLgG2/605uYXFpeWUyvp1bX1jc3M1nZFB5GirEwDEaiaRzQTXLIycBCsFipGfE+wqte7GPnVG6Y0D+Q1xCFr+qQjeZtTAkZyM06Yi+9uzxrQZUAOzhtcgtvHYa7/I44S/cPvq5vJ2nl7DDxLnIRkUYKSm3lvtAIa+UwCFUTrumOH0BwQBZwKNkw3Is1CQnukw+qGSuIz3RyMVxvifaO0cDtQ5kjAY/X3xID4Wse+Z5I+ga6e9kbif149gvZpc8BlGAGTdPJQOxIYAjzqCbe4YhREbAihipu/YtolilAwbaZNCc70yrOkUsg7R/nC1XG2WEjqSKFdtIdyyEEnqIguUQmVEUX36BE9oxfrwXqyXq23SXTOSmZ20B9YH1+7c6Ce

  • !8

    Our Approach: Mixture Model

    p(y|x; ✓) =KX

    z=1

    p(z|x; ✓)p(y|z, x; ✓)AAACcXicbVHdShtBGJ1dbavpX9TeiLQMBiGlJexGoYUSCHgjeGOhUSGbLrOTb5PBmd1l5ltJsu59n693voQ3voCTGNSu/WDgzPlhZs5EmRQGPe/acVdWX7x8tbZee/3m7bv39Y3NU5PmmkOPpzLV5xEzIEUCPRQo4TzTwFQk4Sy6OJzrZ5egjUiTXzjNYKDYKBGx4AwtFdb/ZM3p1eRHgGNA9rkTmFyFxazjl7+PadacPUpz3+zrw7a21wlizXjhl8VxWYlVnMEI6FOzYpNwVrWF9YbX8hZDnwN/CRpkOSdh/W8wTHmuIEEumTF938twUDCNgksoa0FuIGP8go2gb2HCFJhBsWispHuWGdI41XYlSBfs00TBlDFTFVmnYjg2VW1O/k/r5xh/HxQiyXKEhN8fFOeSYkrn9dOh0MBRTi1gXAt7V8rHzHaD9pNqtgS/+uTn4LTd8vdb7Z8HjW57Wcca2SG7pEl88o10yRE5IT3CyY3zwfnofHJu3W2Xurv3VtdZZrbIP+N+uQMXJLwl

    Simplest, enumerable, exact marginal

    Multinomial , taking values in zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw== {1, · · · ,K}AAAChHicfVBdaxNBFJ2sWuv6lbaPviyGQJQl7KZKC6WloKBQxAqmLWRiuDu5SYbO7g4zd6XJuj/JX+OToP/F2TR+teKFgTPnnHtn7km0kpai6GvDu3Hz1trt9Tv+3Xv3Hzxsbmye2LwwAvsiV7k5S8Cikhn2SZLCM20Q0kThaXL+otZPP6KxMs/e01zjMIVpJidSADlq1HzV1p35p4s9TjMkeLLPbZGOysV+XH04CnRn8VuqfYvw19XnZRxyMc7Jhke8GjVbUTdaVnAdxCvQYqs6Hm002nyciyLFjIQCawdxpGlYgiEpFFY+LyxqEOcwxYGDGaRoh+Vy4ypoO2YcTHLjTkbBkv2zo4TU2nmaOGcKNLNXtZr8lzYoaLI7LGWmC8JMXD40KVRAeVDHF4ylQUFq7gAII91fAzEDA4JcyD5/iW4Xg2/c3LcaDVBunpYczDSFi8rtNuVhjf5nlNlPo0O+74KNr8Z4HZz0uvF2t/fuWeuwt4p4nT1ij1mHxWyHHbLX7Jj1mWCf2Rf2jX331rzQ2/aeX1q9xqpni/1V3sEP8onElw==

  • !9

    Even simpler:

    p(y|x; ✓) =KX

    z=1

    p(z|x; ✓)p(y|z, x; ✓)AAACcXicbVHdShtBGJ1dbavpX9TeiLQMBiGlJexGoYUSCHgjeGOhUSGbLrOTb5PBmd1l5ltJsu59n693voQ3voCTGNSu/WDgzPlhZs5EmRQGPe/acVdWX7x8tbZee/3m7bv39Y3NU5PmmkOPpzLV5xEzIEUCPRQo4TzTwFQk4Sy6OJzrZ5egjUiTXzjNYKDYKBGx4AwtFdb/ZM3p1eRHgGNA9rkTmFyFxazjl7+PadacPUpz3+zrw7a21wlizXjhl8VxWYlVnMEI6FOzYpNwVrWF9YbX8hZDnwN/CRpkOSdh/W8wTHmuIEEumTF938twUDCNgksoa0FuIGP8go2gb2HCFJhBsWispHuWGdI41XYlSBfs00TBlDFTFVmnYjg2VW1O/k/r5xh/HxQiyXKEhN8fFOeSYkrn9dOh0MBRTi1gXAt7V8rHzHaD9pNqtgS/+uTn4LTd8vdb7Z8HjW57Wcca2SG7pEl88o10yRE5IT3CyY3zwfnofHJu3W2Xurv3VtdZZrbIP+N+uQMXJLwl

    =1

    K

    KX

    z=1

    p(y|z, x; ✓)AAACcXicbVHbatswGJbdbW2zQ9PDzSgboiGQ0RLstNDCCAR2U+hNB0tbiDMjK78TUck20u/SxPN9n693e4nd9AWqHNjadD8IPn0HJH2KMikMet5vx1159frN6tp65e279x82qptbFybNNYcuT2WqryJmQIoEuihQwlWmgalIwmV0/W2qX96ANiJNfuA4g75iw0TEgjO0VFi9q2eN8a/brwGOANmXdmByFRaTtl/+PKNZY/JPmvomB3+3lXYQa8YLvyzOyqXUc2M9GAJ9albsNpws28JqzWt6s6Evgb8ANbKY87B6HwxSnitIkEtmTM/3MuwXTKPgEspKkBvIGL9mQ+hZmDAFpl/MGitp3TIDGqfargTpjH2aKJgyZqwi61QMR2ZZm5L/03o5xif9QiRZjpDw+UFxLimmdFo/HQgNHOXYAsa1sHelfMRsN2g/qWJL8Jef/BJctJr+YbP1/ajWaS3qWCO7ZI80iE+OSYecknPSJZz8cXacT85n58H96FJ3b251nUVmmzwbd/8RCwS8JQ==

    p(z|x; ✓) = 1/KAAAChHicbVFda9swFJXdreuyr7R93ItYCMugZFba0kLJKOxlkJcWmrYQZ0ZWrhNRyTbSdWni+pf0X+1t/2ZKGrY22QXB0TnncqVz41xJi0Hw2/M3XrzcfLX1uvbm7bv3H+rbO5c2K4yAvshUZq5jbkHJFPooUcF1boDrWMFVfPN9rl/dgrEySy9wmsNQ83EqEyk4OiqqPzTz1vT+7iTECSD/0g1toaNy1mXVzx7NW7N/0tw32/t7rTW7YWK4KFlV9qqVthVnOAb61Kz5XTRbsz2b1mVfe1G9EbSDRdF1wJagQZZ1FtV/haNMFBpSFIpbO2BBjsOSG5RCQVULCws5Fzd8DAMHU67BDstFiBVtOmZEk8y4kyJdsE87Sq6tnerYOTXHiV3V5uT/tEGByfGwlGleIKTicVBSKIoZnW+EjqQBgWrqABdGurdSMeEuLXR7q7kQ2OqX18Flp832253zg8ZpZxnHFvlIPpEWYeSInJIf5Iz0ifA877MXeMzf9Pf8ff/w0ep7y55d8qz8b38AWi3A2A==

    —set , each component is equally likely a priori

    Our Approach: Mixture Model

    Multinomial , taking values in zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw== {1, · · · ,K}AAAChHicfVBdaxNBFJ2sWuv6lbaPviyGQJQl7KZKC6WloKBQxAqmLWRiuDu5SYbO7g4zd6XJuj/JX+OToP/F2TR+teKFgTPnnHtn7km0kpai6GvDu3Hz1trt9Tv+3Xv3Hzxsbmye2LwwAvsiV7k5S8Cikhn2SZLCM20Q0kThaXL+otZPP6KxMs/e01zjMIVpJidSADlq1HzV1p35p4s9TjMkeLLPbZGOysV+XH04CnRn8VuqfYvw19XnZRxyMc7Jhke8GjVbUTdaVnAdxCvQYqs6Hm002nyciyLFjIQCawdxpGlYgiEpFFY+LyxqEOcwxYGDGaRoh+Vy4ypoO2YcTHLjTkbBkv2zo4TU2nmaOGcKNLNXtZr8lzYoaLI7LGWmC8JMXD40KVRAeVDHF4ylQUFq7gAII91fAzEDA4JcyD5/iW4Xg2/c3LcaDVBunpYczDSFi8rtNuVhjf5nlNlPo0O+74KNr8Z4HZz0uvF2t/fuWeuwt4p4nT1ij1mHxWyHHbLX7Jj1mWCf2Rf2jX331rzQ2/aeX1q9xqpni/1V3sEP8onElw==

  • —assume is large for one , but nearly zero for others

    a particular translation is only explained by a particular component

    !10

    Even simpler:

    p(y|x; ✓) =KX

    z=1

    p(z|x; ✓)p(y|z, x; ✓)AAACcXicbVHdShtBGJ1dbavpX9TeiLQMBiGlJexGoYUSCHgjeGOhUSGbLrOTb5PBmd1l5ltJsu59n693voQ3voCTGNSu/WDgzPlhZs5EmRQGPe/acVdWX7x8tbZee/3m7bv39Y3NU5PmmkOPpzLV5xEzIEUCPRQo4TzTwFQk4Sy6OJzrZ5egjUiTXzjNYKDYKBGx4AwtFdb/ZM3p1eRHgGNA9rkTmFyFxazjl7+PadacPUpz3+zrw7a21wlizXjhl8VxWYlVnMEI6FOzYpNwVrWF9YbX8hZDnwN/CRpkOSdh/W8wTHmuIEEumTF938twUDCNgksoa0FuIGP8go2gb2HCFJhBsWispHuWGdI41XYlSBfs00TBlDFTFVmnYjg2VW1O/k/r5xh/HxQiyXKEhN8fFOeSYkrn9dOh0MBRTi1gXAt7V8rHzHaD9pNqtgS/+uTn4LTd8vdb7Z8HjW57Wcca2SG7pEl88o10yRE5IT3CyY3zwfnofHJu3W2Xurv3VtdZZrbIP+N+uQMXJLwl

    =1

    K

    KX

    z=1

    p(y|z, x; ✓)AAACcXicbVHbatswGJbdbW2zQ9PDzSgboiGQ0RLstNDCCAR2U+hNB0tbiDMjK78TUck20u/SxPN9n693e4nd9AWqHNjadD8IPn0HJH2KMikMet5vx1159frN6tp65e279x82qptbFybNNYcuT2WqryJmQIoEuihQwlWmgalIwmV0/W2qX96ANiJNfuA4g75iw0TEgjO0VFi9q2eN8a/brwGOANmXdmByFRaTtl/+PKNZY/JPmvomB3+3lXYQa8YLvyzOyqXUc2M9GAJ9albsNpws28JqzWt6s6Evgb8ANbKY87B6HwxSnitIkEtmTM/3MuwXTKPgEspKkBvIGL9mQ+hZmDAFpl/MGitp3TIDGqfargTpjH2aKJgyZqwi61QMR2ZZm5L/03o5xif9QiRZjpDw+UFxLimmdFo/HQgNHOXYAsa1sHelfMRsN2g/qWJL8Jef/BJctJr+YbP1/ajWaS3qWCO7ZI80iE+OSYecknPSJZz8cXacT85n58H96FJ3b251nUVmmzwbd/8RCwS8JQ==

    � 1K

    maxz

    p(y|z, x; ✓)AAACcXicbVHbatswGJbdbW2zQ9PDzSgboiGQ0RLstNDCCAR2U+hNB0tbiDMjK78TUck20u/SxPN9n693e4nd9AWqHNjadD8IPn0HJH2KMikMet5vx1159frN6tp65e279x82qptbFybNNYcuT2WqryJmQIoEuihQwlWmgalIwmV0/W2qX96ANiJNfuA4g75iw0TEgjO0VFi9q2eN8a/brwGOANmXdmByFRaTtl/+PKNZY/JPmvomB3+3lXo7iDXjhV8WZ+VS7LkzGAJ96lXsNpwsu8JqzWt6s6Evgb8ANbKY87B6HwxSnitIkEtmTM/3MuwXTKPgEspKkBvIGL9mQ+hZmDAFpl/MGitp3TIDGqfargTpjH2aKJgyZqwi61QMR2ZZm5L/03o5xif9QiRZjpDw+UFxLimmdFo/HQgNHOXYAsa1sHelfMRsN2g/qWJL8Jef/BJctJr+YbP1/ajWaS3qWCO7ZI80iE+OSYecknPSJZz8cXacT85n58H96FJ3b251nUVmmzwbd/8RAgy8JQ==

    p(y|z, x; ✓)AAAClXicbVFdi9NAFJ3Erxq/6vrggy+DpVBBalIFFySwfiBCH3YFu7vQ1DCZ3rRDZ5IwcyNtY/6Rv8Y3/43TbnF3Uy8MnDnnXO7MuUkhhUHf/+O4N27eun2ndde7d//Bw0ftxwenJi81hxHPZa7PE2ZAigxGKFDCeaGBqUTCWbL4uNHPfoA2Is++4aqAiWKzTKSCM7RU3P7VLXqrn8t3Ec4B2YswMqWKq3UY1N+HtOitL6WNb/3y39XrhlGqGa+CuhrWjbaGM5oBvWpWbBmv923XxoXBq6HXsMTtjt/3t0X3QbADHbKrk7j9O5rmvFSQIZfMmHHgFzipmEbBJdReVBooGF+wGYwtzJgCM6m2qda0a5kpTXNtT4Z0y17tqJgyZqUS61QM56apbcj/aeMS08NJJbKiRMj4xaC0lBRzulkRnQoNHOXKAsa1sG+lfM5sfGgX6dkQguaX98HpoB+87g++vukcDXZxtMgz8pz0SEDekiPyhZyQEeHOgXPovHc+uE/d0P3kfr6wus6u5wm5Vu7xX1S3xzI=

    zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw==

    Our Approach: Mixture Model

    Multinomial , taking values in zAAACPHicfZDLSgMxFIYzXut4a3XpZrAURKTM1IIuC7pwI7ZgL9ApciY9rcFMZkgyYh36BG71dXwP9+7ErWvTi6BVPBD4+M+f5Jw/iDlT2nVfrLn5hcWl5cyKvbq2vrGZzW01VJRIinUa8Ui2AlDImcC6ZppjK5YIYcCxGdycjPrNW5SKReJSD2LshNAXrMcoaCPV7q+yebfojsv5Dd4U8mRa1aucVfC7EU1CFJpyUKrtubHupCA1oxyHtp8ojIHeQB/bBgWEqDrpeNKhUzBK1+lF0hyhnbH6/UYKoVKDMDDOEPS1mu2NxL967UT3jjspE3GiUdDJR72EOzpyRms7XSaRaj4wAFQyM6tDr0EC1SYc2z9Fs4vEc/PuRYwSdCT3Ux9kP4S7odmt7x+M6D8jE19GQ7ZtgvVmY/wNjVLROyyWauV8pTyNOEN2yC7ZIx45IhVyRqqkTihB8kAeyZP1bL1ab9b7xDpnTe9skx9lfXwCXBKtxw== {1, · · · ,K}AAAChHicfVBdaxNBFJ2sWuv6lbaPviyGQJQl7KZKC6WloKBQxAqmLWRiuDu5SYbO7g4zd6XJuj/JX+OToP/F2TR+teKFgTPnnHtn7km0kpai6GvDu3Hz1trt9Tv+3Xv3Hzxsbmye2LwwAvsiV7k5S8Cikhn2SZLCM20Q0kThaXL+otZPP6KxMs/e01zjMIVpJidSADlq1HzV1p35p4s9TjMkeLLPbZGOysV+XH04CnRn8VuqfYvw19XnZRxyMc7Jhke8GjVbUTdaVnAdxCvQYqs6Hm002nyciyLFjIQCawdxpGlYgiEpFFY+LyxqEOcwxYGDGaRoh+Vy4ypoO2YcTHLjTkbBkv2zo4TU2nmaOGcKNLNXtZr8lzYoaLI7LGWmC8JMXD40KVRAeVDHF4ylQUFq7gAII91fAzEDA4JcyD5/iW4Xg2/c3LcaDVBunpYczDSFi8rtNuVhjf5nlNlPo0O+74KNr8Z4HZz0uvF2t/fuWeuwt4p4nT1ij1mHxWyHHbLX7Jj1mWCf2Rf2jX331rzQ2/aeX1q9xqpni/1V3sEP8onElw==

  • !11

    Training Objective

    p(y|x; ✓) =KX

    z=1

    p(z|x; ✓)p(y|z, x; ✓)AAACcXicbVHdShtBGJ1dbavpX9TeiLQMBiGlJexGoYUSCHgjeGOhUSGbLrOTb5PBmd1l5ltJsu59n693voQ3voCTGNSu/WDgzPlhZs5EmRQGPe/acVdWX7x8tbZee/3m7bv39Y3NU5PmmkOPpzLV5xEzIEUCPRQo4TzTwFQk4Sy6OJzrZ5egjUiTXzjNYKDYKBGx4AwtFdb/ZM3p1eRHgGNA9rkTmFyFxazjl7+PadacPUpz3+zrw7a21wlizXjhl8VxWYlVnMEI6FOzYpNwVrWF9YbX8hZDnwN/CRpkOSdh/W8wTHmuIEEumTF938twUDCNgksoa0FuIGP8go2gb2HCFJhBsWispHuWGdI41XYlSBfs00TBlDFTFVmnYjg2VW1O/k/r5xh/HxQiyXKEhN8fFOeSYkrn9dOh0MBRTi1gXAt7V8rHzHaD9pNqtgS/+uTn4LTd8vdb7Z8HjW57Wcca2SG7pEl88o10yRE5IT3CyY3zwfnofHJu3W2Xurv3VtdZZrbIP+N+uQMXJLwl

    =1

    K

    KX

    z=1

    p(y|z, x; ✓)AAACcXicbVHbatswGJbdbW2zQ9PDzSgboiGQ0RLstNDCCAR2U+hNB0tbiDMjK78TUck20u/SxPN9n693e4nd9AWqHNjadD8IPn0HJH2KMikMet5vx1159frN6tp65e279x82qptbFybNNYcuT2WqryJmQIoEuihQwlWmgalIwmV0/W2qX96ANiJNfuA4g75iw0TEgjO0VFi9q2eN8a/brwGOANmXdmByFRaTtl/+PKNZY/JPmvomB3+3lXYQa8YLvyzOyqXUc2M9GAJ9albsNpws28JqzWt6s6Evgb8ANbKY87B6HwxSnitIkEtmTM/3MuwXTKPgEspKkBvIGL9mQ+hZmDAFpl/MGitp3TIDGqfargTpjH2aKJgyZqwi61QMR2ZZm5L/03o5xif9QiRZjpDw+UFxLimmdFo/HQgNHOXYAsa1sHelfMRsN2g/qWJL8Jef/BJctJr+YbP1/ajWaS3qWCO7ZI80iE+OSYecknPSJZz8cXacT85n58H96FJ3b251nUVmmzwbd/8RCwS8JQ==

    � 1K

    maxz

    p(y|z, x; ✓)AAACcXicbVHbatswGJbdbW2zQ9PDzSgboiGQ0RLstNDCCAR2U+hNB0tbiDMjK78TUck20u/SxPN9n693e4nd9AWqHNjadD8IPn0HJH2KMikMet5vx1159frN6tp65e279x82qptbFybNNYcuT2WqryJmQIoEuihQwlWmgalIwmV0/W2qX96ANiJNfuA4g75iw0TEgjO0VFi9q2eN8a/brwGOANmXdmByFRaTtl/+PKNZY/JPmvomB3+3lXo7iDXjhV8WZ+VS7LkzGAJ96lXsNpwsu8JqzWt6s6Evgb8ANbKY87B6HwxSnitIkEtmTM/3MuwXTKPgEspKkBvIGL9mQ+hZmDAFpl/MGitp3TIDGqfargTpjH2aKJgyZqwi61QMR2ZZm5L/03o5xif9QiRZjpDw+UFxLimmdFo/HQgNHOXYAsa1sHelfMRsN2g/qWJL8Jef/BJctJr+YbP1/ajWaS3qWCO7ZI80iE+OSYecknPSJZz8cXacT85n58H96FJ3b251nUVmmzwbd/8RAgy8JQ==

    L(✓) = E(x,y)⇠datahminz

    � log p(y|z, x; ✓)i

    AAACunicbVFba9swFJa9W5ddmm2PexELBQfaYGeDDrpBYRsM1ocOlrZgGSMrsi0iXyYdjziOf+T2tn8zOfFKbwcEn75zvnONSik0uO5fy753/8HDRzuPB0+ePnu+O3zx8kwXlWJ8xgpZqIuIai5FzmcgQPKLUnGaRZKfR4tPnf/8F1daFPkPqEseZDTJRSwYBUOFw997RBYJLp16vTwikHKgY0wSjklGIY0i/CVsfjqr9XK/PiJlKsatfylY7V9KAnyAP4cNAb6E5ttJ2zrXRGRddt//0ePBJjujEp84PffxSkHHCMdEi2ybb06Bti2RPAafZCIPVwd39UCUSFIIwuHInbgbw7eB14MR6u00HP4h84JVGc+BSaq177klBA1VIJjk7YBUmpeULWjCfQNzmnEdNJvVt3jPMHMcF8q8HPCGvapoaKZ1nUUmsptP3/R15F0+v4L4fdCIvKyA52xbKK4khgJ3d8RzoTgDWRtAmRKmV8xSqigDc+2BWYJ3c+Tb4Gw68d5Opt/fjY6n/Tp20Gv0BjnIQ4foGH1Fp2iGmHVoBVZsJfYHO7KFvdiG2laveYWumQ3/AFra16I=

  • !12

    EM Training

    E-step (hard): estimate the responsibility of each component

    M-step: update through each component with gradients✓AAAEB3icnVPNa9RAFJ8mftT4tdWjCIPLsiksS1IFBVkoeBG8VHDbwmYbJrOT3aGTD2Ym2iSbnrz4r3jxoIhX/wVv/jdONqndpK0HH4R58/u993tvHnlezKiQlvV7Q9OvXb9xc/OWcfvO3Xv3O1sP9kWUcEzGOGIRP/SQIIyGZCypZOQw5gQFHiMH3vGrkj94T7igUfhOpjGZBmgeUp9iJBXkbmmPe5nRi81seTJIXzpyQSTaHjk+RzhfoWdYbKbLbPD3WuSOSAI3z/oFVHH9VmB/LbKUT8/5UZ04soujNyo3HaxVgSPYoq9uweiZqmd1OCHyGHIrGDosmpeyR7lJt4vlSXU2q2fwdKVccYO0GXPq4FkkG6pN0UHWlr2qifazzwqfz/rSMusjMXrczapqDiO+RJxHH+A/2i/nUjOwpv5DpDrdTtcaWiuDFx27drqgtj2388uZRTgJSCgxQ0JMbCuW0xxxSTEjheEkgsQIH6M5mSg3RAER03z1Hxewp5AZ9COuvlDCFbqekaNAiDTwVGSA5EK0uRK8jJsk0n8xzWkYJ5KEuCrkJwzKCJZLAWeUEyxZqhyEOVW9QrxAagekWh1DDcFuP/mis78ztJ8Od94+6+5a9Tg2wSPwBJjABs/BLngN9sAYYO2j9ln7qn3TP+lf9O/6jypU26hzHoKG6T//ABWuTt8=

    Take a mini-batch

    L(✓) = E(x,y)⇠datahminz

    � log p(y|z, x; ✓)i

    AAACunicbVFba9swFJa9W5ddmm2PexELBQfaYGeDDrpBYRsM1ocOlrZgGSMrsi0iXyYdjziOf+T2tn8zOfFKbwcEn75zvnONSik0uO5fy753/8HDRzuPB0+ePnu+O3zx8kwXlWJ8xgpZqIuIai5FzmcgQPKLUnGaRZKfR4tPnf/8F1daFPkPqEseZDTJRSwYBUOFw997RBYJLp16vTwikHKgY0wSjklGIY0i/CVsfjqr9XK/PiJlKsatfylY7V9KAnyAP4cNAb6E5ttJ2zrXRGRddt//0ePBJjujEp84PffxSkHHCMdEi2ybb06Bti2RPAafZCIPVwd39UCUSFIIwuHInbgbw7eB14MR6u00HP4h84JVGc+BSaq177klBA1VIJjk7YBUmpeULWjCfQNzmnEdNJvVt3jPMHMcF8q8HPCGvapoaKZ1nUUmsptP3/R15F0+v4L4fdCIvKyA52xbKK4khgJ3d8RzoTgDWRtAmRKmV8xSqigDc+2BWYJ3c+Tb4Gw68d5Opt/fjY6n/Tp20Gv0BjnIQ4foGH1Fp2iGmHVoBVZsJfYHO7KFvdiG2laveYWumQ3/AFra16I=

    {(x(i), y(i))}mi=1AAAC2XicbVJba9swFJa9W5fdsu1xL2IhYEMa4mzQQSkUtsFgfehgSQuWa2RFdkQl27WORxzHD3vYGHvdP9vb/sV+wpTLSm8HhD5953znIinKpdAwGPyx7Fu379y9t3W/9eDho8dP2k+fjXVWFoyPWCaz4jiimkuR8hEIkPw4LzhVkeRH0enbpf/oCy+0yNLPUOU8UDRJRSwYBUOF7b9dIrME5061mO0SmHKgLiYJx0RRmEYRfh/WZ858MetVuySfCrfxzwXz3rkkwNv4XVgT4DOoPx40jXNJRBb58vg/2m11V+kZlfjA2ZB7Fyo6RukSLdQ64YQCbRoieQw+USIN59s3NUEKkUwhaBGjP6kd02uvWu8uacJa7HnNiQrbnUF/sDJ8HXgb0EEbOwzbv8kkY6XiKTBJtfa9QQ5BTQsQTPKmRUrNc8pOacJ9A1OquA7q1cs0uGuYCY6zwqwU8Iq9qKip0rpSkYlcTq+v+pbkTT6/hPhNUIs0L4GnbF0oLiWGDC+fGU9EwRnIygDKCmF6xWxKC8rAfIaWuQTv6sjXwXjY9171h59ed/aHm+vYQi/QS+QgD+2gffQBHaIRYtbYWljfrO+2b3+1f9g/16G2tdE8R5fM/vUPfLfjPg==

    Just like training mixture of Gaussians but in text space and conditioned on source

    r(i)z ·r✓ log p(y(i)|z, x(i); ✓)AAADz3icfVJba9swFHbiXVrv1svjXsRCqDPSEGeDjZVAYR0M1rEW1gtEqZFlxRGVL5OUNbbisdf9xD3vj0yOnaxNywTGn84533cuOl7CqJDd7u9a3bx3/8HDtXXr0eMnT59tbG6dinjCMTnBMYv5uYcEYTQiJ5JKRs4TTlDoMXLmXb4v/GffCRc0jr7KNCHDEAURHVGMpDa5m7VZE7I4AImdzqZ7UI6JRC0AAwJgiOTY88AHV32zs9m0ne7BZExb+WBJyNpLyhDsggNXQUmmUn06zHP7BgnOkuK6iG5Zzbk8Rgwc2pWxfy2jrZktKGhYCvpIojyHjIzkAIY0crPdu4qAnAZjOdTiWuBC2brYdlr+WzB3Fe07+UVoNbmblda5IuI8virb9YVy8kHWh4gHIZq6KtvJiyRl9CzbaVeyy7atf1rYjyWAEfIYcks3WBS54K/S3Y1Gt9OdH3AbOBVoGNU50u/1DvoxnoQkkpghIQZON5FDhbikmJHcghNBEoQvUUAGGkYoJGKo5nuSg6a2+GAUc/1FEsyt1xkKhUKkoacji2mIVV9hvNPni0JwJbscvR0qGiUTSSJcJh9NGJAxKBYR+JQTLFmqAcKc6voBHiOOsNTralnwgOgGOfmsk31JCEcy5i9V9TC5bjiA7QL9L5BGi0CNLEtP21md7W1w2us4rzq949eN/V419zXjufHCsA3HeGPsGx+NI+PEwLU/9fX6Vn3bPDavzB/mzzK0Xqs428aNY/76C1FsO58=

    r(i)z 1[z = argmaxz0

    p(y(i)|z0, x(i); ✓)]AAACnHicfZBraxNBFIYn662ul6b6UZDBIE2lhN1YqCJCwX4QSrGCSQvZdTmZPZsMnb0wc1aTrPuj/DXiN/0nziYRNBUPDDy85505c95xoaQhz/vecq5dv3Hz1tZt987de/e32zsPhiYvtcCByFWuL8ZgUMkMByRJ4UWhEdKxwvPx5Zumf/4JtZF59oHmBYYpTDKZSAFkpah9oqPFx6or9+pAYUKgdf6ZBynQNDaVX48WrwPQkxRmUbXYrXnRna/cXxa7+7MVvgpoigR7YdTueD1vWfwq+GvosHWdRTutl0GcizLFjIQCY0a+V1BYgSYpFNZuUBosQFzCBEcWM0jRhNVy65o/tUrMk1zbkxFfqn/eqCA1Zp6OrbPZx2z2GvGfvdg0D25Mp+RFWMmsKAkzsRqelIpTzptYeSw1ClJzCyC0tP/nYgoaBNnwXTc4RrugxlM77F2BGijXz6pltDKr7cKTYL+h/xlh9ttoybVh+5vRXoVhv+c/7/XfH3SO+uvYt9gj9oR1mc8O2RF7y87YgAn2lX1jP9hP57Fz7Jw4pyur01rfecj+Kmf4C67Yz8A=

  • !13

    Parameterizationlog p(y|x; ✓)

    AAACsHicfZFdb9MwFIbd8LWFrw4ud2NRTevQVCUFCRBCmgQX3CCGRLdKTYhOnJPUzIkj29naZvl5/Ah+A7dwj9N2EnSII1l69J7XOTmv41JwbTzve8e5cfPW7Ttb2+7de/cfPOzuPDrRslIMR0wKqcYxaBS8wJHhRuC4VAh5LPA0Pnvb9k/PUWkui89mXmKYQ1bwlDMwVoq60Z6KFl/qPj9oAoGpAaXkBQ1yMNNE134zWbwJQGU5zKJ6sd/Qsj9fuS8X+4ezFb4OzBQNHIRuIGTWWi5nV1rU7XkDb1n0Ovhr6JF1HUc7nVdBIlmVY2GYAK0nvleasAZlOBPYuEGlsQR2BhlOLBaQow7rZRIN3bNKQlOp7CkMXap/3qgh13qex9bZrqg3e634z16i2w9uTDfpy7DmRVkZLNhqeFoJaiRto6YJV8iMmFsAprj9f8qmoIAZ+yCuG7xDu6DCD3bYxxIVGKme1su0edHYhbPgsKX/GWF2ZbTk2rD9zWivw8lw4D8bDD897x0N17FvkV3yhPSJT16QI/KeHJMRYeQb+UF+kl/O0Bk7kQMrq9NZ33lM/irn628KD9dQ

    Before:

  • log p(y|z, x; ✓)AAACsnicfZFdb9MwFIbd8DXCx7pxyY1FNa1DVZWUSTAhpElwwQ1iSHSbaEJ06pyk1pw4sh1ok+X38Rv4EdzCLU7bSdAhjmTp0Xte5+S8nhaCa+N53zvOjZu3bt/Zuuveu//g4XZ3Z/dUy1IxHDMppDqfgkbBcxwbbgSeFwohmwo8m168bvtnX1BpLvOPZlFgmEGa84QzMFaKurCnoupz3ecHTSAwMaCU/EqDDMws1rXfTKpXAag0g3lUV/sNLfqLlfuy2h/MV/gyMDM0cBC6gZBpa7msBvMrNer2vKG3LHod/DX0yLpOop3OURBLVmaYGyZA64nvFSasQRnOBDZuUGosgF1AihOLOWSow3qZRUP3rBLTRCp7ckOX6p83asi0XmRT62yX1Ju9VvxnL9btBzemm+RFWPO8KA3mbDU8KQU1krZh05grZEYsLABT3P4/ZTNQwIx9EtcN3qBdUOE7O+x9gQqMVE/rZd48b+zCaTBo6X9GmF8ZLbk2bH8z2utwOhr6z4ajD4e949E69i3ymDwhfeKT5+SYvCUnZEwY+UZ+kJ/kl3PofHLAYSur01nfeUT+Kkf8Btp22Ao=

    !14

    Parameterization

    zAAACEXicbVDLSsNAFJ34rPFVdekmWASRUhIRdFnQhRuxBfuAJpTJ9CYdOpmEmYlQQ7/Apfox7sStX+C3uHHSZqGtBwYO59x7597jJ4xKZdtfxtLyyuraemnD3Nza3tkt7+23ZZwKAi0Ss1h0fSyBUQ4tRRWDbiIARz6Djj+6yv3OAwhJY36vxgl4EQ45DSjBSkvNx365YtfsKaxF4hSkggo0+uVvdxCTNAKuCMNS9hw7UV6GhaKEwcR0UwkJJiMcQk9TjiOQXjZddGIda2VgBbHQjytrqv7uyHAk5TjydWWE1VDOe7n4n9dLVXDpZZQnqQJOZh8FKbNUbOVXWwMqgCg21gQTQfWuFhligYnS2ZjuNehbBNzquXcJCKxicZq5WIQR5RN9W+hWc2bqtJz5bBZJ+6zm2DWneV6pV4vcSugQHaET5KALVEc3qIFaiCBAT+gFvRrPxpvxbnzMSpeMoucA/YHx+QM42p3EAAACEXicbVDLSsNAFJ34rPFVdekmWASRUhIRdFnQhRuxBfuAJpTJ9CYdOpmEmYlQQ7/Apfox7sStX+C3uHHSZqGtBwYO59x7597jJ4xKZdtfxtLyyuraemnD3Nza3tkt7+23ZZwKAi0Ss1h0fSyBUQ4tRRWDbiIARz6Djj+6yv3OAwhJY36vxgl4EQ45DSjBSkvNx365YtfsKaxF4hSkggo0+uVvdxCTNAKuCMNS9hw7UV6GhaKEwcR0UwkJJiMcQk9TjiOQXjZddGIda2VgBbHQjytrqv7uyHAk5TjydWWE1VDOe7n4n9dLVXDpZZQnqQJOZh8FKbNUbOVXWwMqgCg21gQTQfWuFhligYnS2ZjuNehbBNzquXcJCKxicZq5WIQR5RN9W+hWc2bqtJz5bBZJ+6zm2DWneV6pV4vcSugQHaET5KALVEc3qIFaiCBAT+gFvRrPxpvxbnzMSpeMoucA/YHx+QM42p3EAAACEXicbVDLSsNAFJ34rPFVdekmWASRUhIRdFnQhRuxBfuAJpTJ9CYdOpmEmYlQQ7/Apfox7sStX+C3uHHSZqGtBwYO59x7597jJ4xKZdtfxtLyyuraemnD3Nza3tkt7+23ZZwKAi0Ss1h0fSyBUQ4tRRWDbiIARz6Djj+6yv3OAwhJY36vxgl4EQ45DSjBSkvNx365YtfsKaxF4hSkggo0+uVvdxCTNAKuCMNS9hw7UV6GhaKEwcR0UwkJJiMcQk9TjiOQXjZddGIda2VgBbHQjytrqv7uyHAk5TjydWWE1VDOe7n4n9dLVXDpZZQnqQJOZh8FKbNUbOVXWwMqgCg21gQTQfWuFhligYnS2ZjuNehbBNzquXcJCKxicZq5WIQR5RN9W+hWc2bqtJz5bBZJ+6zm2DWneV6pV4vcSugQHaET5KALVEc3qIFaiCBAT+gFvRrPxpvxbnzMSpeMoucA/YHx+QM42p3EAAACEXicbVDLSsNAFJ34rPFVdekmWASRUhIRdFnQhRuxBfuAJpTJ9CYdOpmEmYlQQ7/Apfox7sStX+C3uHHSZqGtBwYO59x7597jJ4xKZdtfxtLyyuraemnD3Nza3tkt7+23ZZwKAi0Ss1h0fSyBUQ4tRRWDbiIARz6Djj+6yv3OAwhJY36vxgl4EQ45DSjBSkvNx365YtfsKaxF4hSkggo0+uVvdxCTNAKuCMNS9hw7UV6GhaKEwcR0UwkJJiMcQk9TjiOQXjZddGIda2VgBbHQjytrqv7uyHAk5TjydWWE1VDOe7n4n9dLVXDpZZQnqQJOZh8FKbNUbOVXWwMqgCg21gQTQfWuFhligYnS2ZjuNehbBNzquXcJCKxicZq5WIQR5RN9W+hWc2bqtJz5bBZJ+6zm2DWneV6pV4vcSugQHaET5KALVEc3qIFaiCBAT+gFvRrPxpvxbnzMSpeMoucA/YHx+QM42p3E

    After:

  • !15

    Testing

    Generate hypotheses by greedily decoding KAAACtXicfZFdi9NAFIan8WuNH9vVS28GS9mulJJUYRURFvRCEHEFu7vQxHA6OUmHnWTCzETbZPMH/Qf+C2/1yknbBe2KBwYe3vNOTs47s0JwbTzve8e5dv3GzVs7t907d+/d3+3uPTjRslQMJ0wKqc5moFHwHCeGG4FnhULIZgJPZ+ev2/7pF1Say/yTWRYYZpDmPOEMjJWibtxXUfW5HvCDJhCYGFBKfqVBBmYe69pvptWrAFSawSKqq/2GFoPl2n1R7Q8Xa3wZmDkaOAjdfiBk2nouquHiUnbfRd2eN/JWRa+Cv4Ee2dRxtNd5EcSSlRnmhgnQeup7hQlrUIYzgY0blBoLYOeQ4tRiDhnqsF7F0dC+VWKaSGVPbuhK/fNGDZnWy2xmne2eervXiv/sxbr94NZ0kzwPa54XpcGcrYcnpaBG0jZvGnOFzIilBWCK2/+nbA4KmLGv4rrBG7QLKnxvh30oUIGR6km9ipznjV04DYYt/c8Ii0ujJdeG7W9HexVOxiP/6Wj88VnvaLyJfYc8Io/JgPjkkByRt+SYTAgj38gP8pP8cg6d0ImdZG11Ops7D8lf5cjfnXDYog== p(y|z, x; ✓), z = 1, · · · ,KAAACB3icbVDJSgNBEO1xjXEb9ShIYxAiDGEmCgoiBLwIXiKYBZIQejo9SZOehe4aMRnjyYu/4sWDIl79BW/+jZ3loIkPCh7vVVFVz40EV2Db38bc/MLi0nJqJb26tr6xaW5tl1UYS8pKNBShrLpEMcEDVgIOglUjyYjvClZxuxdDv3LLpOJhcAO9iDV80g64xykBLTXNvSjbu+9bd2d16DAgh9YD7p87Vp22QlDWVdPM2Dl7BDxLnAnJoAmKTfOr3gpp7LMAqCBK1Rw7gkZCJHAq2CBdjxWLCO2SNqtpGhCfqUYy+mOAD7TSwl4odQWAR+rviYT4SvV8V3f6BDpq2huK/3m1GLzTRsKDKAYW0PEiLxYYQjwMBbe4ZBRETxNCJde3YtohklDQ0aV1CM70y7OknM85R7n89XGmkJ/EkUK7aB9lkYNOUAFdoiIqIYoe0TN6RW/Gk/FivBsf49Y5YzKzg/7A+PwBEjeYGg==

    Solely depend on the latent variable to produce different hypotheses

    Computationally efficient and parallelizable

    No heuristic diverse decoding methods

  • !16

    Try It Out

    参与投票的成员中,58% 反对该合同交易易。Source

    Hypotheses z = 1AAADXHicfVJdaxQxFM3sVq1Tq62CL4IEl4GtTJeZKijKQkEFYRUruG1hsw6ZbHY2NPNBckd2dpxHf42v+mN88beY2Y617UovBE7OPcm99yRhJoUGz/tltdpr167fWL9pb9zavH1na/vuoU5zxfiQpTJVxyHVXIqED0GA5MeZ4jQOJT8KT17V+aMvXGmRJp+gyPg4plEipoJRMFSwbT10sm7xdf6SwIwD3ekTncdBuej71ecBzrqLf6lat3DPtrZjRC5hkxS0O7CdwnZITGHGqMTvumfX1VQY4jdB2Z27xQ7RIibA51BOKNCqIpJPYbRLZBotKy+urEmUiGYwNpVmFHARQJ9QFcV0HhS4ljZ06b+AXb9yL3S70v6g5s7VWpWYAYOtjtfzloFXgd+ADmriwDjqkEnK8pgnwCTVeuR7GYxLqkAwySub5JpnlJ3QiI8MTGjM9bhcvmSFHcNM8DRVZiWAl+z5EyWNtS7i0ChrY/XlXE3+LzfKYfp8XIoky4En7LTQNJcYUlx/CzwRijOQhQGUKWF6xWxGFWVgPo9NXnMzi+Lvzb0fMq4opOpx2Xhfmdki4tboKqFI/goNsm1jrH/ZxlVwuNfzn/T2Pj7t7LuNxevoAXqEushHz9A+eosO0BAx65v13fph/Wz9bq+1N9qbp9KW1Zy5hy5E+/4f2j0OpA==

    z = 2AAADXHicfVJdaxNBFJ1Nqtat1baCL4IMhoVUtiEbhYoSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8YXf4uz6VrbRnph4cy5Z+bcOTtBKoWGbveX1Wiu3bh5a/22vXFn8+69re2dI51kivEhS2SiTgKquRQxH4IAyU9SxWkUSH4cnL6q+sdfuNIiiT9BnvJxRMNYTAWjYCh/23rkpO386/wlgRkHutsnOov8YtH3ys8DnLYX/1qVbuGeL23HiFzCJglod2A7ue2QiMKMUYnftc+Pq6ggwG/8oj13812iRUSAz6GYUKBlSSSfwmiPyCRcOi+u9SRKhDMYG6cZBZz70CdUhRGd+zmupDVdeC9gzyvdS9OujD+ouAteq5JFv+dvtbqd7rLwKvBq0EJ1HZpEHTJJWBbxGJikWo+8bgrjgioQTPLSJpnmKWWnNOQjA2MacT0uln+yxI5hJniaKPPFgJfsxR0FjbTOo8Aoq2D11V5F/q83ymD6fFyIOM2Ax+zMaJpJDAmungWeCMUZyNwAypQws2I2o4oyMI/HJq+5uYvi7825H1KuKCTqSVFnX5q7hcSt0HVCEf8VGmTbJljvaoyr4KjX8Z52eh+ftQ7cOuJ19BA9Rm3koX10gN6iQzREzPpmfbd+WD8bv5trzY3m5pm0YdV77qNL1XzwB9wVDqU=

    z = 3AAADXHicfVJdaxNBFJ1Nqtatta2CL4IMhoVUtiHbCooSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8aX/pbOpmttG+mFhTPnnplz5+wEqRQaut0/VqO5cuv2ndW79tq99fsbm1sPDnWSKcYHLJGJOg6o5lLEfAACJD9OFadRIPlRcPKm6h9940qLJP4CecpHEQ1jMRGMgqH8LeuJk7bz77PXBKYc6HaP6Czyi3nPK7/2cdqe/2tVurl7sbQdI3IJGyeg3b7t5LZDIgpTRiX+0L44rqKCAL/zi/bMzbeJFhEBPoNiTIGWJZF8AsMdIpNw4Ty/0ZMoEU5hZJymFHDuQ49QFUZ05ue4ktZ04b2CHa90r0y7NH6/4i55LUvmvT1/s9XtdBeFl4FXgxaq68Ak6pBxwrKIx8Ak1XrodVMYFVSBYJKXNsk0Tyk7oSEfGhjTiOtRsfiTJXYMM8aTRJkvBrxgL+8oaKR1HgVGWQWrr/cq8n+9YQaTl6NCxGkGPGbnRpNMYkhw9SzwWCjOQOYGUKaEmRWzKVWUgXk8NnnLzV0U/2jO/ZRyRSFRz4o6+9LcLSRuhW4Siviv0CDbNsF612NcBoe7HW+vs/v5eWvfrSNeRY/RU9RGHnqB9tF7dIAGiFk/rJ/WL+t347S50lxrrp9LG1a95yG6Us1HZ93tDqY=

    p(y|z, x; ✓) ! p(y|x; ✓)AAAC7XicfVFda9swFJXdfXTeV9o97kUshKbDBNvZSMMoFLaHQRnrYGkLkWcUWXFEZctI8pbY9c/YW9nrftP+zeR8QBe2XRAczj3S0T13knOmtOf9suydO3fv3d994Dx89PjJ09be/rkShSR0RAQX8nKCFeUsoyPNNKeXuaQ4nXB6Mbl62/QvvlKpmMg+60VOwxQnGZsygrWhotZNR0bll6rLDmvE6VRjKcU3iFKsZ7Gq/HpcHiMskxTPo6o8qGHeXazU1+WBO1/BN0jPqMaHodNBXCSN5rp05xva6Zw6WxSSLJmtvZrWLW157LuIxEIr9zRqtb3eYOj1+6+h1/OCo8Dzl2DQHwbQ73nLaoN1nUV71hDFghQpzTThWKmx7+U6rLDUjHBaO6hQNMfkCid0bGCGU6rCahljDTuGieFUSHMyDZfs7RsVTpVapBOjbPJR272G/GsvVs2DW+56ehRWLMsLTTOyMp8WHGoBmz3BmElKNF8YgIlk5v+QzLDERJttOg56R82Akn4wZh9zKrEW8mW1XBXLajNwgtwG/U+I5xuhQY4Je5Mo/Dc4D3p+vxd8etU+Cdax74Ln4AXoAh8MwAl4D87ACBDLtrqWbwW2sL/bN/aPldS21neegT/K/vkbPenqmA==

    The latent variable is ignored (as in VAE)

    Sharing too many parameters that does not differentiate?

    —use independently parameterized decoders

    p(y|z, x; ✓)AAADhnicfVJbb9MwFHYWLiXcNnjkxaKK1KGsajrQEKhSJUCaVBBDotukukSO66bWcpN9Ak1DfhO/hgde4K/gtGGs67QjWfr8ne/43OynoVDQ6fwytswbN2/dbtyx7t67/+Dh9s6jY5VkkvEhS8JEnvpU8VDEfAgCQn6aSk4jP+Qn/tmbyn/ylUslkvgz5CkfRzSIxVQwCprydoxDO23l3+evCcw40N0eUVnkFYueW34Z4LS1+O+qdAvn/GrZWuQQNklAOQPLzi2bRBRmjIb4fev8uYryffzOK1pzJ98lSkQE+ByKCQValiTkUxjtkTAJlpkX1+YkUgQzGOtMMwo496BHqAwiOvdyXElrunBfwZ5bOmvVbpQ/qLgLuawrGtzfiFuVQKVMvuG1yXnbzU67szS8CdwaNFFtR3r2NpkkLIt4DCykSo3cTgrjgkoQLOSlRTLFU8rOaMBHGsY04mpcLHdeYlszEzxNpD4x4CV7MaKgkVJ55GtltQJ12VeRV/lGGUxfjgsRpxnwmK0STbMQQ4KrD4QnQnIGYa4BZVLoWjGbUUkZ6G9mkbdc9yL5B/3ux5RLCol8VtRbKnVvAXEqdJ1QxP+EGlmWHqx7eYyb4Ljbdvfb3U/Pm/1uPeIGeoKeohZy0QHqo0N0hIaIGT+Mn8Zv44/ZMNvmC/NgJd0y6pjHaM3M/l/o5R6h

    Fifty-eight per cent of those voting opposed the contract deal .
Fifty-eight per cent of those voting opposed the contract deal .
Fifty-eight per cent of those voting opposed the contract deal .

  • “Rich gets richer”—once a component is better than others, it receives more gradients while others starve and eventually die (Teh, 2010)

    !17

    Try Again with Independent Decoders

    参与投票的成员中,58% 反对该合同交易易。Source

    Hypotheses z = 1AAADXHicfVJdaxQxFM3sVq1Tq62CL4IEl4GtTJeZKijKQkEFYRUruG1hsw6ZbHY2NPNBckd2dpxHf42v+mN88beY2Y617UovBE7OPcm99yRhJoUGz/tltdpr167fWL9pb9zavH1na/vuoU5zxfiQpTJVxyHVXIqED0GA5MeZ4jQOJT8KT17V+aMvXGmRJp+gyPg4plEipoJRMFSwbT10sm7xdf6SwIwD3ekTncdBuej71ecBzrqLf6lat3DPtrZjRC5hkxS0O7CdwnZITGHGqMTvumfX1VQY4jdB2Z27xQ7RIibA51BOKNCqIpJPYbRLZBotKy+urEmUiGYwNpVmFHARQJ9QFcV0HhS4ljZ06b+AXb9yL3S70v6g5s7VWpWYAYOtjtfzloFXgd+ADmriwDjqkEnK8pgnwCTVeuR7GYxLqkAwySub5JpnlJ3QiI8MTGjM9bhcvmSFHcNM8DRVZiWAl+z5EyWNtS7i0ChrY/XlXE3+LzfKYfp8XIoky4En7LTQNJcYUlx/CzwRijOQhQGUKWF6xWxGFWVgPo9NXnMzi+Lvzb0fMq4opOpx2Xhfmdki4tboKqFI/goNsm1jrH/ZxlVwuNfzn/T2Pj7t7LuNxevoAXqEushHz9A+eosO0BAx65v13fph/Wz9bq+1N9qbp9KW1Zy5hy5E+/4f2j0OpA==

    z = 2AAADXHicfVJdaxNBFJ1Nqtat1baCL4IMhoVUtiEbhYoSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8YXf4uz6VrbRnph4cy5Z+bcOTtBKoWGbveX1Wiu3bh5a/22vXFn8+69re2dI51kivEhS2SiTgKquRQxH4IAyU9SxWkUSH4cnL6q+sdfuNIiiT9BnvJxRMNYTAWjYCh/23rkpO386/wlgRkHutsnOov8YtH3ys8DnLYX/1qVbuGeL23HiFzCJglod2A7ue2QiMKMUYnftc+Pq6ggwG/8oj13812iRUSAz6GYUKBlSSSfwmiPyCRcOi+u9SRKhDMYG6cZBZz70CdUhRGd+zmupDVdeC9gzyvdS9OujD+ouAteq5JFv+dvtbqd7rLwKvBq0EJ1HZpEHTJJWBbxGJikWo+8bgrjgioQTPLSJpnmKWWnNOQjA2MacT0uln+yxI5hJniaKPPFgJfsxR0FjbTOo8Aoq2D11V5F/q83ymD6fFyIOM2Ax+zMaJpJDAmungWeCMUZyNwAypQws2I2o4oyMI/HJq+5uYvi7825H1KuKCTqSVFnX5q7hcSt0HVCEf8VGmTbJljvaoyr4KjX8Z52eh+ftQ7cOuJ19BA9Rm3koX10gN6iQzREzPpmfbd+WD8bv5trzY3m5pm0YdV77qNL1XzwB9wVDqU=

    z = 3AAADXHicfVJdaxNBFJ1Nqtatta2CL4IMhoVUtiHbCooSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8aX/pbOpmttG+mFhTPnnplz5+wEqRQaut0/VqO5cuv2ndW79tq99fsbm1sPDnWSKcYHLJGJOg6o5lLEfAACJD9OFadRIPlRcPKm6h9940qLJP4CecpHEQ1jMRGMgqH8LeuJk7bz77PXBKYc6HaP6Czyi3nPK7/2cdqe/2tVurl7sbQdI3IJGyeg3b7t5LZDIgpTRiX+0L44rqKCAL/zi/bMzbeJFhEBPoNiTIGWJZF8AsMdIpNw4Ty/0ZMoEU5hZJymFHDuQ49QFUZ05ue4ktZ04b2CHa90r0y7NH6/4i55LUvmvT1/s9XtdBeFl4FXgxaq68Ak6pBxwrKIx8Ak1XrodVMYFVSBYJKXNsk0Tyk7oSEfGhjTiOtRsfiTJXYMM8aTRJkvBrxgL+8oaKR1HgVGWQWrr/cq8n+9YQaTl6NCxGkGPGbnRpNMYkhw9SzwWCjOQOYGUKaEmRWzKVWUgXk8NnnLzV0U/2jO/ZRyRSFRz4o6+9LcLSRuhW4Siviv0CDbNsF612NcBoe7HW+vs/v5eWvfrSNeRY/RU9RGHnqB9tF7dIAGiFk/rJ/WL+t347S50lxrrp9LG1a95yG6Us1HZ93tDqY=

    Only one component gets trained is poor except for onep(y|z, x; ✓)AAAC1HicfVFda9swFJW9j3beR9PtZWMvYiE0HcbYzkYaRqGwPQzKWAdLWoiyoMhKIipbRpK3xI6fyl73//YT9i8mOwls3ccFweHcc+/VPXeScqa073+37Bs3b93e2b3j3L13/8FeY//hQIlMEtonggt5McGKcpbQvmaa04tUUhxPOD2fXL6u8uefqVRMJB/1MqWjGM8SNmUEa0ONG1ctOc4/FW12WCJOpxpLKb5AFGM9j1QRlMP8GGE5i/FiXOQHJUzby7V6lR+4izV8hfScanw4clqIi1mlWeXuYks7rVPnDyo/DlxEIqGVezpuNH2v2/M7nZfQ9/zwKPSDGnQ7vRAGnl9HE2zibLxv9VAkSBbTRBOOlRoGfqpHBZaaEU5LB2WKpphc4hkdGpjgmKpRUbtVwpZhIjgV0rxEw5r9taLAsVLLeGKUlQ3qeq4i/5qLVNXw2nQ9PRoVLEkzTROyHj7NONQCVueAEZOUaL40ABPJzP8hmWOJiTZHcxz0hpoFJX1nhr1PqcRayOdFfRGWlGbhGXIr9D8hXmyFBjnG7K2j8N9gEHpBxws/vGiehBvbd8FT8Ay0QQC64AS8BWegDwj4Ye1Zj60n9sBe2Vf217XUtjY1j8BvYX/7CR4G4P0=

    zAAAC8HicfVFdb9MwFHUCgxG+OnjkxaKK1kFVJemmrkKTJsED0oQYEt0m1aFyHTe15sSR7WxtsvwP3iZe+Uf8G5x+CKiAK1k6OvfYx/fcccaZ0p73w7Lv3N26d3/7gfPw0eMnTxs7z86UyCWhAyK4kBdjrChnKR1opjm9yCTFyZjT8/Hl27p/fkWlYiL9rOcZDRMcp2zCCNaGGjVuXTkqvpQttlchTicaSymuIUqwnkaq9KthcYSwjBM8G5XFbgWz1nypvil227MlfIP0lGq8Fzou4iKuNTdFe7amHffEcTc4JFk8XZnVrV/iwnGLI7+NSCS0ap+MGk2v0+t73e4B9DpecBh4/gL0uv0A+h1vUU2wqtPRjtVHkSB5QlNNOFZq6HuZDkssNSOcVg7KFc0wucQxHRqY4oSqsFwkWUHXMBGcCGlOquGC/f1GiROl5snYKOuI1GavJv/ai1T94Ia7nhyGJUuzXNOULM0nOYdawHpVMGKSEs3nBmAimfk/JFMsMdFmoY6D3lEzoKQfjNnHjEqshXxVLrbF0soMHKN2jf4nxLO10CDHhL1OFP4bnAUdv9sJPu03j4NV7NvgBXgJWsAHPXAM3oNTMADE2rJeW/vWgS3tr/at/W0pta3Vnefgj7K//wSQN+tf

    Fifty-eight per cent of those voting opposed the contract deal .
. 
.

  • !18

    Mixture Models Are Prone to Degeneracies

    D1: all components behave the same, the latent variable is ignored

    D2: only one component gets trained, other components are poor

    Turns out how to train mixture models is not obvious…

    Let’s take a closer look

  • !19

    EM Training

    E-step (hard): estimate the responsibility of each component

    M-step: update through each component with gradients✓AAAEB3icnVPNa9RAFJ8mftT4tdWjCIPLsiksS1IFBVkoeBG8VHDbwmYbJrOT3aGTD2Ym2iSbnrz4r3jxoIhX/wVv/jdONqndpK0HH4R58/u993tvHnlezKiQlvV7Q9OvXb9xc/OWcfvO3Xv3O1sP9kWUcEzGOGIRP/SQIIyGZCypZOQw5gQFHiMH3vGrkj94T7igUfhOpjGZBmgeUp9iJBXkbmmPe5nRi81seTJIXzpyQSTaHjk+RzhfoWdYbKbLbPD3WuSOSAI3z/oFVHH9VmB/LbKUT8/5UZ04soujNyo3HaxVgSPYoq9uweiZqmd1OCHyGHIrGDosmpeyR7lJt4vlSXU2q2fwdKVccYO0GXPq4FkkG6pN0UHWlr2qifazzwqfz/rSMusjMXrczapqDiO+RJxHH+A/2i/nUjOwpv5DpDrdTtcaWiuDFx27drqgtj2388uZRTgJSCgxQ0JMbCuW0xxxSTEjheEkgsQIH6M5mSg3RAER03z1Hxewp5AZ9COuvlDCFbqekaNAiDTwVGSA5EK0uRK8jJsk0n8xzWkYJ5KEuCrkJwzKCJZLAWeUEyxZqhyEOVW9QrxAagekWh1DDcFuP/mis78ztJ8Od94+6+5a9Tg2wSPwBJjABs/BLngN9sAYYO2j9ln7qn3TP+lf9O/6jypU26hzHoKG6T//ABWuTt8=

    r(i)z ·r✓ log p(y(i)|z, x(i); ✓)AAADz3icfVJba9swFHbiXVrv1svjXsRCqDPSEGeDjZVAYR0M1rEW1gtEqZFlxRGVL5OUNbbisdf9xD3vj0yOnaxNywTGn84533cuOl7CqJDd7u9a3bx3/8HDtXXr0eMnT59tbG6dinjCMTnBMYv5uYcEYTQiJ5JKRs4TTlDoMXLmXb4v/GffCRc0jr7KNCHDEAURHVGMpDa5m7VZE7I4AImdzqZ7UI6JRC0AAwJgiOTY88AHV32zs9m0ne7BZExb+WBJyNpLyhDsggNXQUmmUn06zHP7BgnOkuK6iG5Zzbk8Rgwc2pWxfy2jrZktKGhYCvpIojyHjIzkAIY0crPdu4qAnAZjOdTiWuBC2brYdlr+WzB3Fe07+UVoNbmblda5IuI8virb9YVy8kHWh4gHIZq6KtvJiyRl9CzbaVeyy7atf1rYjyWAEfIYcks3WBS54K/S3Y1Gt9OdH3AbOBVoGNU50u/1DvoxnoQkkpghIQZON5FDhbikmJHcghNBEoQvUUAGGkYoJGKo5nuSg6a2+GAUc/1FEsyt1xkKhUKkoacji2mIVV9hvNPni0JwJbscvR0qGiUTSSJcJh9NGJAxKBYR+JQTLFmqAcKc6voBHiOOsNTralnwgOgGOfmsk31JCEcy5i9V9TC5bjiA7QL9L5BGi0CNLEtP21md7W1w2us4rzq949eN/V419zXjufHCsA3HeGPsGx+NI+PEwLU/9fX6Vn3bPDavzB/mzzK0Xqs428aNY/76C1FsO58=

    r(i)z 1[z = argmaxz0

    p(y(i)|z0, x(i); ✓)]AAACnHicfZBraxNBFIYn662ul6b6UZDBIE2lhN1YqCJCwX4QSrGCSQvZdTmZPZsMnb0wc1aTrPuj/DXiN/0nziYRNBUPDDy85505c95xoaQhz/vecq5dv3Hz1tZt987de/e32zsPhiYvtcCByFWuL8ZgUMkMByRJ4UWhEdKxwvPx5Zumf/4JtZF59oHmBYYpTDKZSAFkpah9oqPFx6or9+pAYUKgdf6ZBynQNDaVX48WrwPQkxRmUbXYrXnRna/cXxa7+7MVvgpoigR7YdTueD1vWfwq+GvosHWdRTutl0GcizLFjIQCY0a+V1BYgSYpFNZuUBosQFzCBEcWM0jRhNVy65o/tUrMk1zbkxFfqn/eqCA1Zp6OrbPZx2z2GvGfvdg0D25Mp+RFWMmsKAkzsRqelIpTzptYeSw1ClJzCyC0tP/nYgoaBNnwXTc4RrugxlM77F2BGijXz6pltDKr7cKTYL+h/xlh9ttoybVh+5vRXoVhv+c/7/XfH3SO+uvYt9gj9oR1mc8O2RF7y87YgAn2lX1jP9hP57Fz7Jw4pyur01rfecj+Kmf4C67Yz8A=

    Shared params, latent variable is ignored

  • !20

    Effect of Dropout

    Dropout noise here can confuse latent variable assignments

    0.0 0.1 0.2 0.3 0.4 0.5dropout probability

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    resp

    onsibi

    lity

    flip

    rate

    for

    hMup

    0.0 0.1 0.2 0.3 0.4 0.5dropout probability

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    resp

    onsibi

    lity

    flip

    rate

    for

    hMup

    E-step (hard): estimate the responsibility of each component

    M-step: update through each component with gradients✓AAAEB3icnVPNa9RAFJ8mftT4tdWjCIPLsiksS1IFBVkoeBG8VHDbwmYbJrOT3aGTD2Ym2iSbnrz4r3jxoIhX/wVv/jdONqndpK0HH4R58/u993tvHnlezKiQlvV7Q9OvXb9xc/OWcfvO3Xv3O1sP9kWUcEzGOGIRP/SQIIyGZCypZOQw5gQFHiMH3vGrkj94T7igUfhOpjGZBmgeUp9iJBXkbmmPe5nRi81seTJIXzpyQSTaHjk+RzhfoWdYbKbLbPD3WuSOSAI3z/oFVHH9VmB/LbKUT8/5UZ04soujNyo3HaxVgSPYoq9uweiZqmd1OCHyGHIrGDosmpeyR7lJt4vlSXU2q2fwdKVccYO0GXPq4FkkG6pN0UHWlr2qifazzwqfz/rSMusjMXrczapqDiO+RJxHH+A/2i/nUjOwpv5DpDrdTtcaWiuDFx27drqgtj2388uZRTgJSCgxQ0JMbCuW0xxxSTEjheEkgsQIH6M5mSg3RAER03z1Hxewp5AZ9COuvlDCFbqekaNAiDTwVGSA5EK0uRK8jJsk0n8xzWkYJ5KEuCrkJwzKCJZLAWeUEyxZqhyEOVW9QrxAagekWh1DDcFuP/mis78ztJ8Od94+6+5a9Tg2wSPwBJjABs/BLngN9sAYYO2j9ln7qn3TP+lf9O/6jypU26hzHoKG6T//ABWuTt8=

    r(i)z ·r✓ log p(y(i)|z, x(i); ✓)AAADz3icfVJba9swFHbiXVrv1svjXsRCqDPSEGeDjZVAYR0M1rEW1gtEqZFlxRGVL5OUNbbisdf9xD3vj0yOnaxNywTGn84533cuOl7CqJDd7u9a3bx3/8HDtXXr0eMnT59tbG6dinjCMTnBMYv5uYcEYTQiJ5JKRs4TTlDoMXLmXb4v/GffCRc0jr7KNCHDEAURHVGMpDa5m7VZE7I4AImdzqZ7UI6JRC0AAwJgiOTY88AHV32zs9m0ne7BZExb+WBJyNpLyhDsggNXQUmmUn06zHP7BgnOkuK6iG5Zzbk8Rgwc2pWxfy2jrZktKGhYCvpIojyHjIzkAIY0crPdu4qAnAZjOdTiWuBC2brYdlr+WzB3Fe07+UVoNbmblda5IuI8virb9YVy8kHWh4gHIZq6KtvJiyRl9CzbaVeyy7atf1rYjyWAEfIYcks3WBS54K/S3Y1Gt9OdH3AbOBVoGNU50u/1DvoxnoQkkpghIQZON5FDhbikmJHcghNBEoQvUUAGGkYoJGKo5nuSg6a2+GAUc/1FEsyt1xkKhUKkoacji2mIVV9hvNPni0JwJbscvR0qGiUTSSJcJh9NGJAxKBYR+JQTLFmqAcKc6voBHiOOsNTralnwgOgGOfmsk31JCEcy5i9V9TC5bjiA7QL9L5BGi0CNLEtP21md7W1w2us4rzq949eN/V419zXjufHCsA3HeGPsGx+NI+PEwLU/9fX6Vn3bPDavzB/mzzK0Xqs428aNY/76C1FsO58=

    r(i)z 1[z = argmaxz0

    p(y(i)|z0, x(i); ✓)]AAACnHicfZBraxNBFIYn662ul6b6UZDBIE2lhN1YqCJCwX4QSrGCSQvZdTmZPZsMnb0wc1aTrPuj/DXiN/0nziYRNBUPDDy85505c95xoaQhz/vecq5dv3Hz1tZt987de/e32zsPhiYvtcCByFWuL8ZgUMkMByRJ4UWhEdKxwvPx5Zumf/4JtZF59oHmBYYpTDKZSAFkpah9oqPFx6or9+pAYUKgdf6ZBynQNDaVX48WrwPQkxRmUbXYrXnRna/cXxa7+7MVvgpoigR7YdTueD1vWfwq+GvosHWdRTutl0GcizLFjIQCY0a+V1BYgSYpFNZuUBosQFzCBEcWM0jRhNVy65o/tUrMk1zbkxFfqn/eqCA1Zp6OrbPZx2z2GvGfvdg0D25Mp+RFWMmsKAkzsRqelIpTzptYeSw1ClJzCyC0tP/nYgoaBNnwXTc4RrugxlM77F2BGijXz6pltDKr7cKTYL+h/xlh9ttoybVh+5vRXoVhv+c/7/XfH3SO+uvYt9gj9oR1mc8O2RF7y87YgAn2lX1jP9hP57Fz7Jw4pyur01rfecj+Kmf4C67Yz8A=

    Shared params, latent variable is ignored

  • !21

    Fix Dropout

    E-step (hard): estimate the responsibility of each component

    M-step: update through each component with gradients✓AAAEB3icnVPNa9RAFJ8mftT4tdWjCIPLsiksS1IFBVkoeBG8VHDbwmYbJrOT3aGTD2Ym2iSbnrz4r3jxoIhX/wVv/jdONqndpK0HH4R58/u993tvHnlezKiQlvV7Q9OvXb9xc/OWcfvO3Xv3O1sP9kWUcEzGOGIRP/SQIIyGZCypZOQw5gQFHiMH3vGrkj94T7igUfhOpjGZBmgeUp9iJBXkbmmPe5nRi81seTJIXzpyQSTaHjk+RzhfoWdYbKbLbPD3WuSOSAI3z/oFVHH9VmB/LbKUT8/5UZ04soujNyo3HaxVgSPYoq9uweiZqmd1OCHyGHIrGDosmpeyR7lJt4vlSXU2q2fwdKVccYO0GXPq4FkkG6pN0UHWlr2qifazzwqfz/rSMusjMXrczapqDiO+RJxHH+A/2i/nUjOwpv5DpDrdTtcaWiuDFx27drqgtj2388uZRTgJSCgxQ0JMbCuW0xxxSTEjheEkgsQIH6M5mSg3RAER03z1Hxewp5AZ9COuvlDCFbqekaNAiDTwVGSA5EK0uRK8jJsk0n8xzWkYJ5KEuCrkJwzKCJZLAWeUEyxZqhyEOVW9QrxAagekWh1DDcFuP/mis78ztJ8Od94+6+5a9Tg2wSPwBJjABs/BLngN9sAYYO2j9ln7qn3TP+lf9O/6jypU26hzHoKG6T//ABWuTt8=

    r(i)z ·r✓ log p(y(i)|z, x(i); ✓)AAADz3icfVJba9swFHbiXVrv1svjXsRCqDPSEGeDjZVAYR0M1rEW1gtEqZFlxRGVL5OUNbbisdf9xD3vj0yOnaxNywTGn84533cuOl7CqJDd7u9a3bx3/8HDtXXr0eMnT59tbG6dinjCMTnBMYv5uYcEYTQiJ5JKRs4TTlDoMXLmXb4v/GffCRc0jr7KNCHDEAURHVGMpDa5m7VZE7I4AImdzqZ7UI6JRC0AAwJgiOTY88AHV32zs9m0ne7BZExb+WBJyNpLyhDsggNXQUmmUn06zHP7BgnOkuK6iG5Zzbk8Rgwc2pWxfy2jrZktKGhYCvpIojyHjIzkAIY0crPdu4qAnAZjOdTiWuBC2brYdlr+WzB3Fe07+UVoNbmblda5IuI8virb9YVy8kHWh4gHIZq6KtvJiyRl9CzbaVeyy7atf1rYjyWAEfIYcks3WBS54K/S3Y1Gt9OdH3AbOBVoGNU50u/1DvoxnoQkkpghIQZON5FDhbikmJHcghNBEoQvUUAGGkYoJGKo5nuSg6a2+GAUc/1FEsyt1xkKhUKkoacji2mIVV9hvNPni0JwJbscvR0qGiUTSSJcJh9NGJAxKBYR+JQTLFmqAcKc6voBHiOOsNTralnwgOgGOfmsk31JCEcy5i9V9TC5bjiA7QL9L5BGi0CNLEtP21md7W1w2us4rzq949eN/V419zXjufHCsA3HeGPsGx+NI+PEwLU/9fX6Vn3bPDavzB/mzzK0Xqs428aNY/76C1FsO58=

    r(i)z 1[z = argmaxz0

    p(y(i)|z0, x(i); ✓)]AAACnHicfZBraxNBFIYn662ul6b6UZDBIE2lhN1YqCJCwX4QSrGCSQvZdTmZPZsMnb0wc1aTrPuj/DXiN/0nziYRNBUPDDy85505c95xoaQhz/vecq5dv3Hz1tZt987de/e32zsPhiYvtcCByFWuL8ZgUMkMByRJ4UWhEdKxwvPx5Zumf/4JtZF59oHmBYYpTDKZSAFkpah9oqPFx6or9+pAYUKgdf6ZBynQNDaVX48WrwPQkxRmUbXYrXnRna/cXxa7+7MVvgpoigR7YdTueD1vWfwq+GvosHWdRTutl0GcizLFjIQCY0a+V1BYgSYpFNZuUBosQFzCBEcWM0jRhNVy65o/tUrMk1zbkxFfqn/eqCA1Zp6OrbPZx2z2GvGfvdg0D25Mp+RFWMmsKAkzsRqelIpTzptYeSw1ClJzCyC0tP/nYgoaBNnwXTc4RrugxlM77F2BGijXz6pltDKr7cKTYL+h/xlh9ttoybVh+5vRXoVhv+c/7/XfH3SO+uvYt9gj9oR1mc8O2RF7y87YgAn2lX1jP9hP57Fz7Jw4pyur01rfecj+Kmf4C67Yz8A=

    Shared params, latent variable is ignored

    no dropout

    dropout

  • !22

    Try Our Modified Dropout Strategy

    参与投票的成员中,58% 反对该合同交易易。Source

    Hypotheses z = 1AAADXHicfVJdaxQxFM3sVq1Tq62CL4IEl4GtTJeZKijKQkEFYRUruG1hsw6ZbHY2NPNBckd2dpxHf42v+mN88beY2Y617UovBE7OPcm99yRhJoUGz/tltdpr167fWL9pb9zavH1na/vuoU5zxfiQpTJVxyHVXIqED0GA5MeZ4jQOJT8KT17V+aMvXGmRJp+gyPg4plEipoJRMFSwbT10sm7xdf6SwIwD3ekTncdBuej71ecBzrqLf6lat3DPtrZjRC5hkxS0O7CdwnZITGHGqMTvumfX1VQY4jdB2Z27xQ7RIibA51BOKNCqIpJPYbRLZBotKy+urEmUiGYwNpVmFHARQJ9QFcV0HhS4ljZ06b+AXb9yL3S70v6g5s7VWpWYAYOtjtfzloFXgd+ADmriwDjqkEnK8pgnwCTVeuR7GYxLqkAwySub5JpnlJ3QiI8MTGjM9bhcvmSFHcNM8DRVZiWAl+z5EyWNtS7i0ChrY/XlXE3+LzfKYfp8XIoky4En7LTQNJcYUlx/CzwRijOQhQGUKWF6xWxGFWVgPo9NXnMzi+Lvzb0fMq4opOpx2Xhfmdki4tboKqFI/goNsm1jrH/ZxlVwuNfzn/T2Pj7t7LuNxevoAXqEushHz9A+eosO0BAx65v13fph/Wz9bq+1N9qbp9KW1Zy5hy5E+/4f2j0OpA==

    z = 2AAADXHicfVJdaxNBFJ1Nqtat1baCL4IMhoVUtiEbhYoSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8YXf4uz6VrbRnph4cy5Z+bcOTtBKoWGbveX1Wiu3bh5a/22vXFn8+69re2dI51kivEhS2SiTgKquRQxH4IAyU9SxWkUSH4cnL6q+sdfuNIiiT9BnvJxRMNYTAWjYCh/23rkpO386/wlgRkHutsnOov8YtH3ys8DnLYX/1qVbuGeL23HiFzCJglod2A7ue2QiMKMUYnftc+Pq6ggwG/8oj13812iRUSAz6GYUKBlSSSfwmiPyCRcOi+u9SRKhDMYG6cZBZz70CdUhRGd+zmupDVdeC9gzyvdS9OujD+ouAteq5JFv+dvtbqd7rLwKvBq0EJ1HZpEHTJJWBbxGJikWo+8bgrjgioQTPLSJpnmKWWnNOQjA2MacT0uln+yxI5hJniaKPPFgJfsxR0FjbTOo8Aoq2D11V5F/q83ymD6fFyIOM2Ax+zMaJpJDAmungWeCMUZyNwAypQws2I2o4oyMI/HJq+5uYvi7825H1KuKCTqSVFnX5q7hcSt0HVCEf8VGmTbJljvaoyr4KjX8Z52eh+ftQ7cOuJ19BA9Rm3koX10gN6iQzREzPpmfbd+WD8bv5trzY3m5pm0YdV77qNL1XzwB9wVDqU=

    z = 3AAADXHicfVJdaxNBFJ1Nqtatta2CL4IMhoVUtiHbCooSKKggRLGCaQuZuMxOJpuhsx/M3JVs1n301/iqP8aX/pbOpmttG+mFhTPnnplz5+wEqRQaut0/VqO5cuv2ndW79tq99fsbm1sPDnWSKcYHLJGJOg6o5lLEfAACJD9OFadRIPlRcPKm6h9940qLJP4CecpHEQ1jMRGMgqH8LeuJk7bz77PXBKYc6HaP6Czyi3nPK7/2cdqe/2tVurl7sbQdI3IJGyeg3b7t5LZDIgpTRiX+0L44rqKCAL/zi/bMzbeJFhEBPoNiTIGWJZF8AsMdIpNw4Ty/0ZMoEU5hZJymFHDuQ49QFUZ05ue4ktZ04b2CHa90r0y7NH6/4i55LUvmvT1/s9XtdBeFl4FXgxaq68Ak6pBxwrKIx8Ak1XrodVMYFVSBYJKXNsk0Tyk7oSEfGhjTiOtRsfiTJXYMM8aTRJkvBrxgL+8oaKR1HgVGWQWrr/cq8n+9YQaTl6NCxGkGPGbnRpNMYkhw9SzwWCjOQOYGUKaEmRWzKVWUgXk8NnnLzV0U/2jO/ZRyRSFRz4o6+9LcLSRuhW4Siviv0CDbNsF612NcBoe7HW+vs/v5eWvfrSNeRY/RU9RGHnqB9tF7dIAGiFk/rJ/WL+t347S50lxrrp9LG1a95yG6Us1HZ93tDqY=

    It works! : )

    Fifty-eight per cent of the members who voted opposed the contract deal . Of the members who voted , 58 % opposed the deal . Fifty-eight per cent of the voting members opposed the contract deal .

  • !23

    Design Space

    Model variantshard mixture

    Regularization no dropout at E-step, dropout at M-step

    maxz(1/K) · p(y|z, x; ✓)AAADiXichVJdb9MwFHUWPkbGxwaPvFhUVVsUjWYgMYEmDY0HpApRJLpNqrvIcdzUmhNHtgNtvDzza3iF38K/wWk7sbUTXNnS0bnn+l7fe6OcM6W73d/Ohnvr9p27m/e8rfsPHj7a3nl8rEQhCR0QwYU8jbCinGV0oJnm9DSXFKcRpyfR+VHtP/lKpWIi+6JnOR2lOMnYmBGsLRXuOLApw/LMtFmnQpyONZZSfIMoxXoSKxNUw/IAYZmkeBqaslXBvD1bqC/Klj9dwLdIT6jGnZHXRFwkteai9KeXtNfsec0VDkmWTJbJatcVcWnPQeAjEgut/J6HFIupotpUJiwrW9kUtoMXvc5csJ7rBnneLv8m+F9YiM+iyrQqpIoUvguzcLvR3e3ODa6DYAkaYGl929M+igUpUpppwrFSw6Cb65HBUjPCaeWhQtEck3Oc0KGFGU6pGpn5LCvYtEwMx0Lam2k4Z69GGJwqNUsjq6xnpFZ9NXmjL1b1g9eym1qlheBqpSg93h8ZluWFphlZ1DQuONQC1jsEYyYp0XxmASaS2W9BMsESE203zfPQe2r/LelH+/qnnEqshXxu5lvEssr2IUF+jf4lxNNLoUWenUGw2vF1cLy3G7zc3fv8qnHoL6exCZ6CZ6ANAvAaHIIPoA8GgDjfnR/OT+eXu+UG7r77ZiHdcJYxT8A1c4/+AFwbI0U=

    uniform prior

    Training scheduleonline

    Parameterizationshared

  • !24

    Design Space

    Model variantshard mixture

    Regularization no dropout at E-step, dropout at M-step

    maxz(1/K) · p(y|z, x; ✓)AAADiXichVJdb9MwFHUWPkbGxwaPvFhUVVsUjWYgMYEmDY0HpApRJLpNqrvIcdzUmhNHtgNtvDzza3iF38K/wWk7sbUTXNnS0bnn+l7fe6OcM6W73d/Ohnvr9p27m/e8rfsPHj7a3nl8rEQhCR0QwYU8jbCinGV0oJnm9DSXFKcRpyfR+VHtP/lKpWIi+6JnOR2lOMnYmBGsLRXuOLApw/LMtFmnQpyONZZSfIMoxXoSKxNUw/IAYZmkeBqaslXBvD1bqC/Klj9dwLdIT6jGnZHXRFwkteai9KeXtNfsec0VDkmWTJbJatcVcWnPQeAjEgut/J6HFIupotpUJiwrW9kUtoMXvc5csJ7rBnneLv8m+F9YiM+iyrQqpIoUvguzcLvR3e3ODa6DYAkaYGl929M+igUpUpppwrFSw6Cb65HBUjPCaeWhQtEck3Oc0KGFGU6pGpn5LCvYtEwMx0Lam2k4Z69GGJwqNUsjq6xnpFZ9NXmjL1b1g9eym1qlheBqpSg93h8ZluWFphlZ1DQuONQC1jsEYyYp0XxmASaS2W9BMsESE203zfPQe2r/LelH+/qnnEqshXxu5lvEssr2IUF+jf4lxNNLoUWenUGw2vF1cLy3G7zc3fv8qnHoL6exCZ6CZ6ANAvAaHIIPoA8GgDjfnR/OT+eXu+UG7r77ZiHdcJYxT8A1c4/+AFwbI0U=

    uniform prior

    Training schedule

    Parameterization

    online

    shared

    D2: only one component gets trainedindependent

    offline perform E-step for all training examples before M-step

    interleave E-step and M-step for each mini-batch

  • !25

    Design Space

    Training schedule

    Model variantshard mixture

    soft mixture

    Regularization no dropout at E-step, dropout at M-step

    Xzp(z|x; ✓) · p(y|z, x; ✓)

    AAADfnicfVJdb9MwFHUWPkb42kA88RJRhbYo65qBxCQ0aQgekCZEkeg2qe4ix3FTa04c2Q60ccOP4RX+EP8Gp+1gayeuEuno3HPutX1vlDMqVbf729qwb9y8dXvzjnP33v0HD7e2Hx1LXghM+pgzLk4jJAmjGekrqhg5zQVBacTISXT+rs6ffCVCUp59UdOcDFOUZHREMVKGCretJ54IyzPdou0KMjJSSAj+zYUpUuNY6qAalAcQiSRFk1CXzcrNW9OFelY2/ckCvoFqTBRqDx0PMp7UmlnpTy5oxztyvBUOCpqMl83q1CVxab6DwIc45kr6xgplkYbld7cV7B615/RaByhpTCRRutJhWdUGIyn/Vb3e5f21hegsqnRz4XwbZuFWo9vpzsNdB8ESNMAyeuYhezDmuEhJpjBDUg6Cbq6GGglFMSOVAwtJcoTPUUIGBmYoJXKo5wOsXM8wsTviwvyZcufsZYdGqZTTNDLKejByNVeT1+ZiWRe80l3XKsU5kyuHUqP9oaZZXiiS4cWZRgVzFXfrxXFjKghWbGoAwoKaa7l4jATCyqyX48D3xNxbkI+m+qecCKS4eKHnq0OzyrxDAv0a/U+IJhdCgxwzg2D1xdfB8V4neNnZ+/yqcegvp7EJnoJnoAUC8Bocgg+gB/oAWzPrh/XT+mUD+7m9Y+8upBvW0vMYXAl7/w++Cx42

    Xz(1/K) · p(y|z, x; ✓)

    AAADiXichVJdb9MwFHUWPkbGxwaPvFhUVVsUjWYgMYEmDY0HpApRJLpNqrvIcdzUmhNHtgNtvDzza3iF38K/wWk7sbUTXNnS0bnn3mvfe6OcM6W73d/Ohnvr9p27m/e8rfsPHj7a3nl8rEQhCR0QwYU8jbCinGV0oJnm9DSXFKcRpyfR+VHtP/lKpWIi+6JnOR2lOMnYmBGsLRXuOLApw/LMtFmnQpyONZZSfIMoxXoSKxNUw/IAYZmkeBqaslXBvD1bqC/Klj9dwLdIT6jGnZHXRFwkteai9KeXtNfsec0VDkmWTJbFatcVcWnPQeAjEgut/J6HFIupotpUJiwrpIoUtoMXvc5csF7rBnneLv8W+F9YiM+iyrQWke/CLNxudHe7c4PrIFiCBlha3/a0j2JBipRmmnCs1DDo5npksNSMcFp5qFA0x+QcJ3RoYYZTqkZmPssKNi0Tw7GQ9mYaztmrEQanSs3SyCrrGalVX03e6ItVnfBadVOrtBBcrTxKj/dHhmV5oWlGFm8aFxxqAesdgjGTlGg+swATyey3IJlgiYm2m+Z56D21/5b0o83+KacSayGfm/kWsayyfUiQX6N/CfH0UmiRZ2cQrHZ8HRzv7QYvd/c+v2oc+stpbIKn4BlogwC8BofgA+iDASDOd+eH89P55W65gbvvvllIN5xlzBNwzdyjP6K6I2M=

    maxz p(z|x; ✓) · p(y|z, x; ✓)AAADiXicfVJdb9MwFHUWPkbGxwaPvFhUVVsUlWYgMYEmDY0HpAlRJLpNqrvIcdzUmhNHtgNtvDzza3iF38K/wWk7bWsnrhLp6Nxzfa59b5RzpnSv99fZcO/cvXd/84G39fDR4yfbO0+PlSgkoQMiuJCnEVaUs4wONNOcnuaS4jTi9CQ6P6zzJ9+pVExk3/Qsp6MUJxkbM4K1pcIdBzZlWJ6ZNutUiNOxxlKKHxClWE9iZYJqWO4jLJMUT0NTtiqYt2cL9UXZ8qcL+B7pCdW4M/KaiIuk1lyU/vSS9ppHXnOFQ5Ilk6VZnbomLu23H/iIxEIr35YixWKqqDaVCcsKqSKF7eDVUWeuWDNbUdvGraS8Mri96sokxGdRZVoLnw9hFm43et3ePOA6CJagAZbRt2/aR7EgRUozTThWahj0cj0yWGpGOK08VCiaY3KOEzq0MMMpVSMzn2UFm5aJ4VhI+2caztnrFQanSs3SyCrrGanVXE3emotVfeANd1OrtBBcrTSlx3sjw7K80DQji57GBYdawHqHYMwkJZrPLMBEMnstSCZYYqLtpnke+kjtvSX9bE//klOJtZAvzXyLWFbZd0iQX6P/CfH0UmiRZ2cQrL74Ojje7Qavu7tf3zQO/OU0NsFz8AK0QQDeggPwCfTBABDnp/PL+e38cbfcwN1z3y2kG86y5hm4Ee7hP3bdI1Q=

    maxz(1/K) · p(y|z, x; ✓)AAADiXichVJdb9MwFHUWPkbGxwaPvFhUVVsUjWYgMYEmDY0HpApRJLpNqrvIcdzUmhNHtgNtvDzza3iF38K/wWk7sbUTXNnS0bnn+l7fe6OcM6W73d/Ohnvr9p27m/e8rfsPHj7a3nl8rEQhCR0QwYU8jbCinGV0oJnm9DSXFKcRpyfR+VHtP/lKpWIi+6JnOR2lOMnYmBGsLRXuOLApw/LMtFmnQpyONZZSfIMoxXoSKxNUw/IAYZmkeBqaslXBvD1bqC/Klj9dwLdIT6jGnZHXRFwkteai9KeXtNfsec0VDkmWTJbJatcVcWnPQeAjEgut/J6HFIupotpUJiwrW9kUtoMXvc5csJ7rBnneLv8m+F9YiM+iyrQqpIoUvguzcLvR3e3ODa6DYAkaYGl929M+igUpUpppwrFSw6Cb65HBUjPCaeWhQtEck3Oc0KGFGU6pGpn5LCvYtEwMx0Lam2k4Z69GGJwqNUsjq6xnpFZ9NXmjL1b1g9eym1qlheBqpSg93h8ZluWFphlZ1DQuONQC1jsEYyYp0XxmASaS2W9BMsESE203zfPQe2r/LelH+/qnnEqshXxu5lvEssr2IUF+jf4lxNNLoUWenUGw2vF1cLy3G7zc3fv8qnHoL6exCZ6CZ6ANAvAaHIIPoA8GgDjfnR/OT+eXu+UG7r77ZiHdcJYxT8A1c4/+AFwbI0U=

    uniform priorlearned prior

    uniform prior

    learned prior

    offline

    Parameterizationindependent

    online

    shared

  • !26

    Metrics

    BLEU (Papineni et al., 2002): modified n-gram precision metric for sentence similarity

    from 0 (no overlap) to 100 (same)

  • !27

    Metrics

    • BLEU (quality): average BLEU of each hypothesis against the references

    Source: Thanks a lot!

    Hypo1: Merci! Hypo2: Merci merci! Hypo3: Merci beaucoup!

    Ref1: Merci beaucoup! Ref2: Merci beaucoup. Ref3: Merci!

  • !28

    Metrics

    • BLEU (quality): average BLEU of each hypothesis against the references • Pairwise-BLEU (diversity): average BLEU over each pair of hypotheses

    Source: Thanks a lot!

    Hypo1: Merci! Hypo2: Merci merci! Hypo3: Merci beaucoup!

  • !29

    Metrics

    • BLEU (quality): average BLEU of each hypothesis against the references • Pairwise-BLEU (diversity): average BLEU over each pair of hypotheses

    Also compute human BLEU and Pairwise-BLEU

  • !30

    Datasets

    WMT’17 English-German:

    WMT’14 English-French:

    WMT’17 Chinese-English:

    #train, #ref #test, #ref

    1

    1

    1

    4.5M,

    36M,

    20M,

    500,

    500,

    2001,

    10

    10

    3

  • !31

    Goal: High Quality and Diversity

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • !32

    Model Exploration

    good, similar good, diverse : )

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

    WMT English-German (K=3)

    bad, diversebad, similar

  • !33

    Model Exploration

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    D1: latent var is ignored

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • !34

    Model Exploration

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    D1: latent var is ignored

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • !35

    Model Exploration

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    D1: latent var is ignored

    D2: only one compo gets trained

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • !36

    Model Exploration

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    D1: latent var is ignored

    D2: only one compo gets trained

    • online shared has higher quality than offline indep • hard mixture is more diverse than soft mixture

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • !37

    Winning Model

    WMT English-German (K=3)

    bad, diverse

    good, similar good, diverse : )

    bad, similar

    • only one backward pass

    • no prior predictor

    • negligible extra params

    • no responsibility storage

    fast

    simple

    memory-

    efficient

    Human

    shared params

    indep params

    hard mixture

    soft mixture

    online training

    uniform prior

    learned prior

    offline trainingshared params

    indep params

    uniform prior

    learned prior

    Variational NMT

  • 45

    55

    65

    75

    85

    90 75 60 45 30

    BLEU

    35

    45

    55

    65

    75

    Pairwise-BLEU

    90 70 50 30 10

    K=510 K=10K=3

    20

    0

    10

    20

    30

    40

    90 70 50 30 10

    Human Sampling Beam Diverse Beam Mixture Model

    WMT English-German (10 refs) WMT English-French (10 refs) WMT Chinese-English (3 refs)

    !38

    Large Scale Evaluation

  • 他 从不不 愿意 与 家⼈人 争吵 。

    He never wanted to be in any kind of altercation .

    He never liked to quarrel with his family . He never wants to quarrel with his family He never likes to argue with his family .

    !39

    Latent Variable Captures Consistent Translation Styles

    不不断 的 恐怖袭击 显然 已 对 他 造成 很⼤大 打击 。

    Repeat terror attacks on Turkey have clearly shaken him too .

    The continuing terrorist attacks had apparently hit him hard . He is clearly already being hit hard by the continuing terrorist attacks . Repeated terrorist attacks have apparently hit him hard .

    Source

    Reference

    hMup

    Source

    Reference

    hMup

    frequency of was, were, had: z=1’s > 3 * z=3’s

    frequency of has, says: z=3’s > 2 * z=1’s

    this vs. that, per cent vs. % …

  • • Conditional text generation is multi-model

    !40

    Conclusions

    • Search for multiple modes is difficult

    p(y|x)AAAB7nicbVBNSwMxEJ2tX7V+VT16CRahXspuFfRY8OKxgv2AdinZNG1Ds9mQZMVl7Y/w4kERr/4eb/4bs+0etPXBwOO9GWbmBZIzbVz32ymsrW9sbhW3Szu7e/sH5cOjto5iRWiLRDxS3QBrypmgLcMMp12pKA4DTjvB9CbzOw9UaRaJe5NI6od4LNiIEWys1JHV5OnxvDQoV9yaOwdaJV5OKpCjOSh/9YcRiUMqDOFY657nSuOnWBlGOJ2V+rGmEpMpHtOepQKHVPvp/NwZOrPKEI0iZUsYNFd/T6Q41DoJA9sZYjPRy14m/uf1YjO69lMmZGyoIItFo5gjE6HsdzRkihLDE0swUczeisgEK0yMTSgLwVt+eZW06zXvola/u6w06nkcRTiBU6iCB1fQgFtoQgsITOEZXuHNkc6L8+58LFoLTj5zDH/gfP4Ab0KO7A==

    z1AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mqoMeCF48V7Qe0oWy2k3bpZhN2N0IN/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4Zua3H1FpHssHM0nQj+hQ8pAzaqx0/9T3+uWKW3XnIKvEy0kFcjT65a/eIGZphNIwQbXuem5i/Iwqw5nAaamXakwoG9Mhdi2VNELtZ/NTp+TMKgMSxsqWNGSu/p7IaKT1JApsZ0TNSC97M/E/r5ua8NrPuExSg5ItFoWpICYms7/JgCtkRkwsoUxxeythI6ooMzadkg3BW355lbRqVe+iWru7rNRreRxFOIFTOAcPrqAOt9CAJjAYwjO8wpsjnBfn3flYtBacfOYY/sD5/AELOo2W

    z2AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mioMeCF48V7Qe0oWy2k3bpZhN2N0It/QlePCji1V/kzX/jts1BWx8MPN6bYWZemAqujet+O4W19Y3NreJ2aWd3b/+gfHjU1EmmGDZYIhLVDqlGwSU2DDcC26lCGocCW+HoZua3HlFpnsgHM04xiOlA8ogzaqx0/9Tze+WKW3XnIKvEy0kFctR75a9uP2FZjNIwQbXueG5qgglVhjOB01I305hSNqID7FgqaYw6mMxPnZIzq/RJlChb0pC5+ntiQmOtx3FoO2NqhnrZm4n/eZ3MRNfBhMs0MyjZYlGUCWISMvub9LlCZsTYEsoUt7cSNqSKMmPTKdkQvOWXV0nTr3oXVf/uslLz8ziKcAKncA4eXEENbqEODWAwgGd4hTdHOC/Ou/OxaC04+cwx/IHz+QMMvo2X

    z3AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0laQY8FLx4r2lZoQ9lsN+3SzSbsToQa+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IJHCoOt+O4W19Y3NreJ2aWd3b/+gfHjUNnGqGW+xWMb6IaCGS6F4CwVK/pBoTqNA8k4wvp75nUeujYjVPU4S7kd0qEQoGEUr3T316/1yxa26c5BV4uWkAjma/fJXbxCzNOIKmaTGdD03QT+jGgWTfFrqpYYnlI3pkHctVTTixs/mp07JmVUGJIy1LYVkrv6eyGhkzCQKbGdEcWSWvZn4n9dNMbzyM6GSFLlii0VhKgnGZPY3GQjNGcqJJZRpYW8lbEQ1ZWjTKdkQvOWXV0m7VvXq1drtRaVRy+Mowgmcwjl4cAkNuIEmtIDBEJ7hFd4c6bw4787HorXg5DPH8AfO5w8OQo2Y

    It was rejected by 58 % of its members who voted in the ballot .

    Of the members who voted , 58 % opposed the contract transaction .

    Of the members who participated in the vote , 58 % opposed the contract .

    • explicitly model uncertainty with latent variables

    argmaxy1,··· ,yT

    YTt=1

    p(yt|y1:t�1, x; ✓)AAACknicfZBdb9MwFIbd8DXCVwfccWNRDW2oq+KCBBpCGhoXXIAYUrtNakrkOKettcSO7BPUyMsf459wxy38CtwuSLAhjmTp0XveY/u8aZlLi1H0rRNcuXrt+o2Nm+Gt23fu3utu3j+yujICxkLn2pyk3EIuFYxRYg4npQFepDkcp6cHq/7xFzBWajXCuoRpwedKzqTg6KWkO9oqt+uz5U4YczMv+DJxdcL6scg02n6djBoaW5mBBXSNSxy+Zs3nUROXRmfUTyZ4VieO7eEua/rLVzEuAPlO0u1Fg2hd9DKwFnqkrcNks/MkzrSoClAocm7thEUlTh03KEUOTRhXFkouTvkcJh4VL8BO3Xr9hm55JaMzbfxRSNfqnxOOF9bWReqdBceFvdhbif/qTSqcvZw6qcoKQYnzh2ZVTlHTVZY0kwYE5rUHLoz0f6ViwQ0X6BMPw/gt+GUMfPAXfyzBcNTmqWuDbvxy87i/ov8Zpfpt9BSGPll2McfLcDQcsGeD4afnvf1hm/EGeUQek23CyAuyT96RQzImgnwl38kP8jN4GOwFb4KDc2vQaWcekL8qeP8LYmHKxQ==

    p(y|z, x)AAACoXicfVFdb9MwFHXD1whfHTzyYlEVbahUSUHaBEKaBA/wgCio3SY1xXKc29ZaEkf2DWrm5QfyE/gVvMIbThck2BBXsnR07rnHvsdxkUqDQfCt4125eu36ja2b/q3bd+7e627fPzSq1AKmQqVKH8fcQCpzmKLEFI4LDTyLUziKT143/aMvoI1U+QSrAuYZX+ZyIQVHR7Gu6Bc71dl61+9HXC8zvma2YuEgEolCM6jYpKaRkQkYQFtbZvFVWH+e1FGhVULdKMOzitnwBT4N68H6ZYQrQO7cTtnIb5xPB+td1u0Fw2BT9DIIW9AjbY3ZdudxlChRZpCjSLkxszAocG65RilSqP2oNFBwccKXMHMw5xmYud2kUdO+YxK6UNqdHOmG/XPC8syYKoudMuO4Mhd7Dfmv3qzExf7cyrwoEXJxftGiTCkq2kRLE6lBYFo5wIWW7q1UrLjmAt0H+H70BtwyGt474w8FaI5KP7Ft7LVbbhkNGvQ/ocx/Cx3yfZdseDHHy+BwNAyfDUcfn/cORm3GW+QheUR2SEj2yAF5S8ZkSgT5Sr6TH+Sn1/PeeWPv07nU67QzD8hf5c1+AUsgz5g=

  • !41

    • Mixture models work pretty well but hardly explored for text generation

    • Training is not obvious, sub-optimal design choices can lead to degeneracies

    Conclusions Poster #106 tonight!

    • A strong baseline for work on latent variable text modeling

    • More applications to dialogue, image captioning, summarization…

    • Code: https://github.com/pytorch/fairseqTraining schedule

    Model variants

    Parameterizationsharedindependent

    hard Mixture

    soft Mixture

    online

    Regularization no dropout at E-step, dropout at M-step

    uniform priorlearned prior

    uniform priorlearned prior

    offline