Theory of αBiNs: Alphabetic Bipartite Networks
Animesh Mukherjee
Dept. of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Collaborators:
• Monojit Choudhury, Microsoft Research India, Bangalore
• Niloy Ganguly, Abyayananda Maiti, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur
• Fernando Peruani, Service de Physique de l'Etat Condense & Complex System Institute Paris - Ile-de-France, Paris, France
• Lutz Brusch and Andreas Deutsch, Centre for Information Services and High Performance Computing, Technical University of Dresden, Germany
Discrete Combinatorial System (DCS)
• A DCS is a system whose basic building blocks are a finite set of elementary units; the system itself is a potentially infinite collection of discrete combinations of these units
• Examples include two of the greatest wonders on earth – life and language
• Life: elementary units are the nucleotides or codons, while their discrete combinations give rise to the different genes
• Language: elementary units are the letters or words, and the discrete combinations are the sentences formed from them
αBiNs to Model a DCS
• αBiNs: a special class of complex networks
o Bipartite in nature
o One partition contains nodes corresponding to the basic units (or alphabets), while the other contains nodes that represent the discrete combinations of the basic units
o An edge indicates that a particular basic unit is part of a discrete combination
Example: Phoneme-language Network (PlaNet)
• Basic unit: phonemes that human beings can articulate
• Discrete combination: phoneme inventory of a language, i.e., the repertoire of phonemes that the speakers of the language use for communication
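As a rough illustration of the structure just described, the sketch below stores a toy PlaNet as a mapping from language nodes to phoneme sets and reads off the degrees of both partitions. The inventories are invented for illustration only (they are not UPSID data), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy inventories: language node -> set of phoneme nodes.
# These memberships are made up for illustration; they are not real data.
inventories = {
    "l1": {"/s/", "/p/", "/k/"},
    "l2": {"/p/", "/k/", "/t/", "/n/"},
    "l3": {"/k/", "/d/", "/t/"},
    "l4": {"/s/", "/k/", "/n/"},
}


def language_degrees(inv):
    """Degree of a language node = size of its phoneme inventory."""
    return {lang: len(phonemes) for lang, phonemes in inv.items()}


def phoneme_degrees(inv):
    """Degree of a phoneme node = number of languages that contain it."""
    degree = defaultdict(int)
    for phonemes in inv.values():
        for p in phonemes:
            degree[p] += 1
    return dict(degree)


lang_deg = language_degrees(inventories)
phon_deg = phoneme_degrees(inventories)
```

Every edge joins one language to one phoneme, so the two degree sequences always sum to the same total number of edges.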
[Figure: PlaNet - Phoneme-Language Network. Language nodes l1, l2, l3, l4 form one partition; phoneme nodes /s/, /p/, /k/, /d/, /t/, /n/ form the other.]
Topological Properties of PlaNet
Degree distribution of language nodes
[Figure: pk versus language inventory size (degree k)]
pk = beta(k) with α = 7.06 and β = 47.64:
$p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64}$
kmin = 5, kmax = 173, kavg = 21
Degree distribution of phoneme nodes
[Figure: Pk versus degree of a consonant, k]
$P_k = k^{-0.71}$, with an exponential cut-off
Networks constructed from the UCLA Phonological Segment Inventory Database (UPSID), which hosts 317 inventories with 541 different consonants found across them
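The beta fit quoted for the language nodes can be checked numerically. The sketch below evaluates the beta density with α = 7.06 and β = 47.64 using only the standard library. The slide does not say how the degree k is rescaled to the interval (0, 1); the mapping x = k / kmax used in the last line is an assumption made here for illustration.

```python
import math

ALPHA, BETA = 7.06, 47.64  # fit parameters quoted on the slide
K_MAX = 173                # largest inventory size quoted on the slide


def beta_pdf(x, a=ALPHA, b=BETA):
    """Beta density: Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) * (1-x)^(b-1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_mean(a=ALPHA, b=BETA):
    """Mean of a beta distribution on (0, 1)."""
    return a / (a + b)


# Midpoint-rule check that the density integrates to approximately 1.
steps = 20000
area = sum(beta_pdf((i + 0.5) / steps) for i in range(steps)) / steps

# ASSUMPTION: degrees are rescaled as x = k / K_MAX before fitting.
implied_mean_degree = beta_mean() * K_MAX
```

Under that assumed rescaling the implied mean degree is about 22.3, which is in the neighbourhood of the reported kavg = 21; this is only a plausibility check, not a statement about how the fit was actually made.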
Network Synthesis
• Can a stochastic network growth model reproduce a similar degree distribution (DD)?
• Clue: Preferential attachment leads to power-law degree distributions in both unipartite and unbounded bipartite networks
Evolution of PlaNet
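Rules of the game:
• A new language is born
• It chooses from the set of existing phonemes preferentially, based on their degree k, with probability
$\frac{k + \epsilon}{\sum_{\text{all phonemes}} (k + \epsilon)}$
[Figure: bipartite network with a Phonemes partition and a Languages partition]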
Wow! We are quite close
ACL 2006
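The growth rule above can be sketched as a small stochastic simulation. This is a minimal illustration, not the authors' code: it assumes that each new language picks a fixed number mu of distinct phonemes (the slides do not fix the inventory size here), and that each pick is made with probability proportional to k + eps among the phonemes not yet chosen. The function name and all parameter names are hypothetical.

```python
import random


def grow_planet(n_phonemes, n_languages, mu, eps, seed=0):
    """Grow a toy PlaNet by preferential attachment without replacement.

    ASSUMPTIONS: every language has exactly `mu` phonemes, and each pick is
    proportional to (current degree + eps) over the not-yet-chosen phonemes.
    Requires mu <= n_phonemes and eps > 0.
    """
    rng = random.Random(seed)
    degree = [0] * n_phonemes
    inventories = []
    for _ in range(n_languages):
        chosen = set()
        while len(chosen) < mu:
            candidates = [p for p in range(n_phonemes) if p not in chosen]
            weights = [degree[p] + eps for p in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        # Degrees are updated only after the whole inventory is drawn.
        for p in chosen:
            degree[p] += 1
        inventories.append(chosen)
    return degree, inventories


degree, inventories = grow_planet(n_phonemes=20, n_languages=50, mu=5, eps=0.5)
```

A histogram of `degree` gives the simulated phoneme degree distribution; smaller values of eps make the choice more strongly preferential.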
Theoretical Investigation: The Three Sides of the Coin
• Sequential Attachment
o Only one edge per incoming node
o Exclusive set-membership: language – {speaker, webpage}, country – citizen
• Parallel Attachment With Replacement
o Each incoming node has more than one edge
o Sequences: letter-word, word-document
• Parallel Attachment Without Replacement
o Sets: phoneme-language, station-train
Sequential Attachment
Markov Chain Formulation
Notations
t – number of nodes in the growing partition
N – number of nodes in the fixed partition
pk,t – pk after adding t nodes
* One edge added per node
EPL, 2007
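With one edge added per incoming node, the evolution of pk,t can be integrated step by step. The sketch below assumes the attachment kernel (k + ε) / (t + Nε), i.e. the preferential rule with the normalisation written out for the case where the total degree after t nodes is t. Under that assumption a node of degree k either gains the new edge or keeps its degree, which gives the recurrence coded below. The kernel's parametrisation and the function name are assumptions of this sketch.

```python
def sequential_attachment_pk(N, T, eps):
    """Numerically iterate p[k] = p_{k,t} from t = 0 to t = T.

    ASSUMED recurrence (one edge per incoming node, kernel (k+eps)/(t+N*eps)):
        p_{k,t+1} = (1 - (k+eps)/(t+N*eps)) * p_{k,t}
                    + ((k-1+eps)/(t+N*eps)) * p_{k-1,t}
    """
    p = [1.0] + [0.0] * T          # at t = 0 every fixed node has degree 0
    for t in range(T):
        denom = t + N * eps
        new = [0.0] * (T + 1)
        for k in range(t + 1):     # degrees above t are impossible at time t
            gain = (k + eps) / denom
            new[k] += (1.0 - gain) * p[k]   # node keeps degree k
            new[k + 1] += gain * p[k]       # node receives the new edge
        p = new
    return p


p = sequential_attachment_pk(N=10, T=50, eps=1.0)
mean_degree = sum(k * pk for k, pk in enumerate(p))
```

Two sanity checks follow directly from the recurrence: the probabilities sum to 1, and the mean degree of the fixed partition equals t / N.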
The Hard Part
• Average degree of the fixed partition diverges
• Methods based on steady-state and continuous-time assumptions fail
Closed-form Solution
EPL, 2007
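A tunable distribution
[Figure: pk (probability that a randomly chosen node has degree k) versus k (degree), in four panels for different parameter regimes. The parameter symbols in the panel labels were lost; the surviving values are 2, 1 and 4e-4. Source: EPL, 2007]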
Parallel attachment with replacement
• Either use an approximation: pk,t ~ B(k/t; ε, Nε/μ – ε), where μ (> 1) is the number of incoming edges
• An exact Markov Chain:
• Could not solve for exact solution
• But have some closer approximations
To be Submitted to PRE
Parallel Attachment with replacement results
[Figure: two panels, with parameter values 1 and 0.0625; the parameter symbol was lost]
• = 40, N = 100
• Red broken line: approximation
• Blue symbols: stochastic simulation
• Black line: numerical integration of the Markov chain
• For very low the approximation falls out of range
One-Mode Projection of the fixed Partition
• One-mode projection onto the nodes of the fixed partition yields a network of basic units in which two basic units are connected with a weight equal to the number of discrete combinations they both belong to. Example: Phoneme-phoneme Network (PhoNet)
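The projection can be sketched in a few lines: every discrete combination contributes 1 to the weight of each pair of basic units it contains. The toy inventories below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_mode_projection(inventories):
    """Edge weight between two units = number of combinations containing both."""
    weights = Counter()
    for units in inventories:
        for a, b in combinations(sorted(units), 2):
            weights[(a, b)] += 1
    return weights


def weighted_degrees(weights):
    """Weighted degree of a unit = sum of the weights of its edges."""
    q = Counter()
    for (a, b), w in weights.items():
        q[a] += w
        q[b] += w
    return q


# Hypothetical toy inventories, each with the same size (3 phonemes).
toy = [
    {"/s/", "/p/", "/k/"},
    {"/p/", "/k/", "/t/"},
    {"/k/", "/t/", "/d/"},
]
weights = one_mode_projection(toy)
q = weighted_degrees(weights)
```

When every combination has the same size μ, each of the k combinations containing a unit adds μ − 1 to its weighted degree, so the weighted degree is k(μ − 1). In the toy data /k/ has k = 3 and μ = 3, giving a weighted degree of 6.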
[Figure: PhoNet - Phoneme-Phoneme Network. Nodes /s/, /n/, /k/, /p/, /t/, /d/; each edge carries a weight of 1 or 2.]
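Weighted DD
[Figure: weighted degree distribution in two panels, with parameter values 5 and 15; N = 500 and a further parameter equal to 1. The parameter symbols were lost.]
Blue dots: stochastic simulation; black line: theory
q = k(μ – 1)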
Comparison with real data
Not a very good match
A lot of work for the future
• Derive closed-form solutions for
o Parallel attachment with replacement
o Parallel attachment without replacement
• Devise a model and its associated theory to match the properties of the one-mode projection
• Study other real-world systems with an underlying αBiN-structure
To-DAH