Entropy rate as a measure of animal vocal complexity
Arik Kershenbaum*
National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN, USA
(Received 3 May 2013; accepted 23 September 2013)
Vocal complexity is an important concept for investigating the role and evolution of animal communication and sociality. However, no one definition of 'complexity' appears to be appropriate for all uses. Repertoire size has been used to quantify complexity in many bird and some mammalian studies, but is impractical in cases where vocalizations are highly diverse, and repertoire size is essentially non-limited at realistic sample sizes. Some researchers have used information-theoretic measures such as Shannon entropy to describe vocal complexity, but these techniques are descriptive only, as they do not address hypotheses of the cognitive mechanisms behind vocal signal generation. In addition, it can be shown that simple measures of entropy, in particular, do not capture syntactic structure. In this work, I demonstrate the use of an alternative information-theoretic measure, the Markov entropy rate, which quantifies the diversity of transitions in a vocal sequence, and thus is capable of distinguishing sequences with syntactic structure from those generated by random, statistically independent processes. I use artificial sequences generated from different stochastic mechanisms, as well as real data from the vocalizations of the rock hyrax Procavia capensis, to show how different complexity metrics scale differently with sample size. I show that entropy rate provides a good measure of complexity for Markov processes and converges faster than repertoire size estimates, such as the Lempel–Ziv metric. The commonly used Shannon entropy performs poorly in quantifying complexity.
Keywords: complexity; entropy rate; Lempel–Ziv; Markov process; renewal process; Shannon entropy; syntax
Introduction
Animal vocal sequences have been particularly well studied in the context of birdsong
research (Catchpole and Slater 2003). However, many recent studies have also shown the
presence of non-random (i.e. statistically non-independent) sequences in a number of
mammalian taxa, such as cetaceans (Shapiro et al. 2010; Green et al. 2011; Cholewiak
et al. 2012), bats (Bohn et al. 2009), hyraxes (Kershenbaum et al. 2012) and primates
(Clarke et al. 2006). While we presume that these non-random vocal structures have
signalling significance (Ruxton and Schaefer 2011), in most cases we know neither what
the content of these signals is nor the method used for encoding information in them.
Analysis of birdsong has indicated a link between vocal complexity and social complexity
(Freeberg et al. 2012), and a similar relationship has been found in some mammalian taxa
(Pollard and Blumstein 2012). While it is not clear whether vocal complexity drives the
evolution of social complexity, or vice versa (Ord et al. 2012), it does seem that for those
species with the potential for complex vocalizations, quantifying this complexity should
be an important goal for researchers. Unfortunately, while complexity in animal
q 2013 Taylor & Francis
*Email: [email protected]
vocalizations may be an important concept, it is poorly posed. There does not appear to be
a single definition of complexity that can be appropriately applied to all species (Edmonds
1999). Traditionally, complexity in birdsong has been measured as repertoire size
(Catchpole and Slater 2003), i.e. the number of discrete different vocal sequences used by
an individual. Such a measure is the ‘gold standard’ for birdsong research, but is
problematic in the investigation of mammalian vocalizations. Whereas most bird species
have a few songs in their repertoire, up to a few tens, or possibly 100 songs (Catchpole and
Slater 2003), many mammals produce more complex vocal sequences, which are rarely
the same on repetition. For example, in a study of the vocalizations of the rock hyrax
Procavia capensis (Kershenbaum et al. 2012), out of 264 vocalizations of six or more
elements, or ‘syllables’, recorded from 39 different animals, only 15 calls were ever
repeated. Similarly, free-tailed bats combine just three syllable types into sequences that
vary greatly between individuals, and within an individual’s rendition (Bohn et al. 2009).
In these cases, repertoire size is almost impossible to assess with a finite sample size, and
so cannot be an effective measure of complexity.
Another approach to quantifying complexity is to use measures of information theory.
Shannon entropy (Shannon et al. 1949) has been widely used in assessing animal
communication behaviour (Da Silva et al. 2000; Suzuki et al. 2006; Doyle et al. 2008;
Freeberg and Lucas 2012). However, the use of entropy to quantify complexity is
problematic. Shannon entropy measures the ‘unpredictability’ of a sequence without
regard to the order in which the different elements, or ‘characters’, occur, and therefore
does not represent information contained in the order, or syntax, of those characters. Most
previous work on information-theoretic analysis of animal vocal sequences has sufficed
with measures of Shannon, or ‘zero-order’ entropy. Syntax can be captured by higher-
order entropy measurements, such as conditional entropy, which take into account joint
probability and the combination of characters as longer substrings (McCowan et al. 1999).
Unfortunately, estimating higher-order entropies requires very long sample sequences,
and may be impractical for sequences with many different types of characters (Briefer
et al. 2010), or those for which limited empirical data are available. Data gathered in the
field are often of limited length and consist of fragments of varying length. For example, in
the hyrax study cited above, half of the recordings are four characters or fewer in length, whereas
only 10% have a length greater than 15 characters. Hyraxes make use of five different
character types, or ‘syllables’, so even those four character sequences can take any of
54 ¼ 625 combinations; almost the size of the entire corpus. Estimating any information-
theoretic measure based on the transition probabilities between characters in short
sequences is likely to be highly inaccurate (Cover and Thomas 1991; Hausser and
Strimmer 2009).
Despite this, some measure of syntax is required to capture the complexity of animal
vocal sequences. Consider the following ‘word’, or string of characters: 4 4 1 4 3 1 2 3 4 4,
which has a Shannon entropy of 0.881. The same word when sorted into ascending order:
1 1 2 3 3 4 4 4 4 4, has exactly the same Shannon entropy value. Which sequence is more
‘complex’? In general, effective communication appears to require a trade-off between
information capacity and syntactic structure (Ferrer-i-Cancho 2006), as high entropy
increases information capacity, but reduces signal fidelity and increases the cognitive cost
of signal processing (Ferrer-i-Cancho and Sole 2003).
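To make the order-blindness concrete, the sketch below (mine, not from the paper) computes the Shannon entropy of both orderings of the example word. The logarithm base is an assumption, since the paper does not state it; base C = 4 (the example's alphabet size) gives a value consistent with the quoted 0.881, up to rounding.

```python
# Sketch (not the paper's code): Shannon entropy ignores character order.
# The log base is an assumption; base 4 (the example's alphabet size)
# gives ~0.8805, matching the 0.881 quoted above up to rounding.
from collections import Counter
from math import log

def shannon_entropy(word, base):
    """SE = -sum_i p_i log p_i, with p_i estimated as empirical frequencies."""
    n = len(word)
    return -sum((k / n) * log(k / n, base) for k in Counter(word).values())

w1 = [4, 4, 1, 4, 3, 1, 2, 3, 4, 4]
w2 = sorted(w1)                      # same characters, ascending order
print(shannon_entropy(w1, 4))        # ~0.8805
print(shannon_entropy(w2, 4))        # identical: order plays no part
```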
The question of defining animal communication complexity is often poorly posed
because of the absence of knowledge of the production processes generating the data
sequence in question. As different communication scenarios – let alone different species –
might generate sequences based on different statistical and behavioural processes, a
single measure of complexity is unlikely to be successful for addressing all research
hypotheses.
A number of different probabilistic processes have been used to describe animal vocal
sequence production. The null model is usually that of random character selection,
independent of preceding characters, either with a uniform prior probability for each
character, or with a stationary but non-uniform distribution (more realistic where some
vocal characters are more common than others). Deviation from such a distribution is
considered evidence for non-random sequence generation (Bohn et al. 2009; Sayigh et al.
2012). The simplest non-null model of sequence production is the first-order Markov
process (FOMP; Grinstead and Snell 1997), in which the probability of a particular
character appearing is solely determined by the value of the preceding character. Markov
processes have been used to model vocal sequence production in primates (Robinson
1979), dolphins (McCowan et al. 1999), bats (Bohn et al. 2009) and birds (Gentner and
Hulse 1998; Leonardo and Konishi 1999; Gil and Slater 2000). Higher-order Markov
processes have been used to describe vocal sequences in birds (Briefer et al. 2010), and
even longer-range correlations between characters have been shown in dolphin
vocalizations (Ferrer-i-Cancho and McCowan 2012). However, the more parsimonious
hidden Markov model (HMM) representation (Cappe et al. 2005), in which not all possible
sequences need to be enumerated, has been more successful in birdsong research. Jin and
Kozhevnikov (2011) and Katahira et al. (2011) used HMMs to model the production
process of Bengalese finches, and discussed the possible neural mechanism behind such a
model (Jin 2009).
Despite this, behavioural processes in general may not correspond to simple Markov
processes either of first order, or of any higher order. It has been suggested that
behavioural decisions can be better modelled by a Markov renewal, or semi-Markovian
process (Cane 1959), in which each behaviour continues for a certain duration, determined
by a specific probability distribution, before transitioning to the subsequent behaviour. In
the case of vocal sequences, ‘continuing behaviour’ would correspond to repeating the
same character for a certain number of times, until another character is chosen (Pruscha
and Maurus 1979). The Markov renewal process (MRP) may be a more suitable model for
song generation in those species in which multiple repeated characters are common, and in
which character choice is driven by simple behavioural rules, rather than by grammar (as it
is in human language). The sequences generated by an MRP do not necessarily show
Markovian statistics (Nelson 1995), and such deviation may be used to identify the nature
of the underlying generation process.
In this study, I will examine the use of the statistical property known as the Markov
entropy rate (Cover and Thomas 1991) as a complexity metric. Entropy rate has been
suggested from time to time as a potential metric of sequence complexity in the field of
animal vocal communication, most notably by Chomsky (2002) as a measure of
‘information content’ in finite-state grammars. However, no concerted effort has been
made to assess the relevance of entropy rate to non-human animal communication
research. This measure can in principle be applied usefully to a range of animal
communication examples and has advantages over existing approaches in certain
circumstances, and with certain assumptions. By comparing the results of this metric with
sequences generated by different stochastic production processes, I will examine its
properties and discuss when it is most appropriate to use this measure in place of other,
more established techniques. To simulate an accurate assessment, I use real-world word
length distributions from the hyrax study of Kershenbaum et al. (2012) and investigate
how the accuracy of the complexity estimate scales with data-set size.
I address four questions concerning how appropriate entropy rate is as a metric of
animal communication complexity: (1) To what extent is it a reliable (precise) measure of
entropy rate in realistic situations? That is, how well does the metric agree with the
expected or theoretical value of the metric, for a particular stochastic production process?
(2) To what extent is it possible to estimate entropy rate with realistic sampling effort?
Convergence of any metric with increased sampling effort is an important property for a
reliable estimate. (3) To what extent does entropy rate measure the behavioural property of
interest (complexity)? This question is harder to answer, as we lack a globally relevant
definition of complexity. However, it is instructive to choose a property such as
Kolmogorov complexity, which quantifies the computational effort necessary to specify a
sequence (Denker and Woyczynski 1998, Section 4.3), and observe how the entropy rate
and other metrics of complexity correlate to such a benchmark. (4) What can entropy rate
tell us about putative stochastic production processes of vocal sequences in real data-sets?
Do any of the proposed complexity measures provide a consistent comparison between
stochastic production models?
Methods
I define here three separate putative sequence-generation processes: (a) a zero-order
Markov process (ZOMP), (b) a FOMP and (c) a MRP. Consider a ‘word’, or string of
characters, taken from an 'alphabet' A consisting of C characters, A = {1, . . ., C}, that occur with limiting and stationary probabilities p_i, i ∈ A. The ZOMP is a process of random
selection of characters according to the fixed prior distribution vector, p. The FOMP is a
process where the probability of choosing a particular character depends only on the
preceding character, and the C × C first-order transition (conditional probability) matrix T, such that the probability of character j occurring after character i is fixed by T_{i,j}. Note that a ZOMP is a special case of the FOMP, where the columns of T are constant, T_{i,j} = k_j. The MRP is similar to the FOMP, except that each character is repeated n times, where n is taken from a Poisson distribution with variance λ_i, i ∈ A, defined separately for each
character i. The MRP is therefore defined both by a transition matrix P (known as the
‘embedded’ transition matrix) that defines the conditional probability of transitions
between characters (with zero along the diagonal) and by a vector of Poisson distribution
parameters λ. The MRP produces sequences that appear similar to those that would be generated by a FOMP with large probabilities along the major diagonal of T (i.e. T_{i,i}), and near-zero off-diagonal (Figure 1). However, this similarity is illusory. Sequences
generated by an MRP are only Markovian if the number of repeats is distributed
exponentially (Nelson 1995); if the repeats are drawn from a Poisson distribution, the
MRP sequence cannot be adequately described by a single transition matrix T.
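A minimal sketch of the three generators may help fix the definitions (my own reconstruction in Python; the paper's calculations were done in Matlab and are not reproduced here). Whether the Poisson draw counts the first occurrence of a character is not specified in the text, so the `1 +` below is an assumed convention that guarantees each chosen character appears at least once.

```python
import numpy as np

rng = np.random.default_rng(0)

def zomp(p, n):
    """ZOMP: n i.i.d. draws from the fixed prior distribution p."""
    return rng.choice(len(p), size=n, p=p)

def fomp(T, n):
    """FOMP: each character is drawn from row T[previous character]."""
    C = T.shape[0]
    seq = [int(rng.integers(C))]     # uniform start state (a simplification)
    for _ in range(n - 1):
        seq.append(int(rng.choice(C, p=T[seq[-1]])))
    return np.array(seq)

def mrp(P, lam, n):
    """MRP: the embedded chain P (zero diagonal) picks the next character;
    each character is then repeated a Poisson(lam[i])-distributed number of
    extra times (the '1 +' is the assumed convention noted above)."""
    C = P.shape[0]
    seq, cur = [], int(rng.integers(C))
    while len(seq) < n:
        seq.extend([cur] * (1 + rng.poisson(lam[cur])))
        cur = int(rng.choice(C, p=P[cur]))
    return np.array(seq[:n])
```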
I examine three candidate complexity measures: (a) unconditional Shannon entropy,
(b) the Lempel–Ziv complexity as a surrogate for repertoire size and (c) the Markov
entropy rate. First, the Shannon entropy is defined simply in terms of the stationary
probability distribution p of the alphabet A, so that
$$SE = -\sum_{i \in A} p_i \log p_i \qquad (1)$$
As mentioned previously, song repertoire is a candidate for a complexity measure, but
is difficult to calculate in the case of rarely repeated sequences. Fortunately, a
computational equivalent exists. Arising from the field of data compression, the Lempel–
Ziv complexity measure (Lempel and Ziv 1976) is a quantification of how many distinct
patterns (of varying length) exist in a string, in relation to the total number of possible
patterns. For example, in the string [1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0], there are eight
different patterns: 1, 0, 1 0, 0 1, 0 1 0, 0 1 0 1, 1 1 and 1 1 0 (taken from Doganaksoy and
Gologlu 2006). As such, Lempel–Ziv complexity is an estimate of the Kolmogorov
complexity of the sequence (Evans and Barnett 2002). Although not directly comparable
to repertoire size, we can use the Lempel–Ziv complexity as an estimate of the diversity of
a vocal sequence. To my knowledge, this metric has not previously been used for this
purpose, but has been suggested by Suzuki et al. (2006), and has been used to estimate
complexity in other applications such as the analysis of DNA sequences (Orlov and Potapov 2004) and network security (Evans and Barnett 2002).

Figure 1. (Colour online) Comparison of the MRP and the FOMP. (a) Examples of strings generated by the two processes. The first string (1) is generated by the MRP defined by transition table P and Poisson variances λ (with the corresponding distributions shown to the right). The FOMP transition table T is the maximum likelihood estimator of the transition table, generated from an extended version of string (1). Note that while the MRP P has zeros along the main diagonal (as repeated characters are defined by the Poisson vector λ), the FOMP T attempts to capture both the transitions between characters, and the character repeats, using a single matrix. The second string (2) is an example of a sequence of characters generated by T. (b) Statistical difference between sequences generated by the two processes. In a sequence of 10^4 characters, the MRP (left) generates approximately 10% of the 5^5 = 3125 possible 5-gram combinations. In contrast, in sequences generated by the FOMP derived from the maximum likelihood estimator of the transition table, approximately 20% of the possible 5-grams appear.
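The eight patterns in the worked example arise from an incremental parsing: read characters until the current substring is one not yet recorded, record it as a new pattern, and continue. A minimal sketch of that parsing (an illustration of the counting idea only; for its Lempel–Ziv calculations the paper used the Applied Nonlinear Time Series Analysis library in Matlab, as noted in the Methods):

```python
def lz_patterns(seq):
    """Incremental parsing: each new pattern is the shortest prefix of the
    remaining sequence that has not yet been recorded as a pattern."""
    seen, patterns, cur = set(), [], ()
    for ch in seq:
        cur += (ch,)
        if cur not in seen:
            seen.add(cur)
            patterns.append(cur)
            cur = ()
    return patterns

s = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
print(lz_patterns(s))
# [(1,), (0,), (1, 0), (0, 1), (0, 1, 0), (0, 1, 0, 1), (1, 1), (1, 1, 0)]
# -> eight patterns, matching the decomposition given in the text
```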
The entropy rate of a stationary Markov process (Cover and Thomas 1991) can be
calculated as follows. Given the C × C transition matrix T, and the stationary distribution
p, entropy rate is then defined as:
$$ER = -\sum_{i \in A} p_i \sum_{j \in A} T_{i,j} \log T_{i,j} \qquad (2)$$
Even if the conditions for the formal definition of entropy rate (e.g. stationary, first-
order process) are not met, Equation (2) (the conditional entropy given the preceding character)
still provides a potentially useful measure of the entropy rate. Notice that the entropy rate
in a sense measures the ‘entropy’, or ‘unevenness’ of the transition matrix itself; that is,
−Σ_j T_{i,j} log T_{i,j} is high when all the elements of T have similar values, but low when T is
‘uneven’, i.e. some transitions are likely, while others are less likely. This feature of
unevenness in transition likelihood appears to be a promising candidate for predicting
complexity, as the sequences with low entropy rate will be less random and more
stereotyped.
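Equation (2) is straightforward to compute once the stationary distribution is known. A sketch (mine, with an illustrative two-character matrix) that obtains p as the left eigenvector of T for eigenvalue 1 and then applies the formula:

```python
import numpy as np

def entropy_rate(T, base=2):
    """ER = -sum_i p_i sum_j T[i,j] log T[i,j], with p the stationary
    distribution of the row-stochastic transition matrix T."""
    evals, evecs = np.linalg.eig(T.T)              # left eigenvectors of T
    p = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    p = p / p.sum()                                # normalize (fixes sign too)
    with np.errstate(divide="ignore"):
        logT = np.where(T > 0, np.log(T) / np.log(base), 0.0)  # 0 log 0 := 0
    return -np.sum(p[:, None] * T * logT)

T = np.array([[0.9, 0.1],    # 'uneven' rows: transitions are stereotyped,
              [0.5, 0.5]])   # so the rate stays well below 1 bit
print(entropy_rate(T))       # ~0.56 bits per character
```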
To simulate the constraints of data collection in animal field studies, I generated
artificial sequences according to the word length distribution found in a real data-set. For
this, I used data from Kershenbaum et al. (2012) on the vocalizations of the rock hyrax in
northern Israel. This data-set consists of 967 coded sequences of hyrax song, with
sequence length varying between 1 and 48 characters long (mean length = 4.4). The hyrax songs were coded into five distinct characters, i.e. C = 5; for details see Kershenbaum
et al. (2012).
I performed a Monte Carlo analysis in which I repeatedly generated full data-sets of
967 random words (sequences) according to the word length distribution of the hyrax data-
set. I generated data-sets using each of the three models: ZOMP, FOMP and MRP, each
time selecting random values for the parameters: p, T, P and λ, so that each data-set was
generated by a different set of random parameter values. I selected random values from a
uniform distribution: for the parameters p, T and P in the range [0, 1], and λ in the range [0, 2C]. Having selected random parameters, I normalized the probabilities to sum to unity (e.g. Σ_i p_i = 1; Σ_j T_{i,j} = 1). I then calculated the three complexity measures:
Shannon entropy, Lempel–Ziv complexity and Markov entropy rate, for a set of N
randomly selected words from each data-set, varying the number N of words selected,
between 5 and 965. I simulated a total of 1000 Monte Carlo replicates for each model, with
different random parameter values for each replicate.
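One replicate's parameter draw might look as follows (a sketch of my reading of the procedure; variable names mirror the text, with C = 5 as in the hyrax data):

```python
import numpy as np

rng = np.random.default_rng()
C = 5

p = rng.uniform(0, 1, C);  p /= p.sum()                       # ZOMP prior
T = rng.uniform(0, 1, (C, C));  T /= T.sum(1, keepdims=True)  # FOMP rows sum to 1
P = rng.uniform(0, 1, (C, C))
np.fill_diagonal(P, 0);  P /= P.sum(1, keepdims=True)         # embedded matrix,
lam = rng.uniform(0, 2 * C, C)                                # zero diagonal
```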
For each replicate, I compared the three complexity measures with the expected
complexity. The expected Shannon entropy SE* and Markov entropy rate ER* can be
derived deterministically given the transition matrices for the ZOMP and FOMP (with
their replicate-specific parameter values) as follows. Shannon entropy is given by
Equation (1), because for the ZOMP, p is simply the prior probability parameter
vector. For the FOMP, the stationary distribution p of T can be found by solving pT = p. Shannon entropy and Markov entropy rate cannot be calculated directly from the parameters of the MRP model, nor can the expected Lempel–Ziv complexity LZ* be derived from the parameters of any of the three models. To approximate the true
values of the complexity measures for each data-set, without being constrained by the
sample size of the hyrax data-set, I generated a very long sequence (10^6 characters)
using the same process parameters and calculated the complexity empirically from
this. Having determined the expected complexity metrics, SE*, ER* and LZ*, I
compared these with the complexity measures se, er and lz, calculated on the artificial
sequences for varying sample size N and for each production model m = {ZOMP,
FOMP, MRP}.
To compare the metrics in a real data-set, I also calculated the absolute values of
the complexity estimates on the original hyrax data-set. I then compared these with the
complexity metrics on additional artificial ZOMP, FOMP and MRP data-sets,
generated from the three models, using maximum likelihood estimators of p, T and P,
l, respectively, as calculated from the hyrax data. I bootstrapped these sets by
excluding a random 20% of the 967 sequences on each of 1000 iterations, to generate
the equivalent of the previous Monte Carlo simulation. As well as testing the
complexity metrics on the hyrax data-set as a whole, I repeated this test separately for
the vocalizations of the three most prolifically vocal individual hyraxes, taken from
three geographically separate sites (Table 1), to ensure that the conclusions drawn can
be applied both to individual vocal behaviour and to the vocal characteristics of the
regional population.
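The paper does not reproduce its estimation code; for a first-order chain the maximum likelihood estimator of T is simply the row-normalized bigram count, as in this sketch (characters assumed coded 0 to C − 1):

```python
import numpy as np

def estimate_T(sequences, C):
    """ML estimate of a first-order transition matrix from coded sequences:
    count successive character pairs, then normalize each row."""
    counts = np.zeros((C, C))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.full_like(counts, 1.0 / C),
                     where=rows > 0)   # fall back to a uniform row if a
                                       # character never precedes anything
```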
All calculations were performed in Matlab 7.14, with the use of the Applied Nonlinear
Time Series Analysis library (Small 2005) to calculate the Lempel–Ziv complexity.
Results
Precision of the complexity metrics
Figure 2 shows the estimated complexity metric plotted against the true metric value, as
determined either from theoretical considerations or from very long simulations.
Measuring each of the three complexity metrics on a realistic-sized data-set gives good
estimates of the true value for the ZOMP and FOMP series, but all metrics overestimate
the true value on the MRP data.
When assessing the MRP specifically, the error of each of the metrics is significantly
correlated to the mean of the Poisson variances λ (p < 0.001 for all metrics). The Lempel–Ziv (R = 0.55) and entropy rate (R = 0.93) metrics are most accurate when the mean of λ is small (Figure 3), i.e. when few repeats exist, and the sequence tends to be defined solely by the embedded transition matrix P. In such a case, the sequence is strongly Markovian. In contrast, the Shannon entropy metric (R = −0.55) is most accurate when the mean of λ is large, and the sequence consists mostly of repeats, so that an accurate estimate of the syllable prior probabilities can be made.
Table 1. Vocal characteristics of the hyrax data-sets used.

             Number      Number          Prior probability of syllable types
             of songs    of syllables    1        2        3        4        5
All sites    969         4378            0.2308   0.1355   0.3999   0.1320   0.1018
Haifa        68          464             0.1746   0.3211   0.3772   0.1078   0.0194
Karkum       32          288             0.2153   0.1667   0.2743   0.2604   0.0833
Wadi Oren    26          287             0.0453   0.1742   0.6132   0.0557   0.1115

Notes: 'All sites' shows the total number of songs and total number of syllables (equivalent to 'characters') for the entire data-set, as well as the prior probabilities of each of the five syllable types. The three sites, 'Haifa', 'Karkum' and 'Wadi Oren', represent three different individuals from geographically distinct populations, recorded on separate occasions.
Convergence with sampling effort
Figure 4 shows the convergence of the complexity estimates with increased sampling
(larger number of words). For each metric q, the relative change Dq ¼ q(w þ 1)–q(w)
decreases as more words w are sampled, but for each model the Lempel–Ziv complexity
converges the most slowly. Both the Shannon entropy and the entropy rate converge well,
but the Lempel–Ziv complexity appears to require a much larger corpus of samples to
provide a stable estimate of the metric.
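The convergence diagnostic reduces to tracking Δq across nested subsamples; a sketch (the 30-word step matches Figure 4, everything else, including the function names, is illustrative):

```python
import numpy as np

def convergence_curve(metric, replicates, step=30, n_max=965):
    """Median change in `metric` per additional `step` words, computed
    over a list of replicate data-sets (each a list of coded words)."""
    sizes = np.arange(step, n_max + 1, step)
    curves = []
    for words in replicates:
        q = np.array([metric(words[:n]) for n in sizes])
        curves.append(np.abs(np.diff(q)))     # change per extra `step` words
    return sizes[1:], np.median(curves, axis=0)
```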
Relevance to true complexity
Although we have no good measure of ‘true’ complexity, I used the expected Lempel–Ziv
complexity LZ* as a proxy for Kolmogorov complexity (Evans and Barnett 2002), against which to assess Shannon entropy and entropy rate. Figure 5 shows the estimated Shannon entropy and entropy rate, compared with LZ*. For the ZOMP and FOMP processes, both Shannon entropy and entropy rate underestimate LZ*, although the distribution of entropy rate shows less variance. For the MRP process, both metrics overestimate the complexity, but the entropy rate gives a truer estimate than the Shannon entropy.

Figure 2. (Colour online) (a) Relative errors of the three complexity measures on the three process models, for the maximum sample size of 965 words. Boxplots show the values over the 1000 Monte Carlo iterations. (b) Plots showing the estimated metrics versus the true metrics, for each of the three model types. Points represent the first 100 Monte Carlo iterations, and the solid line indicates true = estimated. Note that the scaling varies for clarity.
Comparison with the real data-set
Comparing the complexity measures on the simulated results with the hyrax data-set, we
see that in both the entropy rate and Lempel–Ziv metrics, the hyrax vocalizations are
similar in complexity to the ZOMP and FOMP, but considerably less complex (higher
metrics) than the MRP (Figure 6). This trend is consistent between the entire data-set and
between the vocal data for the individual animals. The Shannon entropy metric of the
hyrax data fails to show any consistent trend. Given the small sample size in the real-world
data-set, it is not possible to draw conclusions about the speed of convergence of the
different metrics on these data.
Figure 3. (Colour online) Correlation of the error in the complexity metric estimate, to the variance of the MRP Poisson repeat process λ (averaged over all five characters) for (a) Shannon entropy, (b) Lempel–Ziv complexity and (c) entropy rate. All correlations are significant at p < 0.001.

Figure 4. (Colour online) Convergence of the complexity metrics with sampling effort, displayed as the relative change in the metrics, per additional 30 words sampled. Each panel shows the convergence for different production models (ZOMP, FOMP and MRP), and the lines show the median over all replicates for the different metrics (dotted: Shannon entropy; dashed: Lempel–Ziv complexity; solid: entropy rate).
Discussion
I found that Shannon entropy of individual characters appears to be a poor estimator of
sequence complexity. Although the Shannon entropy converges quickly, and accurately
estimates the true Shannon entropy of the production process, it does not provide a good
estimate of Kolmogorov (LZ) complexity for any production process tested. More
importantly, when applied to a real data-set, the Shannon entropy gives highly variable
results and does not provide insight into the statistical nature of the sequences. The
Lempel–Ziv metric appears a good candidate for estimating Kolmogorov complexity.
However, it converges more slowly than the other metrics and is likely to be inaccurate at
smaller sample sizes. The entropy rate metric converges well for all production models
and is a better estimator of Kolmogorov complexity than Shannon entropy. For the MRP
model, for which errors were highest for all metrics tested, the entropy rate is the most
strongly dependent on the renewal process λ, probably because the large number of repeats in an MRP with large λ leads to a strongly non-Markovian sequence (Figure 1b),
and therefore one poorly characterized by the entropy rate (Figure 5). Handel et al. (2012)
found that a large number of repeated units in the song of the humpback whale Megaptera novaeangliae markedly reduces the estimated first-order entropy measure. In contrast, I
found that all three metrics overestimate the theoretical value in the case of the MRP
(Figure 2b), making the sequences appear more random than they really are. This
difference in findings is almost certainly the result of the highly structured nature of
humpback whale song, in contrast to the less ordered hyrax song, and emphasizes the
importance of understanding the nature of song structure before interpreting a particular
complexity metric.
My examination of the hyrax vocal sequences showed that the entropy rate and
Lempel–Ziv metrics scaled much more similarly to those of the pure Markov processes (zero- and first-order) than to the renewal process, and this may imply that the generation
process in hyrax vocal production has more in common with the Markov chain paradigm
than with the renewal or semi-Markov process. This tendency was consistent between
different individual animals, and across the entire hyrax data-set.
Researchers have long attempted to model animal vocal sequences as probabilistic
processes, e.g. Pruscha and Maurus (1979), and a logical step forward was to use
information-theoretic metrics to quantify such sequences. Li (1991) attempted to describe
the relationship between ‘intuitive’ complexity and computational complexity (i.e.
entropy), for both Markov chains and regular languages, but observed that the relationship
is not one-to-one and that the deviations from a simple complexity–entropy relationship
depend on the details of the short-range correlations in a sequence. Several studies have
used entropy and other information-theoretic measures to describe animal communication
complexity (e.g. Da Silva et al. 2000; Ferrer-i-Cancho and McCowan 2009; Freeberg and
Lucas 2012), and the approach has even been proposed as a technique for searching for
extra-terrestrial intelligence (Doyle et al. 2011). However, the relevance of the whole
paradigm of an information-theoretic analysis of animal communication has been hotly
debated (Owren et al. 2010; Ruxton and Schaefer 2011). The use of some information
theory metrics such as Zipf’s law has not gained acceptance as a means of representing
information content (McCowan et al. 1999; McCowan et al. 2005; Suzuki et al. 2005).
Figure 5. (Colour online) Relationship of Shannon entropy and entropy rate, with Lempel–Ziv complexity, each normalized to 1 by dividing by their maximum values. Each panel shows the results for a different production model, ZOMP, FOMP and MRP. Shannon entropy is indicated by crosses, and entropy rate by circles. The solid line indicates 1:1. Note that the scaling varies for clarity.

Figure 6. (Colour online) Complexity metric measures for the three simulated data-sets, ZOMP (dotted line), FOMP (dashed line), MRP (dot-dash line), and for the real hyrax data-set (solid line), as the sample size is varied. The first column shows the Shannon entropy, the middle column shows the Lempel–Ziv complexity and the right column shows the entropy rate. The top row shows the results for the entire hyrax data-set, while the three lower rows show examples from the vocalizations of three individual animals. Note that the scaling varies for clarity.
Much of the controversy arises from misunderstandings of the role of information theory
metrics in describing proximal behaviour. The field of information theory was first
developed to quantify signalling efficiency over a noisy channel (Shannon et al. 1949),
rather than to assess behavioural complexity. When measures of entropy are used to
explain animal vocal complexity as a mechanism for communication reliability (e.g.
Doyle et al. 2008), little controversy exists. However, Shannon entropy seems a poor
metric for explaining behavioural complexity (Suzuki et al. 2005). Repertoire size has
been a standard and accepted measure of birdsong complexity (Berwick et al. 2011), but it is probably less appropriate for more diverse vocal sequences.
Conclusion
I have shown how different information-theoretic metrics behave quite differently when
used to describe artificial data-sets arising from different stochastic processes. As the
mechanism of generation of animal vocalizations is invariably unknown a priori, it
follows that selection of a comparative complexity metric is highly problematic. The best
advice is, if possible, to choose a complexity metric based on the behavioural hypothesis
being investigated. For example, if we want a measure of communication diversity, then
the Lempel–Ziv metric may be appropriate, although care must be taken to ensure a
sufficient sample size to allow for convergence. However, if we are interested in the
diversity of transitions in a sequence – as in many cases when investigating syntactic
diversity – entropy rate appears to be a good choice, as it captures the unevenness of the
transition matrix, and converges at small sample sizes.
Acknowledgements
Arik Kershenbaum is a Postdoctoral Fellow at the National Institute for Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation, the U.S. Department of Homeland Security and the U.S. Department of Agriculture through NSF Award #EF-0832858, with additional support from The University of Tennessee, Knoxville. Part of this work was conducted while Arik Kershenbaum was provided with a doctoral scholarship by the University of Haifa.
References
Berwick RC, Okanoya K, Beckers GJL, Bolhuis JJ. 2011. Songs to syntax: the linguistics of birdsong. Trends Cogn Sci (Regul Ed) 15:113–121.
Bohn KM, Schmidt-French B, Schwartz C, Smotherman M, Pollak GD. 2009. Versatility and stereotypy of free-tailed bat songs. PLoS ONE 4:e6746.
Briefer E, Osiejuk TS, Rybak F, Aubin T. 2010. Are bird song complexity and song sharing shaped by habitat structure? An information theory and statistical approach. J Theor Biol 262:151–164.
Cane VR. 1959. Behaviour sequences as semi-Markov chains. J R Stat Soc Ser B Stat Methodol 21:36–58.
Cappe O, Moulines E, Ryden T. 2005. Inference in hidden Markov models. New York: Springer Science+Business Media.
Catchpole CK, Slater PJB. 2003. Bird song: biological themes and variations. Cambridge: Cambridge University Press.
Cholewiak DM, Sousa-Lima RS, Cerchio S. 2012. Humpback whale song hierarchical structure: historical context and discussion of current classification issues. Mar Mamm Sci 29:E312–E332.
Chomsky N. 2002. Syntactic structures. 9th ed. The Hague: de Gruyter Mouton.
Clarke E, Reichard UH, Zuberbuhler K. 2006. The syntax and meaning of wild gibbon songs. PLoS ONE 1:e73.
Cover TM, Thomas JA. 1991. Elements of information theory. New York (NY): Wiley.
Da Silva ML, Piqueira JRC, Vielliard JME. 2000. Using Shannon entropy on measuring the individual variability in the rufous-bellied thrush Turdus rufiventris vocal communication. J Theor Biol 207:57–64.
Denker M, Woyczynski WA. 1998. Introductory statistics and random phenomena: uncertainty, complexity and chaotic behaviour in engineering and science. Boston: Springer.
Doganaksoy A, Gologlu F. 2006. On Lempel–Ziv complexity of sequences. In: Gong G, Helleseth T, Song H, editors. Sequences and their applications – SETA 2006. Berlin: Springer. p. 180–189.
Doyle LR, McCowan B, Hanser SF, Chyba C, Bucci T, Blue JE. 2008. Applicability of information theory to the quantification of responses to anthropogenic noise by southeast Alaskan humpback whales. Entropy 10:33–46.
Doyle LR, McCowan B, Johnston S, Hanser SF. 2011. Information theory, animal communication, and the search for extraterrestrial intelligence. Acta Astronaut 68:406–417.
Edmonds B. 1999. What is complexity? The philosophy of complexity per se with application to some examples in evolution. In: Heylighen F, Aerts D, editors. The evolution of complexity. Dordrecht: Kluwer. p. 1–16.
Evans SC, Barnett B. 2002. Network security through conservation of complexity. IEEE Proceedings MILCOM 2002:1133–1138.
Ferrer-i-Cancho R. 2006. When language breaks into pieces: a conflict between communication through isolated signals and language. BioSystems 84:242–253.
Ferrer-i-Cancho R, McCowan B. 2009. A law of word meaning in dolphin whistle types. Entropy 11:688–701.
Ferrer-i-Cancho R, McCowan B. 2012. The span of correlations in dolphin whistle sequences. J Stat Mech 2012:P06002.
Ferrer-i-Cancho R, Sole RV. 2003. Least effort and the origins of scaling in human language. Proc Natl Acad Sci USA 100(3):788–791.
Freeberg TM, Dunbar RIM, Ord TJ. 2012. Social complexity as a proximate and ultimate factor in communicative complexity. Philos Trans R Soc Lond B Biol Sci 367:1785–1801.
Freeberg TM, Lucas JR. 2012. Information theoretical approaches to chick-a-dee calls of Carolina chickadees (Poecile carolinensis). J Comp Psychol 126:68–81.
Gentner TQ, Hulse SH. 1998. Perceptual mechanisms for individual vocal recognition in European starlings, Sturnus vulgaris. Anim Behav 56:579–594.
Gil D, Slater PJ. 2000. Song organisation and singing patterns of the willow warbler, Phylloscopus trochilus. Behaviour 137:759–782.
Green SR, Mercado E III, Pack AA, Herman LM. 2011. Recurring patterns in the songs of humpback whales (Megaptera novaeangliae). Behav Processes 86:284–294.
Grinstead CM, Snell JL. 1997. Chapter 11, Markov chains. In: Grinstead CM, Snell JL, editors. Introduction to probability. 2nd ed. Providence (RI): American Mathematical Society. p. 405–470.
Handel S, Todd SK, Zoidis AM. 2012. Hierarchical and rhythmic organization in the songs of humpback whales (Megaptera novaeangliae). Bioacoustics 21:141–156.
Hausser J, Strimmer K. 2009. Entropy inference and the James–Stein estimator, with application to nonlinear gene association networks. J Mach Learn Res 10:1469–1484.
Jin DZ. 2009. Generating variable birdsong syllable sequences with branching chain networks in avian premotor nucleus HVC. Phys Rev E 80:051902.
Jin DZ, Kozhevnikov AA. 2011. A compact statistical model of the song syntax in Bengalese finch. PLoS Comput Biol 7:e1001108.
Katahira K, Suzuki K, Okanoya K, Okada M. 2011. Complex sequencing rules of birdsong can be explained by simple hidden Markov processes. PLoS ONE 6:e24516.
Kershenbaum A, Ilany A, Blaustein L, Geffen E. 2012. Syntactic structure and geographical dialects in the songs of male rock hyraxes. Proc R Soc Lond B Biol Sci 279:2974–2981.
Lempel A, Ziv J. 1976. On the complexity of finite sequences. IEEE Trans Inform Theory 22:75–81.
Leonardo A, Konishi M. 1999. Decrystallization of adult birdsong by perturbation of auditory feedback. Nature 399:466–470.
Li W. 1991. On the relationship between complexity and entropy for Markov chains and regular languages. Complex Syst 5:381–399.
McCowan B, Doyle L, Jenkins J, Hanser S. 2005. The appropriate use of Zipf's law in animal communication studies. Anim Behav 69:1–7.
McCowan B, Hanser SF, Doyle LR. 1999. Quantitative tools for comparing animal communication systems: information theory applied to bottlenose dolphin whistle repertoires. Anim Behav 57:409–419.
Nelson R. 1995. Probability, stochastic processes, and queueing theory: the mathematics of computer performance modeling. New York: Springer Verlag.
Ord TJ, Garcia-Porta J. 2012. Is sociality required for the evolution of communicative complexity? Evidence weighed against alternative hypotheses in diverse taxonomic groups. Philos Trans R Soc Lond B Biol Sci 367:1811–1828.
Orlov YL, Potapov V. 2004. Complexity: an internet resource for analysis of DNA sequence complexity. Nucleic Acids Res 32(Suppl 2):W628–W633.
Owren MJ, Rendall D, Ryan MJ. 2010. Redefining animal signaling: influence versus information in communication. Biol Philos 25:755–780.
Pollard KA, Blumstein DT. 2012. Evolving communicative complexity: insights from rodents and beyond. Philos Trans R Soc Lond B Biol Sci 367:1869–1878.
Pruscha H, Maurus M. 1979. Analysis of the temporal structure of primate communication. Behaviour 69:118–134.
Robinson JG. 1979. An analysis of the organization of vocal communication in the titi monkey Callicebus moloch. Z Tierpsychol 49:381–405.
Ruxton GD, Schaefer HM. 2011. Resolving current disagreements and ambiguities in the terminology of animal communication. J Evol Biol 24:2574–2585.
Sayigh L, Quick N, Hastie G, Tyack P. 2012. Repeated call types in short-finned pilot whales, Globicephala macrorhynchus. Mar Mamm Sci 29:312–324.
Shannon CE, Weaver W, Blahut RE, Hajek B. 1949. The mathematical theory of communication. Urbana: University of Illinois Press.
Shapiro AD, Tyack PL, Seneff S. 2010. Comparing call-based versus subunit-based methods for categorizing Norwegian killer whale, Orcinus orca, vocalizations. Anim Behav 81:377–386.
Small M. 2005. Applied nonlinear time series analysis: applications in physics, physiology and finance. Singapore: World Scientific Publishing Company Incorporated.
Suzuki R, Buck JR, Tyack PL. 2005. The use of Zipf's law in animal communication analysis. Anim Behav 69:9–17.
Suzuki R, Buck JR, Tyack PL. 2006. Information entropy of humpback whale songs. J Acoust Soc Am 119:1849–1866.