Christopher Smith
May 4, 2010
REL 441 HC, American Scriptures
Dr. Richard Bushman
The Urantia Book as a Test Case for Statistical Authorship Attribution in Genre-Distinctive Texts
Introduction
A recent statistical study of the Book of Mormon by Matthew Jockers, Daniela Witten,
and Craig Criddle (hereafter Jockers et al.) attempted to determine who authored that book by
measuring a set of word frequencies for each chapter and comparing them to word frequencies in
texts known to have been penned by a set of candidate authors. The word frequencies were
analyzed using two related classification methods: Delta and Nearest Shrunken Centroids (NSC). In
their control tests, the authors found both methods to be quite accurate, with NSC emerging
as a slightly more robust technique.i Applying the two methods to the Federalist Papers
yielded similarly encouraging results, with Delta producing only three cross-validation errors
and NSC producing none.ii
Although the Federalist Papers are a classic test case, they are not really
comparable to the Book of Mormon. According to Shlomo Argamon, the assumptions of word-
frequency analysis “fundamentally limit use of the method [to cases in which] all the samples
(from all authors) are of pretty much the same textual variety, otherwise we would expect the
word frequency distributions over the comparison set to be a mixture of several disparate
distributions, one for each genre found in the set, thus potentially biasing results depending on
the variety of the test text.”iii The Jockers and Witten study of the Federalist Papers satisfied this
criterion, but the Jockers et al. study of the Book of Mormon unequivocally did not.
In the Jockers et al. study of the Book of Mormon, the individual candidate authors’
“wordprints” were based largely on texts of a single genre or style, with a different genre
predominating for each author. Very few of the samples were of a similar type to the Book of
Mormon. Under these conditions, we would expect the control samples to be reliably attributed
to the proper author even if—perhaps especially if—the Delta method is highly sensitive to genre
and context. If the method is genre-sensitive, however, we would expect to obtain much less
accurate results when testing the candidate authors against a text of a different genre, such as the
Book of Mormon.iv
The present study applies the Delta word frequency classification method to the Urantia
Book (also known as the Urantia Papers), a religious text in many respects comparable to the
Book of Mormon. Like the Book of Mormon, the Urantia Book is highly distinctive in its genre
and style. Also like the Book of Mormon, the Urantia Book claims to have been authored by a
number of divinely inspired superhuman narrators. Skeptics of each book, meanwhile, disagree
as to whether each had a single human author or is the product of a multiple-author conspiracy.
If the Delta attribution method can produce meaningful results when applied to the Urantia
Book, it would tend to bolster its applicability to the Book of Mormon and to other, similar
cross-genre cases.
Unfortunately, the method turns out to be of dubious usefulness in choosing among
candidate authors. When the 197 Urantia Papers were tested against seven candidate authors,
including three likely candidates and four control authors, the large majority of the Papers were
attributed to two of the control authors: Sigmund Freud and myself. Only a very few of the
Papers were attributed to the candidate who, from other evidence, seems to be their most likely
author. Similarly, a test of the text’s internal authorship claims turned out to be moderately
successful in choosing the correct narrator, but it is difficult to assess the significance of this
finding, given that narrator and genre tended to be covariates. The method turns out to be highly
robust for determining the genre of a text, which demonstrates that it is very context-sensitive.
Another application of the method turned out to be more fruitful. In addition to genre,
the method also turns out to be somewhat sensitive to changes in an author’s style over time.
Controlling for genre, we can use the method to chart stylistic trends within the Book. The
basically linear developmental trend that emerges is suggestive of unitary rather than multiple
authorship.
i Matthew L. Jockers, Daniela M. Witten, and Craig S. Criddle, “Reassessing Authorship of the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification,” Literary and Linguistic Computing Advance Access (December 6, 2008), available from http://llc.oxfordjournals.org/cgi/content/short/23/4/465 [accessed April 16, 2010].
ii Matthew L. Jockers and Daniela M. Witten, “A Comparative Study of Machine Learning Methods for Authorship Attribution,” Literary and Linguistic Computing Advance Access (April 12, 2010), available from http://llc.oxfordjournals.org/cgi/content/full/fqq001 [accessed April 16, 2010].
iii Shlomo Argamon, “Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations,” Literary and Linguistic Computing Advance Access (March 1, 2008), available from http://llc.oxfordjournals.org/cgi/content/full/23/2/131 [accessed April 16, 2010]. Even Jockers and Witten admit that “context-specific words” can adversely impact the results. Thus even in the Federalist Papers case, “if the Madison training texts and the test texts address a particular topic that is not addressed by the Hamilton or Jay training texts, then the NSC classifier might use these words as very strong evidence that the test texts were written by Madison.” Jockers and Witten, “Comparative Study,” 6.
iv The control tests by which Jockers et al. determined their accuracy rates, moreover, split the author corpora in half and tested the two halves against each other. Such split-halving will tend to average out genre differences on both sides of the test. For the control tests to be comparable to testing individual Book of Mormon chapters, they should have separated out small, individual texts and tested them against the larger corpus. See Jockers et al., “Reassessing Authorship,” 7.
The application of Delta to the Book of Mormon is further complicated by studies which show that when authors who have no familiarity with statistical attribution methods attempt to obfuscate their style or to imitate the style of another author, the accuracy of stylometric methods is reduced “to the level of random guessing.” Since the Book of Mormon imitates the King James Version of the Bible, stylometry is unlikely to be useful in determining its authorship. See Michael Brennan and Rachel Greenstadt, “Practical Attacks Against Authorship Recognition Techniques,” available from www.cs.drexel.edu/~greenie/brennan_paper.pdf [accessed April 16, 2010].
The Urantia Authorship Controversy
The story of the Urantia Book began sometime between 1906 and 1911, when
psychologist and former Adventist minister William S. Sadler examined an individual known as
the “sleeping subject” (probably Wilfred Kellogg), whose wife was concerned about his
“abnormal movements” while sleeping. To Sadler’s surprise, Kellogg began to speak in his
sleep, claiming to be “a student visitor on an observation mission from another planet.” Sadler
was initially skeptical, but eventually came to believe that this was an authentic spiritual
phenomenon. A group called “the Forum” formed in 1923 and posed questions for the celestial
beings; the questions were then put to the sleeping subject by Sadler and the five other members of the
“Contact Commission”. Answers to the Forum’s questions were provided as formal essays
known as the Urantia Papers. The Forum and the Commission were sworn to secrecy about the
identity of the sleeping subject and the mode by which the Book was received, for fear that
people would become preoccupied with these details rather than studying the Book itself. Sadler
insisted, however, that the Book was not received through channeling or automatic writing. He
strongly implied that the typed pages of the manuscript simply materialized in the room. In 1950
the Urantia Foundation was founded to publish the Book, which it did in 1955.v
The Papers themselves claim to have been written by celestial beings in order to inform
the denizens of Urantia—which is what they call our planet—about God, science, history, the
cosmos, and the life and teachings of Jesus. For the most part, the authors of the Papers are
identified by order of being rather than by name.vi It is clear in some cases, however, that certain
Papers are supposed to have been written by the same individuals. Ken Glasziou has tested these
v Sarah Lewis, “The Peculiar Sleep: Receiving the Urantia Book,” in The Invention of Sacred Tradition, James R. Lewis and Olav Hammer, eds. (Cambridge University Press, 2007), 200-203; Marian Rowley and William S. Sadler, “A History of the Urantia Movement,” typed manuscript (1960), available from http://urantiabook.org/archive/history/histumov.htm [accessed April 21, 2010].
internal authorship claims using a technique pioneered by Mosteller and Wallace that basically
looks at the frequencies with which certain “function words” (articles, conjunctions, and
demonstrative pronouns) are used to begin sentences or clauses. Glasziou found that the method
was able to distinguish among five of the narrators with a high level of statistical significance.vii
Skeptics of the Book’s internal claims have proposed a number of possible human
authors. The most likely scenarios would seem to be that the Book was dictated by the sleeping
subject, that the manuscripts were planted by William Sadler, or that the Book was collectively
authored by the members of the Contact Commission.
Certain stylistic continuities throughout the Book would seem to point in the direction of
unitary authorship, even though the four different “parts” into which it is divided—particularly
the fourth—deal with quite different subject matter. Numbered lists are employed in every
section of the Book. The vocabulary of the Book is almost comically pretentious throughout,
and concepts are presented in excruciating detail. There is a persistent concern to delineate
hierarchies of being and to fit Hebraic and Christian concepts into a systematic, scientific
framework. These features are found even in the most distinctive part of the book: the extended
narrative of the life of Jesus attributed to the Second Midwayer attached to the Apostle Andrew.
The author turns the teachings of Jesus into a systematic philosophy, and in his narration of the
events of Jesus’s life exhibits an obsession with minor details such as names, locations, and exact
dates. It seems highly unlikely that more than one individual in William Sadler’s circle could
have possessed the distinctive turn of mind of which the Urantia Book seems to be a product.
vi Sometimes a paper is said to be “presented” or “sponsored” by a particular being rather than “written” or “indited”. It is difficult to know whether these should all be treated as equivalent terms.
vii Ken Glasziou, “Part 3: Who Wrote the Urantia Papers” (1996), available from http://urantiabook.org/archive/readers/doc183.htm [accessed April 21, 2010].
As for the identity of the author, William S. Sadler seems the most likely candidate.
Martin Gardner has quite convincingly argued that while some of the conceptual content of the
papers may have been channeled through Wilfred Kellogg, it was in fact Sadler who formulated
the written text. Gardner provides a very extensive list of unusual words and phrases that appear
both in the Urantia Book and in Sadler’s many works. He also demonstrates that the science of
the Book—particularly its endorsements of eugenics and of De Vries’ “mutation theory” of
evolution—reflects Sadler’s own strongly held views. Other parallels are found in its
theology, its psychiatric prescriptions, and its economic and political theories.viii
Certainly it would have taken a prolific writer who was well-read on many subjects to
produce the English text of the Urantia Book. Its more than 2,000 printed pages provide a
virtually comprehensive view of life, religion, and the universe. Of those involved in the
production of the Book, it is Sadler who best fits this description. A reviewer of one of his
psychology books complained that it did not take long to discover why the book had 1229 “big
pages”: “The author wishes to tell everything, and in . . . a rambling way, a way of much
overlapping, of not a little repetition.”ix These words could as easily have been written of the
Urantia Book. A great many of Sadler’s works fit this description, and he wrote quite a few, on
many different subjects.
The Delta Methodology
The Delta methodology employed in the present paper is slightly modified from that of
Jockers et al.
viii Martin Gardner, Urantia: The Great Cult Mystery (Amherst, New York: Prometheus Books, 1995), 273-320, 423-35.
ix Review of William S. Sadler, Theory and Practice of Psychiatry, The Journal of Nervous and Mental Disease 86, no. 5 (November 1937): 605-606.
First, lists of frequently occurring words are generated according to two slightly different
criteria. The first, slightly more stringent criterion admits only those words that occur at least
once in each of the 197 Urantia Papers. There are 39 such words in all.x The second criterion is
to admit all words that occur in each sample corpus—that is, each set of sample texts from a
particular author or genre—at least once per thousand words.xi Both rules are designed to
exclude infrequent or highly contextual words that might skew the results. Most of the tests in
the present paper will employ the second rule, but the first will be used in the test of time-
dependence (which requires testing every Urantia Paper individually against every other Urantia
Paper).
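The two selection criteria can be sketched in Python. This is a minimal illustration, not the program actually used in this study; the `tokenize` helper and the toy documents are my own assumptions.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase word tokens; punctuation is stripped."""
    return re.findall(r"[a-z']+", text.lower())

def words_in_every_document(documents):
    """Criterion 1: words occurring at least once in every document."""
    return set.intersection(*(set(tokenize(d)) for d in documents))

def frequent_in_every_corpus(corpora, per_thousand=1.0):
    """Criterion 2: words occurring at least `per_thousand` times per
    1,000 words in every sample corpus."""
    common = None
    for corpus in corpora:
        tokens = tokenize(corpus)
        threshold = per_thousand * len(tokens) / 1000.0
        frequent = {w for w, c in Counter(tokens).items() if c >= threshold}
        common = frequent if common is None else common & frequent
    return common

# Toy documents, not the actual Urantia Papers:
docs = ["the father of all is the one", "the one is in all and the father"]
print(sorted(words_in_every_document(docs)))
# ['all', 'father', 'is', 'one', 'the']
```

Both functions return candidate word lists; in practice the Criterion 1 list is computed over the 197 individual Papers, while Criterion 2 operates on the per-author (or per-genre) sample corpora.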
Next, a set of word-frequency vectors for all sample texts is produced. The mean is
subtracted from each vector and it is divided by its standard deviation in order to turn the raw
word frequencies into z-scores. In effect, this weights each word vector equally. If
we used raw frequencies rather than z-scores, word vectors with higher absolute variance across
all samples would end up more heavily weighted in the final Delta comparison. For example, if
“and” occurs between 20 and 100 times per thousand words across all samples, whereas “for”
occurs between only 5 and 10 times per thousand words across all samples, then aggregating the
x The words generated under this criterion are: a, all, and, are, as, at, be, but, by, even, for, from, have, in, is, it, more, no, not, of, on, one, or, so, such, that, the, their, these, they, this, time, to, when, which, with. Most of these are what authorship-attribution experts term “function words”. Although they have occasionally been described as “non-contextual”, I have found in my own testing that usage of most of these words is strongly influenced by genre and context.
xi For the eight narrator case, the words used are: a, all, an, and, as, at, be, but, by, father, for, from, have, his, in, is, it, life, no, not, of, on, one, only, or, such, that, the, their, there, these, they, this, time, to, when, which, who, with. For the seven author case, the words used are: a, all, an, and, are, as, at, be, but, by, do, for, from, have, in, into, is, it, not, of, on, one, or, so, that, the, their, there, these, this, to, was, we, when, which, will, with. For the genre-controlled case, the words used are: a, all, an, and, are, as, at, be, been, but, by, even, first, for, from, god, has, have, he, his, human, in, into, is, it, life, man, more, no, not, of, on, one, only, or, other, so, spiritual, such, that, the, their, there, these, they, this, those, time, to, upon, was, when, which, while, who, will, with, world, would.
frequency distances between texts would result in the "for" distances being swamped out by the
much larger "and" distances. Dividing each word vector by its standard deviation normalizes the
various word vectors by their variance so that they are comparable and equally weighted.xii
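The standardization step can be sketched as follows (illustrative frequency values, assuming numpy):

```python
import numpy as np

# Rows are text samples, columns are marker words ("the", "and", "for"),
# and values are illustrative rates per 1,000 words.
freqs = np.array([
    [60.0, 25.0,  8.0],
    [50.0, 30.0,  6.0],
    [70.0, 20.0, 10.0],
])

# Standardize each word column: subtract its mean and divide by its
# standard deviation, so every word contributes on a comparable scale.
z_scores = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)

print(z_scores.mean(axis=0))  # each column now has mean 0...
print(z_scores.std(axis=0))   # ...and standard deviation 1
```

After standardization, a word that varies between 20 and 100 occurrences per thousand and a word that varies between 5 and 10 contribute on the same scale.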
Next, we find the “Delta” distances for each word vector between each test document and
each sample corpus. If we are testing sample X against author Y, for example, then we find the
absolute value of the difference between sample X’s z-score for a given word vector and author
Y’s average z-score for that vector. Once we have done this for all vectors, we average them
together in order to get an average distance between the text and the author. If author Y’s
distance from the sample is smaller than all other authors’, then we conclude that author Y is the
most probable of our candidate authors to have written the text.
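Given standardized scores, the Delta comparison itself reduces to a mean absolute difference. A sketch, using hypothetical z-score profiles (the author names and numbers are invented for illustration):

```python
import numpy as np

def delta_distance(test_z, author_samples_z):
    """Burrows-style Delta: mean absolute difference between a test
    text's z-scores and an author corpus's average z-scores."""
    return np.mean(np.abs(test_z - author_samples_z.mean(axis=0)))

def attribute(test_z, candidates):
    """Return the candidate author with the smallest Delta distance."""
    return min(candidates,
               key=lambda name: delta_distance(test_z, candidates[name]))

# Hypothetical z-score profiles over three marker words:
candidates = {
    "Author A": np.array([[1.0, -0.5, 0.2], [0.8, -0.3, 0.4]]),
    "Author B": np.array([[-1.0, 0.6, -0.2], [-0.8, 0.2, -0.4]]),
}
test_text = np.array([0.9, -0.4, 0.3])
print(attribute(test_text, candidates))  # nearest profile: Author A
```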
An estimate of the accuracy of the method can be obtained by performing control tests in
which each of an author’s sample texts is individually subtracted out from his or her corpus, and
tested against all authors. In theory, the percentage of these control tests that result in attribution
to the correct author represents the method’s accuracy rate. As we have already discussed,
however, this method of estimating accuracy is rendered very problematic by the effect of
exogenous variables such as genre.
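The leave-one-out control procedure can be sketched as follows (toy, well-separated data; the real tests used the candidate authors' actual writing samples):

```python
import numpy as np

def leave_one_out_accuracy(author_samples):
    """author_samples maps author name -> z-score matrix (one row per
    sample text).  Each text is held out in turn, the profiles are
    rebuilt without it, and the held-out text is attributed to the
    candidate with the smallest mean absolute z-score distance."""
    correct = total = 0
    for name, samples in author_samples.items():
        for i in range(len(samples)):
            held_out = samples[i]
            profiles = {
                n: (np.delete(s, i, axis=0) if n == name else s).mean(axis=0)
                for n, s in author_samples.items()
            }
            guess = min(profiles,
                        key=lambda n: np.mean(np.abs(held_out - profiles[n])))
            correct += guess == name
            total += 1
    return correct / total

# Toy data: two authors with clearly separated styles.
toy = {
    "A": np.array([[1.0, -1.0], [0.9, -0.8], [1.1, -1.2]]),
    "B": np.array([[-1.0, 1.0], [-0.9, 0.8], [-1.1, 1.2]]),
}
print(leave_one_out_accuracy(toy))  # 1.0
```

Note that the toy data achieves perfect accuracy precisely because the two "styles" are cleanly separated, which is the very situation the genre objection warns about.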
Another way to estimate accuracy is a simple chi-squared test. This tells us if our results
are substantially different from what would be expected if we assigned each text to a random
author. Generally, if a chi-squared test results in a “p-score” under .05, the result is considered
“statistically significant”. Again, however, that our results are non-random does not mean that
authorship is the causative variable. If other variables, such as genre, are resulting in
xii The use of standard deviation in weighting the word vectors assumes that the “spread” of each word vector across all our sample texts is a typical spread. If we have a large number of samples from one author or genre, then the standard deviation for one or more of the word vectors used in the analysis might skew small, in which case the vector(s) in question would still be unduly weighted.
substantially skewed and misleading results, we would still expect a low p-score. Thus, neither
of our options for accuracy-estimation is really reliable.
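A two-category goodness-of-fit version of such a test, my approximation (assuming scipy), using the narrator-test figures reported in the Results section:

```python
from scipy.stats import chisquare

# Narrator test: 114 samples, 8 candidates.  Random assignment would be
# expected to yield 114/8 = 14.25 correct attributions.
n, k = 114, 8
observed = [89, n - 89]        # correct, incorrect
expected = [n / k, n - n / k]  # 14.25, 99.75
stat, p = chisquare(observed, f_exp=expected)
print(f"chi-squared = {stat:.1f}, p = {p:.2g}")  # p far below .05
```

As the main text cautions, a tiny p-value here shows only that the attributions are non-random, not that authorship (rather than genre or sequence) produced the pattern.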
Using the above method, several tests will be conducted. The first will test the internal
authorship claims of the Urantia Book, in order to determine whether its celestial narrators can
be distinguished from each other. The second will test each of the Urantia Papers against three
likely human authors and four control authors. The third will divide the Urantia Book into three
broad genre categories, and test each of the Papers against these categories in order to determine
to what extent genre may be influencing our results. And finally, an additional set of tests will
be performed in order to establish whether the style of the Urantia Book undergoes a linear
evolution when the Papers are arranged sequentially and tested against each other. An attempt
will here be made to control for genre by testing each genre independently against itself, and
then testing the samples from each genre against the larger book (excluding the other samples
from the same genre as a given test text).
All of these analyses are performed using a computer program that I designed myself
using Visual Basic 6 (in conjunction with some features of Microsoft Excel 2003). My program
automates most of the processes required, including determining word ratios, calculating z-
scores and Delta distances, and creating graphs. The role of the researcher, then, is simply to
prepare and classify the text samples to be studied, and to evaluate the results.
Results
In order to test the internal claims of the Urantia Book, it was necessary to omit from the
test those narrators who are identified too ambiguously in the text to allow for meaningful
testing. Narrators to whom only a single paper was attributed were also omitted. This left eight
subjects: the Chief of Seraphim,xiii the Chief of the Archangels of Nebadon,xiv the Divine
Counselor assigned to reveal the attributes of God,xv the Mighty Messenger temporarily
sojourning on Urantia,xvi the Perfector of Wisdom commissioned by the Ancient of Days,xvii the
Second Midwayer attached to the Apostle Andrew,xviii the Solitary messenger of Orvonton,xix and
Solonia, the Seraphic voice in the Garden.xx
Of the 114 text samples tested, 89 (~78%) were attributed correctly. If the
results were random, 14.25 correct attributions would have been expected. Thus, the results
were highly statistically significant (p < .0001). This would seem to confirm Ken Glasziou’s
results (based on a similar method) supporting the text’s internal claim to multiple authorship.
Actually, though, our chi-squared results indicate only that the results of the test were non-
random. Since the narrator variable is highly correlated to the genre and sequence (i.e., time)
variables, it is very difficult to know which variable actually determined our results. For
example, the most accurate attributions were made to the Second Midwayer attached to the
Apostle Andrew. The section of the book for which this narrator is supposed to have been
responsible also happens to be the most distinctive in terms of content and genre. The texts in
this section that were misattributed tended to be those that deviated from the theme and genre of
the remainder of the section. Thus, it is very possible that the ability of the method to distinguish
among narrators is merely an artifact of the genre, content, or time differentials between the text-
groupings each narrator is supposed to have produced.
xiii Papers 82-84, 113, 114.
xiv Papers 33, 35.
xv Papers 0-5.
xvi Papers 32, 34, 40, 42, 54, 55, 115-118.
xvii Papers 11-14.
xviii Papers 121-96.
xix Papers 107-112.
xx Papers 73-76.
For our test of possible early twentieth-century human forgers, text samples were
assembled for three of the most likely authors and four control authors. The three likely authors
are William S. Sadler,xxi Lena Sadler,xxii and William Sadler, Jr.xxiii (Wilfred Kellogg was
excluded because no writing samples could be obtained for him in a searchable digital format.)
The four control authors were Sigmund Freud,xxiv myself,xxv and the biblical writers Matthew and
Luke (from the King James Version).xxvi
Of the 105 control samples, 79 (~75%) were attributed to the correct author. This is
again a very statistically significant result (p < .0001). As in the previous test, however,
sequence and genre were exogenous variables that correlated highly with authorship. The extent
to which authorship was the measured variable, then, is an open question. As for the Urantia
xxi The William S. Sadler sample consists of four chapters from his book The Mind at Mischief: Tricks and Deceptions of the Subconscious and How to Cope with Them (New York: Funk & Wagnalls Company, 1929), available from http://www.cimmay.us/pdf/sadler.pdf [accessed April 21, 2010]. This is one of his shorter, more popular works, and so is perhaps not ideal for comparison to the Urantia Book. But it was the only work for which a searchable digital text was available.
xxii The Lena Sadler sample consists of three chapters from a book she co-authored with her husband: The Mother and Her Child (Toronto: McClelland, Goodchild & Stewart, 1916), available from http://www.gutenberg.org/files/20817/20817-h/20817-h.htm [accessed April 21, 2010]. Again, it is hardly ideal to use a co-authored text, but this was the only digital writing sample available for her. She does seem to have been the primary author of most of the book.
xxiii The William Sadler, Jr. sample consists of three chapters from his book, A Study of the Master Universe: A Development of Concepts in the Urantia Book (Second Society Foundation, 1968), available from http://urantiabook.org/studies/smu/index.html [accessed April 21, 2010].
xxiv The Sigmund Freud sample consists of a chapter from his Dream Psychology: Psychoanalysis for Beginners, tr. M. D. Eder (New York: The James A. McCann Company, 1920), available from http://www.gutenberg.org/files/15489/15489-h/15489-h.htm [accessed April 21, 2010], and three chapters from The Interpretation of Dreams, tr. A. A. Brill, 3rd ed. (New York: Macmillan, 1911), available from http://www.psychwww.com/books/interp/toc.htm [accessed April 21, 2010].
xxv The Christopher Smith sample consists of academic and narrative writings from my personal files, as well as several entries from my personal religious studies blog.
xxvi For these samples, the books of Matthew, Luke, and Acts were divided into individual chapters according to the KJV chapter numbering. The KJV text was obtained from http://www.biblegateway.com [accessed April 21, 2010].
Book itself, 91 of the Papers were attributed to Sigmund Freud, 74 to Christopher Smith (that’s
me), 17 to Luke, 11 to William Sadler, 3 to Lena Sadler, 2 to Matthew, and none to William
Sadler, Jr. In a number of cases—particularly in the section that narrates the life of Jesus—the
individual Papers were more similar to the author to which they were assigned than to the
averages for the Urantia Book itself. Obviously, these results are nonsensical. Sigmund Freud
lived in Austria and wrote in German, and thus cannot have authored the Book. I was not even
alive at the time the Book was written. The results for the most likely authors, meanwhile, are
all considerably below the level of randomness. There are a few possible explanations for these
results. The first is that the true author(s) of the Book was/were not included in the test. The
second is that the method simply is not robust for attribution across genres.
In order to get some idea of the sensitivity of the method to genre differences, I
conducted a third test in which the Urantia Papers were divided into three broad genre categories,
and each individual Paper was then subtracted out and individually tested against these
groupings. The three groupings chosen were Cosmo-Theology (which includes texts on the
nature of God, the hierarchies of celestial beings, the structure of the universe, and the
philosophy of religion),xxvii Earth History (including evolutionary, sociological, and religious
history),xxviii and Pseudo-Biblical Narrative (including events and teachings from the life of Jesus,
as well as the author’s reflections on said events and teachings).xxix It might be desirable to
subdivide these genres further, but unfortunately the subgenres are so highly interwoven that to
separate them would be a life’s work. Even the three divisions above required some rather
arbitrary judgments in the cases of several Papers that mix aspects of more than one category. In
xxvii Papers 0-56, 99-120, 196.
xxviii Papers 57-98, 121, 195.
xxix Papers 122-94.
any case, this test attributed 181 (~92%) of the 197 Papers to the proper genre, with a high level
of statistical significance (p < .0001). As in previous cases, the determinative variable(s) might
be something other than, or in addition to, genre alone.
Presumably, the test that had the highest accuracy is the one that measured the variable
that was most determinative for our results. The accuracies reported for the tests above,
however, are not really comparable, since the number of attribution “candidates” in each case
was different. (For example, there is a much higher likelihood of a “lucky guess” when choosing
among three genres than when choosing among eight narrators.) Thus, a somewhat different
measure of accuracy was created, based on the attribution “rank” the method assigned to the
correct candidate. If all the correct candidates were assigned first rank, the accuracy of the
method is said to be 100%. If all were assigned the mean rank (second out of three, for
example), the accuracy of the method is said to be 0%. If all correct candidates were assigned
last rank, the method’s accuracy is said to be -100%. This measure is calculated by the
following formula:
weighted accuracy = 1 − (sum of observed ranks − n) / (n × highest rank − n × mean rank),
where n is the number of samples.
This basically measures the deviation of our results from the mean, “random” level of each test,
and expresses it as a percentage of the total possible variance. It thus equally weights tests with
different numbers of candidates, such that we can compare accuracy rates across multiple tests.
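The weighted accuracy measure can be expressed directly in code (a sketch of the formula above, with the three boundary cases from the text):

```python
def weighted_accuracy(observed_ranks, highest_rank):
    """+1.0 if every correct candidate was ranked first, 0.0 at the
    mean ("random") rank, -1.0 if every correct candidate was ranked
    last."""
    n = len(observed_ranks)
    mean_rank = (1 + highest_rank) / 2
    return 1 - (sum(observed_ranks) - n) / (highest_rank * n - mean_rank * n)

# With three candidates (ranks 1-3, mean rank 2):
print(weighted_accuracy([1, 1, 1, 1], 3))  # 1.0  (all ranked first)
print(weighted_accuracy([2, 2, 2, 2], 3))  # 0.0  (all at mean rank)
print(weighted_accuracy([3, 3, 3, 3], 3))  # -1.0 (all ranked last)
```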
When the weighted accuracy measure is applied to the narrator and genre tests, we find
that the narrator test was ~83% accurate, and the genre test was ~91% accurate. Thus, the
narrator results could be explained as an artifact of covariance with genre, but the genre results
could not be entirely explained as an artifact of covariance with narrator. The genre variable
offers predictive power above and beyond what the narrator variable provides. Having said this,
it is very possible that both narrator and genre are causative variables.
Applying our weighted accuracy measure to the control tests for our human author
candidates returns an accuracy of ~88%. Thus author, too, may be a causative variable. On the
other hand, much of the accuracy of the control tests may be an artifact of covariance with genre.
The absurd results of our attempt to attribute the Urantia Papers to a human author suggest that
major differences in text-type may completely obscure the effects of the authorship variable.
(Alternatively, deliberate obfuscation or imitation on the part of the author may be to blame.)
A final variable to be tested is time, or sequence. Assuming that the Urantia Papers were
written by a single author in the sequence in which they currently appear, we might expect to
detect a more or less linear pattern of stylistic development over the course of the Book.
Significant deviations from this pattern might indicate that the Book had multiple authors or that
the Papers are arranged out of sequence.
In order to assess time dependence, we first test a Urantia Paper against every other
individual Paper. We then graph the Paper’s Delta distances from the other Papers, in sequence,
and perform a linear regression analysis. The slope of the resulting regression line is basically a
measure of the test-Paper’s relative similarity to the front and back halves of the book. If the
slope is positive, the Paper is most similar to Papers near the beginning of the Book. If the slope
is negative, the Paper is most similar to Papers near the end of the Book. Assuming that style is
time-dependent, we would expect our regression slopes to become gradually more negative as
we repeat this analysis for successive test-Papers throughout the Book. And, in fact, this is more
or less what we observe in Figure 1.
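The per-Paper regression step can be sketched as follows (toy distance series, assuming numpy; the real analysis regresses each Paper's 196 Delta distances against paper sequence):

```python
import numpy as np

def sequence_slope(delta_distances):
    """Regress a Paper's Delta distances against sequence position and
    return the slope.  A positive slope means distances grow through
    the book, i.e. the test Paper most resembles Papers near the
    beginning; a negative slope means it most resembles Papers near
    the end."""
    positions = np.arange(len(delta_distances))
    slope, _intercept = np.polyfit(positions, delta_distances, 1)
    return slope

# Toy series: a "front-of-book" paper and a "back-of-book" paper.
early_like = [0.8, 0.9, 1.0, 1.1, 1.2]   # distances grow with position
late_like  = [1.2, 1.1, 1.0, 0.9, 0.8]   # distances shrink with position
print(sequence_slope(early_like) > 0, sequence_slope(late_like) < 0)
```

Plotting these slopes for successive test-Papers is what produces the roughly linear trend seen in the figures.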
Here again, however, we face the problem of exogenous variables. There are three major
genre groupings in the Urantia Book, and they fall more or less in sequence. Thus, the linear
pattern here could easily be an artifact of shifts in genre. In order to control for this problem,
each genre was tested internally against itself, excluding the rest of the Book. The same negative
linear patterns emerge in Figures 2, 3, and 4.
Figure 1
Figure 2
Figure 3
Figure 4
These figures suggest that, at the very least, each of the individual genres had a single
author. Time-dependence across genres is more difficult to assess. It is possible to conduct a
control test in which each individual text within a given genre is tested against the rest of the
Book, excluding the other texts of the test text’s genre. As Figure 5 shows, we still find a
negative linear trend for two of the three genres, but the third genre has a positive linear trend.
Moreover, in the cases of the first two genres, the fit of the data to the regression lines is not as
good as in the within-genre tests. There are a few possible explanations for these results. First,
our genre classifications may be inadequate. Second, the genres may be arranged out of
sequence. (For example, the undated fourth Part of the Urantia Book may have been written
prior to Parts I-III.) Third, the different genres may be the work of different authors. And fourth
and finally, an author’s “voice” for a given genre may evolve independently of his or her
“voices” for other genres, such that we would not expect time-dependence to be clearly
discernible across genres. (For example, a given word may be becoming more frequent in an
author’s narrative writings even as it is becoming less frequent in his or her theological writings.
Presumably, the more different two genres are from each other, the greater the probability that
they will exhibit independent developmental patterns.) It is difficult to know which, if any, of
the above propositions explains our results. Possibly more than one is true.
Figure 5
Conclusion and Directions for Future Research
The results reported here suggest that while Delta classification of word-frequency scores
may well be an accurate authorship attribution method for texts of the same general type, it
cannot be considered effective for attributing highly distinctive pseudonymous texts. Moreover,
analysts should be very careful in their assumptions about causality. That the results of a Delta
analysis are statistically significant does not mean that authorship is the variable they are
measuring. Exogenous variables, including genre, narrator, and sequence of composition, may
considerably complicate attempts to draw meaningful conclusions about authorship from a Delta
analysis. Identifying and finding ways to control for important variables may help achieve more
reliable results.
Even if the Delta method cannot be used to identify the author of a genre-distinctive text,
however, it may still be useful for drawing conclusions about the internal makeup of that text. If
linear patterns of stylistic development can be discerned in the text, it may indicate that the text
was written in sequence by a single author. Alternatively, deviations from this pattern of
development may indicate multiple authorship or out-of-sequence composition. Other texts
should be analyzed using this approach in order to assess its overall usefulness in making
determinations about authorship and order of composition.
Given the present state of statistical authorship methods, they are unlikely to supplant
more traditional modes of analysis for the foreseeable future. If we conclude that William S.
Sadler wrote the Urantia Book, for example, it must be based primarily on carefully collated
textual and historical evidence. Even so, statistical methods can perhaps provide some additional
insight, if carefully controlled and responsibly interpreted. In the present case, the statistics
suggest that the three major genre groupings in the Book seem likely to each have had a single
author, and this author may have been the same individual for all three groups. To say more than
that would require further methodological refinement beyond the scope of the present study.