For more information concerning given names and popularity statistics,
see the Given Name Frequency Project
Comments and Suggestions Welcomed
Version 1.1[1]
A New Account of Personalization and Effective Communication
Douglas A. Galbi
Senior Economist
Federal Communications Commission[2]
September 16, 2001
Abstract
To contribute to the understanding of information economies of daily life, this paper explores the given names of a large number of persons over the past millennium. Analysts have long both condemned and praised mass media as a source of common culture, national unity, or shared symbolic experiences. Names, however, indicate a large decline in shared symbolic experience over the past two centuries, a decline that the growth of mass media does not appear to have affected significantly. Study of names also shows that action and personal relationships, along with time horizon, are central aspects of effective communication across a large population. The observed preference for personalization over the past two centuries and the importance of action and personal relationships to effective communication are aspects of information economies that are likely to have continuing significance for industry developments, economic statistics, and public policy.
Contents
I. Analyzing Names
II. A Statistical History of Personalization
III. Trends in Effective Communication
IV. Conclusions
References
Appendix A: Additional Data on Name Communication in England Before 1825
Appendix B: Evidence on Variations in Name Statistics
Appendix C: Mary, Group Polarization, and Symbolic Consensus
Appendix D: Analytical Details and Sources for Name Statistics
Broad, quantitative studies of information and knowledge economies have been primarily concerned with inputs, technology, and outputs. A pioneering study pointed to the importance of knowledge growth by identifying growth in aggregate output that growth in aggregate capital and labor inputs cannot explain.[3] Other studies, including an important US Office of Telecommunications report, have used national accounting data to estimate the value of knowledge production and the share of national output associated with information activities.[4] Studies have also estimated the number of information workers and their share in the national workforce.[5] More recently, measures of technology diffusion, such as the share of persons that have telephones, computers, and Internet connections, have played prominent roles in discussion and analysis.[6]
While measures of inputs, technology, and outputs associated with information have considerable value, they also have major weaknesses. Classifying groups of workers, types of output, or output growth residuals as being associated with information involves a data naming exercise with considerable scope for discretion.[7] The results may thus provide more evidence about the particular naming exercise than about the general nature of the economy.[8] Moreover, consistent national level data on economic inputs and outputs are difficult to construct for a long period. While a long-run historical perspective is important for understanding information economies, statistical agencies face significant challenges just in coping with the effects of recent information technology developments.[9] Approaches that focus on inputs, technology, and outputs also can obscure that persons are the subjects of the information economy, and that persons thinking and communicating produce non-marketed human goods and create culture for common use.[10]
Creative empirical approaches are needed to complement widely recognized theoretical developments in the economics of information. The economics of information has shaped the way economists and others think.[11] Information is in general imperfect and asymmetric, like a tomato selected at random from a backyard garden. Areas in which the economics of information has thus far made only limited progress include:
how and how well organizations and societies absorb new information, learn, adapt their behavior, and even their structures; and how different economic and organizational designs affect the ability to create, transmit, absorb, and use knowledge and information.[12]
These questions require study that goes well beyond price systems. The key questions relate to dynamics of the information economy not captured in traditional models of markets.
Personal given names offer several advantages for studying an information economy.[13] On a daily basis, for most types of information, and in much of human communications, “who” and “to whom” are key questions. Personal names matter in normal human activity, they are a crucial aspect of personal identity and dignity, and they have deep cultural significance. Moreover, from an operational perspective, personal names have been collected extensively and over a long period of time in the process of public administration.[14] A given name, which forms part of a contemporary personal name, is generally given to a person shortly after birth, and given names are seldom changed.[15] Given names thus provide a means for disciplined, quantitative study of information economies across major social, economic, and technological changes.
The work of influential analysts points to the importance of studying names. Pierre Bourdieu has declared that the social sciences must focus on “the social operations of naming,” or, using one of Bourdieu’s distinctive terms, naming habitus, meaning a social perspective on naming habits, an aspect of which will be measured in this paper in bits.[16] Niklas Luhmann has elaborated upon the three-in-one unity (unitas multiplex) of information, message and understanding, and Luhmann has explored theoretically how communication constructs social systems and shifts them among different states.[17] The distribution of name frequencies, which consistently produces a particular order as part of the communication that characterizes social life, is an important empirical example of Luhmann’s theory.[18] Jürgen Habermas has discussed communicative rationality in relation to the historical emergence of the public sphere, its refeudalization, and the colonization of the lifeworld.[19] Public discussion and public opinion concerning personal names affect practical private interests, such as the ability to attract attention, get respect, or communicate status. Study of names can provide important historical evidence concerning Habermas’s distinction between communicative and instrumental rationality. More generally, study of names can help one better understand the widely cited work of Habermas, as well as that of Luhmann and Bourdieu.
To contribute to the understanding of information economies of daily life, this paper explores the given names of a large number of persons over the past millennium. Analysts have long both condemned and praised mass media as a source of common culture, national unity, or shared symbolic experiences.[20] Names, however, indicate a large decline in shared symbolic experience over the past two centuries, a decline that the growth of mass media does not appear to have affected significantly. Study of names also shows that action and personal relationships, along with time horizon, are central aspects of effective communication across a large population. The observed preference for personalization over the past two centuries and the importance of action and personal relationships to effective communication are aspects of information economies that are likely to have continuing significance for industry developments, economic statistics, and public policy.
I. Analyzing Names
Choosing and communicating names have long been important actions in information economies. In Hebrew scripture, the stewardship that human beings exercise over nature is expressed in God’s giving the first man power to name all living creatures, and the calling and giving of names played a key role in establishing God’s special relationship with Israel.[21] The classical culture of learning recognized the importance of naming in the Latin saying “Nomen est numen”: to name is to know. In Tudor and Stuart England (1485-1714):
Naming was a serious business, securing legal, social, religious, and semantic identity. According to conventional commentators, the name given at baptism was indeed one’s Christian name, a sign of ‘our regeneration’ and ‘a badge that we belong to God’. It also put one in fellowship with all others who had worn the name before, to be ‘recorded not only in the church’s register, but in the book of life, and stand there forever’.[22]
The importance attached to naming is not anachronistic today. The popularity of name-your-baby books and websites emphasizes that fact.[23] Consider as well the Society for Creative Anachronism (SCA), a worldwide group of persons who study and re-create the European Middle Ages. In its activities, the SCA puts considerable emphasis on naming. Each SCA member adopts a unique name appropriate to the Middle Ages through a formal SCA process of authentication and registration, and in all SCA activities and communications SCA members use these names.[24]
Over the last several decades, choosing names for businesses and products has developed as a special line of commerce. Firms such as Landor, Interbrand, Enterprise IG, Idiom, NameLab, TrueNames, and others provide commercial naming services:
Each of the firms has its own jealously guarded methodology, a signature “naming module” that distinguishes it from its competitors. Enterprise IG has its proprietary NameMaker program, good for generating thousands of names by computer. Landor uses a double-barrelled approach; deploying both its “Brand Alignment Process” and a “BrandAsset Valuator.” Others find that their module must be described in more than a few words. “We have a wonderful approach,” says Rick Bragdon of Idiom. “We use an imaginative series of turbo-charged naming exercises, including Blind Man’s Brilliance, Imagineering, Synonym Explosion and Leap of Faith…We find that when clients are playing, literally playing creative games, they create names that come from a place of joy, a place of fun.”[25]
The commercial goal is to find a “good name”: a name that sounds good, that is memorable, and that has appealing connotations with respect to the particular naming situation.
As with commercial names, the value of personal names depends on norms, memories, connotations, and other aspects of shared experiences. Norms governing naming, such as naming after parents, grandparents, biblical figures, or deceased siblings, are common laws in the economy of names. They evolve through common awareness of patterns of cases and possibilities for differences and exceptions. Estimating the value of a particular name involves collecting and assessing information about other persons’ perceptions of the name within the information economy. While norms and social values structure naming choices, the actual personal choice has largely been a domain of freedom, i.e. personal preference is the recognized ultimate authority.[26] Thus chosen names provide evidence about the preferences that free persons express in a particular historical context.
Personal given names relate to a significant part of shared symbolic experience. Persons who have the same given name literally share the experience of being called by that name; they share the experience of being associated with all the social meaning attached to the name. Birth parents and chosen others, such as godparents, also share the experience of determining a good name for another person. Through the course of their lives persons have a wide range of other symbolic experiences. Naming, however, is probably unique in its combination of personal significance, universal prevalence, and consistency through time.
A. Charting Name Trends
How to analyze given names and their changes over time is not obvious. One might ponder why particular names are chosen and think about factors that affect popularity trends. A recent book, entitled A Matter of Taste, sought to develop theory to address such issues.[27] A chapter entitled, “Broader Issues: The Cultural Surface and Cultural Change,” moves from subsections labeled “A Causal Hierarchy” and “Birth and Death Are Not the Same” to one labeled “Monica.”[28] This subsection used the theory developed in the book to consider how the sexual liaison between President Bill Clinton and Monica Lewinsky would affect the popularity of the name Monica.
The author’s analysis is interesting. First, he notes that “the necessary basis for making a prediction is more complicated than it might appear.”[29] He then commences by visualizing four possibilities: 1) the name was rarely used, 2) the name was gaining in popularity in the preceding years, 3) the name was falling in popularity in the preceding years, 4) the name was relatively stable in popularity. Without the Lewinsky affair, “In each case, our best expectation would be more of the same,” although “in some small proportion of the time we would be wrong.”[30]
The analysis of “what can cause something to happen that differs from these expectations” is essentially the same in all four cases.[31] Here’s the analysis:
We could say that a modest proportion of parents, m, had been using the name Monica and that a far larger proportion of parents, o, had been using some other name. If we can assume that the new set of parents in the year following the scandal had identical dispositions, then the net movement of Monica is the product of two transitions: what number of the m population are now turned away from the name and what number of the o population are now turned toward the name. The difference between the two will mean that Monica gains or loses in popularity. Again, because we start with so many more people initially disposed not to use Monica, it takes only a small proportion of the o parents to switch for the name to gain in popularity even if the vast number of m are no longer attracted to the name.[32]
The author does not provide any specific prediction about changes in the popularity of Monica. He does, however, note “how easy it is to misinterpret the eventual answer – no matter what that answer is.”[33]
The above analysis, and much of the rest of the analysis in A Matter of Taste, is similar to what is known in the financial world as technical analysis. Technical analysis concerns the study and interpretation of stock price trends separate from external factors or the fundamental value of a company.[34] The focus is on “internal mechanisms” that drive price movements, such as momentum and symbolic enhancement or contamination from crossing levels of support or resistance (usually round numbers like multiples of ten or a hundred).[35] Such analysis is commonplace in the financial world and a regular part of mainstream financial reporting.[36]
While technical analysis provides a rich discourse for discussing observed trends and possible future developments, this paper seeks effective tools for uncovering hidden truths about information economies. Three scientific virtues will guide the analysis: observability, simplicity, and consistency. Important factors that affect the popularity of particular names may be difficult to observe, they may be many and complex, and they may vary significantly across names. Thus the analysis will not address the popularity of particular names. Instead, it will focus on characteristics of the overall sample of names, characteristics that can be informatively measured in actual name samples of about 500 or more English-language names.
B. Statistically Measuring Names
Names present some subtle statistical challenges. A sample of persons’ names may cover a significant share of the finite population under study. Thus statistical issues associated with finite samples are relevant. Moreover, the abstract sample space of names is of very high dimension, and all samples sparsely populate that space. Thus the natural space of names as tokens is awkward to manipulate. One way to simplify the sample space is to define the sample as a token frequency distribution. A disadvantage is that the sample space then becomes a function of the sample size. In such a context analysis of properties of estimators is complex.
Rather than exploring such statistical issues abstractly, this paper takes an operational approach. Conditional on interest in a particular name or set of names, the distribution of names is a binomial or multinomial distribution. Based on available name sample sizes and associated sampling errors, the desirability of powerful statistics, and empirical evaluation of alternatives, this paper focuses on the ten most popular names in a sample.[37] The values of the statistics in this paper depend on the rank cutoff used in the analysis. However, the overall trends observed do not appear to depend on this choice.[38]
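The focus on the ten most popular names can be illustrated with a minimal sketch. The function name `top_share`, the `cutoff` parameter, and the toy sample are assumptions introduced here for illustration; the paper's own computations are documented in Appendix D.

```python
from collections import Counter


def top_share(names, cutoff=10):
    """Return (top name, share of the top name, combined share of the
    `cutoff` most popular names) for a sample of standardized names."""
    counts = Counter(names)
    total = sum(counts.values())
    ranked = counts.most_common(cutoff)  # most frequent names first
    top_name, top_count = ranked[0]
    combined = sum(count for _, count in ranked)
    return top_name, top_count / total, combined / total


# Toy sample (hypothetical data): 10 persons, 3 distinct names.
sample = ["Mary"] * 5 + ["Ann"] * 3 + ["Jane"] * 2
print(top_share(sample, cutoff=2))
```

Changing `cutoff` changes the value of the combined share, which is the sense in which the statistics depend on the rank cutoff.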
Measuring name frequencies in actual samples requires attention to name definition and standardization. Given names can include multiple names and name variants, as well as abbreviations, non-standard spellings, and mistakes in recording. Throughout the analysis in this paper, names have been truncated to the shorter of either the first eight letters of the given name or the letters preceding the first period, space, hyphen, or other non-alphabetic character. These shortened names have then been standardized through a name coding available on the Internet for public inspection, use, and improvement on an open source basis.[39] This procedure attempts to identify feasibly and consistently names with common communicative properties.[40]
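The truncation rule just described can be sketched as follows. This is a sketch under stated assumptions, not the paper's actual code: the function names, the case normalization, and the tiny `NAME_CODING` table are illustrative only (its three entries are variants that Table 2 lists as coded to Mary), and the published name coding is far more extensive.

```python
def truncate_given_name(raw, max_letters=8):
    """Keep the letters before the first non-alphabetic character,
    capped at `max_letters` letters (the shorter of the two rules)."""
    letters = []
    for ch in raw.strip():
        if not ch.isalpha():  # period, space, hyphen, digit, etc.
            break
        letters.append(ch)
        if len(letters) == max_letters:
            break
    # Case normalization is an assumption made for this sketch.
    return "".join(letters).capitalize()


# Illustrative stand-in for the name coding; NOT the author's table.
NAME_CODING = {"Maria": "Mary", "Maryann": "Mary", "Marie": "Mary"}


def standardize(raw):
    """Truncate a recorded given name, then apply the coding table."""
    short = truncate_given_name(raw)
    return NAME_CODING.get(short, short)


for recorded in ["Elizabeth", "Mary A", "Mary-Ann", "Wm.", "MARYANN"]:
    print(recorded, "->", standardize(recorded))
```

Under this rule “Elizabeth” becomes “Elizabet”, which is why some names in the tables appear cut to eight letters.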
For name samples comprising between 1,000 and 10,000 names, coding inconsistencies appear to be similar in magnitude to sampling variability. Table 1 shows sampling variability for a single name, given different probabilities for the name in the population and different sample sizes. Sampling variability is likely to be insignificant in modern name samples that can easily comprise over a million names. Medieval name samples, however, are often limited to 1,000 names or fewer. For such sample sizes, sampling variability can easily account for a percentage point difference in a name frequency statistic. The importance of coding depends on the particular name, time, place, and recording process. Table 2 shows name variants coded to “Mary” for England/Wales and US name samples in different periods. Clearly coding matters, but the nature of coding errors and inconsistencies is more speculative. Experience with different name samples from the same population suggests that coding variability can be reduced to less than half a percentage point for the frequency of a single name and less than three percentage points for total frequency of the top ten names.[41]
Table 1: Sampling Variability for Name Popularity

| Name Probability | Sample Size | Expected Name Freq. | Standard Deviation | Std. Dev. (% of sample) |
| --- | --- | --- | --- | --- |
| 20.0% | 100 | 20 | 4 | 4.0% |
| 3.0% | 100 | 3 | 2 | 1.7% |
| 1.5% | 100 | 2 | 1 | 1.2% |
| 20.0% | 1,000 | 200 | 13 | 1.3% |
| 3.0% | 1,000 | 30 | 5 | 0.5% |
| 1.5% | 1,000 | 15 | 4 | 0.4% |
| 20.0% | 10,000 | 2,000 | 40 | 0.4% |
| 3.0% | 10,000 | 300 | 17 | 0.2% |
| 1.5% | 10,000 | 150 | 12 | 0.1% |
| 20.0% | 100,000 | 20,000 | 126 | 0.1% |
| 3.0% | 100,000 | 3,000 | 54 | 0.1% |
| 1.5% | 100,000 | 1,500 | 38 | 0.0% |
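The entries in Table 1 are consistent with the standard deviation of a binomial count, sqrt(n p (1 - p)), where p is the name probability and n the sample size. The sketch below recomputes them under that assumption; it ignores any finite-population correction, and the function name is introduced here for illustration.

```python
import math


def name_count_sd(p, n):
    """Standard deviation of the number of persons bearing a name
    with population probability p in a binomial sample of size n."""
    return math.sqrt(n * p * (1 - p))


for n in (100, 1_000, 10_000, 100_000):
    for p in (0.20, 0.03, 0.015):
        sd = name_count_sd(p, n)
        print(f"p={p:.1%}  n={n:>7,}  expected={n * p:,.1f}  "
              f"sd={sd:,.1f}  sd as share of sample={sd / n:.1%}")
```

For a sample of 1,000 names and a name probability of 20%, the standard deviation is about 13 names, or 1.3% of the sample, which is the basis for the statement that sampling variability can account for a percentage point difference in medieval samples.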
Table 2: Names Coded to Mary

| US Years | US Name | US Popularity | England/Wales Year | England/Wales Name | England/Wales Popularity |
| --- | --- | --- | --- | --- | --- |
| 1810-1819 | Mary | 7.6% | 1820 | Mary | 18.1% |
| | Mary A | 1.8% | | Maria | 1.9% |
| | Maria | 1.1% | | Maryann | 0.1% |
| 1900-1910 | Mary | 5.6% | 1900 | Mary | 3.9% |
| | Marie | 1.3% | | Marion | 0.3% |
| | Marion | 0.6% | | Maria | 0.3% |
| 1990-1999 | Mary | 0.5% | 1975 | Marie | 0.6% |
| | Maria | 0.5% | | Maria | 0.2% |
| | Marissa | 0.3% | | Mary | 0.1% |

Note: For sources for all the name statistics in this paper, see Appendix D and References.
C. An Important Empirical Regularity
For names occurring sufficiently frequently, name frequencies follow a power law. This means that, to a good approximation, name frequency is log-linearly related to frequency rank. Chart 1 shows on logarithmic scales the relationship between name frequency and frequency rank for females born in the US in 1831-40 and in 1990-99. While some concavity is evident, in each case a line provides a high goodness of fit.[42]

This empirical regularity is important for several reasons. First, it highlights an order associated with naming that is potentially amenable to explanation.[43] Second, it provides a basis for describing changes in naming over time. As Chart 1 shows, the slope and position of the line describing the relationship between log-rank and log-frequency has changed significantly from 1831-40 to 1990-99. Changes in these parameters have taken a relatively smooth path that can be summarized simply. Third, a variety of other phenomena, such as word frequencies, city sizes, income distribution, and the proportion of rock surface area that barnacles, mussels and other organisms occupy in an intertidal zone, follow power laws.[44] Through this common regularity, evidence and insights regarding other phenomena can be related to naming, and insights from the study of naming gain more general significance.
Power laws are in fact prevalent in the information economy. Where persons and organizations are free to create and choose among many collections of symbols instantiated and used in a similar way, the relative popularity of the symbolic artifacts typically follows a power law.[45] Thus the circulation of magazines of similar type has followed power laws throughout the twentieth century.[46] The total box office receipts of movies follow a power law.[47] The popularity of musical groups, as measured by “gold records,” follows a power law.[48] The popularity of Internet web sites, measured in users or page views, also follows a power law.[49] Insights into the evolution of such power laws over time can provide insights into personal preferences, media diversity, and industry structure in the information economy.
II. A Statistical History of Personalization
Mass media create shared symbolic experiences by producing and distributing common packages of symbols to large numbers of persons. As little as one and a half centuries ago, sharing symbols was largely a matter of decentralized, peer-to-peer diffusion, performances, public meetings, monuments, and other special-purpose artifacts. In contrast, in many countries today, through mass media millions of persons regularly experience exactly the same presentations of sports, news, songs, and dramatic stories.
Concern about the role of mass media in shaping shared experience has been commonplace. As early as the mid-1940s observers warned that applying industrial technology and organization to symbol production and distribution was producing a “ruthless unity,” “the same stamp on everything,” a world in which “[t]he might of industrial society is lodged in men’s minds,” and “[r]eal life is becoming indistinguishable from the movies.”[50] By the early 1990s, the same assumptions about the facts prevailed, but a sense of nostalgia had developed, at least among some:
For forty years we were one nation indivisible, under television. That’s ending. Television is turning into something else, and so are we. We’re different. We’re splintered. We’re not as much ‘we’ as the ‘we’ we were. We’re divisible.[51]
Many policy analysts and policy makers have considered mass media necessary both to promote diversity and to encourage national unity, and they have balanced according to current needs these important social and cultural values.[52]
New computing and communications technologies may significantly affect the extent of shared experiences. For most persons, purchasing goods and services is a significant shared experience; in the US, retail chains such as Wal-Mart, CVS, and 7-Eleven are real icons of consumer life.[53] Some have argued that e-commerce and associated personalization technologies will radically reshape retailing.[54] Particularly in societies in which a common experience of continually increasing material prosperity is an important political ideal, this change in shared experiences might present risks of political fragmentation and polarization. New technologies are also expanding opportunities to personalize education, entertainment, news, and other forms of digital content.[55] These new opportunities might lead to a reduction in shared symbolic experiences, less exposure to diverse views and topics, more social fragmentation, and more group polarization.[56]
Discriminating between the possible and the likely is worth attempting. Given the vast opportunities for personalization in the information economy, a fundamental issue is whether opportunities for personal choices will lead to similar choices or diverse choices. Similar choices might be produced from common, primal attractions such as sex, violence, and truth, from bandwagon, fashion, or tipping effects, or from social structures and institutions that homogenize habits and preferences. Diverse choices might express the uniqueness of each person, the requirements and processes of innovation and creativity, or social forces promoting differentiation and individualism. Carefully interpreted facts with respect to aggregate symbolic choices could offer important insights into the potential social and economic significance of expanding technological possibilities for symbolic personalization.
A. Changes in Name Popularity
The popularity of the most popular given name provides an informative indicator of shared symbolic experiences. In both England/Wales and the US early in the nineteenth century, the most popular names were highly popular. Table 3 shows that in England/Wales in 1800, 23.9% of females were named Mary, the most common female name. Since living siblings almost never bore the same given name and the average fecund marriage produced 3.28 recognized daughters, the share of married women who had a daughter named Mary was higher, probably about 30%.[57] This represents a high degree of social consensus about an important symbol.
That the name Mary would generate such consensus is particularly remarkable given the bitter split between the Church of England and the Roman Catholic Church. Roman Catholicism highly venerates Mary, the mother of Jesus. Anti-Catholicism in England since the mid-sixteenth century has included contempt for Catholic veneration of Mary. Catholics were associated with irrational and idolatrous religious representation in which the name Mary figured highly:
A Papist is an Idolater, who worships Images, Pictures, Stocks and Stones, the Works of Men’s Hands; calls upon the Virgin Mary [distinctive typeface in original], Saints and Angels to pray for them…[58]
Yet, judging from names, there must have been something about Mary for ordinary English persons early in the nineteenth century.[59]
Table 3: Most Popular Names in England/Wales

| Birth Year | Females: Top Name | Females: Top Name Pop. | Females: Top 10 Pop. | Females: Top 10 Info Is | Males: Top Name | Males: Top Name Pop. | Males: Top 10 Pop. | Males: Top 10 Info Is |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1800 | Mary | 23.9% | 82.0% | 0.511 | John | 21.5% | 84.7% | 0.356 |
| 1810 | Mary | 22.2% | 79.4% | 0.465 | John | 19.0% | 81.4% | 0.299 |
| 1820 | Mary | 20.4% | 76.5% | 0.433 | John | 17.8% | 80.4% | 0.274 |
| 1830 | Mary | 19.6% | 75.8% | 0.372 | John | 16.4% | 78.2% | 0.244 |
| 1840 | Mary | 18.7% | 75.0% | 0.333 | William | 15.4% | 76.0% | 0.231 |
| 1850 | Mary | 18.0% | 72.1% | 0.315 | William | 15.2% | 73.8% | 0.220 |
| 1860 | Mary | 16.3% | 68.3% | 0.265 | William | 14.5% | 69.8% | 0.209 |
| 1870 | Mary | 13.3% | 61.1% | 0.193 | William | 13.1% | 63.5% | 0.173 |
| 1880 | Mary | 10.6% | 53.8% | 0.116 | William | 11.7% | 58.9% | 0.144 |
| 1900 | Elizabet | 7.2% | 38.5% | 0.079 | William | 9.0% | ||