What the Unsolved Mystery of Language Can Teach Market Researchers

This is a guest post by Chris Martin at Market Research company FlexMR. With a relentless focus on consumer experience, Chris is skilled in managing online communications. Combined with an in-depth knowledge of the digital era and a sharp analytical mind, he creatively develops the FlexMR brand in accordance with a constantly evolving industry. 


Market research relies on language – it forms the basis of mutual understanding between researcher and participant. But, more than that, it provides meaning to qualitative research. Well-expressed emotion can be the driver behind business direction. Language has a direct influence over us as individuals, and we often believe it to be so complex that it is entirely unpredictable. But, in reality, language is much more predictable (and even mathematical) than we would care to admit.

The Great Zipfian Mystery

Perhaps the prime example of this is what is known as the Zipfian distribution. This mathematical principle was popularised by the linguist George Zipf, who noticed a pattern that to this day has yet to be fully explained. His principle states that in any sufficiently large corpus, the most frequently used word will appear roughly twice as often as the second most frequently used word, three times as often as the third, and so on. Plotted on a graph, this forms a power law distribution, where the frequency of any word is inversely proportional to its rank in the frequency table.

In its simplest form, this means that in a book of 1,000 words, if the word ‘the’ were to appear 100 times, we could estimate that the word ‘a’ appears about 50 times and the word ‘of’ about 33 times (assuming that these are the three most frequently used words). By knowing the rank of any given word, and the number of times the most frequently used word appears in a text, it becomes possible to estimate how many times that word will occur.
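The arithmetic in the 1,000-word book example can be sketched in a few lines of Python. This is a minimal illustration only: the function name zipf_estimate is hypothetical, and it assumes the idealised form of the law, in which frequency is exactly proportional to 1 / rank. Real texts follow this only approximately.

```python
def zipf_estimate(top_count: int, rank: int) -> int:
    """Estimate how often the word at `rank` occurs.

    Assumption: an idealised Zipf's law with exponent 1, so the
    expected count is the top word's count divided by the rank.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return round(top_count / rank)


# The three words from the example, in assumed rank order.
for rank, word in enumerate(["the", "a", "of"], start=1):
    print(word, zipf_estimate(100, rank))
```

With a top count of 100, the estimates for ranks 1, 2 and 3 come out at 100, 50 and 33, matching the figures in the example.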

Zipf and Pareto

So, what’s the great mystery? What makes this law unique is that it can be applied to any text – whether it is a Shakespeare classic, or this article itself. Not only that, but Zipf’s law can be applied to all languages. It seems programmed within us to use language in this fashion, whether we are aware of it or not. Perhaps even more interesting is that, as the concept has been studied further, it has been found to hold for much more than language, with applications that range from the population of cities and the size of organisations to website visitor numbers and more.

In fact, Zipf’s law is closely related to the Pareto principle: both describe power law distributions. The now-famous Pareto principle began life as an economic concept that described a similarly skewed relationship between income and population, in which a small share of the population holds a large share of the wealth. Of course, as adoption of the term has become more widespread, the original idea has filtered down into everyday use. The popular business mantra ‘20% of your customers create 80% of your revenue’ is, in essence, a simplified application of the Pareto principle.

This inverse relationship between frequency and rank appears throughout the natural world, and even more conspicuously throughout human behaviour. Yet there is no agreed explanation for it. One argument states that it could be derived from the laws of probability. In language, it is reasoned that if you typed randomly, the pattern would still occur, because a two- or three-letter word is so much shorter, and therefore more likely to appear, than a longer word. But this explanation does not take into account how language is formed. It is not random, but based on the words and ideas that came before.

We might never be able to explain why this occurs – whether it is due to statistical happenstance, or biologically wired into us as a species. But knowing that the Zipf and Pareto principles exist is useful to market researchers, and there are a number of lessons we can take from them.

Applications in Market Research

First and foremost, these principles are evidence that human (and consumer) behaviour is more predictable than we would often care to admit. Statistically, our consumption habits are likely to show an inverse relationship between frequency and rank. With enough research, and in the correct markets, it may even be possible to create predictive algorithms that project sales forecasts based on an organisation’s position within an industry and the frequency of goods sold. In essence, it is these kinds of predictions and models that form the basis of economics. The larger the organisation, and the more data it has, the better it is able to predict the future of the market (and its own lifespan).

In a more qualitative field, such principles can be applied to language. To quote the 80/20 rule – 80% of the insight can be gathered from 20% of the data. Filtering through this data to find the insights is the key. The most commonly used words in the English language are: the, of, and, to, a, in, I, that, it and for. Once these function words (articles, prepositions, conjunctions and pronouns) are set aside, very little substance is left within the top 20% of any text. But it is that remaining substance which is most valuable: it describes both emotion and cognitive reasoning with great accuracy.
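As a rough sketch of that filtering step, the ten words listed above can be treated as a small stop list and removed before counting what remains. The function name content_word_counts and the sample response are hypothetical, made up purely for illustration. A real project would likely use a much longer stop list and more careful tokenisation.

```python
import re
from collections import Counter

# The ten most common English words named in the text, used here as
# an illustrative (and deliberately short) stop list.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "i", "that", "it", "for"}


def content_word_counts(text: str) -> Counter:
    """Count the words in `text` that are not in STOP_WORDS."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Hypothetical participant response, invented for this example.
sample = "I love the design of the app, and I love that it is fast."
counts = content_word_counts(sample)
print(counts.most_common(3))
```

In this made-up response, the word ‘love’ rises to the top once the function words are removed, which is the kind of emotional signal a researcher would be looking for.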

But aside from the (relatively limited) technical applications of Zipf’s law, the most important thing it can tell us as researchers is that people are predictable. We are creatures of habit and our actions can be predicted with surprising accuracy, whether it is which website we will visit, which city we are most likely to live in or even which products we are likely to buy. It should give us confidence that the insights we gather on individuals will be reflected across the population.

The more times we discover a connection between a brand and an emotion, the more likely it is to hold true. It provides us with a roadmap to success, allowing us to identify how brands are performing against their closest competitors and to predict our place in consumer preferences. Knowing this information, and using qualitative research to complement it with the reasons why this is the case, will help build a strategy for the future. Do you think these principles can be used in market research? Let us know in the comments below and join the discussion.


Read more about FlexMR's industry-leading market research software. Or, to book a demo of any of their 16 flexible qual and quant tools, contact one of their experts.

Tags: Linguistics, Qualitative Research
