A Trade-off Between Simplicity & The Reality

Despite originating in medieval philosophy, the law of parsimony, famously known as Ockham's Razor, remains practical in the modern era of AI and the pursuit of artificial general intelligence. Ockham's Razor asks us to cut away everything unnecessary when trying to understand a system, so that complexity is reduced. This idea is part of what makes ML algorithms efficient. But the tool of parsimony has its limitations too: it can paint a picture that looks objective yet misrepresents reality, and it can be used to twist facts.
People often miss the point of parsimony, which is to make a realistic attempt to check how and why our understanding of things differs from their real nature, and to ask how we can close the gap between what we theorize, what we can test, and what actually exists.
Context thus plays a very important role in every pursuit of knowledge, even knowledge of the self. It is important to understand the boundary conditions of our knowledge. One should know where one's beliefs (even the true ones) are limited, can be challenged, or are difficult to prove. That is why parsimony, in any pursuit of knowledge, needs to be handled with utmost care while studying the real nature of things.

Medieval Idea of Ockham’s Razor For The Modern World

Craving for Simplicity

One of the key driving factors for humans is the urge to understand completely how things work. Originally, the reason was to maximize the chances of survival. In modern times those odds have become much better, and the urge to understand how things work has evolved into improving the quality of that survival, of our existence.

The key steps in the quest to understand everything out there can be summarized as follows:

  1. There is some unexpected event which causes pain, suffering, or loss. (It can be the opposite too, like extremely favorable growth, happiness, or high gains, but the human tendency is to be more concerned about uncertain losses.)

Curiosity emerges from the urge to control everything that can be controlled, to identify what cannot be controlled, and then to work towards controlling the uncontrollable by understanding it in depth.

This is how we try to assign meaning to life, to our very existence.

  2. Then we try to observe similar events associated with such experiences and record them. We try to recreate them until we have clarity on the factors responsible for them. We repeat the experiments until we have a proper theoretical understanding or a concrete reasoning behind such events.
  3. The key factor for that reasoning to be accepted as practical is its consistency with other unconnected or remotely connected events; there is some "universality" in that reason or theory.

This is roughly how we try to understand existence. If one asks why we are always on this quest, the answers are diverse.

The simple answer, I think, is that our brain prefers simplicity so that it can spend the saved energy on maximizing its chances of survival. Our brain hates complexity, because once complexity is accepted, uncertainty has to be accepted too, and then the brain would have to invest its energy in things that may never materialize but could, because of their non-zero probability.

Our brain craves certainty of survival.

This preference for simplicity might not reflect the nature of the reality in which the brain exists and tries to survive, but if it maximizes the chances of its existence, it is pretty much the best strategy.

In epistemology – the philosophy, or theory, of knowledge – this trait is investigated in depth. We will look at one dimension of this thought, known popularly as the law of parsimony and even more famously as Ockham's Razor.

William of Ockham and Ockham’s Razor

William of Ockham was an important medieval philosopher and theologian who brought the law of parsimony into focus, although the idea had already existed since Aristotle.

Aristotle believed that nature always works in the most efficient ways possible, and thus the explanations for events in nature ought to be the efficient ones.

Although medieval, Ockham's Razor remains a crucial idea in the age of Artificial Intelligence and Machine Learning.

Ockham's Razor emerges from his work "Summa Totius Logicae", exactly as:

“Pluralitas non est ponenda sine necessitate” meaning “Plurality should not be posited without necessity”.

In modern science and philosophy, the idea "simply" goes like this:

“Do not mix unnecessary things”

OR

“All things being equal the simplest solution is the best.”

Consequences of Ockham’s Razor

The principle of parsimony (lex parsimoniae in Latin), and thereby Ockham's Razor, helps us avoid complicating the things we are investigating. It is used as a rule of thumb, or heuristic, for generating theories with better predictability. The moment we say the preference should go to 'the simpler theory with better predictability' is the moment most people misinterpret Ockham's Razor: the razor implies chopping off everything unnecessary, everything that, if kept, would add complexity and thereby compromise predictability. We will see how Ockham's Razor affects us both positively and negatively when we try to understand the things around us.

Good consequences:

Search for the theory of everything

Aristotle's belief that nature always chooses the most efficient route reinforced the idea that theories explaining nature are best when they involve the fewest possible variables.

Einstein's theory of relativity and the equation connecting energy to mass, E = mc², are the best examples of this. An elegant equation barely an inch long encompasses some of the biggest secrets of the universe.

The theory of relativity is elegant in that it covers Newton's understanding of motion and gravity and extends it to the understanding of black holes, where Newton's theory becomes limited.

Quantum mechanics explains everything that atoms can create. It also explains why the earlier models of the atom were conceived the way they were (the atom as a solid sphere, as a plum pudding, as a nucleus at the center with electrons in orbits around it).

Quantum mechanics is thus the most efficient way to explain what we observed and why we interpreted those observations in a particular way. Note that the goal is not to falsify something or prove it wrong; the goal of knowledge, of science, is to understand why we theorized something the wrong way and why it does not align with the reality we are trying to observe and understand.

Efficient Machine Learning Models – Generalization Error

In the age of AI, the efficiency of machine learning algorithms is one crucial factor in deciding the investments made to evolve them further. The key goal of any machine learning algorithm is to create a mathematical equation (or layers of mathematical equations) that understands the data provided, makes sense of it, and then predicts outcomes.

This sounds simple in theory, but the real-life data one feeds to ML algorithms is filled with all kinds of noise – unwanted, unconnected, irrelevant data points.

If the ML algorithm tries to fit the noise too, it adds too many variables to its mathematical equations. The model now fits each and every data point, but at the same time it loses the ability to predict outcomes confidently, because the noise is not really connected to the system one is trying to describe.

That is why a complex ML algorithm fitting all the data points (R² = 1) is only an idealized situation – idealized meaning practically useless, because the model has been exposed to a very limited dataset. A good ML algorithm instead has a "generalized" idea of the data points it was not trained on. It captures what is happening in the dataset with the fewest possible equations, so that it can anticipate what could happen outside its training dataset (Q² – the algorithm's ability to predict unseen data – should be as high as possible). The L1 and L2 regularization techniques used in ML are examples of this. The algorithm is no longer just interpolating proportionally between the points; it has its own mathematical justification for deciding whether and how aggressively to interpolate, in order to predict a realistic outcome.
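To make this concrete, here is a minimal sketch (not the exact setup discussed above): it fits a deliberately over-parameterized polynomial to noisy synthetic data with and without an L2 (ridge) penalty, using scikit-learn. The data, the polynomial degree, and the alpha value are all invented for illustration.

```python
# A minimal sketch: L2 (ridge) regularization trades a near-perfect training fit
# for better generalization. Data and hyperparameters are illustrative only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))[:, None]
y_train = np.sin(2 * np.pi * x_train[:, 0]) + rng.normal(0, 0.2, 20)  # signal + noise
x_test = np.linspace(0, 1, 200)[:, None]
y_test = np.sin(2 * np.pi * x_test[:, 0])  # the noise-free "reality" we want to predict

# A degree-15 polynomial can chase almost every noisy training point...
overfit = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(x_train, y_train)
# ...while the same polynomial with an L2 penalty keeps its coefficients small.
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1e-3)).fit(x_train, y_train)

print("train R^2  (no penalty):", overfit.score(x_train, y_train))
print("unseen R^2 (no penalty):", overfit.score(x_test, y_test))
print("unseen R^2 (L2 penalty):", ridge.score(x_test, y_test))
```

The penalized model typically scores lower on the training points but noticeably higher on the unseen ones, which is exactly the trade-off described above.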

Ockham's Razor thus proves to be important in the age of AI for selecting efficient algorithms, and efficient algorithms ensure efficient use of power and resources, and thereby of investments.

Parsimony in Psychology – Morgan’s Canon

In very simple words, I would use three words to explain what this means: "Life of Pi".

The movie Life of Pi has a moment when Pi's father tells him that the emotions Pi sees in the eyes of the tiger, Richard Parker, are merely a reflection of how Pi imagines the tiger must be feeling – hungry, in that specific case.

In animal psychology (comparative psychology) research, Morgan's Canon asks scientists not to over-attribute human qualities to animals without a concrete basis.

“In no case is an animal activity to be interpreted in terms of higher psychological processes if it can be fairly interpreted in terms of processes which stand lower in the scale of psychological evolution and development.”

The scene from Life of Pi strongly resonates with Morgan's Canon.

There is a reason why Morgan established this idea. We humans have a tendency to see the human form in things that are not even human – this is anthropomorphism. While studying animals, these anthropomorphic tendencies would mislead every study, because other animals and humans share many things in common. Unless there is strong evidence to justify attributing human-like intelligent behavior, the simplest explanation should be selected to explain the behavior of the animal in such psychological studies.

These are some of the examples where Ockham’s razor proves to be very valuable.

Bad Consequences (limitations of Ockham’s Razor)

There is another side to simplifying things; we will now see how people misinterpret the principle of parsimony and thereby Ockham's Razor.

Universe might prefer complexity to exist

In the pursuit of the theory of everything, Einstein himself was troubled: how could God play dice? How can one bridge the gap between the theory of relativity and quantum mechanics, when one explains the heavenly objects and the other explains what lies at the very bottom of the particles that make up the universe?

One then realizes that there is more to consider than what the current theories use, if reality is to be explained in a better way.

One reason why Einstein was a genius for all time is that he knew something was missing in his theory. He was not ashamed of the complexity the theory of everything might carry. Even while speaking about his elegant theory of relativity, Einstein held this opinion:

Artificial General Intelligence (AGI)

Anyone actually working in the field of AI will explain to you how difficult it is to create Artificial General Intelligence (AGI). Even though we have some of the greatest chatbots, AI assistants, and AI agents, they are experts only at executing specific tasks. They can quickly become biased, be fooled easily, and be bypassed easily.

The key reasons behind these shortcomings are many. AI tools perform best when they are designed for specific tasks; they lack common sense the way humans have it; they lack the emotional dimension in decision making (one of the important aspects of how humans generalize their understanding of their surroundings); and they cannot directly build bridges between their algorithms unless enough data is provided. AI does not have the intuition humans have developed over thousands of years of natural evolution.

It is also important to understand how greatly we underestimate the computational and decision-making capability of our brains, and how much power it takes to replicate the same in machines.

So maybe complexity is a prerequisite for AGI, and hence the enormous resources that will be required to achieve it.

Human-like Intelligence in Animals

The story of Koko and Robin Williams is a good example here. Koko, a female gorilla, was trained in American Sign Language (ASL) by Francine "Penny" Patterson. Penny called this language Gorilla Sign Language (GSL).

Penny with Koko

There is a very famous video of the meeting between the actor Robin Williams and Koko. Soon after the death of her gorilla friend Michael, Koko met Robin Williams; she laughed for the first time in a long while, played with him, and even recognized Robin from his movie cassette cover.

Robin Williams having fun with Koko

When Koko's instructors told her about the death of Robin Williams, she expressed her grief by signing to them as if asking whether she could cry, her lips trembling. See the emotional depth she showed, just as humans do.

Dolphins are another good example of human-like intelligence in animals.

Does this mean that Ockham's Razor, the principle of parsimony, and Morgan's Canon are of no use? What is happening here? What goes missing during oversimplification? What are we misunderstanding?

What goes missing in simplification?

The main problem with Ockham's Razor, or any of its equivalent philosophies, is the convenience it brings. Just as you can "prove" something wrong that is in reality right by collecting biased data, people have misinterpreted the principle of parsimony in the same way.

The key reason William of Ockham supported the principle of parsimony is that he was a nominalist. Nominalism says that there is nothing truly common between the things that exist in reality. Everything has its own individual nature, and what we see as common across many things is just a 'name' given to them. The red we see in blood and in a rose is just the name of the color; there is no "red" that actually exists on its own.

This means that for the color we see in things, there is no such thing as color in any absolute sense; it is just a signal our eyes generate to tell the brain the difference between the light absorbed and the light reflected, or the temperature of the surface of the object.

So William of Ockham argued that since everything has its own individual attributes, when you are trying to create a philosophy for a group of things, you should consider only those individual attributes which are necessary to build the theory.

(William of Ockham himself drifted away from his own ideas of Parsimony and Nominalism; I will discuss that specifically in the Philosophy of Nominalism next time.)

What people still get wrong today when they talk about Ockham's Razor is reading it as "select the simplest explanation". That is not what he actually meant.

The story is the same with Morgan's Canon. Morgan's main intent was to demand concrete justification when someone explains animal behavior as human-like. His idea was that conclusions should be based on reasoning, not on the preconceived impression that the animals in the study possessed that specific type of intelligence. The point was to observe without preconditioning, prejudice, impression, or expectation.

I have already explained how Einstein was a genius: he was very well aware that, in creating his elegant understanding of the universe, he might have missed something at the expense of simplification.

The Standard Model of particle physics, written out mathematically, looks like this (maybe sometime in the future I will be able to appreciate and explain it to its core):

Context is everything

Now you will be able to appreciate why Ockham's Razor is a tool and not the final truth. People exploit Ockham's Razor to demonstrate their philosophical grandeur and, consciously or sometimes unconsciously, bend its meaning in their favor.

What people ignore is the purpose of chopping off unnecessary parts in the process of developing an understanding, a philosophy, or a theory. The goal was never to simplify things; the goal was to remove the things that would interfere with testing our hypotheses.

People often miss the point of parsimony: it is a realistic attempt to check how and why our understanding of things differs from their real nature, and to close the gap between what we theorize, what we can test, and what actually exists.

Context thus plays a very important role in every pursuit of knowledge, even in knowledge of the self. It is important to understand the boundary conditions of our knowledge. One should know where one's beliefs (even the true ones) are limited, can be challenged, or are difficult to prove, because what we know is just a drop and what we cannot know is an ocean.

I think what people miss in simplification, in parsimony, is the context – and context varies from situation to situation.

Scientifically, Newton's laws of gravitation pose no problem when we are talking about our solar system. In fact, they are so accurate that modern space missions still rely on them. There is rarely any need to invoke the science of black holes in most such missions.

The context is the precision of deciding the trajectory of objects in solar system.

But when it comes to the Global Positioning System (GPS), the theory of relativity becomes important. The bending of spacetime due to Earth's mass, the resulting difference in how fast time runs for the navigation satellites compared to clocks on the ground, and the corresponding adjustments to the atomic clocks at these two locations matter a lot. Newton's laws cannot explain that.
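To put rough numbers on this, here is a back-of-the-envelope sketch using textbook constants and approximate GPS orbit values (illustrative figures, not navigation-grade): the satellite's motion slows its clock by about 7 microseconds per day, the weaker gravity at orbit speeds it up by roughly 46, and the net drift of about 38 microseconds per day is what GPS has to correct for.

```python
# Back-of-the-envelope sketch of the relativistic clock drift of a GPS satellite.
# Constants and orbit values are approximate, for illustration only.
G_M_EARTH = 3.986e14      # gravitational parameter of Earth, m^3/s^2
C = 2.998e8               # speed of light, m/s
R_EARTH = 6.371e6         # Earth's radius, m (clock on the ground)
R_SAT = 2.657e7           # GPS orbital radius, m (~20,200 km altitude)
SECONDS_PER_DAY = 86400

v_sat = (G_M_EARTH / R_SAT) ** 0.5            # orbital speed, roughly 3.9 km/s

# Special relativity: the moving satellite clock runs slow by ~v^2 / (2 c^2).
sr_drift = -(v_sat ** 2) / (2 * C ** 2) * SECONDS_PER_DAY

# General relativity: weaker gravity at orbit makes the satellite clock run fast.
gr_drift = (G_M_EARTH / C ** 2) * (1 / R_EARTH - 1 / R_SAT) * SECONDS_PER_DAY

print(f"special relativity : {sr_drift * 1e6:+.1f} microseconds/day")
print(f"general relativity : {gr_drift * 1e6:+.1f} microseconds/day")
print(f"net drift          : {(sr_drift + gr_drift) * 1e6:+.1f} microseconds/day")
```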

The context is how precisely time can be measured and how the difference in time can be converted into knowledge of an object's position around the globe.

It is easy to demonstrate how Ockham's Razor remains important in the scientific community and how scientists are aware of its limitations.

It becomes problematic when we try to understand and justify life with it.

The problem is that we get to decide the context (most of the time)

Call it a blessing that the scientific community is always in a state of self-renewal because it relies on objective evidence, but it is still not immune to missing context or to wishful context. (Falsified, biased scientific studies published to create confusion are the best example of that.)

The best example of losing context while still appearing objective is the debates on news channels, or (sadly) any debate that runs on popularity. You soon realize that the context of most such debates is to entertain people and create controversies, not to find the ultimate truth or the facts.

At the very opening of this discussion, I explained how our brains try to optimize processing to save energy for tasks that better guarantee survival. The death of our own beliefs, of our identity, is also a failure to survive. Psychological or ideological death is equal to actual death, and maybe more painful than real death for almost all of us. Religion is one stream of such ideologies, where people are ready to die physically just so that the religious beliefs they live for remain alive.

Most people are scared of mathematics not because it is too complicated; they fear math because it shows them the vulnerabilities in their step-wise thinking. The same people can be expert dancers, master artists, or master instrumentalists, pursuits which involve rather more complicated mathematical manipulation – music, put simply, is the manipulation of certain sound waveforms. Music theory, harmony, color theory, and the physiological manipulation of the body through rhythm and sound are all deeply mathematical. It is just that we do not want to remain in a state of vulnerability for long; it is the equivalent of exposing our cover to the enemy and thereby reducing our chances of survival.

The thing is that nature's tendency to choose the path of least resistance is reflected in our own nature too, which is why simplification and Ockham's Razor seem attractive. But at the same time, we forget that it is the same nature whose deliberate and continuous actions against adversity made us who we are and made impossible things possible for us.

Daniel Kahneman explained the two cognitive systems our brain uses in his book Thinking, Fast and Slow.

System 1 is fast and intuitive, good for repetitive tasks but bad at catching biases and errors, and hostile to new and complicated scenarios.

System 2 is slow and deliberate, suited for analytical and reasoning-based tasks, but it is not efficient for routine tasks.

The people who exploit Ockham's Razor (even William of Ockham himself! – that story will show up in the post on nominalism!) oversimplify things because it justifies the beliefs they already hold. Those beliefs will stand some limited tests, but the moment they are exposed to universal tests they fail. That is how religions, sects, and faiths operate when they blind people to the real truths. I am not saying religion is bad; I am saying that the appearance of objectivity in religion can be used to show its "scientific" nature and still fool people. The same can happen in scientific communities; pseudo-scientific concepts are a great example of that.

Now you can see the problem. People want to create an understanding of their surroundings not because they really want to understand them, but because it will feed the beliefs they already have, and Ockham's Razor, the principle of parsimony, is a great tool to facilitate that. In the end, it is just a tool. How it shapes what gets created depends solely on the intent of the one who is using it.

That is exactly why, when you are questioning something, standing against something, or supporting something, you should ask yourself this one question:

Are you doing this for understanding the reality or to feed your own wishful picture of reality?

So, whenever you are trying to understand something, make sure that your context is to really understand the thing, and not to expect it to be the certain thing you wish it were. Remember, you are the controller of the context, and it is very easy to fool ourselves.

Further reading:

The Essence of Nominalism

Logarithmic Harmony in Natural Chaos

Mathematics is one powerful tool for making sense of randomness, but bear in mind that not every kind of randomness can be handled effectively with the mathematical tools we have at our disposal today. One such tool, Benford's Law, shows that nature tends to work in logarithmic rather than linear growth. Benford's Law helps us make sense of the natural randomness generated around us all the time. It is also one of the first-hand tools used by forensic accountants to detect possible financial fraud. It is a phenomenal part of mathematics that finds patterns in the sheer chaos of the randomness of our existence.

Benford’s Law for natural datasets and financial fraud detection

People can find patterns in all kinds of random events. It is called apophenia. It is the tendency we humans have to find meaning in disconnected information.

Dan Chaon, American Novelist

Is There Any Meaning in Randomness?

We all understand that life without numbers would be meaningless. Every single moment, gazillions upon gazillions of numbers are generated. Even as I type this and as you read it, some mathematical processing is happening in the bits of a computer to make it happen. If we tried to grasp the quantity of numbers being generated continuously, even a lifetime equal to the age of our universe (13.7 billion years) would fall short.

Mathematics can be described as the art of finding patterns based on a certain set of reasoning. You have certain observations which are always true, and you use these truths to establish bigger truths. Psychologically, we humans are tuned for pattern recognition; patterns bring predictability, and predictability brings safety, because knowing the future to some extent guarantees higher chances of survival. So a larger understanding of mathematics, in a way, ensures better chances of survival per se. This is an oversimplification, but you get the point.

From understanding the patterns in the cycles of day and night, of summer and winter, to the patterns in the movements of celestial bodies and the vibrations of atoms, we have had many breakthroughs in "pattern recognition". If one succeeds in developing structured, objective reasoning behind such patterns, then predicting the fate of any process that follows the pattern (now or in the future) becomes a piece of cake. The power to see patterns in randomness is thus a kind of superpower we humans possess. It is like a crude version of a mini time machine.

Randomness inherently means that it is difficult to make sense of a given situation; we cannot predict it effectively. Mathematics is one powerful tool for making sense of randomness, but bear in mind that not every kind of randomness can be handled effectively with the mathematical tools we have at our disposal today. Mathematics is still evolving and will continue to evolve, and there is no end to this evolution – we will never know everything there is to know. (This is not a feeling; it follows from Gödel's incompleteness theorems.)

You must also appreciate that to see the patterns in any given randomness, one needs to develop a totally different perspective. Once this perspective is developed, it no longer remains random. So every randomness is random only until we find a different perspective on it.

So, is there any way to gain a perspective on the gazillions of numbers generated around us during transactions, interactions, and transformations?

The answer is Yes! Definitely, there is a pattern in this randomness!!

Today we will be seeing that pattern in detail.

Natural Series – Real Life Data       

Take your bank account statement as an example. You will see all your transactions: debit amounts, credit amounts, the current balance in the account. There is no way to make sense of how those numbers were generated; the only logic behind them is that you paid someone a certain amount and someone paid you a certain amount, and the balance is the net of those transactions. You had some urgency one day, so you spent a certain amount; you once had a craving for a cake, so you bought that cake; you were rooting for a concert, so you paid for the ticket; on one bad day you faced an emergency and had to pay the bills to sort things out. Similarly, you did your job, so someone compensated you for those tasks; you kept some funds in deposits, so interest was paid to you; you sold some stocks, so that value was paid to you.

The reason for explaining this example in such detail is to clarify that even though you have control over your funds, you cannot actually control every penny in your account down to the exact number you desire. This is an example of a natural data series. Even though you have full control over your transactions, how your account turns out is driven by certain fundamental rules of debit, credit, and interest. The interactions of these accounting phenomena are so intertwined that it ultimately becomes difficult to predict the balance down to the last penny.

Rainfall around the Earth is very difficult to predict with high precision because of the many intermingling and unpredictable events in nature. So by default, finding a trend in the average rainfall over a given set of places is difficult. Yet deep down we know that if we understand certain things about rainfall in some regions, we can make better predictions about other regions, because certain fundamental, predictable laws govern rainfall.

The GDP of nations (if reported transparently) is also very difficult to pin down to an exact number; we always have an estimate, because many factors affect that final number. The same goes for population: we can predict how it will grow, but it is difficult to pinpoint the number.

These are all examples of real-life data points generated randomly during natural activities and natural transactions. We know the reasons behind these numbers, but because so many factors are involved, it is very difficult to find the pattern in this randomness.

I Lied – There is A Pattern in The Natural Randomness!

What if I told you that there is a certain trend, a reference, in the randomness of numbers generated "naturally"? Be cautious – I am not saying that I can predict the market trend of a certain stock. I am saying that the numbers generated in any natural process have a preference; the pattern is not predictive, it only reveals itself once you already have a bunch of data at hand – it is retrospective.

Even though it is retrospective, it can help us identify what was manipulated: whether someone tried to tamper with the natural flow of the process, whether there was a mechanical or instrumental bias in the data generation, or whether there was any human bias in the data generation.

Logarithm and Newcomb

Simon Newcomb (1835-1909), a Canadian-American astronomer, once noticed that his colleagues were using the initial pages of the logarithm table more than the other pages. The starting pages of the log tables were more soiled and worn than the later pages.

Simon Newcomb

Log tables were instrumental in number crunching before the invention of any kind of calculator. The entries in a log table start at 10 and end at 99.

Newcomb inferred that the people using log tables for their calculations had datasets whose numbers more often began with 1, which is why the initial pages, where the entries start with 1, were used more. He also knew that the numbers used in such astronomical calculations are numbers found in nature. They are not generated at random; they signify quantities attributed to things that exist in nature (like the diameter of a planet, the distance between stars, the intensity of light, the radius of curvature of a planet's orbit). They were not "cooked up" numbers; however random they seemed, they had a natural reason to exist.

He published an article about this, but it went unnoticed because there was no mathematical justification behind it. His publication lacked the mathematical rigor needed to back his intuition.

Newcomb wrote:

“That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first one wears out than the last ones.”   

On a superficial inquiry, anyone would feel that this observation was biased. It seemed counterintuitive, and Newcomb merely reported the observation without explaining in detail why it would happen. So the observation faded away with the passage of time.

Frank Benford and The Law of Anomalous Numbers

Question – for a big enough dataset, how frequently does each digit appear in the first place? What is the probability of each of the digits 1 to 9 being the leading digit in a given dataset?

Intuitively, one would think that any digit can end up in the leading position. If the dataset becomes large enough, all nine digits should have an equal chance of being in first place.

Frank Benford, during his tenure as a physicist at General Electric, made the same observation about the log table as Newcomb before him. But this time, Benford traced back the experiments, and hence the datasets, for which the log table had been used, along with some other datasets from magazines. He compiled some 20,000 data points from completely unrelated sources and found one unique pattern!

Frank Benford

He realized that even though our intuition says that any digit from 1 to 9 could appear as the leading digit with equal chance, "natural data" does not respect that equal chance. The term "natural data" refers to data representing a quantifiable attribute of a real phenomenon or object around us; it is not a random number created purposefully or mechanically – it has some origin in nature, however random it may seem.

Frank Benford thus discovered an anomaly in natural datasets: their leading digit is more often 1 or 2 than any of the remaining digits (3, 4, 5, 6, 7, 8, 9). In simple words, you will see 1 as the leading digit far more often in natural datasets than the other digits, and as we move up through the digits, the chance of finding them in the leading position keeps falling.

In simple words, any naturally occurring dataset will have 1 as its leading digit more frequently than the rest of the digits.

Here is a sample of the datasets Frank Benford used to find this pattern:

Dataset used by Frank Benford in his 1938 paper “The Law of Anomalous Numbers”

So, according to Benford's observations, for any given "natural dataset" the chance of 1 being the leading digit (the first digit of a number) is almost 30%. About 30% of the numbers in a given natural dataset will start with 1, and as we move up through the digits the chance of them appearing first drops drastically; very few numbers in a given natural dataset will start with 7, 8, or 9.

Thus, the statement of Benford’s law is given as:

The frequency of the first digit in a population of numbers decreases as the value of that first digit increases.

Simply put, as we go from 1 to 9 as the first digit in a given dataset, the probability of appearance keeps reducing.

1 will be the most frequent first digit, then 2 will be frequent but less so than 1, and the frequency keeps reducing and flattens out towards 9; 9 will rarely be seen as the leading digit.

The reason this behavior is called Benford's Law (and not Newcomb's Law) is the mathematical equation that Benford established:

P(d) = log10(1 + 1/d)

where P(d) is the probability that a number starts with digit d, and d can be any of 1, 2, 3, 4, 5, 6, 7, 8, or 9.
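As a quick sanity check, here is a minimal Python sketch that evaluates this formula for each digit; it assumes nothing beyond the standard library.

```python
# Theoretical Benford probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
import math

for d in range(1, 10):
    p = math.log10(1 + 1 / d)
    print(f"P({d}) = {p:.3f}  ({p * 100:.1f}%)")

# The nine probabilities sum to 1; P(1) is about 30.1% while P(9) is about 4.6%.
print("sum =", sum(math.log10(1 + 1 / d) for d in range(1, 10)))
```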

If you look at real-life examples, you will instantly realize how counterintuitive this law is, and yet nature chooses to follow it.

Here are some examples:

I have also attached an Excel sheet with the complete datasets to demonstrate how simply one can calculate and verify Benford's Law.
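For readers who prefer code to a spreadsheet, here is a hedged sketch of the same check; the helper function names are my own, and the lognormal sample at the end is only a placeholder for whatever natural dataset (populations, rainfall, GDP, and so on) you actually load.

```python
# A minimal sketch of the check the attached Excel sheet performs:
# count leading digits of a dataset and compare with Benford's prediction.
# Function names are my own; replace `data` with any natural dataset.
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """Return the first significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while 0 < x < 1:
        x *= 10
    return int(x)

def benford_table(data):
    counts = Counter(leading_digit(x) for x in data if x != 0)
    n = sum(counts.values())
    for d in range(1, 10):
        observed = counts[d] / n
        expected = math.log10(1 + 1 / d)
        print(f"digit {d}: observed {observed:.3f}  expected {expected:.3f}")

# Placeholder dataset spanning many orders of magnitude (swap in real data).
random.seed(1)
data = [random.lognormvariate(10, 3) for _ in range(5000)]
benford_table(data)
```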

Population of countries in the world –

The dataset contains the populations of 234 regions of the world, and you will see that 1 appears most often as the first digit. Most of the population figures start with 1 (70 times out of 234) and rarely with 9 (9 times out of 234).

Country-wise average precipitation –

The dataset contains the average rainfall of 146 countries in the world. Again, the same pattern emerges.

Country-wise Gross Domestic Product –

The dataset contains the GDP of 177 countries in USD. See the probabilities yourself:

Country-wise CO2 emissions:

The data contains 177 entries

Country-wise Covid cases:

Here is one more interesting example:

The quarterly revenue of Microsoft since its listing also shows the pattern of Benford's Law!

To generalize, we can find the trend across all these datasets by averaging, as follows:

This is exactly how Benford averaged his data points to establish a generalized equation.

The theoretical Benford fit is calculated using the Benford equation expressed earlier.

So here is the relationship graphically:

Now you will appreciate the beauty of Benford's Law: despite seeming counterintuitive, it shows how a seemingly random natural dataset has preferences.

Benford’s Law in Fraud Detection

In his 1938 paper "The Law of Anomalous Numbers", Frank Benford beautifully showed the pattern natural datasets prefer, but he did not identify any uses of the phenomenon.

1970 – Hal Varian, a professor in the University of California, Berkeley School of Information, explained that this law could be used to detect possible fraud in presented socioeconomic data.

Hal Varian

1988 – Ted Hill, an American mathematician, showed that people cannot make up numbers and still stick to Benford's Law.

Ted Hill

When people try to cook up numbers in big datasets, they betray biases towards particular digits; however random the entries may seem, there is a reflection of their preference for certain numbers. Forensic accountants are well aware of this fact.

The scene where Christian pinpoints the finance fraud [Warner Bros. – The Accountant (2016)]

1992 – Mark Nigrini, a South African chartered accountant, published in his thesis how Benford's Law could be used for fraud detection.

Mark Nigrini

Benford's Law is admissible as evidence to demonstrate accounting fraud in US courts at all levels, and it is also used internationally to prove financial fraud.

It is very important to point out the human, psychological factor of a person committing such a numbers fraud. People do not naturally assume that some digits occur more frequently while cooking up numbers. Even when we start generating random numbers in our heads, our subconscious preference for certain numbers creates a pattern. The larger the dataset, the more genuine data leans towards Benford's behavior, and the easier the fraud detection becomes.

Now, I pose one question here!

If a fraudster understands that there is such a thing as Benford's Law, wouldn't he cook up numbers which seem to follow Benford's Law? (Don't doubt my intentions; I am just like a cop thinking like a thief to anticipate his next move!)

So, the answer to this doubt is hopeful!

The data generated in account statements is so huge and spans so many orders of magnitude that it is very difficult for a human mind to cook up numbers artificially and evade detection.

Also, forensic accountants have shown that Benford's Law is a partially negative rule: if the law is not followed, it is possible the dataset was tampered with or manipulated; but conversely, if the dataset fits Benford's Law too exactly, too snugly, there is also a chance the data was tampered with – someone made sure the cooked-up data would fit Benford's Law to avoid suspicion!
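As an illustration of how such a first-pass screen might look (a sketch, not a forensic standard: the function names, the threshold, and the synthetic "ledger" are all my own), one can measure the chi-square distance between a ledger's observed leading-digit counts and Benford's expected counts.

```python
# Illustrative first-pass fraud screen: chi-square distance between the
# observed leading-digit counts of a ledger and Benford's expected counts.
# A large statistic is only a red flag that calls for deeper investigation.
import math
from collections import Counter

def first_digit(x):
    s = f"{abs(x):.15e}"        # scientific notation: first character is the leading digit
    return int(s[0])

def benford_chi_square(amounts):
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    n = sum(counts.values())
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2

# Chi-square with 8 degrees of freedom: values above ~15.5 look suspicious at the
# 5% level (standard-table critical value). The ledger below is a made-up placeholder.
ledger = [round(2 ** (i * 0.37) * 13.7, 2) for i in range(1, 800)]
print("chi-square statistic:", round(benford_chi_square(ledger), 2))
```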

Limitations of Benford’s Law

You must appreciate that nature has its own ways of preferring certain digits in its creations. Random numbers generated by a computer (say, uniform draws) do not follow Benford's Law, thereby revealing their artificiality.
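A small sketch of that point, with parameters invented for illustration: draw numbers uniformly at random and the nine possible leading digits come out roughly equally often, nothing like Benford's distribution.

```python
# Sketch: uniformly distributed random numbers (a typical computer RNG draw)
# do not follow Benford's Law -- every leading digit is roughly equally likely.
import math
import random
from collections import Counter

random.seed(3)
uniform_draws = [random.uniform(1, 999_999) for _ in range(50_000)]

counts = Counter(int(f"{x:.15e}"[0]) for x in uniform_draws)
n = len(uniform_draws)
for d in range(1, 10):
    print(f"digit {d}: observed {counts[d]/n:.3f}  Benford {math.log10(1 + 1/d):.3f}")
```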

Wherever there is a natural dataset, Benford's Law will tend to hold true.

1961 – Roger Pinkham established one important property of any natural dataset, and thereby of Benford's Law. Pinkham argued that any law describing the behavior of natural datasets must be independent of scale; that is, a law that captures nature's pattern must be scale-invariant.

In really simple words, if I change the units of a given natural dataset, Benford's Law will still hold. If Benford's Law holds for account transactions expressed in US Dollars, the same money expressed in Indian Rupees will still obey Benford's Law. Converting Dollars to Rupees is simply scaling the dataset. That is exactly why Benford's Law is so robust!
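Here is a minimal sketch of Pinkham's point; the lognormal "amounts" and the conversion factor are placeholders, not real financial data or a real exchange rate.

```python
# Sketch of scale invariance: multiplying a Benford-like dataset by a constant
# (a unit conversion) leaves its leading-digit frequencies essentially unchanged.
import math
import random
from collections import Counter

def first_digit(x):
    return int(f"{abs(x):.15e}"[0])

def digit_frequencies(data):
    counts = Counter(first_digit(x) for x in data if x != 0)
    n = sum(counts.values())
    return [counts[d] / n for d in range(1, 10)]

random.seed(42)
amounts_usd = [random.lognormvariate(8, 2.5) for _ in range(20000)]   # placeholder "dollar" amounts
amounts_inr = [83.2 * x for x in amounts_usd]                         # arbitrary conversion factor

for d, (f_usd, f_inr) in enumerate(zip(digit_frequencies(amounts_usd),
                                       digit_frequencies(amounts_inr)), start=1):
    print(f"digit {d}: 'USD' {f_usd:.3f}  'INR' {f_inr:.3f}  Benford {math.log10(1 + 1/d):.3f}")
```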

After understanding all these features of Benford's Law, one might think of it as a weapon of enormous power! So let us get some clarity on where it fails.

  1. Benford's Law shows up in large datasets. A data series with only a few entries will rarely show Benford's Law. Not just a large dataset, but a large span of orders of magnitude is also needed for Benford's Law to apply effectively.
  2. The data must describe the same kind of object. The dataset should cover one feature, such as a debit-only dataset, a credit-only dataset, or the number of unemployed people per 1000 people in the population. A mixture of data points will not fit Benford's Law.
  3. There should be no inherently defined upper and lower bound to the dataset. For example, 1 million data points of people's heights will not follow Benford's Law, because human heights do not vary drastically; very few people are exceptionally tall or short. This also means that any dataset which follows a normal distribution (bell-curve behavior) will not follow Benford's Law (see the sketch after this list).
  4. The numbers should not be defined by conscious rules, like mobile numbers which compulsorily start with 7, 8, or 9, or number plates restricted to 4, 8, or 12 digits.
  5. Benford's Law will never pinpoint where exactly a fraud has happened. An in-depth investigation will always be needed to locate the event and the location of the fraud. Benford's Law only checks that the big picture holds true.
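To see limitation 3 in action, here is a sketch with invented parameters: a normally distributed, narrowly bounded quantity such as human height almost always starts with the digit 1 and ignores Benford's Law entirely.

```python
# Sketch of limitation 3: a normally distributed, narrowly bounded quantity
# (e.g. human heights in cm) ignores Benford's Law, because almost every
# value starts with 1. Parameters below are invented for illustration.
import math
import random
from collections import Counter

random.seed(7)
heights_cm = [random.gauss(170, 10) for _ in range(100_000)]   # mean 170 cm, sd 10 cm

counts = Counter(int(f"{h:.15e}"[0]) for h in heights_cm)
n = len(heights_cm)
for d in range(1, 10):
    print(f"digit {d}: observed {counts[d]/n:.3f}  Benford {math.log10(1 + 1/d):.3f}")
```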

Hence, the examples I presented earlier to show the beauty of Benford's Law were deliberately selected to avoid these limitations: the datasets have no bounds, the span of orders of magnitude is large, and the range is really wide compared to the number of observations.

Now, if I try to apply Benford's Law to the yearly revenue of Microsoft, it looks something like this:

Don't be alarmed that the data does not fully stick to Benford's Law; rather, notice that for the same time window, when the number of data points is reduced, the dataset tends to deviate from the theoretical Benford fit. Note also that 1 still appears as the leading digit very frequently – good news for Microsoft stockholders!

In the same way, if you look at the data points for country-wise average temperatures (in Kelvin), they will not fit Benford's Law, because there is no drastic variation in average temperature across regions.

There are 205 data points – big enough – but the temperatures are bound to a narrow range, and the span of orders of magnitude is small. Notice that it does not matter whether I express the temperature in degrees Celsius or in Kelvin, since Benford's Law is independent of scale.

Nature Builds Through Compounded Growth, Not Through Linear Growth!

Once you get hold of Benford's Law, you will appreciate how nature decides its ways of working and creating. The logarithmic law given by Frank Benford is a special case of compounded growth (the formula of compound interest). Even though we are taught the growth of numbers in periodic and linear ways, we are masked from the logarithmic nature of reality. Frank Benford, in the conclusion of his 1938 paper, mentions that our perception of light and sound is logarithmic (any sound or lighting engineer knows this by default). The growth of the human population, the growth of bacteria, and the spread of Covid all follow this exponential growth. The Fibonacci sequence is an exponentially growing series which is observed to be at the heart of nature's creations. That is why an artificial dataset will not fully stick to this logarithmic growth behavior. (You can use this against machine warfare in the future!) This also strengthens the belief that nature thinks in mathematics; despite its seemingly random chaos, it holds a certain predictive pattern at its heart. Benford's Law is thus an epitome of nature's artistic ability to hold harmony in chaos!
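As a closing sketch of this compounded-growth connection (standard library only), the leading digits of the Fibonacci sequence, a textbook example of exponential growth, line up closely with Benford's prediction:

```python
# Sketch: the Fibonacci sequence grows exponentially, and the leading digits
# of its terms follow Benford's Law closely.
import math
from collections import Counter

def fibonacci(n):
    a, b = 1, 1
    terms = []
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b
    return terms

terms = fibonacci(1000)
counts = Counter(int(str(t)[0]) for t in terms)
for d in range(1, 10):
    print(f"digit {d}: observed {counts[d]/len(terms):.3f}  Benford {math.log10(1 + 1/d):.3f}")
```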

You can download this Excel file to understand how Benford's Law can be validated in a simple Excel sheet:

References and further reading:

  1. Cover image – Wassily Kandinsky’s Yellow Point 1924
  2. The Law of Anomalous Numbers, Frank Benford, (1938), Proceedings of the American Philosophical Society
  3. On the Distribution of First Significant Digits, RS Pinkham (1961), The Annals of Mathematical Statistics
  4. What Is Benford’s Law? Why This Unexpected Pattern of Numbers Is Everywhere, Jack Murtagh, Scientific American
  5. Using Excel and Benford’s Law to detect fraud, J. Carlton Collins, CPA, Journal of Accountancy
  6. Benford's Law, Adrian Jamain, D. J. Hand, Maryse Béguin, (2001), Imperial College London
  7. data source – Microsoft revenue – stockanalysis.com
  8. data source – Population – worldometers.info
  9. data source – Covid cases – tradingeconomics.com
  10. data source – GDP- worldometers.info
  11. data source – CO2 emissions – worldometers.info
  12. data source – unemployment – tradingeconomics.com
  13. data source – temperature – tradingeconomics.com
  14. data source – precipitation – tradingeconomics.com