An Explanation and Statistical Demonstration of Benford’s Law
Simcha Pollack, The Peter J. Tobin College of Business, Department of Computer Information Systems/Decision Sciences
Abstract: Benford's law states that in many lists of numbers, the leading significant digit is distributed logarithmically. More precisely, the probability of “1” in the first position is log10(1+1/1) = log10(2), or approximately 0.30. More generally, P(1st significant digit = d) = log10(1+1/d) for d = 1, 2, …, 9. Thus, the probability that a number in a series which follows Benford’s law begins with a “9” is only about 0.046.
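As a quick check of the formula, the following minimal Python sketch tabulates the Benford probability for each leading digit. The function name `benford_probability` is chosen here for illustration and does not come from the paper.

```python
import math


def benford_probability(d: int) -> float:
    """Benford probability that the first significant digit equals d."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Table of probabilities for the digits 1 through 9.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to 1 because the sum of log10((d+1)/d) over d = 1, …, 9 telescopes to log10(10).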
This counter-intuitive, almost mysterious, result has been found to apply to a wide variety of data sets, including street addresses, bills, stock prices, population numbers, death rates and the lengths of rivers.
The law is named after physicist Frank Benford, who demonstrated in 1938 that it applied to a wide variety of series. It had been previously stated by Simon Newcomb in 1881. Newcomb (supposedly) discovered this law after noticing that the pages of published tables of logarithms seemed to be more worn for numbers beginning with the lower digits than for those beginning with the higher ones.
My non-mathematical explanation for this pattern is that a ‘filling up’ process is often at work. For example, when addresses are given out starting with 1, 2, 3, …, an address is more likely to begin with a ‘1’ because, if the numbers run into the hundreds, they are less likely to reach the 200’s and very unlikely to reach the 900’s. And if they do get past the 900’s, they will then stay in the 1,000’s for a while.
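The ‘filling up’ argument can be illustrated with a small Python sketch. It assumes a hypothetical street whose addresses run 1, 2, …, n and computes the share of addresses that begin with a given digit. The function names are illustrative, and the sketch shows only the intuition; it is not a derivation of the logarithmic law.

```python
def first_digit(n: int) -> int:
    """Leading digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_share(n: int, digit: int) -> float:
    """Share of the addresses 1..n whose leading digit equals `digit`."""
    hits = sum(1 for address in range(1, n + 1) if first_digit(address) == digit)
    return hits / n


# Hypothetical street lengths: the share for '1' exceeds the share for '9'
# unless the numbering happens to stop exactly at the end of the 900's.
for n in (150, 199, 500, 999, 1500):
    share_1 = leading_digit_share(n, 1)
    share_9 = leading_digit_share(n, 9)
    print(f"n={n}: share of 1 = {share_1:.3f}, share of 9 = {share_9:.3f}")
```

Only when the numbering stops exactly at 999 (or 99, or 9) are the digits equally represented; for every other stopping point the lower digits are ahead.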
One important application of this pattern is in the detection of fraud. When numbers are ‘made up’, the leading digit “1” will tend to follow the intuitive uniform distribution and occur about 11% of the time (one time in nine), not about 30%. I contributed to a statistical program which calculates the probability that a deviation as large as the one observed in a set of data would arise by chance if the data followed the hypothesized Benford distribution. This program is applied to a set of data which is known to be honest and to a set of data whose violation of Benford’s law makes it suspicious.
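The abstract does not say which test the program uses. As one plausible sketch, the following Python code applies Pearson's chi-square goodness-of-fit test with 8 degrees of freedom to the first-digit counts. The function names and both data sets are assumptions made for illustration: powers of 2 stand in for data that follow Benford's law, and the integers 100 to 999 stand in for data with uniformly distributed first digits. Neither is the data analyzed in the paper.

```python
import math
from collections import Counter


def first_digit(n: int) -> int:
    """Leading digit of a nonzero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values: list[int]) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for a first-digit Benford test."""
    n = len(values)
    counts = Counter(first_digit(v) for v in values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 8 degrees of
    # freedom; for even degrees of freedom it has this closed form.
    half = stat / 2
    p_value = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(4))
    return stat, p_value


# Illustrative stand-ins, not the paper's data.
benford_like = [2**k for k in range(1, 1001)]  # powers of 2
uniform_like = list(range(100, 1000))          # each first digit equally common

for name, data in (("powers of 2", benford_like), ("100..999", uniform_like)):
    stat, p = benford_chi_square(data)
    print(f"{name}: chi-square = {stat:.2f}, p-value = {p:.4g}")
```

A small p-value means that a deviation this large would rarely arise by chance under Benford's law, which flags the data as suspicious; it does not by itself prove fraud.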