Post 4 - Mild vs Extreme Data
If Not Gauss, then Who?
The last post explored how LTCM’s Gaussian assumptions blew up in its face. In short, the firm was using the wrong tools and was blindsided by events it didn’t think were statistically possible. Let us dig further into the nature of data itself by separating it into two very different domains and exploring the pitfalls of mistaking one for the other. That confusion was effectively LTCM’s fatal mistake.
Remember: we would like to avoid Thanksgiving.
Benoit Mandelbrot describes what we will call the two primary domains of data behavior. The first is the domain of “mild” randomness (Nassim Taleb calls this domain “Mediocristan”). The Gaussian world (the standard bell curve) belongs to the domain of mild randomness (Mediocristan); it describes nice and neat data sets, is much simpler and easier to use, and, as mentioned in other posts, is the basis of the majority of econometric and financial analysis still used today.
The second is the domain of “wild” randomness (Taleb calls this domain “Extremistan”). This domain, if you haven’t guessed already, behaves much more erratically. It is not so easy to understand, as just a few data points can influence the entire data set. Unfortunately for us, the financial domain (and a large portion of real life) falls into this latter category. It is why, I believe, economic reality is so hard to predict.
But let’s start with the mild domain (Mediocristan), and more specifically, let’s try to understand the basics of the Gaussian distribution, because despite not describing reality very well, it is still widely used to do so.
The Gaussian distribution is fully defined by just two parameters: a mean and a standard deviation. The higher moments, the skewness and kurtosis specifically, remain constant at 0 and 3, respectively. So, if you know those first two parameters, you know all the statistical information belonging to that data set’s distribution. Let’s quickly define what we mean by “statistical information” via the four most common statistical moments (a short code sketch after this list computes them for sample data):
· The first moment is the mean, or average, something most everyone should be familiar with. You sum a data set up and divide by how many data points there are. Easy (just kidding, a whole piece in the future will be dedicated to averages. There are different types of averages. It is never easy).
· The second moment is the variance, or how much the data “varies” around the mean. The higher the variance, the more “spread out” the data is. The purple distribution in the graph below has the largest variance, so it appears the flattest and most “spread out.” The standard deviation, the measure used most when talking about the risk of an investment, is just the square root of the variance.

[Figure: Gaussian distributions with different variances. Source: ResearchGate]
· The third moment is the skewness. It represents the asymmetry of a distribution. A positive skewness value implies that the distribution is skewed to the right: it has a longer right tail, and the mean is pulled to the right of the median (the typical center of the distribution). A negative skewness value implies the opposite.

[Figure: negatively and positively skewed distributions. Source: Wikipedia]
· Finally, the fourth statistical moment is the kurtosis. This one deserves special attention, yet is largely forgotten by 99.9% of people attempting to model reality. The kurtosis measures the “tailedness” of a distribution. The Gaussian distribution has a kurtosis of 3, so that is our standard measure. Any distribution with a kurtosis higher than 3 (also called leptokurtic) has fatter tails. Fatter tails correspond to more extreme and dominant “outliers.” In effect, the higher the kurtosis, the more extremely the data can behave. Price data, such as the chart of IBM’s relative price changes from 1956 – 1996 shown in Post 2, is visually and mathematically leptokurtic (fat-tailed). This is a big reason why Gaussian assumptions fail us in the financial realm – if you assume a kurtosis of 3, you are mathematically cutting yourself off from the true nature of the market. In really extreme cases (such as in real life, as will be seen shortly), sometimes we don’t even have a grasp on this number. If that is the case, we don’t have a grasp on reality.

[Figure: leptokurtic, mesokurtic, and platykurtic distributions. Source: ScienceDirect]
Leptokurtic: kurtosis > 3, Mesokurtic: kurtosis = 3, Platykurtic: kurtosis < 3
Another way to interpret the idea of kurtosis is that it describes the relative ability of that data set to be dominated by outliers. The Gaussian distribution, with a kurtosis of 3, cannot be dominated by outliers, which is typically why it is reasonable to ignore them if using Gaussian-based techniques to analyze data.
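Here is the sketch promised above: a minimal, illustrative example of my own (not from any source cited in this post) that estimates the four moments for a thin-tailed and a fat-tailed sample. The Student-t distribution with 3 degrees of freedom stands in as a generic fat-tailed example.

```python
# Estimate the four moments of a thin-tailed (Gaussian) sample
# vs. a fat-tailed (Student-t, df = 3) sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000

gaussian_sample = rng.normal(loc=0.0, scale=1.0, size=n)
fat_tailed_sample = rng.standard_t(df=3, size=n)      # generic fat-tailed stand-in

def four_moments(x):
    return {
        "mean": float(np.mean(x)),
        "variance": float(np.var(x)),
        "skewness": float(stats.skew(x)),
        # fisher=False reports "raw" kurtosis, so the Gaussian benchmark is 3
        "kurtosis": float(stats.kurtosis(x, fisher=False)),
    }

print("Gaussian:  ", four_moments(gaussian_sample))    # kurtosis close to 3
print("Fat-tailed:", four_moments(fat_tailed_sample))  # kurtosis well above 3, and unstable
```

Run it a few times with different seeds: the Gaussian moments barely move, while the fat-tailed kurtosis swings wildly from run to run, which is exactly the instability described above.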
How many times have you been told to ignore outliers when looking at data? I think this is one of the pivotal mistakes we have all been taught is acceptable. In science in particular, underlying most p-value calculations is a Gaussian (or asymptotically Gaussian) assumption. The reason statistical significance is generally set at a p-value < 0.05 is that it corresponds to roughly two standard deviations (1.96, for a two-tailed test) away from the mean of the Gaussian distribution. Said another way, we find a scientific discovery “statistically significant” (AKA valid) if the measured effect of the study drives results that are roughly 2+ standard deviations away from the mean (or the control group). But wait, what if the data we are gathering belongs to a much more extreme (leptokurtic) distribution?
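Before answering that, a quick check of where the threshold itself comes from (my own sketch, assuming the familiar two-tailed convention):

```python
# Where the "two standard deviations" / p < 0.05 rule of thumb comes from
# under a Gaussian assumption (two-tailed).
from scipy import stats

z = stats.norm.ppf(0.975)               # upper cutoff leaving 2.5% in each tail
print(z)                                # roughly 1.96 standard deviations
print(2 * (1 - stats.norm.cdf(z)))      # roughly 0.05, the familiar significance cutoff
```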
A lot of the time, that doesn’t matter. The Central Limit Theorem (CLT) tells us that even if the data or observations (more generally, a random variable) follow a more extreme distribution, the distribution of the sum (or average) of those samples will approach a normal distribution as you increase the number of samples being summed. A simple example involves the distribution of tossing dice in a game of craps. Each die has an equal chance of landing on each of its numbers. Throwing one over and over again, over a long period of time, would result in a uniform distribution, with no one bar taller than the next. But as soon as you start “summing your samples,” the distribution magically morphs toward a Gaussian (and increasingly so as you sum more and more dice). This even works for more extreme distributions, but only up to a point, and it is why scientists usually feel confident in their statistical analyses. (A small simulation after the figure below reproduces the effect.)

[Figure: distributions of dice sums converging toward the bell curve as more dice are summed. Source: Wolfram]
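A small simulation sketch of my own (not from the post) of the same idea: a single die is flat, but the sum of many dice drifts toward the Gaussian benchmark.

```python
# CLT with dice: one die is uniform, but the kurtosis of the sum of many dice
# drifts toward the Gaussian value of 3.
import numpy as np

rng = np.random.default_rng(0)
n_rolls = 100_000

for n_dice in (1, 2, 10, 50):
    sums = rng.integers(1, 7, size=(n_rolls, n_dice)).sum(axis=1)
    centered = sums - sums.mean()
    kurt = np.mean(centered**4) / np.var(sums) ** 2
    print(f"{n_dice:>3} dice: sample kurtosis of the sum is about {kurt:.2f} (Gaussian = 3)")
```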
The Law of Large Numbers (LLN) is also an important cousin to note. This law simply states that as you add more and more samples, the sample average converges to the true average in reality. In the Gaussian case, convergence is fast (you can get really close to the true average, most of the time, after only about 30 samples). The problem is when we step outside the thin-tailed realm (mild randomness, Mediocristan) into the (very) fat-tailed realm (wild randomness, Extremistan).
The two graphs below represent examples of the “mild” randomness domain and the “wild” randomness domain, respectively. Even with a Gaussian distribution with a high variance, we converge to at or near the population average very quickly. In the “wild” randomness domain, represented by a Pareto 80/20 distribution (α = 1.13), it takes orders of magnitude more data points to converge to the correct average (it takes roughly 10^14 observations in the Pareto 80/20 to converge to an average with the same confidence that 30 observations would give you in the normal Gaussian domain). Worse yet, in that second graph, we see multiple instances of pseudo-convergence to a supposed average before an extremely large jump proves the supposed average quite incorrect. This means our data can disguise itself as Gaussian, a very dangerous prospect. This was a major issue for LTCM. (The Central Limit Theorem has similar convergence problems. The CLT requires a finite variance to work, and even when the variance is finite, we sometimes converge toward the Gaussian too slowly for it to be useful, as shown by Taleb. See Statistical Consequences of Fat Tails for more.)

[Figure: convergence of the sample mean under mild (Gaussian) vs. wild (Pareto 80/20) randomness. Source: Statistical Consequences of Fat Tails]
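A rough simulation sketch of my own (not the book’s figure) gives a feel for this: the running mean of a Gaussian sample settles quickly, while the running mean of a Pareto sample with the tail exponent quoted above keeps getting yanked around by rare, huge observations.

```python
# Running sample means: thin-tailed Gaussian vs. fat-tailed Pareto (alpha = 1.13).
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
alpha = 1.13                                             # tail exponent used in the post

gaussian = rng.normal(loc=100.0, scale=25.0, size=n)     # high-variance Gaussian
pareto = 1.0 + rng.pareto(alpha, size=n)                 # classical Pareto, x_min = 1

for name, x in (("Gaussian", gaussian), ("Pareto 80/20", pareto)):
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    # The Gaussian running mean is stable after a few dozen points; the Pareto
    # one keeps jumping whenever a huge observation lands, long after it
    # "looked" converged.
    print(name, running_mean[[29, 999, 99_999, n - 1]])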
Let’s look at two real-life examples to further illustrate: human weight and human wealth.
Weight: If we were to start sampling the global population, we would observe a lot of people around the average and very few at the extremes. Even then, the extremes are not very extreme. There is a physical limitation on how heavy a human can be and still be alive. Even if we had only sampled, say, 1,000 people, and we were to add one more person who weighed 800 lbs, a likely “extreme” outlier for this Gaussian example, the average weight (and other statistical information) of the humans in the sample would not change very much at all (if the average weight of 1,000 people is 160 lbs, and we add an 800 lb person to the mix, the average weight of the group goes to about 160.6 lbs; the quick arithmetic below makes the point). In other words, it would take a significant number of people gaining or losing weight for this average to move. This is why sample sizes can be relatively low in a Gaussian domain and one can still have a grasp on the mean and other statistical information with significant confidence; an additional data point can’t possibly carry too much more information at a sufficient sample size. This allows for nice, neat, and easy analyses even with limited data. However, as mentioned before, reality is generally not so nice, neat, and easy.
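The arithmetic, spelled out (a trivial sketch of my own):

```python
# One 800 lb outlier barely moves the average of a 1,000-person sample.
total_weight = 1_000 * 160 + 800      # pounds
print(total_weight / 1_001)           # about 160.64 lbs, barely budged from 160
```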
Wealth: Now let’s see what happens when things get a little crazier. A reminder: “Extremistan,” or the domain of “wild” randomness, is characterized as “fat-tailed.” “Fat-tailed” domains, in contrast to the “thin-tailed” Gaussian, can be dominated by just a few data points. It is important to note that fat-tailed data does not necessarily have more large, rare observations, but that these large and rare observations are significantly more consequential to the statistical information provided by the distribution. Said another way, outliers dominate. An example of this wild randomness can be found in one of the very first studies of income inequality:

[Figure: the Pareto distribution. Source: Statistical Consequences of Fat Tails]
In the late 19th century, Vilfredo Pareto, an Italian economist and sociologist, began work on Italian wealth and land distribution and landed on a peculiar finding. He discovered that about 80% of the land in Italy was owned by only 20% of Italians. Odd. Is this normal, or is something wrong with the Italian economy? Nope - he found similar distributions of land and income inequality across the globe and at different periods in history. Thus, his famous “Pareto 80/20” distribution was born.
The Pareto 80/20 is a classic example of a fat-tailed, leptokurtic, insert-all-the-interchangeable-words-I’ve-been-using, etc. distribution. More technically, when mapping land ownership and wealth onto a histogram, he observed frequencies that decay following a power law as you approach larger values of x; the vast majority of the data points fall closer to the y-axis, at smaller values of x. Even so, the wealth sitting out in the tail is so great that it outweighs how quickly the frequencies fall off (a power-law tail decays far more slowly than the Gaussian’s). The overwhelming majority (over 90%) of the data points in a Pareto 80/20 fall below the mean.
This particular distribution-of-wealth model would imply something even more extreme: the top 1% would own just over 50% of the total wealth in any given country (still sounds familiar, doesn’t it?). After looking at different countries at different times through different stages of their development, he concluded that this type of wealth distribution, a relatively large fraction of the total wealth in a relatively small number of hands, was a theme “through any human society, in any age, or country.”
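A rough illustration of my own (a simulation, not Pareto’s data): sample “wealth” from a Pareto with the tail exponent quoted earlier and measure how much the top slices hold.

```python
# Sample wealth from a Pareto (alpha = 1.13, the value quoted in this post)
# and compute the share held by the top 20% and the top 1%.
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.13
wealth = 1.0 + rng.pareto(alpha, size=1_000_000)   # classical Pareto, x_min = 1

wealth.sort()
total = wealth.sum()
top_20_share = wealth[-len(wealth) // 5:].sum() / total
top_1_share = wealth[-len(wealth) // 100:].sum() / total
print(f"top 20% hold about {top_20_share:.0%}, top 1% hold about {top_1_share:.0%}")
# Roughly the 80/20 pattern, with the top 1% holding on the order of half,
# though the exact numbers bounce around from run to run (slow convergence).
```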
A “fun” political aside: In the late 19th century, a major banking crisis tore Italy apart. Banca Romana, one of the six national banks authorized to issue currency, went bust after an array of corruption and scandal. This, combined with a growth in labor strikes in 1892 and 1893, started to really threaten the Italian economy. The second-longest serving Prime Minister of Italy, Giovanni Giolitti, had taken a softer stance on these strikes by taking the position to just let the strikes/negotiations play out. He refused to authorize anything to be done outside of the current law. During this period of severe political unrest, both the far-right Fasci Party and the far-left Italian Socialist Party gained in influence (history really loves to rhyme, huh?). Both parties were putting immense pressure on Giolitti to do something, but he still refused to push the legal boundaries in either direction - in support for or against labor.
In the summer of 1893, in an attempt to clamp down on corruption, Giolitti demanded all the records of the leadership of the Fasci party and investigated them for past criminal activity or Mafia connections. However, he did not do the same for the rising socialist party. After yet another credit institution’s collapse, the conservatives, including the King of Italy, became disillusioned with Giolitti. His government subsequently collapsed (he would later win back office four more times, cementing himself as the second-longest-serving Italian Prime Minister).
Vilfredo Pareto, being an outspoken economist and sociologist, was also left disillusioned by Giolitti’s reign, and became deeply anti-socialist. He taught his classes accordingly. After becoming the Chair of Political Economy at the University of Lausanne in Switzerland, he had the opportunity to lecture and influence the man who would become the longest-serving Italian Prime Minister in history. At the time he began classes at Lausanne, this future Prime Minister was sympathetic to socialist ideals. But Pareto was intelligent and influential. In fact, Pareto’s biographer claims his lectures were so influential on this man that some of the first actions he took as Prime Minister were to enact Pareto’s policies. Of course, the man we are talking about is none other than Benito Mussolini.
In this global wealth example, say you were to start randomly sampling the population in order to estimate the average wealth globally. You’d likely observe many more instances of impoverished humans and likely not see anyone with a particularly large amount of wealth to their name for a time. However, let’s say after 9,999 samples (again, likely none of which are particularly wealthy), the 10,000th sample you take turns out to be Elon Musk. Your sample average, variance, skewness, and kurtosis will dramatically shift (the average wealth of 9,999 people with $30,000 each and one person with $100 billion is over $10 million). Unlike the Gaussian domain, where one human cannot possibly weigh 100+ billion pounds and skew the sample average, standard deviation, etc., one person (or more generally, one observation or outcome) can drive almost all of the statistical information in this wealth example. The upshot: in the wild or Extremistan domain, such as in financial markets, the average, standard deviation, and other higher moments are mostly or entirely a function of rare outliers.
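To make that concrete, a small sketch of my own (illustrative numbers only), showing how one observation drags every moment of the sample:

```python
# One Musk-sized observation dominates the mean, the spread, and the tails.
import numpy as np
from scipy import stats

wealth = np.full(9_999, 30_000.0)          # 9,999 people with $30,000 each
with_outlier = np.append(wealth, 100e9)    # add one person worth $100 billion

print(np.mean(with_outlier))                          # about $10 million, up from $30,000
print(np.std(with_outlier))                           # about $1 billion, up from 0
print(stats.skew(with_outlier))                       # extreme positive skew
print(stats.kurtosis(with_outlier, fisher=False))     # enormous; one point is the tail
```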
In 2009, Nassim Taleb calculated how much of the kurtosis of a given security was driven by a single observation. In the case of silver, one observation accounted for 94% of the kurtosis out of 46 years of data. In the Gaussian domain, the maximum contribution one observation should make over that timespan is less than 1%. The fact that one observation can affect the statistical information this much implies we know almost nothing about how the future of prices/values can play out, especially when assuming the Gaussian.

[Figure: maximum contribution of a single observation to the kurtosis of various securities. Source: Statistical Consequences of Fat Tails]
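The metric itself is easy to sketch (my own reconstruction of the idea, not Taleb’s code): the share of the fourth moment, the quantity that drives kurtosis, contributed by the single largest observation.

```python
# Share of the fourth moment contributed by the single largest observation.
import numpy as np

def max_quartic_contribution(returns):
    x = returns - np.mean(returns)
    return np.max(x**4) / np.sum(x**4)

rng = np.random.default_rng(3)
gaussian_returns = rng.normal(size=11_500)            # roughly 46 years of daily data
fat_tailed_returns = rng.standard_t(df=3, size=11_500)

print(max_quartic_contribution(gaussian_returns))     # well under 1%
print(max_quartic_contribution(fat_tailed_returns))   # often 10%+ from a single day
```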
So, if extreme events are driving these numbers, it would be wise to conclude that they are what we should be paying attention to. In effect, the statistical information (assumptions) we choose to use in the financial realm (average growth, variability of earnings, etc.) is extremely flimsy and almost completely sample-dependent (remember the critique of Markowitz and Sharpe in my second post? What time frame you choose to lock in on and use as your basis to calculate averages, variability, correlation, etc. is incredibly important. And you will still likely be (very) incorrect in the long term). When operating in an extreme environment such as the financial realm, the average, “calm” scenario tells us nothing; assuming a sample average, especially one that does not account for outliers, can be a death sentence, as it was for LTCM. It is more rigorous to assume fatter-tailed, uncapped data than to assume “capped” (in the sense that 20-sigma events should be “impossible”) data via a Gaussian.
LTCM (part deux)
Speaking of LTCM, let’s bring those turkeys back into this mess. To reiterate: the main problem when assuming the Gaussian is that you are assuming the distribution isn’t extreme, or that it cannot provide for extreme observations, exposing you to larger risks in the tails. You assume that one big deviation cannot make or break you, largely because the odds of that deviation are so small it may as well be impossible. As seen in the case of LTCM (or in the price chart of IBM and pretty much every other price chart on earth) financial reality is patently non-Gaussian.

These are two graphs of LTCM’s daily profits & losses pre- and post-Russian Debt Crisis, respectively. I know the pictures are terrible, but you just need to recognize the shapes. Remember the graphs of our turkey’s well-being pre- and post-Thanksgiving? Hint: They are the same.
The point of diving into this detail, as will be seen later, is that the statistical outputs (mean, standard deviation, etc.) are more or less informative depending on the distribution one is actually working with. In a commercial real estate context (what I’m personally most confident in valuing), if one wants to assume average rent growth, an average terminal cap rate (AKA ending/terminal value), average vacancy, etc. when modeling a property, and that data is Gaussian, then one can be fairly confident in those assumptions and the subsequent valuation. However, if that data belongs to a more extreme distribution, then one’s estimate of the averages is significantly less meaningful. We may as well be practicing astrology. (The toy simulation below shows how much wider the range of outcomes gets once the inputs are fat-tailed.)
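A toy sketch of my own, with made-up numbers (not a valuation model from this post): project rent for ten years under Gaussian vs. fat-tailed annual growth shocks with the same mean and variance, and compare the spread of outcomes.

```python
# Ten-year rent multiples under Gaussian vs. fat-tailed growth shocks
# (Student-t with 3 degrees of freedom, rescaled to the same variance).
import numpy as np

rng = np.random.default_rng(11)
n_paths, years = 100_000, 10
mean_growth, vol = 0.03, 0.05               # assumed 3% average growth, 5% volatility

gauss_shocks = rng.normal(mean_growth, vol, size=(n_paths, years))
t_raw = rng.standard_t(df=3, size=(n_paths, years))
t_shocks = mean_growth + vol * t_raw / np.sqrt(3.0)   # t(3) variance is 3, so rescale

for name, shocks in (("Gaussian", gauss_shocks), ("Fat-tailed", t_shocks)):
    # Treat shocks as continuously compounded (log) growth so rent stays positive.
    terminal = np.exp(shocks.sum(axis=1))
    low, high = np.percentile(terminal, [0.1, 99.9])
    print(f"{name}: 0.1st to 99.9th percentile of 10-yr rent multiple: {low:.2f} to {high:.2f}")
# Same average and volatility of the annual shock, but the fat-tailed case
# produces noticeably more extreme best and worst paths; the "average" inputs
# hide the tails that actually decide the valuation.
```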

Thank you for reading