Page 1

Panel on "Resolved: Traditional Statistics is Dead"

Data Science Association, Inc. at

University of Colorado Denver
October 22, 2013

Moderator: Michael Walker

Panelists: Nancy Abramson, Theodore Van Rooy, Mark Labovitz, Michael Malak, Joseph Rickert

Page 2

Data science uses statistical and applied probabilistic knowledge to help determine if something is true, false, or merely anecdotal.

All data scientists should have a solid “graduate level” grounding in statistics.

Statistics is the instrument of risk-taking; it is the applied toolkit of epistemology; you can't be a modern data scientist and not think probabilistically.

Yet statistics can fool you. In fact, it fools data scientists every day by presenting a false view of reality.

It can even bankrupt the financial system - the use of probabilistic methods to estimate risk (value at risk, or VaR) helped blow up the banking system.

Page 3

Consider the Wall Street risk-management models known as "Value at Risk" (VaR).

For a given portfolio, probability, and time horizon, VaR is defined as the threshold value such that the probability that the mark-to-market loss on the portfolio over the given time horizon exceeds this value (assuming normal markets and no trading in the portfolio) is the given probability level.

For example, if a portfolio of stocks has a one-day 5% VaR of $1 million, there is a 0.05 probability that the portfolio will fall in value by more than $1 million over a one-day period if there is no trading. Informally, a loss of $1 million or more on this portfolio is expected on 1 day in 20 (the 5% probability). A loss which exceeds the VaR threshold is termed a "VaR break."
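A minimal sketch of this definition in Python (the portfolio size, return distribution, and sample are illustrative assumptions, not market data): historical VaR is simply an empirical quantile of the loss distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: a $20M portfolio with normally distributed
# daily returns (mean 0, stdev 3%), chosen so the 5% VaR lands near $1M.
portfolio_value = 20_000_000
daily_returns = rng.normal(loc=0.0, scale=0.03, size=1_000)

# Dollar loss on each simulated day (positive = loss).
losses = -daily_returns * portfolio_value

# One-day 5% VaR: the loss threshold exceeded with probability 0.05,
# i.e. the 95th percentile of the loss distribution.
var_95 = np.quantile(losses, 0.95)
print(f"One-day 5% VaR: ${var_95:,.0f}")

# Days whose loss exceeds the threshold are "VaR breaks"; by
# construction they occur on roughly 1 day in 20.
print(f"Fraction of VaR breaks: {(losses > var_95).mean():.3f}")
```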

Page 4

Page 5

Various VaR models were complex and very precise - yet the assumptions embedded in the models were dangerously wrong and created an illusion of reality.

Decision makers relying on VaR had a false sense of security that led to inaccurate conclusions - putting at risk not only their own firms but the entire global financial system.

Statisticians resent the old adage, popularized by Mark Twain, of "lies, damned lies, and statistics" - yet the best and brightest statisticians on Wall Street (called "quants") created a false view of reality that caused serious economic damage.

Page 6

It is argued that VaR:

• Ignored 2,500 years of experience in favor of untested models built by non-traders;
• Was charlatanism because it claimed to estimate the risks of rare events, which is impossible;
• Gave false confidence and would be exploited by traders;
• Led to excessive risk-taking and leverage at financial institutions;
• Focused on the manageable risks near the center of the distribution and ignored the tails (see the sketch below);
• Created an incentive to take “excessive but remote risks”;
• Was “potentially catastrophic when its use creates a false sense of security among senior executives and watchdogs.”
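A minimal sketch of the "ignored the tails" criticism (the fat-tailed return distribution is an assumption for illustration): a modeler who fits a normal distribution to returns that are actually Student-t will report a 99.9% VaR that is breached far more often than once in 1,000 days.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative assumption: true daily returns are fat-tailed
# (Student-t with 3 degrees of freedom, ~1% scale).
true_returns = stats.t.rvs(df=3, scale=0.01, size=100_000, random_state=rng)

# The modeler assumes normality, estimating only mean and stdev...
mu, sigma = true_returns.mean(), true_returns.std()

# ...and reports a 99.9% VaR under that normal assumption.
normal_var = -(mu + sigma * stats.norm.ppf(0.001))

# Predicted breach rate: 0.1%. The actual rate is several times higher,
# because the normal fit puts too little mass in the tails.
actual = (-true_returns > normal_var).mean()
print(f"Normal-model 99.9% VaR: {normal_var:.4f}")
print(f"Predicted breach rate: 0.10%, actual: {actual:.2%}")
```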

Page 7

Questions

Are tail risks measurable scientifically? If yes, how would you use statistics to measure tail risks?

Why did the best and brightest statisticians on Wall Street not understand (or fail to disclose) the flawed VaR models?

How would you use statistics to:

1. create a robust VaR model?

2. find flaws in VaR models?

Page 8

Understanding Statistical Insignificance

In 2005 Dr. John Ioannidis, an epidemiologist from Stanford University, published a paper showing why, as a matter of statistical logic, the idea that only one in 20 papers reporting a statistically significant result (the conventional p < 0.05 threshold) gives a false positive was hugely optimistic.

Instead, he argued, “most published research findings are probably false.”

Page 9

Ioannidis asserts that the customary approach to statistical significance ignores three things:

1. The “statistical power” of the study (a measure of its ability to avoid type II errors, false negatives in which a real signal is missed in the noise);

2. The unlikeliness of the hypothesis being tested; and

3. The pervasive bias favoring the publication of claims to have found something new.

Page 10

A statistically powerful study is one able to pick things up even when their effects on the data are small.

In general, bigger studies - those which run the experiment more times, recruit more patients for the trial, and so on - are more powerful.

A power of 0.8 means that of ten true hypotheses tested, only two will be ruled out because their effects are not picked up in the data; this is widely accepted as powerful enough for most purposes.
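What power 0.8 means can be checked directly by simulation; a minimal sketch, assuming a two-sample t-test with a true effect of half a standard deviation and 64 subjects per group (numbers chosen because they are known to give roughly 0.8 power at the 5% level):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative assumptions: true effect of 0.5 standard deviations,
# two groups of 64 subjects, tested at alpha = 0.05.
effect, n, alpha, trials = 0.5, 64, 0.05, 2_000

hits = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)
    _, p_value = stats.ttest_ind(treated, control)
    hits += p_value < alpha

# Power = fraction of experiments that detect the real effect.
print(f"Estimated power: {hits / trials:.2f}")  # ~0.80
```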

Page 11

But this benchmark is not always met, in part because bigger studies are more expensive.

A recent study by Ioannidis and colleagues found that in neuroscience the typical statistical power is a dismal 0.21; a related analysis found that in psychology studies the average power is about 0.35.

Ioannidis argues that in his field, epidemiology, you might expect one in ten hypotheses to be true.

In exploratory disciplines like genomics, which rely on combing through vast troves of data about genes and proteins for interesting relationships, you might expect just one in a thousand to prove correct.

Page 12

Consider 1,000 hypotheses being tested of which just 100 are true.

Studies with a power of 0.8 will find 80 of them, missing 20 because of false negatives.

Of the 900 hypotheses that are wrong, 5%—that is, 45 of them—will look right because of type I errors.

Add the false positives to the 80 true positives and you have 125 positive results, fully a third of which are specious.

If you dropped the statistical power from 0.8 to 0.4, which would seem realistic for many fields, you would still have 45 false positives but only 40 true positives. More than half your positive results would be wrong.

Page 13

Page 14

The negative results are much more trustworthy; in the case where the power is 0.8, there are 875 negative results, of which only 20 are false, giving an accuracy of over 97%.
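The arithmetic on the last two slides can be packaged as a short sketch, so the same numbers can be re-derived for any prior, power, and significance level:

```python
def study_outcomes(n_hypotheses=1_000, prior_true=0.10, power=0.8, alpha=0.05):
    """Replicates the worked example: true/false positives and negatives."""
    true_h = n_hypotheses * prior_true        # 100 true hypotheses
    false_h = n_hypotheses - true_h           # 900 false ones
    tp = power * true_h                       # 80 true positives
    fn = (1 - power) * true_h                 # 20 false negatives
    fp = alpha * false_h                      # 45 false positives (type I)
    tn = false_h - fp                         # 855 true negatives
    return {
        "positives": tp + fp,                                  # 125
        "share of positives that are false": fp / (tp + fp),   # ~0.36
        "negatives": tn + fn,                                  # 875
        "share of negatives that are false": fn / (tn + fn),   # ~0.023
    }

print(study_outcomes())             # the power = 0.8 case
print(study_outcomes(power=0.4))    # 45 false vs 40 true positives
```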

Questions

How would you - as a professional statistician - deal with such problems?

Has the field of statistics innovated and kept pace with the development of complex mathematical techniques for crunching large data sets?

How do we prevent drowning in illusions brought about by misuses of statistics - misuses that have become so seductive only with the recent availability of big data and vast, connected computing resources?

Page 15

Statistical Overfitting

In statistics and machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.

Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.

A model which has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

The danger is that you see relationships in the data that aren't really there.
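A minimal sketch of overfitting in action (the linear-plus-noise data-generating process is an assumption for illustration): a high-degree polynomial fits 15 training points almost perfectly, then does worse than a straight line on fresh data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative assumption: the true relationship is linear plus noise.
def sample(n):
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0.0, 0.5, n)
    return x, y

x_train, y_train = sample(15)
x_test, y_test = sample(1_000)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The degree-9 fit chases the noise: near-zero training error,
# much larger error on data it has never seen.
```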

Page 16

A once-famous “leading indicator” of economic performance, for instance, was the winner of the Super Bowl. From Super Bowl I in 1967 through Super Bowl XXXI in 1997, the stock market gained an average of 14 percent for the rest of the year when a team from the original National Football League (NFL) won the game. But it fell by almost 10 percent when a team from the original American Football League (AFL) won instead.

Through 1997, this indicator had correctly “predicted” the direction of the stock market in twenty-eight of thirty-one years.

A standard test of statistical significance, if taken literally, would have implied that there was only about a 1-in-4,700,000 possibility that the relationship had emerged from chance alone.
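For intuition, the chance that a meaningless coin-flip indicator matches the market's direction in at least 28 of 31 years can be computed directly (this crude one-sided binomial check is not the same test behind the 1-in-4,700,000 figure quoted above, and it gives a different though still tiny number):

```python
from scipy import stats

# P(a fair coin "calls" the market's direction in >= 28 of 31 years).
p = stats.binom.sf(27, n=31, p=0.5)
print(f"P(>= 28 of 31 by pure luck): {p:.2e}")  # ~2.3e-06
```

The tiny p-value is exactly the trap: it ignores how many candidate indicators were implicitly searched before this one surfaced, and how the indicator's definition (original NFL versus AFL teams, the 1967-1997 window) was settled after looking at the data.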

Page 17

Models which can be “tuned” in many different ways give researchers more scope to perceive a pattern where none exists.

By some estimates, three-quarters of published scientific papers in the field of machine learning are bunk because of this "overfitting", according to Sandy Pentland, a computer scientist at MIT.

Questions

How do you recognize and prevent statistical overfitting?

Page 18

Danger of Statistical Models

"All models are wrong - but some are useful." - George Box, statistician

The danger is that many hide behind models when attempting to understand reality.

Are we using models of uncertainty to produce certainties?

Page 19

The central lesson from decision-making (as opposed to working with data on a computer or bickering about logical constructions) is the following:

It is the exposure (or payoff) that creates the complexity - and the opportunities and dangers - not so much the knowledge (i.e., the statistical distribution, the model representation, etc.).

In some situations, you can be extremely wrong and be fine, in others you can be slightly wrong and explode.

If you are leveraged, errors blow you up; if you are not, you can enjoy life.

Page 20

Page 21

A turkey is fed for 1,000 days - every day confirms to its statistical department that the human race cares about its welfare "with increased statistical significance".

On the 1001st day, the turkey has a surprise.
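A hedged sketch of the turkey's statistics department at work (the daily feed numbers are invented for illustration): each passing day shrinks the confidence interval around "humans feed me," measuring the model's confidence rather than the turkey's risk.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented data: 1,000 days of feed, about 100g per day.
feed = rng.normal(100.0, 5.0, 1_000)

for day in (10, 100, 1_000):
    obs = feed[:day]
    sem = obs.std(ddof=1) / np.sqrt(day)  # standard error of the mean
    print(f"day {day:4d}: welfare estimate {obs.mean():.1f}g "
          f"+/- {1.96 * sem:.2f}g (95% CI)")

# Day 1,001 is Thanksgiving: an event assigned probability ~0 by a
# model that has never observed it.
```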

Page 22

Page 23

The graph above shows the fate of close to 1,000 financial institutions, including busts such as FNMA, Bear Stearns, Northern Rock, and Lehman Brothers.

The banking system (betting AGAINST rare events) lost more than $1 trillion on a single error, more than was ever earned in the history of banking.

Yet bankers kept their previous bonuses, and it looks like citizens have to foot the bill.

Page 24

Questions

Can any known statistical method capture the probability of rare events with any remotely acceptable accuracy (except, of course, in hindsight and "on paper")?

Would you ever cross a river because it is “on average” 4 feet deep?
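A minimal illustration of why the average is the wrong statistic for that river (the depth profile is invented):

```python
import numpy as np

# Invented depth profile (feet) across a river that averages 4 feet.
depths = np.array([1, 2, 2, 3, 9, 8, 3, 2, 2, 8])
print(f"mean depth: {depths.mean():.1f} ft, max depth: {depths.max()} ft")
# Mean 4.0 ft, max 9 ft: the crossing kills you at the deepest
# point, not at the average.
```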

Page 25

Traditional or conventional statistics purports to honor the scientific method.

Yet we appear to be drowning in illusions of reality created by misuses of statistics, given the recent availability of massive amounts of data (of varying and dubious quality) and powerful, cheap computing resources.

How do we correct the flawed "scientific" research currently conducted in academia? Our future data scientists are being trained by academics who produce mostly bad research (a majority of it found to be dead wrong).

In science, it has been proposed that statistics in the computing cloud can and should replace the process of understanding within a scientist's brain. Really?

There appears to be a need for innovation in statistical methods for finding valuable, actionable insights in large data sets.

Page 26

Hedge fund trader counsel:

“Never let a day go by without studying the changes in the prices of all available trading instruments. You will build an instinctive inference that is more powerful than conventional statistics.”

Page 27

Questions

How can data scientists best learn to detect misuse (negligent or intentional) of statistics?

How can data scientists help develop new statistical techniques to find meaning from large data sets while avoiding the traps of traditional or conventional statistics?

How can we get academia to change so that it produces future data scientists who use statistics correctly, honor the scientific method, and produce solid data science that represents reality, enabling the consumers of data science to make optimal decisions?

Page 28

Thank you.

Panel on "Resolved: Traditional Statistics is Dead"

Data Science Association, Inc. at

University of Colorado Denver
October 22, 2013

Moderator: Michael Walker

Panelists: Nancy Abramson, Theodore Van Rooy, Mark Labovitz, Michael Malak, Joseph Rickert