Bayesian statistics and models
1. Bayesian statistics and models
2. Statistical methods
The following is a traditional approach of frequentist statistics.
Create a hypothesis called the null hypothesis.
Gather arguments for or against the hypothesis. This often involves "cherry picking" of evidence.
Order the arguments by how convincing they are; hide arguments that are not favorable.
Use deceptive techniques, on the assumption that the "end justifies the means".
It has been shown (e.g., by Judea Pearl in the 1970's) that the frequentist statistical approach can show almost anything true, and almost anything false, as desired.
The Bayesian statistical approach, and the causation models developed by Pearl, fit more like a constraint-based logic system in a probabilistic, fault-tolerant search for truth within constraints.
Details are left as a future topic.
This is an introduction to Bayesian statistics.
Once data becomes too large to look at all of it, and one needs results based on many factors, query results will have (and sometimes already do have) error bars associated with them.
3. Trends
In computer science, a linear-time algorithm is needed just to look at all of the data once.
As databases become bigger and bigger, the only way to get sub-linear algorithms is to not look at all of the data, which requires probabilistic models (see the sketch below).
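As an illustration (not from the original page), here is a minimal Python sketch of the idea: instead of scanning a large data set, estimate a quantity from a random sample and attach an error bar. The data set, sample size, and confidence level are all assumed for the example.

    import math
    import random

    random.seed(42)
    # Stand-in for a data set too big to scan (in practice it lives in a database).
    data = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]

    n = 1_000                        # look at only a tiny fraction of the data
    sample = random.sample(data, n)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    stderr = math.sqrt(var / n)

    # Report the estimate with a rough 95% error bar (normal approximation).
    print(f"estimate: {mean:.2f} +/- {1.96 * stderr:.2f}")

The query touches 1,000 of the 1,000,000 values, so it is sub-linear; the price is that the answer is probabilistic, hence the error bar.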
4. Michael Jordan
Michael Jordan: Berkeley (machine learning, computer science, statistics):
Data is getting really big.
Sub-linear algorithms are needed (cannot look at all the data).
Computer science and statistics will be merged in 50 years.
You can view his YouTube video at
http://www.youtube.com/watch?v=LFiwO5wSbi8.
5. Duality in statistics
Statistics has two correct ways of looking at reality; one may work better than the other in a given situation.
Frequentist statistics (null hypothesis, confidence intervals, etc.)
Bayesian statistics (inverse probability, probability of causes, etc.)
Many statisticians disagree that both frequentist and Bayesian statistics are correct ways of looking at reality.
6. Thomas Bayes
Rev. Thomas Bayes (mid-1700's) had the original idea, in the form of a specific instance of a problem, and used Newton's cumbersome notation to describe it.
7. Laplace
Pierre-Simon Laplace (1749-1827), in the late 1700's and early 1800's, independently developed the idea of Bayes Rule, credited Bayes for one insightful idea, and then refined and formalized the ideas in an elegant way.
Laplace developed both frequentist statistics and Bayesian statistics.
Problems with Bayesian statistics:
No formal theory for realistic problems
No computational way to solve problems
8. Others
Bayes Rule was independently developed and used by many during the next 150 years.
Alan Turing used Bayes Rule to help break the Enigma encryption during World War II.
9. Computational methods
In the 1980's, algorithmic advances using MCMC (Markov Chain Monte Carlo) techniques made Bayesian systems computationally tractable.
Technique: Gibbs Sampling (see the sketch below)
Software: BUGS, WinBUGS, OpenBUGS
Some of these techniques were known during World War II, but were kept secret until later rediscovered by academic researchers.
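As an illustration (not part of the original notes), here is a minimal Python sketch of Gibbs sampling for a standard bivariate normal with an assumed correlation rho, where each full conditional x | y and y | x is itself a normal distribution that is easy to sample from.

    import random

    random.seed(0)
    rho = 0.8            # assumed correlation between the two coordinates
    x, y = 0.0, 0.0      # arbitrary starting point
    samples = []

    for i in range(10_000):
        # Full conditionals of a standard bivariate normal:
        # x | y ~ Normal(rho * y, 1 - rho^2), and symmetrically for y | x.
        x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
        y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
        if i >= 1_000:   # discard burn-in draws
            samples.append((x, y))

    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    e_xy = sum(s[0] * s[1] for s in samples) / n   # should approximate rho
    print(f"mean of x = {mean_x:.3f} (expect 0), E[xy] = {e_xy:.3f} (expect {rho})")

Alternating draws from the full conditionals is the whole trick; systems such as BUGS automate the construction of those conditionals for much larger models.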
10. Formal theory
Computer scientist Judea Pearl (and others) developed the theory for causation and Bayesian inference and graphical models needed to model practical problems.
Problem: statisticians have been using one approach for over 150 years and are not ready for a new approach to the same problems.
11. Difference
According to Jordan, in general:
A frequentist approach will average over the possible data sets to see if the sample result is within a certain limit (i.e., a confidence interval).
A Bayesian approach will look at just the available data, and what is known about past data, in making a decision. (The two are contrasted in the sketch below.)
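As an illustration of the contrast (with made-up coin-flip data and an assumed uniform Beta(1, 1) prior), here is a minimal Python sketch:

    import math

    heads, flips = 14, 20
    p_hat = heads / flips

    # Frequentist: average over possible data sets; report a 95% confidence
    # interval via the normal approximation to the sampling distribution.
    stderr = math.sqrt(p_hat * (1 - p_hat) / flips)
    print(f"frequentist: p_hat = {p_hat:.2f}, "
          f"95% CI = ({p_hat - 1.96 * stderr:.2f}, {p_hat + 1.96 * stderr:.2f})")

    # Bayesian: condition on the one data set actually seen; a uniform
    # Beta(1, 1) prior updates to a Beta(1 + heads, 1 + tails) posterior.
    a, b = 1 + heads, 1 + (flips - heads)
    print(f"bayesian: posterior mean = {a / (a + b):.2f} under a uniform prior")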
12. Reference
A good book on the history of Bayes Rule, and also of frequentist statistics, is "The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy" by Sharon Bertsch McGrayne.
13. Conditional probability
These two (symmetric) conditional probability equations, P(A | B) = P(A and B) / P(B) and P(B | A) = P(A and B) / P(A), can be related by the common joint probability P(A and B) to get Bayes Rule.
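A minimal numeric check (with assumed probabilities, not from the original page) shows how the shared joint probability links the two conditionals:

    p_joint = 0.12          # P(A and B), assumed
    p_a, p_b = 0.30, 0.40   # P(A) and P(B), assumed

    p_a_given_b = p_joint / p_b   # P(A | B) = P(A and B) / P(B)
    p_b_given_a = p_joint / p_a   # P(B | A) = P(A and B) / P(A)

    # Bayes Rule: P(A | B) = P(B | A) * P(A) / P(B)
    print(p_a_given_b, p_b_given_a * p_a / p_b)   # both 0.3 (up to floating point)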
14. Symmetry
The joint probability equation, P(A and B) = P(A | B) P(B) = P(B | A) P(A), is symmetric if A and B are interchanged.
15. Bayes Rule
Here, then, is Bayes Rule, in its usual form: P(A | B) = P(B | A) P(A) / P(B).
16. Bayes Rule
Let A be the proposed Model and B be the observed Data. Then Bayes Rule becomes the following: P(Model | Data) = P(Data | Model) P(Model) / P(Data).
The posterior is P(Model | Data).
The likelihood is P(Data | Model).
The prior is P(Model).
The evidence is P(Data).
What happens if the prior, P(Model), is 0.0? (See the sketch below.)
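To make the question concrete, here is a minimal Python sketch with made-up numbers (the likelihood and evidence values are assumptions for illustration only):

    def posterior(likelihood, prior, evidence):
        """P(Model | Data) = P(Data | Model) * P(Model) / P(Data)."""
        return likelihood * prior / evidence

    likelihood = 0.90   # P(Data | Model), assumed
    evidence = 0.50     # P(Data), assumed

    print(posterior(likelihood, prior=0.30, evidence=evidence))  # ~0.54
    print(posterior(likelihood, prior=0.00, evidence=evidence))  # 0.0

A prior of 0.0 forces the posterior to 0.0 no matter how strong the data: no evidence can ever move it, which is the point of Cromwell's Rule below.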
17. Real world
In real-world calculations, the posterior P(Model | Data) is proportional to the likelihood times the prior, P(Data | Model) P(Model) (the numerator on the right side), so the evidence P(Data) (the denominator on the right side) can be ignored.
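A minimal sketch (with assumed numbers) of how this works in code: compute likelihood times prior for each candidate model, then normalize, so the evidence P(Data) never has to be computed separately.

    priors = {"model_a": 0.5, "model_b": 0.5}        # assumed priors
    likelihoods = {"model_a": 0.8, "model_b": 0.2}   # assumed P(Data | Model)

    unnormalized = {m: likelihoods[m] * priors[m] for m in priors}
    total = sum(unnormalized.values())               # this sum *is* the evidence
    posteriors = {m: u / total for m, u in unnormalized.items()}
    print(posteriors)   # {'model_a': 0.8, 'model_b': 0.2}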
18. Cromwell's Rule
Cromwell's Rule has to do with setting the prior P(Model) to 0.0.
"As long as you are set that the probability is going to be zero, then nothing's going to change your mind." Mandansky.
"I beseech you, in the bowels of Christ, think it possible you may be mistaken." Oliver Cromwell.
19. Arthur Bailey
A favorite introduction to Bayes comes from Arthur Bailey, an actuary and Bayes Rule popularizer in the early 1950's.
"If thou canst believe, all things are possible to him that believeth." Mark 9:23.
Takeaway: as soon as one gives no chance to something happening, no evidence will logically sway that opinion. (Think of certain viewpoints in politics, religion, sports, etc.)
20. Bayesian statisticians
How do you recognize a Bayesian statistician?
"Ye shall know them by their posteriors."
21. Stumbling block
Stumbling block: in the absence of information about the domain of application, what value should be used for the prior probability?
22. One approach
In the absence of information, the error in the prior probability is minimized by assuming equal odds (50.0%, or probability 0.5, for a sample space of two events), that is, a uniform prior (see the sketch below).
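As an illustration (with an assumed pair of hypotheses and made-up flip data), here is a minimal Python sketch: start from the uniform prior of equal odds and update as data arrives; with enough data, the evidence, not the starting prior, dominates the posterior.

    hypotheses = {"fair": 0.5, "biased": 0.8}   # P(heads) under each hypothesis
    posterior = {"fair": 0.5, "biased": 0.5}    # uniform prior: equal odds

    flips = "HHTHHHTHHH"                        # assumed observed data
    for flip in flips:
        for h, p_heads in hypotheses.items():
            posterior[h] *= p_heads if flip == "H" else (1 - p_heads)
        total = sum(posterior.values())
        posterior = {h: v / total for h, v in posterior.items()}  # renormalize

    print(posterior)   # the data increasingly favors the "biased" hypothesis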
Does this make you uncomfortable?
23. End of page