Regression and correlation
1. Regression and correlation
In English, to be independent is not to be dependent on someone or something else.
The American War of Independence was fought to achieve independence from Great Britain.
The "Declaration of Independence" is a document that claimed the right to be independent of Great Britain.
2. Statistical independence
In statistics, two events A and B are independent if the following holds (assuming Prob(B) > 0).
Prob(A) = Prob(A | B)
That is, the probability of A occurring is not influenced by the occurrence of B.
It follows that events A and B are dependent (i.e., not independent) if
Prob(A) ≠ Prob(A | B).
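A fair six-sided die gives a quick check of the definition. This is a minimal sketch; the events and the helper names (`prob`, `cond_prob`, `is_even`, `at_most_4`, `at_most_3`) are illustrative assumptions, not part of the original page. It computes Prob(A | B) as Prob(A and B) / Prob(B). With A = "the roll is even", knowing that the roll is at most 4 leaves Prob(A) unchanged (independent), while knowing that the roll is at most 3 changes it (dependent).

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of one fair die, each with probability 1/6

def prob(event):
    """Prob(event) for a fair die: favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def cond_prob(a, b):
    """Prob(A | B) = Prob(A and B) / Prob(B); requires Prob(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def is_even(o):
    return o % 2 == 0

def at_most_4(o):
    return o <= 4

def at_most_3(o):
    return o <= 3

# Prob(A) = Prob(A | B): A and B are independent.
print(prob(is_even), cond_prob(is_even, at_most_4))  # 1/2 1/2
# Prob(A) != Prob(A | B): A and B are dependent.
print(prob(is_even), cond_prob(is_even, at_most_3))  # 1/2 1/3
```

Exact fractions are used instead of floating-point numbers so that the equality test in the definition is exact.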
3. Dependence
But, if events A and B are dependent, what is the next question?
If two events A and B are dependent (i.e., not independent), then the next obvious question is, "What is the nature of that dependence?"
4. Variables
Often a model is constructed to describe a dependent variable in terms of one or more independent variables.
One meaning of the word dependent is relying on something else.
A dependent variable is the variable that is to be predicted or estimated. It relies on something else.
One meaning of the word independent is not relying on something else.
5. Variables
An independent variable is a variable that is the basis for estimating or predicting a dependent variable.
A dependent variable is hypothesized to be dependent on the independent variable.
6. Function
Consider the following function that describes the relationship between variables x and y.
y = f(x) = 2 * x + 1
What is the independent variable?
What is the dependent variable?
The value of y is dependent on the value of x. Thus, in this form, the variable x is the independent variable and the variable y is the dependent variable.
7. Inputs and outputs
The dependent variable of a function is often called the function output.
The independent variables of a function are often called the function inputs.
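The function y = f(x) = 2 * x + 1 can be written directly as code, which makes the two roles visible: we choose the input x, and the output y follows from it. This is a minimal Python sketch; the input values in the loop are arbitrary.

```python
def f(x):
    """y = f(x) = 2 * x + 1: x is the input (independent), y is the output (dependent)."""
    return 2 * x + 1

for x in [0, 1, 2, 3]:
    y = f(x)  # y is determined by whichever x we choose
    print(f"x = {x} -> y = {y}")
```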
8. Correlation
A correlation between two things is a relationship or association between those two things.
Techniques called correlation analysis are used to measure the strength of associations between two variables (i.e., things).
There is a correlation between two variables x and y when the values of one variable tend to change along with the values of the other.
9. Causation
There is causation between variables x and y when x causes y or y causes x.
Such a relationship is called a causal relationship.
10. Chain reaction
In physics, a nuclear chain reaction is caused by neutrons hitting uranium-235 nuclei (under certain conditions).
Showing causation, that is, showing that x causes y, is hard to do.
11. Lawyers
If a lawyer's client tripped over a ladder, fell, and got hurt, the lawyer will not argue the following.
The ladder caused my client to fall and get hurt.
Instead, the lawyer will argue the following, which is part of legal logic.
If it were not for the ladder, my client would not have fallen.
12. Doctors
A patient goes to a doctor.
Patient: Doc, it hurts when I do this. (The patient shows the doctor what causes the pain.)
Doctor: Well, then. Don't do that.
13. Coincidence
A coincidence between variables x and y is a relationship based on chance.
You are on a trip. You meet someone you never knew before who grew up in your same little hometown. This is a coincidence.
Coincidences happen all of the time.
It would be very unlikely that coincidences would not happen.
However, it is very unlikely that you could predict a coincidence before it happened.
Sometimes it can be hard to determine if something is a coincidence or caused by something else.
14. Causation and coincidence
Correlation, causation, and coincidence are not the same!
correlation: a shared trend between two things (which can be used to deceive)
causation: one thing causes another thing
coincidence: a chance happening between two things
It would be unlikely if coincidences did not happen. (Paulos in Innumeracy).
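The claim that coincidences are to be expected can be made concrete with the well-known birthday problem. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name `prob_shared_birthday` is hypothetical. It multiplies the chances that each new person avoids all earlier birthdays, then subtracts from 1. With only 23 people, the chance that at least two share a birthday is already about 0.507.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all days are equally likely and there are no leap years."""
    p_all_different = 1.0
    for i in range(n):
        # the (i + 1)-th person must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for n in (10, 23, 50):
    print(n, round(prob_shared_birthday(n), 3))
```

No one could predict which two people will share a birthday, yet some shared birthday is more likely than not in a group of 23.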
15. Reading and height
For example, consider children in the age range of 1 to 15 years. Is there a correlation between reading ability and height?
The question to ask is the following.
Does the child's reading ability increase as a child's height increases?
reading_ability = f ( height )
Yes, as a child's height increases, the child's reading ability increases. So, the following is true.
reading_ability = f ( height )
There is a correlation.
Does increased height cause increased reading ability?
No. There is a correlation between reading ability and height, but increased height does not cause reading ability to increase.
What about weight?
16. Reading and weight
Is there a correlation between reading ability and weight?
The question to ask is the following.
Does the child's reading ability, in general, increase as the child's weight increases?
reading_ability = f ( weight )
Yes, as a child's weight increases, the child's reading ability increases. So, the following is true.
reading_ability = f ( weight )
There is a correlation. Children who weigh more read better.
Does increased weight cause increased reading ability?
No. There is a correlation between reading ability and weight, but increased weight does not cause reading ability to increase.
17. Reading and age
Children who are older read better.
Older children also tend to be taller and heavier, so age is a likely common factor behind both the height correlation and the weight correlation.
18. Stock prices
A commonly heard statement on the nightly news is something to the effect that, "Stocks went up amid rumors that ...".
Is there a causal relationship between stocks going up and some other event?
Maybe. Maybe not.
We might like to think that there is a reason, but variations in the stock market are often better explained as the result of statistical fluctuation and not of causal relationships.
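A streak of several "up" days in a row seems to call for a reason, but pure chance produces such streaks regularly. As a rough sketch (the coin-flip "market" and the helper `longest_up_run` are hypothetical and are not a model of any real stock market), each day's move below is +1 or -1 chosen by a fair coin, and the code reports the longest run of consecutive up days.

```python
import random

def longest_up_run(changes):
    """Length of the longest streak of consecutive positive changes."""
    longest = current = 0
    for change in changes:
        current = current + 1 if change > 0 else 0
        longest = max(longest, current)
    return longest

# Hypothetical "prices": 250 daily moves of +1 or -1 chosen by a fair coin.
rng = random.Random(1)  # fixed seed so the run is repeatable
moves = [rng.choice((1, -1)) for _ in range(250)]
print("longest run of up days:", longest_up_run(moves))
```

Changing the seed shows that multi-day streaks appear in most runs, even though nothing but chance drives the moves.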
19. Relationships
A direct positive linear relationship means that as the independent variable increases, the dependent variable increases.
A negative (inverse) linear relationship means that as the independent variable increases, the dependent variable decreases.
20. Strength of relationships
The coefficient of correlation is a measure of the strength of the linear relationship between two sets of data.
The coefficient of correlation, or r-value, ranges from -1.0 to +1.0.
A correlation of -1.0 or +1.0 indicates perfect correlation.
A correlation of -1.0 indicates a perfect negative correlation.
A correlation of +1.0 indicates a perfect positive correlation.
A correlation of 0.0 indicates no correlation.
Even though there is a correlation, this does not mean that there is a causal relationship between the independent variable (e.g., weight) and the dependent variable (e.g., reading ability).
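The r-value is usually computed with the Pearson formula: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The sketch below implements that formula; the name `pearson_r` and the data sets are illustrative. The lines y = 2 * x + 1 and y = -2 * x + 1 give +1.0 and -1.0. The last data set, y = (x - 3)^2, is completely determined by x, yet r is 0.0, because r measures only the strength of a linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation (Pearson r) of two equal-length data sets."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two data sets of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0  (perfect positive)
print(pearson_r(xs, [-2 * x + 1 for x in xs]))  # -1.0 (perfect negative)
print(pearson_r(xs, [2, 4, 1, 5, 3]))           # 0.3  (weak positive)
print(pearson_r(xs, [(x - 3) ** 2 for x in xs]))  # 0.0  (no linear correlation)
```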
21. Predictions
However, correlations can be used to make predictions, although there is almost always error involved in making such predictions.
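Since this page is about regression, a least-squares line is the usual way to turn a linear correlation into predictions. In this sketch the data are made up for illustration and `fit_line` is a hypothetical helper name. The slope is b = Sxy / Sxx and the intercept is a = mean(y) - b * mean(x). The nonzero residuals on the known data show the prediction error mentioned above.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Made-up data that roughly follow y = 2 * x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # errors on the known data
print(f"y = {a:.2f} + {b:.2f} * x")
print("residuals:", [round(e, 2) for e in residuals])
print("prediction at x = 6:", round(a + b * 6, 2))
```

A prediction at x = 6 lies outside the observed data, so it carries at least as much uncertainty as the residuals suggest.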