Sunday, May 8, 2016

Types of Non-parametric Tests

Two Independent (unpaired) Samples Student's t test: 

Assumptions of the parametric test

  1. Data from both samples are randomly selected
  2. Data from both samples come from normally distributed populations
  3. Homogeneity of variance (variances are equal)
Non-parametric alternatives
  • Mann-Whitney U test
Two Dependent (paired) samples Student's t test:

Assumptions of the parametric test
  1. the differences (di) must come from a normally distributed population of differences
Non-parametric alternatives
  • Wilcoxon signed rank (paired samples or matched pairs) test
ANOVA

Assumptions of the parametric test
  1. Data from all samples are randomly selected 
  2. Data from all samples come from normally distributed populations
  3. Homogeneity of variance (variances are equal)
Non-parametric alternatives 
  • Kruskal-Wallis H test
Pearson Product Moment Correlation Coefficient Analysis

Assumptions of the parametric test
  1. Y data for each X must be randomly selected from a normal distribution of Y values
  2. X data must be randomly selected from a normal distribution of X values
Non-parametric Alternatives
  • Spearman Rank Correlation Coefficient Analysis 
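
As an illustration of how a rank-based alternative works, here is a minimal sketch of the Spearman rank correlation coefficient. It assumes there are no tied values and uses the shortcut formula r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); the data and the function names `ranks` and `spearman_rs` are made up for the example.

```python
def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs = spearman_rs(x, y)
print(round(rs, 3))  # 0.8
```

Because only the ranks enter the calculation, no assumption about normally distributed X or Y values is needed.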


   

Medical Application of Stats

Special Terms and Definitions:

Prevalence - total number of cases with a disease in a population at a given time
Simple Example:
-100 patients with disease out of a population of 200
prevalence is 100/200 = 50%

  • True positive - someone who tests positive for a disease and actually has the disease
  • False Positive - someone who tests positive for a disease but does not have the disease
  • False negative - someone who tests negative for a disease and actually has the disease
  • True negative - someone who tests negative for a disease and does not have the disease
Sensitivity - how well a test detects people who have the disease.
Definition: probability that a person with the disease will be correctly identified (test positive) by a test for the disease
Specificity - how well a test rules out people who do not have the disease.
Definition: probability that a person who does not have the disease will be correctly identified (test negative)
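
The definitions above can be sketched with counts of true/false positives and negatives. The counts below are hypothetical; they are chosen so that 100 of 200 people have the disease, matching the prevalence example.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)

def prevalence(true_pos, false_neg, true_neg, false_pos):
    """Proportion of the whole population that has the disease."""
    diseased = true_pos + false_neg
    return diseased / (diseased + true_neg + false_pos)

# Hypothetical counts: 100 diseased, 100 healthy
tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn))         # 0.9
print(specificity(tn, fp))         # 0.8
print(prevalence(tp, fn, tn, fp))  # 0.5
```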
Double-blind study - neither the investigator nor the subject knows which subject is receiving an experimental treatment or control experience (e.g., drug versus placebo)

Reliability - reproducibility of a test

Validity - whether a test truly measures what it purports to measure, refers to the appropriateness of a test's measurements

Meta-analysis - pooling results from several previous studies to achieve greater statistical power

Case-control study
- observational study, samples chosen based on presence/absence of disease

Cohort study - observational study, samples based on presence/absence of risk factors and subjects followed over a period of time

Clinical trial - experimental study, compares therapeutic benefit of 2 or more treatments


Data Transformation

Logarithmic - used when:
  • the variances are not equal (heterogeneity of variances)
  • standard deviations are proportional to the means (CVs are equal) or 
  • when the data are positively skewed 
Procedure:
1) Convert raw data into their logarithms
2) Perform analysis on log data
3) Convert back into units of the raw data by taking the antilog of the results
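
A minimal sketch of the three-step log procedure, assuming base-10 logarithms and made-up data. Note that the back-transformed mean is the geometric mean of the raw data, not the arithmetic mean.

```python
import math

raw = [10.0, 100.0, 1000.0]            # hypothetical raw data

# 1) Convert raw data into their logarithms
logs = [math.log10(x) for x in raw]

# 2) Perform the analysis on the log data (here: just the mean)
mean_log = sum(logs) / len(logs)

# 3) Convert back into raw units by taking the antilog
back_transformed_mean = 10 ** mean_log

arithmetic_mean = sum(raw) / len(raw)
print(back_transformed_mean)  # approximately 100 (geometric mean)
print(arithmetic_mean)        # 370.0
```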
Square Root - used usually with counts (number of ......) data and when the variance is:
  • proportional to the mean (i.e., variance increases with increasing size of the mean). 
Procedure:
1) Convert raw data into square root transformation
2) Perform analysis on square root data
3) Convert back into units of the raw data by squaring the results and then subtracting .5 (this assumes .5 was added to each datum before taking the square root)
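
A minimal sketch of the square root procedure with hypothetical count data. It assumes the common variant in which 0.5 is added to each count before taking the square root, so the back-transformation squares the result first and then subtracts 0.5.

```python
import math

counts = [0, 3, 8, 15]   # hypothetical count data

# 1) Transform: sqrt(X + 0.5)  (the +0.5 is an assumption of this sketch)
transformed = [math.sqrt(x + 0.5) for x in counts]

# 2) Analyze the transformed data (here: just the mean)
mean_transformed = sum(transformed) / len(transformed)

# 3) Back-transform: square first, then subtract 0.5
back_transformed_mean = mean_transformed ** 2 - 0.5

# Round trip on the individual data recovers the original counts
recovered = [t ** 2 - 0.5 for t in transformed]
print(back_transformed_mean)
print([round(r, 6) for r in recovered])
```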

Arcsine - used to normalize data in percentages or proportions whose distribution fits the binomial distribution. 
Procedure:
1) Convert raw data into arcsine transformations
2) Perform analysis on arcsine data
3) Convert back into units of the raw data by taking the sine of the results and squaring it
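
A minimal sketch of the arcsine procedure, assuming the usual form of the transformation (the arcsine of the square root of each proportion, in radians) and made-up proportions.

```python
import math

proportions = [0.10, 0.25, 0.50]   # hypothetical proportions (10%, 25%, 50%)

# 1) Transform: arcsine of the square root of each proportion
transformed = [math.asin(math.sqrt(p)) for p in proportions]

# 2) Analyze the transformed data (here: just the mean)
mean_transformed = sum(transformed) / len(transformed)

# 3) Back-transform: take the sine, then square it
back_transformed_mean = math.sin(mean_transformed) ** 2

# Round trip on the individual data recovers the original proportions
recovered = [math.sin(t) ** 2 for t in transformed]
print(back_transformed_mean)
print([round(r, 6) for r in recovered])
```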
Reciprocal Transformation - used when: the standard deviation is proportional to the square of the mean. 

Squared Transformation - used when: the standard deviation decreases as the mean increases. 




Chi Square Analysis

Used to examine differences in distributions of nominal data. Developed by Pearson in the early 1900s

Chi Square Goodness of Fit: used to evaluate whether observed data fit a theoretical or known distribution
Different variations of Chi Square Goodness of Fit:
Simple - 2 categories at a time (k=2)
Example - k=2 categories of flower color: 1) yellow flowers and 2) green flowers.
Ex: compare observed distribution of individuals with yellow or green flowers with a hypothetical distribution of 3 yellow : 1 green or 75% yellow 25% green.
Complex - More than 2 categories at a time (k>2)
Example - k=4 categories of seeds: 1) yellow and smooth seeds, 2) yellow and wrinkled seeds, 3) green and smooth seeds, 4) green and wrinkled seeds.
Ex: compare observed distribution of individuals in these categories with a theoretical distribution of 9:3:3:1 or 9/16 yellow and smooth seeds, 3/16 yellow and wrinkled seeds, 3/16 green and smooth seeds, 1/16 green and wrinkled seeds.
Mechanism:

  1. Statistical hypotheses: simple statements that a population fits a theoretical or known distribution or it does not
  2. Formula for Chi Square involves comparison of deviations between observed and expected frequencies
  3. Compare observed Chi Square with critical value from a table of critical values
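
The mechanism above can be sketched for the 3 yellow : 1 green example. The observed counts are hypothetical, and the sketch uses the standard formula, chi-square = sum of (observed - expected)^2 / expected, with df = k - 1. The critical value 3.841 is the tabled value for df = 1 at alpha = 0.05.

```python
def chi_square(observed, expected):
    """Sum of squared deviations between observed and expected, scaled by expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [84, 16]            # hypothetical counts: yellow, green
ratio = [3, 1]                 # theoretical 3:1 distribution

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]   # [75.0, 25.0]

chi2 = chi_square(observed, expected)
df = len(observed) - 1
critical = 3.841               # table value for df = 1, alpha = 0.05
reject_null = chi2 > critical

print(round(chi2, 2), df, reject_null)  # 4.32 1 True
```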
Chi Square Contingency Analysis:

-Evaluates whether the frequency of occurrence of one variable is independent of the frequencies of a second variable. It asks the question: is membership in one category influenced by membership in a second category?
Example: is hair color independent of gender or would you expect more boys to have dark hair and more girls to have light hair?
Mechanism:
Statistical hypotheses: simple statements that one variable is independent or is not influenced by a second variable
Formula for Chi Square involves comparison of deviations between observed and expected frequencies

  1. use a contingency table with rows (r) and columns (c)
  2. obtain expected values for each cell in the table
  3. compare observed Chi Square with critical value from a table of critical values
  4. use the number of rows and columns to calculate the df for the critical value
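
A minimal sketch of the contingency steps, using a hypothetical 2 x 2 table of gender by hair color. Expected values are computed as (row total x column total) / grand total and df = (r - 1)(c - 1); no continuity correction is applied, which is a simplification of this sketch.

```python
def contingency_chi_square(table):
    """Return (chi-square, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # expected count for the cell if the two variables are independent
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = boys, girls; columns = dark hair, light hair
table = [[32, 18],
         [20, 30]]
chi2, df = contingency_chi_square(table)
print(round(chi2, 3), df)  # 5.769 1
```

The observed value would then be compared with the tabled critical value for that df (3.841 for df = 1 at alpha = 0.05).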



Linear Correlations

Linear Correlations: measure the strength of association between 2 variables
Variables are X and Y but they are different from regression analysis
-no independent or dependent variables - just 2 different variables
3 types of relationships

  • positive
  • negative
  • none
Use the correlation coefficient to measure the strength of association
-referred to as the simple correlation coefficient, PPM or Pearson Product Moment correlation coefficient
Coefficient of determination, r² - square of the correlation coefficient, amount of variation in one variable explained by the correlation of both variables.

Assumptions for statistical testing:
-X values at each Y are assumed to have been randomly drawn from a normal population of X values
-Y values at each X are assumed to have been randomly drawn from a normal population of Y values
Steps:

  1. statistical hypotheses
  2. calculate the correlation coefficient
  3. conduct statistical tests 
-use the critical values of the correlation coefficient
-t testing
-F testing
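
The steps above can be sketched as follows, with made-up data. The t statistic uses the standard formula t = r * sqrt((n - 2) / (1 - r^2)) with df = n - 2; the function name `pearson_r` is just for this example.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2                       # coefficient of determination
n = len(x)
t = r * math.sqrt((n - 2) / (1 - r_squared))
df = n - 2

print(round(r, 4), round(r_squared, 2), round(t, 4), df)  # 0.7746 0.6 2.1213 3
```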

Monday, April 11, 2016

Measure of Central Tendency


Measures of Central Tendency
  • Arithmetic Mean or Average
  • Median- Middle datum of a sample
    • 50% of data lie above the median
    • 50% of data lie below the median
      • To find the median
        • Step 1- sort data
        • Step 2- determine whether n= even or odd
  • Mode- datum that occurs most often in a sample
    • 2 Step process
      • Sort the data
      • Conduct frequency analysis
      • Count the number of occurrences of each datum
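
A minimal sketch of the three measures with made-up data. The median follows the two steps listed above (sort, then check whether n is even or odd), and the mode uses a frequency count; the helper names are hypothetical.

```python
from collections import Counter

def median_of(values):
    ordered = sorted(values)                 # Step 1: sort the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                           # Step 2: n is odd -> middle datum
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even -> average the two middle data

def mode_of(values):
    counts = Counter(values)                 # frequency analysis: occurrences of each datum
    # If several data tie for most frequent, only the first one found is returned
    return counts.most_common(1)[0][0]

data = [7, 3, 5, 3, 9, 4]
mean = sum(data) / len(data)
print(round(mean, 3), median_of(data), mode_of(data))  # 5.167 4.5 3
```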

Measure of Variation

Used to assess the variation of data around the mean or median or mode
  • Range
    • the numerical difference between the minimum and maximum values in a data set
  • Variance
    • calculated in squared units of the original data
      • population
      • sample
  • Standard deviation
    • measure of the spread of data around a mean
  • Standard error
    • describes dispersion of sample means around their population mean
  • Coefficient of variation
    • used to compare amount of variation among samples with data that differs in magnitude
Variance 
  • 2 step process
    • Find SS
    • Divide SS by n-1=df
      • use n-1=df 
      • Conservative estimate of the population variance 
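
A minimal sketch of these measures for one made-up sample. SS is the sum of squared deviations from the mean; the standard error is taken as SD / sqrt(n) and the coefficient of variation as SD / mean x 100, which are the usual textbook formulas.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)          # range

ss = sum((x - mean) ** 2 for x in data)     # Step 1: find SS
df = n - 1
variance = ss / df                          # Step 2: divide SS by n - 1

sd = math.sqrt(variance)                    # standard deviation
se = sd / math.sqrt(n)                      # standard error of the mean
cv = sd / mean * 100                        # coefficient of variation (%)

print(data_range, ss, round(variance, 3), round(sd, 3), round(se, 3), round(cv, 1))
```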

Normal Distribution

  • Used to determine the probability of obtaining random samples with different means
  • Many samples and populations contain data that fit a normal distribution
  • Can be used for basis of statistical testing
Characteristics
  • Bell Curve
    • most values near the middle datum or average of the sample
    • very few values near the upper and lower extremes
  • Data fit the formula of a normal distribution
    • Y= frequency of a value of x 
Deviations from a Normal Distribution
  • Asymmetric Deviations
    • Skewed distributions 
      • Skewed to the right= pos. skewed
      • Skewed to the left= neg. skewed
    • Mathematical Analysis of skewness
  • Kurtic Deviations 
    • Platykurtic
    • Leptokurtic
    • Mathematical analysis of kurtosis
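
One common way to analyze skewness and kurtosis mathematically is through central moments. The sketch below uses the simple moment-based definitions (dividing by n); textbooks and software often apply small-sample corrections, so treat these as one possible convention, with made-up data.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Positive = skewed to the right, negative = skewed to the left."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Negative = platykurtic, positive = leptokurtic, 0 for a normal curve."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data with a long right tail

print(skewness(symmetric), round(excess_kurtosis(symmetric), 2))
print(skewness(right_skewed) > 0)
```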

Student's t-test

  • Background
    • developed by Gosset working at Guinness Brewery
    • Problems with the Normal Distribution
    • Gosset discovers the t distribution
    • Publishes it under an assumed name, "Student"
Calculation of Z scores requires knowledge of population parameters

  • Population Mean
  • Population Standard Deviation and Standard Error
  • Small samples do not provide reliable enough estimates of population parameters
Characteristics of the t distribution 
  • Leptokurtic
  • As n and v (df=v=n-1) increase the t distribution begins to approach a normal distribution
Types of Student's t tests
  • One-sample Student's t test
  • Two independent (unpaired) Samples Student's t test
  • Two dependent (paired) Samples Student's t test
One-sample Student's t test
  • Used to compare a population mean inferred from a sample with a hypothetical population mean 
Two Independent (unpaired) Samples Student's t test
  • Used to compare two independent population means inferred from two samples (independent indicates that the values from the two samples are numerically independent of each other; there is no correlation between corresponding values)
Two dependent (paired) Samples Student's t test 
  • Used to compare two dependent population means inferred from two samples (dependent indicates that the values from the two samples are numerically dependent upon each other; there is a correlation between corresponding values)
Two variations of all Student's t test
  • Two-tailed test
  • One-tailed test 
Two-tailed test- evaluates whether a difference exists between 2 samples, not the direction of the difference

One-tailed test- evaluates whether a difference exists between 2 samples, and specifically evaluates the direction of the difference 
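
To make the difference between the unpaired and paired tests concrete, here is a minimal sketch that computes both observed t statistics for the same made-up data. The unpaired version assumes equal variances (pooled variance, df = n1 + n2 - 2); the paired version works on the differences (df = n - 1). Function names are hypothetical.

```python
import math

def sum_of_squares(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def independent_t(a, b):
    """Two independent samples t, pooled variance. Returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = (sum_of_squares(a) + sum_of_squares(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (sum(a) / n1 - sum(b) / n2) / se, df

def paired_t(a, b):
    """Two dependent samples t, computed on the differences. Returns (t, df)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    sd = math.sqrt(sum_of_squares(d) / (n - 1))
    return (sum(d) / n) / (sd / math.sqrt(n)), n - 1

# Hypothetical data
a = [12, 14, 16, 18]
b = [10, 11, 13, 14]

t_ind, df_ind = independent_t(a, b)
t_pair, df_pair = paired_t(a, b)
print(round(t_ind, 3), df_ind)    # 1.897 6
print(round(t_pair, 3), df_pair)  # 7.348 3
```

The observed t is the same for one-tailed and two-tailed versions of a test; what changes is the critical value it is compared against and whether the sign of the difference matters.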


ANOVA

  • Developed by Fisher
  • Studied Agriculture and crop output with different fertilizers 
  • Needed test that could evaluate differences between three or more means
  • Why- problems with applying the Student's t test
One-way ANOVA
  • Examines one factor at a time, tests for differences among levels of the factor
Fixed Effects or Model I One-way ANOVA
  • the levels of the factor are specifically chosen by the investigator
Random effects or Model II One-way ANOVA
  • the levels of the factor are randomly chosen by the investigator
Mechanics of One-way ANOVA
  • Statistical hypothesis
  • Formulae
  • Critical values and decisions to reject/ not reject the null hypothesis
Formulae
  • Focus is on analysis of variance- comparison between 2 types of variance
  • Numerator= among group variance (variation of the sample means around the grand mean)
  • Denominator= within group variance (sum of the variation within each sample-around each sample mean)
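
The ratio described above can be sketched as follows for a one-way design with made-up data. Among-group and within-group variances are computed as mean squares (sum of squares divided by df), with df = k - 1 among groups and N - k within groups; the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_among, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Among-group SS: sample means around the grand mean, weighted by sample size
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: data around their own sample mean, summed over samples
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))

    df_among, df_within = k - 1, n_total - k
    f_ratio = (ss_among / df_among) / (ss_within / df_within)
    return f_ratio, df_among, df_within

# Hypothetical data: three levels of one factor
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio, df_among, df_within = one_way_anova_f(groups)
print(f_ratio, df_among, df_within)  # 12.0 2 6
```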
Results of ANOVA
  • Outcome 1: no difference between among group variance and within group variance
    • There is no difference among means
    • Stop all testing and write the results section
  • Outcome 2: a difference exists between among group variance and within group variance
    • There is a difference among means
    • Follow with a multiple comparisons test to determine which means are different from each other
Example of an ANOVA
  1. State the biological question
  2. Translate into statistical hypotheses
  3. State the alpha level
  4. State the statistical test
  5. State the assumption of the test
  6. Calculate the observed test statistic
  7. Find the degrees of freedom and critical value
  8. Compare the observed and critical value 
  9. Interpret the results 


Multiple Comparisons Test

Multiple Comparisons Tests

  • Only used if the results of an ANOVA yield a significant difference
  • ANOVA results only indicate that a difference exists among means, not where the difference is
  • Referred to as post hoc or a posteriori tests 
  • Used after you know there is a significant difference from the ANOVA
  • Several types of multiple comparisons tests
  • Three broad categories
    • Generic multiple comparisons test
    • Control group test
    • Multiple contrasts tests
  • Generic Multiple comparisons test
    • evaluate all possible pairs/combinations of means
    • Tukey's HSD test
    • Student-Newman-Keuls (SNK) test
  • Control Group test
    • Evaluate differences between experimental group versus the control group
    • Dunnett's test
  • Multiple Contrasts tests
    • can be used like the traditional tests mentioned above to evaluate differences among pairs of means but is better used to evaluate homogeneous groups of means against other such groups or individual means 
    • Scheffé test
  • Basic mechanics of Multiple Comparisons Test
    • Observed Test Statistic
    • SE=standard error
    • Statistical Hypotheses
    • A and B represent any pairs of means
    • Pairwise comparisons 
      • arranged in order from largest to smallest
      • Calculate observed test statistic for each comparison
    • Enclosure Rule
      • If two means are not different from each other then all means in between them are also not different from each other
  • Similarities among different Multiple Comparisons Tests
    • All tests involve pairwise comparisons of means
    • Rank order the means for comparisons 
    • Calculate an observed q value similar to the t test and z scores
    • Compare with a critical value and reject or do not reject the null hypothesis for each pairwise comparison 
    • Use the enclosure rule in all tests
  • Differences among different Multiple Comparisons Tests
    • how the means are rank ordered
    • most tests are two-tailed but control group tests can be one-tailed tests
    • the SE term differs among tests 
  • Mechanics 
    • Arrange statistical hypotheses 
    • Calculate test statistics= observed q
    • Decisions rules and critical values
  • Tests 
    • Tukey's HSD
    • SNK
    • Dunnett
    • Scheffé
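
As an illustration of the mechanics, here is a minimal sketch of the observed q values for all pairwise comparisons. It assumes equal sample sizes and the Tukey-style standard error SE = sqrt(MS within / n); the group names, means and MS within value are made up, and each q would still have to be compared with a critical value from a studentized range table.

```python
import math
from itertools import combinations

def pairwise_q_values(means, ms_within, n_per_group):
    """Observed q = (mean A - mean B) / SE for every pair, largest means first."""
    se = math.sqrt(ms_within / n_per_group)
    # Rank order the means from largest to smallest
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    results = []
    for (name_a, mean_a), (name_b, mean_b) in combinations(ranked, 2):
        results.append((name_a, name_b, (mean_a - mean_b) / se))
    return results

# Hypothetical sample means, within-group mean square and sample size
means = {"A": 5.0, "B": 7.0, "C": 9.0}
q_values = pairwise_q_values(means, ms_within=1.0, n_per_group=3)
for name_a, name_b, q in q_values:
    print(name_a, "vs", name_b, round(q, 3))
```

Under the enclosure rule, if the comparison of the largest and smallest means is not significant, the pairs enclosed between them need not be tested.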

Linear Regression

  • Tests for significant relationship between 2 variables 
  • Defines each variable
    • Y- dependent variable
    • X- independent variable
    • Y varies in response to changes in X
  • Defines functional mathematical relationship 
    • y=bx+a
  • Used for prediction 
  • Relationship is a functional dependence
  • The magnitude of a dependent variable (Y) is dependent on magnitude of an independent variable (X)
  • Functional dependence is a mathematical relationship that can be quantified 
  • Linear Regression equation used to describe the mathematical relationship 
    • Y=a+bX or 
    • Y=bx+a
      • Y or dependent or criterion or response variable
      • X independent or predictor or regressor variable 
      • a= y-intercept (where x=0)
      • b= slope or regression coefficient 
  • Functional dependence is a mathematical relationship that can be quantified
    • Positive, negative, and no relationship
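
A minimal sketch of fitting Y = a + bX by ordinary least squares, with made-up data. The slope is computed as b = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2) and the intercept as a = mean y - b * mean x; the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Return (a, b): y-intercept and slope of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # slope or regression coefficient
    a = mean_y - b * mean_x       # y-intercept (value of Y where X = 0)
    return a, b

# Hypothetical data: X independent, Y dependent
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
predicted = a + b * 6             # prediction of Y for X = 6
print(round(a, 2), round(b, 2), round(predicted, 2))  # 2.2 0.6 5.8
```

A positive b indicates a positive relationship, a negative b a negative relationship, and a b near zero no relationship.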

Saturday, February 6, 2016

Hypothesis Testing

Scientific Hypotheses and the Scientific Method

  • Science- Search for natural explanations of natural phenomena 
  • Methodology- Scientific Method 
Scientific Method 
  1. Obtain background information (literature search and review, Internet, lab observations) know what's known and what's unknown 
  2. Ask biological questions- what questions: descriptive research, how questions: search for causal mechanisms 
  3. Develop testable hypotheses
  4. Design experiment to test hypothesis 
  5. Collect Data
  6. Analyze data
  7. Interpret results 
  8. Answer biological questions- Did data and results support the hypothesis 
  9. Present the results- oral or written presentations for publication in a scientific journal 
Statistical Hypotheses and Statistical Testing 
  • Null Hypothesis- no differences from experimental hypotheses, no relationship 
  • Alternate Hypothesis- opposite the Null Hypothesis 
  • Two-tailed and one-tailed variations 

Properties of Data and Variable in Statistics

All scientific data:
Measurements in the Metric System

  • Volume (mL or cc)
  • Distance (m or km)
  • Temperature (°C)
  • Weight (g) 




Types of Data:
  • Ratio-constant size interval between values. Commonly used. True zero. Ex: most measurements weight, volume, length, etc. 
  • Interval- Constant size interval between values, arbitrary zero. Ex: temperature, time of day, date of year
  • Ordinal- ordered or ranked data, no numerical difference between data. Ranking. Ex: darker versus lighter, shorter versus taller, faster versus slower
  • Nominal- non-numeric qualities or attributes, names, qualitative data. Ex: colors, gender, locations.
Variables:
  • Continuous- data where there are an infinite number of values between any two individual values ex: 2.7 and 2.8
  • Discrete-integers (counts) ex: 35 seals, 15 subjects
Measurement Concepts:
  • Accuracy-About the measuring device. Refers to how close a measurement is to the real measurement, evaluates measuring device. Ex: a scale or balance is only accurate to the nearest 0.1g
  • Precision-About the researcher. Refers to how close repeated measurements are to each other. 
  • Significant Digits- Implied range
  • Rounding Rules- If x>5 then round up. If x<5 then round down (leave the preceding digit unchanged). 



Definitions

Definition of Statistics:

1. Statistics: science or discipline, branch of mathematics. The science of how to make statistics. Concerned with the study and development of concepts ex: data, statistical testing, probability distributions

2. Statistics: numerical summary of data, descriptive or inferential statistics. Used to estimate or make inferences about population parameters. Ex: population mean, population variance, population standard deviation

3. Statistics: test statistic that is a result of a statistical test, result of computation. The results of a comparison between 2 or more populations based on samples drawn from those populations. The results of a test for a relationship between variables. Ex: t-tests

History of Statistics

Ancient Greek: The Philosophers- they came up with ideas but had no quantitative analyses.
17th Century: Graunt- studied affairs of the state and began the stepping stones for statistics. Petty- Economist who studied probability and created the census technique. Pascal- Studied probability through gambling.
17th-18th Century: Bernoulli-studied probability and risk.
18th Century: Laplace- created the normal curve and studied regression. Gauss- created the bell-shaped curve.
19th Century: Quetelet- applied statistics to human behavior (criminals) in order to predict criminal behavior through physical properties. Galton- (Darwin's Cousin) looked at relationships of parents and their offspring. Applied genetic base for size (height).
Early 20th Century: Pearson- Father of Statistics. Created first statistics journal, Biometrika. Formed the first academic department for statistics. Gosset (Student)- student of Pearson's. Studied brewing and created the Student's t test to compare means. Fisher- developed ANOVA to compare 3 or more means.
Later 20th Century: Wilcoxon- biochemist who developed the Wilcoxon test. Kruskal, Wallis- economist who developed the non-parametric equivalent of the ANOVA. Spearman- a psychologist who developed a non-parametric equivalent of the correlation coefficient. Kendall- the first real statistician, developed another non-parametric equivalent. Tukey- Statistician, developed multiple comparisons procedure. Dunnett- biochemist, developed multiple comparisons procedure to compare treatments with a control group. Keuls- agronomist who developed multiple comparisons procedure.
Computer Technology: ENIAC- used during WWII to position guns. Easier to use than calculating by hand.