Biostats BI 345 Blog

Sunday, May 8, 2016

Types of Non-parametric Tests

Two Independent (unpaired) Samples Student's t test:

Assumptions of the parametric test

Data from both samples are randomly selected
Data from both samples come from normally distributed populations
Homogeneity of variance (variances are equal)

Non-parametric alternatives

Mann-Whitney U test

Two Dependent (paired) samples Student's t test:

Assumptions of the parametric test

the differences (di) must come from a normally distributed population of differences

Non-parametric alternatives

Wilcoxon signed rank (paired samples or matched pairs) test

ANOVA

Assumptions of the parametric test

Data from all samples are randomly selected
Data from all samples come from normally distributed populations
Homogeneity of variance (variances are equal)

Non-parametric alternatives

Kruskal-Wallis H test

Pearson Product Moment Correlation Coefficient Analysis

Assumptions of the parametric test

Y data for each X must be randomly selected from a normal distribution of Y values
X data must be randomly selected from a normal distribution of X values

Non-parametric Alternatives

Spearman Rank Correlation Coefficient Analysis

Medical Application of Stats

Special Terms and Definitions:

Prevalence - proportion of a population that has a disease at a given time (total number of existing cases divided by the population size)

Simple Example:

100 patients with the disease out of a population of 200
Prevalence is 100/200 = 50%

True positive - someone who tests positive for a disease and actually has the disease
False Positive - someone who tests positive for a disease but does not have the disease
False negative - someone who tests negative for a disease and actually has the disease
True negative - someone who tests negative for a disease and does not have the disease

Sensitivity - refers to how well a test detects people with the disease.

Definition: probability that a person with the disease will be correctly identified by a test for the disease

Specificity - refers to how well a test identifies people without the disease:

Definition: probability that a person who does not have the disease will be correctly identified as disease-free by a test for the disease
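
The definitions above can be turned into simple ratios of the four test outcomes. This is a minimal sketch; the counts are hypothetical and the function names are made up for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)


def prevalence(true_pos, false_neg, true_neg, false_pos):
    """Proportion of the whole population that has the disease."""
    diseased = true_pos + false_neg
    return diseased / (diseased + true_neg + false_pos)


# Hypothetical screening results: 200 people, 100 of whom have the disease
tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn))         # 0.9
print(specificity(tn, fp))         # 0.8
print(prevalence(tp, fn, tn, fp))  # 0.5
```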

Double-blind study - neither the investigator nor the subject knows which subject is receiving an experimental treatment or control experience (e.g., drug versus placebo)

Reliability - reproducibility of a test

Validity - whether a test truly measures what it purports to measure, refers to the appropriateness of a test's measurements

Meta-analysis - pooling results from several previous studies to achieve greater statistical power

Case-control study - observational study, samples chosen based on presence/absence of disease

Cohort study - observational study, samples based on presence/absence of risk factors and subjects followed over a period of time

Clinical trial - experimental study, compares therapeutic benefit of 2 or more treatments

Data Transformation

Logarithmic - used when:

the variances are not equal (heterogeneity of variances)

standard deviations are proportional to the means (CVs are equal), or

the data are positively skewed

Procedure:

1) Convert raw data into their logarithms
2) Perform analysis on log data
3) Convert back into units of the raw data by taking the antilog of the results
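
The three steps above can be sketched as follows, assuming base-10 logarithms and a made-up data set. Note that the back-transformed mean is the geometric mean of the raw data, not the arithmetic mean.

```python
import math

raw = [10.0, 100.0, 1000.0]  # hypothetical raw data

# 1) Convert raw data into their logarithms (base 10 is an assumption)
logs = [math.log10(x) for x in raw]

# 2) Perform the analysis on the log data (here: just the mean)
mean_log = sum(logs) / len(logs)

# 3) Convert back into raw units by taking the antilog
back_transformed_mean = 10 ** mean_log

arithmetic_mean = sum(raw) / len(raw)
print(back_transformed_mean)  # geometric mean, about 100
print(arithmetic_mean)        # 370.0, for comparison
```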

Square Root - used usually with counts (number of ......) data and when the variance is:

proportional to the mean (i.e., variance increases with increasing size of the mean).

Procedure:

1) Convert raw data into their square roots (0.5 is commonly added to each datum first)
2) Perform analysis on square root data
3) Convert back into units of the raw data by squaring the results and then subtracting 0.5
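
A minimal sketch of this procedure, assuming the variant in which 0.5 is added to each count before the square root is taken (which is why 0.5 comes back off at the end). The counts and function names are hypothetical.

```python
import math


def sqrt_transform(x, offset=0.5):
    """Square root of (x + offset); offset=0.5 is an assumed convention."""
    return math.sqrt(x + offset)


def sqrt_back_transform(y, offset=0.5):
    """Undo sqrt_transform: square first, then subtract the offset."""
    return y ** 2 - offset


counts = [0, 1, 4, 9, 16]  # hypothetical count data

# 1) Transform, 2) analyze (here: the mean), 3) back-transform
transformed = [sqrt_transform(c) for c in counts]
mean_transformed = sum(transformed) / len(transformed)
mean_back = sqrt_back_transform(mean_transformed)
print(mean_back)
```

The order in the last step matters: squaring comes before subtracting the offset.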

Arcsine - used to normalize data in percentages or proportions whose distribution fits the binomial distribution.

Procedure:

1) Convert raw data into arcsine transformations (the arcsine of the square root of each proportion)
2) Perform analysis on arcsine data
3) Convert back into units of the raw data by squaring the sine of the results
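
As a rough sketch, the transformation and its inverse can be written like this. It assumes the data are proportions between 0 and 1 and works in radians; the function names and data are hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine of the square root of a proportion p (0 <= p <= 1), in radians."""
    return math.asin(math.sqrt(p))


def arcsine_back_transform(angle):
    """Undo arcsine_transform: take the sine, then square it."""
    return math.sin(angle) ** 2


proportions = [0.05, 0.25, 0.50, 0.90]  # hypothetical proportions

transformed = [arcsine_transform(p) for p in proportions]
mean_angle = sum(transformed) / len(transformed)
mean_back = arcsine_back_transform(mean_angle)
print(mean_back)
```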

Reciprocal Transformation - used when: the standard deviation is proportional to the square of the mean.

Squared Transformation - used when: the standard deviation decreases as the mean increases.

Chi Square Analysis

Used to examine differences in distributions of nominal data. Developed by Pearson in the early 1900s

Chi Square Goodness of Fit: used to evaluate whether observed data fit a theoretical or known distribution

Different variations of Chi Square Goodness of Fit:

Simple - 2 categories at a time (k=2)

Example - k=2 categories of flower color: 1) yellow flowers and 2) green flowers.

Ex: compare observed distribution of individuals with yellow or green flowers with a hypothetical distribution of 3 yellow : 1 green or 75% yellow 25% green.

Complex - More than 2 categories at a time (k>2)

Example - k=4 categories of seeds: 1) yellow and smooth seeds, 2) yellow and wrinkled seeds, 3) green and smooth seeds, 4) green and wrinkled seeds.

Ex: compare observed distribution of individuals in these categories with a theoretical distribution of 9:3:3:1 or 9/16 yellow and smooth seeds, 3/16 yellow and wrinkled seeds, 3/16 green and smooth seeds, 1/16 green and wrinkled seeds.

Mechanism:

Statistical hypotheses: simple statements that a population fits a theoretical or known distribution or it does not

Formula for Chi Square involves comparison of deviations between observed and expected frequencies

Compare observed Chi Square with critical value from a table of critical values
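
The mechanism above can be sketched for the 3 yellow : 1 green example. The observed counts below are hypothetical, and the function name is made up for illustration. The statistic is the sum of (observed - expected)² / expected over all categories, with df = k - 1.

```python
def chi_square_goodness_of_fit(observed, ratios):
    """Return (chi_square, df, expected) for observed counts vs. a ratio."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    # Expected frequency in each category under the theoretical ratio
    expected = [total * r / ratio_sum for r in ratios]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_square, df, expected


# Hypothetical counts: 84 yellow and 16 green, tested against 3:1
chi_square, df, expected = chi_square_goodness_of_fit([84, 16], [3, 1])
print(expected)    # [75.0, 25.0]
print(chi_square)  # about 4.32
print(df)          # 1
```

With df = 1 the critical value at the 0.05 level is 3.841, so an observed Chi Square of about 4.32 would lead to rejecting the hypothesis that these made-up data fit a 3:1 distribution.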

Chi Square Contingency Analysis:

-Evaluates whether the frequency of occurrence of one variable is independent of the frequencies in a second variable. It asks the question: is membership in one category influenced by membership in a second category?

Example: is hair color independent of gender or would you expect more boys to have dark hair and more girls to have light hair?

Mechanism:

Statistical hypotheses: simple statements that one variable is independent or is not influenced by a second variable

Formula for Chi Square involves comparison of deviations between observed and expected frequencies

use a contingency table with rows (r) and columns (c)

obtain expected values for each cell in the table

compare observed Chi Square with critical value from a table of critical values

use the number of rows and columns to calculate the df for the critical value: df = (r - 1)(c - 1)
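
A minimal sketch of these steps for a contingency table. The 2 x 2 counts (hair color by gender) are hypothetical, and the function name is made up. Each expected value is (row total x column total) / grand total.

```python
def chi_square_contingency(table):
    """Return (chi_square, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency for this cell if the variables are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# Hypothetical counts: rows = boys, girls; columns = dark hair, light hair
table = [[30, 20],
         [20, 30]]
chi_square, df = chi_square_contingency(table)
print(chi_square)  # 4.0
print(df)          # 1
```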

Linear Correlations

Linear Correlations: measure the strength of association between 2 variables

Variables are X and Y, but they are treated differently than in regression analysis

-no independent or dependent variables - just 2 different variables

3 types of relationships

positive

negative

none

Use the correlation coefficient to measure the strength of association

-referred to as the simple correlation coefficient or the Pearson product moment (PPM) correlation coefficient

Coefficient of determination, r² - square of the correlation coefficient, the proportion of variation in one variable that is explained by its association with the other variable.

Assumptions for statistical testing:

-X values at each Y are assumed to have been randomly drawn from a normal population of X values
-Y values at each X are assumed to have been randomly drawn from a normal population of Y values

Steps:

statistical hypotheses

calculate the correlation coefficient

conduct statistical tests

-use the critical values of the correlation coefficient
-t testing
-F testing
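
The calculation and the t-testing step can be sketched as below. The data are hypothetical and the function names are made up; the t statistic shown is the usual one for testing whether the correlation differs from zero, with df = n - 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def t_statistic(r, n):
    """t for testing r against zero; compare with a critical t at df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


xs = [1, 2, 3, 4, 5]  # hypothetical X data
ys = [2, 4, 5, 4, 5]  # hypothetical Y data

r = pearson_r(xs, ys)
r_squared = r ** 2
t = t_statistic(r, len(xs))
print(r)          # about 0.775
print(r_squared)  # about 0.6
print(t)          # about 2.121
```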

Monday, April 11, 2016

Measure of Central Tendency

Measures of Central Tendency

Arithmetic Mean or Average
Median- Middle datum of a sample

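50% of data lies above the median
50% of data lies below the median

To find the median

Step 1- sort data
Step 2- determine whether n is even or odd (odd: the median is the middle datum; even: the median is the average of the two middle data)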
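
The sort-then-check-n steps above locate the middle datum, i.e. the median. A minimal sketch with made-up data:

```python
def median(data):
    """Middle datum of a sample (average of the two middle data if n is even)."""
    ordered = sorted(data)         # Step 1: sort the data
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                 # Step 2: n is odd
        return ordered[middle]
    # n is even: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```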

Mode- datum that occurs most often in a sample

2 Step process

Sort the data
Conduct frequency analysis: count the number of occurrences of each datum
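
The process above can be sketched with a frequency count. The data are hypothetical; the function returns a list because a sample can have more than one mode.

```python
from collections import Counter


def modes(data):
    """Return every datum that occurs most often in the sample."""
    counts = Counter(sorted(data))  # sort, then count occurrences of each datum
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 5, 1]))  # [3, 5]
print(modes([4, 4, 2, 9]))        # [4]
```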

Measure of Variation

Used to assess the variation of data around the mean or median or mode

Range

the numerical difference between the minimum and maximum values in a data set

Variance

calculated in squared units of the original data

population
sample

Standard deviation

measure of the spread of data around a mean

Standard error

describes dispersion of sample means around their population mean

Coefficient of variation

used to compare amount of variation among samples with data that differs in magnitude
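
The measures listed above can be computed with Python's standard `statistics` module. This is only a sketch on a made-up sample; the standard error and coefficient of variation use the usual formulas SD / sqrt(n) and (SD / mean) x 100.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = statistics.mean(data)

data_range = max(data) - min(data)            # maximum minus minimum
sample_variance = statistics.variance(data)   # divides SS by n - 1
population_variance = statistics.pvariance(data)  # divides SS by n
standard_deviation = statistics.stdev(data)   # square root of sample variance
standard_error = standard_deviation / math.sqrt(n)
coefficient_of_variation = standard_deviation / mean * 100  # in percent

print(data_range)           # 7
print(population_variance)  # 4
print(sample_variance)      # about 4.571
```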

Variance

2 step process

Find SS (the sum of squared deviations from the mean)
Divide SS by df = n-1

Dividing by n-1 rather than n gives a conservative estimate of the population variance
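
The two-step variance calculation can be written out directly, taking SS to be the sum of squared deviations from the sample mean. The data and the function name are hypothetical.

```python
def sample_variance(data):
    """Return (SS, df, variance) for a sample."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # Step 1: find SS
    df = n - 1
    return ss, df, ss / df                   # Step 2: divide SS by df


ss, df, variance = sample_variance([3, 5, 7])
print(ss)        # 8.0
print(df)        # 2
print(variance)  # 4.0
```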