Discrete Distributions

Distribution | Formula | Mean μ | Variance σ² | Moment Generating Function
Bernoulli Distribution | $P(k) = p$ if $k = 1$, $1 - p$ if $k = 0$, else $0$ | $p$ | $p(1 - p)$ | $1 - p + pe^{t}$
Binomial Distribution | $f(k;n,p) = \frac{n!}{k!\,(n - k)!}\, p^{k} (1 - p)^{n - k}$ | $np$ | $np(1 - p)$ | $(1 - p + pe^{t})^{n}$
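Negative Binomial Distribution | $f(n;k,p) = \frac{(n - 1)!}{(k - 1)!\,(n - k)!}\, p^{k} (1 - p)^{n - k}$ for $n = k, k + 1, \dots$ (number of trials $n$ until the $k$-th success) | $\frac{k}{p}$ | $\frac{k(1 - p)}{p^{2}}$ | $\left(\frac{pe^{t}}{1 - (1 - p)e^{t}}\right)^{k}$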
Geometric Distribution | $P(x = n) = p(1 - p)^{n - 1}$ | $\frac{1}{p}$ | $\frac{1 - p}{p^{2}}$ | $\frac{pe^{t}}{1 - (1 - p)e^{t}}$
Poisson Distribution | $P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^{t} - 1)}$
Hypergeometric Distribution | $P(x;n,N,k) = \frac{\binom{k}{x}\binom{N - k}{n - x}}{\binom{N}{n}}$ | $\frac{nk}{N}$ | $\frac{nk(N - k)(N - n)}{N^{2}(N - 1)}$ | N/A
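Multinomial Distribution | $f(x_0, x_1, \dots, x_m; n, p_0, p_1, \dots, p_m) = \frac{n!}{x_0!\,x_1!\cdots x_m!}\, p_0^{x_0} p_1^{x_1} \cdots p_m^{x_m}$, where $\sum_i x_i = n$ | $np_i$ | $np_i(1 - p_i)$ | $\left(\sum_i p_i e^{t_i}\right)^{n}$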
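Uniform Distribution (continuous on $[a, b]$) | $f(x;a,b) = \frac{1}{b - a}$ for $a \le x \le b$, else $0$ | $\frac{a + b}{2}$ | $\frac{(b - a)^{2}}{12}$ | $\frac{e^{tb} - e^{ta}}{t(b - a)}$ for $t \ne 0$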
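
The mean and variance columns above can be checked numerically by summing over a probability mass function. The following is a minimal sketch using only the Python standard library; the function names (`binomial_pmf`, `poisson_pmf`, `moments`) and the parameter values are illustrative assumptions, not part of the table.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(K = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson with rate lam."""
    return lam**k * exp(-lam) / factorial(k)

def moments(pairs):
    """Mean and variance from (value, probability) pairs."""
    mean = sum(x * q for x, q in pairs)
    var = sum((x - mean) ** 2 * q for x, q in pairs)
    return mean, var

# Binomial: the table gives mean np and variance np(1 - p).
n, p = 10, 0.3
binom_mean, binom_var = moments([(k, binomial_pmf(k, n, p)) for k in range(n + 1)])

# Poisson: the table gives mean and variance both equal to lambda.
# The infinite support is truncated at 100 (an assumption); the tail is negligible here.
lam = 4.0
pois_mean, pois_var = moments([(k, poisson_pmf(k, lam)) for k in range(100)])
```

With these inputs the binomial moments should agree with np and np(1 - p), and both Poisson moments with λ, up to floating-point error.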


Test Statistic Decision Tree:

Type | Keywords | Sample Size | Use Test
Z-score | Mean, Average | Greater than 30 or population σ is known | $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
t-score | Mean, Average | 30 or less and population σ is not known | $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
proportion-score | Proportion (Test p), Fraction, Percentage, Rate, Probability | More than 30 | $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $q_0 = 1 - p_0$
Variance $\sigma^{2}$ | Variance, Variability, Spread | N/A | $\chi^{2} = \frac{(n - 1)s^{2}}{\sigma^{2}}$
Equal Variances $\sigma_1^{2} = \sigma_2^{2}$ | Equal Variances, Ratio or Difference in Variances | N/A | $F = \frac{s_1^{2}}{s_2^{2}}$ (ratio of the sample variances)
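
Each test statistic in the table is a one-line computation. The sketch below is a hedged illustration using only the standard library; the function and argument names are assumptions chosen for readability, and the F statistic is assumed to be the ratio of the two sample variances.

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)); large n or known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)); small n, sigma unknown, n - 1 df."""
    return (xbar - mu0) / (s / sqrt(n))

def proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * q0 / n), with q0 = 1 - p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def chi_square_statistic(s_sq, sigma0_sq, n):
    """chi^2 = (n - 1) * s^2 / sigma0^2, with n - 1 df."""
    return (n - 1) * s_sq / sigma0_sq

def f_statistic(s1_sq, s2_sq):
    """F = s1^2 / s2^2 (assumed: ratio of sample variances)."""
    return s1_sq / s2_sq
```

For example, a sample mean of 52 from n = 100 observations, tested against μ = 50 with σ = 10, gives `z_statistic(52, 50, 10, 100)`, which is two standard errors above the hypothesized mean.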


Confidence Interval Decision Tree

Type | Keywords | Sample Size | Use Test
Test the Mean | Confidence Interval, Mean, Average | Greater than 30 | $\bar{X} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Test the Mean | Confidence Interval, Mean, Average | 30 or less | $\bar{X} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$, with $n - 1$ degrees of freedom
Test the Variance | Confidence Interval, Variance | Greater than 30 | $\frac{(n - 1)s^{2}}{\chi^{2}_{\alpha/2}} < \sigma^{2} < \frac{(n - 1)s^{2}}{\chi^{2}_{1 - \alpha/2}}$
Test the Standard Deviation | Confidence Interval, Standard Deviation | Greater than 30 | $\sqrt{\frac{(n - 1)s^{2}}{\chi^{2}_{\alpha/2}}} < \sigma < \sqrt{\frac{(n - 1)s^{2}}{\chi^{2}_{1 - \alpha/2}}}$
Test the Proportion | Confidence Interval, Proportion, percentage, rate, Population | Greater than 30 | $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
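Test the Difference of Means | Confidence Interval, Difference of Means | Greater than 30 | $(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{a} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{a}$, where $a$ is the estimated variance of $\bar{X}_1 - \bar{X}_2$ (for independent samples, $\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}$)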
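Test the Difference of Means | Confidence Interval, Difference of Means | 30 or less | $(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}\sqrt{a} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}\sqrt{a}$, where $a$ is the estimated variance of $\bar{X}_1 - \bar{X}_2$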
$\hat{p}$ Confidence Interval | Confidence Interval (test p), criteria, characteristic, proportion | 30 or less | $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
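
The z-based intervals in this table can be sketched with `statistics.NormalDist` from the Python standard library, which supplies the critical value $z_{\alpha/2}$ through `inv_cdf`. The helper names below are illustrative assumptions. The t- and chi-square-based rows need critical values that the standard library does not provide, so they are left out of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_interval(xbar, s, n, confidence=0.95):
    """Large-sample interval: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    half_width = z_critical(confidence) * s / sqrt(n)
    return xbar - half_width, xbar + half_width

def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample interval: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = mean_interval(xbar=50.0, s=10.0, n=100)
```

At 95% confidence the critical value is close to 1.96, so the example interval is roughly 50 plus or minus 1.96 standard errors.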

Sample Size Decision Tree:

Type | Keywords | Use Test
Sample Size for μ | Sample Size, average, mean | $n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{SE^{2}}$
Proportion Sample Size | Sample Size, Proportion, Population, Percentage, Rate | $n = \frac{z_{\alpha/2}^{2}\,p(1 - p)}{SE^{2}}$
$\mu_1 - \mu_2$ Sample Size | Sample Size, Difference of Means, $\mu_1 - \mu_2$ | $n = \frac{z_{\alpha/2}^{2}\,(\sigma_1^{2} + \sigma_2^{2})}{ME^{2}}$
$p_1 - p_2$ Sample Size | Sample Size, Difference of p, $p_1 - p_2$ | $n = \frac{z_{\alpha/2}^{2}\,(p_1 q_1 + p_2 q_2)}{ME^{2}}$
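
As a worked illustration of the first two sample-size formulas, the sketch below computes n and rounds up to the next whole observation. The function names and the `error` argument (standing in for SE in the table) are assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, error, confidence=0.95):
    """n = z^2 * sigma^2 / error^2, rounded up."""
    z = z_critical(confidence)
    return ceil(z**2 * sigma**2 / error**2)

def sample_size_proportion(p, error, confidence=0.95):
    """n = z^2 * p * (1 - p) / error^2, rounded up."""
    z = z_critical(confidence)
    return ceil(z**2 * p * (1 - p) / error**2)

n_mean = sample_size_mean(sigma=15, error=3)         # 97
n_prop = sample_size_proportion(p=0.5, error=0.05)   # 385
```

Using p = 0.5 gives the most conservative (largest) sample size for a proportion, since p(1 - p) is maximized there.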

Hypothesis Testing Decision Tree

p-value Significance Test (observed level of significance):

Compute your test statistic (for example, the z-score), then use the z-table to find the tail probability associated with it; this probability is the p-value. If the p-value is less than α, reject H0; otherwise, fail to reject H0.
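
The p-value rule can be sketched in a few lines, using `statistics.NormalDist` in place of a printed z-table. The function names and the `tail` argument are illustrative assumptions.

```python
from statistics import NormalDist

def z_test_p_value(z, tail="two-sided"):
    """Tail probability of a standard normal test statistic z."""
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "lower":
        return cdf(z)
    # Two-sided: probability of a value at least as extreme in either tail.
    return 2 * (1 - cdf(abs(z)))

def reject_h0(z, alpha=0.05, tail="two-sided"):
    """Reject H0 when the p-value is smaller than alpha."""
    return z_test_p_value(z, tail) < alpha
```

For instance, `reject_h0(2.5)` rejects at α = 0.05, while `reject_h0(1.0)` does not.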

Hypothesis Testing Errors:

Type I error - Reject null hypothesis H0 when H0 is TRUE: Probability = α
Type II error - Fail to reject null hypothesis H0 when H0 is FALSE: Probability = β
Power of the Test = Probability of rejecting null hypothesis H0 when H0 is FALSE = 1 - β
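Note: Which error is the bigger mistake depends on the application; α is chosen in advance to cap the Type I error rate, and β can then be reduced by increasing the sample size.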
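
To make β and power concrete, here is a minimal sketch for one specific case: an upper-tailed z-test of H0: μ = μ0 against a particular alternative mean μ1, with known σ. The setup and the names (`power_upper_tail`, `mu0`, `mu1`) are assumptions chosen for illustration; other tests need different formulas.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean is mu1."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)              # rejection cutoff under H0
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # true mean in standard-error units
    beta = nd.cdf(z_alpha - shift)               # P(fail to reject | mean is mu1)
    return 1 - beta

power_small_n = power_upper_tail(mu0=50, mu1=52, sigma=10, n=25)
power_large_n = power_upper_tail(mu0=50, mu1=52, sigma=10, n=100)
```

With α held fixed, the larger sample yields the higher power, which is the usual way to reduce β.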

Finite Population Correction Factor:

If n/N > 0.05, multiply the standard error used in your confidence interval by the following factor (some texts use N - 1 in the denominator):
$\sqrt{\frac{N - n}{N}}$
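
A short sketch of applying the correction follows. It assumes the factor is the square root of (N - n)/N and that it scales the standard error of the mean; the function names are hypothetical.

```python
from math import sqrt

def finite_population_factor(n, N):
    """sqrt((N - n) / N) when n/N > 0.05, otherwise 1.0 (no correction)."""
    if n / N > 0.05:
        return sqrt((N - n) / N)
    return 1.0

def corrected_standard_error(s, n, N):
    """Standard error of the mean, s / sqrt(n), scaled by the correction factor."""
    return (s / sqrt(n)) * finite_population_factor(n, N)
```

Sampling 100 units from a population of 1,000 shrinks the standard error by a factor of about 0.95, while sampling 10 from the same population leaves it unchanged.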

Regression Testing and Correlation Coefficients:

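$\mathrm{Cov}(X,Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}$ (use $n - 1$ in place of $n$ for the sample covariance, matching the divisor used for $s_x$ and $s_y$)

Correlation Coefficient $r = \frac{\mathrm{Cov}(X,Y)}{s_x s_y}$

$\beta = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^{2}}$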

Least Squares Regression Line: $\hat{y} = \alpha + \beta x$, with $\alpha = \bar{Y} - \beta\bar{X}$
Here α is the y-intercept and β is the slope of the line fitted to the points in X & Y.
α & β are chosen so that they produce the smallest possible SSE, defined below
Sum of Squares about the Mean (SSM) $= \sum (y_i - \bar{y})^{2}$
Sum of Squared Errors (SSE) $= \sum (y_i - \hat{y}_i)^{2}$
SSE measures how far the plotted data points fall from the straight line that we fit
Coefficient of Determination $r^{2} = \frac{SSM - SSE}{SSM}$
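
The regression formulas above translate directly into code. This is a minimal sketch in plain Python; the function name `least_squares` and the small data set are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (alpha, beta, r_squared) for the line y_hat = alpha + beta * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sxy / sxx
    # Intercept: the fitted line passes through (xbar, ybar).
    alpha = ybar - beta * xbar
    # SSM: variation about the mean; SSE: variation about the fitted line.
    ssm = sum((y - ybar) ** 2 for y in ys)
    sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = (ssm - sse) / ssm
    return alpha, beta, r_squared

# Illustrative data, not taken from the text.
alpha, beta, r_squared = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this data the fitted line is approximately ŷ = 2.2 + 0.6x, and the line explains about 60% of the variation in y.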

Large Sample Condition Requirement:

1. A random sample is selected from the target population.
2. The sample size n is large (i.e., n ≥ 30). (Due to the Central Limit Theorem, this condition guarantees that the test statistic will be approximately normal regardless of the shape of the underlying probability distribution of the population.)
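
The Central Limit Theorem claim can be explored with a small simulation. The sketch below draws samples of size n = 30 from a skewed Exponential(1) population and counts how often the standardized sample mean lands within ±1.96, which would be about 95% of the time if it were exactly normal. The population choice, the seed, and the function name are assumptions for illustration.

```python
import random
from math import sqrt

def coverage_of_sample_means(n=30, replicates=2000, seed=0):
    """Fraction of standardized sample means falling inside (-1.96, 1.96)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    inside = 0
    for _ in range(replicates):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / sqrt(n))
        if abs(z) < 1.96:
            inside += 1
    return inside / replicates

coverage = coverage_of_sample_means()
```

The result should land near 0.95 even though the underlying population is far from normal; smaller n or more heavily skewed populations would be expected to drift further from that value.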