Q: What does the p-value actually mean?

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that H₀ is true, nor the probability that your result happened by chance. A p-value below α (commonly 0.05) means the observed data would be surprising if H₀ were true, so you reject H₀. A p-value above α means the data are consistent with H₀, so you fail to reject it — but this does not prove H₀ is correct.
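
As a rough illustration of the definition above, the sketch below turns a Z statistic into a p-value using the standard normal distribution and compares it with α. The function name `z_p_value` and the example statistic Z = 2.0 are assumptions chosen for illustration, not part of the calculator.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str = "two") -> float:
    """P-value for a Z statistic, assuming H0 is true (standard normal reference)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    if tail == "two":
        # "At least as extreme" in either direction: double the upper-tail area of |z|.
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


alpha = 0.05                      # hypothetical significance level
p = z_p_value(2.0, tail="two")    # hypothetical observed statistic
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")
```

Note that the decision only says how surprising the data are under H₀; it assigns no probability to H₀ itself.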

Q: When should I use a one-tailed versus a two-tailed test?

Use a two-tailed test when a difference in either direction is scientifically meaningful and you have no strong prior reason to expect a specific direction. Use a one-tailed test when theory or prior evidence clearly specifies the direction of the effect before data collection begins. Switching to a one-tailed test after seeing the data in order to reach significance is a form of p-hacking and invalidates the test. For a symmetric test statistic, a one-tailed test at α=0.05 uses the same critical value as a two-tailed test at α=0.10: it rejects more readily in the predicted direction and never in the opposite one.
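
The relationship between one-tailed α=0.05 and two-tailed α=0.10 can be checked numerically. This is a minimal sketch for the Z-test case only; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()


def critical_z(alpha: float, tails: int) -> float:
    """Positive critical value for a one-tailed (tails=1) or two-tailed (tails=2) Z-test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return std_normal.inv_cdf(1.0 - alpha / tails)


one_tailed_05 = critical_z(0.05, tails=1)
two_tailed_10 = critical_z(0.10, tails=2)
two_tailed_05 = critical_z(0.05, tails=2)

print(f"one-tailed, alpha=0.05: {one_tailed_05:.3f}")
print(f"two-tailed, alpha=0.10: {two_tailed_10:.3f}")
print(f"two-tailed, alpha=0.05: {two_tailed_05:.3f}")
```

The first two critical values coincide, while the two-tailed test at α=0.05 demands a larger statistic before rejecting.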

Q: What is the significance level α and how do I choose it?

The significance level α is the maximum acceptable probability of a Type I error, that is, of incorrectly rejecting a true null hypothesis. Choose it before collecting data. The conventional choice is 0.05 (5%), but 0.01 is used when false positives are particularly costly (medical diagnostics, safety-critical systems). Some fields now recommend reporting exact p-values rather than relying on a fixed threshold, and combining them with confidence intervals and effect sizes for a fuller picture.

Q: What are Type I and Type II errors?

A Type I error (false positive) occurs when you reject H₀ even though it is true; its probability is α. A Type II error (false negative) occurs when you fail to reject H₀ even though it is false; its probability is β, and statistical power is 1−β. For a fixed sample size, reducing α tightens the criterion for rejection, which lowers the Type I error rate but raises the Type II error rate. Increasing the sample size is the cleanest way to lower β without raising α, and with enough data both can be reduced together.
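
To make the trade-off concrete, here is a small sketch of the power of a right-tailed Z-test with known σ, using the textbook expression power = 1 − Φ(z₁₋α − δ√n/σ), where δ is the true shift in the mean. The function name and the numbers (δ = 0.5, σ = 2) are hypothetical, chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def power_right_tailed_z(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Power (1 - beta) of a right-tailed Z-test when the true mean exceeds mu0 by `effect`."""
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # rejection threshold under H0
    shift = effect * sqrt(n) / sigma           # mean of the Z statistic under the alternative
    return 1.0 - std_normal.cdf(z_crit - shift)


# Hypothetical scenario: true shift of 0.5 units, sigma = 2.
for alpha in (0.05, 0.01):
    for n in (25, 100, 400):
        power = power_right_tailed_z(0.5, 2.0, n, alpha)
        print(f"alpha={alpha:<4} n={n:<3} power={power:.3f} beta={1.0 - power:.3f}")
```

Within each α the power rises with n, and at any fixed n the stricter α=0.01 gives lower power, which is the trade-off described above.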

Q: Can I use this calculator for proportions from a survey?

Yes. The Z-Test for Proportion mode is designed exactly for this. Enter the hypothesised population proportion p₀ (your baseline or theoretical value), your sample size n, and the observed sample proportion p̂ (successes divided by n). The calculator applies the standard formula Z = (p̂ − p₀) / √(p₀(1−p₀)/n). The normal approximation is generally considered reliable when both n·p₀ and n·(1−p₀) are at least 10; a looser rule of thumb accepts 5.
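
The formula above can be sketched in a few lines. This is not the calculator's own code; `proportion_z_test`, its `min_count` parameter, and the survey numbers (p₀ = 0.5, n = 400, p̂ = 0.55) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat: float, p0: float, n: int, min_count: float = 10.0):
    """One-sample Z-test for a proportion.

    Returns (z, two_tailed_p, approx_ok), where approx_ok reports whether
    n*p0 and n*(1-p0) both reach min_count (the normal-approximation rule of thumb).
    """
    se = sqrt(p0 * (1.0 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    p_two = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    approx_ok = n * p0 >= min_count and n * (1.0 - p0) >= min_count
    return z, p_two, approx_ok


# Hypothetical survey: 220 of 400 respondents agree, tested against p0 = 0.5.
z, p_two, approx_ok = proportion_z_test(p_hat=220 / 400, p0=0.5, n=400)
print(f"Z = {z:.3f}, two-tailed p = {p_two:.4f}, approximation ok: {approx_ok}")
```

With a small sample and an extreme p₀ (say n = 20, p₀ = 0.1) the check returns False, and an exact binomial test would be the safer choice.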

Q: What is the difference between a Z-test and a T-test?

A Z-test is used when the population standard deviation σ is known, which allows the use of the standard normal distribution to compute exact p-values. A T-test is used when σ is unknown and must be estimated from the sample standard deviation s; the resulting test statistic follows a t-distribution with n−1 degrees of freedom, which has heavier tails than the normal to account for the added uncertainty. As the sample size grows, the t-distribution converges to the normal, so the distinction matters most for small samples (roughly n < 30).
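
The heavier tails can be seen by comparing upper-tail probabilities directly. The Python standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_right_tail` are hypothetical, and a statistics package would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t < 0 or steps % 2:
        raise ValueError("requires t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half the probability lies above 0; subtract the area on [0, t].
    return 0.5 - total * h / 3


normal_tail = 1.0 - NormalDist().cdf(2.0)
print(f"normal       P(Z > 2) = {normal_tail:.4f}")
for df in (5, 29, 1000):
    print(f"t, df={df:<5} P(T > 2) = {t_right_tail(2.0, df):.4f}")
```

The tail area beyond 2 shrinks toward the normal value as df grows, which is why the choice between the two tests matters most for small samples.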

Scenario	Result	Interpretation
Quality control: x̄=10.01mm, μ₀=10mm, σ=0.03, n=50, α=0.05, two-tailed Z-test	Z=2.357, p=0.0184 → Reject H₀	The mean bolt diameter has shifted significantly from the 10 mm target; the process needs adjustment.
Drug trial: x̄=12 mmHg, μ₀=10, s=3, n=30, α=0.05, right-tailed T-test	T=3.651, df=29, p=0.0005 → Reject H₀	Strong evidence that the drug reduces blood pressure by more than 10 mmHg on average.
A/B test: p̂=0.095, p₀=0.08, n=1000, α=0.05, right-tailed Z-test (proportion)	Z=1.748, p=0.0402 → Reject H₀	The new button design significantly increases the click-through rate above the baseline 8%.
Fuel efficiency: x̄=29 mpg, μ₀=30, σ=2, n=40, α=0.01, left-tailed Z-test	Z=−3.162, p=0.0008 → Reject H₀	Evidence at the 1% level that the car model's fuel efficiency is below the advertised 30 mpg.
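
The two Z-test rows for a mean (quality control and fuel efficiency) can be reproduced with a short sketch. It assumes the usual statistic Z = (x̄ − μ₀) / (σ/√n); the function name `z_test_mean` is hypothetical and this is not the calculator's own implementation.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar: float, mu0: float, sigma: float, n: int, tail: str):
    """One-sample Z-test for a mean with known sigma. Returns (z, p_value)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "left":
        p = cdf(z)
    elif tail == "right":
        p = 1.0 - cdf(z)
    elif tail == "two":
        p = 2.0 * (1.0 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p


# Inputs taken from the table rows above.
qc_z, qc_p = z_test_mean(10.01, 10.0, 0.03, 50, tail="two")
fuel_z, fuel_p = z_test_mean(29.0, 30.0, 2.0, 40, tail="left")

print(f"Quality control: Z = {qc_z:.3f}, p = {qc_p:.4f}, reject at 0.05: {qc_p < 0.05}")
print(f"Fuel efficiency: Z = {fuel_z:.3f}, p = {fuel_p:.4f}, reject at 0.01: {fuel_p < 0.01}")
```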

Hypothesis Testing Calculator - Z-Test, T-Test & P-Value
