Question 1

What does the p-value actually measure?

Accepted Answer

The p-value measures the probability of observing a test statistic as extreme as (or more extreme than) the one you calculated, assuming the null hypothesis is true. It quantifies how surprising your data are under H₀. It does not measure the probability that H₀ is true, the size of the effect, or the probability that you made an error.
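To make the definition concrete, here is a minimal sketch that estimates a two-tailed p-value by brute force: it draws many test statistics from a world where H₀ is true and counts how often they are at least as extreme as the observed one. It assumes the test statistic is standard normal under H₀; the function names, the seed, and the number of simulations are illustrative choices, not part of any particular calculator.

```python
import math
import random


def analytic_two_tailed_p(z_obs: float) -> float:
    """P(|Z| >= |z_obs|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z_obs) / math.sqrt(2.0))


def simulated_two_tailed_p(z_obs: float, n_sims: int = 200_000, seed: int = 0) -> float:
    """Fraction of simulated H0 statistics at least as extreme as z_obs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = sum(1 for _ in range(n_sims) if abs(rng.gauss(0.0, 1.0)) >= abs(z_obs))
    return extreme / n_sims


if __name__ == "__main__":
    z = 2.5
    print(f"analytic  p = {analytic_two_tailed_p(z):.4f}")
    print(f"simulated p = {simulated_two_tailed_p(z):.4f}")
```

The two numbers should agree to within simulation noise. Neither one says anything about the probability that H₀ itself is true.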

Question 2

Why is α = 0.05 the conventional threshold?

Accepted Answer

The 0.05 threshold was popularised by Ronald Fisher in the 1920s as a convenient convention, not a universal truth. It means that, when H₀ is actually true, you accept a 5% chance of a false positive (rejecting a true H₀). Different fields use different thresholds: particle physics requires a 5σ result for a discovery claim (one-sided p ≈ 3×10⁻⁷), genome-wide association studies typically use p < 5×10⁻⁸, and clinical trials sometimes use α = 0.01. The right threshold depends on the costs of false positives and false negatives in your domain.

Question 3

What is the difference between a one-tailed and two-tailed test?

Accepted Answer

A two-tailed test checks for a difference in either direction and divides α equally between both tails. A one-tailed test focuses the full α on one direction and provides more power to detect an effect in that direction, but it is only valid when the direction of the effect is specified before seeing the data. Using a one-tailed test to rescue a borderline two-tailed result is p-hacking.
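A short sketch of why the choice of tails matters, assuming a standard normal test statistic. The value z = 1.8 is a made-up example chosen because it sits on the borderline; the function name is hypothetical.

```python
from statistics import NormalDist


def z_test_p_values(z: float) -> dict:
    """Left-tailed, right-tailed, and two-tailed p-values for a Z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    left = std_normal.cdf(z)
    right = 1.0 - left
    return {"left": left, "right": right, "two": 2.0 * min(left, right)}


if __name__ == "__main__":
    alpha = 0.05
    for tail, p in z_test_p_values(1.8).items():
        verdict = "significant" if p < alpha else "not significant"
        print(f"{tail:>5}-tailed p = {p:.4f} -> {verdict} at alpha = {alpha}")
```

With the same statistic, the right-tailed p-value is half the two-tailed one, so it can cross α = 0.05 when the two-tailed value does not. That is exactly why the direction has to be fixed before the data are seen.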

Question 4

How are degrees of freedom determined?

Accepted Answer

Degrees of freedom (df) reflect the number of independent pieces of information in the data. For a one-sample t-test, df = n − 1. For an independent-samples t-test with pooled variance (Student's version), df = n₁ + n₂ − 2; Welch's unequal-variance version uses the Welch-Satterthwaite approximation instead. For a chi-square test of independence in an r × c table, df = (r − 1)(c − 1). For a one-way ANOVA F-test, the numerator df = k − 1 (groups minus 1) and the denominator df = N − k (total observations minus groups).
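The formulas above translate directly into a few helper functions. This is only a sketch; the function names are hypothetical, and the two-sample helper assumes the pooled-variance (n₁ + n₂ − 2) form.

```python
def df_one_sample_t(n: int) -> int:
    """One-sample t-test: df = n - 1."""
    return n - 1


def df_pooled_two_sample_t(n1: int, n2: int) -> int:
    """Independent-samples t-test with pooled variance: df = n1 + n2 - 2."""
    return n1 + n2 - 2


def df_chi_square_independence(rows: int, cols: int) -> int:
    """Chi-square test of independence on an r x c table: df = (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def df_one_way_anova(groups: int, total_n: int) -> tuple:
    """One-way ANOVA: (numerator df, denominator df) = (k - 1, N - k)."""
    return groups - 1, total_n - groups


if __name__ == "__main__":
    print(df_one_sample_t(16))               # 16 observations
    print(df_pooled_two_sample_t(12, 14))    # two groups of 12 and 14
    print(df_chi_square_independence(3, 6))  # 3 x 6 contingency table
    print(df_one_way_anova(3, 30))           # 3 groups, 30 observations in total
```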

Question 5

What is p-hacking and why is it harmful?

Accepted Answer

P-hacking is the practice of running multiple tests, subgroups, or model specifications until a p < 0.05 result appears, then reporting only that result. It inflates the true Type I error rate far above α and produces false positives that fail to replicate. To avoid it, pre-register your analysis plan, correct for multiple comparisons (e.g., Bonferroni correction), and report all tests performed.
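To see how quickly the error rate inflates, the sketch below computes the chance of at least one false positive across m tests and applies a Bonferroni correction. It assumes the tests are independent and all null hypotheses are true; the p-values in the example are invented for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(p_values: list, alpha: float = 0.05) -> list:
    """Reject H0 only where p <= alpha / m, with m the number of tests performed."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(f"{m:>2} tests at alpha = 0.05 -> FWER = {familywise_error_rate(0.05, m):.3f}")

    p_values = [0.001, 0.02, 0.04]  # hypothetical results from three subgroup tests
    print(bonferroni_reject(p_values))
```

Under these assumptions, twenty uncorrected tests give roughly a 64% chance of at least one spurious "significant" result. Bonferroni is conservative, which is the price of controlling that rate.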

Question 6

Can a very small p-value mean the result is unimportant?

Accepted Answer

Yes. With a large enough sample, even a trivially small effect (say, a drug that lowers blood pressure by 0.1 mmHg) will produce p < 0.001. Statistical significance and practical significance are not the same. Always compute and report an effect size measure (Cohen's d, odds ratio, R², etc.) alongside the p-value so readers can judge whether the effect is large enough to matter in practice.
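The blood-pressure example can be worked through numerically. This sketch assumes a two-group comparison with a known population standard deviation of 10 mmHg and equal group sizes, so a Z-test applies; the SD and the sample sizes are assumptions made for illustration, not figures from the answer above.

```python
import math


def two_sample_z_summary(mean_diff: float, sd: float, n_per_group: int) -> tuple:
    """Return (z, two-tailed p, Cohen's d) for two equal groups with known SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    cohens_d = mean_diff / sd  # standardised effect size, independent of n
    return z, p_two_tailed, cohens_d


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        z, p, d = two_sample_z_summary(mean_diff=0.1, sd=10.0, n_per_group=n)
        print(f"n per group = {n:>9,}: z = {z:6.3f}, p = {p:.3g}, Cohen's d = {d:.3f}")
```

Cohen's d stays at 0.01 in every row while the p-value shrinks as n grows, which is the gap between statistical and practical significance.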

Test Configuration	P-Value	Verdict at the stated α
Z-test, two-tailed, Z = 2.5, α = 0.05	p = 0.0124	p < 0.05 → significant. The probability of |Z| ≥ 2.5 under H₀ is about 1.24%.
T-test, right-tailed, t = 2.1, df = 15, α = 0.05	p = 0.0265	p < 0.05 → significant. A one-tailed t-test with 15 df at t = 2.1 yields p ≈ 0.027.
Chi-square, right-tailed, χ² = 18.3, df = 10, α = 0.01	p = 0.0501	p > 0.01 → not significant at α = 0.01. The result also narrowly misses α = 0.05, because the 5% critical value for df = 10 is 18.307.
F-test, right-tailed, F = 3.8, df1 = 2, df2 = 27, α = 0.05	p = 0.0351	p < 0.05 → significant. An ANOVA F-ratio of 3.8 with 2 and 27 degrees of freedom.
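For readers who want to check the t, chi-square, and F rows by hand, here is a standard-library-only sketch. It leans on two closed forms that hold only in special cases (chi-square with even df, F with numerator df = 2) and on Simpson's-rule integration for the t distribution; these shortcuts are chosen for this illustration and are not how a general-purpose calculator would necessarily do it. Results may differ slightly in the last digit from tabulated values.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with positive even df (closed-form sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def f_sf_numerator_df_2(f: float, df2: float) -> float:
    """P(F >= f) when the numerator df is exactly 2 (closed form)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0: 0.5 minus Simpson's-rule integral of the density on [0, t]."""
    if t < 0 or steps % 2 != 0:
        raise ValueError("expects t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0


if __name__ == "__main__":
    print(f"t = 2.1, df = 15, right-tailed:  p = {t_sf(2.1, 15):.4f}")
    print(f"chi2 = 18.3, df = 10:            p = {chi2_sf_even_df(18.3, 10):.4f}")
    print(f"F = 3.8, df1 = 2, df2 = 27:      p = {f_sf_numerator_df_2(3.8, 27):.4f}")
```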

P-Value Calculator - Z, T, F & Chi-Square Tests
