P-Value Calculator - Z, T, F & Chi-Square Tests
Find the p-value from any test statistic — Z, t, F, or Chi-square — with two-tailed, right-tailed, or left-tailed options for instant significance decisions.
Select your statistical test type and tail, enter the test statistic and degrees of freedom, and get the exact p-value and a significance verdict.
About the P-Value Calculator
The p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming the null hypothesis is true. It is the core output of almost every classical statistical test and serves as the primary criterion for deciding whether to reject the null hypothesis. A small p-value means the observed data are unlikely under the null hypothesis, which is evidence in favour of the alternative hypothesis.
The procedure begins with a null hypothesis H₀ (typically a statement of no effect, no difference, or no association) and an alternative hypothesis H₁. You collect data, compute a test statistic (Z, t, F, or χ²), and then use the probability distribution of that statistic under H₀ to find the p-value. If the p-value is less than or equal to your pre-specified significance level α (most commonly 0.05), you reject H₀ and declare the result statistically significant.
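The procedure above can be sketched for the simplest case, a one-sample Z-test with a known population standard deviation. The function name and the numbers (μ₀ = 100, σ = 15, n = 36, sample mean 105) are hypothetical. They were chosen only to show the path from test statistic to p-value to decision.

```python
import math


def one_sample_z_test(sample_mean: float, mu0: float, sigma: float, n: int,
                      alpha: float = 0.05) -> tuple[float, float, bool]:
    """Two-tailed one-sample Z-test returning (z, p_value, reject_h0).

    Hypothetical helper: assumes the population standard deviation sigma is known.
    """
    # Step 1: test statistic computed under H0: mu = mu0.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 2: two-tailed p-value from the standard normal, P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Step 3: compare with the significance level fixed before seeing the data.
    return z, p_value, p_value <= alpha


z, p, reject = one_sample_z_test(sample_mean=105, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```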
Different test statistics follow different probability distributions. The Z-statistic follows a standard normal distribution and is used when the population standard deviation is known or the sample is very large. The t-statistic follows a Student's t-distribution with a specific number of degrees of freedom (df = n − 1 for a one-sample test) and is used for small-to-moderate samples when the population standard deviation is unknown. The F-statistic follows an F-distribution with numerator and denominator degrees of freedom and is the basis for ANOVA and the F-test for equality of variances. The Chi-square statistic follows a Chi-square distribution with df degrees of freedom and is used for tests of independence in contingency tables and goodness-of-fit tests.
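Each of the four distributions has a tail probability that can be evaluated with standard special functions. The normal uses the complementary error function, Chi-square uses the regularized incomplete gamma function, and t and F use the regularized incomplete beta function. The sketch below uses only Python's standard library and the classic series and continued-fraction (modified Lentz) evaluations. The function names are hypothetical, and this is not presented as the calculator's own implementation. Each `sf_*` function returns the right-tail probability P(statistic ≥ observed value).

```python
import math

_EPS = 1e-14      # relative convergence tolerance
_TINY = 1e-300    # keeps continued-fraction denominators away from zero
_MAX_ITER = 500


def _nonzero(v: float) -> float:
    return _TINY if abs(v) < _TINY else v


def _gamma_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0:
        return 1.0
    prefactor = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1:
        # Power series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * _EPS and n < _MAX_ITER:
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - prefactor * total
    # Continued fraction for Q(a, x) using the modified Lentz method.
    b = x + 1 - a
    c, d = 1.0 / _TINY, 1.0 / b
    h = d
    for i in range(1, _MAX_ITER):
        an = -i * (i - a)
        b += 2
        d = 1.0 / _nonzero(an * d + b)
        c = _nonzero(b + an / c)
        h *= d * c
        if abs(d * c - 1) < _EPS:
            break
    return prefactor * h


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 / _nonzero(1.0 - (a + b) * x / (a + 1))
    h = d
    for m in range(1, _MAX_ITER):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1 + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1) < _EPS:
            break
    return h


def _beta_i(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1) / (a + b + 2):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def sf_normal(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2))


def sf_t(t: float, df: float) -> float:
    tail = 0.5 * _beta_i(df / 2, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def sf_chi2(x: float, df: float) -> float:
    return _gamma_q(df / 2, x / 2)


def sf_f(f: float, df1: float, df2: float) -> float:
    if f <= 0:
        return 1.0
    return _beta_i(df2 / 2, df1 / 2, df2 / (df2 + df1 * f))
```

For example, `2 * sf_normal(2.5)` gives a two-tailed Z p-value and `sf_f(3.8, 2, 27)` a right-tailed F p-value. In production code, a vetted library such as SciPy's `scipy.stats` distributions (each with an `sf` method) would usually be preferable to hand-written special functions.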
The tail type determines which region of the distribution is used to compute the p-value. A two-tailed test is appropriate when the alternative hypothesis is non-directional (H₁: μ ≠ μ₀) and the p-value sums the probability in both extremes. A right-tailed test applies when H₁ specifies a positive direction (H₁: μ > μ₀), and a left-tailed test when H₁ specifies a negative direction (H₁: μ < μ₀). For the F-test and Chi-square test, which are inherently one-sided in practice (the test statistic cannot be negative), the right-tailed p-value is the standard reported value.
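For a symmetric statistic such as Z, the three tail choices differ only in which area is taken. The minimal sketch below uses Python's `math.erfc`. The function name and the `tail` strings are hypothetical. Because the normal is symmetric, the two-tailed value equals twice the smaller one-tailed value. For F and Chi-square, the right tail is the usual choice instead.

```python
import math


def z_p_value(z: float, tail: str) -> float:
    """p-value of a standard normal statistic for tail in {'two', 'right', 'left'}."""
    if tail == "right":   # H1: mu > mu0, area above z
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "left":    # H1: mu < mu0, area below z
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if tail == "two":     # H1: mu != mu0, area beyond |z| in both tails
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("two", "right", "left"):
    print(tail, round(z_p_value(2.5, tail), 4))
```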
A critical and common misconception is that the p-value is the probability that H₀ is true. It is not. The p-value is a conditional probability: P(data this extreme | H₀ is true). It says nothing about the probability that H₀ or H₁ is true; for that you need Bayesian inference with prior probabilities. Another misconception is that p < 0.05 means the effect is large or practically important. Statistical significance depends on sample size — with a large enough sample, even a trivially small and meaningless effect will yield p < 0.05. Always report effect sizes alongside p-values.
The significance level α should be decided before looking at the data and should reflect the tolerable risk of a false positive (Type I error). Different fields use different conventions: α = 0.05 is standard in most biomedical and social science research, α = 0.01 is common when false positives are costly, and α = 5 × 10⁻⁸ is used in genome-wide association studies to account for the large number of tests performed simultaneously. This calculator supports α values of 0.01, 0.05, and 0.10.
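The same p-value can lead to different verdicts depending on α. The minimal sketch below uses a hypothetical p = 0.03 and shows the verdict at each level the calculator offers. It also includes a genome-wide threshold. The constant names are illustrative. The "one million tests" figure is the usual rationale for 5 × 10⁻⁸ and is stated here as an assumption.

```python
SUPPORTED_ALPHAS = (0.01, 0.05, 0.10)  # levels offered by the calculator

# A genome-wide threshold is commonly motivated as a Bonferroni correction
# of 0.05 for roughly one million independent tests (assumption).
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def verdicts(p_value: float) -> dict[float, bool]:
    """Map each significance level to True when H0 would be rejected (p <= alpha)."""
    return {alpha: p_value <= alpha for alpha in SUPPORTED_ALPHAS}


print(verdicts(0.03))
print("genome-wide:", 0.03 <= GENOME_WIDE_ALPHA)
```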
Worked Examples
Four examples covering each supported test type, showing the input, p-value, and significance verdict.
| Test Configuration | P-Value | Verdict at the Stated α |
|---|---|---|
| Z-test, two-tailed, Z = 2.5, α = 0.05 | p = 0.0124 | p < 0.05 → significant. The probability under H₀ of a Z at least 2.5 in absolute value is about 1.24%. |
| T-test, right-tailed, t = 2.1, df = 15, α = 0.05 | p = 0.0265 | p < 0.05 → significant. With 15 df, t = 2.1 falls just below the 2.5% critical value of about 2.131, so p lies slightly above 0.025. |
| Chi-square, right-tailed, χ² = 18.3, df = 10, α = 0.01 | p = 0.0501 | p > 0.01 → not significant at α = 0.01. It also narrowly misses α = 0.05, because the 5% critical value for df = 10 is about 18.307. |
| F-test, right-tailed, F = 3.8, df1 = 2, df2 = 27, α = 0.05 | p = 0.0351 | p < 0.05 → significant. This is an ANOVA F-ratio of 3.8 with 2 and 27 degrees of freedom. |
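The worked examples can be cross-checked with a short standard-library script. This sketch relies on special cases rather than general code: closed forms exist for a Chi-square with even df and for an F with df1 = 2, the Z tail comes from the complementary error function, and the t tail is integrated numerically with Simpson's rule. The function names are hypothetical, and the closed forms do not apply to other df values.

```python
import math


def z_two_tailed(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2))


def t_right_tailed(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0: 0.5 minus the density integrated over [0, t]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3


def chi2_right_tailed_even_df(x: float, df: int) -> float:
    """Closed form for even df only: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def f_right_tailed_df1_is_2(f: float, df2: float) -> float:
    """Closed form for df1 = 2 only: (1 + 2f/df2) ** (-df2/2)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


print(f"Z:   {z_two_tailed(2.5):.4f}")
print(f"t:   {t_right_tailed(2.1, 15):.4f}")
print(f"chi: {chi2_right_tailed_even_df(18.3, 10):.4f}")
print(f"F:   {f_right_tailed_df1_is_2(3.8, 27):.4f}")
```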
How to Use the P-Value Calculator
- Select the statistical test type (Z-Test, T-Test, F-Test, or Chi-Square) that matches how your test statistic was computed.
- Choose the tail type: Two-Tailed for H₁: ≠, Right-Tailed for H₁: >, or Left-Tailed for H₁: <.
- Enter your test statistic in the 'Test Statistic' field. For the T-Test, F-Test, and Chi-Square test, also enter the degrees of freedom (for the F-Test, both the numerator and denominator df).
- Set the significance level α. Click Calculate to get the p-value and the significance verdict.
- If p ≤ α, reject H₀ and report the result as statistically significant. If p > α, fail to reject H₀. Always supplement with an effect size.
Frequently Asked Questions
What does the p-value actually measure?
The p-value measures the probability of observing a test statistic as extreme as (or more extreme than) the one you calculated, assuming the null hypothesis is true. It quantifies how surprising your data are under H₀. It does not measure the probability that H₀ is true, the size of the effect, or the probability that you made an error.
Why is α = 0.05 the conventional threshold?
The 0.05 threshold was popularised by Ronald Fisher in the 1920s as a convenient convention, not a universal truth. It means you accept a 5% chance of a false positive (rejecting a true H₀). Different fields use different thresholds: particle physics requires a 5σ result for a discovery (a one-sided p of about 3 × 10⁻⁷), genomics typically uses p < 5 × 10⁻⁸, and clinical trials sometimes use α = 0.01. The right threshold depends on the costs of false positives and false negatives in your domain.
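Particle-physics thresholds are usually quoted in standard deviations (σ) rather than as p-values, and the conversion is the upper tail area of the standard normal. The minimal sketch below assumes the one-sided convention used for the 5σ discovery criterion. The helper name is hypothetical.

```python
import math


def one_sided_p_from_sigma(k: float) -> float:
    """Upper-tail standard normal probability P(Z >= k) for a k-sigma result."""
    return 0.5 * math.erfc(k / math.sqrt(2))


for k in (2, 3, 5):
    print(f"{k} sigma -> one-sided p = {one_sided_p_from_sigma(k):.2e}")
```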
What is the difference between a one-tailed and two-tailed test?
A two-tailed test checks for a difference in either direction and divides α equally between both tails. A one-tailed test focuses the full α on one direction and provides more power to detect an effect in that direction, but it is only valid when the direction of the effect is specified before seeing the data. Using a one-tailed test to rescue a borderline two-tailed result is p-hacking.
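The minimal sketch below shows why a one-tailed test is tempting. For a symmetric statistic, the one-tailed p in the predicted direction is half the two-tailed p. A hypothetical Z = 1.8 is therefore not significant two-tailed at α = 0.05 (critical value 1.96), but it is significant one-tailed (critical value about 1.645). The helper name is hypothetical, and it assumes that the pre-specified direction matches the sign of z.

```python
import math


def z_p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed p in the direction of z, two-tailed p) for a Z statistic."""
    one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return one_tailed, 2 * one_tailed


one, two = z_p_values(1.8)
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```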
How are degrees of freedom determined?
Degrees of freedom (df) reflect the number of independent pieces of information in the data. For a one-sample t-test, df = n − 1. For an independent-samples t-test, df = n₁ + n₂ − 2. For a Chi-square test of independence in an r × c table, df = (r − 1)(c − 1). For a one-way ANOVA F-test, the numerator df = k − 1 (groups minus 1) and denominator df = N − k (total observations minus groups).
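The rules above translate directly into small helper functions. The names are hypothetical. Note that n₁ + n₂ − 2 is the pooled-variance (Student) formula; Welch's unequal-variance t-test uses a different, generally non-integer df.

```python
def df_one_sample_t(n: int) -> int:
    return n - 1


def df_independent_t(n1: int, n2: int) -> int:
    return n1 + n2 - 2  # pooled-variance (Student) two-sample t-test


def df_chi2_independence(rows: int, cols: int) -> int:
    return (rows - 1) * (cols - 1)


def df_one_way_anova(k: int, n_total: int) -> tuple[int, int]:
    return k - 1, n_total - k  # (numerator df, denominator df)


# Three groups of 10 observations give the (2, 27) df pair.
print(df_one_way_anova(k=3, n_total=30))
```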
What is p-hacking and why is it harmful?
P-hacking is the practice of running multiple tests, subgroups, or model specifications until a p < 0.05 result appears, then reporting only that result. It inflates the true Type I error rate far above α and produces false positives that fail to replicate. To avoid it, pre-register your analysis plan, correct for multiple comparisons (e.g., Bonferroni correction), and report all tests performed.
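Both the inflation and the Bonferroni fix can be shown in a few lines. The sketch below assumes that the m tests are independent and that every null hypothesis is true; under those assumptions, the chance of at least one false positive is 1 − (1 − α)^m. The function names and example p-values are hypothetical.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(f"FWER for 20 tests at alpha = 0.05: {family_wise_error_rate(0.05, 20):.3f}")
print(bonferroni_adjust([0.01, 0.04, 0.30]))
```

Comparing each adjusted p-value with the original α is equivalent to comparing the raw p-value with α / m.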
Can a very small p-value mean the result is unimportant?
Yes. With a large enough sample, even a trivially small effect (say, a drug that lowers blood pressure by 0.1 mmHg) will produce p < 0.001. Statistical significance and practical significance are not the same. Always compute and report an effect size measure (Cohen's d, odds ratio, R², etc.) alongside the p-value so readers can judge whether the effect is large enough to matter in practice.
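The blood-pressure example can be made concrete with a known-SD Z-test. This is a sketch under stated assumptions: the 10 mmHg standard deviation is hypothetical, as is the helper name, and Cohen's d is taken as the mean effect divided by that SD. The effect size stays at 0.01 for every n, while the p-value collapses as n grows.

```python
import math


def two_tailed_p_known_sd(mean_effect: float, sd: float, n: int) -> float:
    """Two-tailed Z-test p-value for a mean effect with known SD (hypothetical helper)."""
    z = mean_effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


EFFECT_MMHG = 0.1   # blood-pressure reduction from the example
SD_MMHG = 10.0      # assumed SD, for illustration only
cohens_d = EFFECT_MMHG / SD_MMHG  # standardized effect, independent of n

for n in (100, 10_000, 1_000_000):
    p = two_tailed_p_known_sd(EFFECT_MMHG, SD_MMHG, n)
    print(f"n = {n:>9,}: d = {cohens_d:.2f}, p = {p:.3g}")
```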