P-value calculator — Z, t, Chi-Square & F tests
Calculate the exact p-value from any Z, t, chi-square, or F test statistic. Choose one-tailed or two-tailed for Z and t tests, see significance at every standard α level, and compare against critical values, all in real time.
Example: a test statistic of 2.33 gives p = 0.0198. For a two-tailed Z-test, the result is statistically significant at α = 0.05: reject the null hypothesis.
Critical values
Your statistic vs standard thresholds
| Significance level | Critical value | Your stat | Reject H₀? |
|---|---|---|---|
| α = 0.1 | ±1.6449 | 2.33 | Yes ✓ |
| α = 0.05 | ±1.9600 | 2.33 | Yes ✓ |
| α = 0.01 | ±2.5758 | 2.33 | No |
| α = 0.001 | ±3.2905 | 2.33 | No |
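The thresholds in the table above can be reproduced with the inverse normal CDF in Python's standard library: the two-tailed critical value at level α is Φ⁻¹(1 − α/2). A minimal sketch (the function name is illustrative):

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Two-tailed critical value: reject H0 when |z| exceeds this."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.1, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: +/-{z_critical(alpha):.4f}")
# prints 1.6449, 1.9600, 2.5758, 3.2905 — the column above
```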
Interpretation guide
p = 0.0198 — what does that mean?
p = 0.0198 means: if the null hypothesis (H₀) were true, the probability of observing a test statistic at least as extreme as 2.33 (in either direction) by chance alone is 1.98%.
Field guide
What the p-value is and what people get wrong about it.
What is a p-value?
A p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one actually observed, assuming the null hypothesis (H₀) is true. It measures how surprising the data are under the null hypothesis — not how likely the null hypothesis is.
Formally: p = P(observing data this extreme | H₀ is true). A small p-value (conventionally p < 0.05) indicates that the observed data would be unlikely if H₀ were true, so we reject H₀. A large p-value means the data are not particularly surprising under H₀, so we fail to reject it.
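As a concrete check of that definition, the two-tailed Z p-value can be computed with nothing but Python's standard library. A minimal sketch (the function name is illustrative):

```python
from statistics import NormalDist

def z_p_value(z: float, two_tailed: bool = True) -> float:
    """p-value for a Z statistic under H0, where Z ~ N(0, 1)."""
    upper_tail = NormalDist().cdf(-abs(z))  # P(Z >= |z|) by symmetry
    return 2 * upper_tail if two_tailed else upper_tail

print(round(z_p_value(2.33), 4))  # → 0.0198, the example result above
```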
The most common misconceptions
The p-value is one of the most frequently misinterpreted statistics in all of science. Common wrong interpretations include:
- "p = 0.03 means there is a 3% probability that H₀ is true." Wrong. The p-value says nothing about the probability of H₀. H₀ is either true or false; it doesn't have a probability. The p-value is a conditional probability: P(data | H₀), not P(H₀ | data).
- "p = 0.03 means there is a 97% probability that H₁ is true." Also wrong, for the same reason. Determining P(H₁ | data) requires prior probabilities and Bayesian inference, a different framework entirely.
- "p > 0.05 means H₀ is true." Failing to reject H₀ is not the same as accepting it. The data may simply be insufficient to detect a real effect.
One-tailed vs two-tailed tests
The choice of tail affects the p-value and should be made before looking at the data, based on the research question:
- Two-tailed: H₁ is that the parameter differs from H₀ in either direction. p = P(|T| ≥ |t_obs|). Use this when you have no directional hypothesis. It is the more conservative and more common default.
- Right-tailed: H₁ is that the parameter is greater than H₀. p = P(T ≥ t_obs).
- Left-tailed: H₁ is that the parameter is less than H₀. p = P(T ≤ t_obs).
For a given test statistic, the one-tailed p-value is exactly half the two-tailed p-value (when the statistic is in the hypothesised direction). Switching from two-tailed to one-tailed to cross the 0.05 threshold after seeing the data is a form of p-hacking.
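The halving relationship holds whenever the null distribution is symmetric, as the standard normal is. It can be verified directly (a small sketch; the function name and the `tail` parameter are illustrative):

```python
from statistics import NormalDist

def z_p_value_tailed(z: float, tail: str = "two") -> float:
    """p-value for a Z statistic; tail is 'two', 'right', or 'left'."""
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)           # P(Z >= z)
    if tail == "left":
        return cdf(z)               # P(Z <= z)
    return 2 * (1 - cdf(abs(z)))    # P(|Z| >= |z|)

z = 2.33
# One-tailed p (in the hypothesised direction) is exactly half the two-tailed p:
assert abs(z_p_value_tailed(z, "two") - 2 * z_p_value_tailed(z, "right")) < 1e-12
```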
The four test families
Z-test
Used when the population standard deviation σ is known, or when the sample is large enough that the sample standard deviation is a reliable estimate (typically n ≥ 30). The test statistic follows a standard normal distribution N(0, 1) under H₀. Common applications: one-sample and two-sample proportion tests, large-sample means.
t-test
Used when σ is unknown and estimated from the sample, especially for small samples (n < 30). The test statistic follows a Student's t-distribution with (n − 1) degrees of freedom for a one-sample test, (n₁ + n₂ − 2) for an independent two-sample test, or (n − 1) for a paired t-test. The t-distribution has heavier tails than the normal, producing higher p-values for the same test statistic — a conservative correction for the extra uncertainty.
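Without SciPy, the t p-value can still be computed from the regularized incomplete beta function: for ν degrees of freedom, the two-tailed p-value is I_x(ν/2, 1/2) with x = ν/(ν + t²). A self-contained sketch using the standard continued-fraction evaluation (all function names are illustrative):

```python
import math

def _betacf(a: float, b: float, x: float, max_iter: int = 200, eps: float = 3e-12) -> float:
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    if abs(d) < 1e-30:
        d = 1e-30
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / max(abs(1.0 + aa * d), 1e-30) * math.copysign(1.0, 1.0 + aa * d)
        c = 1.0 + aa / c if abs(1.0 + aa / c) > 1e-30 else 1e-30
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / max(abs(1.0 + aa * d), 1e-30) * math.copysign(1.0, 1.0 + aa * d)
        c = 1.0 + aa / c if abs(1.0 + aa / c) > 1e-30 else 1e-30
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(ln_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_p_value(t: float, df: int) -> float:
    """Two-tailed p-value for a t statistic with df degrees of freedom."""
    x = df / (df + t * t)
    return reg_inc_beta(df / 2.0, 0.5, x)

print(round(t_p_value(2.0, 10), 4))  # ≈ 0.0734
```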
Chi-Square (χ²) test
Used for categorical data. Common applications include goodness-of-fit tests (does observed frequency match a theoretical distribution?), tests of independence (are two categorical variables independent?), and tests of homogeneity. The χ² statistic is always non-negative; the p-value is always the upper-tail probability. Degrees of freedom depend on the specific test: for a goodness-of-fit test with k categories, df = k − 1; for a contingency table with r rows and c columns, df = (r−1)(c−1).
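For even degrees of freedom, the χ² upper-tail probability has a simple closed form, P(χ² ≥ x) = e^(−x/2) · Σᵢ₌₀^(k−1) (x/2)ⁱ / i! with k = df/2, which is enough to sanity-check a calculator. A sketch (the function name is illustrative; odd df requires the incomplete gamma function instead):

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail P(chi2 >= x) for a chi-square with EVEN df (closed form)."""
    assert df > 0 and df % 2 == 0, "closed form only holds for even df"
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):       # partial sum of the Poisson-like series
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

# 9.488 is the familiar 5% critical value for df = 4:
print(round(chi2_sf_even_df(9.488, 4), 4))  # ≈ 0.05
```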
F-test
Used to compare variances or model fit. ANOVA uses the F-test to compare group means by taking the ratio of between-group variance to within-group variance. Regression uses it to test overall model significance. The F statistic is always non-negative; the p-value is the upper-tail probability. Two degrees-of-freedom parameters are required: df₁ (numerator) and df₂ (denominator), which in ANOVA correspond to (k−1) and (N−k) respectively.
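The definition of F as a ratio of scaled independent χ² variables can be checked by simulation, with no distribution tables at all. A Monte Carlo sketch (names and example numbers are illustrative; F ≈ 3.10 with df₁ = 3, df₂ = 20 sits near the 5% critical value):

```python
import random

def f_sf_mc(f_obs: float, df1: int, df2: int,
            n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the upper-tail P(F >= f_obs)."""
    rng = random.Random(seed)

    def chi2(df: int) -> float:
        # A chi-square variate is a sum of df squared standard normals.
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    hits = 0
    for _ in range(n):
        f = (chi2(df1) / df1) / (chi2(df2) / df2)  # definition of F
        if f >= f_obs:
            hits += 1
    return hits / n

# Should land close to 0.05, since F(0.05; 3, 20) ≈ 3.10:
print(f_sf_mc(3.10, df1=3, df2=20))
```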
The 0.05 threshold — why it is and isn't magic
The α = 0.05 threshold was popularised by R.A. Fisher in the 1920s as a convenient rule of thumb, not a fundamental truth. Over time it has become a de facto publishing gate in many fields, causing significant problems:
- Studies with p = 0.049 and p = 0.051 are treated radically differently despite being statistically indistinguishable.
- Publication bias toward p < 0.05 results inflates the literature with false positives.
- Effect size and confidence intervals are often more informative than the binary significant/not-significant classification.
Modern statistical practice increasingly recommends reporting exact p-values, effect sizes, and confidence intervals rather than binary significance labels.