Scatter Plot Calculator - Correlation & Linear Regression
Calculate the correlation coefficient (r), R², and line of best fit for any two sets of data points — instant scatter plot statistics.
Enter comma-separated X values and Y values to compute linear regression, the Pearson correlation coefficient, and key descriptive statistics.
About the scatter plot calculator
A scatter plot is a type of data visualisation that displays two numerical variables as points on a Cartesian plane. Each point represents one observation: its horizontal position corresponds to the X value and its vertical position to the Y value. By examining the pattern of points, you can judge whether a relationship exists between the two variables, how strong it is, and whether it is linear or non-linear.
This scatter plot calculator computes three groups of statistics. The first group is descriptive: the number of data points n, the mean of X (x̄), and the mean of Y (ȳ). The second group is the linear regression line — the straight line that minimises the sum of squared vertical distances from each point to the line. It is described by the equation y = mx + b, where m is the slope and b is the y-intercept. The slope m is calculated as Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²], and the intercept b = ȳ − m·x̄.
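The slope and intercept formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the calculator's actual source; the function name `fit_line` and its error messages are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(m, b)  # -1.0 6.0
```

The guard on `sxx` reflects a limitation of the formula itself: if every X value is the same, the denominator is zero and no unique line of best fit exists.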
The third group is correlation statistics. The Pearson correlation coefficient r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²] measures the strength and direction of the linear relationship between X and Y. It ranges from −1 to +1. A value near +1 indicates a strong positive relationship (as X increases, Y increases), near −1 indicates a strong negative relationship, and near 0 indicates little or no linear relationship. R² (the coefficient of determination) equals r² and represents the proportion of variance in Y that is explained by the linear regression on X. An R² of 0.90, for example, means 90% of the variability in Y is accounted for by the linear relationship with X.
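As a rough sketch of the correlation formula above, the following Python computes r and R² directly from the sums of squared deviations. The function name `pearson_r` is hypothetical, and the data are the perfectly negative pair X = 1..5, Y = 5..1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant variable has no spread, so r is undefined.
        raise ValueError("r is undefined when X or Y is constant")
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(r, r * r)  # -1.0 1.0
```

Squaring r to get R² is valid here because there is a single predictor X; with several predictors, R² would have to be computed from the fitted model instead.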
Common uses include economics (relating price to demand), biology (studying the relationship between height and weight), education (correlating study time with test scores), engineering (predicting output from input variables), and business analytics (relating advertising spend to sales revenue).
When interpreting results, remember that correlation does not imply causation. An r close to ±1 only means the two variables move together linearly; it says nothing about whether one causes the other. Also, linear regression assumes that the relationship is actually linear. If the scatter plot suggests a curve, a linear model will be a poor fit no matter how many points you supply. Always check the residuals or plot the data alongside the line to validate the model.
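One way to carry out the residual check is sketched below. The helper name `residuals` and the deliberately curved data set (Y = X²) are assumptions chosen for illustration, and the function assumes the X values are not all identical.

```python
def residuals(xs, ys):
    """Residuals y_i - (m*x_i + b) of the least-squares line through the data."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return [y - (m * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]  # curved on purpose: Y = X squared
print([round(e, 6) for e in residuals(xs, ys)])  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. A systematic U-shaped pattern like this, rather than a random scatter around zero, is the usual sign that a straight line is the wrong model.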
Scatter plot calculator examples
Three representative data sets with computed correlation coefficients and regression lines.
| X values, Y values | Key Results | Interpretation |
|---|---|---|
| X: 1,2,3,4,5; Y: 2,4,5,4,5 | m=0.6, b=2.2, r≈0.7746, R²=0.6 | Moderate positive linear relationship. 60% of Y variance explained by X. |
| X: 1,2,3,4,5; Y: 5,4,3,2,1 | m=−1, b=6, r=−1, R²=1 | Perfect negative linear relationship. Every 1-unit increase in X decreases Y by exactly 1. |
| X: 2,4,6,8,10; Y: 3,7,8,13,15 | m=1.5, b=0.2, r≈0.9848, R²≈0.9698 | Very strong positive relationship. The line y = 1.5x + 0.2 explains 97.0% of the variation in Y. |
How to use the scatter plot calculator
- Enter your X-axis data as comma-separated numbers in the 'X-Axis Values' field — for example: 1, 2, 3, 4, 5.
- Enter the corresponding Y-axis data in the 'Y-Axis Values' field. The number of values must match the X field.
- Click Calculate. The tool computes the regression slope m, intercept b, correlation coefficient r, and R².
- Read the regression equation y = mx + b to predict Y for any new X value.
- Interpret r: values close to ±1 indicate strong linear relationships; values near 0 suggest weak or no linear correlation.
Scatter plot calculator FAQ
What is the Pearson correlation coefficient r?
The Pearson correlation coefficient r measures the strength and direction of the linear relationship between two variables. It ranges from −1 (perfect negative linear correlation) to +1 (perfect positive linear correlation). A value of 0 means no linear relationship exists, though a non-linear relationship could still be present.
What is R² and how do I interpret it?
R² (the coefficient of determination) equals r² and tells you what proportion of the variance in Y is explained by the linear regression on X. An R² of 0.85 means 85% of the spread in Y values is accounted for by the linear model. The remaining 15% is attributed to other factors or random variation.
What does the slope of the regression line mean?
The slope m in y = mx + b represents the average change in Y for each one-unit increase in X. A slope of 2 means Y increases by 2 units on average for every 1-unit increase in X. A negative slope means Y decreases as X increases.
Does correlation imply causation?
No. A high correlation coefficient tells you that two variables move together linearly, but it does not tell you why. One could cause the other, both could be driven by a third variable (confounding), or the correlation could be coincidental. Establishing causation requires controlled experiments or causal inference methods.
How many data points do I need for linear regression?
You need at least 2 points with different X values to fit a line, but two points always give r = ±1 (or an undefined r if their Y values are equal), which tells you nothing about the real relationship. As a rough rule of thumb, aim for at least 10–20 points for a meaningful regression; the more data you have, the more reliable your estimates of m, b, and r become.
What if my correlation coefficient is near zero?
A value near zero means there is little or no linear relationship between X and Y. However, this does not mean the variables are unrelated — they could have a strong non-linear relationship (e.g., quadratic or sinusoidal). Consider plotting your data to check for non-linear patterns before concluding the variables are independent.