Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable).
Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of multiple linear regression, also known as multivariable linear regression.
Multiple regression also allows you to determine the overall fit of the model and the relative contribution of each of the predictors to the total variance explained. For example, you might want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance and gender “as a whole”, but also the “relative contribution” of each independent variable in explaining the variance.
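At the center of multiple linear regression analysis is the task of fitting a line through a scatter plot, generalized to more than one predictor. More specifically, multiple linear regression fits a plane (a hyperplane when there are more than two independent variables) through a multi-dimensional cloud of data points. The simplest form has one dependent and two independent variables. The general form, with p independent variables, is defined as
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$
for i = 1…n, where y_i is the dependent variable for observation i, x_{i1}, …, x_{ip} are the independent variables, \beta_0 is the intercept, \beta_1, …, \beta_p are the regression coefficients, and \varepsilon_i is the error term.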
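As a rough illustration of what fitting such a model involves, the sketch below estimates an intercept and two coefficients by ordinary least squares with NumPy. The data are synthetic, and the "true" coefficients, sample size, and noise level are assumptions chosen for the example, not values from the text.

```python
import numpy as np

# Synthetic data: every number here is an assumption made for illustration.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical "true" model: y = 1 + 2*x1 - 3*x2 + noise
beta_true = np.array([1.0, 2.0, -3.0])
noise = rng.normal(scale=0.1, size=n)
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + noise

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: choose the coefficients that minimize the squared residuals.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept and coefficients:", beta_hat)
```

With one dependent and two independent variables, the fitted model is a plane in three dimensions; the estimates should land close to the assumed coefficients because the noise is small.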
Sometimes the dependent variable is also called an endogenous variable or prognostic variable. The independent variables are also called exogenous variables, predictor variables or regressors.
There are three major uses for multiple linear regression analysis: (1) causal analysis, (2) forecasting an effect, and (3) trend forecasting. Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.
Firstly, it might be used to identify the strength of the effect that the independent variables have on a dependent variable. Typical questions are: what is the strength of the relationship between dose and effect, between sales and marketing spend, or between age and income?
Secondly, it can be used to forecast the effects or impacts of changes. That is, multiple linear regression analysis helps us understand how much the dependent variable will change when we change the independent variables. A typical question is: how much additional Y do I get for one additional unit of X?
Thirdly, multiple linear regression analysis predicts trends and future values, and it can be used to get point estimates. Typical questions are: what will the price of gold be 6 months from now? What is the total effort for a task X?
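The second and third uses can be sketched with a few lines of Python. The coefficients and input values below are hypothetical, standing in for the output of an already fitted model; the `predict` helper is likewise introduced only for this illustration.

```python
import numpy as np

def predict(beta, x):
    """Point estimate from a linear model: intercept plus weighted sum of predictors."""
    return beta[0] + np.dot(beta[1:], x)

# Hypothetical fitted coefficients: intercept, then one coefficient per predictor.
beta = np.array([10.0, 2.5, -1.0])

# Hypothetical values of the two independent variables for one new observation.
x = np.array([4.0, 3.0])

# Use 3: a point estimate of the dependent variable for this observation.
point_estimate = predict(beta, x)

# Use 2: raise the first predictor by one unit, hold the other fixed,
# and see how much the predicted value changes.
x_plus = x + np.array([1.0, 0.0])
effect = predict(beta, x_plus) - predict(beta, x)

print("point estimate:", point_estimate)
print("change in prediction for one extra unit of the first predictor:", effect)
```

Because the model is linear, the change in the prediction for one additional unit of a predictor, with the others held fixed, equals that predictor's coefficient.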
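When selecting the model for a multiple linear regression analysis, another important consideration is the model fit. Adding independent variables to a multiple linear regression model will never decrease the amount of variance it explains in the sample (typically expressed as R²), even when the added variables are unrelated to the dependent variable. A higher R² is therefore not by itself evidence of a better model: adding too many independent variables without theoretical justification can produce an over-fit model.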
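The behaviour of R² when predictors are added can be checked numerically. The sketch below fits the same synthetic data twice, once with a single relevant predictor and once with an additional predictor that is pure noise, and compares R² with adjusted R², a common variant that penalizes the number of predictors. The data, the seed, and the helper names `r_squared` and `adjusted_r_squared` are assumptions made for this example.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R² of an ordinary least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: y depends on x1 only; `junk` is unrelated to y by construction.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])   # intercept + x1
X_big = np.column_stack([X_small, junk])      # intercept + x1 + junk

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
adj_small = adjusted_r_squared(r2_small, n, 1)
adj_big = adjusted_r_squared(r2_big, n, 2)

print("R² without / with junk predictor:", r2_small, r2_big)
print("adjusted R² without / with junk predictor:", adj_small, adj_big)
```

R² for the larger model cannot be lower than for the smaller one, since the smaller model is a special case of it. Adjusted R² carries no such guarantee and can fall when the extra predictor adds little, which is one reason it is often reported alongside R².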