D. Fitting a Straight Line
1. Equation
The equation of a straight line, Figure I-7, is:

$$y = mx + b \qquad \text{Equation I-1}$$

where $m$ is the line slope and $b$ is the y-intercept.

Figure I-7 Straight Line Geometry
Slope, m, is rise/run. It is also the tangent of the angle from the x-axis to the line; m = tan(θ).
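For example (numbers chosen here purely for illustration), a line that rises 2 ft over a 100 ft run has:

$$m = \frac{2}{100} = 0.02 \qquad \theta = \arctan(0.02) \approx 1.15^\circ$$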
2. Residuals
Empirical data doesn't exactly fit a straight line because:
- the objects measured may not have a linear relationship
- the measurements have random errors
A best-fit straight line is one which minimizes the sum of the squares of the residuals, Figure I-8.
Figure I-8 Profile Residuals
The residual is the difference between the theoretical and empirical values of the dependent variable at an independent variable value, Equation I-2:

$$v_i = Y_t - Y_e \qquad \text{Equation I-2}$$

where:
$Y_t$ - theoretical value
$Y_e$ - empirical value
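For example (hypothetical values), if a fitted line gives a theoretical elevation of 1248.5 ft where the measured elevation is 1248.7 ft, the residual is:

$$v = 1248.5 - 1248.7 = -0.2 \text{ ft}$$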
3. Least Squares
The best-fit line minimizes the sum of the residuals squared, Equation I-3:

$$\sum (v_i)^2 = \text{minimum} \qquad \text{Equation I-3}$$
There are two least squares solution methods for a straight line: linear regression and observation equations. Each has its particular advantages and disadvantages.
a. Linear Regression
Linear regression can be done manually or using a built-in function on most scientific calculators. It is a standard function in most spreadsheet software and is included in their graphing options (aka, a trendline).
The line slope, m, is determined using Equation I-4:
$$m = \frac{\sum (v_x v_y)}{\sum (v_x)^2} \qquad \text{Equation I-4}$$
and intercept, b, from Equation I-5:
$$b = \bar{y} - m\bar{x} \qquad \text{Equation I-5}$$
The correlation coefficient, r, indicates how well the data fits a straight line.
$$r = \frac{\sum (v_x v_y)}{\sqrt{\sum (v_x)^2 \sum (v_y)^2}} \qquad \text{Equation I-6}$$

$$v_x = x_i - \bar{x} \qquad \text{Equation I-7a}$$

$$v_y = y_i - \bar{y} \qquad \text{Equation I-7b}$$
The coefficient varies between -1 (negatively sloped line) and +1 (positively sloped line); the closer to -1 or +1, the better the data fits a straight line.
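As a sketch of how Equations I-4 through I-7b work together, here is a short Python version (the function name is ours, chosen for illustration):

```python
def fit_line(x, y):
    """Least-squares straight-line fit per Equations I-4 through I-7b."""
    n = len(x)
    x_bar = sum(x) / n                         # mean of x
    y_bar = sum(y) / n                         # mean of y
    vx = [xi - x_bar for xi in x]              # Equation I-7a
    vy = [yi - y_bar for yi in y]              # Equation I-7b
    sum_vxvy = sum(a * b for a, b in zip(vx, vy))
    sum_vx2 = sum(a * a for a in vx)
    sum_vy2 = sum(b * b for b in vy)
    m = sum_vxvy / sum_vx2                     # Equation I-4: slope
    b = y_bar - m * x_bar                      # Equation I-5: intercept
    r = sum_vxvy / (sum_vx2 * sum_vy2) ** 0.5  # Equation I-6: correlation
    return m, b, r
```

Applied to the profile data in the application example below, it returns the same m, b, and r derived there.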
b. Observation Equations
An observation equation is written for each coordinate pair. Then one of two ways can be used to reach the solution:
1. Direct minimization: This is covered in Chapter C.
2. Matrix method: This is covered in Chapter D.
Direct minimization is the more arduous method, requiring a considerable amount of computation to perform manually. Anything over four coordinate pairs increases the computations substantially. The matrix method is more efficient and easier to perform, even manually without the benefit of software.
4. Application Example
Let's determine the best-fit line for the measured profile data of Table 2.
a. Linear Regression
Organizing the data in an extended table simplifies computations, particularly since some of the terms are so large.
X is Station, Y is Elevation.
| | X, ft | Y, ft | X² | vx | vy | (vx)(vy) | (vx)² | (vy)² |
|---|---|---|---|---|---|---|---|---|
| | 2200 | 1250.2 | 4,840,000 | -150 | 3.15 | -472.50 | 22,500 | 9.9225 |
| | 2300 | 1248.7 | 5,290,000 | -50 | 1.65 | -82.50 | 2,500 | 2.7225 |
| | 2400 | 1245.5 | 5,760,000 | 50 | -1.55 | -77.50 | 2,500 | 2.4025 |
| | 2500 | 1243.8 | 6,250,000 | 150 | -3.25 | -487.50 | 22,500 | 10.5625 |
| sums | 9400 | 4988.2 | 22,140,000 | 0 | 0.00 | -1,120.00 | 50,000 | 25.6100 |

The means are $\bar{x}$ = 9400/4 = 2350 and $\bar{y}$ = 4988.2/4 = 1247.05; the vx and vy columns come from Equations I-7a and I-7b.
Substituting terms in the linear regression equations:
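$$m = \frac{\sum(v_x v_y)}{\sum(v_x)^2} = \frac{-1{,}120.00}{50{,}000} = -0.0224$$

$$b = \bar{y} - m\bar{x} = 1247.05 - (-0.0224)(2350) = 1299.69$$

$$r = \frac{-1{,}120.00}{\sqrt{(50{,}000)(25.6100)}} = -0.99$$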
The best-fit equation is: Elev = -0.0224(Station) + 1299.69
The correlation coefficient is -0.99, which is a pretty good fit.
Because Stations are used for x values, the slope, m, is the grade expressed as a ratio: grade = -0.0224 ft/ft = -2.24%
The simplicity of the table belies all the computations needed to construct it. The primary disadvantage of Linear Regression is the number of calculations when done manually. It's quick, however, using built-in calculator or spreadsheet functions.
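Those built-in functions make for a quick check. For instance, a sketch using NumPy's polynomial fit (variable names are ours):

```python
import numpy as np

station = np.array([2200.0, 2300.0, 2400.0, 2500.0])   # X values, ft
elev = np.array([1250.2, 1248.7, 1245.5, 1243.8])      # Y values, ft

m, b = np.polyfit(station, elev, 1)     # degree 1 = straight line
r = np.corrcoef(station, elev)[0, 1]    # correlation coefficient

print(f"m = {m:.4f}, b = {b:.2f}, r = {r:.2f}")
# m = -0.0224, b = 1299.69, r = -0.99
```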
b. Matrix Method
An observation equation is written for each data pair, using Equation I-1, with a residual included on each dependent variable:

$$1250.2 + v_1 = m(2200) + b$$
$$1248.7 + v_2 = m(2300) + b$$
$$1245.5 + v_3 = m(2400) + b$$
$$1243.8 + v_4 = m(2500) + b$$

In matrix notation, the observation equations are [K] + [V] = [C][U]. The matrices are:

$$[K] = \begin{bmatrix} 1250.2 \\ 1248.7 \\ 1245.5 \\ 1243.8 \end{bmatrix} \quad
[V] = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} \quad
[C] = \begin{bmatrix} 2200 & 1 \\ 2300 & 1 \\ 2400 & 1 \\ 2500 & 1 \end{bmatrix} \quad
[U] = \begin{bmatrix} m \\ b \end{bmatrix}$$
The matrix algorithm [U] = [Q][CᵀK], where [Q] = [CᵀC]⁻¹, is solved to determine m and b.
Instead of the complete solution process step-by-step, intermediate products are shown:
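Recomputed from the data:

$$[C^\mathsf{T}C] = \begin{bmatrix} 22{,}140{,}000 & 9{,}400 \\ 9{,}400 & 4 \end{bmatrix} \qquad
[C^\mathsf{T}K] = \begin{bmatrix} 11{,}721{,}150 \\ 4{,}988.2 \end{bmatrix}$$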
Since the [CTC] matrix is only 2x2, it can be quickly inverted using the determinant method.
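With determinant 22,140,000 × 4 − 9,400² = 200,000:

$$[Q] = [C^\mathsf{T}C]^{-1} = \frac{1}{200{,}000}\begin{bmatrix} 4 & -9{,}400 \\ -9{,}400 & 22{,}140{,}000 \end{bmatrix} = \begin{bmatrix} 0.00002 & -0.047 \\ -0.047 & 110.7 \end{bmatrix}$$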
The matrix algorithm results are m = -0.0224 and b = 1299.69, just like Linear Regression's results.
Surprise.
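A compact sketch of the same algorithm in NumPy (variable names are ours) confirms the numbers:

```python
import numpy as np

# Observation equations [K] + [V] = [C][U] for the profile data
C = np.array([[2200.0, 1.0],
              [2300.0, 1.0],
              [2400.0, 1.0],
              [2500.0, 1.0]])
K = np.array([1250.2, 1248.7, 1245.5, 1243.8])

Q = np.linalg.inv(C.T @ C)    # [Q] = [CᵀC]⁻¹
U = Q @ (C.T @ K)             # [U] = [Q][CᵀK]

m, b = U
print(f"m = {m:.4f}, b = {b:.2f}")   # m = -0.0224, b = 1299.69
```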
While there is no direct equivalent to the correlation coefficient, the uncertainties for m and b can be determined.
Using the equations from Chapter D Section 3:
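Those equations aren't reproduced here, but assuming they are the standard least-squares relations (reference variance $S_0^2 = \sum v^2 / (n - u)$, with parameter variances taken from the diagonal of [Q]), a sketch of the computation is:

```python
import numpy as np

C = np.array([[2200.0, 1.0], [2300.0, 1.0], [2400.0, 1.0], [2500.0, 1.0]])
K = np.array([1250.2, 1248.7, 1245.5, 1243.8])

Q = np.linalg.inv(C.T @ C)
U = Q @ (C.T @ K)                    # [m, b], as before

V = C @ U - K                        # residuals, v = Yt - Ye
S0 = np.sqrt(V @ V / (len(K) - 2))   # n = 4 observations, u = 2 unknowns
Sm = S0 * np.sqrt(Q[0, 0])           # uncertainty of slope m
Sb = S0 * np.sqrt(Q[1, 1])           # uncertainty of intercept b

print(f"S0 = {S0:.2f} ft, Sm = {Sm:.4f}, Sb = {Sb:.1f} ft")
# S0 = 0.51 ft, Sm = 0.0023, Sb = 5.4 ft
```

The relatively large uncertainty in b reflects how far the data is from x = 0, where the intercept is evaluated.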