MATLAB:Fitting
Revision as of 03:44, 26 October 2009
This document contains examples of polynomial fitting, general linear regression, and nonlinear regression. Each section includes example code that may be useful in later courses. The example code assumes that a file called Cantilever.dat exists in the same directory and contains two columns of data: the first is the mass (in kg) placed at the end of a beam, and the second is the resulting displacement (in inches) at the end of the beam. For EGR 53, this file is:
0.000000 0.005211
0.113510002 0.158707
0.227279999 0.31399
0.340790009 0.474619
0.455809998 0.636769
0.569320007 0.77989
0.683630005 0.936634
0.797140015 0.999986
Polynomial Fitting
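Polynomial fits are those in which the dependent variable is modeled as a linear combination of nonnegative integer powers of the independent variable. MATLAB's built-in polyfit command determines the coefficients of such a fit in the least-squares sense and returns them in order of descending powers; polyval then evaluates the resulting polynomial at any set of points.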
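One way to see what polyfit computes is to solve the same least-squares problem directly, using a matrix whose columns are descending powers of the independent variable. The sketch below assumes made-up data (not the Cantilever.dat values) and illustrative variable names such as Pls; it is meant only to show that the two approaches agree to round-off, not to describe how polyfit is implemented internally.

```matlab
% Illustrative data (hypothetical; not taken from Cantilever.dat)
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 9.2; 19.1; 32.8];
N = 2;                               % order of the polynomial fit

% polyfit returns coefficients in descending powers: P(1)*x^N + ... + P(N+1)
P = polyfit(x, y, N);

% Build [x.^N ... x.^1 x.^0] and solve the least-squares problem with backslash
A = zeros(length(x), N+1);
for j = 0:N
    A(:, j+1) = x.^(N-j);
end
Pls = A \ y;

% Largest difference between the two coefficient sets (round-off level)
disp(max(abs(P(:) - Pls)))
```

Because the columns run from the highest power down to x.^0, Pls lines up element by element with the polyfit output.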
Example Code
In the example code below, N determines the order of the fit. Not much else should ever need to change.
%% Initialize the workspace
clear; format short e
figure(1); clf
%% Load and manipulate the data
load Cantilever.dat
Force = Cantilever(:,1) * 9.81;        % weight in N (mass in kg times 9.81 m/s^2)
Displ = Cantilever(:,2) * 2.54 / 100;  % displacement converted from inches to m
%% Rename and create model data
x = Force;
y = Displ;
xmodel = linspace(min(x), max(x), 100);
%% Determine the polynomial coefficients
N = 1;
P = polyfit(x, y, N)
%% Generate estimates and model
yhat = polyval(P, x);
ymodel = polyval(P, xmodel);
%% Calculate statistics
% Compute sum of the squares of the data residuals
St = sum(( y - mean(y) ).^2)
% Compute sum of the squares of the estimate residuals
Sr = sum(( y - yhat ).^2)
% Compute the coefficient of determination
r2 = (St - Sr) / St
%% Generate plots
plot(x, y, 'k*',...
x, yhat, 'ko',...
xmodel, ymodel, 'k-');
xlabel('Independent Value')
ylabel('Dependent Value')
title('Dependent vs. Independent and Model')
legend('Data', 'Estimates', 'Model', 'Location', 'best')
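For reference, the three statistics printed at the end of the script can be written out for \(n\) data points \(y_i\) with mean \(\bar{y}\) and estimates \(\hat{y}_i\) (the yhat vector). This is only a restatement of what the code computes, in notation assumed here:

```latex
S_t = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2, \qquad
S_r = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
r^2 = \frac{S_t - S_r}{S_t}
```

When \(S_r = 0\) the estimates pass through every data point and \(r^2 = 1\). For least-squares fits that include a constant term, \(0 \le r^2 \le 1\); for models without one (for example a line through the origin), \(S_r\) can exceed \(S_t\), so \(r^2\) computed this way can be negative.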
General Linear Regression
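General linear regression involves finding the set of coefficients for fits that can be written as:
\[\hat{y}(x) = \sum_{j=1}^{M} a_j\,\phi_j(x)\]
where the \(a_j\) are the coefficients of the fit and the \(\phi_j\) are the specific functions of the independent variable that make up the fit. The fit is linear in the coefficients \(a_j\) even when the \(\phi_j\) themselves are nonlinear functions of \(x\).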
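For \(n\) data points and \(M\) basis functions, evaluating each \(\phi_j\) at each \(x_i\) gives a matrix with one column per basis function; this is how the A matrix is assembled in the example code. Choosing the coefficient vector \(a\) to minimize the sum of squared residuals leads, for a full-rank \(A\), to the normal equations shown below (the notation here is assumed, not taken from the page):

```latex
A = \begin{bmatrix}
\phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_M(x_1)\\
\vdots      & \vdots      &        & \vdots\\
\phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_M(x_n)
\end{bmatrix},
\qquad
\min_{a}\,\left\| y - A\,a \right\|_2^2
\;\Longrightarrow\;
A^{T}A\,a = A^{T}y
```

MATLAB's A\y, applied to a rectangular A, returns a least-squares solution of this problem; it uses a QR factorization rather than forming \(A^{T}A\) explicitly, which is better conditioned numerically.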
Example Code
The example code below includes several general linear fits of one variable; the value assigned to model selects which one is used.
%% Initialize the workspace
clear; format short e
figure(1); clf
%% Load and manipulate the data
load Cantilever.dat
Force = Cantilever(:,1) * 9.81;        % weight in N (mass in kg times 9.81 m/s^2)
Displ = Cantilever(:,2) * 2.54 / 100;  % displacement converted from inches to m
%% Rename and create model data
x = Force;
y = Displ;
xmodel = linspace(min(x), max(x), 100);
%% Define model equation and A matrix
model = 'linear'
switch model
case 'linear'
yeqn = @(coefs, x) coefs(1)*x.^1 + coefs(2)*x.^0;
A = [x.^1 x.^0];
case 'quadratic'
yeqn = @(coefs, x) coefs(1)*x.^2 + coefs(2)*x.^1 + coefs(3)*x.^0;
A = [x.^2 x.^1 x.^0];
case 'line through origin'
yeqn = @(coefs, x) coefs(1)*x.^1;
A = [x.^1];
case 'trig'
yeqn = @(coefs, x) coefs(1)*cos(x) + coefs(2)*sin(x);
A = [cos(x) sin(x)];
case 'trig with offset'
yeqn = @(coefs, x) coefs(1)*cos(x) + coefs(2)*sin(x) + coefs(3)*x.^0;
A = [cos(x) sin(x) x.^0];
otherwise
error('Don''t know the model...')
end
%% Determine the function coefficients
MyCoefs = A\y
%% Generate estimates and model
yhat = yeqn(MyCoefs, x);
ymodel = yeqn(MyCoefs, xmodel);
%% Calculate statistics
% Compute sum of the squares of the data residuals
St = sum(( y - mean(y) ).^2)
% Compute sum of the squares of the estimate residuals
Sr = sum(( y - yhat ).^2)
% Compute the coefficient of determination
r2 = (St - Sr) / St
%% Generate plots
plot(x, y, 'k*',...
x, yhat, 'ko',...
xmodel, ymodel, 'k-');
xlabel('Independent Value')
ylabel('Dependent Value')
title('Dependent vs. Independent and Model')
legend('Data', 'Estimates', 'Model', 'Location', 'best')
Linearized Models
There are three primary nonlinear models which, through a transformation of variables, yield a linear relationship between the transformed variables: the exponential model, the power-law model, and the saturation-growth model. In the subsections below, each is addressed individually by showing the modeling equation, describing a transformation, and showing the transformed equation in the form
\[\eta = P_1\,\xi + P_2\]
where \(\xi\) is the transformed version of the independent variable \(x\), \(\eta\) is the transformed version of the dependent variable \(y\), and \(P_1\) and \(P_2\) are the slope and intercept, returned by polyfit as P(1) and P(2). In each of the three cases below, then, the polyfit command can be used with the transformed variables to find the coefficients of the straight-line fit.
Exponential Model
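Exponential models come from relationships where the derivative of the dependent variable is proportional to the value of the dependent variable itself. That is:
\[\frac{dy}{dx} = k\,y\]
Separating the variables and integrating both sides from a reference point \(\left(x_0, y_0\right)\) yields:
\[\int_{y_0}^{y}\frac{dy}{y} = \int_{x_0}^{x}k\,dx \quad\Longrightarrow\quad \ln\left(y\right) - \ln\left(y_0\right) = k\left(x - x_0\right)\]
Exponentiating both sides gives:
\[y = y_0\,e^{k\left(x - x_0\right)} = \left(y_0\,e^{-k x_0}\right)e^{k x}\]
Simplifying the constants yields
\[y = a\,e^{k x}\]
where \(a = y_0\,e^{-k x_0}\) is some constant and \(k\) is called the growth rate. Transform by taking the logarithm of each side. Using the natural logarithm is most useful:
\[\ln\left(y\right) = k\,x + \ln\left(a\right)\]
meaning the transformed measurement \(\ln\left(y\right)\) can be fit to a straight line function of \(x\). That is, using the transformations:
\[\xi = x, \qquad \eta = \ln\left(y\right)\]
The relationships between the slope and intercept of the transformed fit and the constants in the model equation are:
\[k = P_1, \qquad a = e^{P_2}\]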
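A quick way to check the exponential back-transformation is to fit noise-free synthetic data and confirm that the original constants come back. The values a_true = 2 and k_true = 0.5 below are hypothetical, chosen only for illustration:

```matlab
% Synthetic, noise-free exponential data (hypothetical a = 2, k = 0.5)
a_true = 2;  k_true = 0.5;
x = (0:0.5:4)';
y = a_true * exp(k_true * x);

% Transform: xi = x, eta = ln(y); then fit a straight line
xi  = x;
eta = log(y);
P   = polyfit(xi, eta, 1);   % P(1) = slope, P(2) = intercept

% Transform back: k = P(1), a = exp(P(2))
k = P(1);
a = exp(P(2));
fprintf('a = %g, k = %g\n', a, k)
```

With real, noisy data the recovered constants will not match exactly, and the fit minimizes the squared error in \(\ln\left(y\right)\), not in \(y\).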
Power Law Model
Power-law models come from situations where the derivative of the dependent variable is proportional to the ratio of the dependent variable to the independent variable. That is:
\[\frac{dy}{dx} = k\,\frac{y}{x}\]
where \(k\) is the constant of proportionality. Separating the variables and integrating both sides yields:
\[\int_{y_0}^{y}\frac{dy}{y} = k\int_{x_0}^{x}\frac{dx}{x} \quad\Longrightarrow\quad \ln\left(\frac{y}{y_0}\right) = k\,\ln\left(\frac{x}{x_0}\right)\]
Exponentiating both sides gives:
\[\frac{y}{y_0} = \left(\frac{x}{x_0}\right)^{k}\]
The constants can be simplified to reveal a relationship in which the dependent variable is proportional to some power of the independent variable. That is:
\[y = a\,x^{k}, \qquad a = \frac{y_0}{x_0^{k}}\]
where \(a\) is some constant and \(k\) is called the scaling exponent. Transform by taking the logarithm of each side. Unlike the transformation of the exponential model, there is generally no mathematical advantage to using one logarithm over another. There is, however, a graphical advantage to the base-10 logarithm, since MATLAB's semilogy, semilogx, and loglog plots use base-10 logarithmic axes:
\[\log_{10}\left(y\right) = k\,\log_{10}\left(x\right) + \log_{10}\left(a\right)\]
meaning the transformed measurement \(\log_{10}\left(y\right)\) can be fit to a straight line function of \(\log_{10}\left(x\right)\). That is, using the transformations:
\[\xi = \log_{10}\left(x\right), \qquad \eta = \log_{10}\left(y\right)\]
The relationships between the slope and intercept of the transformed fit and the constants in the model equation are:
\[k = P_1, \qquad a = 10^{P_2}\]
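The same round-trip check works for the power-law model. The constants a_true = 3 and k_true = 1.5 are hypothetical, and x is kept strictly positive because log10 is not finite at zero:

```matlab
% Synthetic, noise-free power-law data (hypothetical a = 3, k = 1.5)
a_true = 3;  k_true = 1.5;
x = (1:8)';                  % x must be positive for log10
y = a_true * x.^k_true;

% Transform: xi = log10(x), eta = log10(y); then fit a straight line
xi  = log10(x);
eta = log10(y);
P   = polyfit(xi, eta, 1);

% Transform back: k = P(1), a = 10^P(2)
k = P(1);
a = 10^P(2);
fprintf('a = %g, k = %g\n', a, k)
```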
Saturation-Growth Model
Saturation-growth models come from situations where the derivative of the dependent variable is proportional to the square of the ratio of the dependent variable to the independent variable. That is:
\[\frac{dy}{dx} = k\left(\frac{y}{x}\right)^{2}\]
where \(k\) is again the constant of proportionality. Separating the variables and integrating both sides yields:
\[\int_{y_0}^{y}\frac{dy}{y^{2}} = k\int_{x_0}^{x}\frac{dx}{x^{2}} \quad\Longrightarrow\quad \frac{1}{y_0} - \frac{1}{y} = k\left(\frac{1}{x_0} - \frac{1}{x}\right) \quad\Longrightarrow\quad \frac{1}{y} = \frac{k}{x} + \frac{x_0 - k\,y_0}{x_0\,y_0}\]
By finding a common denominator for the right side of this equation and then inverting both sides, you can obtain:
\[y = \frac{x_0\,y_0\,x}{\left(x_0 - k\,y_0\right)x + k\,x_0\,y_0}\]
To get this into the form preferred by Chapra[1] requires a bit of work. First, divide all the terms by \(x_0-ky_0\):
\[y = \frac{\dfrac{x_0\,y_0}{x_0 - k\,y_0}\,x}{x + \dfrac{k\,x_0\,y_0}{x_0 - k\,y_0}}\]
Then simply rename the constants to
\[\alpha_3 = \frac{x_0\,y_0}{x_0 - k\,y_0}, \qquad \beta_3 = \frac{k\,x_0\,y_0}{x_0 - k\,y_0}\]
to yield the version in the Chapra book on p. 301:
\[y = \alpha_3\,\frac{x}{\beta_3 + x}\]
This form is more convenient for two reasons. First, as the independent variable goes to infinity, the dependent variable approaches \(\alpha_3\). Second, when the independent variable is equal to \(\beta_3\), the dependent variable is half of \(\alpha_3\). This means that \(\alpha_3\) is the limiting value of \(y\) and \(\beta_3\) is a measure of how quickly \(y\) approaches it.
With respect to the transformation, the integrated equation above, before it was inverted, was already a linear relationship between the inverses of the dependent and independent variables. That suggests how to transform the Chapra version of the equation: just flip it:
\[\frac{1}{y} = \frac{\beta_3}{\alpha_3}\,\frac{1}{x} + \frac{1}{\alpha_3}\]
meaning the transformed measurement \(1/y\) can be fit to a straight line function of \(1/x\). That is, using the transformations:
\[\xi = \frac{1}{x}, \qquad \eta = \frac{1}{y}\]
The relationships between the slope and intercept of the transformed fit and the constants in the model equation are:
\[\alpha_3 = \frac{1}{P_2}, \qquad \beta_3 = \frac{P_1}{P_2}\]
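A similar round-trip sketch for the saturation-growth model also checks the half-value property of \(\beta_3\). The constants alpha_true = 5 and beta_true = 2 are hypothetical, and x avoids zero because 1./x would not be finite there:

```matlab
% Synthetic, noise-free saturation-growth data (hypothetical alpha3 = 5, beta3 = 2)
alpha_true = 5;  beta_true = 2;
x = (0.5:0.5:6)';            % x must be nonzero for 1./x
y = alpha_true * x ./ (beta_true + x);

% Transform: xi = 1/x, eta = 1/y; slope = beta3/alpha3, intercept = 1/alpha3
xi  = 1 ./ x;
eta = 1 ./ y;
P   = polyfit(xi, eta, 1);

% Transform back
alpha3 = 1 / P(2);
beta3  = P(1) / P(2);
fprintf('alpha3 = %g, beta3 = %g\n', alpha3, beta3)

% At x = beta3 the model returns half of alpha3
yhalf = alpha3 * beta3 / (beta3 + beta3);
```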
Example Code
Note in the example code below that there is more work done in the switch statement than in the above examples. This is primarily because the type of linearization will determine not only the model equation but also the transformation equations into and out of the linearized regime.
%% Initialize the workspace
clear; format short e
figure(1); clf
%% Load and manipulate the data
load Cantilever.dat
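% Remove the first point, where x = 0, since log10(0) and 1/0 are not finite
Cantilever = Cantilever(2:end, :);
Force = Cantilever(:,1) * 9.81;        % weight in N (mass in kg times 9.81 m/s^2)
Displ = Cantilever(:,2) * 2.54 / 100;  % displacement converted from inches to m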
%% Rename and create model data
x = Force;
y = Displ;
xmodel = linspace(min(x), max(x), 100);
%% Define the model equation; transform the variables; find the linearized fit; transform back
model = 'exponential'
switch model
case 'exponential'
yeqn = @(coefs, x) coefs(1).*exp(coefs(2).*x);
xi = x;
eta = log(y);
P = polyfit(xi, eta, 1);
MyCoefs(1) = exp(P(2));
MyCoefs(2) = P(1)
case 'power law'
yeqn = @(coefs, x) coefs(1).*x.^coefs(2);
xi = log10(x);
eta = log10(y);
P = polyfit(xi, eta, 1);
MyCoefs(1) = 10^(P(2));
MyCoefs(2) = P(1)
case 'sat growth'
yeqn = @(coefs, x) coefs(1).*x./(coefs(2)+x);
xi = 1./x;
eta = 1./y;
P = polyfit(xi, eta, 1);
MyCoefs(1) = 1/P(2);
MyCoefs(2) = P(1)/P(2)
otherwise
error('Unknown linearization')
end
%% Generate estimates and model
yhat = yeqn(MyCoefs, x);
ymodel = yeqn(MyCoefs, xmodel);
%% Calculate statistics
% Compute sum of the squares of the data residuals
St = sum(( y - mean(y) ).^2)
% Compute sum of the squares of the estimate residuals
Sr = sum(( y - yhat ).^2)
% Compute the coefficient of determination
r2 = (St - Sr) / St
%% Generate plots
plot(x, y, 'k*',...
x, yhat, 'ko',...
xmodel, ymodel, 'k-');
xlabel('Independent Value')
ylabel('Dependent Value')
title('Dependent vs. Independent and Model')
legend('Data', 'Estimates', 'Model', 'Location', 'best')
Nonlinear Regression
Nonlinear regression is both more powerful and more sensitive than linear regression. For inherently nonlinear models, it will also generally produce a lower \(S_r\) value than linearization, since the nonlinear regression process minimizes the \(S_r\) of the actual data rather than that of the transformed values. The sensitivity comes into play because the fminsearch command finds a local minimum, which is not necessarily the global minimum. A good starting guess will work wonders.
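The sketch below compares the two approaches for an exponential model. The data are hypothetical (an exponential trend with fixed, non-random perturbations), and the names LinCoefs, SrLin, NlCoefs, SrNl, and opts are illustrative. Starting fminsearch from the linearized coefficients gives it a good initial guess; since fminsearch keeps the best point it has evaluated, the returned \(S_r\) cannot be worse than the starting value.

```matlab
% Hypothetical data: exponential trend with fixed (non-random) perturbations
x = (0:0.5:4)';
y = 2*exp(0.5*x) .* (1 + 0.05*[1; -1; 1; -1; 1; -1; 1; -1; 1]);

% Model and sum-of-squared-residuals function (same pattern as the code below)
yeqn = @(coefs, x) coefs(1).*exp(coefs(2).*x);
fSSR = @(coefs, x, y) sum(( y - yeqn(coefs, x) ).^2);

% Linearized fit: minimizes Sr of ln(y), not of y
P = polyfit(x, log(y), 1);
LinCoefs = [exp(P(2)) P(1)];
SrLin = fSSR(LinCoefs, x, y);

% Nonlinear fit, started from the linearized coefficients
opts = optimset('TolX', 1e-10, 'TolFun', 1e-10, 'MaxFunEvals', 1e4, 'MaxIter', 1e4);
[NlCoefs, SrNl] = fminsearch(@(c) fSSR(c, x, y), LinCoefs, opts);

fprintf('Sr (linearized) = %g\nSr (nonlinear)  = %g\n', SrLin, SrNl)
```

Starting instead from a poor guess, such as [0 0], may still work for a simple model like this one, but for models with several parameters (such as the 'trig with offset' case) a poor start can leave fminsearch in a local minimum.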
Example Code
In the code below, note that the use of the function handle fSSR was taken from Section 14.5 of the Chapra book[1].
%% Initialize the workspace
clear; format short e
figure(1); clf
%% Load and manipulate the data
load Cantilever.dat
Force = Cantilever(:,1) * 9.81;        % weight in N (mass in kg times 9.81 m/s^2)
Displ = Cantilever(:,2) * 2.54 / 100;  % displacement converted from inches to m
%% Rename and create model data
x = Force;
y = Displ;
xmodel = linspace(min(x), max(x), 100);
%% Define model equation and initial guesses
model = 'linear'
switch model
case 'linear'
yeqn = @(coefs, x) coefs(1)*x.^1 + coefs(2)*x.^0;
InitGuess = [0 0]
case 'trig with offset'
yeqn = @(coefs, x) coefs(1)*cos(x*coefs(3)) + coefs(2)*sin(x*coefs(3)) + coefs(4);
InitGuess = [0 0 100 0]
otherwise
error('Don''t know the model...')
end
%% Determine the function coefficients
fSSR = @(coefs, x, y) sum(( y - yeqn(coefs, x) ).^2)
[MyCoefs, Sr] = fminsearch(@(MyCoefsDummy) fSSR(MyCoefsDummy, x, y), InitGuess)
%% Generate estimates and model
yhat = yeqn(MyCoefs, x);
ymodel = yeqn(MyCoefs, xmodel);
%% Calculate statistics
% Compute sum of the squares of the data residuals
St = sum(( y - mean(y) ).^2)
% Compute sum of the squares of the estimate residuals
Sr = sum(( y - yhat ).^2)
% Compute the coefficient of determination
r2 = (St - Sr) / St
%% Generate plots
plot(x, y, 'k*',...
x, yhat, 'ko',...
xmodel, ymodel, 'k-');
xlabel('Independent Value')
ylabel('Dependent Value')
title('Dependent vs. Independent and Model')
legend('Data', 'Estimates', 'Model', 'Location', 'best')
References
1. Steven C. Chapra, Applied Numerical Methods with MATLAB for Engineers and Scientists, 2/e.