More Complicated Models of Experimental Data
There are obvious problems with the piecewise linear (PL) and piecewise
constant (PC) models that we have been using. For example, many processes
occur smoothly, so it may be
important to choose a model which has no breaks or kinks. For other
processes, we may want to choose a model which is periodic (that is,
it repeats itself) or which has a certain number of derivatives. In
this section we introduce a few new models that may be used to
approximate functions underlying experimental data. In each case, we
can approximate the integral of the (unknown) function underlying the
data by exactly integrating the (known) model of our choice. It is
often the case that the choice of "the best" model to use is not made
on mathematical grounds, but is made by knowing something about
physics, biology, economics, or some other discipline.
Modeling Data Sets
Given a data set, each of the functions below is often used to model
the (unknown) function underlying the data. The interactive
document on the next page allows you to select a model for the pollution-rate
data presented in Figure 2.
A little notation will be useful. Suppose that you have n+1 pieces of data.
The data were recorded at times
t0, t1, t2, ..., tn, and the corresponding measurements
were P0, P1, P2, ..., Pn.
Piecewise Constant: Left Hand Rule
For any instant in time between t0 and t1, the value
of the (left hand) piecewise constant model is P0. For any
instant between t1 and t2, the value of the model is
P1, and so on.
We need to make an assumption about how to define the model for time
less than t0 or time greater than tn. We will make
the simplest assumption: the model is always zero for times prior to
t0 or for times greater than tn.
Piecewise Constant: Right Hand Rule
This model is similar to the previous model, except that for any
instant in time between t0 and t1, the value of the
(right hand) piecewise constant model is P1. For any instant
between t1 and t2, the value of the model is
P2, and so on.
The extension of the model outside of the range of data is the same as
above.
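The two piecewise constant models, and their exact integrals, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up here (they are not the EPA data of Figure 2), and we assume that at a data time the model takes the value belonging to the interval on its right. Because each model is constant on every interval, its integral over [t0, tn] is just a sum of rectangle areas.

```python
from bisect import bisect_right

def piecewise_constant(times, values, rule="left"):
    """Return the left- or right-hand piecewise constant model as a function of t.

    times  -- increasing data times t0 < t1 < ... < tn
    values -- measurements P0, P1, ..., Pn
    The model is zero for t < t0 and for t > tn, as in the text.
    """
    if rule not in ("left", "right"):
        raise ValueError("rule must be 'left' or 'right'")
    offset = 0 if rule == "left" else 1

    def model(t):
        if t < times[0] or t > times[-1]:
            return 0.0
        # Index i of the interval [t_i, t_(i+1)] containing t.
        # Assumption: a break point belongs to the interval on its right,
        # except tn, which belongs to the last interval.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        return float(values[i + offset])

    return model

def integrate_piecewise_constant(times, values, rule="left"):
    """Exact integral of the model over [t0, tn]: a sum of rectangle areas."""
    offset = 0 if rule == "left" else 1
    return sum(values[i + offset] * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

# Hypothetical data, for illustration only.
times = [0.0, 1.0, 2.5, 4.0]
values = [2.0, 3.0, 1.0, 4.0]
left = piecewise_constant(times, values, "left")
right = piecewise_constant(times, values, "right")
print(left(0.5), right(0.5))
print(integrate_piecewise_constant(times, values, "left"))
print(integrate_piecewise_constant(times, values, "right"))
```

For this made-up data the left hand integral is 2(1) + 3(1.5) + 1(1.5) = 8 and the right hand integral is 3(1) + 1(1.5) + 4(1.5) = 10.5, which are the familiar left and right Riemann sums.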
Figure 4: The graph of two piecewise constant models
for the "pollution-rate function" over the interval [0,10.5].
Question 5
Above, you were given the definition of two piecewise constant models.
Now it is your turn to construct a
definition of a piecewise linear model. This model
is linear between two data points.
Construct a formula, valid for any set of data points,
that explicitly gives the value
of the model when the input is between t0 and t1,
between t1 and t2, and so on.
Extend the model outside of the range of data in the same way as for
the previous models.
Test your model on the "EPA
data" shown in Figure 2. For this data, the graph of your
piecewise linear model looks like the Figure below.
Figure 4: The graph of a piecewise linear model
for the "pollution-rate function": time versus rate of soot
production.
Cubic Spline
This model is used by engineers and architects in order to fit a
smooth curve to a set of data points. The model is a cubic polynomial
on each interval between data points. (The model is not,
however, a cubic polynomial over its entire domain!) The cubic
polynomials are chosen in such a way as to make the derivative of the
model be continuous over the entire domain.
The extension of the model outside of the range of data is the same as
the other models, but the fact that we are fitting a cubic polynomial
to the data set gives us additional freedom. In this lab, we have
chosen the cubic polynomial so that the slope of the model at
t0 is the slope of the line segment from (t0,P0) to
(t1,P1). Similarly, the slope of the model at tn is
set to be the slope of the line segment from
(t_(n-1),P_(n-1)) to (tn,Pn).
We will see examples of these models in later portions of the lab.
The graph of a cubic spline
model.
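For readers who want to experiment, here is one way such a spline might be computed. It is a sketch under an assumption that the text does not confirm: we use the standard "clamped" cubic spline, whose first and second derivatives are both continuous, with the two end slopes set to the secant slopes described above. The function name and the sample data are hypothetical.

```python
from bisect import bisect_right

def clamped_cubic_spline(times, values):
    """Build a clamped cubic spline through the points (times[i], values[i]).

    Assumption: the end slopes equal the slopes of the first and last line
    segments, and the spline has continuous first and second derivatives.
    Returns (spline, slope): the model and its derivative, both zero
    outside [t0, tn].
    """
    n = len(times) - 1
    h = [times[i + 1] - times[i] for i in range(n)]
    secant = [(values[i + 1] - values[i]) / h[i] for i in range(n)]
    start_slope, end_slope = secant[0], secant[-1]

    # Right-hand side of the tridiagonal system for the coefficients c_i.
    alpha = [0.0] * (n + 1)
    alpha[0] = 3 * (secant[0] - start_slope)      # zero for this choice of slope
    alpha[n] = 3 * (end_slope - secant[n - 1])    # zero for this choice of slope
    for i in range(1, n):
        alpha[i] = 3 * (secant[i] - secant[i - 1])

    # Forward elimination for the tridiagonal system.
    diag = [0.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    diag[0] = 2 * h[0]
    mu[0] = 0.5
    z[0] = alpha[0] / diag[0]
    for i in range(1, n):
        diag[i] = 2 * (times[i + 1] - times[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    diag[n] = h[n - 1] * (2 - mu[n - 1])
    z[n] = (alpha[n] - h[n - 1] * z[n - 1]) / diag[n]

    # Back substitution: on interval j the model is
    # P_j + b_j s + c_j s^2 + d_j s^3, where s = t - t_j.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    c[n] = z[n]
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = secant[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def locate(t):
        return min(bisect_right(times, t) - 1, n - 1)

    def spline(t):
        if t < times[0] or t > times[-1]:
            return 0.0
        j = locate(t)
        s = t - times[j]
        return values[j] + b[j] * s + c[j] * s ** 2 + d[j] * s ** 3

    def slope(t):
        if t < times[0] or t > times[-1]:
            return 0.0
        j = locate(t)
        s = t - times[j]
        return b[j] + 2 * c[j] * s + 3 * d[j] * s ** 2

    return spline, slope

# Hypothetical data, for illustration only.
spline, slope = clamped_cubic_spline([0.0, 1.0, 3.0, 4.0], [1.0, 2.0, 0.0, 3.0])
print(spline(2.0), slope(0.0), slope(4.0))
```

The model passes through every data point, and its slopes at t0 and tn match the first and last line segments. If the spline used in this lab is constructed differently, the values between data points may differ from this sketch.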
Trigonometric Polynomial of Best Fit
You may know that it is possible to write down the equation of a line
that best fits a set of data points. Similarly, it is possible to
write down the equation of a quadratic function, a cubic function, or
any other polynomial of a fixed degree that "best fits" the given
data. The exact way to do this is often presented in a course in
statistics or linear algebra; we will not concern ourselves with the
details, but typically "best fit" means looking at the difference
between the model and data points, and then minimizing the sum of the
squares of those differences.
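For the simplest case, a line of best fit, the least-squares recipe reduces to two short formulas. The sketch below uses a hypothetical function name and made-up data; it shows one common way to compute the line and is not taken from this lab.

```python
def best_fit_line(times, values):
    """Slope and intercept of the line minimizing the sum of squared differences."""
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(values) / n
    # Sums of squared and cross deviations from the means.
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, values))
    slope = s_tp / s_tt
    intercept = p_mean - slope * t_mean
    return slope, intercept

# Hypothetical data, for illustration only.
slope, intercept = best_fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
print(slope, intercept)
```

For these three points the best-fit line has slope 1/2 and intercept 1/6. It passes through none of the data points, but no other line gives a smaller sum of squared differences.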
If you suspect that the data you are gathering is periodic over some
interval of time, then it may make sense to choose your model to be
periodic as well. In analogy to the "polynomials of best fit," it is
possible to write down a model that consists of a sum of sine and cosine
functions that best fit the given data. It is necessary, however, to
decide ahead of time how many sines and cosines you want to use in
your approximation, just as it is necessary to decide on the degree of
the polynomial model that you are fitting to the data.
The models that consist of trigonometric functions are called
Fourier polynomials. These models are widely used in
engineering, physics, and other sciences to approximate processes that
are periodic.
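A least-squares Fourier polynomial can be found in the same spirit as a polynomial of best fit. The sketch below is one possible approach, not necessarily the one used to produce the models in this lab: it builds the "normal equations" for the unknown coefficients and solves them by Gaussian elimination. All function names are hypothetical, and the data are synthetic, generated from a known degree-one model so that the fit can be checked.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x

def fourier_basis(t, degree, period):
    """Values of 1, cos(w t), sin(w t), ..., cos(k w t), sin(k w t) at time t."""
    w = 2 * math.pi / period
    row = [1.0]
    for k in range(1, degree + 1):
        row.append(math.cos(k * w * t))
        row.append(math.sin(k * w * t))
    return row

def fit_fourier_polynomial(times, values, degree, period):
    """Least-squares coefficients [a0, a1, b1, a2, b2, ...] of the model
    a0 + a1 cos(w t) + b1 sin(w t) + a2 cos(2 w t) + b2 sin(2 w t) + ...
    """
    rows = [fourier_basis(t, degree, period) for t in times]
    m = 2 * degree + 1
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(r[i] * p for r, p in zip(rows, values)) for i in range(m)]
    return solve_linear_system(normal, rhs)

# Synthetic data sampled from a known model at made-up times.
w = 2 * math.pi / 12
times = [0.0, 1.5, 3.0, 4.0, 6.0, 7.5, 9.0, 10.5]
values = [2.5 - 0.775 * math.cos(w * t) + 1.342 * math.sin(w * t) for t in times]
print(fit_fourier_polynomial(times, values, degree=1, period=12))
```

Because the synthetic data lie exactly on a degree-one Fourier polynomial, the fit recovers its coefficients up to rounding. With real measurements the model would only approximate the data, and the degree must be small enough that there are more data points than coefficients.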
As an example, suppose that the EPA worker knows that the factory
being tested runs two twelve-hour shifts. The worker suspects that
the rate of soot production may be periodic over a twelve-hour period.
The simplest Fourier polynomial of "best fit" is then
2.5 - 0.775 cos(w t) + 1.342 sin(w t)
where w= 2 Pi/12 = Pi/6.
More complicated models (higher "degrees") could include trigonometric
functions like sin(2 w t), cos(2 w t), and, in general,
sin(k w t) and cos(k w t) for any integer value of
k. For example, the best-fit Fourier polynomial of degree two is
2.5 - 0.732 cos(w t) + 1.268 sin(w t) + 0.232 cos(2 w t) - 0.134 sin(2 w t).
Note that each Fourier polynomial we produced is periodic with period 12,
so it repeats the values it takes on the time interval [0,12]. The
comparison of these functions with the experimental data is shown below.
Figure 5: The graph of a Fourier polynomial model
for the "pollution-rate function," assuming the data has
a 12 hour period, beginning at t=0.
(A) The Fourier polynomial of
degree one that best fits the experimental data.
(B) The Fourier polynomial
of degree two that best fits the data.
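Since a Fourier polynomial has an elementary antiderivative, its integral can be computed exactly, which is the point of choosing a known model. The sketch below evaluates the two Fourier polynomials given above and integrates them over one period. The function names and the coefficient ordering [a0, a1, b1, a2, b2, ...] are our own conventions for this illustration.

```python
import math

def fourier_polynomial(coeffs, period):
    """Return the model a0 + sum of a_k cos(k w t) + b_k sin(k w t), w = 2 pi / period.

    coeffs is ordered [a0, a1, b1, a2, b2, ...] (a convention assumed here).
    """
    w = 2 * math.pi / period
    degree = (len(coeffs) - 1) // 2

    def model(t):
        total = coeffs[0]
        for k in range(1, degree + 1):
            total += coeffs[2 * k - 1] * math.cos(k * w * t)
            total += coeffs[2 * k] * math.sin(k * w * t)
        return total

    return model

def integrate_fourier_polynomial(coeffs, period, a, b):
    """Exact integral of the model over [a, b], using its antiderivative."""
    w = 2 * math.pi / period
    degree = (len(coeffs) - 1) // 2

    def antiderivative(t):
        total = coeffs[0] * t
        for k in range(1, degree + 1):
            total += coeffs[2 * k - 1] * math.sin(k * w * t) / (k * w)
            total -= coeffs[2 * k] * math.cos(k * w * t) / (k * w)
        return total

    return antiderivative(b) - antiderivative(a)

# Coefficients of the two best-fit models quoted in the text (period 12).
DEGREE_ONE = [2.5, -0.775, 1.342]
DEGREE_TWO = [2.5, -0.732, 1.268, 0.232, -0.134]

model_one = fourier_polynomial(DEGREE_ONE, 12)
print(model_one(0.0), model_one(12.0))  # equal up to rounding: the model is periodic
print(integrate_fourier_polynomial(DEGREE_ONE, 12, 0.0, 12.0))
print(integrate_fourier_polynomial(DEGREE_TWO, 12, 0.0, 12.0))
```

Over a full period every sine and cosine term integrates to zero, so both models give the same integral over [0,12], namely 2.5 times 12, or 30, up to floating-point rounding. Over shorter intervals the two models generally give different values.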
Summary of Models
There are many ways to model the unknown pollution-rate function.
Each model has certain advantages and disadvantages, and in practice scientists
try to choose a model whose characteristics best reflect what is known about the
underlying function. In the remainder of this lab, we apply these models to
two sets of experimental data.
The Geometry Center Calculus Development Team
A portion of this lab is based on a problem appearing in
the Harvard Consortium Calculus book, Hughes-Hallett, et al.,
1994, p. 174.
Last modified: Wed Feb 21 13:10:29 1996