## Fitting Surge Functions to Data

**FITTING SURGE FUNCTIONS**
Sheldon P. Gordon
ADDRESS: Department of Mathematics, Farmingdale State University of
New York, Farmingdale NY 11735 USA.

ABSTRACT: The problem of fitting a surge function to a set of data such as that for a drug response curve is considered. A variety of different techniques are applied, including using some fundamental ideas from calculus, the use of a CAS package, and the use of Excel's regression features for fitting a multivariate linear function to a set of transformed data. The results of the different approaches are contrasted and discussed.

KEYWORDS: Surge functions, drug response curves, curve fitting, least
squares, multivariate regression analysis, rational functions.

One of the "new" families of functions that are being introduced in the early years of the mathematics curriculum is the *surge* function, which is treated at the calculus level in [2, 3] and at the precalculus level in [1]. The surge function, whose graph is shown in Figure 1, has the form

$$f(x) = A x^p e^{-bx}, \tag{1}$$

where $p > 0$ and $b > 0$, which is equivalent to

$$f(x) = A x^p c^x, \quad 0 < c < 1.$$

Surge functions are used to model a variety of real-world applications, such as the response to an initial dose of a drug (the level of the medication in the bloodstream rises relatively rapidly to a peak and thereafter decays as the drug is washed out of the body by the kidneys) or the body's response to an infection. Surge functions are also used to model the results of an advertising campaign that initially causes a fast increase in sales, but which then slowly diminish. From a modeling point of view, the initial surge is accounted for by the power function term $x^p$ and the subsequent slow decay is accounted for by the decaying exponential term $e^{-bx}$ or $c^x$.

Figure 1. The graph of a surge function.

A surge function such as the one pictured in Figure 1 has a maximum and two points of inflection for $x > 0$. As shown in [4] using standard calculus techniques, the maximum occurs at the point where

$$x = \frac{p}{b}, \quad b \neq 0. \tag{2}$$

A second horizontal tangent occurs at the origin where $f'(0) = 0$; depending on the value of $p$, this could be a turning point or an inflection point. The two inflection points seen in the graph in Figure 1 occur at

$$x = \frac{p \pm \sqrt{p}}{b}. \tag{3}$$

This can be written in a more insightful way as

$$x = \frac{p}{b} \pm \frac{\sqrt{p}}{b},$$

which indicates that the two inflection points are located symmetrically about the turning point at $x = p/b$; however, the inflection points do occur at different heights since the curve is not symmetric about the vertical line through the turning point.
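The locations of the turning point and the inflection points described above follow from differentiating the surge function twice. The short derivation below is a sketch of that standard product-rule calculation, written out here for convenience rather than quoted from [4]; it assumes $A \neq 0$, $b > 0$, and $x > 0$.

```latex
\begin{align*}
f(x)   &= A x^{p} e^{-bx} \\
f'(x)  &= A x^{p-1} e^{-bx}\,(p - bx)
          && \Rightarrow\ f'(x) = 0 \text{ at } x = \frac{p}{b} \\
f''(x) &= A x^{p-2} e^{-bx}\,\bigl[(p - bx)^{2} - p\bigr]
          && \Rightarrow\ f''(x) = 0 \text{ at } p - bx = \pm\sqrt{p},
          \text{ i.e. } x = \frac{p}{b} \pm \frac{\sqrt{p}}{b}
\end{align*}
```

The bracket $(p - bx)^2 - p$ is symmetric in the quantity $p - bx$, which is why the two inflection points sit at equal distances $\sqrt{p}/b$ on either side of the turning point.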

Another extremely important theme in the modern mathematics curriculum is that of fitting a function to a set of data. All graphing calculators (as well as spreadsheet packages such as Excel) have the capability of fitting a linear, exponential, power, logarithmic and polynomial function (up to 4th degree on a calculator and up to 6th degree on Excel) to a set of data. Many calculators also have the capability of fitting a logistic function and a sinusoidal function to data. However, unless one uses a specialized computer package such as Mathematica or Maple, there is no readily available tool for fitting a surge function to a set of data. This issue was addressed in [4], where several different techniques were discussed, but none was particularly satisfactory in the sense of yielding good results in an easily accessible way.

In this article, we look at this issue again and consider in detail an approach that has the advantage of giving reasonably accurate results with a readily available tool, as well as the approach of applying the least squares criterion directly with the assistance of a CAS package.

Figure 2. The drug response curve for Viagra.

To illustrate the different approaches, we will use data on the mean plasma concentration level for sildenafil citrate (Viagra), a drug that our students would certainly be aware of and which would obviously pique their interest. We show the drug response curve for Viagra, as posted at the Pfizer website [5], in Figure 2. In this situation, the independent variable $t$ is time in hours since a dose of Viagra was taken initially and the dependent variable, the mean plasma concentration level $C$ for a group of healthy male volunteers, is measured in nanograms per milliliter (ng/ml). It is evident that the drug achieves its maximum level slightly more than an hour after it is taken and thereafter the level decays relatively slowly.

From this graph, we can estimate a set of data points to use as our target in finding an equation for the surge function that matches the curve shown in the Pfizer graph. In particular, we estimate the following values (see Table 1) for $t$ in hours and the corresponding concentration level $C$ in nanograms per milliliter:

Table 1. Data on level of Viagra.

It is interesting to note that the first point shown in Pfizer's graph in Figure 2 is not at the origin although we would presume that the initial Viagra concentration level would be 0 at time 0. We will address this issue later in the article.

From the data, we conclude that the maximum concentration level of about 440 ng/ml occurs at about $t = 1.2$ hours. Furthermore, the two inflection points, which correspond to the points where the function is changing most rapidly, occur at about $t = 0.4$ and $t = 2.4$ hours after the Viagra is first taken. Since we know that the inflection points for a surge function should be symmetrically located about the turning point, we might opt to average the two deviations about $t = 1.2$ and so estimate that the inflection points are 1 hour above and 1 hour below $t = 1.2$; that is, at $t = 0.2$ and $t = 2.2$.

Alternatively, we might reason that just because the largest value of $C$ in the table is at $t = 1.2$ hours does not necessarily mean that this is the absolute maximum value; the Viagra level might reach a higher level somewhat beyond $t = 1.2$, say at $t = 1.3$ or $t = 1.4$, and the latter value might make for a more symmetric format. We leave this possibility for the interested reader to pursue.

We now substitute the estimates for the location of the turning point ($t = 1.2$) and the inflection points ($t = 0.2$ and $2.2$) into Equations (2) and (3) to get

$$\frac{p}{b} = 1.2 \quad \text{and} \quad \frac{p + \sqrt{p}}{b} = 2.2,$$

and, since $p/b = 1.2$, the latter is equivalent to

$$\frac{\sqrt{p}}{b} = 1.$$

These two equations can be solved readily to get $p = 1.44$ and $b = 1.2$, so that the form of our surge function is

$$C(t) = A t^{1.44} e^{-1.2t}.$$

We know that the peak concentration value of approximately 440 ng/ml occurs at about $t = 1.2$, so that $C(1.2) = A (1.2)^{1.44} e^{-1.44} = 0.30806A = 440$, and so $A = 1428.29$. The corresponding model for the surge function is therefore

$$C(t) = 1428.29\, t^{1.44} e^{-1.2t}. \tag{4}$$

We show the graph of this function superimposed over the data points in Figure 3 and conclude that it is a reasonably good fit for $t$ between 0 and about 3 hours, although thereafter the surge function dies out much more rapidly than the concentration of Viagra does.
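The calculus-based estimate above uses only three numbers read from the graph: the time of the peak, the distance from the peak to each inflection point, and the peak height. The following Python sketch retraces that arithmetic. The function name `surge_from_landmarks` and its argument names are illustrative choices, not anything from the article.

```python
import math

def surge_from_landmarks(t_peak, inflection_offset, c_peak):
    """Recover (A, p, b) for C(t) = A * t**p * exp(-b*t).

    t_peak            -- time of the maximum, equal to p/b
    inflection_offset -- distance from the peak to each inflection point, sqrt(p)/b
    c_peak            -- concentration at the maximum
    """
    # (p/b) / (sqrt(p)/b) = sqrt(p)
    sqrt_p = t_peak / inflection_offset
    p = sqrt_p ** 2
    b = p / t_peak
    # Choose A so that the curve passes through the observed peak.
    A = c_peak / (t_peak ** p * math.exp(-b * t_peak))
    return A, p, b

# Estimates read from the Viagra curve: peak of 440 ng/ml at t = 1.2 hours,
# inflection points one hour on either side of the peak.
A, p, b = surge_from_landmarks(1.2, 1.0, 440.0)
print(A, p, b)
```

With these inputs the sketch gives $p = 1.44$ and $b = 1.2$ up to floating-point rounding, and a value of $A$ that agrees with 1428.29 to two decimal places.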

Probably the most common measure used to assess how well a function fits a set of data is the sum of the squares of the vertical deviations between the curve and the data points. The corresponding value for the surge function (4) shown in Figure 3 is 53,459.9. We will use this value for comparison in our subsequent calculations.
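The sum-of-squares measure is easy to compute directly. The sketch below is one minimal way to do so in Python. The helper names `surge` and `sum_of_squares` are hypothetical, and the data list is the one that appears in the Mathematica session in Appendix A. That list may not match Table 1 exactly, so the printed value should not be expected to reproduce the figure quoted in the text.

```python
import math

def surge(A, p, b):
    """Return the surge function C(t) = A * t**p * exp(-b*t) as a callable."""
    def c(t):
        return A * t ** p * math.exp(-b * t)
    return c

def sum_of_squares(model, data):
    """Sum of squared vertical deviations between the model and (t, C) points."""
    return sum((c_obs - model(t)) ** 2 for t, c_obs in data)

# (t, C) pairs as listed in Appendix A (an assumption: Table 1 may differ slightly).
appendix_a_data = [
    (0.05, 1), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
    (2.1, 350), (3, 250), (4, 170), (6, 80), (10, 30),
    (12, 20), (18, 12), (24, 6),
]

calculus_model = surge(1428.29, 1.44, 1.2)  # the calculus-based model
print(sum_of_squares(calculus_model, appendix_a_data))
```

Because every deviation is squared, a few large misses near the peak count for far more than many small misses in the tail.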

Figure 3. Viagra data and the surge function.

Figure 4. Another surge function with the Viagra data.

We can obtain a better approximation to a best-fit surge function by using some of the features in Mathematica to minimize the value of the sum of the squares. (The actual Mathematica session, including the specific commands used and the corresponding output, is shown in Appendix A.)
The result of this process is the quite different surge function
All three of the parameters have changed considerably and should change
the function significantly compared to the surge function in Equation (4).

The corresponding value for the sum of the squares associated with this function is 25,815.1, so that it is significantly smaller (about half as large) than the value of 53,459.9 that we got by simply applying the calculus-based results. We show this function superimposed over the data points in Figure 4, and observe that it seems to be a better fit to more of the points than the surge function (4) in Figure 3. Nevertheless, this function is still a rather poor fit to the data, especially after about $t = 5$ hours. And, perhaps more importantly, because this approach requires the use of a specialized CAS program that might not be available to all students, it may not be the most effective approach with which to investigate other sets of data that fall into the pattern of a surge function.
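For readers without a CAS, a direct least squares fit can also be sketched in plain Python. The code below is not the Mathematica `FindMinimum` routine used in the article. It swaps in a simpler technique: for any fixed $p$ and $b$ the best coefficient $A$ has a closed form, so the search only needs to cover a grid of $(p, b)$ values, which is then refined around the best point found. The search ranges, grid sizes, and function names are all assumptions chosen for illustration, and the data list is the one from Appendix A.

```python
import math

def best_amplitude(data, p, b):
    """For fixed p and b, return the least-squares A and the resulting sum of squares."""
    shape = [t ** p * math.exp(-b * t) for t, _ in data]
    a = sum(c * g for (_, c), g in zip(data, shape)) / sum(g * g for g in shape)
    sse = sum((c - a * g) ** 2 for (_, c), g in zip(data, shape))
    return a, sse

def fit_surge_direct(data, p_range=(0.1, 4.0), b_range=(0.05, 3.0), steps=40, rounds=6):
    """Minimize the sum of squares by repeatedly refining a grid over (p, b)."""
    (p_lo, p_hi), (b_lo, b_hi) = p_range, b_range
    best = None  # (sse, a, p, b)
    for _ in range(rounds):
        dp, db = (p_hi - p_lo) / steps, (b_hi - b_lo) / steps
        for i in range(steps + 1):
            for j in range(steps + 1):
                p, b = p_lo + i * dp, b_lo + j * db
                a, sse = best_amplitude(data, p, b)
                if best is None or sse < best[0]:
                    best = (sse, a, p, b)
        # Zoom in on a window of three grid steps around the best point so far.
        _, _, p, b = best
        p_lo, p_hi = max(p - 3 * dp, 1e-9), p + 3 * dp
        b_lo, b_hi = max(b - 3 * db, 1e-9), b + 3 * db
    sse, a, p, b = best
    return a, p, b, sse

# (t, C) pairs as listed in Appendix A (an assumption: Table 1 may differ slightly).
appendix_a_data = [
    (0.05, 1), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
    (2.1, 350), (3, 250), (4, 170), (6, 80), (10, 30),
    (12, 20), (18, 12), (24, 6),
]
print(fit_surge_direct(appendix_a_data))
```

A grid search is slower and cruder than a proper optimizer, but it needs no starting guess and makes the criterion being minimized explicit, which may suit a classroom setting.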

**USING MULTIVARIATE REGRESSION**
We now consider a different way to find the equation of a surge function that fits a set of data. To do so involves using some ideas on fitting a linear function of two or more variables to a set of multivariate data. In particular, suppose we have a set of $(x_1, x_2, \ldots, x_n, y)$ data, where $y$ depends on the $n$ independent variables $x_1, x_2, \ldots, x_n$. Multivariate linear regression is a standard tool that is used to find the linear function $Y = c_0 + c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$ that is the best fit to the data. Think of this as finding the hyperplane in $(n + 1)$-dimensional space that comes closest to all of the points in the data set. This capability is available in many software packages, including Excel, so it is readily at hand for use. (Note that this feature is not automatically loaded when Excel is first installed; rather, it must be loaded one time as an Add-In under Excel's Tools menu: just select Analysis ToolPak, and it is thereafter available under Tools. One of the options you then get is Regression. We describe the use of this feature in Appendix B.)
One somewhat unexpected application of multivariate linear regression is in fitting a polynomial in one variable $x$ to a set of $(x, y)$ data. Suppose we wish to find a polynomial of degree 3, $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$, that fits such a table of values. We can think of the polynomial expression as a linear function of $x$, $x^2$, and $x^3$ and then apply multivariate linear regression, where $X_1 = x$, $X_2 = x^2$, and $X_3 = x^3$. The resulting coefficients for the constant term and for the three variables $X_1$, $X_2$, and $X_3$ are then the coefficients of the desired polynomial.

We now apply a similar approach to fitting a surge function to a set of $(x, y)$ data, and will then apply the procedure to the data on the concentration levels of Viagra. Since the surge function we seek has the form $y = A x^p e^{-bx}$, we can take logarithms of both sides to get

$$\log(y) = \log(A) + \log(x^p) + \log(e^{-bx}) = \log(A) + p \log(x) - (b \log e)\,x$$

(with natural logarithms, $\log e = 1$ and the last term is simply $-bx$), and so $\log y$ is a *linear* function of $x$ and $\log x$. Thus, we can set $Y = \log y$, $X_1 = x$ and $X_2 = \log x$ and apply multivariate linear regression to an extended table of values that also includes a column of $\log x$ values and a column of $\log y$ values. In order to take logs of the $t$ and the $C$ values, we need to avoid the obvious starting point where $t = 0$ and $C = 0$; we do this by making a very minor change in the values of the two variables at that point and use $t = 0.05$ instead of 0 and $C = 10$ instead of 0. We presume that the researchers at Pfizer did the same in producing the graph in Figure 2 on their website; otherwise, it is far more natural to use $(0, 0)$ as the starting point. For the Viagra data, we then have the extended Table 2.
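The transformation just described can be sketched without a spreadsheet. The Python code below builds the columns $X_1 = t$, $X_2 = \log_{10} t$, and $Y = \log_{10} C$, solves the normal equations of ordinary least squares, and converts the coefficients back to $A$, $p$, and $b$. It illustrates the technique and is not a reproduction of Excel's Regression tool. The function names are hypothetical, and all $t$ and $C$ values are assumed to be strictly positive.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_surge_by_log_regression(data):
    """Regress Y = log10(C) on X1 = t and X2 = log10(t); return (A, p, b)."""
    rows = [[1.0, t, math.log10(t)] for t, _ in data]
    ys = [math.log10(c) for _, c in data]
    # Normal equations: (X^T X) coef = X^T Y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    c0, c1, c2 = solve_linear_system(xtx, xty)
    A = 10 ** c0               # constant term is log10(A)
    p = c2                     # coefficient of log10(t)
    b = -c1 * math.log(10)     # coefficient of t is -b * log10(e)
    return A, p, b
```

When the data lie exactly on a surge curve, this procedure recovers $A$, $p$, and $b$ up to rounding error. With real data it returns the best plane through the transformed points, which is a different criterion from the sum of squares on the original scale.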

Table 2. Original and transformed data.

When we "hit" this set of transformed data with the multivariate regression features of Excel, we get the linear regression equation

$$Y = 2.3190 - 0.1242 X_1 + 0.7613 X_2,$$

which is equivalent to

$$\log C = 2.3190 - 0.1242t + 0.7613 \log t.$$

We can eliminate the logs algebraically by undoing the original transformation using powers of 10 and so obtain

$$10^{\log C} = C = 10^{2.3190 - 0.1242t + 0.7613 \log t} = 208.45\,(0.7513)^t\, t^{0.7613}.$$

Figure 5. The surge function based on multivariate regression.

The base 0.7513 in the exponential term can be converted to an appropriate power of $e$ by solving the exponential equation $e^{-b} = 0.7513$, which leads to $b = 0.2859$.
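The conversion from regression coefficients back to surge-function parameters is a short calculation, sketched here in Python. The function name `surge_parameters_from_regression` is an illustrative choice, and the coefficients passed in are the ones reported in the text.

```python
import math

def surge_parameters_from_regression(c0, c1, c2):
    """Convert log10 C = c0 + c1*t + c2*log10(t) into A, base, p, b,
    where C(t) = A * base**t * t**p = A * t**p * exp(-b*t)."""
    A = 10 ** c0            # 10 raised to the constant term
    base = 10 ** c1         # base of the exponential factor
    p = c2                  # power of t
    b = -math.log(base)     # solves exp(-b) = base
    return A, base, p, b

A, base, p, b = surge_parameters_from_regression(2.3190, -0.1242, 0.7613)
print(A, base, p, b)
```

Working from the unrounded base rather than from 0.7513 changes $b$ only in the fourth decimal place, so the result agrees with the value in the text to about three decimals.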

Thus, we have the surge function $C(t) = 208.45\, t^{0.7613} e^{-0.2859t}$, which is shown superimposed over the data points in Figure 5. It is obviously a very poor fit to the data, other than at the very beginning and the very end. Furthermore, the associated value for the sum of the squares is 192,812.4, which is considerably larger than anything we had before and which therefore corroborates our visual conclusion that the fit is extremely poor. Yet, the logic leading up to this result seems reasonable in the sense that the multivariate regression process produces the best-fit plane to the transformed data and therefore leads us to expect a much better fit. Let's see what has gone wrong.

In Figure 6, we show the plot of the points $(\log t, \log C)$. Notice that, other than the left-most point $(-1.5, 1)$, the remaining points are clustered relatively tightly and mostly display a clear pattern. This suggests that the results we get for the regression equation might be very sensitive to small changes in the values of the coordinates at the left-most point in the sense that this point may have a disproportionate effect on the coefficients in the regression equation.

Moreover, the left-most point is what we estimated to avoid the problem with taking logs of 0. And, because it involves a negative value for $\log t$, a relatively minor change in the value of $t$ near 0 would likely result in a major change in the value of $\log t$. In addition, when you look back at Figure 2 (the Pfizer website graph), it is evident that this initial point is the one for which it is hardest to estimate an accurate value.

Figure 6. Plot of $\log C$ vs. $\log t$.

Let's see just how much of an effect we get by trying a slightly different estimate for the value of $t$ for this point. Instead of using $t = 0.05$, suppose we try $t = 0.10$ and maintain the value $C = 10$. The resulting surge function is $C(t) = 619.44\, t^{0.8236} e^{-0.2924t}$. The value for the coefficient has changed dramatically from $A = 208.45$; the power in the power function term has changed a bit from $p = 0.7613$ to $p = 0.8236$; and the multiple in the exponential term has changed fairly minimally from $b = 0.2859$ to $b = 0.2924$. However, the corresponding value of the sum of the squares is now 700,741.1, which is almost four times as large as the previous value of 192,812.4 and the resulting surge function is a far poorer fit to this data.

More significantly, a relatively small change in the estimate of the point near the origin clearly results in a huge change in the results.

One way to circumvent this issue is to realize that, by its very nature, every surge function must pass through the origin provided that the power $p$ in the power function term is positive. As a consequence, it might make sense to ignore the point near the origin altogether and see what happens if we use only the remaining points for the multivariate regression analysis. When we do this, the corresponding linear function is $Y = 2.4276 - 0.092 X_1 + 0.225 X_2$, which is equivalent to $\log C = 2.4276 - 0.092t + 0.225 \log t$.

When we undo the logarithmic transformation by taking powers of 10, we eventually get the surge function

$$C(t) = 267.67\,(0.8091)^t\, t^{0.225} = 267.67\, t^{0.225} e^{-0.2118t}.$$

The associated value for the sum of the squares is 153,650.7. This is a substantial improvement over the two preceding surge functions using multivariate regression with estimates of the point near the origin. However, it is still considerably larger than the value of 53,459.9 we initially obtained using the calculus argument, let alone the value of 25,815.1 that resulted from the Mathematica routine for minimizing the sum of the squares.

Incidentally, there is another statistical measure used to assess how well a multivariate linear function fits a set of data; it is the *coefficient of multiple determination* and is denoted by $R$. It is the extension of the correlation coefficient $r$ to multivariate data. For the three functions we have created using multivariate regression, the corresponding values are $R = 0.8768$, $R = 0.8581$, and $R = 0.8988$, respectively. While all three values are statistically significant, the fact that they are all quite close to one another means that we should not make a definitive call on which of the three surge functions is the best fit based solely on the value of $R$.

This still leaves one rather perplexing question. How can all three of these surge functions based on multivariate linear regression be so much poorer fits than the one based on calculus, let alone the one obtained using the computer search method? After all, multivariate linear regression is supposed to produce the *best* fit! The key is that it does produce the best *linear* fit to the set of transformed $(t, \log t, \log C)$ data. If all we did was to stop there, we would indeed have the best possible fit. However, we started with $(t, C)$ data and, in the process of transforming it via logarithms, we stretched the data values in a non-linear way. After we got the corresponding multivariable linear regression equation, we undid the original transformation, which entails another non-linear stretch, but this time the inverse transformation is applied to the function, not to the data. So, although the three regression planes we obtained were the best linear fits to the three different sets of transformed data, the corresponding surge functions are not necessarily the best, or even extremely good, fits to the original data. They may be good fits if the original data fall very closely into a surge function pattern; however, if the data is not *extremely* close to such a pattern, the resulting function based on multivariate regression may be a surprisingly poor fit.
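One way to make this argument concrete is to write down the two quantities being minimized. The comparison below is a sketch rather than part of the original analysis: $\hat{C}_i$ denotes the fitted value at $t_i$, logarithms are taken base 10, and the approximation in the last line is a first-order one that assumes each residual is small relative to the fitted value.

```latex
\begin{align*}
S_{\text{direct}} &= \sum_i \bigl(C_i - \hat{C}_i\bigr)^2
  && \text{(criterion used to compare the fits)} \\
S_{\log} &= \sum_i \bigl(\log C_i - \log \hat{C}_i\bigr)^2
  && \text{(criterion minimized by the regression)} \\
\log C_i - \log \hat{C}_i &\approx \frac{C_i - \hat{C}_i}{\hat{C}_i \ln 10}
  && \Rightarrow\quad
  S_{\log} \approx \sum_i \frac{\bigl(C_i - \hat{C}_i\bigr)^2}{\bigl(\hat{C}_i \ln 10\bigr)^2}
\end{align*}
```

Under this approximation the regression weights each squared deviation by roughly $1/\hat{C}_i^{\,2}$, so points with small concentrations (the very beginning and the long tail) count far more heavily than points near the peak. That is consistent with the regression-based curves matching the data best at the two ends.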

A comparable situation arises with the curve fitting routines in calculators and in Excel; rather than directly fitting an exponential, logarithmic, or power function to a set of data, these routines transform the data (either a semi-log plot or a log-log plot), find the regression line for the transformed data, and then undo the transformation algebraically. In the process, one obtains the best possible line for the transformed data, but in the process of undoing the transformation, a nonlinear stretch takes place and the resulting function is not necessarily the best fit within that family of functions.

**FITTING A RATIONAL FUNCTION TO THE DATA**
Gordon and Gordon [4] also discuss the possibility of fitting a rational function of the form

$$C(t) = \frac{a t^2}{t^4 + b^2} \tag{6}$$

to a set of data on drug concentration levels over time, where $a$ and $b$ are two constants. The quadratic term in the numerator is needed to reflect the curvature of the drug data at and near the origin; the quartic term in the denominator reflects the fact that the data eventually die out as time progresses.

In [4], it was found that such a rational function was actually a considerably better fit than a surge function is to a set of data on the drug concentration level for a form of L-Dopa used to treat patients with Parkinson's disease. Let's see how well such a function fits the data on Viagra. We again use Mathematica to perform a direct least squares fit with a rational function of the form in Equation (6). The resulting function is

$$C(t) = \frac{a\,t^2}{t^4 + 3.19129^2},$$

where the corresponding value for the sum of the squares is 28,468.6. We note that this value is slightly larger than the value of 25,815.1 we obtained before for the best fitting surge function. So, in this case, the rational function gives slightly poorer accuracy. In Figure 7, we show both the best surge function and this best rational function of the form in Equation (6) to compare the relative fits. From this, we see that the rational function (the darker curve) spikes to a considerably higher level than either the data or the surge function do; however, it dies out more slowly than the surge function and so is a better fit to the data after about $t = 12$ hours.
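The behavior of the rational model can be read off from a short calculation. The sketch below is a side derivation, not taken from the article. It assumes $a > 0$ and $b > 0$ and uses the fitted denominator constant $b = 3.19129$ reported above.

```latex
\begin{align*}
C(t)  &= \frac{a t^{2}}{t^{4} + b^{2}}, \qquad
C'(t) = \frac{2 a t \,\bigl(b^{2} - t^{4}\bigr)}{\bigl(t^{4} + b^{2}\bigr)^{2}} \\
C'(t) &= 0 \ \text{for } t > 0 \ \text{ at } \ t = \sqrt{b}, \qquad
C\bigl(\sqrt{b}\bigr) = \frac{a}{2b} \\
b &= 3.19129 \ \Rightarrow\ t_{\text{peak}} = \sqrt{3.19129} \approx 1.79 \text{ hours}, \qquad
C(t) \approx \frac{a}{t^{2}} \ \text{ for large } t
\end{align*}
```

So the fitted rational curve peaks at about 1.8 hours, somewhat later than the observed peak near 1.2 hours, and its tail decays like $1/t^2$ rather than exponentially, which is why it stays above the surge function at large $t$.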

Figure 7. The rational function (darker curve) vs. the surge function.

In general, the focus of this kind of investigation should not be on simply finding a function that is the best possible fit to a set of data, but rather on finding a function that is a reasonable fit and whose properties provide insight into the situation being modeled. A surge function provides that kind of insight in the sense that the power function term models the initial impetus and the exponential term models the eventual exponential-type decay of the drug concentration levels. The rational function certainly fits the data well, but there are considerably less compelling interpretations for why it follows the desired behavior pattern: the term $at^2$ does model the initial impetus and the $1/(t^4 + b^2)$ term does die out relatively quickly, but the latter is nowhere near as convincing a description as exponential decay.

So the key is the realization that all we are producing is a mathematical model. There are different routes to developing such a model, not only the ones we discussed here. For instance, there may be a more sophisticated pharmacokinetics model based on differential equations, but that is beyond the scope of what we are considering here and likely beyond the scope of the students we serve in introductory courses. In the final analysis, what is most important is not the keystrokes used to produce a model, but an understanding of the mathematics underlying that model and the development of the judgment necessary to decide how well the function actually fits the data and how the model gives us an understanding of the process.

The work described in this article was supported by the Division of Undergraduate Education of the National Science Foundation under grants DUE-0089400, DUE-0310123, and DUE-0442160. The author also appreciates the assistance provided by Brian Winkel, editor of *PRIMUS*, with Mathematica.

**REFERENCES**

1. Gordon, Sheldon P., Florence S. Gordon, *et al.* 2004. *Functioning in the Real World: A Precalculus Experience, 2nd Ed.* Boston: Addison-Wesley.

2. Hughes-Hallett, Deborah, Andrew Gleason, *et al.* 2002. *Calculus, 3rd Ed.* New York: John Wiley & Sons.

3. Hughes-Hallett, Deborah, Andrew Gleason, *et al.* 1999. *Applied Calculus.* New York: John Wiley & Sons.

4. Gordon, Sheldon P. and Florence S. Gordon. 2003. A Spoonful of Medicine Makes the Mathematics Go Down. *The AMATYC Review.* 24:9-24.

5. http://www.pfizer.com/download/uspi viagra.pdf

**APPENDIX A:**
**PERFORMING DIRECT CURVE FITTING**
**Direct least squares fitting of surge function to data.**
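    (* (t, C) data points estimated from the drug response curve *)
    data = {{.05, 1}, {.4, 50}, {.6, 320}, {1.2, 440}, {1.8, 410},
      {2.1, 350}, {3, 250}, {4, 170}, {6, 80}, {10, 30}, {12, 20}, {18, 12}, {24, 6}}
    (* surge model; this definition is implied by the output below but is not shown in the session *)
    m[x_, a_, p_, b_] := a x^p Exp[-b x]
    (* sum of the squared vertical deviations between the model and the data *)
    ss[a_, p_, b_] = Sum[(data[[i, 2]] - m[data[[i, 1]], a, p, b])^2,
      {i, 1, Length[data]}]
    sm = FindMinimum[ss[a, p, b], {a, 1}, {b, 1}, {p, 2}]
    {23851.6, {a → 1133.57, b → 1.10895, p → 1.69499}}

    mf[x_] = m[x, a, p, b] /. FindMinimum[ss[a, p, b], {a, 1}, {b, 1}, {p, 1}][[2]]
    1133.57 e^(−1.10895 x) x^1.69499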
Figure 8. Plot, in Mathematica, of data with fitted curve.

**APPENDIX B:**
**REGRESSION IN EXCEL**
With Excel's Analysis ToolPak installed, enter the data values for the dependent variable $C$ in Column A, say, and those for the independent variable $t$ in Column B and then create lists of values for $\log t$ in Column C and $\log C$ in Column D, as is done in Table 2. Then click on Tools, followed by Data Analysis, and finally Regression and OK. This will bring up the Excel dialog box shown in Figure 9. In this dialog, the first box asks you to Input Y Range; the Y-values are the values for the desired dependent variable, here $\log C$, which are in Column D. The second box asks you to Input X Range; the X-values for $t$ and $\log t$ are in Columns B and C. Next, select the first option, Output Range, under Output options; this will give the cells in which all the regression analysis output will appear. Select a collection of cells that are empty, say from A20 to I47.

When you click on OK, Excel will perform the complete regression analysis and print the results in the cells that were indicated. Sample results are shown in Figure 10. Of all the output results, the only ones that are of significance to this discussion are the values for the regression coefficients in Rows 36-38 and possibly the value for the coefficient of multiple determination $R$ in Row 23. In particular, the constant coefficient is 2.428, the coefficient of the first independent variable $t$ is $-0.0920$ and the coefficient of the second independent variable $\log t$ is 0.2251, leading to the regression equation $Y = 2.428 - 0.0920 X_1 + 0.2251 X_2$, which is equivalent to $\log C = 2.428 - 0.0920t + 0.2251 \log t$.

Figure 9. Excel dialog box for regression analysis.

Figure 10. Excel's regression output display.

Sheldon Gordon is Professor of Mathematics at Farmingdale State University of New York. He is a member of a number of national committees involved in undergraduate mathematics education and is leading a national initiative to refocus the courses below calculus. He is the principal author of *Functioning in the Real World* and a co-author of the texts developed under the Harvard Calculus Consortium.

Source: https://www.farmingdale.edu/faculty/sheldon-gordon/RecentArticles/fitting-surge.pdf
