# Curve Fitting AKA Model Fitting–the End Goal

One of the early lessons on model building in my current Research Methods course involves taking data we have generated with a manipulative model (radioactive decay) and using it to generate a predictive model. The students plot their data points and then try to find the mathematical expression that best describes the process. Almost always, my students ask Excel to generate a line of best fit based on the data. Sometimes they pick linear plots, sometimes exponential, sometimes log plots and sometimes power plots. These are all options in Excel for fitting the data to some mathematical expression. It should be obvious that the process of exponential decay is not best predicted by several different types of expressions. There should be one type of expression that most closely fits the actual physical phenomenon, a way of capturing what is actually going on. Just picking a “trendline” based on how well it visually fits the current data, without considering the actual phenomenon, is a very common error or misconception. You see, picking or developing the best expression requires a deep understanding of the process being described. In my half-life exercise, I have the students go back and consider the fundamental things or core principles that are going on. This is much like the process described by Jungck, Gaff and Weisstein:

*CBE-Life Sciences Education* 9.3 (2010): 201-211.

By Thomas Shafee (Own work) [CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)], via Wikimedia Commons

The important thing is that students understand where this equation comes from. It doesn’t come out of thin air; it is based on the same core principles they uncovered or experienced if they did the toothpickase manipulation, only now it is quantified. So how do I use this equation to actually see how well my data “fits”? If it were a linear expression, that would be easy in Excel or any spreadsheet package, but what about non-linear trend lines? I can tell you that this expression is not part of the trend line package you’ll find in spreadsheets.

Km equals the concentration of the substrate where the rate of reaction is 1/2 of Vmax
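For reference, here is the standard textbook form of the Michaelis-Menten equation, written with $v$ for the reaction rate and $[S]$ for the substrate concentration (those two symbols are my choice of notation here). Substituting $[S] = K_m$ gives back the half-maximal rate described above:

$$
v = \frac{V_{max}\,[S]}{K_m + [S]}
\qquad\text{so at } [S] = K_m:\quad
v = \frac{V_{max}\,K_m}{K_m + K_m} = \frac{V_{max}}{2}
$$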

You can also just go to Desmos and play with it there

I had to use A and B and x1 in my equation as symbols.

It is not that difficult to use Desmos, and with my example, students who are familiar with it will be able to make their own model with their own data within Desmos. Move the sliders around; they represent the values for Km and Vmax in the equation. Notice how they change the shape of the graph. This really brings home the point of how these constants can be used to quantitatively describe the properties of an enzyme, and it helps to make sense of the tables one finds about enzyme activity. Also, notice the residuals that are plotted in green along the “x-axis”. These residuals are how we fit the curve. Each green dot is the result of taking the difference between a point on the theoretical curve (for particular values of the constants and the variable) and the actual data point. That difference is squared. A fit that puts the green dots close to zero is a very good fit. (BTW, this is the same thing we do in Excel with the Solver tool.) Watch the green dots as you move the sliders and try to minimize the total of the squared residuals. The other thing that you get with Desmos is that if you zoom out you’ll find that this expression is actually a rectangular hyperbola…and not an exponential. How is that important?

Well, think back to the beginning of this post, when I talked about how my students often just choose their mathematical model based on whichever line seems to fit the data best, not on an equation developed from first principles like the Michaelis-Menten equation.
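If you would rather see the slider exercise spelled out in code, here is a minimal Python sketch of the same idea: compute the squared residual for each data point, add them up, and try a grid of Km and Vmax values until the total is as small as possible. The data are made up for illustration (generated from Vmax = 10 and Km = 2, not measured), and the function names are my own. This is a brute-force stand-in for dragging sliders, not what Desmos or the Excel Solver does internally.

```python
def michaelis_menten(s, vmax, km):
    """Predicted reaction rate at substrate concentration s."""
    return vmax * s / (km + s)


def sum_squared_residuals(substrate, rates, vmax, km):
    """Total of (observed - predicted)^2 over all data points."""
    return sum(
        (v - michaelis_menten(s, vmax, km)) ** 2
        for s, v in zip(substrate, rates)
    )


def slider_search(substrate, rates, vmax_values, km_values):
    """Try every slider combination; keep the one with the smallest total."""
    best = None
    for vmax in vmax_values:
        for km in km_values:
            ssr = sum_squared_residuals(substrate, rates, vmax, km)
            if best is None or ssr < best[0]:
                best = (ssr, vmax, km)
    return best


# Made-up illustration data generated from Vmax = 10, Km = 2 (an assumption,
# not real measurements). Replace with your own class data.
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rates = [michaelis_menten(s, 10.0, 2.0) for s in substrate]

# "Slider" positions: 0.5, 1.0, ..., 20.0 for both constants.
steps = [i * 0.5 for i in range(1, 41)]
ssr, vmax_fit, km_fit = slider_search(substrate, rates, steps, steps)
print(f"Vmax = {vmax_fit}, Km = {km_fit}, SSR = {ssr:.4f}")
```

With real, noisy data the smallest total will not be zero, and a finer grid (or a proper optimizer) would be needed to pin down the constants more precisely.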