Curve Fitting AKA Model Fitting–the End Goal

Curve Fitting AKA Model Fitting:
When I started this series of posts my goal was to see if I could generate precise data with a proven classroom lab. The data precision that is possible with the yeast catalase lab provides a unique opportunity where data analysis skills can be productively explored, practiced and understood. My contention was that this is the ideal lab to focus not just on content, not just on experimental design, but also to introduce relatively sophisticated data analysis. To be up front about it, I had only a hint of how rich this lab is for doing just that. Partly, this is because in my years of teaching high school biology I covered most of the enzyme content in class activities and with 3D visualizations, focusing on the shape of enzymes but neglecting enzyme kinetics. That would be different if I were teaching today: I’d focus more on the quantitative aspects. Why? Well, it isn’t just to introduce the skills; it has more to do with how quantitative methods help to build a deeper understanding of the phenomena you are trying to study. My claim is that your students will develop a deeper understanding of enzymes and how enzymes work in the grand scheme of things if they follow learning paths that are guided and supported by quantitative data. This post is an example.
The last post focused on plotting the data points as rates, along with some indication of the variability in each measurement in a plot like this.
As I said before, I would certainly be happy if most of my students got to this point, as long as they understood how this graph helps them to describe enzyme reactions and interpret others’ work.
But a graph like this begs for a line of best fit: a curve that describes the relationship implied by our data points.
Something like this.

One of the early lessons on model building in my current Research Methods course involves taking data we have generated with a manipulative model (radioactive decay) to generate a predictive model. The students plot their data points and then try to find the mathematical expression that will describe the process best. Almost always, my students ask EXCEL to generate a line of best fit based on the data. Sometimes they pick linear plots, sometimes exponential, sometimes log plots and sometimes power plots. These are all options in EXCEL for trying to fit the data to some mathematical expression. It should be obvious that the process of exponential decay is not best predicted by several different types of expressions. There should be one type of expression that most closely fits the actual physical phenomenon, a way of capturing what is actually going on. Just picking a “trendline” based on how well it visually fits the current data, without considering the actual phenomenon, is a very common error or misconception. You see, picking or developing the best expression requires a deep understanding of the process being described. In my half-life exercise, I have the students go back and consider the fundamental things or core principles that are going on, much like the process described by Jungck, Gaff and Weisstein:

“By linking mathematical manipulative models in a four-step process—1) use of physical manipulatives, 2) interactive exploration of computer simulations, 3) derivation of mathematical relationships from core principles, and 4) analysis of real data sets…”
Jungck, John R., Holly Gaff, and Anton E. Weisstein. “Mathematical manipulative models: In defense of “Beanbag Biology”.” CBE-Life Sciences Education 9.3 (2010): 201-211.
The point is that when we fit a curve, or find a curve of best fit, we are really trying to see how well our model fits the real data. And that is why fitting this model takes this lab to an entirely new level. But how are you going to build this mathematical model?
Remember that we started with models that were more conceptual or manipulative.  And we introduced a symbolic model as well that captured the core principles of enzyme action:

By Thomas Shafee (Own work) [CC BY 4.0], via Wikimedia Commons
Now how do we derive a mathematical expression from this? I’m not suggesting that you necessarily should unless you feel comfortable doing so, but I’ll bet there are kids in your class who can, given a bit of guidance. You may not feel comfortable providing the guidance. But in this day of “just ask Google” you can provide that guidance in the form of a video discussion from the Khan Academy designed to help students prepare for the MCAT. Don’t let that scare you off. Here are two links that take the symbolic model and derive a mathematical expression, and not just any expression: the Michaelis-Menten equation for enzyme kinetics. You or your students will no doubt need to view these more than once, but the math is not that deep, not if your students are exploring calculus or advanced algebra. It is really more about making assumptions and how those assumptions simplify things so that with regular algebra you can generate the Michaelis-Menten equation.
You can also find a worked-out derivation in this text excerpt from Biochemistry, 5th ed. (Berg JM, Tymoczko JL, Stryer L. New York: W H Freeman; 2002).
Of course, you don’t even have to go through the derivation; you could just provide the equation.
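For reference, here is the Michaelis-Menten equation written with the symbols defined below (V0 is the initial rate, Vmax the maximum rate, Km the Michaelis constant and [S] the substrate concentration):

```latex
V_0 = \frac{V_{\max}\,[S]}{K_m + [S]}
```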

The important thing is that students understand where this equation comes from. It doesn’t come out of thin air, and it is based on the same core principles they uncovered or experienced if they did the toothpickase manipulation; it is just quantified now. So how do I use this equation to actually see how well my data “fits”? If it were a linear expression that would be easy in Excel or any spreadsheet package, but what about non-linear trend lines? I can tell you that this expression is not part of the trend line package you’ll find in spreadsheets.
I’ve got to admit, I spent too many years thinking that generating best-fit curves from non-linear expressions like the M-M equation was beyond the abilities of me or my students. But again “Ask Google” comes to the rescue. If you google “using solver for non-linear curve fitting regression” you’ll end up with lots of videos and even some specific to the Michaelis-Menten equation. It turns out EXCEL (and, I understand, Google Sheets) has an add-in called Solver that helps you find the best-fit line. But what does that mean? Well, it means that you adjust the parameters in the M-M equation until the curve it generates fits your data as closely as possible, to see if the model is an accurate description of what you measured. What parameters are these?
Look at the equation:
V0 equals the rate of the reaction at differing substrate concentrations–the vertical axis in the plots above.
Vmax equals the rate reached when all of the enzyme is complexed with the substrate: the maximum rate of the reaction with this particular enzyme at this particular enzyme concentration (that is, enzyme concentration, not substrate).

Km equals the concentration of the substrate where the rate of reaction is 1/2 of Vmax

[S] equals the substrate concentration, in this case the H2O2.
Two of these parameters are variables. One is our experimental or explanatory variable, the concentration of H2O2, and the other is our response variable, the rate of the reaction. Some folks prefer independent and dependent variable. These are what we graph on our axes.
The other two parameters are constants, and they help to define the curve. More importantly, these are constants for this particular enzyme, for this particular reaction, and (in the case of Vmax) at this particular enzyme concentration. These constants will be different for different enzymes, different concentrations, or reactions with inhibitors, competitors, etc. In other words, it is these constants that help us to define our enzyme properties and provide a quantitative way to compare enzymes and enzyme reactions. You can google up tables of these values on the web. (Table from: Biochemistry, 5th ed. Berg JM, Tymoczko JL, Stryer L.)
So calculating these constants is a big deal, and one that is not typically a goal in introductory biology, but if you’ve come this far, why not?
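Substituting [S] = Km into the equation shows why Km marks the substrate concentration at which the rate is half of Vmax:

```latex
V_0 = \frac{V_{\max}\,[S]}{K_m + [S]}
\quad\Rightarrow\quad
V_0\big|_{[S]=K_m} = \frac{V_{\max}\,K_m}{K_m + K_m} = \frac{V_{\max}\,K_m}{2K_m} = \frac{V_{\max}}{2}
```

And when [S] is much larger than Km, the fraction [S]/(Km + [S]) approaches 1, so V0 approaches Vmax.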
This is where generating that line that best-fits the data based on the Michaelis-Menten equation comes in.
You can do this manually with some help from Solver in Excel. (Google Sheets is also supposed to have a solver available, but I haven’t tried it.)
I have put together a short video on how to do this in Excel based on the data I generated for this lab.
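Under the hood, Solver is doing a numerical search: it changes the Km and Vmax cells to make the sum of squared residuals as small as possible. As a rough sketch of the same least-squares idea outside a spreadsheet, the Python below fits the Michaelis-Menten equation. It is not Solver’s own algorithm. It assumes that, for any fixed Km, the best Vmax can be solved exactly (the model is linear in Vmax), and then it simply scans Km over a grid. The concentrations and rates are hypothetical, generated from Km = 1.0 and Vmax = 0.30 so you can check that the fit recovers them.

```python
def mm_rate(s, km, vmax):
    """Michaelis-Menten initial rate: V0 = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def best_vmax(substrate, rates, km):
    """For a fixed Km the model is linear in Vmax, so the least-squares Vmax has a closed form."""
    f = [s / (km + s) for s in substrate]
    return sum(fi * v for fi, v in zip(f, rates)) / sum(fi * fi for fi in f)


def sum_sq_residuals(substrate, rates, km, vmax):
    """Sum of squared differences between measured and predicted rates."""
    return sum((v - mm_rate(s, km, vmax)) ** 2 for s, v in zip(substrate, rates))


def fit_michaelis_menten(substrate, rates, km_low=0.01, km_high=10.0, steps=2000):
    """Grid search over Km; returns the (Km, Vmax, SSR) with the smallest SSR."""
    best = None
    for i in range(steps + 1):
        km = km_low + (km_high - km_low) * i / steps
        vmax = best_vmax(substrate, rates, km)
        ssr = sum_sq_residuals(substrate, rates, km, vmax)
        if best is None or ssr < best[2]:
            best = (km, vmax, ssr)
    return best


# Hypothetical data: H2O2 concentration (%) and rate (floats per second),
# generated from Km = 1.0 and Vmax = 0.30 with no noise added.
substrate = [0.1875, 0.375, 0.75, 1.5, 3.0]
rates = [mm_rate(s, 1.0, 0.30) for s in substrate]

km, vmax, ssr = fit_michaelis_menten(substrate, rates)
print(f"Km = {km:.3f}, Vmax = {vmax:.3f}, SSR = {ssr:.2e}")
```

With real, noisy data the residual sum will not be near zero, and if the best Km lands at the edge of the grid, the search range should be widened before trusting the result.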

I’ve also taken advantage of a web-based math application, DESMOS, which is kind of a graphing calculator on the web. While I can create sliders in a spreadsheet to manipulate the constants in the equation, Km and Vmax, and make a dynamic model, it is a lot easier in DESMOS, and DESMOS lets me share or embed the interactive equation. Scroll down in the left-hand column to get to the sliders that change the constants.

You can also just go to Desmos and play with it there.

I had to use A, B and x1 as symbols in my equation.

It is not that difficult to use DESMOS, and with my example, your students who are familiar with it will be able to make their own model with their own data within DESMOS. Move the sliders around; they represent the values for Km and Vmax in the equation. Notice how they change the shape of the graph. This really brings home the point of how these constants can be used to quantitatively describe the properties of an enzyme, and it helps to make sense of the tables one finds about enzyme activity. Also, notice the residuals that are plotted in green along the “x-axis”. These residuals are how we fit the curve. Each green dot is the result of taking the difference between a point on the theoretical line (with particular constants and variable values) and the actual data point. That difference is squared. A fit that puts the green dots close to zero is a very good fit. (BTW, this is the same thing we do in EXCEL with the Solver tool.) Watch as you try to minimize the total of the squared residuals as you move the sliders. The other thing that you get with DESMOS is that if you zoom out, you’ll find that this expression is actually a hyperbola (a rectangular hyperbola, with a vertical asymptote at [S] = -Km) and not an exponential. How is that important?
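The arithmetic behind the green dots fits in a few lines of Python. Each residual is a measured rate minus the rate the equation predicts at that [S]; squaring and summing gives the single number that both the sliders and Solver try to make small. The data and the two trial (Km, Vmax) “slider settings” below are hypothetical.

```python
def mm_rate(s, km, vmax):
    """Michaelis-Menten prediction: V0 = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def squared_residuals(substrate, rates, km, vmax):
    """One squared residual per data point (like the green dots)."""
    return [(v - mm_rate(s, km, vmax)) ** 2 for s, v in zip(substrate, rates)]


# Hypothetical measurements: [S] in % H2O2, rate in floats per second.
substrate = [0.375, 0.75, 1.5, 3.0]
rates = [0.082, 0.128, 0.180, 0.225]

# Two hypothetical slider settings (Km, Vmax); the smaller total is the better fit.
for km, vmax in [(0.5, 0.20), (1.0, 0.30)]:
    total = sum(squared_residuals(substrate, rates, km, vmax))
    print(f"Km = {km}, Vmax = {vmax}: sum of squared residuals = {total:.5f}")
```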

Well, think back to the beginning of this post, when I talked about how my students often just choose their mathematical model based on what line seems to fit the data best, not on an equation developed from first principles like the Michaelis-Menten equation.

Looking at a plot of the data in this experiment before the curve fitting, one might have proposed that an exponential equation would produce the best fit. In fact, I tried that out just for kicks.
This is what I got.

Here’s a close-up:

Thinking about the actual experiment and the properties of enzymes, there are two things really wrong with this fit, although you’ll notice that the “line” seems to go through the data points better than the fit to the Michaelis-Menten equation. 1. Notice that the model line doesn’t go through zero. Hmmmm. Shouldn’t a solution with no hydrogen peroxide show no reaction with the yeast? That should be tested by the students as a control as part of the experimental design, but I can tell you that the disk will not rise in plain water, so the plot line needs to go through the origin. I can force that, which I have done in this fit:

But the second issue with this fit is still there. It concerns the point where the plot reaches its maximum rate. If I had generated data at a 3% substrate concentration, I can promise you the rate would have been higher than 0.21, where this plot levels off. While the exponential model looks like a good fit on first inspection, it doesn’t hold up to closer inspection. Most importantly, the fit is mostly coincidental and not based on an equation developed from first principles. By fitting the data to the mathematical model, your students complete the modeling cycle described on page T34 in the AP Biology Investigative Labs Manual, in the Beanbag Biology paper cited above, and on page 85 in the AP Biology Quantitative Skills Guide.
Give model fitting a try, perhaps a little bit at a time and not all at once. Consider trying it out for yourself with data your students have generated, or consider it as a way of differentiating your instruction. I’ll wrap this up with a model fitted with data from Bob Kuhn’s class that they generated just this month. He posted the data on the AP Biology forum and I created the fit.

The key thing here is that his enzyme concentration (yeast concentration) was considerably more dilute than in the data that I’ve been sharing. Note how that has changed the Michaelis-Menten curve, and note how knowing the Km and Vmax provides a quantitative way to actually compare these results. (Both constants for this graph are different from mine.)
Hopefully, this sparks some questions for you and your students and opens up new paths for exploring enzymes in the classroom.  I’ll wrap this up next week with how one might assess student learning with one more modeling example.

Teaching Quantitative Skills: Data Analysis

Managing labs has got to be one of the most difficult things we do as biology teachers. There is so much to keep in mind: safety, time, cost, level appropriateness, course sequence, preparation time, and did I mention time? It’s no wonder that we are tempted to make sure that the lab “works” and that the students will get good data. When I first went off the deep end and started treating my classes like a research lab, meaning almost every lab had an element of individual inquiry, I’ve got to say I was pretty content if I could get students to the point that they asked their own question, designed an effective experimental procedure and collected some reasonable data. It took a lot of effort to get just that far, and to be honest, I didn’t put as much emphasis on good data analysis and scientific argumentation as I should have. At least that is the 20/20 hindsight version that I see now. Of course, that’s what this series is all about: how to incorporate and develop data analysis skills in our classes.

Remember, this lab has a number of features that make it unique: it is safe enough to do as homework (which saves time), it is low cost, and it has more possible content and quantitative skills to explore than anyone has time for. For me, it’s like saddling up to an all-you-can-eat dessert bar. No doubt I’m going to “overeat”, but since this lab happens early and it is so unique, I think I can get away with asking the students to work out of their comfort zone: 1. because the skills will be used again for other labs, and 2. because I need them to get comfortable with learning from mistakes, along with the requisite revisions that come from addressing those mistakes.
Depending on how much time we had earlier to go over ideas for handling the data, the data the students bring back from their “homework” is all over the map. Their graphs of their data are predictably all but useless for conveying a message effectively. But their data and their data presentations provide us a starting point, a beginning, where as a class we can discuss, dissect, decide, and work out strategies on how to deal with data, how to find meaning in the data, and how to communicate that meaning to others.
In the past, the students would record their results and graph their work in their laboratory notebooks.  Later, I’d let them do their work in Excel or some other spreadsheet.  The data tables and graphs were all over the map.  Usually about the best the students would come up with looked something like this.
The data (although not usually this precise, and usually not with the actual H2O2 concentrations):

Sometimes they would have a row of “average time” or mean time, but I don’t think any student has ever had rows of standard deviation, and for sure no one ever calculated standard error. Getting them to that point is one of my goals here. As teachers we work so much with aggregated data (in the form of grades and average grades) that we often don’t consider that for many people aggregation doesn’t make any sense. It turns out to be an important way of thinking that is missing more often than we realize. In fact, in the book Seven Pillars of Statistical Wisdom, Stephen M. Stigler devotes an entire chapter to aggregation and its importance in the history of mathematical statistics. For most of my career, I was only vaguely familiar with this issue. Now I’d be very careful to bring this out in discussion with a number of questions. What does the mean provide for us that the individual data points do not? Why does the data “move around” so much?
It doesn’t take much to make sure they calculate the mean for their data.
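In a spreadsheet the mean is one formula per row; the same aggregation in Python is equally short. The rise times below are hypothetical placeholders, not the values from my table.

```python
from statistics import mean

# Hypothetical rise times (seconds) for five disks at each H2O2 concentration (%).
rise_times = {
    3.0: [4.1, 4.3, 4.0, 4.4, 4.2],
    1.5: [5.6, 5.9, 5.7, 5.5, 5.8],
    0.75: [8.2, 8.0, 8.5, 8.3, 8.1],
}

# Aggregate each row of individual disks into a single mean time to rise.
for concentration, times in rise_times.items():
    print(f"{concentration}% H2O2: mean time to rise = {mean(times):.2f} s")
```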
This brings up another point. Not only do some folks fail to see the advantage of aggregating data, some feel that the variation we see can be eliminated with more precise methods and measurement, as if there were some true point that we are trying to determine. The fact is that the parameter we are trying to estimate or measure is the mean of the population distribution. In other words, there is a distribution that we are trying to determine, and we will always be measuring that distribution of possibilities. This idea was one of the big outcomes of the development of statistics in the early 1900s and can be credited to Karl Pearson. Today, scientific measurement assumes these distributions, even when measuring some physical constant like the acceleration of gravity. That wasn’t the case in the 1800s, and many folks today still think that we are measuring some precise point when we collect our data. Again, I wasn’t smart enough to know this back when I started teaching this lab, and honestly it is an idea that I assumed my students automatically assimilated, but I was wrong. Today, I’d take time to discuss this.
Which brings up yet another point about the “raw” data displayed in the table. Take a look at disk 3, substrate concentration 0.75%. Note that it is way off compared to the others. Now this is a point to discuss. The statement that it is “way off” implies a quantitative relationship. How do I decide that? What do I do about that point? Do I keep it? Do I ignore it? Throw it away? Turns out that I missed the stop button on the stopwatch a couple of times when I was recording the data. (Having a lab partner probably would have led to more precise times.) I think I can justify removing this piece of data, but ethically, I would have to report that I did and provide the rationale, perhaps in an appendix. Interestingly, a similar discussion with a particularly high-strung colleague caused him so much aggravation that the discussion almost got physical. He was passionate that you never, ever, ever discard data, and he didn’t appreciate the nuances of reporting improperly collected data. It might be a conversation you’ll want to have in your class.
The best student graphs from this data would look like this. I didn’t often get means, but I liked it when I did. But note that the horizontal axis is log-scaled. Students would often bring this type of graph to me. Of course, 99% of them didn’t know they had logged the horizontal axis; they were only plotting the concentrations of H2O2 equally spaced. I would get them to think about the proper spacing by asking them if the difference between 50% and 25% was the same as the difference between 6.25% and 3.125%. That usually took care of things. (Of course, there were times later in the semester when we explored log plots, but not for this lab.)
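One way to make “way off” quantitative is Tukey’s rule, a common convention (not the only one, and not something I used at the time): flag any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. A flagged value is a reason to check your notes, not automatic permission to delete it, and with only five disks the quartiles are rough. The times below are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical rise times (s) at one concentration; one disk looks suspicious.
times = [8.2, 8.0, 8.5, 14.9, 8.1]
print(tukey_outliers(times))
```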
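The spacing question can be checked numerically. Each step in a halving dilution series multiplies the concentration by 0.5, so the gaps on a linear axis shrink while the gaps on a log2 axis stay the same. A short sketch using the percentages from that question:

```python
import math

# Serial halving dilutions (%), as in the spacing question above.
concentrations = [50, 25, 12.5, 6.25, 3.125]

# Compare the gap between neighbors on a linear axis and on a log2 axis.
for high, low in zip(concentrations, concentrations[1:]):
    linear_gap = high - low
    log_gap = math.log2(high) - math.log2(low)
    print(f"{high} -> {low}: linear gap = {linear_gap}, log2 gap = {log_gap:.3f}")
```

Equal spacing of these points on the horizontal axis is therefore a log scale, whether or not the student intended it.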

Note also that this hypothetical student added a “best fit” line. Nice fit, but does it fit the trend in the actual data? Is there actually a curve? This is where referring back to the models covered earlier can really pay off. What kind of curve would you expect? When we drop a disk in the H2O2 and time how long it takes to rise, are we measuring how long the reaction takes place, or are we measuring a small part of the overall reaction? At this point it would be good to consider what is going on. The reaction continues long after the disk has risen, as evidenced by all the bubbles that have accumulated in this image. So what is the time of disk rise measuring? Let’s return to that in a bit, but for now let’s look at some more student work.

Often, I’d get something like this, with the horizontal axis (the explanatory or independent variable) scaled in reverse order. This happened a lot more when I started letting them use spreadsheets on the first go-around.

Spreadsheet use without good guidance is usually a disaster.  After I started letting them use spreadsheets I ended up with stuff that looked like this:

or this

It was simply too easy to graph everything, just in case it was all important. I’ve got to say this really caught me off guard the first time I saw it. I actually thought the students were just being lazy, not calculating the means, not plotting means, etc. But I think I was mostly wrong about that. I now realize many of them actually thought this was better because everything is recorded. I have this same problem today with my college students. To address it, I ask questions that try to get at what “message” we are trying to convey with our graph. What is the simplest graphic that can convey the message? What can enhance that message? What is my target audience?
The best spreadsheet plots would usually look something like this, where they at least plotted means and sort of labeled the axes. But they were almost always bar graphs. Note that bar graphs plot “categories” on the horizontal axis, so they are equally spaced. This is the point that I usually bring out to start a discussion about the appropriateness of different graph methods. Eventually, with questions, we move to the idea of the scatter plot and bivariate plots. BTW, this should be much easier over the next few years since working with bivariate data is a big emphasis in the Common Core math standards.

But my goal in the past was to get the students to consider not just the means but also to somehow convey the variation in their data, without plotting every point as a bar. To capture that variability, I would suggest they use a box plot, something we covered earlier in the semester with a drops-on-a-penny lab. I hoped to get something like this, and I usually would, but it would be drawn by hand.

The nice thing about the box plot was that it captured the range and variability in the data and provided them with an opportunity to display that variation. With a plot like this they could then argue, with authority, that each of the dilutions takes a different amount of time to rise. With a plot like this you can plainly see that there is little or no overlap of data between the treatments, and you can also see a trend, something very important to the story we hope to tell with the graph. My students really liked box plots for some reason. I’m not really sure why, but I’d get box plots for data they weren’t appropriate for.
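A box plot is drawn from five numbers: minimum, first quartile, median, third quartile and maximum. The sketch below computes them for two hypothetical treatments. Spreadsheets and statistics packages use slightly different quartile conventions, so a hand-drawn box and a software box may differ a little.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a box plot is drawn from."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


# Hypothetical rise times (s) at two H2O2 concentrations (%).
times_by_concentration = {
    3.0: [4.1, 4.3, 4.0, 4.4, 4.2],
    1.5: [5.6, 5.9, 5.7, 5.5, 5.8],
}

for conc, times in times_by_concentration.items():
    print(conc, five_number_summary(times))
```

Comparing the maximum of one treatment with the minimum of the next is a quick check of the “little or no overlap” claim.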
Today, I’m not sure how much I’d promote box plots but instead probably use another technique I used to promote—ironically, based on what I discussed above—plot every point and the mean.  But do so in a way that provides a clear message of the mean and the variation along with the trend.  Here’s what that might look like.

It is a properly scaled scatterplot (bivariate plot) that demonstrates how the response variable (time to rise) varies according to the explanatory variable (H2O2 concentration). Plotting it is not as easy as the bar graph examples above, but it might be worth it. There are a number of ways to do this, but one of the most straightforward is to change the data table itself to make it easier to plot your bivariate data. I’ve done that here. One column is the explanatory/independent variable, H2O2 concentration. The other two columns record the response or dependent variable, the time for a disk to rise: one is the mean time to rise and the other is the time for the individual disk to rise. BTW, this way of organizing your data table is one of the modifications you often need to make in order to enter your data into some statistical software packages.
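Rearranging a “one column per disk” table into this one-row-per-disk layout is mechanical. A sketch in plain Python, with hypothetical times; the column names concentration, mean_time and disk_time are made up for illustration.

```python
from statistics import mean

# Hypothetical wide table: one row per H2O2 concentration (%), one rise time (s) per disk.
wide = {
    3.0: [4.1, 4.3, 4.0],
    1.5: [5.6, 5.9, 5.7],
}

# Long table: one row per disk, repeating the concentration and that row's mean.
long_rows = []
for concentration, times in wide.items():
    row_mean = mean(times)
    for t in times:
        long_rows.append({"concentration": concentration, "mean_time": row_mean, "disk_time": t})

for row in long_rows:
    print(row)
```

Each row can then be plotted as a point (concentration, disk_time), with (concentration, mean_time) as a second series for the means.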

With the data table like this you can highlight the data table and select scatter plot under your chart options.

At this point, I’d often throw a major curve ball at my students with a question like, “What’s up with time being the dependent variable?” Of course, much of their previous instruction on graphing, in an attempt to be too helpful, suggested that time always goes on the x-axis. Obviously, not so in this case, but it does lead us to some other considerations in a bit.
For most years this is where we would stop with the data analysis.  We’ve got means, we’ve represented the variability in the data, we have a trend, we have quantitative information to support our scientific arguments.
But now, I want more. I think we should always be raising the bar in our classes. To that end, I’d be sure that the students included the descriptive statistic of the standard deviation of the sample, along with the standard error of the mean, and used the standard error to estimate a 95% confidence interval. That would also entail a bit of discussion on how to interpret confidence intervals. If I had already introduced SEM and used it earlier to help establish sample sizes, then having the students calculate it here and apply it on their graphs would be a foregone conclusion.
But my real goal today would be to get to the point where we could compare our data and our understanding of how enzymes work with the work done in the field: enzyme kinetics. Let’s get back to that problem of what is going on with the rising disk. What is it that we are really measuring if the reaction between the catalase and the substrate continues until the substrate is consumed? It should be obvious that for the higher levels of concentration we are not measuring how long the reaction takes place; we are measuring how fast the disk accumulates the oxygen product. Thinking about the model, it is not too difficult to generate questions that lead students to the idea of rate: something per something. It is really the rate of the reaction we are interested in, and it varies over time. What we are indirectly measuring with the disk rise is the initial rate of the enzyme/substrate reaction. We can arrive at a rate by taking the inverse, or reciprocal, of the time to rise. That would give us units of floats per second. If we knew how much oxygen it takes to float a disk, we could convert our data into oxygen produced per second.
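These statistics take only a few lines. The sketch below uses the common approximation mean ± 2 × SEM for a 95% confidence interval; with only a handful of disks, a t-multiplier (about 2.78 for five disks) would be somewhat more accurate. The times are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical rise times (s) for five disks at one H2O2 concentration.
times = [5.6, 5.9, 5.7, 5.5, 5.8]

m = mean(times)
sd = stdev(times)                     # sample standard deviation (n - 1 in the denominator)
sem = sd / math.sqrt(len(times))      # standard error of the mean
low, high = m - 2 * sem, m + 2 * sem  # approximate 95% confidence interval

print(f"mean = {m:.2f} s, SD = {sd:.3f} s, SEM = {sem:.3f} s")
print(f"approx. 95% CI: {low:.2f} to {high:.2f} s")
```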
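The conversion is just a reciprocal applied to each disk’s time. The times below are hypothetical. One design choice worth making explicitly: the mean of the individual rates is not the same as the reciprocal of the mean time, so it helps to decide (and report) which one you plot.

```python
from statistics import mean

# Hypothetical rise times (s) for three disks at one H2O2 concentration.
times = [4.0, 5.0, 8.0]

rates = [1 / t for t in times]  # floats per second for each disk
print("rates:", rates)
print("mean of rates:", mean(rates))
print("reciprocal of mean time:", 1 / mean(times))
```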
So converting the data table would create this new table.

Graphing the means and the data points creates this graph.

Graphing the means with approximately 95% error bars creates this graph.

Woooooooweeeeee, that is so cool. And it looks just like a Michaelis-Menten plot.

By Thomas Shafee (Own work) [CC BY 4.0], via Wikimedia Commons
Creating this plot, as long as the students can follow the logic of how we got here, opens up an entirely new area for investigation about enzymes and how they work. Note that we now have some new parameters, Vmax and Km, that help to define this curve. Hmmmm. What is this curve, and do my points fit it? How well do the data points fit this curve? Can this curve, and these parameters, help us to compare enzymes? Here we return to the idea of a model, in this case a mathematical model, which I’ll cover in the next installment.

TOP 5 LEAST Favorite Student Questions

It happens to everyone… you’re having a perfectly normal day until someone asks you a question that sabotages your day in an instant. With thanks to the KABT Facebook Group, here are our top 5 questions you wish you would never hear again from your students.

#5: “Do we have to know this?” / “When will I ever use this?”


The sarcasm in a teacher’s response to this question has been shown to have a strong statistical correlation to the amount of caffeine consumed that morning.

#4: “How can I earn extra credit?”


Particularly relevant now, with final exams right around the corner, this question comes, invariably, from a student who has not turned in any work since Halloween. However, there is some inherent irony when that student asks to do additional work.

#3: “Can’t we just take notes?”


It is like something from a horror novel… The authentic experience you meticulously planned for hours, and likely spent a significant amount of your own money to produce, has been in front of the students for nearly 6 minutes when you see a hand shoot up in the back…

#2: “I missed [x] days, what did we do?” / “I’m going to be gone, will I miss anything important?”


This was suggested independently by four different KABT members, so we imagine the struggle is pretty universal. The frustration caused by this question increases in an exponential relationship with the amount of time you spend updating your class website or learning management system. And like “Can I go to the bathroom?”, no one ever asks this question at the appropriate time… (which is never)

#1: “How much is this worth?” / “Is this for a grade?”


Attitudes and questions like this have given rise to grading schemes which place an undue emphasis on the quantity of work turned in, and have taken away from the noble practice of learning. Thanks for ruining it for everybody, Worst Question Ever.

We hope you have enjoyed “Top 5”, a monthly post on the BioBlog. If you have suggestions for a future “Top 5”, email or tweet @ksbioteachers.

TBT: Cell Museum Project

Editor’s Note: This post originally appeared in December 2015 as part of the “In My Classroom” series. The author, Andrew Taylor, is an incredible Biology teacher currently at Olathe Northwest HS. This TBT selection was motivated by a recent Tweet/Periscope storm from Jessica Popescu (mentioned in this post) regarding #CellMuseum2016. Take it away, Andrew!

Students in my classroom recently completed a project-based learning unit centered around the driving question ‘How can we, as museum exhibit designers, build a museum exhibit about a somatic cell type that will engage younger audiences?’ The question came about as a collaboration between me; Jessica Popescu, who teaches one door down from me; and the staff at the Columbus Museum in Georgia, most specifically Rebecca Bush, the curator of history. The project consisted of students working in teams of three to four. The teams first divided themselves up into specific roles and selected a somatic cell type to research and display. The potential roles each had real-world parallels in the museum industry. Role options consisted of a marketing director, a manipulative designer, an application letter writer, and a presentation specialist. The Columbus Museum emphasized how these roles relate to their real-world job responsibilities in a video displayed to the students early in the project. The video also included several example exhibits within the museum and information as to what type of items the staff looks for in a museum exhibit.

The reason we decided to have students design their exhibits around a specific cell type, as opposed to just an ‘animal’ or ‘plant’ cell, was to help students understand the interactions, and the importance of those interactions, between different cells in a multicellular organism. Students had baseline knowledge of cell organelles, the cell membrane, and cellular transport when the project began.

Exhibit 1

The project as a whole was very successful. Students created a variety of excellent products, a few of which are pictured in this post. Additionally, students took pride in displaying their exhibits to students from a variety of different classrooms. Also present for presentations and ‘museum walks’ were teachers from throughout the school and various members of the administration. The students will also receive feedback on their final products from the staff at the Columbus Museum. One of the most significant signs of success to me was the way in which students generated questions throughout their research. A moment that sticks out to me occurred when a student, attempting to do the bare minimum and simply draw a picture of a red blood cell (his cell type), asked the question, ‘Why don’t red blood cells have many organelles?’ This question led him down a path of discovery that led to another question, ‘If red blood cells don’t have a nucleus, then they probably don’t have DNA, which is needed to make protein, so how in the world do they have a large supply of the protein hemoglobin?’ Another example of a questioning attitude is drawn from Mrs. Popescu’s classroom. A group of students researching cone cells asked another group why they decided to color their cone cell model yellow when humans only have cone cells for red, blue, and green.


When we run the project next year, I will strive to give students more opportunities to receive feedback before the final displays are done. Also, while the student-generated questions were awesome, my goal is to structure the project so that more students begin asking these kinds of thought-provoking questions.

Happy holidays everyone, and now I’ll throw it back over to Brittany Roper for the first post of the new year.

Establishing an Experimental Procedure to Guide the Home Investigation

Moving from the bulk reaction of yeast (catalase) and H2O2 to a procedure that can produce reasonable, reliable and precise data, without just telling students “This is the technique that we will use,” can be tricky. But if that procedure is going to generate quantitative data that can support a claim, the discussion is full of quantitative considerations.

At the end of the day, my overall goal is that every student will have an understanding of, and experience with, a defined protocol recorded in their individual lab notebook that can serve as their reference when they go home and collect their data. I could be really helpful and just give them a well-structured set of laboratory instructions, which would ensure that most of the students who follow directions closely get the expected results. That would ensure my lab worked. Of course, I’d then have to hope that they would somehow, subconsciously, pick up on the kind of thought that had to go into the organization of the tables, the presentation of the graphs, the preparation of the materials, etc. My students never seemed to pick up that kind of thing just by following instructions, though. That insight seems to come from wrestling with the challenges. And since their thinking skills are more of a priority to me, I quit providing lab instructions very early in my career. It is a lot messier, and you’ll be amazed at how many ways a student can go down the wrong path, but I found that trusting the students to figure things out works: they get better and better at it, which makes the class more fun for me and for them. For me, this lab fell pretty early in the year, and for that reason it was a bit messier than it might have been had we worked on it later in the year. It is important to note that I don’t just “turn the students loose” to design whatever they can conjure up. That is a recipe for disaster in so many ways, but most importantly it typically leads to all sorts of negative student experiences. The goal is to keep the challenges in front of the students finely tuned to their developing skills: to keep the students, as best we can, in the “zone,” or what Mihály Csíkszentmihályi better defines as FLOW.

Some of the learning goal targets I keep in mind to guide my questions during the discussion include: 1. introducing the floating disk technique while making sure the students understand how it works; 2. exploring variables systematically (serial dilutions); 3. asking what this replicability thing is; 4. emphasizing the importance of exploratory work to help establish data that can inform design; 5. asking how big a sample we need and what factors into determining sample size; and 6. identifying and contrasting systematic and random error.

With these thoughts guiding my questions we launch into a discussion about the mess I created earlier.

With practice over the years, it becomes easy to have a barrage of questions ready to go. Typically, I reframe or choose my next question based on student responses. In that way, we are all following along on the same reasoning path, or at least as much as 20+ individual agents can follow the same path.

What did we mix to create the mess? What did we get out? How is this related to the models we explored? How could we quantify what is going on? What are we going to try and figure out? What can we control? What do we need to know? What should we measure? How should we systematically measure it? How can we be sure to all generate data/information that can inform our exploration? How can I capture the products produced? How do I measure the products over time? What could/should I use for controls? What should we quantify if we want to make a claim? This last question can be particularly productive if our goal is to collaboratively develop an experimental protocol. I never know exactly where we will go, but with the guiding questions in my mind and with practice on my part it doesn’t usually take too long before we get to a starting/exploratory protocol that we can test in class.

At some key point in the discussion (you’ll know when) I demonstrate the floating disk technique itself, along with some qualifying statements/comments like: “Let’s reduce the amount of yeast/catalase but try and keep it constant. One way might be to collect a sample on a piece of filter paper like this.” You can guess the next line: “Now let’s see if this will generate some bubbles that we can count or observe.” When we drop the disk in the H2O2 and it floats, it sparks questions in their minds. Of course, this prompts me to ask more questions, now more specific to developing the protocol: What do you think would happen if we dropped the yeast disk into plain water? (control) What would happen if we dropped a paper disk without yeast into H2O2? (control) If I dropped another disk into the H2O2, would it take the same amount of time to rise? If not, how could I capture the variation? Why is the disk rising? How many disks can I drop in the H2O2 before it affects the time to rise? (This is why I used the well plate and a single disk.) At this point I may take time to have them time a number of disks dropped into the same substrate dilution to get some preliminary data to work with.
If I keep the yeast concentration constant, how can I systematically vary the H2O2 solution? This was my main objective in the past, because I used the lab to introduce serial dilutions and how to make them, skills that came in handy later when we did our microbiology labs. At this point we could work through a serial dilution without a formal measuring device. Since my goal was to do most of the lab work at home, we adapted by doing our dilutions with a “measure,” which was a Sharpie mark a little less than halfway up one of the plastic cups. One measure of water plus one measure of 3% H2O2 makes a 1.5% solution of H2O2, a 50% dilution. That solution could then serve to produce the 25% dilution, and so on. If this isn’t clear, let me know and I can put up a short video of the process.
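The arithmetic behind the “measure” dilutions can be sketched in a few lines of Python. This is a minimal illustration, assuming every step mixes one measure of the previous solution with one measure of water (a twofold dilution) and starts from 3% H2O2; the function name `serial_dilution` is hypothetical.

```python
def serial_dilution(stock_percent: float, steps: int, fold: float = 2.0) -> list[float]:
    """Return the stock concentration followed by each successive dilution.

    Assumes every step dilutes the previous solution by the same factor
    (fold=2.0 means 1 measure of solution + 1 measure of water).
    """
    if fold <= 1:
        raise ValueError("fold must be greater than 1 for a dilution")
    concentrations = [stock_percent]
    for _ in range(steps):
        # Each new solution is made from the one before it.
        concentrations.append(concentrations[-1] / fold)
    return concentrations


if __name__ == "__main__":
    stock = 3.0  # % H2O2, the starting strength used in the text
    for step, conc in enumerate(serial_dilution(stock, steps=4)):
        # Show both the concentration and how it compares with the stock.
        print(f"dilution {step}: {conc:.4f}% H2O2 = {conc / stock:.2%} of stock")
```

Because each solution is half the strength of the one before it, the concentrations fall as 3%, 1.5%, 0.75%, 0.375%, and so on.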
And a question that I would ask today but didn’t in the past: Is the time to rise the same as the rate of rise? How can I convert time to a rate? Today, I’d consider this one of my primary objectives for this lab. As I said earlier, my primary goals in the past were to get the students comfortable with serial dilutions, experimental design and data presentation. But from the standpoint of integrating content and lab, I think I’d focus more on the properties of enzymes now. Explicitly exploring rate of reaction is a key quantitative question to work on, because it challenges a common quantitative misconception (confusing rates and quantities) and it also lets us address the data in a form similar to standard laboratory work on enzyme kinetics.
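One common way to turn a time into a rate is to take its reciprocal: if each disk rising counts as one “event,” then rate = 1 / (time to rise), in events per second. The sketch below assumes that convention (other definitions are possible), and the times in it are made up for illustration, not measured data.

```python
def rates_from_times(times_s: list[float]) -> list[float]:
    """Convert times to rise (seconds) into rates (1/s), assuming rate = 1 / time."""
    if any(t <= 0 for t in times_s):
        raise ValueError("every time to rise must be a positive number of seconds")
    return [1.0 / t for t in times_s]


if __name__ == "__main__":
    hypothetical_times = [5.0, 8.0, 10.0]  # illustrative values only, not real data
    for t, r in zip(hypothetical_times, rates_from_times(hypothetical_times)):
        print(f"time to rise {t:5.1f} s -> rate {r:.3f} per second")
```

The relationship is inverse: doubling the time to rise from 5 s to 10 s halves the rate from 0.2 to 0.1 per second, which is exactly the kind of time-versus-rate distinction students tend to blur.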
Other questions come from students as we work on a protocol: How should I drop the disk? How do I keep the yeast constant? Do I have to stir? When do I start and stop timing the float? How deep should the solution be?
And: How many disks should I drop to be confident that I have measured the rate of rise? In the past, I had my students collect data on 10 yeast disks per substrate concentration because I used this lab to introduce box plots. The choice was somewhat arbitrary, but you need a sample of 10 or more if the box plot is going to provide relevant information. For example, a sample of 4 split into quartiles isn’t going to tell me much. In today’s AP Bio world I might use this lab as an opportunity to explore another way to estimate an appropriate sample size: using the standard error. Here’s how that works.
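To see why very small samples make weak box plots, here is a short sketch of the five-number summary a box plot is drawn from. It uses Python's standard `statistics.quantiles` with the `"inclusive"` method, which is only one of several quartile conventions, so other tools may report slightly different quartiles. The helper name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (min, Q1, median, Q3, max), the values a box plot is built from."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # "inclusive" treats the data as the whole population when interpolating.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


if __name__ == "__main__":
    print(five_number_summary([1, 2, 3, 4]))
```

With only four values, every quartile falls between two neighboring measurements (for 1, 2, 3, 4 the summary is 1, 1.75, 2.5, 3.25, 4), so the box adds little beyond listing the raw numbers.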
Pre-Determining Sample Size:
I’m pretty upset with myself that I didn’t teach this in the first half of my career. There are many reasons, but the most important is that I think students need to make the link that helps them realize quantitative methods provide strong support for their claims. One question I never got around to helping my high schoolers figure out was how to justify their sample size. I kind of let it slide with general statements like “Well, three is not enough” or “Let’s do at least 10.” Here’s how the discussion would go today.
First, during the exploratory work we’d collect some data from an “unknown” substrate solution and an unknown yeast solution.  Here’s the data.

Looks pretty consistent, but there is almost a 2-second difference in the time to rise between the slowest and the fastest disk. Let’s see what happens if we dilute the substrate by 50% but keep the yeast concentration on the disks the same.

Now, that is interesting. The disks in the diluted substrate definitely seem to take longer to rise. Just eyeballing it, the difference looks like about 6 seconds, more than 50% longer. Still, there seems to be about 2 seconds of variability in the diluted-substrate results as well. How can we capture all this in a couple of numbers?
Descriptive stats to the rescue.
The means can help us by using a single number to represent all of the data collected under one condition and the standard deviation (of the sample) can help us describe the amount of variation in the sample.
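Here is a minimal sketch of those two descriptive statistics in Python. `statistics.stdev` computes the sample standard deviation (dividing by n − 1), the same quantity as Excel's STDEV.S. The two lists are invented numbers chosen only to show the calculation; they are not the data from this post, and the helper name `describe` is hypothetical.

```python
import statistics


def describe(times_s: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for a list of times to rise."""
    return statistics.mean(times_s), statistics.stdev(times_s)


if __name__ == "__main__":
    # Hypothetical times to rise (s) for 8 disks in each substrate condition.
    undiluted = [10, 11, 12, 11, 10, 12, 11, 11]
    diluted_50 = [16, 17, 18, 17, 16, 18, 17, 17]

    mean_u, sd_u = describe(undiluted)
    mean_d, sd_d = describe(diluted_50)
    print(f"undiluted: mean {mean_u:.2f} s, sd {sd_u:.2f} s")
    print(f"50% dilution: mean {mean_d:.2f} s, sd {sd_d:.2f} s")
    # How many (undiluted-sample) standard deviations separate the two means?
    print(f"difference in means = {(mean_d - mean_u) / sd_u:.1f} standard deviations")
```

The last line expresses the gap between the means in standard-deviation units, which is the comparison made in the next paragraph.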

For many, this would be enough to consider. The difference between the means of these two samples of 8 is more than a standard deviation; in fact, it is more than 3 standard deviations. They are really quite different results. A sample size of 8 seems to be an easy sample to collect, but what if we wanted to collect smaller samples because our fingers cramp up working the stopwatch so many times? Could we use a smaller sample size and still collect data that will support our claim that these are different? Let’s see how we might determine that.
First, let’s agree on the level of precision we think we will need. To do that, let’s look at the difference in the means: it is almost 6 seconds. Each time I do this experiment under the same conditions I will likely get slightly different means. How confident am I that my sample mean is close to the actual population mean? A mean is a point estimate, but I want to put an interval estimate around that point. Let’s say that if I can establish an interval of the mean plus or minus 0.5 seconds, then I’ll feel pretty confident that my experiment has captured the true population mean. How about 95% confident? To be about 95% confident that the true mean lies within plus or minus 0.5 seconds of our point estimate, we need to work with the standard error of the mean (SEM). Bear with me while I do the algebra and violate my principle of being less helpful. 😉
Remember that the formula for SEM is:
$$\mathrm{SEM} \approx \frac{s}{\sqrt{n}}$$
Here s is the sample standard deviation (STDEV.S) and n is the sample size. I’ve used the approximately-equal sign because we can only estimate the SEM from the standard deviation of the sample; the actual SEM would require the true population standard deviation. Our exploratory data have provided us with an estimate of the standard deviation. With this equation we can solve for n and figure out a different sample size, a smaller one that could still give us confidence.
You may also remember that 2 × SEM is approximately the half-width (the “plus or minus” part) of a 95% confidence interval (more precisely, 1.96 × SEM):
$$\text{95\% CI half-width} \approx 2 \times \mathrm{SEM}$$
Let’s combine these two equations. Earlier we decided that plus or minus 0.5 seconds was probably enough precision, so we can substitute that for the 95% CI half-width:
$$0.5 = 2 \times \frac{s}{\sqrt{n}}$$
Substitute 0.66 seconds, the STDEV.S estimated from our exploratory data, for s:
$$0.5 = 2 \times \frac{0.66}{\sqrt{n}}$$
Divide both sides by 2.
$$0.25 = \frac{0.66}{\sqrt{n}}$$
Multiply both sides by the square root of n.
$$0.25\sqrt{n} = 0.66$$
Divide both sides by 0.25 seconds.
$$\sqrt{n} = \frac{0.66}{0.25} = 2.64$$
We are getting close now. Square both sides and you end up with the sample size you’ll need to ensure that your 95% confidence interval is within plus or minus 0.5 seconds of your sample mean.
$$n = 2.64^2 \approx 6.97$$
Ah, finally. Looks like a sample size of 7 will ensure that the 95% CI fits within plus or minus 0.5 seconds around the mean. Of course, if we wanted a 99% CI we could use about 2.6 × SEM in the work (3 × SEM corresponds to roughly 99.7%). Or we could define a more precise CI of, say, plus or minus 0.25 seconds around the mean. It is up to you. But with this type of work, you can make a strong argument for why you chose the sample size you chose.
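The algebra above boils down to n = (2 × s / E)², where s is the exploratory standard deviation and E is the desired “plus or minus” half-width, rounded up to a whole number of disks. The sketch below assumes the same 2 × SEM approximation used in this post; the function names are hypothetical.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 2.0) -> float:
    """Approximate CI half-width: z * SEM, with SEM estimated as sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


def required_sample_size(sd: float, half_width: float, z: float = 2.0) -> int:
    """Smallest whole n for which z * sd / sqrt(n) <= half_width.

    Solving half_width = z * sd / sqrt(n) for n gives (z * sd / half_width) ** 2;
    we round up because we cannot time a fraction of a disk.
    """
    if sd <= 0 or half_width <= 0:
        raise ValueError("sd and half_width must be positive")
    return math.ceil((z * sd / half_width) ** 2)


if __name__ == "__main__":
    n = required_sample_size(sd=0.66, half_width=0.5)
    print(f"n = {n}, giving a half-width of about {ci_half_width(0.66, n):.2f} s")
```

Changing `z` or `half_width` reproduces the other options mentioned above; for example, tightening the half-width to 0.25 s roughly quadruples the required n. Because 2 × SEM is a normal-distribution approximation, with samples this small a t-distribution multiplier (about 2.45 for 6 degrees of freedom) would be more conservative and would call for a somewhat larger sample.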
At this point their lab notebooks will have drawings and instructions, in their own words, on how to do a serial dilution, along with sample data, procedures, and background information (and perhaps some model data). I’ll send them home to work on my question first; later the next week, after they have worked to develop their skills, they repeat the work at home on a different question. The question I ask them to investigate is: How is the rate of the enzyme reaction affected by the concentration of the substrate? They can work in groups, with their family, or by themselves, but I want everyone to have a lab notebook entry of the methods, the questions, the design and the data they have collected, along with graphs of the data. I’m not explicit about what that should look like at this point. I don’t want to be too helpful. I actually want mistakes so we can address them. If I’m too helpful at this point and tell them to make a scatterplot of just the means of the time to rise versus the substrate concentration, then many will not know how to work in a novel situation in the future.
The mistakes that will no doubt appear provide an important starting point for the discussion on analysis.  That will have to wait for the next installment….

Teaching Quantitative Skills in a Lab Context: Getting Started in the Classroom

Some background on my teaching approach (which you may not agree with):

A few years ago a young math teacher, Dan Meyer, had several videos that went viral about math instruction. Be sure to google his work, but also check out the critiques of his work. Part of his message was that we (curricula, teachers, books, etc.) are “too helpful” when we structure our lessons and instruction. By that he meant that instead of giving students practice formulating problems and working through unique solutions, we have reduced math instruction to a series of “paint by number” steps to be memorized. Meyer was not the first to make these claims and not the last. For example, another noted math educator, Phil Daro, has a series of videos whose main idea is “against answer getting.” In these videos he compares Japanese math instruction to U.S. instruction and notes that in Japan math instructors ask, “How can I use this problem to teach this math concept?” whereas in the U.S. they ask, “How can I get my students to get the right answer to this problem?” It’s not that the answers aren’t important, but if correct answers are the main emphasis of instruction, then it becomes too easy for the entire education system to devolve into trivial answer getting. The hard work of critical thinking, working through problems, getting comfortable with false starts, revision, metacognition and learning from mistakes (all qualities that education aspires to) gets lost in the extreme focus on the end product. Moreover, the answer-getting approach contributes to students developing fixed mindsets about their own abilities, mindsets that are very likely false. Carol Dweck and Jo Boaler’s work in this area provides a number of ideas and approaches to help teachers avoid fixed mindsets and move students along a learning progression that leads them to become effective problem solvers.
Part of Boaler’s work on successfully moving students from fixed to growth mindsets in math involves rich problems that have an easy, accessible entry point that opens the door to a very rich, open and challenging environment with many paths to explore. The floating disk catalase assay fits this description to a “T” in my mind.
BTW, even though I have participated in a number of curriculum development projects, standards writing and curriculum framework development, I personally seldom pay much explicit attention to standards, science practices frameworks, or objectives when I do my “planning.” Nor do I ever develop formal learning objectives when I “prepare” lessons. As with rubrics, I tend to see objectives and frameworks as too confining. More importantly, I don’t think I have ever taught “a lesson” that didn’t take the students beyond the typical standard or learning objective. Since I kind of live and breathe biology education, I don’t want to be boxed in; I want to explore what is possible. I have a general idea of where we are trying to go in class, but I don’t make it explicit. I don’t want my students to think they have arrived at their destination (learning goal); rather, I want them to value the journey and keep on keeping on the path. I’m not advocating that you do the same; I’m only explaining why you won’t see specific references here to learning goals or science practices. What follows is a weird blend of what I have done in the classroom and how I would approach this material today. I’ve been out of the high school classroom for more than 10 years, and I’ve got to say that all these new resources certainly make me wish I were back in it.
With that bit of background as justification, you’ll see that in the posts that follow I will be promoting being less helpful and trusting my students to come up with reasonable problems and solutions to those problems. Doing this well requires skill on the part of the teacher to guide student thinking through questions: Socratic questions. Planning for the instruction requires explicitly thinking about the instructional goals and the types of questions and scenarios that can get us to those goals. Like the student quantitative skills we are targeting, our own skill in questioning will get better and better as we practice it and reflect on it. By the way, since we are talking about skills, it is important to remember that skills improve through practice, so our instruction should offer the chance to practice and revisit skills.
Getting Started:
I typically use labs to introduce material so that students have some level of experience with physical phenomena that can serve as a foundation for building conceptual knowledge. But I’ve got to get their attention and hopefully spark their interest. I’ve explored many different enzyme systems in the classroom. For instance, in the “old days” my students did all kinds of things with salivary amylase and starch. This system had the pedagogical hook of being known as the “spit lab.” They loved to hate spitting into test tubes to collect their amylase. High interest. For catalase, I call on their experience with hydrogen peroxide, since most of my students have a bottle at home and are familiar with it.
Before going any further, I remind them that they will need to start recording any observations, questions (really important) and thoughts in their lab notebook. In the interest of being “less helpful,” for more than 25 years I did not provide my students with lab write-ups or worksheets. They had to organize their own investigations based on demos and discussions in class. I made their lab notebook indispensable to their individual success in the class by giving later assignments that required the information they should have entered into it, usually in the form of laboratory practicals as substitutes for final exams.
I bring out a bottle of hydrogen peroxide and begin a discussion.
My part of the discussion involves questions and follow-up questions with these targets in mind: 1. to stimulate interest; 2. to recall why they use H2O2; 3. to realize that H2O2 breaks down on its own (by asking questions about old, “used-up” bottles in the medicine cabinet and why the bottle is brown); and 4. that bubbles are a sign that the H2O2 is working (killing “germs”). (The connection to the bubbles needs to be corrected in a bit.)

It is at this point I bring out a plastic cup about half full of a yeast solution. (I almost always use plastic in my labs to minimize the times we need goggles.) Before class, I mix a package of baker’s yeast into about 250 ml of water so that it is well suspended. I pour out about 1/2 cup of H2O2 and say, “Let’s see if we can get some bubbles.”

At this point I have them.  Because there are lots of bubbles….

Way more than they expect.

When it starts to overflow, that is when I pull out my best Chevy Chase imitation and start bumbling around trying to keep the mess at bay but it is too late.

They are hooked now, at least long enough for a quick bit of background information. At this point we describe the decomposition reaction and quickly balance the equation. Then, using questions again, we start to probe what might be going on. The target this time is the idea that the reaction has been greatly sped up. Speed implies rate. This is important. This is quantitative thinking. You have probably led similar discussions with your students, but you may not have pointed out the quantitative aspect of this observation, assuming that your students would readily see it. I know that is exactly what I used to do, but if we want to focus more on quantitative skills we have to bring them to the top and not leave them below the surface, hoping the students will figure them out automatically. Knowing what I do today, I wish I had emphasized this more in the past. It turns out that one of the big quantitative errors the public makes is mixing up quantities and rates.
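For reference, the balanced equation for the decomposition that catalase speeds up is:

$$2\,\mathrm{H_2O_2} \longrightarrow 2\,\mathrm{H_2O} + \mathrm{O_2}$$

A quick atom count (4 H and 4 O on each side) confirms the balance, and the O2 on the right is the gas in all those bubbles.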
At this point I also introduce the idea of a catalyst as something that increases the rate of a reaction without being part of the reaction. The definition is not exactly spot-on, but it is good enough to begin developing a conceptual model, which, again, takes us into more quantitative thinking.
Modeling to develop a foundation:
When I was in the classroom this is where I’d start drawing representations of catalase and H2O2 on the whiteboard and implying motion with lots of hand gestures, all the while asking questions about the process. Of course, the purpose was to provide the start of a mental model for what was going on at the molecular level to help the students inform their experimental design. Today I’d do things differently. I’d use the computer-based Molecular Workbench models. We would have already visited this site, so I wouldn’t need to do much in the way of introducing the site itself. This type of site, in my mind, is a game changer that makes the abstract world of molecular interactions more accessible and helps reduce student misunderstandings, creating more rigorous mental models. A very important aspect of these models is the randomness incorporated into them. One of the most difficult ideas to get one’s head around is the idea of random motion and interactions leading to the order we see in living things. Check out this paper to learn more about this teaching/learning challenge: Garvin-Doxas, Kathy, and Michael W. Klymkowsky. “Understanding randomness and its impact on student learning: lessons learned from building the Biology Concept Inventory (BCI).” CBE-Life Sciences Education 7.2 (2008): 227-233.
These models are 2D agent-based computational models, which means each structure in the image is an agent with properties and actions that interacts with the other agents. The actions and interactions are based on kinetic theory and do not include quantum effects. Here is the starting reaction representation.

Unlike the catalase/H2O2 decomposition reaction, this model represents a replacement reaction. In this particular screenshot, one of the green molecules has split and could be ready to combine with the purple molecule's atoms if the purple molecules were split. This may not look like a quantitative model, but it is. The reaction without the catalyst does happen but takes a long, long time. Note that at the bottom there is a time measurement, a given starting number of reactants, and a measurement of reaction completion. These are all quantitative parameters that students can “take data” on using the pause button and simply counting.
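Once students have paused the model and counted, the numbers can be turned into rates. The sketch below assumes students record (time, number of product molecules) at each pause; the readings are invented for illustration, and `interval_rates` is a hypothetical helper, not part of Molecular Workbench.

```python
def interval_rates(readings: list[tuple[float, int]]) -> list[tuple[float, float, float]]:
    """Average rate of product formation between successive paused readings.

    readings: (time, product_count) pairs in time order, as read off the model
    each time it is paused. Returns (start_time, end_time, rate) for each interval,
    with rate in products per unit of model time.
    """
    rates = []
    for (t0, p0), (t1, p1) in zip(readings, readings[1:]):
        if t1 <= t0:
            raise ValueError("readings must be in increasing time order")
        rates.append((t0, t1, (p1 - p0) / (t1 - t0)))
    return rates


if __name__ == "__main__":
    # Hypothetical pause-and-count readings, invented for illustration only.
    readings = [(0, 0), (10, 12), (20, 18), (30, 20)]
    for start, end, rate in interval_rates(readings):
        print(f"{start:>3} to {end:>3}: {rate:.2f} products per time unit")
```

With these made-up readings the rate drops from one interval to the next, a pattern students can look for in their own counts as the reaction approaches completion.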
Below, two catalyst molecules have been added, and in a very short time the reaction moves toward completion. Note that while the reaction is near completion, the catalysts are unchanged.

Now, at this point I have to make a decision. Do I have the students collect some data to help form their conceptual understanding, or do I simply let their impressions of the model from just a few trials guide their understanding? Either way, it is important that I use a series of questions to guide the students to my targets: 1. an understanding that the reaction is sped up and hence rates are something we might want to measure; 2. that the catalyst provides an “alternate pathway”; 3. that there is a limit to how fast the enzyme works; 4. that even when the reaction is “complete” things are still happening; and 5. that if we re-run the reaction, collecting data each time, the results are slightly different but predictable.
Here’s the link to the model of catalysis where you can explore the model yourself or with your students:
But wait, there’s more!
Whenever I use any kind of model in the classroom now, we discuss the strengths and weaknesses of the model in play. Usually, when I show the model above to teachers I get quite a few aha’s and general statements of approval. With that in mind, what do you think are the strengths of this model? More difficult, for students at least, is coming up with the weaknesses or limitations of the model. Often they focus on trivial problems, like the atoms not actually being green and purple, and miss others, like the fact that this is a two-dimensional space. They will no doubt have a difficult time with the idea of scaling time. For the catalase system, this model’s size scales are way out of whack. What are some other “issues”?
In addition to a computational model, a good strategy would be to have the students develop their own physical model of a catalyst speeding up a reaction. Biology teachers have promoted the toothpickase lab as an enzyme lab over the years.
Googling toothpickase will bring up all sorts of prepared documents and images. This model is a great one to work on and explore. It will definitely help guide questions and experimental design to explore catalase, but consider having the students come up with the model themselves with just a little prompting/demonstration from you. Use questions to help them figure out the quantitative parameters, the idea of rates, and how to structure and graph the results with the idea of supporting and communicating a scientific argument. Try to avoid the temptation of providing a structured set of lab instructions and tables to fill out for toothpickase; in other words, don’t be too helpful. Every time we do that we are taking away a chance for the student to work on one of their quantitative skills. One of the attractions of this model is that the students grasp what is modeled, but instead of making it the center point of your lesson, consider using it to support your exploration into an actual biological system: the catalase/hydrogen peroxide system. Again, help the students discover weaknesses and strengths of the model.
Experience with a physical model or the computational model should provide enough background that perhaps you can lead your students to develop a different kind of model—a symbolic model like this:

By Thomas Shafee (Own work) [CC BY 4.0], via Wikimedia Commons
Paula Donham has this same model in her write-up.  This particular model is the basis for further analysis later on in this lab.  So consider trying to get to a model like this before moving forward.
Remember, I said that I would suggest more quantitative things to emphasize than anyone would want to actually use in the classroom, so pick and choose. There are a couple of other models we could explore, which I will get to toward the end of the lab, but for now the students should be ready to start designing how they are going to collect data that will help them understand the catalase/hydrogen peroxide system, which is the next thing I’ll talk about.