Plotting Error Bars in Google Sheets? ...on a Scatter Plot?

Robbyn Tuinstra, triathlete and AP Bio teacher extraordinaire, recently had a question about putting error bars on scatter plot data in Google Sheets.  Several of us weighed in: a couple of us suggested it wasn’t possible, and a couple of others pointed to a video where custom error bars were placed on a bar graph.  I mentioned that I had tried to do this before but gave up, since I use other tools like Excel, Plotly and various stat programs.  Still, this issue festered for a while and I finally had to try to attack it again.  I was partially successful.  I’ll describe what I have discovered, but this also provides an opportunity to revisit suggested quantitative goals that the community might want to work towards.

First the type of experiment/data appropriate to this question.  Last year I produced a series of posts that featured a lengthy coverage of the types of data analysis and model application one might want to consider when doing a very simple lab–the yeast catalase floating disk lab.  You can find these posts on the Kansas Association of Biology Teachers Bioblog:

I didn’t use Google Sheets in these posts but I will here.  Here is a data table of results that has already been converted from disk rise time to rate of disk rise in floats per second.

This data table is typical of how we might record this type of data.  In the original postings I talked about how to plot this data and do a curve fit.  Here’s one way to plot this data (in Excel) using approximately 95% error bars (2 x SEM).

I think this is the type of data and plot that Robbyn was talking about.   The model for enzyme kinetics is known as the Michaelis-Menten equation and it can be used to fit the data.  I’m not sure we want to get into that in the AP Bio classroom but perhaps we do.  Nevertheless, I think we definitely should consider having students at least generate the graph.  The error bars are nice but I think when it comes to developing student argumentation from evidence that simply plotting all the data points along with the means is sufficient.  A plot that looks like this in Excel:

How do we do this in Google Sheets?

One of the first things to do to make this easier to plot is to change the data table into something like this:

Note that there is a column for the data points and a separate column for the means.  This allows us to plot two dependent variable series on the graph.  We’ll use this strategy later.  Note that I have also added a 2% substrate concentration and a 0% substrate concentration, but I have left the rise time blank for these.  These x values extend the range of the x axis when we plot.
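Here is a minimal sketch of that reshaping, in Python rather than in the spreadsheet itself. The exact layout is my assumption (one row per data point, plus one extra row per concentration that holds only the mean), the function name is hypothetical, and the numbers are made up for illustration.

```python
from statistics import mean

def long_table(groups):
    """Turn {concentration: [replicate rates]} into rows of
    (concentration, data_point, mean). None stands for a blank cell."""
    rows = []
    for conc in sorted(groups):
        values = groups[conc]
        for value in values:
            rows.append((conc, value, None))          # one row per data point
        if values:
            rows.append((conc, None, mean(values)))   # one extra row for the mean
        else:
            rows.append((conc, None, None))           # blank row that only stretches the x axis
    return rows

# Made-up rates (floats per second); 0% and 2% are left empty on purpose
example = {0.0: [], 0.5: [0.10, 0.12], 2.0: []}
for row in long_table(example):
    print(row)
```

Pasting rows like these into three columns gives two series that share one x column, which is what the scatter plot needs.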

Select these columns, choose Insert > Chart and change the chart type to a scatter plot, and you end up with a plot that looks like this:

Here I’ve changed the size and color of the individual data points.

I won’t go into modifying your labels, axis titles and chart title.

Personally, I think this is more than adequate evidence to make the argument about the shape of this curve, but I imagine in my classes we’d go for a non-linear curve fit (to help them justify the upper end math classes they are taking).

But perhaps, like Robbyn, you want to include error bars instead of the data points for each substrate concentration.  This really doesn’t seem to be possible with simple menu options in Google Sheets.  (Obviously, if you want to get into programming, it would be possible.)  I did, however, find this workaround.

First let’s change the data table again.  Let’s add a new column that has a calculated 2 x standard error of the mean (SEM), and another new column that includes the values for [mean + (2 x SEM)] and [mean - (2 x SEM)].  Now the table looks like this:
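For anyone who wants to check the spreadsheet arithmetic, here is a small Python sketch of those new columns. The function name and the replicate values are made up for illustration. In a spreadsheet the same calculation would be something along the lines of =2*STDEV(range)/SQRT(COUNT(range)).

```python
from math import sqrt
from statistics import mean, stdev

def error_bar_columns(values):
    """Return (mean, 2 x SEM, mean + 2 x SEM, mean - 2 x SEM) for one set of replicates."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))  # sample standard deviation / sqrt(n)
    half_width = 2 * sem                     # roughly a 95% interval
    return m, half_width, m + half_width, m - half_width

# Made-up replicate rates (floats per second) for one substrate concentration
example_rates = [0.10, 0.12, 0.11, 0.09, 0.13]
m, half_width, upper, lower = error_bar_columns(example_rates)
print(f"mean={m:.4f}  2xSEM={half_width:.4f}  upper={upper:.4f}  lower={lower:.4f}")
```

The upper and lower values are what go into the plus and minus column for that substrate concentration.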

Highlight the entire table and insert a chart, BUT here is the thing: if you highlight the data and let Google Sheets determine the graph type, it will pick Line Graph.  Let it this time.  That is key to what we need to draw the error bars.  You get something like this:

We have too many variables plotted.  We don’t need the individual data points now, so we’ll get rid of those.  We will also turn off the plotting of the SEM (but not the plus or minus SEM).  Finally, select the option to use column A for labels (assuming you’ve put your substrate concentrations in column A).

Once that is done, we should be down to something that looks like this.  One plotted variable is the means, along with a line that connects plus 2 x SEM to minus 2 x SEM.

There you have it: a workaround that works because, by default, Google Sheets treats the blank cells in the plus or minus columns as null data, not zeros.


You can turn off that feature and the graph will look like this:

Obviously not what we want.  

Summary Post for Teaching Quantitative Skills

Part 1: Teaching Quantitative Skills using the Floating Disk Catalase Lab: Intro
Part 2- Teaching Quantitative Skills in a Lab Context: Getting Started in the Classroom
Part 3- Establishing an Experimental Procedure to Guide the Home Investigation
Part 4- Teaching Quantitative Skills: Data Analysis
Part 5- Curve Fitting AKA Model Fitting–the End Goal
Part 6- The Final Installment: Extending and Evaluating Quantitative Skills.

These are links to the posts on Teaching Quantitative Skills with the Floating Disk Enzyme Lab


The Final Installment: Extending and Evaluating Quantitative Skills.

A note:  You might want to scroll down directly to Applying the NetLogo Model to avoid my long-winded setup and context.

Getting Stuck in a Rut:

I grew up about 1 mile from the Santa Fe Trail which cuts diagonally across Kansas on its way from Independence, Mo. to Santa Fe, New Mexico.  And I have lived most of my adult life close to the trail.  Not everyone is familiar with this historical trail so here’s a quote from the Santa Fe Trail Association’s website that might put things into context:   “In 1821, the Santa Fe Trail became America’s first great international commercial highway, and for nearly sixty years thereafter was one of the nation’s great routes of adventure and western expansion. ”  For folks growing up on the plains, the trails are kind of a big deal.  For instance, along U.S. highway 400/50 in western Kansas you can pull over, park and walk in the ruts of the trail that still exist.  Here’s a Google Earth screen shot of the ruts trending to the SW. I have put a white polygon around the ruts.  Amazing, isn’t it?


More than 150 years have not erased these ruts.  How many wagons, people and livestock must have walked in these ruts, all with the same goal.  “Stuck in a rut” takes on additional meaning when you realize where the phrase comes from.  As you can see from this image as each of the ruts became “impassable” for the wagons they would start a new path parallel to it–still heading in the same direction with a focused goal.  Obviously, this highway opened up surrounding areas to Europeans but only if they got out of the ruts.  And just as obviously, this trail helped to set things in motion that eventually led to tragedy for the Native Americans.  That is another discussion.    But why bring up ruts on the Santa Fe trail as I finish out a series of posts about leveraging the yeast catalase floating disk lab to introduce and reinforce a plethora of quantitative skills to biology students?

Well, the short answer is that I think we, the teacher community, are particularly at risk of getting “stuck in a rut.”  Like the folks on the Santa Fe Trail, we are often looking for direct, point-to-point solutions for many of the challenges that surface in a classroom of students who all have different skills and backgrounds.  Take, for example, “The Scientific Method”.  This was a simplification designed originally by Paul Brandwein to make science as a verb more accessible to both teachers and students.  Of course it was a simplification and of course, if Paul were still here, he’d be appalled at how one-dimensional this model has become.  We do that in science education: we make ruts, deep ruts.  Another example that strikes close to home is the former AP Biology Lab manual, a series of labs that became known as the “Dirty Dozen” that folks felt they had to follow to the letter while almost always neglecting or ignoring the suggestions at the end of each laboratory for further, deeper investigations.  Another deep rut.

As many of you know, I’ve spent the last 9 years helping to prepare math and science teachers in the UKanTeach program.  In this program we introduce the students to the 5E lesson plan model to help them prepare effective (and efficient) lessons that are steeped in inquiry.  The design works fairly well and really serves as a great scaffold to build an effective lesson or series of lessons around.  Those of you familiar with the model may recognize that one could deconstruct this series of posts into the 5E’s: Engage, Explore, Explain, Extend, and Evaluate.  But, to avoid our basic nature of creating a rut to fall into, I’ve purposely left out any explicit designation, notation or label consistent with the 5E’s.  Part of that is because I think you can see that some folks’ Extension activity might be another’s Evaluation activity.  It depends on the context, in my opinion.  Perhaps more importantly, I don’t want to suggest that teaching quantitative skills is a linear process.  Following linear paths creates ruts.  Instead, I hope I presented a multitude of paths, or at least suggested that this lab is a very rich resource that opens all sorts of options you might consider.  It is up to you, the trail boss, to decide how you plan to guide your class over the quantitative skills landscape, and hopefully you’ll find it rewarding to the point of taking quantitative skill instruction beyond what I’ve suggested here.

With that said, I am going to present material here that might fit more appropriately in the Extend or Evaluate phase of a 5E lesson.  I see a couple of paths forward.  One takes the quantitative and content level skills learned in this exploration and applies them in another laboratory investigation, and the other takes those same skills but applies them in a model-based environment.  Doubtless there are many other paths forward for you to discover, but let’s focus on these for now.

A model environment that probes deeper into thinking about enzyme reaction kinetics:

But first some more history/ reminiscing.

In the 1980’s, when personal computers first arrived on the educational scene, one of the first applications was programs that provided simulations of biological phenomena.  I even wrote one that students could use to generate inheritance data with simulated fruit fly crosses.  I was pretty proud of it, to the point that I actually marketed it for a while.  Students had to choose their parent flies with unknown genotypes from primitive graphic images that provided phenotype information.  Once a cross was chosen, the program would randomly, according to the inheritance pattern, generate about 48 fly images that represented the phenotypes possible.  The student had to infer genotypes from phenotypes.  However, when I wrote this program I created an option where the student could pick and choose the inheritance pattern to investigate.  So the program only simulated data to confirm a given inheritance pattern.  The data was realistic since it used a random function to generate gametes, but it could have promoted more inquiry and scientific thinking.  I found this out when I came across the Genetics Construction Kit (GCK), a piece of software written by John Calley and John Jungck.  This program lacked the graphics that mine had but it promoted inquiry much, much better.  Students didn’t start by choosing an inheritance pattern.  Instead, they received a sample “vial” of flies with a number of different traits expressed.  They had to choose different flies, different traits and such and then design crosses, form hypotheses, look for data to support those hypotheses and go to work.  It was a revelation.  Even better, to my way of thinking, it “modeled” almost every type of inheritance you could study in those days.  Better still, the program didn’t tell the student if they were right or wrong.  The student (and the teacher) had to look at the various crossing data to determine if the data supported their hypothesis.
This was an excellent educational tool to teach genetics.  If you and your students could meet the challenge, I guarantee that the learning was deep.  (Collins, Angelo, and James H. Stewart. “The knowledge structure of Mendelian genetics.” The American Biology Teacher 51.3 (1989): 143-149.)  If you don’t have access to JSTOR you can find another paper by Angelo Collins on the GCK here:  Collins, Angelo. “Problem-Solving Rules for Genetics.” (1986).  I promoted the GCK heavily throughout the late 80’s and 90’s.  I still think it is an inspired piece of software.  The problem was that little bit about not providing answers.  Few of my teaching colleagues were comfortable with that.  I’d round up computers or a computer lab for a presentation or professional development.  Everything would be going smoothly.  There would be lots of oohs and aahs as we worked through the introductory level, and then everything would go south when the teachers found out that even they couldn’t find out what the “real” answer was.  Over and over, I’d ask, “Who tells a research scientist when they are right?” but it wasn’t enough.  Teachers then were not as comfortable without having the “real” answer in their back pocket.  I think that has changed now, at least to some degree.

The software world has changed as well.  GCK went on to inspire BioQUEST.  From their website:  “BioQUEST software modules and five other modules associated with existing computer software, all based on a unified underlying educational philosophy. This philosophy became known, in short, as BioQUEST’s 3P’s of investigative biology – problem posing, problem solving, and persuasion.”  Another early “best practice” educational application of computer technology was the software LOGO from Seymour Papert’s lab.  LOGO offered agent-based programming specifically targeted to students early in their educational trajectory.  My own children learned to program their turtles.  As the web and other software environments developed, LOGO was adapted into NetLogo.

NetLogo (Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.) is a modeling environment that is greatly underutilized in the educational community.  NetLogo is an agent-based programming language.  There is a desktop version and there is a web browser version.  Agent-based models can provide very interesting simulations of the real world.  Agent-based programming assigns various properties to individual “agents” along with a set of rules for how each agent interacts with other agents or with the “environment”.  No doubt you and your students will gain the most learning if you explore coding and build your own model, but unless we do this multiple times during the year, the time requirement to build these skills is usually prohibitive.  You don’t have to be a coder, though, to utilize these tools.  Luckily the NetLogo community has already put together a model on Enzyme Kinetics that can extend your students’ understanding of enzymes.  (Stieff, M. and Wilensky, U. (2001). NetLogo Enzyme Kinetics model. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.)

But it is not always clear how to take advantage of NetLogo simulations as a teaching resource.  Typically students tend to load up one of the simulations, click on a few parameters, get some kind of graph that they don’t understand and then shut down, both the program and their brain.  Model environments like this require systematic exploration, much like a laboratory investigation.  And, just as you try to engage your students in lab-based inquiry where they are thinking for themselves and not following a bunch of cookbook-like instructions, you need to find ways to encourage your students to engage deeply with the model.  Parameters need to be changed in response to questions, not just to see what happens.  Seeing what happens with random changes can lead to insights, but generally it is better to have a plan and explore the model systematically.  This type of involvement or engagement by the student requires a bit of student motivation.  There are many sources of student motivation, and Mihály Csíkszentmihályi’s FLOW theory applied to education is a good place to start.  In FLOW theory the idea is to seek a good balance between the challenge before you and the skills you bring to the table.  One of the important lessons I’ve learned over the years is that using model simulations requires every bit as much set-up and preparation as a “normal” wet lab does.  And, more to the point for this example, working with the actual physical/biological phenomenon ahead of time helps to create enough first-hand knowledge, beginning skills and such that working with models becomes more accessible to students.  Students aren’t used to working with software like this and it takes a bit of preparation to get them to explore the model in a productive way.
In this example of the floating disk catalase lab, the students by this time will have explored models to develop a beginning conceptual understanding of enzyme reactions, designed and carried out experiments, collected and analyzed data, and perhaps fit their data to mathematical expressions.  Hopefully, they are developing deep understandings of enzyme action that now need to be tested, reflected upon, and revised.  While continued exploration in the laboratory can provide that environment of reflection and revision, the time and space limitations of a typical classroom likely prohibit a robust exploration.  This is where a simulation like the NetLogo Enzyme Kinetics model can play a vital role in student learning.  Here the student or student teams can explore and manipulate all sorts of variables in a relatively short period of time.

Applying the NetLogo Model:

The NetLogo model is built on the same conceptual model for enzyme kinetics that we have explored before:

Image by Thomas Shafee (own work), CC BY 4.0, via Wikimedia Commons
Instead of generating an expression with Vmax and Km, though, the agent-based model assigns properties to the agents based on three “constants”.

Constant 1:  The rate of formation of the ES complex.

Constant 2:  The rate of dissociation of the ES complex back into E and S

Constant 3:  The rate of catalysis of the ES complex into E and P

You explore the model by changing these constants or changing the substrate concentration.  Changing the constants, changes the properties of the enzyme.
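For readers who want to connect these three constants back to Vmax and Km, the standard steady-state treatment of this reaction scheme gives the relations below. The symbols k_1, k_{-1} and k_2 are my own labels for Constants 1, 2 and 3, not names used in the NetLogo model. Since the agent-based model works with individual molecules and chance encounters, treat these as an approximate guide to how the curve should shift, not as an exact conversion.

```latex
E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \overset{k_2}{\longrightarrow} E + P

K_m = \frac{k_{-1} + k_2}{k_1}, \qquad V_{max} = k_2 \, [E]_{total}
```

So raising Constant 1 (faster ES formation) should lower Km, while raising Constant 3 (faster catalysis) should raise both Vmax and Km.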

I’ve put together a short video that introduces how one might work with this model to create data similar to the data from the original wet lab.  You can find it here:

Here’s an M-M curve that I generated by changing the values of the constants and then seeing how those properties determined enzyme rates/velocities at differing substrate concentrations.

In this curve I collected 8 samples for each substrate concentration.

Here’s the data, generated by the model.  Looks a lot like the wet-lab data, doesn’t it?

And here is a curve fit to the Michaelis-Menten equation.  Note that the data from the NetLogo model has to be fitted to the idealized curve.

The thing is that I could go back into the NetLogo model and explore questions like, what happens if I lower the constant that describes the rate of Enzyme-Substrate formation relative to the constant that describes the dissociation of that complex?  Several questions come to mind.

Of course you don’t have to explore NetLogo as an extension or evaluation activity.  You could have your students explore this spreadsheet from the BioQUEST ESTEEM project:

Michaelis-Menten Enzyme Kinetics

Or if you are really ambitious you could have your students develop their own spreadsheet model like the one described in this paper from Bruist in the Journal of Chemical Education.

Bruist, M.F. (1998). Use of a Spreadsheet To Simulate Enzyme Kinetics. Journal of Chemical Education, 75(3), 372.

Or you could have your students explore the Stella-based model for lactase enzyme activity from the AP Biology Community’s own Jon Darkow.  This is an HTML5 version created in the dynamic modeling system known as Stella.

Practice, Practice, Practice (Curve fitting in the Photosynthesis Floating Leaf Disk lab)

To master any skill takes lots of practice, something we don’t provide enough of in academic classes.  We do in the performance and fine art classes, but not so much in academics.  The excuse as to why not usually gets back to the extreme time limitation we face in the biology classroom.  Still, with the right selection of lab topics, skill practice is not only possible but highly productive.  For instance, in this case it turns out that the procedure, the data created, the data analysis and the curve fitting (to the same mathematical model) are all skills that can be applied to the Floating Leaf Disk lab, if the students explore how the intensity of light affects the rate of photosynthesis.

In 2015, Camden Burton and I presented some sample data sets from the Floating Leaf Disk lab at NABT.  Later I shared those in a series of posts on the AP Biology Community forum where a lively discussion on data analysis ensued.  If you are a member of the forum you can find the discussion here. 

One of the more problematic data sets we shared was data from a photoresponse curve experiment that explores how light intensity affects the rate of photosynthesis.  Here’s a photo of how light intensity was varied by varying the height of the stack of petri dishes.



Here’s the raw data for this lab using the lap timer on a smart phone:


The first step in working with this data is to convert the lap times into cumulative times, along with generating the descriptive stats.


Because the time it takes a disk to rise with this technique is inversely proportional to the actual rate of photosynthesis, we need to convert this time into a rate by taking the inverse, or reciprocal.  And since this turns out to be a small decimal number with units of floats/sec, I’ve modified it by multiplying by 1000 to get a rate unit of floats per 1000 seconds.  The descriptive stats are converted/transformed in the same way.  This process of data transformation is not emphasized enough at the high school level, in my opinion.
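The two transformations can be sketched in a few lines of Python. The function names and the lap times below are made up for illustration. I am assuming each lap-timer reading is the time elapsed since the previous disk rose, which is how a phone lap timer normally reports.

```python
from itertools import accumulate

def lap_to_cumulative(lap_times):
    """Running total of lap times: the time at which each disk reached the surface."""
    return list(accumulate(lap_times))

def rise_rate_per_1000s(rise_time_s):
    """Reciprocal of the rise time, scaled to floats per 1000 seconds."""
    return 1000.0 / rise_time_s

# Made-up lap-timer readings in seconds, one per disk, in the order the disks rose
laps = [300.0, 20.0, 15.0, 25.0]
cumulative = lap_to_cumulative(laps)
rates = [rise_rate_per_1000s(t) for t in cumulative]

for t, r in zip(cumulative, rates):
    print(f"rise time {t:6.1f} s  ->  {r:.2f} floats per 1000 s")
```

One design note: this sketch converts each disk’s time to a rate before any averaging. Taking the reciprocal of the mean time gives a slightly different number than averaging the individual rates, so it is worth deciding with students which one you mean.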


Graphing the means in this data table along with plus or minus 2 SEM error bars creates a graph something like this:

Which in my mind is a curve waiting to be fitted.  If you google something like “Photosynthesis Irradiance Curve” you’ll find a number of resources applicable to this experiment and guess what?  You’ll find that folks have been using the Michaelis-Menten equation to model the curve fitting.

I’ll let you explore the resources but here is the fit based on the Michaelis-Menten equation.  There is a modification to the Michaelis-Menten expression that we have to do for this particular lab.  Since this procedure actually is measuring the accumulation of oxygen as a product and some of the oxygen is being consumed at the same time for cellular respiration, we are actually measuring the net rate of photosynthesis.  To account for the oxygen consumed in respiration we need to add an additional term to the Michaelis-Menten equation.

I’ve changed the variables but the form of the equation is the same.  In the curve fitting that I have done, I have manually changed the value of R and let the solver vary Pmax and KI.
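Here is a minimal Python sketch of that modified model. I am assuming the usual form, net rate = Pmax x I / (KI + I) - R, with R subtracted as the respiration term. The function names are hypothetical, and the check data are generated from the model itself instead of coming from the lab.

```python
def net_photosynthesis(light, p_max, k_i, r):
    """Michaelis-Menten style light response with a constant respiration term subtracted."""
    return p_max * light / (k_i + light) - r

def sum_squared_residuals(lights, observed, p_max, k_i, r):
    """The quantity a solver would try to make as small as possible."""
    return sum(
        (obs - net_photosynthesis(light, p_max, k_i, r)) ** 2
        for light, obs in zip(lights, observed)
    )

# Synthetic check data made from the model itself (Pmax=10, KI=50, R=1), arbitrary units
lights = [0, 25, 50, 100, 200, 400]
observed = [net_photosynthesis(i, 10.0, 50.0, 1.0) for i in lights]

# Try a few hand-picked values of R, the way R was adjusted manually in the spreadsheet
for r_guess in (0.0, 0.5, 1.0, 1.5):
    ssr = sum_squared_residuals(lights, observed, 10.0, 50.0, r_guess)
    print(f"R = {r_guess:.1f}  sum of squared residuals = {ssr:.3f}")
```

Two features are worth pointing out to students: in the dark (light = 0) the net rate is -R, and at light = KI the gross rate is half of Pmax.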

The fit for this set of data is not as good as we got for the catalase lab but it is not bad.

Interestingly, you can get a “good” fit to an exponential function as well, maybe even a better fit.  But that is part of model fitting.  There are many biological reasons to consider that Michaelis-Menten provides a model for photosynthesis, but I can’t think of one for an exponential fit.  There are many ways to continue to modify the Michaelis-Menten application to Photosynthesis Irradiance curves and you can find several with a bit of Google searching.

Here’s one fit I managed in Excel using the same techniques that we used earlier.

Here is a Desmos version you or your students can play with.

I think it is time to wrap up this series of long-winded posts.  I hope, if you’ve read this far, that you have found some ideas to try in your class, and I hope that, despite the deep dive, you have gotten an idea of how an increased emphasis on quantitative skills can also lead to an increased understanding of the content.  At least it does for me.  Finally, I hope you and your students have a good time exploring data analysis.  It really does feel good when the data works out like you think it should. 😉


Curve Fitting AKA Model Fitting–the End Goal

Curve Fitting AKA Model Fitting:
When I started this series of posts my goal was to see if I could generate precise data with a proven classroom lab.  The data precision that is possible with the yeast catalase lab provides a unique opportunity where data analysis skills can be productively explored, practiced and understood.  My contention was that this is the ideal lab to focus not just on content, not just on experimental design, but also to introduce relatively sophisticated data analysis.  To be up front about it, I had only a hint of how rich this lab is for doing just that.  Partly, this is because in my years of teaching high school biology I covered most of the enzyme content in class activities and with 3D visualizations, focusing on the shape of enzymes but neglecting enzyme kinetics.  That would be different if I were teaching today: I’d focus more on the quantitative aspects.  Why?  Well, it isn’t just to introduce the skills; it has more to do with how quantitative methods help to build a deeper understanding of the phenomena you are trying to study.  My claim is that your students will develop a deeper understanding of enzymes and how enzymes work in the grand scheme of things if they follow learning paths that are guided and supported by quantitative data.  This post is an example.
The last post focused on plotting the data points as rates, along with some indication of the variability in each measurement in a plot like this.
As I said before, I would certainly be happy if most of my students got to this point, as long as they understood how this graph helps them to describe enzyme reactions and interpret others’ work.
But a graph like this begs to have a line of best fit–a curve that perhaps plots the relationship implied by our data points.
Something like this.

One of the early lessons on model building in my current Research Methods course involves taking data we have generated with a manipulative model (radioactive decay) to generate a predictive model.  The students plot their data points and then try to find the mathematical expression that will describe the process best.  Almost always, my students ask Excel to generate a line of best fit based on the data.  Sometimes they pick linear plots, sometimes exponential, sometimes log plots and sometimes power plots.  These are all options in Excel to try to fit the data to some mathematical expression.  It should be obvious that the process of exponential decay is not best predicted with multiple types of expressions.  There should be one type of expression that most closely fits the actual physical phenomenon, a way of capturing what is actually going on.  Just picking a “trendline” based on how well it visually fits the current data, without considering the actual phenomenon, is a very common error or misconception.  You see, to pick or develop the best expression requires a deep understanding of the process being described.  In my half-life exercise, I have the students go back and consider the fundamental things or core principles that are going on.  Much like the process described by Jungck, Gaff and Weisstein:

“By linking mathematical manipulative models in a four-step process—1) use of physical manipulatives, 2) interactive exploration of computer simulations, 3) derivation of mathematical relationships from core principles, and 4) analysis of real data sets…”
Jungck, John R., Holly Gaff, and Anton E. Weisstein. “Mathematical manipulative models: In defense of “Beanbag Biology”.” CBE-Life Sciences Education 9.3 (2010): 201-211.
The point is that we aren’t really just fitting curves or finding a curve of best fit; we are really trying to see how well our model will fit the real data.  And that is why fitting this model takes this lab to an entirely new level.  But how are you going to build this mathematical model?
Remember that we started with models that were more conceptual or manipulative.  And we introduced a symbolic model as well that captured the core principles of enzyme action:

Image by Thomas Shafee (own work), CC BY 4.0, via Wikimedia Commons
Now how do we derive a mathematical expression from this?  I’m not suggesting that you should necessarily, unless you feel comfortable doing so, but I’ll bet there are kids in your class that can, given a bit of guidance.  You may not feel comfortable providing the guidance.  But in this day of “just ask Google” you can provide that guidance in the form of a video discussion from the Khan Academy designed to help students prepare for the MCAT.  Don’t let that scare you off.  Here are two links that take the symbolic model and derive a mathematical expression, and not just any expression: the Michaelis-Menten equation for enzyme kinetics.  You or your students will no doubt need to view these more than once, but the math is not that deep, not if your students are exploring calculus or advanced algebra.  It is really more about making assumptions and how those assumptions simplify things so that with regular algebra you can generate the Michaelis-Menten equation.
You can also find a worked-out derivation here, in this text excerpt from Biochemistry, 5th ed. Berg JM, Tymoczko JL, Stryer L. New York: W H Freeman; 2002.
Of course, you don’t even have to go through the derivation; you could just provide the equation.

The important thing is that students understand where this equation comes from—it doesn’t come out of thin air and it is based on the same core principles they uncovered or experienced if they did the toothpickase manipulation–it is just quantified now.  So how do I use this equation to actually see how well my data “fits”?  If it were a linear expression that would be easy in Excel or any spreadsheet package but what about non-linear trend lines?  I can tell you that this expression is not part of the trend line package you’ll find in spreadsheets.
I’ve got to admit, I spent too many years thinking that generating best-fit curves from non-linear expressions like the M-M equation was beyond the abilities of me or my students.  But again “Ask Google” comes to the rescue.  If you google “using solver for non-linear curve fitting regression” you’ll end up with lots of videos and even some specific to the Michaelis-Menten equation.  It turns out EXCEL (and I understand Google Sheets) has an add-on called Solver that helps you find the best fit line.  But what does that mean?  Well it means that you need to manipulate the parameters in the M-M equation to generate a line until it mostly fits your data–to see if the model is an accurate description of what you measured.  What parameters are these?
Look at the equation:
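Written out in its standard form, using the same symbols that are defined in the lines that follow, the Michaelis-Menten equation is:

```latex
V_0 = \frac{V_{max} \, [S]}{K_m + [S]}
```

A quick check on the definition of Km: if you set [S] equal to Km, the fraction becomes Vmax x Km / (2 x Km), which is exactly half of Vmax.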
V0 equals the rate of the reaction at differing substrate concentrations–the vertical axis in the plots above.
Vmax equals the point at which all of the enzyme is complexed with the substrate–the maximum rate of the reaction with this particular enzyme at this particular enzyme concentration (that is enzyme concentration not substrate)

Km equals the concentration of the substrate where the rate of reaction is 1/2 of Vmax

[S]  equals the substrate concentration, in this case the H2O2
Two of these parameters are variables: one is our experimental or explanatory variable, the concentration of H2O2, and the other is our response variable, the rate of the reaction. Some folks prefer independent and dependent variable. These are what we graph on our axes.
The other two parameters are constants and they help to define the curve. More importantly, these are constants for this particular enzyme at this particular enzyme concentration for this particular reaction. These constants will be different for different enzymes, different concentrations or reactions with inhibitors, competitors, etc. In other words it is these constants that help us to define our enzyme properties and provide a quantitative way to compare enzymes and enzyme reactions. You can google up tables of these values on the web. From: Biochemistry, 5th ed. Berg JM, Tymoczko JL, Stryer L.
So calculating these constants is a big deal and one that is not typically a goal in introductory biology but if you’ve come this far then why not?
This is where generating that line that best-fits the data based on the Michaelis-Menten equation comes in.
You can do this manually with some help from Solver in Excel.  (Google Sheets is also supposed to have a solver available but I haven’t tried it.)
I have put together a short video on how to do this in Excel based on the data I generated for this lab.
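For readers who would rather see the idea behind the fit spelled out in code, here is a minimal sketch in Python. It is not the algorithm Solver uses and it is not my spreadsheet. The data are made-up numbers generated from the equation itself (with Vmax = 0.25 and Km = 0.5), and the function names are my own. The trick it relies on is that for any trial Km the best Vmax has a simple closed form, so the sketch only has to search over Km to minimize the sum of squared residuals.

```python
# Sketch: least-squares fit of the Michaelis-Menten equation, standard library only.
# The data below are hypothetical, generated from the model itself for illustration.

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate V0 at substrate concentration s."""
    return vmax * s / (km + s)

def best_vmax(conc, rates, km):
    """For a fixed Km the model is linear in Vmax, so least squares has a closed form."""
    f = [s / (km + s) for s in conc]
    return sum(y * fi for y, fi in zip(rates, f)) / sum(fi * fi for fi in f)

def ssr(conc, rates, vmax, km):
    """Sum of squared residuals between the data and the model curve."""
    return sum((y - mm_rate(s, vmax, km)) ** 2 for s, y in zip(conc, rates))

def fit_mm(conc, rates, km_low=1e-6, km_high=10.0, tol=1e-10):
    """Golden-section search over Km; Vmax is solved exactly at each trial Km."""
    def cost(km):
        return ssr(conc, rates, best_vmax(conc, rates, km), km)

    ratio = (5 ** 0.5 - 1) / 2
    a, b = km_low, km_high
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if cost(c) < cost(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    km = (a + b) / 2
    return best_vmax(conc, rates, km), km

# Hypothetical substrate concentrations (% H2O2) and noise-free rates (floats per second)
conc = [0.09375, 0.1875, 0.375, 0.75, 1.5]
rates = [mm_rate(s, 0.25, 0.5) for s in conc]

vmax_fit, km_fit = fit_mm(conc, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f}")
```

With real, noisy data the recovered constants will not match anything exactly, and the search range for Km (here 0 to 10) is an assumption you would adjust to your own concentrations.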

I’ve also taken advantage of a web based math application DESMOS which is kind of a graphing calculator on the web.  While I can create sliders to manipulate the constants in the equation, Km and Vmax  to make a dynamic spreadsheet model it is a lot easier in DESMOS and DESMOS lets me share or embed the interactive equation. Scroll down in the left hand column to get to the sliders that change the constants.

You can also just go to Desmos and play with it there

I had to use A and B and x1 in my equation as symbols.

It is not that difficult to use DESMOS, and with my example your students who are familiar with it will be able to make their own model with their own data within DESMOS.  Move the sliders around; they represent the values for Km and Vmax in the equation.  Notice how they change the shape of the graph.  This really brings home the point of how these constants can be used to quantitatively describe the properties of an enzyme and helps to make sense of the tables one finds about enzyme activity.  Also, notice the residuals that are plotted in green along the “x-axis”.  These residuals are how we fit the curve.  Each green dot is the result of taking the difference between a point on the theoretical line (with particular values for the constants and variables) and the actual data point.  That difference is squared.  A fit that puts the green dots close to zero is a very good fit.  (BTW, this is the same thing we do in Excel with the Solver tool.)  Watch as you try to minimize the total residuals as you move the sliders.  The other thing that you get with DESMOS is that if you zoom out you’ll find that this expression is actually a rectangular hyperbola…and not an exponential.  How is that important?
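If it helps to see what the green dots are doing, here is a small sketch in Python of the squared residuals for two trial slider settings. The data points are hypothetical, not the measurements from this lab, and the function names are mine.

```python
# Sketch: squared residuals for two trial "slider" settings of Vmax and Km.
# The data points are hypothetical, chosen only to illustrate the calculation.

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)

def squared_residuals(conc, rates, vmax, km):
    """One value per data point: (observed - predicted) squared."""
    return [(y - mm_rate(s, vmax, km)) ** 2 for s, y in zip(conc, rates)]

conc = [0.1, 0.2, 0.4, 0.8, 1.5]         # hypothetical % H2O2
rates = [0.04, 0.07, 0.11, 0.15, 0.19]   # hypothetical floats per second

closer = sum(squared_residuals(conc, rates, vmax=0.25, km=0.5))
farther = sum(squared_residuals(conc, rates, vmax=0.40, km=0.5))

print(f"Vmax=0.25, Km=0.5: total squared residual = {closer:.5f}")
print(f"Vmax=0.40, Km=0.5: total squared residual = {farther:.5f}")
```

The setting with the smaller total is the better fit for these points; moving the sliders is just a hands-on way of hunting for the smallest total.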

Well, think back to the beginning of this post when I talked about how my students often just choose their mathematical model on what line seems to fit the data the best–not on an equation developed from first principles like the Michaelis-Menten.

Looking at a plot of the data in this experiment before the curve fitting one might have proposed that an exponential equation might have produced the best fit.  In fact, I tried that out just for kicks.
This is what I got.

Here’s a close-up:

Thinking about the actual experiment and the properties of enzymes there are two things really wrong with this fit although you’ll notice that the “line” seems to go through the data points better than the fit to the Michaelis-Menten equation.  1.  Notice that the model line doesn’t go through zero.   Hmmmm.  Wouldn’t a solution with no Hydrogen peroxide not react with the yeast?  That should be tested by the students as a control as part of the experimental design but I can tell you that the disk will not rise in plain water so the plot line needs to go through the origin.  I can force that which I have in this fit:

But the second issue with this fit is still there.  That is the point where the plot has reached its maximum rate.  If I had generated data at a 3% substrate concentration I can promise you the rate would have been higher than 0.21 where this plot levels off.  While the exponential model looks like a good fit on first inspection it doesn’t hold up to closer inspection.  Most importantly the fit is mostly coincidental and not based on an equation developed from first principles.  By fitting the data to the mathematical model your students complete the modeling cycle described on page T34 in the AP Biology Investigative Labs Manual, in the Bean Biology paper cited above, and on page 85 in the AP Biology Quantitative Skills Guide.
Give model fitting a try, perhaps a little bit at a time and not all at once.  Consider trying it out for yourself with data your students have generated or consider it as a way of differentiating your instruction.  I’ll wrap this up with a model fitted with data from Bob Kuhn’s class that they generated just this month.  He posted the data on the AP Biology forum and I created the fit.

The key thing here is that his enzyme concentration (yeast concentration) was quite a bit diluted compared to the data that I’ve been sharing.  Note how that has changed the Michaelis-Menten curve and note how knowing the Km and Vmax provides a quantitative way to actually compare these results.  (Both constants for this graph are different from mine.)
Hopefully, this sparks some questions for you and your students and opens up new paths for exploring enzymes in the classroom.  I’ll wrap this up next week with how one might assess student learning with one more modeling example.

Teaching Quantitative Skills: Data Analysis

Managing labs has got to be one of the most difficult things we do as biology teachers.  There is so much to keep in mind: safety, time, cost, level appropriateness, course sequence, preparation time, and did I mention time?  It’s no wonder that we are tempted to make sure that the lab “works” and that the students will get good data.  When I first went off the deep end and started treating my classes like a research lab–meaning almost every lab had an element of individual-based inquiry–I’ve got to say I was pretty content if I could get students to the point that they asked their own question, designed an effective experimental procedure and collected some reasonable data.  It took a lot of effort to get just that far and, to be honest, I didn’t put as much emphasis on good data analysis and scientific argumentation as I should have.  At least that is the 20-20 hindsight version that I see now.  Of course, that’s what this series is all about: how to incorporate and develop data analysis skills in our classes.

Remember, this lab has a number of features that make it unique:  safe enough to do as homework (which saves time), low cost, and more possible content and quantitative skills to explore than anyone has time for.  For me, it’s like sidling up to an all-you-can-eat dessert bar.  No doubt, I’m going to “overeat” but since this lab happens early and it is so unique, I think I can get away with asking the students to work out of their comfort zone:  1. because the skills will be used again for other labs and 2. because I need them to get comfortable with learning from mistakes along with the requisite revisions that come from addressing those mistakes.
Depending on how much time we had earlier to go over ideas for handling the data, the data the students bring back from their “homework” is all over the map.  Their graphs of their data are predictably all but useless for effectively conveying a message.  But their data and their data presentations provide us a starting point, a beginning, where, as a class we can discuss, dissect, decide, and work out strategies on how to deal with data, how to find meaning in the data, and how to communicate that meaning with others.
In the past, the students would record their results and graph their work in their laboratory notebooks.  Later, I’d let them do their work in Excel or some other spreadsheet.  The data tables and graphs were all over the map.  Usually about the best the students would come up with looked something like this.
The data (although not usually this precise, and usually not with the actual H2O2 concentrations):

Sometimes they would have a row of “average time” or mean time but I don’t think any student has ever had rows of standard deviation and for sure no one ever calculated standard error.  Getting them to this point is one of my goals.  As teachers we work so much with aggregated data (in the form of grades and average grades) that we often don’t consider that for many students aggregation doesn’t make any sense.  It turns out to be an important way of thinking that is missing more often than we realize.  In fact, in the book The Seven Pillars of Statistical Wisdom, Stephen M. Stigler devotes an entire chapter to aggregation and its importance in the history of mathematical statistics.  For most of my career, I was only vaguely familiar with this issue.  Now I’d be very careful to bring this out in discussion with a number of questions.  What does the mean provide for us that the individual data points do not?  Why does the data “move around” so much?
It doesn’t take much to make sure they calculate the mean for their data.
This brings up another point.  Not only do some folks fail to see the advantage of aggregating data some feel that the variation we see can be eliminated with more precise methods and measurement–that there is some true point that we are trying to determine.  The fact is the parameter we are trying to estimate or measure is the mean of the population distribution.  In other words there is a distribution that we are trying to determine and we will always be measuring that distribution of possibilities.  This idea was one of the big outcomes of the development of statistics in the early 1900’s and can be credited to Karl Pearson.  Today, in science, the measurement and such assume these distributions–even when measuring some physical constant like the acceleration of gravity.  That wasn’t the case in the 1800’s and many folks today think that we are measuring some precise point when we collect our data.  Again, I wasn’t smart enough to know this back when I started teaching this lab and honestly it is an idea that I assumed my students automatically assimilated but I was wrong.  Today, I’d take time to discuss this.
Which brings up yet another point about the “raw” data displayed in the table.  Take a look at disk 3, substrate concentration 0.75%.  Note that it is way off compared to the others.  Now this is a point to discuss.  The statement that it is “way off” implies a quantitative relationship.  How do I decide that?  What do I do about that point?  Do I keep it?  Do I ignore it?  Throw it away?  Turns out that I missed the stop button on the stop watch a couple of times when I was recording the data.  (Having a lab partner probably would have led to more precise times.)  I think I can justify removing this piece of data but ethically, I would have to report that I did and provide the rationale.  Perhaps in an appendix.  Interestingly, a similar discussion with a particularly high-strung colleague caused him so much aggravation that the discussion almost got physical.  He was passionate that you never, ever, ever discard data and he didn’t appreciate the nuances of reporting improperly collected data.  This might be a conversation you’ll want to have in your class.
The best student graphs from this data would look like this.  I didn’t often get means but I liked it when I did.  But note that the horizontal axis is log scaled.  Students would often bring this type of graph to me.  Of course, 99% of them didn’t know they had logged the horizontal axis; they were only plotting the concentrations of H2O2 equally spaced.  I would get them to think about the proper spacing by asking them if the difference between 50% and 25% was the same as the difference between 6.25% and 3.125%.  That usually took care of things.  (Of course, there were times later in the semester that we explored log plots, but not for this lab.)
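A quick way to make that spacing question concrete is to compute the gaps both ways. This little Python sketch uses a series of 50% dilutions like the ones in my question to the students; the variable names are mine.

```python
import math

# Successive 50% dilutions, expressed as percent of the stock solution
concs = [50, 25, 12.5, 6.25, 3.125]

# Gap between neighbors on an ordinary (linear) axis
linear_gaps = [a - b for a, b in zip(concs, concs[1:])]

# Gap between neighbors after taking log base 2, which is what equal spacing implies
log2_gaps = [math.log2(a) - math.log2(b) for a, b in zip(concs, concs[1:])]

print("linear gaps:", linear_gaps)
print("log2 gaps:  ", log2_gaps)
```

The linear gaps shrink by half each step while the log gaps are all the same size, which is why equally spaced dilutions amount to a log-scaled axis.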

Note also, that this hypothetical student added a “best fit” line.  Nice fit but does it fit the trend in the actual data?  Is there actually a curve?  This is where referring back to the models covered earlier can really pay off.  What kind of curve would you expect?  When we drop a disk in the H2O2 and time how long it rises are we measuring how long the reaction takes place or are we measuring a small part of the overall reaction?  At this point it would be good to consider what is going on.  The reaction is continuing long after the disk has risen as evidenced by all the bubbles that have accumulated in this image.   So what is the time of disk rise measuring?  Let’s return to that in a bit but for now let’s look at some more student work.

Often, I’d get something like this with the horizontal axis (the explanatory or independent variable) scaled in reverse order.  This happened a lot more when I started letting them use spreadsheets on the first go around.

Spreadsheet use without good guidance is usually a disaster.  After I started letting them use spreadsheets I ended up with stuff that looked like this:

or this

It was simply too easy to graph everything–just in case it was all important.  I’ve got to say this really caught me off guard the first time I saw it.  I actually thought the students were just being lazy, not calculating the means, not plotting means, etc.   But I think I was mostly wrong about that.  I now realize many of them actually thought this was better because everything is recorded.   I have this same problem today with my college students.  To address it I ask questions that try to get at what “message” we are trying to convey with our graph.  What is the simplest graphic that can convey the message?  What can enhance that message?  What is my target audience?
The best spreadsheet plots would usually look something like this where they at least plotted means and kind of labeled the axis.  But they were almost always bar graphs.  Note that the bar graphs plot “categories” on the horizontal axis so they are equally spaced.  This is the point that I usually bring out to start a question about the appropriateness of different graph methods.  Eventually with questions we move to the idea of the scatter plot and bivariate plots.  BTW, this should be much easier over the next few years since working with bivariate data is a big emphasis in the Common Core math standards.

But my goal in the past was to get the students to consider more than just the means but also to somehow convey the variation in their data–without plotting every point as a bar.  To capture that variability, I would suggest they use a box plot–something we covered earlier in the semester with a drops on a penny lab.  I hoped to get something like this and I usually would, but it would be drawn by hand.

The nice thing about the box plot was that it captured the range and variability in the data and provided them with an opportunity to display that variation.  With a plot like this they could then argue, with authority that each of the dilutions take a different amount of time to rise.  With a plot like this you can plainly see that there is really little or no overlap of data between the treatments and you can also see a trend.  Something very important to the story we hope to tell with the graph.  My students really liked box plots for some reason.  I’m not really sure why but I’d get box plots for data they weren’t appropriate for.
Today, I’m not sure how much I’d promote box plots but instead probably use another technique I used to promote—ironically, based on what I discussed above—plot every point and the mean.  But do so in a way that provides a clear message of the mean and the variation along with the trend.  Here’s what that might look like.

It is a properly scaled scatterplot (bivariate plot) that demonstrates how the response variable (time to rise) varies according to the explanatory variable (H2O2  concentration).  Plotting is not as easy as the bar graph examples above but it might be worth it.  There are a number of ways to do this but one of the most straight forward is to change the data table itself to make it easier to plot your bivariate data.  I’ve done that here.  One column is the explanatory/independent variable, H2O2  concentration.  The other two columns record the response or dependent variable, the time for a disk to rise.  One of the other columns is the mean time to rise and the other is the time for the individual disk to rise.  BTW, this way of organizing your data table is one of the modifications you often need to do in order to enter your data into some statistical software packages.

With the data table like this you can highlight the data table and select scatter plot under your chart options.
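If your data start out in the usual “one row per concentration” layout, the rearrangement can be sketched in a few lines of Python. This is only one possible layout, the times are hypothetical, and the function name `to_long` is my own invention.

```python
from statistics import mean

# Hypothetical wide table: H2O2 concentration (%) -> rise times (seconds), one per disk
wide = {
    1.5: [4.1, 4.4, 3.9],
    0.75: [6.0, 6.3, 5.8],
}

def to_long(table):
    """Return rows of (concentration, individual time, mean time).

    Each disk gets its own row with the mean column left empty (None),
    and each concentration gets one extra row holding only the mean.
    """
    rows = []
    for conc, times in table.items():
        for t in times:
            rows.append((conc, t, None))
        rows.append((conc, None, mean(times)))
    return rows

long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

Plotted as a scatter plot, the second column becomes one series (the individual disks) and the third column becomes another (the means).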

At this point, I’d often throw a major curve ball towards my students with a question like, “What’s up with time being the dependent variable?”  Of course, much of their previous instruction on graphing, in an attempt to be too helpful suggested that time always goes on the x-axis.  Obviously, not so in this case but it does lead us to some other considerations in a bit.
For most years this is where we would stop with the data analysis.  We’ve got means, we’ve represented the variability in the data, we have a trend, we have quantitative information to support our scientific arguments.
But now, I want more.  I think we should always be moving the bar in our classes.  To that end, I’d be sure that the students included the descriptive statistic of the standard deviation of the sample along with the standard error of the mean and used the standard error to estimate a 95% confidence interval.   That would also entail a bit of discussion on how to interpret confidence intervals.  If I had already introduced SEM and used it earlier to help establish sample sizes then having the students calculate them here and apply them on their graphs would be a foregone conclusion.
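Here is a minimal sketch of those calculations in Python, in case you or your students want to check a spreadsheet against something else. The rise times are hypothetical and the interval uses the same rough rule I use throughout, 2 x SEM for approximately 95%.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical rise times (seconds) for eight disks at one substrate concentration
times = [4.1, 4.4, 3.9, 4.6, 4.2, 4.0, 4.5, 4.3]

n = len(times)
m = mean(times)        # sample mean
s = stdev(times)       # sample standard deviation (n - 1 in the denominator)
sem = s / sqrt(n)      # standard error of the mean

# Approximate 95% confidence interval: mean plus or minus 2 x SEM
ci_low, ci_high = m - 2 * sem, m + 2 * sem

print(f"mean = {m:.2f} s, stdev = {s:.3f} s, SEM = {sem:.3f} s")
print(f"approx. 95% CI: {ci_low:.2f} to {ci_high:.2f} s")
```

`statistics.stdev` is the sample standard deviation, the same quantity a spreadsheet reports with its sample standard deviation function.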
But my real goal today would be to get to the point where we could compare our data and understanding about how enzymes work with the work done in the field–enzyme kinetics.  Let’s get back to that problem of what is going on with the rising disk: what is it that we are really measuring if the reaction between the catalase and the substrate continues until the substrate is consumed?  It should be obvious that for the higher levels of concentration we are not measuring how long the reaction takes place but we are measuring how fast the disk accumulates the oxygen product.  Thinking about the model it is not too difficult to generate questions that lead students to the idea of rate:  something per something.  It is really the rate of the reaction we are interested in and it varies over time.   What we are indirectly measuring with the disk rise is the initial rate of the enzyme/substrate reaction.  We can arrive at a rate by taking the inverse or reciprocal of the time to rise.  That would give us floats per second as a unit.  If we knew how much oxygen it takes to float a disk we could convert our data into oxygen produced per second.
So converting the data table would create this new table.
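The conversion itself is just a reciprocal. As a sketch in Python, with hypothetical times rather than my actual table:

```python
# Hypothetical rise times in seconds, keyed by H2O2 concentration (%)
times = {
    1.5: [4.0, 5.0],
    0.75: [8.0, 10.0],
}

# Rate in floats per second is the reciprocal of the time to rise
rates = {conc: [1.0 / t for t in ts] for conc, ts in times.items()}

for conc, rs in rates.items():
    print(f"{conc}% H2O2: {rs} floats/sec")
```

Note that the conversion is applied to each disk before averaging; the mean of the rates is not the same number as the reciprocal of the mean time.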

Graphing the means and the data points creates this graph.

Graphing the means with approximately 95% error bars creates this graph.

Woooooooweeeeee, that is so cool.  And it looks just like a Michaelis-Menten plot.

By Thomas Shafee (Own work) [CC BY 4.0], via Wikimedia Commons
Creating this plot–as long as the students can follow the logic of how we get here–opens up an entirely new area for investigation about enzymes and how they work.  Note that we now have some new parameters:  Vmax and Km that help to define this curve.  Hmmmm.  What is this curve and do my points fit it?  How well do the data points fit this curve?  Can this curve, these parameters, help us to compare enzymes?  Here we return to the idea of a model–in this case a mathematical model which I’ll cover in the next installment.

Establishing an Experimental Procedure to Guide the Home Investigation

Moving from the bulk reaction of yeast (catalase) and H2O2 to a procedure that can produce reasonable, reliable and precise data without just telling them,  “This is the technique that we will use”, can be tricky.  But it is a discussion full of quantitative considerations if that procedure is going to generate quantitative data that can support a claim.

At the end of the day, my overall goal is that every student will have an understanding of, and experience with, a defined protocol in their individual lab notebooks that can serve as their reference when they go home and collect their data.  I could be really helpful and just give them a well-structured set of laboratory instructions which would assure that most of the students who follow directions closely will succeed in getting the expected results, ensuring that my lab worked.  Of course, I’d have to hope that they would somehow, subconsciously pick up on the kind of thought that had to go into the organization of the tables, the presentation of the graphs, the preparation of the materials, etc.  My students never seemed to pick up that kind of thing, though, by just following instructions.  That insight seems to come with wrestling with the challenges.  And since their thinking skills are more of a priority to me, I quit providing lab instructions very early in my career.  It is a lot more messy and you’ll be amazed at how many ways a student can go down the wrong path but I found that trusting the students to figure things out works–they get better and better at it, which makes the class more fun for me and for them.  For me, this lab fell pretty early in the year and for that reason it was a bit messier than it might have been had we worked on it later in the year.  It is important to note that I don’t just “turn the students loose” to go design whatever they can conjure up.  That is a recipe for disaster in so many ways but most importantly it typically leads to all sorts of negative student experiences.  The goal is to keep the challenges in front of the students finely tuned to their developing skills–to keep the students, as best we can, in the “zone,” or perhaps better defined as Mihály Csíkszentmihályi’s FLOW.

Some of my learning goal targets that I keep in mind to guide my questions during discussion include:  1.  Introducing the floating disk technique but making sure the students understand how it is working.  2.  How do we explore variables systematically?  (serial dilutions)  3.  What is this replicability thing?  4.  Emphasizing the importance of exploratory work to help establish data that can inform design.  5.  How big of a sample do we need?  What factors into determining sample size?  6.  Identify and contrast systematic and random error.

With these thoughts guiding my questions we launch into a discussion about the mess I created earlier.

With practice over the years it is easy to have a barrage of questions ready to go.   Typically, I reframe/choose my next question based on student responses.  In that way, we are all following along on the same reasoning path–or at least as much as 20+ individual agents can follow the same path.

What did we mix to create the mess?  What did we get out?  How is this related to the models we explored?  How could we quantify what is going on?   What are we going to try and figure out?  What can we control?  What do we need to know?  What should we measure? How should we systematically measure it?   How can we be sure to all generate data/information that can inform our exploration?  How can I capture the products produced?  How do I measure the products over time?  What could/should I use for controls?  What should we quantify if we want to make a claim?  This last question can be particularly productive if our goal is to collaboratively develop an experimental protocol.  I never know exactly where we will go but with the guiding questions in my mind and with practice on my part it doesn’t usually take too long before we get to a starting/exploratory protocol that we can test in class.

At some key point in the discussion (you’ll know when) I demonstrate the floating disk technique itself along with some qualifying statements/comments like:  “Let’s reduce the amount of yeast/catalase but try and keep it constant.  One way might be to collect a sample on a piece of filter paper like this.”  You can guess the next line:  “Now let’s see if this will generate some bubbles that we can count or observe.”  At that point when we drop the disk in the H2O2 it sparks questions in their minds when the disk floats.  Of course this prompts me to ask more questions.  These questions are now more specific to developing the protocol:  What do you think would happen if we dropped the yeast disk into plain water? (control) What would happen if we dropped a paper disk without yeast into H2O2?  (control) If I dropped another disk into the H2O2 will it take the same amount of time to rise?  If not, how could I capture the variation? Why is the disk rising?  How many disks can I drop in the H2O2 before it affects the time to rise?  (This is why I used the well plate and a single disk.)  At this point I may take time to have them time a number of disks dropped into the same substrate dilution to get some preliminary data to work with.
If I keep the yeast concentration constant how can I systematically vary the H2O2 solution?  This was my main objective in the past because I used the lab to introduce serial dilutions and how to make them–skills that came in handy later when we did our microbiology labs.  At this point we could work through a serial dilution without a formal measurement device.  Since, my goal was to do most of the lab work at home, we adapted by doing our dilutions with a “measure”–which was a sharpie mark a little less than half-way up one of the plastic cups.  1 measure of water and 1 measure of 3% H2O2  would equal a 1.5% solution of H2O2  and a 50% dilution.  That solution could then serve to produce the 25% dilution and so on.  If this isn’t clear, let me know and I can put up a small video of the process if that will help.
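For anyone who wants to check the arithmetic of the “measure” method, here is a small sketch in Python. It assumes equal measures of solution and water at every step, starting from 3% H2O2; the number of steps (five) and the function name are my own choices for illustration.

```python
def serial_dilution(stock_percent, steps):
    """Concentrations produced by repeatedly mixing 1 measure of the
    previous solution with 1 measure of water (a 50% dilution each step)."""
    concentrations = []
    current = stock_percent
    for _ in range(steps):
        current = current / 2
        concentrations.append(current)
    return concentrations

dilutions = serial_dilution(3.0, 5)
print(dilutions)  # [1.5, 0.75, 0.375, 0.1875, 0.09375]
```

Each entry is half of the one before it, so the second solution is 25% of the stock, the third 12.5%, and so on.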
And a question that I would ask today but didn’t in the past:  Is the time to rise the same as the rate of rise?  How can I convert time to a rate?  Today, I’d consider this one of my primary objectives for this lab.  Like I said earlier my primary goals in the past were to get the students comfortable with serial dilutions, experimental design and data presentation.  But from a standpoint of content and lab integration, I think I’d focus more on the properties of enzymes now.  Explicitly exploring rate of reaction is a key quantitative question to work on because it challenges a common quantitative misconception (confusing rates and quantities) and it also creates a situation where we can address the data in a form that is similar to standard laboratory work with enzyme kinetics.
Other questions come from students as we work on a protocol—questions about how to drop the disk, how do I keep the yeast constant?  do I have to stir?  when to time the float?, how deep should the solution be?
And:  How many disks should I drop to be confident that I have measured the rate of rise?  In the past, I had my students collect data on 10 disks of yeast per substrate concentration because I used this lab to introduce box plots.  The choice was somewhat arbitrary but you need a sample of 10 or more if the box plot is going to provide relevant information.  For example, a sample size of 4, split into 4 quartiles isn’t going to tell me much.  In today’s AP Bio world I might use this lab as an opportunity to explore another way to estimate an appropriate sample size–using standard error.  Here’s how that works.
Pre-Determining Sample Size:
I’m pretty upset with myself that I didn’t teach this in the first half of my career for many reasons but the most important is that I think students need to make that link that helps them to realize that quantitative methods provide strong support for their claims.  One question I never got around to helping my high schoolers figure out was how to justify their sample size.  I kind of let it slide with general statements like:  “Well, three is not enough.”  “Let’s do at least 10.”  and so on.  Here’s how the discussion would go today.
First, during the exploratory work we’d collect some data from an “unknown” substrate solution and an unknown yeast solution.  Here’s the data.

Looks pretty consistent but there is almost 2 seconds difference in the time to rise between the slowest and the fastest disk.  Let’s see what happens if we dilute the substrate by 50% but keep the yeast concentration on the disks the same.

Now, that is interesting.  The time to rise in the diluted substrate definitely seems to take longer.  Just eye-balling it it looks like a difference of about 6 seconds–more than 50% longer.  Still there seems to be about 2 seconds of variability in the diluted substrate results as well.   How can we capture all this in a couple of numbers?
Descriptive stats to the rescue.
The means can help us by using a single number to represent all of the data collected under one condition and the standard deviation (of the sample) can help us describe the amount of variation in the sample.

For many, this would be enough to consider.  The difference between these two samples of 8 is more than a standard deviation–in fact more than 3 standard deviations.  They are really quite different results.  A sample size of 8 seems to be an easy sample to collect but what if we wanted to collect smaller samples because our fingers cramp up working the stop watch so many times?  Could we use a smaller sample size and still collect data that will support our claims that these are different?  Let’s see how we might determine that.
First let’s agree on a level of precision that we think we will need.  To do that let’s take a look at the differences in the means.  The difference is almost 6 seconds.  Now, each time I do this experiment under the same conditions I will likely get slightly different means.  How confident am I that my sample mean is close to the actual population mean?  Means are a point estimate but I want to put an interval estimate around that point.  Let’s say that if I can establish an interval of the mean plus or minus 0.5 seconds then I’ll feel pretty confident that my experiment has captured the true population.  How about 95% confident?   To be about 95% confident in our point estimate of the mean in seconds with an interval estimate of plus or minus 0.5 seconds we need to work with the standard error of the mean (SEM).  Bear with me while I do the algebra and violate my principle of being less helpful.  😉
Remember that the formula for SEM is:

$$SEM \approx \frac{s}{\sqrt{n}}$$

I’ve used the approximately equal to because we can only estimate with the standard deviation of the sample, s.  The actual SEM would require the true population standard deviation.  Our exploratory data has provided us with an estimate of the standard deviation.  With this equation we can solve for n to figure out a different sample size, a smaller one that could still provide us with confidence.
You may also remember that 2 x SEM is approximately equal to a 95% CI.

$$95\%\ CI \approx \pm 2 \times SEM$$

Let’s combine these two equations and since, earlier we decided that plus or minus 0.5 seconds was probably enough precision we can just substitute that for the 95% CI.

$$0.5 \approx 2 \times \frac{s}{\sqrt{n}}$$

Substitute 0.66 for the stdev.s that is estimated from our exploratory data:

$$0.5 \approx 2 \times \frac{0.66}{\sqrt{n}}$$

Divide both sides by 2.

$$0.25 \approx \frac{0.66}{\sqrt{n}}$$

Multiply both sides by the square root of n.

$$0.25\sqrt{n} \approx 0.66$$

Divide both sides by 0.25 seconds.

$$\sqrt{n} \approx 2.64$$

We are getting close, now.  Square both sides and you end up with the sample size you’ll need to assure that you have a 95% confidence interval that is plus or minus 0.5 seconds around the mean of your sample.

$$n \approx 6.97$$
Ah, finally.  Looks like a sample size of 7 will assure that the 95% CI will fit between plus or minus 0.5 seconds around the mean.  Of course if we wanted roughly a 99% CI we could use 3 x SEM in the work (a conservative round number; the normal-distribution multiplier is closer to 2.6).  Or we could define a more precise interval of, say, 0.25 seconds around the mean.   It is up to you.  But with this type of work, you can make a strong argument as to why you chose the sample size you chose.
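For anyone who wants to try other values, the algebra above boils down to n = (multiplier x stdev / margin) squared, rounded up to the next whole disk. Here is a minimal Python sketch of that calculation. The function name and its parameters are my own illustrative choices, and it relies on the same rough "2 x SEM is about a 95% CI" rule of thumb rather than a t-distribution.

```python
import math


def sample_size_for_margin(stdev, margin, multiplier=2.0):
    """Smallest whole n for which multiplier * stdev / sqrt(n) <= margin.

    stdev      -- sample standard deviation from exploratory data (seconds)
    margin     -- desired plus-or-minus interval around the mean (seconds)
    multiplier -- how many SEMs wide the interval is (2 for roughly 95%)
    """
    if stdev <= 0 or margin <= 0 or multiplier <= 0:
        raise ValueError("stdev, margin and multiplier must all be positive")
    exact_n = (multiplier * stdev / margin) ** 2  # solve the SEM formula for n
    return math.ceil(exact_n)  # can't test a fraction of a disk, so round up


# Values from the worked example: stdev of 0.66 s, margin of 0.5 s
n_95 = sample_size_for_margin(stdev=0.66, margin=0.5)
# Same margin, but using 3 x SEM
n_3sem = sample_size_for_margin(stdev=0.66, margin=0.5, multiplier=3.0)
# Tighter margin of 0.25 s with 2 x SEM
n_tight = sample_size_for_margin(stdev=0.66, margin=0.25)

print(n_95, n_3sem, n_tight)
```

Tightening the margin gets expensive quickly because n grows with the square of the ratio, which is a nice point to let students discover for themselves.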
Their lab notebooks, at this point, will have drawings and instructions in their own words on how to do a serial dilution, sample data, procedures, and background information (and perhaps some model data).   I’ll send them home to work on my question first, with the intent that they repeat the homework on a different question the following week, after they have worked to develop their skills. The question I ask them to investigate is:  How is the rate of the enzyme reaction affected by the concentration of the substrate?  They can work in groups, with their family, or by themselves but I want everyone to have a lab notebook entry of the methods, the questions, the design and the data they have collected along with graphs of the data.  I’m not explicit about what that should look like at this point.  I don’t want to be too helpful.  I actually want mistakes so we can address them.  If I’m too helpful at this point and tell them to make a scatterplot of just the means of the time to rise versus the substrate concentration then many will not know how to work in a novel situation in the future.
The mistakes that will no doubt appear provide an important starting point for the discussion on analysis.  That will have to wait for the next installment….

Teaching Quantitative Skills in a Lab Context: Getting Started in the Classroom

Some background on my teaching approach (which you may not agree with):

A few years ago a young math teacher, Dan Meyer, had several videos that went viral about math instruction.  Be sure to google his work but also check out the critique of his work.  Part of his message was that we (curricula, teachers, books, etc.) are “too helpful” when we structure our lessons and instruction.  By that he meant that instead of giving students practice with formulating problems and working through unique solutions we have reduced math instruction to a series of “paint by number” steps to be memorized.  Meyer was not the first to make these claims and not the last.  For example, another noted math educator, Phil Daro, has a series of videos where the main idea is “against answer getting”.  In these videos he compares Japanese math instruction to U.S. instruction and notes that in Japan math instructors ask the question:  “How can I use this problem to teach this math concept?” vs. in the U.S.:  “How can I get my students to get the right answer to this problem?”  It’s not that the answers aren’t important but if correct answers are the main emphasis of instruction then it becomes too easy for the entire education system to devolve into trivial answer getting.  The hard work of critical thinking, working through problems, getting comfortable with false starts, revision, metacognition and learning from mistakes–all qualities that education aspires to–gets lost in the extreme focus on the end product.  Moreover, the answer getting approach contributes to students developing fixed mindsets about their own abilities that are very likely false.  Carol Dweck and Jo Boaler’s work in this area provides a number of ideas and approaches to help teachers avoid fixed mindsets and help move students along a learning progression that leads to effective problem solvers.
Part of Boaler’s work at successfully moving students from fixed to growth mindsets in math involves rich problems that have an easy, accessible entry point that opens a door to a very rich, open and challenging environment with many paths to explore.  The floating disk catalase assay fits this description to a “T” in my mind.
BTW,  even though I have participated in a number of curriculum development projects, standards writing and curriculum framework development, I personally seldom pay much explicit attention to standards, science practices frameworks, or objectives when I do my “planning”.  Nor do I ever develop formal learning objectives when I “prepare” lessons.  Like rubrics, I tend to look at objectives and frameworks as too confining.  More importantly, I don’t think I have ever taught “a lesson” that didn’t take the students beyond the typical standard or learning objective.  Since I kind of live and breathe biology education, I don’t want to be boxed in, I want to explore what is possible.  I have a general idea of where we are trying to go in class but I don’t make it explicit.  I don’t want my students to think they have arrived at their destination (learning goal), rather I want them to value the journey and keep on keeping on the path.  I’m not advocating you do the same,  I’m only explaining why you won’t see any specific references here to specific learning goals or science practices.  What follows is a weird blend of what I have done in the classroom and how I would approach this material today.  I’ve been out of the high school classroom for more than 10 years and I’ve got to say that all these new resources certainly make me wish I was back in the high school classroom.
With that bit of background as justification you’ll see that in the posts that follow I will be promoting being less helpful and trusting my students to be able to come up with reasonable problems and solutions to those problems.  Doing this well requires skill on the part of the teacher to guide student thinking through questions–Socratic questions.  Planning for the instruction requires explicitly thinking about the instructional goals and the types of questions and scenarios that can get us to those goals.  Like the student quantitative skills we are targeting, our own skill in questioning will get better and better as we practice it and reflect on it.  By the way, since we are talking about skills it is important to remember that skills are improved through practice and therefore our instruction should offer the chance to practice and revisit skills.
Getting Started:
I typically use labs to introduce material so that students have some level of experience with physical phenomena that can serve as a foundation for building conceptual knowledge.  But I’ve got to get their attention and hopefully spark their interest.  I’ve explored many different enzyme systems in the classroom.  For instance, in the “old days” my students did all kinds of things with salivary amylase and starch.  This system had the pedagogical hook of being known as the “spit lab”.  They loved to hate spitting into test tubes to collect their amylase.  High interest.  For catalase I call on their experience with hydrogen peroxide since most of my students have a bottle back at home and most are familiar with it.
Before going any further, I remind them that they will need to start recording any observations, questions (real important)  and thoughts in their lab notebook.  In the interest of being “less helpful” for more than 25 years I did not provide my students with lab write-ups or worksheets.  They had to organize their own investigations based on demos and discussions in class.  I made sure to make their lab notebook indispensable to their individual success in the class by making later assignments that required the information they should have entered into their lab notebooks–usually in the form of laboratory practicals as substitutes for final exams.
I bring out a bottle of hydrogen peroxide and begin a discussion.
My part of the discussion involves questions and follow-up questions with these targets in mind:  1.  to stimulate interest.  2.  to recall why they use H2O2.  3. to realize that H2O2 breaks down on its own (by asking questions about old, “used-up” bottles in the medicine cabinet and why the bottle is brown),  4. that bubbles are a sign that the H2O2 is working (killing “germs”).  (the connection to the bubbles needs to be corrected in a bit)

It is at this point I bring out a plastic cup about half full of a yeast solution.  (I almost always use plastic in my labs to minimize when we need goggles.)  I mix up a package of baker’s yeast in about 250 ml of water before class so that it is well suspended.  I pour out about 1/2 cup of H2O2 and say “Let’s see if we can get some bubbles.”

At this point I have them.  Because there are lots of bubbles….

Way more than they expect.

When it starts to overflow, that is when I pull out my best Chevy Chase imitation and start bumbling around trying to keep the mess at bay but it is too late.

They are hooked now, at least long enough to provide a quick bit of background information.  At this point we describe the decomposition reaction and quickly balance the equation.  And then, using questions again, start to probe what might be going on.  The target this time is the idea that the reaction has been greatly speeded up.  Speed implies rate.  This is important.  This is quantitative thinking.  You have been doing similar discussions with your students but you may not have pointed out the quantitative aspect of this observation in the past, assuming that your students would readily see the quantitative aspects of this event.  I know that is exactly what I used to do but if we want to focus more on quantitative skills we have to bring them up to the top and not leave them below the surface, hoping the students will automatically figure it out.   Knowing what I do today, I wish I had made this emphasis more in the past.  It turns out that one of the big quantitative errors that the public makes is mixing up quantities and rates.
At this point I also introduce the idea of a catalyst as something that increases the rate of a reaction without being part of the reaction.  The definition is not exactly spot-on but it is good enough to begin developing a conceptual model–which, again, takes us into more quantitative thinking.
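For reference, the balanced decomposition reaction the class works out can be written as below. Putting catalase over the arrow is just one common way of marking it as a catalyst rather than a reactant; it is my notation here, not necessarily what ends up on the whiteboard.

```latex
% Balanced decomposition of hydrogen peroxide, catalysed by catalase
% (\xrightarrow needs the amsmath package)
\[
  2\,\mathrm{H_2O_2} \xrightarrow{\text{catalase}} 2\,\mathrm{H_2O} + \mathrm{O_2}
\]
```

Two molecules of hydrogen peroxide give two of water and one of oxygen, and it is the oxygen that makes the bubbles.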
Modeling to develop a foundation:
When I was in the classroom this is where I’d start drawing representations of catalase and H2O2 on the whiteboard and implying motion with lots of hand motion, all the while asking questions about the process.  Of course the purpose of this was to provide the start of a mental model for what was going on at the molecular level to help the students inform their experimental design.  Today I’d do things differently.  I’d use the computer based Molecular Workbench models that are available online.  We would have already visited this site previously, so I wouldn’t need to do much in the way of introducing the site itself.  This type of site, in my mind, is a game changer that makes the abstract world of molecular interactions more accessible and helps to reduce student misunderstandings, creating more rigorous mental models.  A very important aspect of these models is the randomness incorporated into the models.  One of the most difficult ideas to get one’s head around is the idea of random motion and interactions leading to the order we see in living things.  Check out this paper to learn more about this teaching/learning challenge:  Garvin-Doxas, Kathy, and Michael W. Klymkowsky. “Understanding randomness and its impact on student learning: lessons learned from building the Biology Concept Inventory (BCI).” CBE-Life Sciences Education 7.2 (2008): 227-233.
These models are 2D agent-based computational models which means each structure in the image is an agent with properties and actions—that interact with the other agents.  The actions and interactions are based on kinetic theory and do not include quantum effects.  Here is the starting reaction representation.

Unlike the catalase/H2O2 decomposition reaction, this model represents a replacement reaction.  In this particular screen shot one of the green molecules has split and could be ready to combine with the purple molecule atoms if the purple molecules were split.  This may not look like a quantitative model but it is.  The reaction without the catalyst does happen but takes a long, long time.  Note that at the bottom there is a time measurement, there is a given, starting number of reactants and there is a measurement of reaction completion.  All quantitative parameters that students can “take data” on using the pause button and simply counting…..
Here below, two catalyst molecules have been added and in a very short time the reaction is moving to completion.  Note that while the reaction is near completion the catalysts are unchanged.

Now, at this point I have to make a decision.  Do I have the students collect some data to help form their conceptual understanding or do I simply let their impressions of the model with just a few trials guide their understanding?  Either way, it is important that I use a series of questions to guide the students to my targets:  1.  an understanding that the reaction is speeded up and hence rates are something we might want to measure,  2.  that the catalyst provides an “alternate pathway”, 3. that there is a limit to how fast the enzyme works, 4. that even when the reaction is “complete” things are still happening, and 5. that if we re-run the reaction, collecting data each time, the results are slightly different but predictable.
Here’s the link to the model of catalysis where you can explore the model yourself or with your students:
But wait there’s more!
When I use any kind of model, now in the classroom, we have a discussion of the strengths and weaknesses of the model in play.  Usually, when I show the model above to teachers I get quite a few aha’s and general statements of approval.  With that in mind what do you think are the strengths of this model?  More difficult for students, at least, is to come up with the weaknesses or limitations of the model.  Often they focus on trivial problems, like the atoms aren’t actually green and purple, and miss others, like this is a two-dimensional space.  They will no doubt have a difficult time with the idea of scaling time.  For the catalase system this model’s size scales are way out of whack.  What are some other “issues”?
In addition to a computational model a good strategy would be to have the students develop their own physical model of a catalyst speeding up a reaction.  Biology teachers have promoted the toothpickase lab as an enzyme lab over the years.
Googling toothpickase will bring up all sorts of prepared documents and images.  This model is a great one to work on and explore.  It will definitely help guide questions and experimental design to explore catalase but consider having the students come up with the model themselves with just a little prompting/demonstration from you.  Use questions to help them figure out the quantitative parameters, the idea of rates and how to structure and graph the results with the idea of supporting and communicating a scientific argument.  Try to avoid the temptation of providing a structured set of lab instructions and tables to fill out for toothpickase–in other words don’t be too helpful.  Every time we do that we are taking away a chance for the student to work on one of their quantitative skills.  One of the attractions of this model is that the students grasp what is modeled but instead of making it a center point of your lesson consider using it to support your exploration into an actual biological system–the catalase/hydrogen peroxide system.  Again,  help the students discover weaknesses and strengths of the model.
Experience with a physical model or the computational model should provide enough background that perhaps you can lead your students to develop a different kind of model—a symbolic model like this:

By Thomas Shafee (Own work) [CC BY 4.0], via Wikimedia Commons
Paula Donham has this same model in her write-up.  This particular model is the basis for further analysis later on in this lab.  So consider trying to get to a model like this before moving forward.
Remember,  I said that I would suggest more quantitative things to emphasize than anyone would want to actually use in the classroom so pick and choose.  There are a couple of other models that I will explore towards the end of the lab but for now, the students should be ready to start designing how they are going to collect data that will help them understand the catalase/hydrogen peroxide system–which is the next thing I’ll talk about.

Teaching Quantitative Skills using the Floating Disk Catalase Lab: Intro

I find it remarkable how deeply the biology education community has embraced the call to increase quantitative skills in biology.   This is certainly not an easy change to incorporate into our curricula and it is one that the community will be working on, tweaking and improving over time as our own instructional quantitative strategies and skills mature.  But even with this willing effort by the community where does one find the time to add “something else” to an already packed curriculum?   The first part of the answer to that question is to first have confidence that it can be done;  the second part of the answer has to do with strategic and efficient curriculum decisions; and the third part of that answer is to realize that, like our students, we are somewhere on learning progressions ourselves and that our skills and understandings will deepen the more we teach quantitative skills.  No one has time to teach all the biology they would like to teach.  Every year most of us make all sorts of decisions about what to include, what to emphasize, and what to leave out.  The challenge of adding structured instruction in quantitative skills is daunting, particularly since most of us have not had time to develop our own math-based pedagogical tools and skills.  With that in mind we often fall back on the type of math instruction that we likely encountered in our own educational background.  If, like me, most of your math instruction was based on algorithms and focused on getting answers instead of learning how to do math, then likely if we model our quantitative skill instruction on the math instruction we experienced, we won’t be doing a very good job helping our students develop quantitative skills.  Instead, perhaps we (the biology teaching community) should consider delivering quantitative skills instruction in a way that models effective and efficient math instruction informed by research.  
Here’s the good thing–it turns out that many of the strategies that work well for teaching science also work well for teaching math.  We biology teachers just need a bit more experience trying to explicitly teach appropriate quantitative skills.  We need to develop our own specialized pedagogical content knowledge.   I thought I’d put out an example of how this might work in a classroom–certainly not as an exemplar but more as a starting point.

To this end, at the 2016 NABT meeting Jennifer Pfannerstil, Stacey Kiser and I shared strategies to introduce quantitative skills focused around a classic lab: The Floating Disk Catalase lab.  The earliest version of this lab that I know of was published in ABLE.
A Quantitative Enzyme Study Using Simple Equipment by Beth A. D. Nichols and Linda B. Cholewiak.   Yes, that is the same Beth Nichols that recently retired from ETS but has worked with so many in this community.

In this series of posts I’ll walk through the material we presented at NABT along with some discussion on the rationale of each example coupled with resources so that you should be able to design your own lab that structures quantitative skills.  A caveat:  I want to emphasize how rich this particular lab is for developing quantitative skills–in fact I’ll present more possible ideas than probably anyone will want to use in any particular class.  So pick and choose what works for you and your class but consider the examples presented here as something to shoot for with your students.  Let’s get started.

Lab Overview:  

If you are not familiar with the lab it features a simple and student friendly method to measure/quantify enzyme action or kinetics using disks of filter paper soaked in a yeast solution as the enzyme source and a solution of hydrogen peroxide as the substrate.  Here’s a write-up by Paula Donham on the technique:  Note that one of the educational goals that Paula used this lab for was to introduce the use of box plots as a way of presenting your data.  (That’s a quantitative skill, btw.)
The materials:
How does it work?
Dip the paper disk in the yeast solution.  The yeast solution provides a set amount of catalase per disk.  Drop the disk into a solution of hydrogen peroxide.

The catalase breaks down the hydrogen peroxide into water and oxygen.  The oxygen bubbles catch in the paper fibers and eventually cause the disk to rise. You can see the disk starting to rise in the lower right hand corner of the cup.

I use the plastic cups to make a dilution series with the hydrogen peroxide and the 24 well plates for the testing.  The well plates allow you to put one disk per well (which might lead to better precision).
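If it helps to plan the cups ahead of time, the concentrations in a dilution series are easy to tabulate. This is only a sketch under assumptions that are mine and not part of the write-up: a 3% stock bottle, twofold dilutions, and five cups. The function name is illustrative too. Swap in whatever stock and dilution factor you actually use.

```python
def dilution_series(stock_percent, dilution_factor=2.0, steps=5):
    """Concentrations (in percent) for a serial dilution, stock cup first.

    Each cup is the previous cup diluted by dilution_factor
    (2.0 means one part solution to one part water).
    """
    if stock_percent <= 0 or dilution_factor <= 1 or steps < 1:
        raise ValueError("need a positive stock, a factor above 1, and at least 1 step")
    return [stock_percent / dilution_factor ** i for i in range(steps)]


# Assumed values for illustration only: 3% stock, twofold dilutions, five cups
series = dilution_series(stock_percent=3.0, dilution_factor=2.0, steps=5)
for cup_number, concentration in enumerate(series, start=1):
    print(f"Cup {cup_number}: {concentration:.4g}% H2O2")
```

Because each step divides by the same factor, the concentrations are evenly spaced on a log scale rather than a linear one, which is worth a question or two when students choose their graph axes.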
Ready to collect data with eight data points per substrate dilution:
Here’s a short video of the procedure using the well plate:
Dip a disk in the yeast, drop it into the hydrogen peroxide and time how long it takes to rise.
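Once the rise times are recorded, they can be turned into rates (floats per second) by taking the reciprocal of each time, and then summarized. Here is a minimal Python sketch of that step using only the standard library. The eight rise times are made-up numbers for illustration, not data from the lab, and the function name is my own.

```python
import math
import statistics


def summarize_rates(rise_times_s):
    """Convert rise times (seconds) to rates (1/s) and summarize them."""
    if len(rise_times_s) < 2:
        raise ValueError("need at least two rise times to estimate a standard deviation")
    if any(t <= 0 for t in rise_times_s):
        raise ValueError("rise times must be positive")
    rates = [1.0 / t for t in rise_times_s]  # floats per second
    n = len(rates)
    mean = statistics.mean(rates)
    stdev = statistics.stdev(rates)  # sample standard deviation (n - 1)
    sem = stdev / math.sqrt(n)  # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "sem": sem,
        "ci95_half_width": 2 * sem,  # the rough 2 x SEM rule of thumb
    }


# Hypothetical rise times for one substrate dilution (eight disks)
example_times = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7, 10.0, 10.4]
summary = summarize_rates(example_times)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

Running this once per dilution gives the means and the approximate 95% error bars for a rate versus concentration plot.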
Why use this lab for introducing quantitative skills?
The simplicity and precision of this lab technique allows the teacher and the class to more deeply explore concepts about enzymes but also to explore how different quantitative skills can provide a path to even deeper understanding.  The key here is that the technique is so simple, the students can concentrate on thinking about what is going on with the enzyme, how to capture that quantitatively and how to support their conclusions with data.  And, they can simply do it over if errors are made since it takes a small amount of time.  There are some other aspects of this lab that allow you to introduce different approaches and deeper understanding as you build quantitative skills.  In my classes, I had three main goals for this lab:  1.  To begin an understanding of enzymes and enzyme action; 2.  Introduce and practice a number of quantitative skills (including serial dilutions and graphing);  and 3.  Introduce and practice experimental design and scientific argumentation.  In my classes, we would introduce the technique, let everyone in the class practice it,  and then assign the actual data collection as homework.  The students had to acquire their own materials at home, collect the data and report back to class.  They could work collaboratively or with their families.  The lab is safe, inexpensive and doable.  Assigning the data collection as homework freed up class time to work on the quantitative skills.   The students generated mini-posters to share their work with their peers.  In the next post I’ll talk more about how you might present this lab to students and the types of quantitative skills you can build and practice.

Get Ready and Sign up for DNA Day

Last year, the biology folks at the University of Kansas adapted a successful program from the University of North Carolina:  DNA Day.  The teachers I talked to loved the program and so did their students.  Don’t miss out.

Sign-ups for Kansas DNA Day 2016 are now open! Kansas DNA Day features graduate and advanced undergrad students in the biological sciences traveling to local high schools to give interactive lessons on the applications of genetics and genomic sequencing. The event will be held the week of April 21st. We currently have ambassadors ready for the greater Kansas City, Lawrence, Manhattan and Wichita areas so if you are a high school science teacher in one of these areas, sign up to have ambassadors come visit your classroom! More info and a link to get involved can be found at, or for more info.