Quantifying Life: A Symbiosis of Computation, Mathematics, and Biology
By Dmitry A. Kondrashov
The University of Chicago Press
Copyright © 2016 Dmitry A. Kondrashov
All rights reserved.
ISBN: 978-0-226-37193-1
CHAPTER 1
Arithmetic and variables: The lifeblood of modeling
You can add up the parts, but you won't have the sum;
You can strike up the march, there is no drum.
Every heart, every heart to love will come
But like a refugee.
— Leonard Cohen, Anthem
Mathematical modeling begins with a set of assumptions. In fact, one may say that a mathematical model is a bunch of assumptions translated into mathematics. These assumptions may be more or less reasonable, and they may come from different sources. For instance, many physical models are so well established that we refer to them as laws; we are pretty sure they apply to molecules, cells, and organisms as well as to inanimate objects. Thus at times we may use physical laws as the foundation on which to build models of biological entities; these are often known as first-principles (theory-based) models. At other times we may have experimental evidence that suggests a certain kind of relationship between quantities: perhaps we find that the amount of administered drug and the time until the drug is completely removed from the bloodstream are proportional to each other. This observation can be turned into an empirical (experiment-based) model. Yet another type of model assumption is not based on either theory or experiment, but simply on convenience: for example, we may assume that the mutation rates at two different loci are independent and see what the implications are. These are sometimes called toy or cartoon models (Jungck, Gaff, and Weisstein 2010).
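The drug-clearance example describes an empirical model of the form time = k × dose. The Python sketch below shows one way such a proportionality constant might be estimated from measurements. The function name `fit_proportionality` is hypothetical, and so are the dose and clearance-time values, which were invented for illustration. A least-squares line through the origin is only one reasonable choice of estimator.

```python
def fit_proportionality(doses, clearance_times):
    """Least-squares estimate of k in the model time = k * dose,
    i.e., the best-fitting straight line constrained to pass through the origin."""
    numerator = sum(d * t for d, t in zip(doses, clearance_times))
    denominator = sum(d * d for d in doses)
    return numerator / denominator


# Hypothetical measurements, invented for illustration only
doses = [10.0, 20.0, 40.0]   # administered dose (mg)
times = [2.0, 4.0, 8.0]      # hours until the drug is cleared from the blood
k = fit_proportionality(doses, times)
predicted = k * 30.0         # model prediction for a 30 mg dose (hours)
print(k, predicted)
```

Once k is estimated, the model makes a prediction for a dose that was never tested. That prediction is only as trustworthy as the proportionality assumption behind it.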
This leads to the question: how do you decide whether a model is good? It is surprisingly difficult to give a straightforward answer to this question. Of course, one major goal of a model is to capture some essential features of reality, so in most biological modeling studies you will see a comparison between experimental results and predictions of the model. But it is not enough for a model to be faithful to experimental data! Think of a simple example: suppose your experiment produced 5 data points as a function of time; it is possible to find a polynomial (of fourth degree) that passes exactly through all 5 points by specifying the coefficients of its 5 terms. This is called data fitting, and it has a large role to play in the mathematical modeling of biology. However, I think you will agree that in this case we have learned very little: we just substituted 5 values in the data set with 5 values of the coefficients of the mathematical model. To heighten the absurdity, imagine a data set of 1001 points that you have modeled using a polynomial of degree 1000. This is an example of overfitting: making the model agree with the data by making it overly complex.
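The Python sketch below makes the 5-point example concrete. It solves for the 5 coefficients of a fourth-degree polynomial that passes exactly through 5 points. The data values and the helper names `fit_exact_polynomial` and `evaluate` are hypothetical. The code uses exact rational arithmetic (`fractions.Fraction`) so that rounding errors do not obscure the perfect fit.

```python
from fractions import Fraction


def fit_exact_polynomial(xs, ys):
    """Solve the Vandermonde system for the n coefficients of the degree n-1
    polynomial through n points, using Gauss-Jordan elimination (x values must be distinct)."""
    n = len(xs)
    # Augmented matrix rows: [1, x, x^2, ..., x^(n-1) | y]
    rows = [[Fraction(x) ** j for j in range(n)] + [Fraction(y)] for x, y in zip(xs, ys)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into pivot position
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def evaluate(coefficients, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * Fraction(x) ** j for j, c in enumerate(coefficients))


# Hypothetical measurements at five time points
times = [0, 1, 2, 3, 4]
values = [2.0, 3.5, 3.1, 5.0, 4.2]
coeffs = fit_exact_polynomial(times, values)
print(len(coeffs))  # 5 coefficients replace 5 data points
print(all(evaluate(coeffs, t) == Fraction(v) for t, v in zip(times, values)))  # True
```

The fit is perfect by construction. That is exactly the passage's point: the 5 coefficients carry no more information than the 5 measurements they replace.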
Substituting a complicated model for a complicated real situation does not help us understand reality. One necessary ingredient of a useful model is simplicity of assumptions. Simplicity in modeling has at least two virtues: simple models can be grasped by our limited minds, and simple assumptions can be tested against evidence. A simple model that fails to reproduce experimental data can be more informative than a complex model that fits the data perfectly. If a simple model fails, you have learned that you are missing something in your assumptions; but a complex model can be right for the wrong reasons (for instance, because erroneous assumptions cancel each other out), or it may contain needless assumptions. This is why building good models is a difficult skill that balances simplicity of assumptions against fidelity to empirical data (Cohen 2004). In this chapter you will learn how to do the following:
1. distinguish variables and parameters in models,
2. describe the state space of a model,
3. perform arithmetic operations in R, and
4. assign variables in R.
1.1 Blood circulation and mathematical modeling
Galen was one of the great physicians of antiquity. He studied how the body works by performing experiments on humans and animals. Among other things, he was famous for a careful study of the heart and how blood traveled through the body. Galen observed that there were different types of blood: arterial blood that flowed out of the heart, which was bright red, and venous blood that flowed in the opposite direction, which was a darker color. This naturally led to questions: what is the difference between venous and arterial blood? Where does each one come from and where does it go?
You, a reader of the twenty-first century, likely already know the answer: blood circulates through the body, bringing oxygen and nutrients to the tissues through the arteries, and returns through the veins carrying carbon dioxide and waste products, as shown in Figure 1.1. Arterial blood contains a lot of oxygen, while venous blood carries more carbon dioxide, but otherwise they are the same fluid. The heart does the physical work of pushing arterial blood out of the heart, to the tissues and organs, as well as pushing venous blood through the second circulatory loop that goes through the lungs, where it picks up oxygen and releases carbon dioxide, becoming arterial blood again. This may seem like a very natural picture to you, but it is far from easy to deduce by simple observation.
Galen came up with a different explanation based on the notion of "humors," or fluids, that was fundamental to the Greek conception of the body. He proposed that the venous and arterial blood were different humors: venous blood, or "natural spirits," was produced by the liver, while arterial blood, or "vital spirits," was produced by the heart and carried by the arteries, as shown in Figure 1.2. The heart consisted of two halves, and it warmed the blood and pushed both the natural and vital spirits out to the organs; the two spirits could mix through pores in the septum separating its right and left halves. The vital and natural spirits were both consumed by the organs, and they were regenerated by the liver and the heart. The purpose of the lungs was to serve as bellows, cooling the blood after it was heated by the heart.
Is this a good theory of how the heart, lungs, and blood work? Doctors in Europe thought so for more than a thousand years! Galen's textbook on physiology was the standard for medical students through the seventeenth century. The theory seemed to make sense and explain what was observable. Many great scientists and physicians, including Leonardo da Vinci and Avicenna, did not challenge the inaccuracies, such as the porous septum in the heart, even though they could not see the pores themselves. It took both better observations and quantitative testing of the hypothesis to challenge the orthodoxy.
William Harvey was born in England and studied medicine in Padua under the great physician Hieronymus Fabricius. He became famous and would perform public demonstrations of physiology, using live animals for experiments that would not be legal today. He also studied the heart and the blood vessels, and he measured the volume of the blood that can be contained in the human heart. He was quite accurate in estimating the correct volume, which we now know to be about 70 mL (about 2.4 fl oz). What is even more impressive is that he used this quantitative information to test Galen's theory.
Let us assume that all of the blood pumped out by the heart is consumed by the tissues, as Galen proposed; let us further assume that the heart beats at a constant rate of 60 beats per minute, with a constant ejection volume of 70 mL. Then over the course of a day, the human body would consume about 70 mL × 60 (beats per minute) × 60 (minutes per hour) × 24 (hours per day), which is more than 6,000 liters of blood! You may quibble over the exact numbers (some hearts beat faster or slower, some hearts may be larger or smaller), but the impact of the calculation remains the same: it is an absurd conclusion. Galen's theory would require a human being to consume and produce a quantity of fluid many times the volume of the human body (about 100 liters) in a day! This is a physical impossibility, so the only possible conclusion is that Galen's model is wrong.
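Harvey's argument is plain arithmetic, so it can be checked in a few lines. The Python sketch below uses the same assumptions as the text: 70 mL per beat, 60 beats per minute, and a body volume of about 100 liters. The variable names are illustrative.

```python
# Assumptions from the text (Galen's model, as tested by Harvey)
ejection_volume_ml = 70      # blood ejected per heartbeat (mL)
heart_rate_per_min = 60      # beats per minute, assumed constant
minutes_per_day = 60 * 24    # 60 minutes per hour x 24 hours per day

daily_volume_ml = ejection_volume_ml * heart_rate_per_min * minutes_per_day
daily_volume_l = daily_volume_ml / 1000
print(daily_volume_l)  # 6048.0 liters per day

body_volume_l = 100  # rough volume of the human body used in the text (liters)
print(daily_volume_l / body_volume_l)  # 60.48 body volumes consumed per day
```

Halving the heart rate or the ejection volume still leaves thousands of liters per day. This is why quibbling over the exact numbers does not rescue Galen's model.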
This led Harvey to propose the model that we know today: that blood is not consumed by the tissues but instead returns to the heart and is reused (Schultz 2002). This is why we call the heart and blood vessels part of the circulatory system of the body. This model was controversial at the time — some people proclaimed they would "rather be wrong with Galen, than right with Harvey" — but eventually became accepted as the standard model. What is remarkable is that Harvey's argument, despite being grounded in empirical data, was strictly mathematical. He adopted the assumptions of Galen, made the calculations, and got a result that was inconsistent with reality. This is an excellent example of how mathematical modeling can be useful by providing clear evidence against a wrong hypothesis.
1.2 Parameters and variables in models
Many biologists remain skeptical of mathematical modeling. The criticism can be summarized like this: a theoretical model either agrees with experiment, or it does not. In the former case, it is useless, because the data are already known; in the latter case, it is wrong! As I indicated above, the goal of mathematical modeling is not to reproduce experimental data; otherwise, indeed, it would be of interest only to theoreticians. The correct question to ask is, does a theoretical model help us understand the real thing? There are at least three ways in which a model can be useful:
1. A model can help a scientist make sense of complex data by testing whether a particular mechanism explains the observations. Thus, a model can help clarify our understanding by throwing away the nonessential features and focusing on the most important ones.
2. A mathematical model can make predictions for situations that have not been observed. It is easy to change parameters in a mathematical model and calculate the effects. This can lead to new hypotheses that can be tested by experiments.
3. Model predictions can lead to better experimental design. Instead of trying a whole bunch of conditions in an experiment, the theoretical model can first suggest which ones will produce big effects, and thus can save a lot of work for the lab scientist.
To make a useful model of a complex living system, you have to simplify it. Even if you are only interested in a part of it (for instance, a cell or a single molecule), you have to make simplifying choices. A small protein has thousands of atoms; a cell consists of millions of molecules, which all interact with one another. Keeping track mathematically of every single component is daunting, if not impossible. To build a useful mathematical model, you must choose a few quantities that describe the system sufficiently to answer the questions of interest. For instance, if the positions of a couple of atoms in the protein you are studying determine its activity, those positions would make natural quantities to include in your model. You will find more specific examples of models later in this chapter.
Once you have decided on the essential quantities to be included in the model, these are divided into variables and parameters. As suggested by the name, a variable typically varies over time, and the model tracks the changes in its value; parameters usually stay constant or change more slowly. However, that is not always the case. The most important difference is that variables describe quantities within the system being modeled, while parameters usually refer to quantities which are controlled by something outside the system.
As you can see from this definition, the same quantity can be a variable or a parameter, depending on the scope of the model. Let's go back to our example of modeling a protein: usually the activity (and the structure) of a protein is influenced by external conditions, such as pH and temperature; these would be natural parameters for a model of the molecule. However, if we model an entire organism, the pH (e.g., of the blood plasma) and temperature are controlled by physiological processes in the organism, and thus these quantities would then be considered variables.
Perhaps the clearest way to differentiate between variables and parameters is to think about how you would present the quantities visually. We discuss plotting graphs of functions in Chapter 2, and plotting data sets in Chapter 3, but you have likely seen many such plots before. Consider which of the quantities you would plot to describe the system you are modeling. If the quantity belongs on either axis, it is a variable, since it requires a range of values to illustrate how it changes. The rest of the quantities can be called parameters. Of course, depending on the question you ask, the same quantity may be plotted on an axis or not, which is why this classification is not absolute.
After specifying the essential variables for the model, we can describe a complex and evolving biological system in terms of its state. This is a general term, but it usually means the values of all the variables chosen for the model, which are often called state variables. For instance, an ion channel can be described with the state variable of conformation, which may be in an open state or in a closed state. The range, or collection of all different states of the system, is called the state space of the model. Below are examples of models of biological systems with diverse state spaces.
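In Python, the two-state ion channel can be represented by an enumeration whose members are exactly the allowed states. This gives a sketch of what a discrete state space looks like in code. The class name `ChannelConformation` is a hypothetical choice, and some channel models use more than two conformations.

```python
from enum import Enum


class ChannelConformation(Enum):
    """State space of a two-state ion channel model: every state is one of these members."""
    OPEN = "open"
    CLOSED = "closed"


state = ChannelConformation.CLOSED        # the current value of the state variable
state_space = list(ChannelConformation)   # all possible states of the model
print(len(state_space))  # 2
print(state in state_space)  # True
```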
1.2.1 Discrete state variables: genetics
Some genes are present in a population as two different versions, called alleles — let us use letters A and B to label them. One may describe the genetic state of an individual based on which allele it carries. If this individual is haploid (e.g., a bacterium), then it only carries a single copy of the genome, and its state can be described by a single variable with the state space of A or B.
A diploid organism (e.g., a human) possesses two copies of each gene (unless the gene is on one of the sex chromosomes, X or Y); each copy may be in either state A or B. This may seem to suggest that there are four different values in the genetic state space, but if the order of the copies does not matter (which is usually the case), then AB and BA are effectively the same, so the state space consists of three values: AA, BB, and AB.
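The counting argument for haploid and diploid genotypes can be checked by enumerating the state spaces directly. This Python sketch uses `itertools.product` for ordered pairs of gene copies and `itertools.combinations_with_replacement` for unordered pairs. The allele labels are the A and B from the text.

```python
from itertools import combinations_with_replacement, product

alleles = ["A", "B"]

# Haploid: a single gene copy, so the state space is just the set of alleles
haploid_states = list(alleles)

# Diploid, if the order of the two copies mattered: AA, AB, BA, BB
ordered_diploid = ["".join(pair) for pair in product(alleles, repeat=2)]

# Diploid, order ignored: AB and BA collapse into a single genotype
unordered_diploid = ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]

print(haploid_states)     # ['A', 'B']
print(ordered_diploid)    # ['AA', 'AB', 'BA', 'BB']
print(unordered_diploid)  # ['AA', 'AB', 'BB']
```

The same enumeration works for a gene with more than two alleles. Only the `alleles` list changes.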
1.2.2 Continuous state variables: concentration
Suppose that a biological molecule is produced at a certain rate and degraded at a different rate, and we would like to describe the quantity of the molecule, usually expressed as a concentration. The relevant variables here are concentration and time (you will see those variables on the axes of many plots in biochemistry). Concentration is the ratio of the number of molecules to the volume, so the state space can be any positive real number (practically speaking, there is a limit on how many molecules can fit inside a given volume, but for simplicity we can ignore this).
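Concentration is a number of molecules divided by a volume, so turning a molecule count into a molar concentration takes one extra division by Avogadro's number. The Python sketch below is illustrative only. The function name is hypothetical, and the copy number (1,000 molecules) and cell volume (1 picoliter) are made-up values, not measurements.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molar_concentration(n_molecules, volume_liters):
    """Concentration in mol/L: number of molecules divided by volume, converted to moles."""
    if volume_liters <= 0:
        raise ValueError("volume must be positive")
    return n_molecules / AVOGADRO / volume_liters


# Hypothetical example: 1,000 copies of a protein in a 1 picoliter (1e-12 L) cell
c = molar_concentration(1000, 1e-12)
print(f"{c * 1e9:.2f} nM")  # 1.66 nM
```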
Going even further, let us consider an entire cell, which contains a large number of different molecules. We can describe the state of a cell as the collection of all the molecular concentrations, with the parameters being the rates of all the reactions going on among those molecules. The state space for this model with N different molecules is the set of all collections of N positive real numbers, one concentration per molecule.
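In code, the state of this cell model can be held as a sequence of N concentrations. Asking whether a given sequence is a valid state then amounts to checking membership in the state space. The Python sketch below uses a hypothetical function name and invented concentrations for three molecules. It follows the text's simplification that every concentration is a positive real number. The reaction-rate parameters would be stored separately, since they are not part of the state.

```python
import math


def is_in_state_space(state, n_species):
    """Check that a state is a point of the model's state space:
    exactly N concentrations, each a finite positive real number."""
    return len(state) == n_species and all(
        isinstance(c, (int, float)) and math.isfinite(c) and c > 0 for c in state
    )


# Hypothetical three-molecule cell: one concentration (mol/L) per molecular species
state = (1.7e-9, 2.0e-6, 5.0e-3)
print(is_in_state_space(state, 3))                       # True
print(is_in_state_space((1.0e-9, -2.0e-6, 5.0e-3), 3))   # False: negative concentration
```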
Discussion questions:
For the biological models described below, divide the quantities into variables and parameters, and specify the state space of the model. Note that there may be more than one correct interpretation, so explain your decision in terms of the questions that you would like to ask of the model.
Discussion 1.2.1. The volume of blood pumped by the heart during a certain amount of time, depending on the heart rate and the ejection volume.
Discussion 1.2.2. The number of wolves in a national forest, depending on the number of wolves in the previous year, the birth rate, the death rate, and the migration rate.
Discussion 1.2.3. The fraction of hemes in hemoglobin (a transport protein in red blood cells) that are bound to oxygen, depending on the partial pressure of oxygen and the binding cooperativity of hemoglobin.
Discussion 1.2.4. The number of mutations that occur in a genome, depending on the mutation rate, the amount of time, and the length of the genome.
Excerpted from Quantifying Life: A Symbiosis of Computation, Mathematics, and Biology by Dmitry A. Kondrashov. Copyright © 2016 Dmitry A. Kondrashov. Excerpted by permission of The University of Chicago Press.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.