Mathematics and Applied Statistics Lesson of the Day – Contrasts

A contrast is a linear combination of a set of variables such that the sum of the coefficients is equal to zero.  Notationally, consider a set of variables

\mu_1, \mu_2, ..., \mu_n.

Then the linear combination

c_1 \mu_1 + c_2 \mu_2 + ... + c_n \mu_n

is a contrast if

c_1 + c_2 + ... + c_n = 0.

There is a reason for why I chose to use \mu as the symbol for the variables in the above notation – in statistics, contrasts provide a very useful framework for comparing multiple population means in hypothesis testing.  In a later Statistics Lesson of the Day, I will illustrate some examples of contrasts, especially in the context of experimental design.
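To make this concrete, here is a small Python sketch with hypothetical group means (the numbers are made up purely for illustration); the contrast below compares the first mean against the average of the other 2:

```python
# Hypothetical sample means for 3 groups
means = [10.0, 8.0, 6.0]

# Coefficients comparing group 1 against the average of groups 2 and 3
c = [1.0, -0.5, -0.5]

# The defining property of a contrast: the coefficients sum to zero
assert sum(c) == 0

contrast = sum(c_i * mu_i for c_i, mu_i in zip(c, means))
print(contrast)   # 3.0
```

A positive contrast here suggests that the first group's mean exceeds the average of the other 2.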

Applied Statistics Lesson of the Day – Notation for Fractional Factorial Designs

Fractional factorial designs use the L^{F-p} notation; unfortunately, this notation is not clearly explained in most textbooks or web sites about experimental design.  I hope that my explanation below is useful.

  • L is the number of levels in each factor; note that the L^{F-p} notation assumes that all factors have the same number of levels.
    • If a factor has 2 levels, then the levels are usually coded as +1 and -1.
    • If a factor has 3 levels, then the levels are usually coded as +1, 0, and -1.
  • F is the number of factors in the experiment
  • p is the number of times that the full factorial design is fractionated by L.  This number is badly explained by most textbooks and web sites that I have seen, because they simply say that p is the fraction – this is confusing, because a fraction has a numerator and a denominator, and p is just 1 number.  To clarify,
    • the fraction is L^{-p}
    • the number of treatments in the fractional factorial design is L^{-p} multiplied by the total possible number of treatments in the full factorial design, which is L^F.

If all L^F possible treatments are used in the experiment, then a full factorial design is used.  If a fractional factorial design is used instead, then L^{-p} denotes the fraction of the L^F treatments that is used.

Most factorial experiments use binary factors (i.e. factors with 2 levels, L = 2).  Thus,

  • if p = 1, then the fraction of treatments that is used is 2^{-1} = 1/2.
  • if p = 2, then the fraction of treatments that is used is 2^{-2} = 1/4.

This is why

  • a 2^{F-1} design is often called a half-fraction design.
  • a 2^{F-2} design is often called a quarter-fraction design.

However, most sources that I have read do not bother to mention that L can be greater than 2; experiments with 3-level factors are less frequent but still common.  Thus, the terms half-fraction design and quarter-fraction design only apply to binary factors.  If L = 3, then

  • a 3^{F-1} design uses one-third of all possible treatments.
  • a 3^{F-2} design uses one-ninth of all possible treatments.
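The arithmetic of the L^{F-p} notation can be sketched in a few lines of Python (the function name below is mine, purely for illustration):

```python
def design_size(L, F, p):
    """Runs and fraction for an L^(F-p) fractional factorial design."""
    fraction = L ** (-p)       # the fraction of the full design that is run
    runs = L ** (F - p)        # equivalently, fraction * L**F
    return runs, fraction

print(design_size(2, 5, 1))      # (16, 0.5): half of the 2^5 = 32 treatments
print(design_size(3, 4, 2)[0])   # 9: one-ninth of the 3^4 = 81 treatments
```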

Applied Statistics Lesson of the Day – Fractional Factorial Design and the Sparsity-of-Effects Principle

Consider again an experiment that seeks to determine the causal relationships between G factors and the response, where G > 1.  Ideally, the sample size is large enough for a full factorial design to be used.  However, if the sample size is small and the number of possible treatments is large, then a fractional factorial design can be used instead.  Such a design assigns the experimental units to a select fraction of the treatments; these treatments are chosen carefully to investigate the most significant causal relationships, while leaving aside the insignificant ones.  

What, then, are the significant causal relationships?  According to the sparsity-of-effects principle, complex, higher-order effects are unlikely to exist; the most important effects are likely the lower-order effects.  Thus, assign the experimental units so that the main (1st-order) effects and the 2nd-order interaction effects can be investigated.  This may neglect the discovery of a few significant higher-order effects, but that is the compromise that a fractional factorial design makes when the available sample size is low and the number of possible treatments is high.

Applied Statistics Lesson of the Day – The Independent 2-Sample t-Test with Unequal Variances (Welch’s t-Test)

A common problem in statistics is determining whether or not the means of 2 populations are equal.  The independent 2-sample t-test is a popular parametric method to answer this question.  (In an earlier Statistics Lesson of the Day, I discussed how data collected from a completely randomized design with 1 binary factor can be analyzed by an independent 2-sample t-test.  I also discussed its possible use in the discovery of argon.)  I have learned 2 versions of the independent 2-sample t-test, and they differ in their assumptions about the variances of the 2 populations.  The 2 possibilities are

  • equal variances
  • unequal variances

Most statistics textbooks that I have read elaborate at length about the independent 2-sample t-test with equal variances (also called Student’s t-test).  However, the assumption of equal variances needs to be checked (e.g. with an F-test or Levene’s test) before proceeding with Student’s t-test, yet this check does not seem to be universally done in practice.  Furthermore, conducting one test based on the results of another can inflate the probability of committing a Type 1 error (Ruxton, 2006).

Some books give due attention to the independent 2-sample t-test with unequal variances (also called Welch’s t-test), but some barely mention its value, and others do not even mention it at all.  I find this to be puzzling, because the assumption of equal variances is often violated in practice, and Welch’s t-test provides an easy solution to this problem.  There is a seemingly intimidating but straightforward calculation to approximate the number of degrees of freedom for Welch’s t-test, and this calculation is automatically incorporated in most software, including R and SAS.  Finally, Welch’s t-test removes the need to check for equal variances, and it is almost as powerful as Student’s t-test when the variances are equal (Ruxton, 2006).
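As noted above, the degrees-of-freedom calculation is straightforward; here is a minimal Python sketch of Welch’s t-statistic and the Welch–Satterthwaite approximation, using made-up samples:

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t-statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1          # s_1^2 / n_1
    v2 = variance(y) / n2          # s_2^2 / n_2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up samples with unequal variances
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 2), round(df, 2))   # -1.9 5.88
```

Note that the approximate degrees of freedom need not be an integer; software such as R and SAS computes this automatically.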

For all of these reasons, I recommend Welch’s t-test when using the parametric approach to comparing the means of 2 populations.

Reference

Ruxton, Graeme D.  “The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test.”  Behavioral Ecology 17.4 (2006): 688–690.

Applied Statistics Lesson of the Day – Additive Models vs. Interaction Models in 2-Factor Experimental Designs

In a recent “Machine Learning Lesson of the Day”, I discussed the difference between a supervised learning model in machine learning and a regression model in statistics.  In that lesson, I mentioned that a statistical regression model usually consists of a systematic component and a random component.  Today’s lesson strictly concerns the systematic component.

An additive model is a statistical regression model in which the systematic component is the arithmetic sum of the individual effects of the predictors.  Consider the simple case of an experiment with 2 factors.  If Y is the response and X_1 and X_2 are the 2 predictors, then an additive linear model for the relationship between the response and the predictors is

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon

In other words, the effect of X_1 on Y does not depend on the value of X_2, and the effect of X_2 on Y does not depend on the value of X_1.

In contrast, an interaction model is a statistical regression model in which the systematic component is not the arithmetic sum of the individual effects of the predictors.  In other words, the effect of X_1 on Y depends on the value of X_2, or the effect of X_2 on Y depends on the value of X_1.  Thus, such a regression model would have 3 effects on the response:

  1. X_1
  2. X_2
  3. the interaction effect of X_1 and X_2

A full factorial design with 2 factors uses the 2-factor ANOVA model, which is an example of an interaction model.  It assumes a linear relationship between the response and the above 3 effects:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon

Note that additive models and interaction models are not confined to experimental design; I have merely used experimental design to provide examples for these 2 types of models.
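To make the distinction concrete, here is a small Python sketch with hypothetical coefficients.  In the additive model, increasing X_1 by 1 unit always changes Y by \beta_1; in the interaction model, the change is \beta_1 + \beta_3 X_2, which depends on X_2:

```python
# Hypothetical coefficients for illustration
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 4.0

def additive(x1, x2):
    """Systematic component of the additive model."""
    return b0 + b1 * x1 + b2 * x2

def interaction(x1, x2):
    """Systematic component of the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Additive model: the effect of X1 is b1 = 2, regardless of X2
print(additive(1, 0) - additive(0, 0))   # 2.0
print(additive(1, 5) - additive(0, 5))   # 2.0

# Interaction model: the effect of X1 is b1 + b3 * x2, which depends on X2
print(interaction(1, 0) - interaction(0, 0))   # 2.0
print(interaction(1, 5) - interaction(0, 5))   # 22.0
```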

Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between G factors and the response, where G > 1.  On first instinct, you may be tempted to conduct G separate experiments, each using the completely randomized design with 1 factor.  Often, however, it is possible to conduct 1 experiment with G factors at the same time.  This is better than the first approach because

  • it is faster
  • it uses fewer resources to answer the same questions
  • the interactions between the G factors can be examined

Such an experiment requires the full factorial design; in this design, the treatments are all possible combinations of all levels of all factors.  After controlling for confounding variables and choosing the appropriate range and number of levels of each factor, the different treatments are applied to the different groups, and data on the resulting responses are collected.

The simplest full factorial experiment consists of 2 factors, each with 2 levels.  Such an experiment would result in 2 \times 2 = 4 treatments, each being a combination of 1 level from the first factor and 1 level from the second factor.  Since this is a full factorial design, experimental units are independently assigned to all treatments.  The 2-factor ANOVA model is commonly used to analyze data from such designs.
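The set of treatments in a full factorial design is just the Cartesian product of the factors’ levels, which can be sketched in Python:

```python
from itertools import product

# Treatments of a full factorial design: every combination of every
# level of every factor.  Here: 2 factors, each with levels -1 and +1.
treatments = list(product([-1, +1], repeat=2))
print(treatments)       # [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print(len(treatments))  # 2 * 2 = 4
```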

In later lessons, I will discuss interactions and 2-factor ANOVA in more detail.

Applied Statistics Lesson of the Day – The Matched-Pair (or Paired) t-Test

My last lesson introduced the matched pairs experimental design, which is a special type of the randomized blocked design.  Let’s now talk about how to analyze the data from such a design.

Since the experimental units are organized in pairs, the units between pairs (blocks) are not independently assigned.  (The units within each pair are independently assigned – returning to the glove example, one hand is randomly chosen to wear the nitrile glove, while the other wears the latex glove.)  Because of this lack of independence between pairs, the independent 2-sample t-test is not applicable.  Instead, use the matched-pair t-test (also called the paired or the paired difference t-test).  This is really a 1-sample t-test on the within-pair differences; it tests the null hypothesis that the mean difference between the 2 treatments is zero.
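Here is a minimal Python sketch of this reduction, using made-up durability scores for 4 pairs; the t-statistic is the usual 1-sample formula applied to the within-pair differences:

```python
import math
from statistics import mean, stdev

def paired_t(treatment, control):
    """Matched-pair t-test: a 1-sample t-test on the within-pair
    differences (null hypothesis: the mean difference is zero)."""
    d = [t_i - c_i for t_i, c_i in zip(treatment, control)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1   # t-statistic and its degrees of freedom

# Made-up durability scores for 4 matched pairs
t, df = paired_t([6, 8, 10, 12], [5, 6, 7, 8])
print(round(t, 3), df)   # 3.873 3
```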

Applied Statistics Lesson of the Day – The Matched Pairs Experimental Design

The matched pairs design is a special type of the randomized blocked design in experimental design.  It has only 2 treatment levels (i.e. there is 1 factor, and this factor is binary), and a blocking variable divides the n experimental units into n/2 pairs.  Within each pair (i.e. each block), the experimental units are randomly assigned to the 2 treatment groups (e.g. by a coin flip).  The experimental units are divided into pairs such that homogeneity is maximized within each pair.

For example, a lab safety officer wants to compare the durability of nitrile and latex gloves for chemical experiments.  She wants to conduct an experiment with 30 nitrile gloves and 30 latex gloves to test her hypothesis.  She does her best to draw a random sample of 30 students in her university for her experiment, and they all perform the same organic synthesis using the same procedures to see which type of gloves lasts longer.

She could use a completely randomized design so that a random sample of 30 hands gets the 30 nitrile gloves, and the other 30 hands get the 30 latex gloves.  However, since lab habits are unique to each person, this poses a confounding variable – durability can be affected by both the material and a student’s lab habits, and the lab safety officer only wants to study the effect of the material.  Thus, a randomized block design should be used instead so that each student acts as a blocking variable – 1 hand gets a nitrile glove, and 1 hand gets a latex glove.  Once the gloves have been given to the student, the type of glove is randomly assigned to each hand; some may get the nitrile glove on their left hand, and some may get it on their right hand.  Since this design involves one binary factor and blocks that divide the experimental units into pairs, this is a matched pairs design.
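The within-pair randomization (the coin flip) can be sketched in Python; the student names and the seed below are hypothetical:

```python
import random

random.seed(1)   # hypothetical seed, for reproducibility only

students = [f"student_{i}" for i in range(30)]

# Within each pair (block), a coin flip assigns the 2 treatments
# to the 2 experimental units (the student's hands)
gloves = {}
for s in students:
    if random.random() < 0.5:
        gloves[s] = {"left": "nitrile", "right": "latex"}
    else:
        gloves[s] = {"left": "latex", "right": "nitrile"}

# Every student wears exactly 1 glove of each type
assert all(sorted(h.values()) == ["latex", "nitrile"] for h in gloves.values())
```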

Applied Statistics Lesson of the Day – Blocking and the Randomized Complete Blocked Design (RCBD)

A completely randomized design works well for a homogeneous population – one that does not have major differences between any sub-populations.  However, what if a population is heterogeneous?

Consider an example that commonly occurs in medical studies.  An experiment seeks to determine the effectiveness of a drug on curing a disease, and 100 patients are recruited for this double-blinded study – 50 are men, and 50 are women.  An abundance of biological knowledge tells us that men and women have significantly different physiologies, and this is a heterogeneous population with respect to gender.  If a completely randomized design is used for this study, gender could be a confounding variable; this is especially true if the experimental group has a much higher proportion of one gender, and the control group has a much higher proportion of the other gender.  (For instance, purely due to the randomness, 45 males may be assigned to the experimental group, and 45 females may be assigned to the control group.)  If a statistically significant difference in the patients’ survival from the disease is observed between such a pair of experimental and control groups, this effect could be attributed to the drug or to gender, and that would ruin the goal of determining the cause-and-effect relationship between the drug and survival from the disease.

To overcome this heterogeneity and control for the effect of gender, a randomized blocked design could be used.  Blocking is the division of the experimental units into homogeneous sub-populations before assigning treatments to them.  A randomized blocked design for our above example would divide the males and females into 2 separate sub-populations, and then each of these 2 sub-populations is split into an experimental group and a control group.  Thus, the experiment actually has 4 groups:

  1. 25 men take the drug (experimental)
  2. 25 men take a placebo (control)
  3. 25 women take the drug (experimental)
  4. 25 women take a placebo (control)

Essentially, the population is divided into blocks of homogeneous sub-populations, and a completely randomized design is applied to each block.  This minimizes the effect of gender on the response and increases the precision of the estimate of the effect of the drug.
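Here is a minimal Python sketch of this blocking-then-randomizing scheme; the patient IDs and the seed are hypothetical:

```python
import random

random.seed(42)   # hypothetical seed, for reproducibility only

# Hypothetical patient IDs: the 2 blocks of the heterogeneous population
blocks = {
    "men": [f"M{i}" for i in range(50)],
    "women": [f"W{i}" for i in range(50)],
}

assignment = {}
for block, units in blocks.items():
    shuffled = units[:]
    random.shuffle(shuffled)         # complete randomization within the block
    half = len(shuffled) // 2
    assignment[block] = {
        "drug": shuffled[:half],     # experimental group
        "placebo": shuffled[half:],  # control group
    }

# Each block contributes 25 patients to each treatment group
for block, groups in assignment.items():
    print(block, len(groups["drug"]), len(groups["placebo"]))
```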

Applied Statistics Lesson of the Day – Positive Control in Experimental Design

In my recent lesson on controlling for confounders in experimental design, the control group was described as one that received a neutral or standard treatment, and the standard treatment may simply be nothing.  This is a negative control group.  Not all experiments require a negative control group; some experiments instead have a positive control group.

A positive control group is a group of experimental units that receive a treatment that is known to cause an effect on the response.  Such a causal relationship would have been previously established, and its inclusion in the experiment allows a new treatment to be compared to this existing treatment.  Again, both the positive control group and the experimental group experience the same experimental procedures and conditions except for the treatment.  The existing treatment with the known effect on the response is applied to the positive control group, and the new treatment with the unknown effect on the response is applied to the experimental group.  If the new treatment has a causal relationship with the response, both the positive control group and the experimental group should have the same responses.  (This assumes, of course, that the response can only be changed in 1 direction.  If the response can increase or decrease in value (or, more generally, change in more than 1 way), then it is possible for the positive control group and the experimental group to have different responses.)

In short, in an experiment with a positive control group, an existing treatment is known to “work”, and the new treatment is being tested to see if it can “work” just as well or even better.  Experiments to test the effectiveness of new medical therapies or disease detectors often have positive controls; there are existing therapies or detectors that work well, and the new therapy or detector is being evaluated for its effectiveness.

Experiments with positive controls are useful for ensuring that the experimental procedures and conditions proceed as planned.  If the positive control does not show the expected response, then something is wrong with the experimental procedures or conditions, and any “good” result from the new treatment should be considered with skepticism.


Applied Statistics Lesson of the Day – Choosing the Range of Levels for Quantitative Factors in Experimental Design

In addition to choosing the number of levels for a quantitative factor in designing an experiment, the experimenter must also choose the range of the levels of the factor.

  • If the levels are too close together, then there may not be a noticeable difference in the corresponding responses.
  • If the levels are too far apart, then an important trend in the causal relationship could be missed.

Consider the following example of making sourdough bread from Gänzle et al. (1998).  The experimenters sought to determine the relationship between temperature and the growth rates of 2 strains of bacteria and 1 strain of yeast, and they used mathematical models and experimental data to study this relationship.  The plots below show the results for Lactobacillus sanfranciscensis LTH2581 (Panel A) and LTH1729 (Panel B), and Candida milleri LTH H198 (Panel C).  The figures contain the predicted curves (solid and dashed lines) and the actual data (circles).  Notice that, for all 3 organisms,

  • the relationship is relatively “flat” in the beginning, so choosing temperatures that are too close together at low temperatures (e.g. 1 and 2 degrees Celsius) would not yield noticeably different growth rates
  • the overall relationship between growth rate and temperature is rather complicated, and choosing temperatures that are too far apart might miss important trends.

[Figure: predicted curves and observed growth rates vs. temperature for L. sanfranciscensis LTH2581 (Panel A), LTH1729 (Panel B), and C. milleri LTH H198 (Panel C)]

Once again, the experimenter’s prior knowledge and hypothesis can be very useful in making this decision.  In this case, the experimenters had the benefit of their mathematical models in guiding their hypothesis and choosing the range of temperatures for collecting the data on the growth rates.

Reference:

Gänzle, Michael G., Michaela Ehmann, and Walter P. Hammes. “Modeling of growth of Lactobacillus sanfranciscensis and Candida milleri in response to process parameters of sourdough fermentation.” Applied and environmental microbiology 64.7 (1998): 2616-2623.

Applied Statistics Lesson of the Day – Choosing the Number of Levels for Factors in Experimental Design

The experimenter needs to decide the number of levels for each factor in an experiment.

  • For a qualitative (categorical) factor, the number of levels may simply be the number of categories for that factor.  However, because of cost constraints, an experimenter may choose to drop a certain category.  Based on the experimenter’s prior knowledge or hypothesis, the category with the least potential for showing a cause-and-effect relationship between the factor and the response should be dropped.
  • For a quantitative (numeric) factor, the number of levels should reflect the cause-and-effect relationship between the factor and the response.  Again, the experimenter’s prior knowledge or hypothesis is valuable in making this decision.
    • If the relationship in the chosen range of the factor is hypothesized to be roughly linear, then 2 levels (perhaps the minimum and the maximum) should be sufficient.
    • If the relationship in the chosen range of the factor is hypothesized to be roughly quadratic, then 3 levels would be useful.  Often, 3 levels are enough.
    • If the relationship in the chosen range of the factor is hypothesized to be more complicated than a quadratic relationship, consider using 4 or more levels.
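To see why 3 levels suffice for a quadratic relationship, note that 3 distinct levels determine a quadratic exactly.  A small Python sketch, with the levels coded as x = 0, 1, 2 and responses generated from a hypothetical truth:

```python
def quadratic_through(y0, y1, y2):
    """Coefficients (a, b, c) of the unique parabola a*x^2 + b*x + c
    through the 3 points (0, y0), (1, y1), (2, y2), i.e. 3 equally
    spaced levels coded as x = 0, 1, 2."""
    a = (y2 - 2 * y1 + y0) / 2   # second difference / 2
    b = y1 - y0 - a
    c = y0
    return a, b, c

# Responses generated from a hypothetical truth y = 2x^2 + 3x + 1
print(quadratic_through(1, 6, 15))   # (2.0, 3.0, 1)
```

With only 2 levels, infinitely many quadratics fit the data, so the curvature cannot be estimated; this is the sense in which the number of levels should match the hypothesized complexity of the relationship.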

Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine

  1. whether or not there is a cause-and-effect relationship between the factor and the response
  2. the strength of the causal relationship, should such a relationship exist.

To answer these questions, the response variable is measured in both the control group and the experimental group.  If there is a difference between the 2 responses, then there is evidence to suggest that the causal relationship exists, and the difference can be measured and quantified.

However, in most* experiments, there is random variation in the response.  Random variation exists in the natural sciences, and there is even more of it in the social sciences.  Thus, an observed difference between the control and experimental groups could be mistakenly attributed to a cause-and-effect relationship when the source of the difference is really just random variation.  In short, the difference may simply be due to the noise rather than the signal.  

To detect an actual difference beyond random variation (i.e. to obtain a higher signal-to-noise ratio), it is important to use replication to obtain a sufficiently large sample size in the experiment.  Replication is the repeated application of the treatments to multiple independently assigned experimental units.  (Recall that randomization is an important part of controlling for confounding variables in an experiment.  Randomization ensures that the experimental units are independently assigned to the different treatments.)  The number of independently assigned experimental units that receive the same treatment is the sample size.
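One way to see why replication raises the signal-to-noise ratio: the standard error of the difference between the 2 group means shrinks as the sample size grows.  A small Python sketch, under the simplifying assumptions of equal sample sizes and a common response standard deviation in both groups:

```python
import math

def se_of_difference(sigma, n):
    """Standard error of the difference between 2 group means,
    assuming n units per group and a common response standard
    deviation sigma (an illustrative simplification)."""
    return sigma * math.sqrt(2.0 / n)

# Quadrupling the sample size halves the noise
for n in (5, 20, 80):
    print(n, round(se_of_difference(1.0, n), 3))   # 0.632, 0.316, 0.158
```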

*Deterministic computer experiments are unlike most experiments; they do not have random variation in the responses.

Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #2: Controlling for Confounders

A well designed experiment must have good control, which is the reduction of effects from confounding variables.  There are several ways to do so:

  • Include a control group.  This group will receive a neutral treatment or a standard treatment.  (This treatment may simply be nothing.)  The experimental group will receive the new treatment or treatment of interest.  The response in the experimental group will be compared to the response in the control group to assess the effect of the new treatment or treatment of interest.  Any effect from confounding variables will affect both the control group and the experimental group equally, so the only difference between the 2 groups should be due to the new treatment or treatment of interest.
  • In medical studies with patients as the experimental units, it is common to include a placebo group.  Patients in the placebo group get a treatment that is known to have no effect.  This accounts for the placebo effect.
    • For example, in a drug study, a patient in the placebo group may get a sugar pill.
  • In experiments with human or animal subjects, participants and/or the experimenters are often blinded.  This means that they do not know which treatment the participant received.  This ensures that knowledge of receiving a particular treatment – for either the participant or the experimenters – is not a confounding variable.  An experiment that blinds both the participants and the experimenters is called a double-blinded experiment.
  • For confounding variables that are difficult or impossible to control for, the experimental units should be assigned to the control group and the experimental group by randomization.  This can be done with random number tables, flipping a coin, or random number generators from computers.  This ensures that confounding effects affect both the control group and the experimental group roughly equally.
    • For example, an experimenter wants to determine if the HPV vaccine will make new students immune to HPV.  There will be 2 groups: the control group will not receive the vaccine, and the experimental group will receive the vaccine.  If the experimenter can choose students from 2 schools for her study, then the students should be randomly assigned into the 2 groups, so that each group will have roughly the same number of students from each school.  This would minimize the confounding effect of the schools.

Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #1

The word “experiment” can mean many different things in various contexts.  In science and statistics, it has a very particular and subtle definition, one that is not immediately familiar to many people who work outside of the field of experimental design. This is the first of a series of blog posts to clarify what an experiment is, how it is conducted, and why it is so central to science and statistics.

Experiment: A procedure to determine the causal relationship between 2 variables – an explanatory variable and a response variable.  The value of the explanatory variable is changed, and the value of the response variable is observed for each value of the explanatory variable.

  • An experiment can have 2 or more explanatory variables and 2 or more response variables.
  • In my experience, I find that most experiments have 1 response variable, but many experiments have 2 or more explanatory variables.  The interactions between the multiple explanatory variables are often of interest.
  • All other variables are held constant in this process to avoid confounding.

Explanatory Variable or Factor: The variable whose values are set by the experimenter.  This variable is the cause in the hypothesis.  (*Many people call this the independent variable.  I discourage this usage, because “independent” means something very different in statistics.)

Response Variable: The variable whose values are observed by the experimenter as the explanatory variable’s value is changed.  This variable is the effect in the hypothesis.  (*Many people call this the dependent variable.  Further to my previous point about “independent variables”, dependence means something very different in statistics, and I discourage this usage.)

Factor Level: Each possible value of the factor (explanatory variable).  A factor must have at least 2 levels.

Treatment: Each possible combination of factor levels.

  • If the experiment has only 1 explanatory variable, then each treatment is simply each factor level.
  • If the experiment has 2 explanatory variables, X and Y, then each treatment is a combination of 1 factor level from X and 1 factor level from Y.  Such combining of factor levels generalizes to experiments with more than 2 explanatory variables.

Experimental Unit: The object on which a treatment is applied.  This can be anything – person, group of people, animal, plant, chemical, guitar, baseball, etc.

Don’t Take Good Data for Granted: A Caution for Statisticians

Background

Yesterday, I had the pleasure of attending my first Spring Alumni Reunion at the University of Toronto.  (I graduated from its Master of Science program in statistics in 2012.)  There were various events for the alumni: attend interesting lectures, find out about our school’s newest initiatives, and meet other alumni in smaller gatherings tailored for particular groups or interests.  The event was very well organized and executed, and I am very appreciative of my alma mater for working so hard to include us in our university’s community beyond graduation.  Most of the attendees graduated 20 or more years ago; I met quite a few who graduated in the 1950’s and 1960’s.  It was quite interesting to chat with them over lunch and during breaks to learn about what our school was like back then.  (Incidentally, I did not meet anyone who graduated in the last 2 years.)

A Thought-Provoking Lecture

My highlight at the reunion event was attending Joseph Wong’s lecture on poverty, governmental welfare programs, developmental economics in poor countries, and social innovation.  (He is a political scientist at UToronto, and you can find videos of him discussing his ideas on YouTube.)  Here are a few of his key ideas that I took away; note that these are my interpretations of what I can remember from the lecture, so they are not transcriptions or even paraphrases of his exact words:

  1. Many workers around the world are not documented by official governmental records.  This is especially true in developing countries, where the nature of the employer-employee relationship (e.g. contractual work, temporary work, unreported labour) or the limitations of the survey/sampling methods make many of these “invisible workers” unrepresented.  Wong argues that this leads to inequitable distribution of welfare programs that aim to re-distribute wealth.
  2. Social innovation is harnessing knowledge to create an impact.  It often does NOT involve inventing a new technology, but actually combining, re-combining, or arranging existing knowledge and technologies to solve a social problem in an innovative way.  Wong addressed this in further detail in a recent U of T News article.
  3. Poor people will not automatically flock to take advantage of a useful product or service just because of a decrease in price.  Sometimes, substantial efforts and intelligence in marketing are needed to increase the quantity demanded.  A good example is the Tata Nano, a small car that was made and sold in India with huge expectations but underwhelming success.
  4. Poor people often need to mitigate a lot of risk, and that can have a significant and surprising effect on their behaviour in response to the availability of social innovations.  For example, a poor person may forgo a free medical treatment or diagnostic screening if he/she risks losing a job or a business opportunity by taking the time away from work to get that treatment/screening.  I asked him about the unrealistic assumptions that he often sees in economic models based on his field work, and he notes that absence of risk (e.g. in cost functions) as one such common unrealistic assumption.

The Importance of Checking the Quality of the Data

These are all very interesting points to me in their own right.  However, Point #1 is especially important to me as a statistician.  During my Master’s degree, I was warned that most data sets in practice are not immediately ready for analysis, and substantial data cleaning is needed before any analysis can be done; data cleaning can often take 80% of the total amount of time in a project.  I have seen examples of this in my job since finishing my graduate studies a little over a year ago, and I’m sure that I will see more of it in the future.

Even before cleaning the data, it is important to check how the data were collected.  If sampling or experimental methods were used, it is essential to check if they were used or designed properly.  It would be unsurprising to learn that many bureaucrats, policy makers, and elected officials have used unreliable labour statistics to guide all kinds of economic policies on business, investment, finance, welfare, and labour – let alone the other non-economic justifications and factors, like politics, that cloud and distort these policies even further.

We statisticians have a saying about data quality: “garbage in – garbage out”.  If the data are of poor quality, then any insights derived from analyzing those data are useless, regardless of how good the analysis or the modelling technique is.  As a statistician, I cannot take good data for granted, and I aim to be more vigilant about the quality and the source of the data before I begin to analyze them.