Statistics | The Chemical Statistician

Video Tutorial: Naive Bayes Classifiers

October 3, 2018

Naive Bayes classifiers are simple but powerful tools for classification in statistics and machine learning. In this video tutorial, I use a simulated data set and illustrate the mathematical details of how this technique works.

In my recent episode of The Central Equilibrium on word embeddings and text classification, Mandy Gu used naive Bayes classifiers to determine whether a sentence is toxic or non-toxic – a very common objective when moderating discussions in online forums. If you are not familiar with naive Bayes classifiers, then I encourage you to watch this video before watching Mandy’s episode of The Central Equilibrium.
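Here is a minimal multinomial naive Bayes sketch in Python. It assumes word counts as features and add-one (Laplace) smoothing. The training sentences, labels, and function names (`train`, `classify`) are hypothetical. This is not the simulated data set from the video or the code from Mandy’s episode, and the video may use a different variant (for example, Gaussian or Bernoulli likelihoods).

```python
import math
from collections import Counter

# Hypothetical labelled sentences (not the data from the video or the episode).
TRAIN = [
    ("you are an idiot", "toxic"),
    ("what an idiot", "toxic"),
    ("shut up you fool", "toxic"),
    ("thank you for the help", "non-toxic"),
    ("you are very kind", "non-toxic"),
    ("thanks for the kind words", "non-toxic"),
]


def train(data):
    """Count documents per class and words per class; collect the vocabulary."""
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for sentence, label in data:
        word_counts[label].update(sentence.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def log_posterior_scores(sentence, class_counts, word_counts, vocab, alpha=1.0):
    """log P(class) + sum of log P(word | class), i.e. the log posterior up to a constant.

    The "naive" assumption is that words are conditionally independent given the
    class, so their log-likelihoods simply add up.
    """
    n_docs = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for word in sentence.split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # add-alpha (Laplace) smoothing keeps unseen word/class pairs away from log(0)
            p = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(p)
        scores[label] = score
    return scores


def classify(sentence, model):
    """Pick the class with the largest log posterior score."""
    scores = log_posterior_scores(sentence, *model)
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("you idiot", model))              # toxic
print(classify("thanks for being kind", model))  # non-toxic
```

Working on the log scale turns the product of many small probabilities into a sum. This avoids numerical underflow for long sentences.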

Filed under Applied Mathematics, Applied Statistics, Data Mining, Machine Learning, Mathematics, Predictive Modelling, Statistics, Tutorials, Video Tagged with machine learning, naive bayes classifier, naive bayes classifiers, statistics, tutorial, video

Mandy Gu on Word Embeddings and Text Classification – The Central Equilibrium – Episode 9

September 26, 2018

I am so grateful to Mandy Gu for being a guest on The Central Equilibrium to talk about word embeddings and text classification. She began by showing how data from text can be encoded in vectors and matrices, and then she used a naive Bayes classifier to classify sentences as toxic or non-toxic – a very common problem for moderating discussions in online forums. I learned a lot from her in this episode, and you can learn more from Mandy on her Medium blog.
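One simple way to encode text as vectors is a bag-of-words count matrix. The Python sketch below assumes whitespace tokenization and uses hypothetical sentences. It is only a baseline. Word embeddings, the focus of the episode, instead map each word to a dense learned vector, and the encoding used in the episode may differ.

```python
# Hypothetical sentences; tokenization is a plain whitespace split.
SENTENCES = ["you are kind", "you are an idiot", "kind words"]


def build_vocabulary(sentences):
    """Sorted list of distinct words, so the column order is reproducible."""
    return sorted({word for s in sentences for word in s.split()})


def to_count_vector(sentence, vocab):
    """Count how often each vocabulary word appears in the sentence."""
    words = sentence.split()
    return [words.count(term) for term in vocab]


vocab = build_vocabulary(SENTENCES)
matrix = [to_count_vector(s, vocab) for s in SENTENCES]  # one row per sentence
print(vocab)  # ['an', 'are', 'idiot', 'kind', 'words', 'you']
for row in matrix:
    print(row)
```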

If you are not familiar with naive Bayes classifiers, then I encourage you to watch my video tutorial about this topic first.

Filed under Applied Mathematics, Applied Statistics, Machine Learning, Mathematics, Statistics, The Central Equilibrium, Tutorials, Video Tagged with machine learning, mandy gu, math, mathematics, naive bayes classifier, statistics, text classification, text mining, The Central Equilibrium, video, word embedding, word embeddings

Some SAS procedures (like PROC REG, GLM, ANOVA, SQL, and IML) end with “QUIT;”, not “RUN;”

August 1, 2018

Most SAS procedures require the

RUN;

statement to signal their termination. However, there are some notable exceptions to this.

I have written about PROC SQL many times on my blog, and this procedure requires the

QUIT;

statement instead.

It turns out that another set of statistical procedures also requires the QUIT statement, and some of them are very common. They are called interactive procedures, and they include PROC REG, PROC GLM, and PROC ANOVA. If you end one of them with RUN rather than QUIT, the procedure keeps running after the RUN statement and can keep its data sets open, which causes problems for later steps. For example, if you output a data set from one such PROC, end it with RUN, and then try to use that data set, you will get this error message:

ERROR: You cannot open WORK.MYDATA.DATA for input access with record-level 
control because WORK.MYDATA.DATA is in use by you in resource environment 
REG.

WORK.MYDATA cannot be opened.

You will also notice that the Program Editor says “PROC … running” in its banner when you end such a PROC with RUN rather than QUIT.

I don’t like this exception, but, alas, it does exist. You can find out more about these interactive procedures in SAS Usage Note #37105. As this note says, the ANOVA, ARIMA, CATMOD, FACTEX, GLM, MODEL, OPTEX, PLAN, and REG procedures are interactive procedures, and they all require the QUIT statement for termination.

PROC IML is not mentioned in that usage note, but this procedure also requires the QUIT statement. Rick Wicklin has written an article about this on his blog, The DO Loop.

Filed under Data Analysis, SAS Programming, Statistics, Tutorials Tagged with data analysis, interactive procedures, proc anova, proc arima, proc catmod, proc factex, proc glm, PROC IML, proc model, proc optex, proc plan, proc reg, PROC SQL, rick wicklin, SAS, sas programming, statistics

Arnab Chakraborty on The Monty Hall Problem and Bayes’ Theorem – The Central Equilibrium – Episode 6

July 21, 2018

I am pleased to welcome Arnab Chakraborty back to my talk show, “The Central Equilibrium“, to talk about the Monty Hall Problem and Bayes’ theorem. In this episode, he shows 2 solutions to this classic puzzle in probability, and invokes Bayes’ Theorem for the second solution.
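If you want to check the Bayes’ theorem arithmetic, here is a small Python sketch. It relies on the standard assumptions of the puzzle, which are spelled out in the docstring. The function name is hypothetical, and this is not necessarily how Arnab set up his solution. Exact fractions avoid rounding.

```python
from fractions import Fraction


def monty_hall_posteriors():
    """P(car behind door d | contestant picked door 1 and the host opened door 3).

    Assumed rules: the car is equally likely to be behind each door; the host knows
    where it is, never opens the contestant's door or the car's door, and chooses at
    random between two goat doors when he has a choice.
    """
    prior = {door: Fraction(1, 3) for door in (1, 2, 3)}
    # Likelihood of the host opening door 3, given the car's location.
    likelihood = {
        1: Fraction(1, 2),  # car behind door 1: host picks door 2 or door 3 at random
        2: Fraction(1),     # car behind door 2: host is forced to open door 3
        3: Fraction(0),     # car behind door 3: host never reveals the car
    }
    evidence = sum(prior[d] * likelihood[d] for d in prior)  # P(host opens door 3)
    return {d: prior[d] * likelihood[d] / evidence for d in prior}


post = monty_hall_posteriors()
print(post[1], post[2])  # 1/3 2/3
```

Under these assumptions, staying wins with probability 1/3 and switching wins with probability 2/3.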

If you have not watched Arnab’s first episode on Bayes’ theorem, then I encourage you to do that first.

Marilyn vos Savant provided a solution to this problem in PARADE Magazine in 1990-1991. Thousands of readers disagreed with her solution and vehemently (and incorrectly) criticized her for what they believed was an error. Some of these critics were mathematicians! She published some of those replies in later columns and provided alternative perspectives that led to the same conclusion. Although I am dismayed by the disrespect that some people showed in their letters to her, I am glad that a magazine column on probability was able to attract so much readership and interest. Arnab and I referred to one of her solutions in our episode. Thank you, Marilyn!

Enjoy this episode of “The Central Equilibrium“!

Filed under Applied Statistics, Mathematics, Probability, Statistics, The Central Equilibrium, Video Tagged with Bayes' Theorem, math, mathematics, monty hall problem, probability, statistics, The Central Equilibrium

Layne Newhouse on representing neural networks – The Central Equilibrium – Episode 4

June 28, 2018

I am excited to present the first of a multi-episode series on neural networks on my talk show, “The Central Equilibrium”. My guest in this series is Layne Newhouse, and he talked about how to represent neural networks. We talked about the biological motivations behind neural networks, how to represent them in diagrams and mathematical equations, and a few of the common activation functions for neural networks.
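Three common activation functions are the logistic function, the hyperbolic tangent, and the rectified linear unit. These are the ones named in this post's tags, and each is short enough to write out directly. This Python sketch uses only the standard library, and the function names are illustrative labels, not taken from the episode.

```python
import math


def logistic(x):
    """Logistic (sigmoid) function, mapping the real line into (0, 1).

    The two branches compute the same value, but each only ever exponentiates a
    non-positive number, so math.exp cannot overflow for large-magnitude inputs.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, mapping the real line into (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: keeps positive inputs, sets the rest to 0."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.1f}: logistic = {logistic(x):.4f}, tanh = {tanh(x):+.4f}, relu = {relu(x):.1f}")
```

The identity tanh(x) = 2 logistic(2x) - 1 shows that the hyperbolic tangent is a shifted and rescaled logistic function.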

Check it out!

Filed under Applied Statistics, Data Mining, Machine Learning, Mathematics, Statistics, The Central Equilibrium, Video Tagged with activation function, activation functions, hyperbolic tangent function, layne newhouse, logistic function, machine learning, math, mathematics, neural network, neural networks, rectifier linear unit, relu, statistics, The Central Equilibrium, video

A macro to execute PROC TTEST for multiple binary grouping variables in SAS (and sorting t-test statistics by their absolute values)

May 4, 2018

In SAS, you can perform PROC TTEST for multiple numeric variables in the same procedure. Here is an example using the built-in data set SASHELP.BASEBALL; I will compare the number of at-bats and number of walks between the American League and the National League.

proc ttest
     data = sashelp.baseball;
     class League;
     var nAtBat nBB; 
     ods select ttests;
run;

Here are the resulting tables.

Method	Variances	DF	t Value	Pr > |t|
Pooled	Equal	320	2.05	0.0410
Satterthwaite	Unequal	313.66	2.06	0.04

Method	Variances	DF	t Value	Pr > |t|
Pooled	Equal	320	0.85	0.3940
Satterthwaite	Unequal	319.53	0.86	0.3884

What if you want to perform PROC TTEST for multiple grouping (a.k.a. classification) variables? You cannot put more than one variable in the CLASS statement, so you would have to run PROC TTEST separately for each binary grouping variable. If you do put LEAGUE and DIVISION in the same CLASS statement, here is the resulting log.

1303 proc ttest
1304 data = sashelp.baseball;
1305 class league division;
                  --------
                  22
                  202
ERROR 22-322: Expecting ;.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
1306 var natbat;
1307 ods select ttests;
1308 run;

There is no syntax in PROC TTEST for using multiple grouping variables at the same time, so this tutorial provides a macro to do so. My macro has several nice features:

It allows you to use multiple grouping variables at the same time.
It sorts the t-test statistics by their absolute values within each grouping variable.
It shows the name of each continuous variable in the output table, unlike the above output.

Here is its basic skeleton.

Read more of this post
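The Python code below roughly illustrates the same logic outside SAS. It is a sketch, not the macro from the full post. It loops over hypothetical binary grouping variables and computes a pooled two-sample t statistic for each numeric variable. It keeps the variable name next to each statistic and sorts the results by absolute value. The data, column names, and function names are hypothetical. Only the pooled-variance t statistic is computed, with no Satterthwaite version and no p-values.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def ttests_by_group(rows, group_vars, value_vars):
    """Run a t-test of every value variable for each binary grouping variable.

    Returns {grouping variable: [(value variable, t), ...]}, with each list
    sorted by |t| in decreasing order.
    """
    results = {}
    for g in group_vars:
        levels = sorted({r[g] for r in rows})
        if len(levels) != 2:
            raise ValueError(f"{g} is not binary: {levels}")
        stats = []
        for v in value_vars:
            first = [r[v] for r in rows if r[g] == levels[0]]
            second = [r[v] for r in rows if r[g] == levels[1]]
            stats.append((v, pooled_t(first, second)))  # keep the variable name
        results[g] = sorted(stats, key=lambda item: abs(item[1]), reverse=True)
    return results


# Hypothetical data with two binary grouping variables.
ROWS = [
    {"league": "A", "division": "E", "x": 1.0, "y": 10.0},
    {"league": "A", "division": "W", "x": 2.0, "y": 12.0},
    {"league": "A", "division": "E", "x": 3.0, "y": 11.0},
    {"league": "N", "division": "W", "x": 4.0, "y": 10.5},
    {"league": "N", "division": "E", "x": 5.0, "y": 11.5},
    {"league": "N", "division": "W", "x": 6.0, "y": 12.5},
]
for group, stats in ttests_by_group(ROWS, ["league", "division"], ["x", "y"]).items():
    print(group, [(v, round(t, 3)) for v, t in stats])
```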

Filed under Applied Statistics, Data Analysis, Descriptive Statistics, SAS Programming, Statistics, Tutorials Tagged with applied statistics, data analysis, do loop, macro, proc ttest, SAS, sas macro, sashelp.baseball, statistics, Student's t-test, t-test

A macro to automate the creation of indicator variables in SAS

April 25, 2018

In a recent blog post, I introduced an easy and efficient way to create indicator variables from categorical variables in SAS. This method appears to run a logistic regression, but it really uses PROC LOGISTIC only to obtain the design matrix based on dummy-variable coding. I shared SAS code for how to do so, step by step.

I write this follow-up post to provide a macro that you can use to execute all of those steps in one line. If you have not read my previous post on this topic, then I strongly encourage you to do that first. Don’t use this macro blindly.

Here is the macro. The key steps are

Run PROC LOGISTIC to get the design matrix (which has the indicator variables)
Merge the original data with the newly created indicator variables
Delete the “INDICATORS” data set, which was created in an intermediate step

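%macro create_indicators(input_data, target, covariates, output_data);

/* Step 1: use PROC LOGISTIC only to save the coded design matrix.   */
/* PARAM = GLM creates one indicator for every level of each         */
/* CLASS variable.                                                    */
proc logistic
     data = &input_data
          noprint
          outdesign = indicators;
     class &covariates / param = glm;
     model &target = &covariates;
run;


/* Step 2: attach the indicators to the original data.               */
/* This is a one-to-one merge (no BY statement), so it relies on      */
/* INDICATORS having the same rows in the same order as the input.    */
data &output_data;
      merge    &input_data
               indicators (drop = Intercept &target);
run;


/* Step 3: delete the intermediate INDICATORS data set.               */
/* PROC DATASETS is interactive, so QUIT ends it.                     */
proc datasets 
     library = work
          noprint;
     delete indicators;
quit;

%mend;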
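For comparison, the same three steps can be sketched in Python without a separate model fit. The function builds one 0/1 column per level (like PARAM = GLM) and attaches those columns to each original row, so no intermediate data set is kept. The function name, the `Covariate_Level` column naming, and the small SASHELP.CARS-like rows are hypothetical, and missing values are not handled.

```python
def create_indicators(rows, covariates):
    """Return copies of rows with one 0/1 column per level of each covariate.

    Like GLM-style coding, every level gets its own indicator (none is dropped).
    The 'Covariate_Level' column names are a hypothetical naming scheme.
    """
    levels = {c: sorted({r[c] for r in rows}) for c in covariates}
    out = []
    for r in rows:
        new = dict(r)  # keep the original columns (the "merge" step)
        for c in covariates:
            for level in levels[c]:
                new[f"{c}_{level}"] = int(r[c] == level)
        out.append(new)
    return out


# Hypothetical rows resembling a few SASHELP.CARS columns.
cars = [
    {"Type": "SUV", "Origin": "Asia"},
    {"Type": "Sedan", "Origin": "USA"},
    {"Type": "Sedan", "Origin": "Asia"},
]
for row in create_indicators(cars, ["Type", "Origin"]):
    print(row)
```

The indicators are built from each row directly. As a result, there is no row-alignment assumption like the one behind the one-to-one MERGE.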

I will use the built-in data set SASHELP.CARS to illustrate the use of my macro. As you can see, my macro can accept multiple categorical variables as inputs for creating indicator variables. I will do that here for the variables TYPE, MAKE, and ORIGIN.

Read more of this post

Filed under Applied Statistics, Biostatistics, Categorical Data Analysis, Data Analysis, SAS Programming, Statistics, Tutorials Tagged with categorical data, Categorical Data Analysis, categorical variable, data analysis, dummy coding, dummy variables, indicator, indicator variable, indicator variables, indicators, SAS, sas programming, statistics

An easy and efficient way to create indicator variables (a.k.a. dummy variables) from a categorical variable in SAS

April 24, 2018

Introduction

In statistics and biostatistics, the creation of binary indicators is a very useful practice.

They can be useful predictor variables in statistical models.
They can reduce the amount of memory required to store the data set.
They allow a categorical covariate to enter a regression model as numeric (0/1) covariates, which has certain mathematical conveniences.

However, the creation of indicator variables can be a long, tedious, and error-prone process. This is especially true if there are many categorical variables, or if a categorical variable has many categories. In this tutorial, I will show an easy and efficient way to create indicator variables in SAS. I learned this technique from SAS usage note #23217: Saving the coded design matrix of a model to a data set.

The Example Data Set

Let’s consider the PRDSAL2 data set that is built into the SASHELP library. Here are the first 5 observations; due to a width constraint, I will show the first 5 columns and the last 6 columns separately. (I encourage you to view this data set using PROC PRINT in SAS by yourself.)

COUNTRY	STATE	ACTUAL	PREDICT
U.S.A.	California	$987.36	$692.24
U.S.A.	California	$1,782.96	$568.48
U.S.A.	California	$32.64	$16.32
U.S.A.	California	$1,825.12	$756.16
U.S.A.	California	$750.72	$723.52

PRODTYPE	PRODUCT	YEAR	QUARTER	MONTH	MONYR
FURNITURE	SOFA	1995	1	Jan	JAN95
FURNITURE	SOFA	1995	1	Feb	FEB95
FURNITURE	SOFA	1995	1	Mar	MAR95
FURNITURE	SOFA	1995	2	Apr	APR95
FURNITURE	SOFA	1995	2	May	MAY95

Read more of this post

Video Tutorial – Obtaining the Expected Value of the Exponential Distribution Using the Moment Generating Function

March 28, 2018

In this video tutorial on YouTube, I use the exponential distribution’s moment generating function (MGF) to obtain the expected value of this distribution. Visit my YouTube channel to watch more video tutorials!
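This sketch assumes the rate parameterization $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. The video may use a different parameterization, such as the mean $\theta = 1/\lambda$. Under this assumption, the MGF is $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$, and the expected value is its first derivative evaluated at $t = 0$:

$M_X'(t) = \frac{d}{dt} \lambda (\lambda - t)^{-1} = \lambda (\lambda - t)^{-2}$

$E(X) = M_X'(0) = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}$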

Filed under Mathematical Statistics, Mathematics, Probability, Statistics, Tutorials, Video Tagged with expectation, expected value, exponential distribution, moment generating function, probability, probability density function, probability distribution, statistics

Video Tutorial – The Moment Generating Function of the Exponential Distribution

March 4, 2018

In this video tutorial on YouTube, I derive the moment generating function (MGF) of the exponential distribution. Visit my YouTube channel to watch more video tutorials!
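This sketch assumes the rate parameterization $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, although the video may parameterize the distribution differently. Under this assumption, the derivation reduces to a single integral that converges only when $t < \lambda$:

$M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx} \lambda e^{-\lambda x} \, dx = \lambda \int_0^\infty e^{-(\lambda - t)x} \, dx$

$M_X(t) = \lambda \left[ -\frac{e^{-(\lambda - t)x}}{\lambda - t} \right]_0^\infty = \frac{\lambda}{\lambda - t}, \quad t < \lambda$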

Filed under Mathematical Statistics, Mathematics, Probability, Statistics, Tutorials, Video Tagged with exponential distribution, math, mathematical statistics, mathematics, moment generating function, probability, statistics

Arnab Chakraborty on Bayes’ Theorem – The Central Equilibrium – Episode 3

December 18, 2017

Arnab Chakraborty kindly came to my new talk show, “The Central Equilibrium”, to talk about Bayes’ theorem. He introduced the concept of conditional probability, stated Bayes’ theorem in its simple and general forms, and showed an example of how to use it in a calculation.
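For reference, the standard statements are below, although the notation may differ from the episode. Conditional probability is defined for $P(B) > 0$ as

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Bayes’ theorem in its simple form is

$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$

and, for events $A_1, A_2, \ldots, A_k$ that partition the sample space (each with positive probability), its general form is

$P(A_j \mid B) = \frac{P(B \mid A_j) P(A_j)}{\sum_{i = 1}^{k} P(B \mid A_i) P(A_i)}$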

Check it out!

Filed under Applied Statistics, Mathematical Statistics, Probability, Statistics, The Central Equilibrium, Video Tagged with Bayes' Theorem, math, mathematical statistics, mathematics, probability, statistics, The Central Equilibrium

Christopher Salahub on Markov Chains – The Central Equilibrium – Episode 2

September 11, 2017

It was a great pleasure to talk to Christopher Salahub about Markov chains in the second episode of my new talk show, The Central Equilibrium! Chris graduated from the University of Waterloo with a Bachelor of Mathematics degree in statistics. He just finished an internship in data development at Environics Analytics, and he is starting a Master’s program in statistics at ETH Zurich in Switzerland.

Chris recommends “Introduction to Probability Models” by Sheldon Ross to learn more about probability theory and Markov chains.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams. Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog. You can also watch it directly on YouTube.

Filed under Applied Mathematics, Applied Statistics, Mathematical Statistics, Mathematics, Probability, Statistics, The Central Equilibrium, Video Tagged with chris salahub, christopher salahub, markov, markov chains, math, mathematics, probability, statistics, The Central Equilibrium

Store multiple strings of text as a macro variable in SAS with PROC SQL and the INTO statement

September 8, 2017

I often need to work with many variables at a time in SAS, but I don’t like to type all of their names manually – not only is it messy to read, but it also invites transcription errors, even when copying and pasting. I recently learned of an elegant and efficient way to store multiple variable names in a macro variable that overcomes those problems. This technique uses the INTO clause of the SELECT statement in PROC SQL.

To illustrate how this storage method can be applied in a practical context, suppose that we want to determine the factors that contribute to a baseball player’s salary in the built-in SASHELP.BASEBALL data set. I will consider all continuous variables other than “Salary” and “logSalary”, but I don’t want to write them explicitly in any programming statements. To do this, I first obtain the variable names and types of a data set using PROC CONTENTS.

* create a data set of the variable names;
* in the OUT= data set, TYPE = 1 for numeric and 2 for character variables;
proc contents
     data = sashelp.baseball
          noprint
     out = bvars (keep = name type);
run;

Read more of this post

Filed under Applied Statistics, Data Analysis, Descriptive Statistics, SAS Programming, Statistics Tagged with applied statistics, correlation, correlation coefficient, data analysis, data manipulation, into statement, macro, macro variable, PROC SQL, programming, SAS, sas programming, SQL, statistics

Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

August 16, 2017

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected. Here is an example involving the built-in data set SASHELP.CLASS.

Here is the code:

data c1;
     set sashelp.class;
 
     * define a new character variable to classify someone as tall or short;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;


* print the results for the first 5 rows;
proc print
     data = c1 (obs = 5);
run;

Here is the result:

Obs	Name	Sex	Age	Height	Weight	height_class
1	Alfred	M	14	69.0	112.5	Tall
2	Alice	F	13	56.5	84.0	Shor
3	Barbara	F	13	65.3	98.0	Tall
4	Carol	F	14	62.8	102.5	Tall
5	Henry	M	14	63.5	102.5	Tall

What happened? Why does the word “Short” render as “Shor”?

Read more of this post

Filed under Categorical Data Analysis, Data Analysis, R programming, SAS Programming, Statistics, Tutorials Tagged with categorical data, categorical variable, character data, character variable, length(), R, r programing, SAS, sas programming

Sorting correlation coefficients by their magnitudes in a SAS macro

March 21, 2017

Theoretical Background

Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables. For 2 random variables, $X$ and $Y$, the correlation coefficient between them is defined as their covariance divided by the product of their standard deviations. Algebraically, this can be expressed as

$\rho_{X, Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$.

In real life, you can never know what the true correlation coefficient is, but you can estimate it from data. The most common estimator for $\rho$ is the Pearson correlation coefficient, which is defined as the sample covariance between $X$ and $Y$ divided by the product of their sample standard deviations. Since there is a common factor of

$\frac{1}{n - 1}$

in the numerator and the denominator, these factors cancel out, so the formula simplifies to

$r_P = \frac{\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i = 1}^{n}(x_i - \bar{x})^2 \sum_{i = 1}^{n}(y_i - \bar{y})^2}}$.

In predictive modelling, you may want to find the covariates that are most correlated with the response variable before building a regression model. You can do this by

computing the correlation coefficients
obtaining their absolute values
sorting them by their absolute values.
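The three steps are short enough to sketch in Python. The code applies the simplified Pearson formula directly and sorts the covariates by $|r_P|$. The covariate names and values are hypothetical, and this is not the SAS macro from the full post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation via the simplified formula (the 1/(n-1) factors cancel)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_abs_correlation(covariates, response):
    """Compute r for each covariate, then sort the (name, r) pairs by |r|, largest first."""
    pairs = [(name, pearson_r(values, response)) for name, values in covariates.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical response and covariates.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
covariates = {
    "a": [2.0, 4.1, 5.9, 8.2, 9.9],  # strong positive relationship
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],  # perfect negative relationship
    "c": [1.0, 3.0, 2.0, 3.0, 1.0],  # no linear relationship
}
for name, r in rank_by_abs_correlation(covariates, response):
    print(f"{name}: r = {r:+.3f}")
```

Sorting by $|r_P|$ rather than $r_P$ puts the strongly negative covariate `b` ahead of `a`. The sign still matters when interpreting a regression model.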

Read more of this post

Filed under Applied Statistics, Data Analysis, Descriptive Statistics, Mathematical Statistics, Predictive Modelling, SAS Programming, Statistics, Tutorials Tagged with correlation, macro, pearson correlation, pearson correlation coefficient, predictive modelling, PROC CORR, regression, regression modelling, SAS, sas macro

Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry

November 17, 2015

In this second article of a 2-part series on the official JMP blog, I use analysis of variance (ANOVA) to assess a sample-preparation scheme for quantifying sodium in potato chips. I illustrate the use of the “Fit Y by X” platform in JMP to implement ANOVA, and I propose an alternative sample-preparation scheme to obtain a sample with a smaller variance. This article is entitled “Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry“.
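To show what one-way ANOVA computes, here is a minimal Python sketch of the F statistic: the between-group mean square divided by the within-group mean square. The groups are hypothetical numbers, not the sodium data from the article. A full analysis would also report degrees of freedom and a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements from three sample-preparation batches.
batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, 3.0, 4.0]]
print(one_way_anova_f(batches))
```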

If you haven’t read my first blog post in this series on preparing the data in JMP and using the “Stack Columns” function to transpose data from wide format to long format, check it out! I presented this topic at the last Vancouver SAS User Group (VanSUG) meeting on Wednesday, November 4, 2015.

My thanks to Arati Mejdal, Louis Valente, and Mark Bailey at JMP for their guidance in writing this 2-part series! It is a pleasure to be a guest blogger for JMP!


Filed under Analytical Chemistry, Applied Statistics, Basic Chemistry, Chemistry, Data Analysis, Data Visualization, JMP, Practical Applications of Chemistry, Scientific Applications of Chemistry, Statistics, Tutorials Tagged with analysis of variance, analytical chemistry, ANOVA, chemistry, chips, JMP, potato chips, sample preparation, statistics, sum of squares

Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP

November 4, 2015

I am very excited to write again for the official JMP blog as a guest blogger! Today, the first article of a 2-part series has been published, and it is called “Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP“. This series of blog posts will talk about analysis of variance (ANOVA), sampling, and analytical chemistry, and it uses the quantification of sodium in potato chips as an example to illustrate these concepts.

The first part of this series discusses how to import the data into JMP and prepare them for ANOVA. Specifically, it illustrates how the “Stack Columns” function is used to transpose the data from wide format to long format.
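For readers outside JMP, here is a Python sketch of the wide-to-long transpose. It illustrates the idea only and is not JMP’s implementation. The column names (`Sample`, `Rep1`, `Rep2`, `Label`, `Data`) and values are hypothetical.

```python
def stack_columns(wide_rows, id_col, stack_cols, label_name="Label", value_name="Data"):
    """Wide-to-long transpose: one output row per (input row, stacked column) pair.

    `label_name` records which original column a value came from;
    `value_name` holds the value itself.
    """
    long_rows = []
    for row in wide_rows:
        for col in stack_cols:
            long_rows.append({id_col: row[id_col], label_name: col, value_name: row[col]})
    return long_rows


# Hypothetical wide data: one row per sample, one column per replicate.
wide = [
    {"Sample": "A", "Rep1": 1.2, "Rep2": 1.4},
    {"Sample": "B", "Rep1": 0.9, "Rep2": 1.1},
]
for r in stack_columns(wide, "Sample", ["Rep1", "Rep2"]):
    print(r)
```

In long format, each measurement occupies its own row. This is the layout that ANOVA software typically expects: one response column plus one grouping column.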

I will present this at the Vancouver SAS User Group (VanSUG) meeting later today.

Stay tuned for “Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry“!


Filed under Analytical Chemistry, Applied Statistics, Basic Chemistry, Chemistry, Data Analysis, JMP, Scientific Applications of Chemistry, Statistics, Statistics in Industry and Practice, Tutorials Tagged with aliquot, aliquots, analysis of variance, ANOVA, chemistry, chips, erlenmeyer flask, JMP, potato chips, sampling, statistics, transpose, transposing data, uncertainty, volumetric flask

Odds and Probability: Commonly Misused Terms in Statistics – An Illustrative Example in Baseball

August 12, 2015

Yesterday, all 15 home teams in Major League Baseball won on the same day – the first such occurrence in history. CTV News published an article written by Mike Fitzpatrick from The Associated Press that reported on this event. The article states, “Viewing every game as a 50-50 proposition independent of all others, STATS figured the odds of a home sweep on a night with a full major league schedule was 1 in 32,768.” (Emphases added)

(Screenshot of the article: “odds of all 15 home teams winning on same day”, captured at 5:35 pm Vancouver time on Wednesday, August 12, 2015.)

Out of curiosity, I wanted to reproduce this result. This event is the intersection of 15 independent events, each of which is a Bernoulli trial in which the home team wins with probability 0.5.

$P[(\text{Winner}_1 = \text{Home Team}_1) \cap (\text{Winner}_2 = \text{Home Team}_2) \cap \ldots \cap (\text{Winner}_{15}= \text{Home Team}_{15})]$

Since all 15 games are assumed to be mutually independent, the probability of all 15 home teams winning is just

$P(\text{All 15 Home Teams Win}) = \prod_{i = 1}^{15} P(\text{Winner}_i = \text{Home Team}_i)$

$P(\text{All 15 Home Teams Win}) = 0.5^{15} = \frac{1}{32768} \approx 0.0000305176$

Now, let’s connect this probability to odds.

It is important to note that

odds is only applicable to Bernoulli random variables (i.e. binary events)
odds is the ratio of the probability of success to the probability of failure

For our example,

$\text{Odds}(\text{All 15 Home Teams Win}) = P(\text{All 15 Home Teams Win}) \ \div \ P(\text{At least 1 Home Team Loses})$

$\text{Odds}(\text{All 15 Home Teams Win}) = \frac{1/32768}{1 - 1/32768} = \frac{1/32768}{32767/32768} = \frac{1}{32767}$

$\text{Odds}(\text{All 15 Home Teams Win}) \approx 0.0000305185$

The above article states that the odds is 1 in 32,768. The fraction 1/32768 is approximately 0.0000305176, which is NOT the odds that I just calculated. Instead, 1/32768 is the probability of all 15 home teams winning. Thus, the article incorrectly reports the probability as the odds; expressed as odds, the same event is 1 to 32,767.
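The two calculations can be checked exactly with Python’s `fractions` module. The check assumes, as STATS did, 15 independent games with a home-win probability of 0.5. The function names are illustrative.

```python
from fractions import Fraction


def probability_all_home_wins(n_games, p_home=Fraction(1, 2)):
    """P(every home team wins), assuming mutually independent games."""
    return p_home ** n_games


def odds(p):
    """Odds in favour of an event: P(event) / P(not event)."""
    return p / (1 - p)


p = probability_all_home_wins(15)
print(p)        # 1/32768 (the probability)
print(odds(p))  # 1/32767 (the odds, i.e. 1 to 32,767)
```

For rare events, the probability and the odds are numerically close, which is one reason the two are easy to confuse. They diverge quickly as the probability grows. For example, a probability of 1/2 corresponds to odds of 1 to 1.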

This is an example of a common confusion between probability and odds that the media and the general public often make. Probability and odds are two different concepts and are calculated differently, and my calculations above illustrate their differences. Thus, exercise caution when reading statements about probability and odds, and make sure that the communicator of such statements knows exactly how they are calculated and which one is more applicable.

Filed under Applied Statistics, Categorical Data Analysis, Data Analysis, Mathematical Statistics, Mathematics, Probability, Statistics, Statistics in Industry and Practice, Tutorials Tagged with baseball, math, median, mlb, odds, probability, statistics, statistics communication

Mathematical Statistics Lesson of the Day – Basu’s Theorem

July 21, 2015

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics. As before, I will begin with the following set-up.

Suppose that you collected data

$\mathbf{X} = (X_1, X_2, \ldots, X_n)$

in order to estimate a parameter $\theta$. Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, \ldots, X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\mathbf{X}$.

Basu’s theorem states that, if $T(\mathbf{X})$ is a complete and minimal sufficient statistic, then $T(\mathbf{X})$ is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain. This theorem allows the independence between a complete, minimal sufficient statistic and every ancillary statistic to be established without their joint distribution – and this is the great utility of Basu’s theorem.

However, establishing that a statistic is complete can be a difficult task. In a later lesson, I will discuss another theorem that will make this task easier for certain cases.
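Here is a standard textbook application, which this lesson does not discuss. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed as $\text{Normal}(\mu, \sigma^2)$ with $\sigma^2$ known. Then the sample mean

$\bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_i$

is a complete and minimal sufficient statistic for $\mu$, while the sample variance

$S^2 = \frac{1}{n - 1} \sum_{i = 1}^{n} (X_i - \bar{X})^2 \sim \frac{\sigma^2}{n - 1} \chi^2_{n - 1}$

has a distribution that does not depend on $\mu$, so it is ancillary for $\mu$. By Basu’s theorem, $\bar{X}$ and $S^2$ are independent, and their joint distribution was never needed. Because $\sigma^2$ was an arbitrary fixed value, the independence holds for every $\sigma^2$.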

Filed under Mathematical Statistics, Mathematics, Statistics, Statistics Lesson of the Day Tagged with ancillary statistic, ancillary statistics, basu's theorem, complete statistic, complete statistics, independence, math, mathematical statistics, mathematics, minimally sufficient statistic, minimally sufficient statistics, statistical computing, sufficient statistic, sufficient statistics

Mathematical Statistics Lesson of the Day – An Example of An Ancillary Statistic

June 25, 2015

Consider 2 independent random variables, $X_1$ and $X_2$, from the normal distribution $\text{Normal}(\mu, \sigma^2)$, where $\mu$ is unknown. Then the statistic

$D = X_1 - X_2$

has the distribution

$\text{Normal}(0, 2\sigma^2)$.
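With $X_1$ and $X_2$ independent, this distribution follows because a difference of independent normal random variables is itself normal, with mean and variance

$E(D) = E(X_1) - E(X_2) = \mu - \mu = 0$

$Var(D) = Var(X_1) + Var(X_2) = \sigma^2 + \sigma^2 = 2\sigma^2$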

The distribution of $D$ does not depend on $\mu$, so $D$ is an ancillary statistic for $\mu$.

Note that, if $\sigma^2$ is unknown, then $D$ is not ancillary for $\sigma^2$.

Filed under Mathematical Statistics, Statistics, Statistics Lesson of the Day Tagged with ancillary statistic, ancillary statistics, estimation, math, mathematical statistics, mathematics, normal distribution, point estimation, random variable, statistics
