The Gold Foil Experiment and The 250-Million-Ton Pea: The Composition of the Atom

This Atom Is Not To Scale

In a recent post about isotopic abundance, I used a prototypical image of a lithium atom to illustrate the basic structure of an atom.  However, the image was deliberately not drawn to scale so that the protons, neutrons, and electrons would be visible.  Let’s look at the basic composition of the atom to see why; we owe this understanding to Ernest Rutherford and his gold foil experiment.  First, some historical background on what motivated Rutherford to conduct that experiment, beginning with J.J. Thomson’s plum pudding model.

The Plum Pudding Model

Before 1911, the dominant theory of atomic composition was J.J. Thomson’s “plum pudding” model.  Thomson hypothesized that an atom consisted of negatively charged electrons (the “plums”) floating in a “pudding” of diffuse positive charge.


Plum Pudding Model of the Atom

Source: Wikimedia Commons


Getting Help with R Programming: Useful Survival Skills

Useful Resources to Learn about R on the Internet

When I program in R and struggle with something, the first thing that I usually turn to is Google.  I search for the relevant function or the desired outcome, and I often find a solution within the first few hits.  These solutions usually show up in the official documentation, in online discussion forums like Nabble and Stack Overflow, in email threads from the R Mailing List, and on web sites that provide tutorials on R (like R-Bloggers and the UCLA Statistical Computing Group).  These are all great resources.

Today, however, I’m going to show you some useful survival skills within the R environment.  I read about these help facilities on Page 7 of the 3rd edition of the book “Data Analysis and Graphics Using R” by John Maindonald and W. John Braun.
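For example, here are a few of the standard help facilities built into base R; this is just a quick sketch of the kinds of commands the book describes, run directly from the R console:

    help(mean)                 # open the documentation page for a function
    ?mean                      # shorthand for help(mean)
    help.search("regression")  # search installed help pages by keyword
    ??regression               # shorthand for help.search("regression")
    apropos("mean")            # list objects whose names match "mean"
    example(mean)              # run the examples from a function's help page
    help.start()               # launch the browser-based HTML help system

The first two are what you want when you already know a function’s name; the others help when you only know the topic that you are searching for.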


Presentation Slides – Overcoming Multicollinearity and Overfitting with Partial Least Squares Regression in JMP and SAS

My slides on partial least squares regression at the Toronto Area SAS Society (TASS) meeting on September 14, 2012, can be found here.

My Presentation on Partial Least Squares Regression

My first presentation to the Toronto Area SAS Society (TASS) was delivered on September 14, 2012.  I introduced a supervised learning/predictive modelling technique called partial least squares (PLS) regression.  I showed how ordinary linear least squares regression is often problematic when used with big data because of multicollinearity and overfitting, explained how PLS regression overcomes these limitations, and illustrated how to implement it in SAS and JMP.  I also highlighted the variable importance for projection (VIP) score, which can be used to conduct variable selection with PLS regression; in particular, I documented its effectiveness as a variable-selection technique by comparing some key journal articles on this issue in the academic literature.


The green line is an overfitted classifier.  Not only does it model the underlying trend, but it also models the noise (the random variation) at the boundary.  It separates the blue and the red dots perfectly for this data set, but it will classify very poorly on a new data set from the same population.

Source: Chabacano via Wikimedia Commons
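
The presentation itself used SAS and JMP, but for readers who work in R, here is a minimal sketch of the same idea.  It assumes the third-party pls package, which is my illustrative choice here and not the software from the talk:

    # Simulate a response driven by nearly collinear predictors
    library(pls)  # install.packages("pls") if needed

    set.seed(1)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.01)  # nearly collinear with x1
    x3 <- rnorm(n)
    y  <- 2 * x1 + x3 + rnorm(n)
    d  <- data.frame(y, x1, x2, x3)

    # Fit PLS regression: project the predictors onto a few latent
    # components, using cross-validation to choose how many to keep
    fit <- plsr(y ~ x1 + x2 + x3, data = d, ncomp = 3, validation = "CV")
    summary(fit)  # cross-validated RMSEP for 1, 2, and 3 components

Because the latent components are orthogonal, the multicollinearity between x1 and x2 no longer destabilizes the coefficient estimates, and the cross-validation guards against overfitting.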

Presentation Slides – Finding Patterns in Data with K-Means Clustering in JMP and SAS

My slides on K-means clustering at the Toronto Area SAS Society (TASS) meeting on December 14, 2012, can be found here.


This image is slightly enhanced from an image created by Weston.pace on Wikimedia Commons.

My Presentation on K-Means Clustering

I was very pleased to be invited for the second time by the Toronto Area SAS Society (TASS) to deliver a presentation on machine learning.  (I previously presented on partial least squares regression.)  At its recent meeting on December 14, 2012, I introduced an unsupervised learning technique called K-means clustering.

I first defined clustering as a set of techniques for identifying groups of objects by maximizing a similarity criterion or, equivalently, minimizing a dissimilarity criterion.  I then defined K-means clustering specifically as a clustering technique that uses Euclidean proximity to a group mean as its similarity criterion.  I illustrated how this technique works with a simple 2-dimensional example; you can follow along with this example in the slides by watching the sequence of images of the clusters as they converge.  As with many other machine learning techniques, some arbitrary decisions need to be made to initiate the algorithm for K-means clustering:

  1. How many clusters should there be?
  2. What should the initial mean of each cluster be?

I provided some guidelines on how to make these decisions in these slides; a small illustration in R follows below.
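
The talk demonstrated the technique in JMP and SAS; as an analogous sketch for readers who work in R, base R’s built-in kmeans() function implements the same technique:

    set.seed(1)

    # Toy 2-dimensional data: 25 points near (0, 0) and 25 near (3, 3)
    x <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),
               matrix(rnorm(50, mean = 3), ncol = 2))

    # The two arbitrary decisions above: "centers = 2" fixes the number
    # of clusters, and "nstart = 25" tries 25 random sets of initial
    # cluster means and keeps the best result
    km <- kmeans(x, centers = 2, nstart = 25)

    km$centers                 # the final cluster means
    table(km$cluster)          # how many points fell into each cluster
    plot(x, col = km$cluster)  # plot the points, coloured by cluster

Using several random starts (nstart) is a common guard against the algorithm converging to a poor local optimum from one unlucky initialization.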


Displaying Isotopic Abundance Percentages with Bar Charts and Pie Charts

The Structure of an Atom

An atom consists of a nucleus at the centre and electrons moving around it.  The nucleus contains a mixture of protons and neutrons.  For most purposes in chemistry, the two most important properties of these three types of particles are their masses and charges.  In terms of charge, protons are positive, electrons are negative, and neutrons are neutral.  A proton’s mass is roughly the same as a neutron’s mass, but a proton is almost 2,000 times heavier than an electron.
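
To make that last comparison concrete: the accepted masses are roughly 1.6726 × 10^-27 kg for a proton, 1.6749 × 10^-27 kg for a neutron, and 9.109 × 10^-31 kg for an electron, so the proton-to-electron mass ratio is (1.6726 × 10^-27 kg) / (9.109 × 10^-31 kg) ≈ 1,836.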


This image shows a lithium atom, which has 3 electrons, 3 protons, and 4 neutrons.  

Source: Wikimedia Commons
