Eric’s Enlightenment for Wednesday, June 3, 2015

  1. Jodi Beggs uses the Rule of 70 to explain why small differences in GDP growth rates have large ramifications.
  2. Rick Wicklin illustrates the importance of choosing bin widths carefully when plotting histograms.
  3. Shana Kelley et al. have developed an electrochemical sensor for detecting selected mutated nucleic acids (i.e. cancer markers in DNA!).  “The sensor comprises gold electrical leads deposited on a silicon wafer, with palladium nano-electrodes.”
  4. Rhett Allain provides a very detailed and analytical critique of Mjölnir (Thor’s hammer) – specifically, its unrealistic centre of mass.  This is an impressive exercise in physics!
  5. Congratulations to the Career Services Centre at Simon Fraser University for winning TalentEgg’s Special Award for Innovation by a Career Centre!  I was fortunate to volunteer there as a career advisor for 5 years, and it was a wonderful place to learn, grow and give back to the community. My career has benefited greatly from that experience, and it is a pleasure to continue my involvement as a guest blogger for its official blog, The Career Services Informer. Way to go, everyone!
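To make the Rule of 70 from item 1 concrete: a quantity growing at g percent per year doubles in roughly 70/g years. The R sketch below compares that approximation with the exact doubling time under annual compounding. The growth rates of 2% and 3% are made-up values for illustration, not figures from Jodi Beggs's article.

```r
# Hypothetical annual GDP growth rates, in percent (illustrative only)
growth.pct = c(2, 3)

# Rule of 70: approximate doubling time in years
doubling.approx = 70 / growth.pct

# Exact doubling time under annual compounding: solve (1 + g)^t = 2 for t
doubling.exact = log(2) / log(1 + growth.pct / 100)

round(doubling.approx, 1)
round(doubling.exact, 1)
```

Under these assumed rates, one extra percentage point of growth cuts the doubling time from about 35 years to about 23 years.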

Eric’s Enlightenment for Tuesday, April 28, 2015

  1. On a yearly basis, the production of almonds in California uses more water than businesses and residences in San Francisco and Los Angeles combined.  Alex Tabarrok explains why.
  2. How patient well-being and patient satisfaction become conflicting objectives in hospitals – a case study of a well-intended policy with deadly consequences.  (HT: Frances Woolley – with a thought about academia.)
  3. Contrary to a long-held presumption about the stability of DNA in mature cells, Huimei Yu et al. show that neurons use DNA methylation to rewrite their DNA throughout each day.  This is done to adjust the brain to different activity levels as its function changes over time.
  4. Alex Yakubovitch provides a tutorial on regular expressions (patterns that define sets of strings) and how to use them in R.
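As a small taste of item 4, here is a minimal sketch of regular expressions in base R. The sample strings are invented for illustration and are not taken from Alex Yakubovitch's tutorial.

```r
# Made-up sample identifiers (illustrative only)
ids = c("sample_001", "sample_02", "control_117")

# grepl() returns TRUE where the pattern matches:
# "sample_" followed by exactly three digits, anchored to the whole string
grepl("^sample_[0-9]{3}$", ids)

# sub() replaces the first match; here it strips the lowercase prefix and underscore
sub("^[a-z]+_", "", ids)

# regexpr() locates the first run of digits; regmatches() extracts it
regmatches(ids, regexpr("[0-9]+", ids))
```

Only the first identifier satisfies the three-digit pattern, while both sub() and regmatches() recover the numeric suffix of every string.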

Eric’s Enlightenment for Thursday, April 23, 2015

  1. Reaching the NBA Finals has been much more difficult in the Western Conference than in the Eastern Conference in the past 15 years.
  2. In terms of points above the average shooter per 100 shots, Kyle Korver ranks first in 2014-2015 with +30.4 points.  DeAndre Jordan ranks second with +17.4 points.  (Incredible!)
  3. Evan Soltas evaluates “the rent hypothesis” – the claim that a larger share of income in recent years consists of unearned gains.  (More rigorously, rent is “a payment for a resource in excess of its opportunity cost, one that instead reflects market power”.)  This is Evan’s most read article.
  4. A research team led by Junjiu Huang from 中山大学 (Sun Yat-Sen University) have successfully “edited the genes of human embryos using a new technique called CRISPR”.  Carl Zimmer provides some background.  (HT: Tyler Cowen.)

Useful Functions in R for Manipulating Text Data

Introduction

In my current job, I study HIV at the genetic and biochemical levels.  Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text.  (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from the HIV’s RNA.)  In this post, I describe some common functions in R that I often use for text processing.

Obtaining Basic Information about Character Variables

In R, I often work with text data in the form of character variables.  To check if a variable is a character variable, use the is.character() function.

> year = 2014
> is.character(year)
[1] FALSE

If a variable is not a character variable, you can convert it to a character variable using the as.character() function.

> year.char = as.character(year)
> is.character(year.char)
[1] TRUE
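The same two functions also work on vectors, which may be worth seeing once. The sketch below uses a made-up vector of years; note that is.character() tests the object as a whole, while as.character() converts each element.

```r
# A numeric vector of made-up years (illustrative only)
years = c(2012, 2013, 2014)

# is.character() tests the whole object, so it returns a single TRUE or FALSE
is.character(years)

# as.character() converts element by element
years.char = as.character(years)
years.char
is.character(years.char)

# as.numeric() converts back when the strings look like numbers
as.numeric(years.char)
```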

A basic piece of information about a character variable is the number of characters in its string.  Use the nchar() function to obtain this information.

> nchar(year.char)
[1] 4
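Because nchar() is vectorized, it can report the length of every string in a vector at once, which is handy for sequence data. The sequences below are hypothetical, not real patient data.

```r
# Hypothetical nucleotide sequences (not real patient data)
seqs = c("ATGGCC", "ATG", "")

# nchar() returns one count per element, including 0 for the empty string
nchar(seqs)

# length() counts the elements of the vector, not the characters in each string
length(seqs)
```

Keeping nchar() and length() distinct avoids a common mix-up: the first measures each string, the second measures the vector.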
