February 27, 2014
In my current job, I study HIV at the genetic and biochemical levels. I therefore often work with the nucleotide or amino acid sequences of HIV samples from various patients, and this work involves a lot of text manipulation. (Strictly speaking, I analyze nucleotide sequences of DNA that has been reverse-transcribed from HIV's RNA.) In this post, I describe some common R functions that I often use for text processing.
Obtaining Basic Information about Character Variables
The is.character() function tests whether a variable is a character variable.
> year = 2014
> is.character(year)
[1] FALSE
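As a small sketch of how is.character() treats a few other kinds of input, consider the cases below. The values are arbitrary illustrations. The factor case is worth knowing about because data read with read.csv() may arrive as factor columns, depending on the stringsAsFactors setting and the R version.

```r
# is.character() returns a single TRUE or FALSE for the whole object
print(is.character(2014))         # FALSE: a numeric (double) value
print(is.character("2014"))       # TRUE: a quoted literal is a character string
print(is.character(c("A", "C")))  # TRUE: a character vector of length 2
print(is.character(factor("A")))  # FALSE: a factor stores integer codes with labels
```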
If a variable is not a character variable, you can convert it to one with the as.character() function.
> year.char = as.character(year)
> is.character(year.char)
[1] TRUE
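Here is a short sketch of how as.character() behaves on a few other inputs. The nucleotide factor is a made-up illustration, not real sample data. Two points are worth noting. A number is converted from its stored value, so the way it was typed (such as trailing zeros) is lost. A factor is converted to its labels rather than its underlying integer codes.

```r
# Convert a numeric value to a character string
year <- 2014
year.char <- as.character(year)
print(year.char)                # "2014"

# Trailing zeros are not preserved: 3.10 and 3.1 are the same number
print(as.character(3.10))       # "3.1"

# Factors convert to their labels, not their integer codes
bases <- factor(c("G", "A", "T", "A"))
print(as.character(bases))      # "G" "A" "T" "A"
print(as.integer(bases))        # 2 1 3 1 (levels are sorted: A, G, T)
```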
A basic piece of information about a character variable is the number of characters in the string. Use the nchar() function to obtain it.
> nchar(year.char)
[1] 4
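Because nchar() is vectorised, it can report the lengths of many sequences at once. The sketch below uses made-up nucleotide fragments (the vector `seqs` is hypothetical, for illustration only) and contrasts nchar(), which counts characters in each string, with length(), which counts the strings themselves.

```r
# Hypothetical nucleotide fragments, including an empty string
seqs <- c("ATGGGTGCGAGA", "ATGGGT", "")

print(nchar(seqs))              # 12 6 0: characters in each element
print(length(seqs))             # 3: number of elements in the vector

# Non-character input is coerced first, so 2014 is counted as "2014"
print(nchar(2014))              # 4

# Keep only fragments at least 6 nucleotides long
print(seqs[nchar(seqs) >= 6])   # "ATGGGTGCGAGA" "ATGGGT"
```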