She Loves Data: R Workshop Day 1 – List

Lists are a straightforward type of data structure, created by using list().

Syntax: list(vector, val2, …)
Lists:
– Store an ordered collection of objects.
– They can contain different data types and work like a container without restriction.

#Define a list with a vector, a floating-point number and a function, sin.
#The variable is named my_list so that it does not mask the list() function.
my_list <- list(c(2,3,5), 31.7, sin)

The output on my console shows,

Three different elements are listed on the console.
Another example of using a list was in my previous blog on Matrix, where I defined the dimnames using list() in the example.

The rownames and colnames are vectors which contain the names of the rows and columns.

I do not have many examples using list(), especially on how to access the elements of the list. However, I believe the elements can be retrieved similarly to other programming languages, by using indexes and square brackets. I will give an update on that later.
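A minimal sketch of accessing list elements, not taken from the workshop. The names my_list and named_list are made up for illustration. Single square brackets return a smaller list, while double square brackets return the element itself.

```r
# Hypothetical list, similar to the workshop example.
my_list <- list(c(2, 3, 5), 31.7, sin)

# Single brackets return a list that contains the first element.
first_as_list <- my_list[1]
class(first_as_list)      # "list"

# Double brackets return the element itself (here, the numeric vector).
first_element <- my_list[[1]]
first_element[2]          # second value of the vector: 3

# The third element is a function, so it can be called directly.
my_list[[3]](0)           # sin(0) is 0

# Elements can also be given names and then accessed with $.
named_list <- list(numbers = c(2, 3, 5), value = 31.7)
named_list$value          # 31.7
```

Indexes in R start at 1, not 0.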

She Loves Data: R Workshop Day 1 – Data Frame

It is an important data structure in R.

Syntax: data.frame(), with everything declared within the parentheses.
Data Frames:
– Generated by combining multiple vectors
– It can be created by using external files when importing the data into R.

I am not sure how to share what I learned about data frames in just one blog entry. A data frame works slightly differently from a matrix: it can contain different modes of data. See the example below:

#Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27"))
)
#Print the data frame.
print(emp.data)

A data frame called emp.data is created, which contains numbers for emp_id, characters for emp_name, floating points for salary and dates for the start date. The output of the data frame on the console when I print(emp.data) is as below:

In a data frame, the column names are taken from the variable names of the vectors.

R has several built-in functions for data frames which are quite useful. Follow the examples below:

str(emp.data) 
– When I execute the above code, the console shows:
'data.frame': 5 obs. of 4 variables:
$ emp_id : int 1 2 3 4 5
$ emp_name : Factor w/ 5 levels "Dan","Gary","Michelle",..: 4 1 3 5 2
$ salary : num 623 515 611 729 843
$ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

Do you know why it says 5 obs.? It means 5 observations: the data frame has 5 rows (one per employee) and 4 variables (the columns).

View(emp.data)
– View the data in tabular format. 
– Navigate to the top left box in RStudio, where I see another tab named emp.data displayed.
– Use it often to check or view data.

Cool, right?

The next cool thing we can do with a data frame is use summary(emp.data).
– It prints out a summary showing the min, max, median, mean, 1st quartile and 3rd quartile. In some statistical analyses, this is a very useful piece of information.
– How do I extract just the min, median and max values from summary()?
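One possible answer to the question above, as a sketch rather than the workshop's method. It assumes the salary values from the emp.data example; the variable names s and stats are made up. summary() on a numeric vector returns a named object, so the statistics can be picked out by name, or computed directly with min(), median() and max().

```r
# Salary values copied from the emp.data example.
salary <- c(623.3, 515.2, 611.0, 729.0, 843.25)

# summary() returns named values, so they can be selected by name.
s <- summary(salary)
s[c("Min.", "Median", "Max.")]

# The same statistics computed directly.
stats <- c(min = min(salary), median = median(salary), max = max(salary))
print(stats)
```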

What if I want to extract specific columns from the data frame? How can it be done? The code below shows how. I can access the columns in the data frame by using the “$” symbol.

#Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

Accessing the data frame.
– Extract information from specific rows and columns.

– Extract using head() and tail().
In a larger data frame, these are quite useful functions to extract the first 6 records and the last 6 records. The example from the workshop is not large enough to see the difference, so let us try head(mtcars) and tail(mtcars).

mtcars is a built-in data frame that comes with R.
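A small sketch of both ways of accessing a data frame, under the assumption that the data looks like the workshop example. The data frame emp below is a shortened, hypothetical copy of emp.data. Rows and columns are selected with square brackets in the form [rows, columns].

```r
# Hypothetical, shortened copy of the workshop data frame.
emp <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25)
)

# First two rows, all columns.
first_two <- emp[1:2, ]

# Rows 3 and 5, columns 2 and 3 (emp_name and salary).
subset_rc <- emp[c(3, 5), c(2, 3)]

# head() and tail() return 6 rows by default; n changes that.
top_rows <- head(mtcars)
last_three <- tail(mtcars, n = 3)
```

Leaving the row or column position empty, as in emp[1:2, ], means "all of them".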

To add another “column”, it can be done directly with the code below:
emp.data$dept <- c("IT","Operations","IT","HR","Finance")

Then, assign the data frame to a variable and print it out on the console using the following code,

#Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

The key is using “$”, the same symbol I used to extract or access data from the emp.data data frame.

I will share more on data frames when I come across interesting code. Stay tuned. Thank you.

She Loves Data: R Workshop Day 1 – Matrix

I continue with the next data structure in the R language. Let us have a quick look at matrices, with the example given by the instructor below using the matrix syntax:

Syntax : matrix(data, nrow, ncol, byrow, dimnames)

Matrices:
– Elements are filled in sequentially, by column by default or by row.
– Indexing starts with the row, then the column.
data can be in the form of a vector.
nrow or ncol means the desired number of rows or columns.
byrow is a logical value, TRUE or FALSE. By default (FALSE), the matrix is filled by columns; with byrow = TRUE, the matrix is filled by rows.

Let me share the examples from the workshop.

#matrices
#Elements are arranged sequentially by row.
M <- matrix(c(3:15), nrow = 4, byrow = TRUE)
print(M)
#Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

My output on the console,

Notice the above warning message. It is because the vector c(3:15), ranging from 3 to 15 inclusive, has 13 elements, which is not a multiple of the 4 rows I asked for by setting nrow = 4. R builds a 4 x 4 matrix (M) and recycles values from the start of the vector to fill the remaining cells.

The second matrix (N) is returned without a warning message. This is not because of byrow = FALSE, but because c(3:14) has 12 elements, which fill 4 rows and 3 columns exactly, arranged by column.

Rename rows and columns in matrix
R allows me to define the row and column names using dimnames.

#Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

Renaming the rows and columns gives us a better picture of the matrix.

Before I move on, did you notice that the above code contains a vector, a matrix and a list? Amazing, right?

Are Python or Scala able to do so? Please share with me if you know.

Access elements in matrix
I can access the elements in the matrix using indexes, which begin with 1. Some examples use the matrix (P) to find some values.
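A short sketch of element access, rebuilding the matrix P from the dimnames example above so that the block runs on its own. The index order is [row, column].

```r
# Same matrix as in the dimnames example.
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

P[1, 3]            # row 1, column 3: 5
P[4, 2]            # row 4, column 2: 13
P[2, ]             # the whole second row: 6 7 8
P[, 3]             # the whole third column: 5 8 11 14
P["row4", "col1"]  # access by row and column name: 12
```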

Addition and subtraction of matrix
Matrices with the same dimensions (the same number of rows and columns) and numeric values can be added together or subtracted from each other, element by element.
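As an illustration, here is a small sketch with two made-up 2 x 3 matrices, A and B (the values are not from the workshop). Both operations work element by element.

```r
# Two hypothetical 2 x 3 matrices, filled by column (the default).
A <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
B <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Element-by-element addition and subtraction.
total <- A + B
difference <- A - B

print(total)
print(difference)
```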

Pretty cool, right, working with matrices? I am going to share two more data structures, the list and the data frame, in my next blogs.

She Loves Data: R Workshop Day 1 – Vectors

After a short break over the weekend, I began my learning again. Let us continue exploring the data structures in the R language, starting with vectors. They are the basic ones to play around with and get familiar with the code.

Syntax: c(val1, val2, val3, …)

Vectors
– Simplest, basic data structure in R.
– Contains same type of data.

#Vector declaration
#vector containing three numeric values 2, 3 and 5.
c(2, 3, 5)

#vector contains 1 to 10. First value before colon (:) is start number, and value after the colon is end number.
c(1:10)

#vector of logical values.
c(TRUE, FALSE, TRUE, FALSE, FALSE)

#vector of character values.
c("aa", "bb", "cc", "dd", "ee")

#Check the length of the vector and it returns 5.
length(c("aa", "bb", "cc", "dd", "ee"))

#check the length of a value, "aa" in the vector. It returns 2.
nchar( c("aa"))

#check the data type,
n = c("aa", "bb", "cc", "dd", "ee")
class(n)
[1] "character"
typeof(n)
[1] "character"

#Full codes:
c("aa", "bb", "cc", "dd", "ee")
[1] "aa" "bb" "cc" "dd" "ee"
length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
nchar( c("aa"))
[1] 2
n = c("aa", "bb", "cc", "dd", "ee")
class(n)
[1] "character"
typeof(n)
[1] "character"

#Combining vectors. It converts the numeric into string.
n = c(2, 3, 5)
s = c("aa", "bb", "cc", "dd", "ee")
c(n, s)
[1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
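
The conversion happens because a vector holds only one type of data. A minimal sketch of the same rule with other types, using made-up variable names:

```r
# Logical values become numbers when mixed with numbers.
mixed_num <- c(TRUE, FALSE, 2)
class(mixed_num)   # "numeric"

# Numbers and logical values become strings when mixed with characters.
mixed_chr <- c(1, TRUE, "aa")
class(mixed_chr)   # "character"
```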

I learned from the recent R workshop organized by Sparkline that vectors can be used in many different situations. I will try to point them out whenever I share code in my next blogs. Let us move on to the next data structure.

She Loves Data: R Workshop Day 1 – Basic Data Type Conversion

Data type conversion in R works similarly to other programming languages.
– use is.xxx() to check whether the data is of ‘xxx’ type; it returns TRUE or FALSE.
– use as.xxx() to convert it into ‘xxx’ type.

Example,

  • is.numeric(), is.integer(), is.character(), is.vector(), is.matrix(), is.data.frame()
  • as.numeric(), as.integer(), as.character(), as.vector(), as.matrix(), as.data.frame()

I have tried to use as.integer() previously, still remember?

 #When I execute the line, y = as.integer(3), the output is,

y = as.integer(3)
class(y)
[1] "integer"

as.integer() converts the 3 into the integer data type, and class(y) returns integer. Another function which is used quite often is as.character(), which converts a number into a string.

More data conversions are shown in the chart below, which is taken from https://www.statmethods.net/management/typeconversion.html

After talking so much about character, numeric, string and Boolean, how about Date? Are we able to convert dates?

The answer is yes. R has as.Date() to convert a character string into a date, with the default format yyyy-mm-dd. Of course, we can change the date format. For example, it could be coded as,

as.Date(x, "format")

where the date format codes can be found in the table below,

Right now, I do not have many examples to show data conversion and formatting. Hopefully in future posts, there will be more sample code written with data conversion or formatting.
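In the meantime, here is a minimal sketch of a few conversions, with made-up values. The format codes used are %d (day), %m (month as a number) and %Y (4-digit year).

```r
# Check and convert basic types.
is.numeric(3.14)                 # TRUE
num <- as.numeric("3.14")        # character to numeric
txt <- as.character(10)          # numeric to character

# The default date format is yyyy-mm-dd.
d1 <- as.Date("2015-03-27")

# Other layouts need a format string.
d2 <- as.Date("27/03/2015", format = "%d/%m/%Y")

# format() turns a Date back into a character string.
out <- format(d2, "%d-%m-%Y")
print(out)
```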

She Loves Data: R Workshop Day 1 – Basic Data Structure

Now comes the most important part of R: the data structures, or data objects. They hold data and are used to handle all computations in R. Let us explore them one by one. This portion is a bit theoretical. It gives some information on what vectors, matrices, lists and data frames are.

Vector : c(val1, val2, val3, …)
– Simplest, basic data structure in R.
– Contains same type of data.

Matrix : matrix(data, nrow, ncol, byrow, dimnames)
– Can do operations such as addition, multiplication on matrix.
– Elements are filled in sequentially, by column by default or by row.
– Indexing starts with the row, then the column.
data can be in the form of a vector.
nrow or ncol means the desired number of rows or columns.
byrow is a logical value, TRUE or FALSE. By default (FALSE), the matrix is filled by columns; with byrow = TRUE, the matrix is filled by rows.

List : list(vector, val2, …)
– Store an ordered collection of objects. It allows me to gather a variety of possibly unrelated objects under one name.
– It can contain different data types and works like a container without restriction.
– Declared using the list() function, or by coercing an object using as.list().

Data Frames : data.frame()
– Generated by combining multiple vectors, such that each vector becomes a separate column.
– A very important data structure in R.
– It can be created from external files when importing data into R, and looks like a tabular data object.
– It can be converted into a matrix by using as.matrix().

I have prepared separate blogs to go into each of them with sample codes I got from the workshop and online learning website.




She Loves Data: R Workshop Day 1 – Basic Data Type

Continuing from the previous blog, I am going to share some examples for the six data types. These examples can be found on the online learning website, https://www.tutorialspoint.com/r/r_data_types.htm. To help myself confirm what kind of object it is and what the object’s data type is, class() and typeof() print out the class name and the type of the variable.

Character
– Using single quote or double quote will be fine.
– ‘a’, “good”, “EXAMPLE”, “TRUE”, “3.14”

 #When I execute the line, x = "TRUE", the output is,
x = "TRUE"
class(x)
[1] "character"

Numeric
– It can be whole number or with decimal points.
– 12.3, 10, 999.
– At the global environment tab, I see the value of x is 10.

#When I execute the line, x = 10, the output is,
x = 10
class(x)
[1] "numeric"

Integer
– To declare an integer, I can use as.integer() or add the suffix L to the number, for example 3L.
– as.integer() converts the value into the Integer data type, keeping only the whole number.
– In the global environment tab, I see a different representation of Integer compared to Numeric.

#When I execute the line, y = as.integer(3), the output is,

y = as.integer(3)
class(y)
[1] "integer"
#When I execute the line, z = as.integer(3.14), the output is,

z = as.integer(3.14)
class(z)
[1] "integer"
z
[1] 3

In the Environment tab, there is an “L” behind the number (3L) to show that it is an Integer.
I hope the difference between Numeric and Integer is not too confusing. If you have some database SQL background, these can easily be understood as well.

Logical
– It has only two possible values, TRUE or FALSE.
– It can be the result of comparing two variables.
– The standard logical operations are “&” (AND), “|” (OR) and “!” (Negation).

#When I execute a code that reads,
x = 1; y = 2 # sample values
z = x > y # is x larger than y?
z # print the logical value
[1] FALSE
class(z)
[1] "logical"

#This is one example of comparing two values and checking if one of them is larger than the other.

#If I want to check 3 variables by declaring z = 5 and assigning the logical value to a, I might write a = x > y > z. However, I received an error message,
x = 1; y = 2; z = 5 # sample values
a = x > y > z
Error: unexpected '>' in "a = x > y >"

#Can I know why?
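
The reason is that R does not allow comparison operators to be chained, so x > y > z is rejected before it is even evaluated. A minimal sketch of a workaround, using the same sample values: write each comparison separately and combine them with & (AND).

```r
x <- 1; y <- 2; z <- 5   # sample values

# Each comparison is written on its own and combined with & (AND).
a <- (x > y) & (y > z)   # FALSE, because 1 > 2 is FALSE
b <- (x < y) & (y < z)   # TRUE, because 1 < 2 and 2 < 5

print(a)
print(b)
```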

#Another example to demonstrate the logical operations.

u = TRUE; v = FALSE
u & v
[1] FALSE
u | v
[1] TRUE
!u
[1] FALSE

Complex
– It refers to complex numbers with real and imaginary parts.
– I do not have an example to execute for this part. I will update it next time when I encounter a good example to share.

Raw
– I have not seen an example of it elsewhere or what it is about. Can anyone share?
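
For the two types above, here is a small sketch with made-up values, not taken from the workshop. A complex number is written with the suffix i on the imaginary part, and a raw vector holds bytes, which print in hexadecimal.

```r
# Complex: a number with real and imaginary parts.
z <- 3 + 2i
class(z)   # "complex"
Re(z)      # real part: 3
Im(z)      # imaginary part: 2

# Raw: the bytes behind a character string.
r <- charToRaw("Hello")
class(r)   # "raw"
print(r)   # 48 65 6c 6c 6f

# rawToChar() converts the bytes back into a string.
back <- rawToChar(r)
```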

I was a little confused when most people refer to vectors, matrices, lists and data frames as R data types. Yes, they are data types in R, but in a general view they may be better known as data structures: collections of objects, whereas an object’s data type can be character, numeric, integer or logical (Boolean).

R provides us many functions to check the objects, such as

  • class() = what kind of object is it?
  • typeof() = what is the object’s data type?
  • length() = how long is it?
  • attributes() = does it have any metadata?

So far, I have been able to use class(), and hopefully I will have a chance to try the rest of the above.
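
A quick sketch of the four functions on a few made-up objects, to show how their answers differ:

```r
x <- 10
class(x)        # "numeric"
typeof(x)       # "double", the storage type behind numeric

y <- 3L
class(y)        # "integer"
typeof(y)       # "integer"

v <- c("aa", "bb", "cc")
length(v)       # 3

# A matrix carries its dimensions as metadata; a plain vector has none.
m <- matrix(1:6, nrow = 2)
attributes(m)   # $dim: 2 3
attributes(v)   # NULL
```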

Then how about date? Is it also a data type?

Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates.

Two built-in R functions for dates,

  • Sys.Date() returns today’s date.
  • date() returns the current date and time.
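
A minimal sketch of the points above, with made-up dates. Converting a Date with as.numeric() shows the number of days since 1970-01-01.

```r
# Nine days after 1970-01-01.
d <- as.Date("1970-01-10")
as.numeric(d)        # 9

# One day before 1970-01-01 gives a negative value.
before <- as.numeric(as.Date("1969-12-31"))   # -1

# Sys.Date() returns a Date object; date() returns a character string.
today <- Sys.Date()
class(today)         # "Date"
now_text <- date()
class(now_text)      # "character"
```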

She Loves Data: R Workshop Day 1 – Basic

Let us try something very basic: how to declare a variable and assign a value to it. Every programming language uses variables to store information in the form of characters, integers, floating points, Booleans, etc. R’s variables work slightly differently: variables are assigned R-Objects, and the data type of the R-Object becomes the data type of the variable. I will share about it in more detail below. Before that, we can still declare a variable and store a value with a basic data type. The code looks as below:

#declares a variable, x and gives x, a value of 10
x = 10

Remember that I previously wrote about the R and RStudio installation. Let us try to run the code using RStudio. Upon launching RStudio, you can see 4 boxes, so what are these boxes?

The top left box is where I write the R scripts. I can open multiple tabs to write different scripts and save them on my local machine. The bottom left box is the console, where I see the output of the scripts I write above.

Moving on, the top right box shows the global environment’s values and the bottom right box has the help function. There are more things in this area, which I will slowly explore.

To execute the script, highlight the line(s) to be executed. On Windows, I use Ctrl+Enter to execute it; the shortcut may differ on other machines. Alternatively, I can click on the “Run” button at the top right of the top left box. The result or the output can be seen in the bottom left box.

The output on the Console tab,

And, the Environment tab,

I guess by now we are more or less familiar with RStudio’s working environment. I will show how to use the help function in a later part.

In my script above, I used “#”. What is it?

To write comments or notes in the script files for ourselves or for other people’s reference, use the “#” symbol. RStudio will not execute lines that begin with “#”.

R functions which I can use with variables are:
– print() = print the value.
– cat() = concatenate and print the values of two or more variables.
– paste() = concatenate vectors after converting to character.
– ls() = find variables in the current workspace.
– pattern = use with ls() to find variables with matched patterns.
– all.names = TRUE = use with ls() to find variables starting with a dot (.), which are hidden.
– rm() = remove variable.
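
A short sketch of these functions in use. The variable names x, y, label and .hidden are made up for illustration.

```r
x <- 10
y <- "apples"

print(x)                     # prints the value of x
cat("I have", x, y, "\n")    # prints the values joined by spaces

# paste() returns the joined string instead of printing it.
label <- paste(x, y)         # "10 apples"

# A variable whose name starts with a dot is hidden from a plain ls().
.hidden <- 1
ls()                         # lists the visible variables
ls(pattern = "^lab")         # only names starting with "lab"
ls(all.names = TRUE)         # also shows .hidden

# rm() removes a variable from the workspace.
rm(x)
```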

Basic Data Types
In R, the basic R-Object data types are as below:

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

The simplest and most commonly used R-Object is the vector. A vector holds values of one of the six basic data types below:

  • Logical
  • Numeric
  • Integer
  • Complex
  • Character
  • Raw

I will share the detail of each of the above in the next blog. Stay tuned!

She Loves Data: R Workshop Day 1

The event was organized by a group of energetic young ladies who want to share the R programming language with other ladies who do not have experience in this language. I learned it through Coursera in 2015, so the workshop served as a refresher. For the first time, I invited two of my teammates to join me for the workshop.

The workshop was held at Sparkline’s office, which is located at Tanjong Pagar Road. They offered their rooftop space to conduct the workshop. Although it was a little cramped, with over 40 people and no tables, so that everyone had to work with their laptops on their laps, the passion to learn was still going strong.

The speaker started with an introduction of Sparkline and how to get connected with them. Yes, I really like this kind of workshop, where I can build my network along the way. It was followed by an introduction of another lady, who had joined the same workshop 2 months earlier, to share her experience with us before the instructor began the workshop.

By the way, the workshop is free!

In my next blog, I will detail the items I learned during the 2.5-hour evening workshop.

Installation of R and RStudio on Windows

With my upcoming R workshops this week and the following week, I shall re-install R and RStudio on my machine. We can find a lot of simple guides on how to do both installations.

What is R?
It is a programming language and environment used for statistics. Like Python, it is a popular programming language for data analytics tasks. Both have their pros and cons, which I am not going to discuss further. Let me jump into the installations.

Installation of R
We need to download the installer (.exe) file from the R website, http://cran.us.r-project.org/.
I get the latest version of R and just follow through the wizard for a default setup. The installation takes a minute to complete. No troubles at all.

Installation of RStudio
Go to RStudio’s download page, https://www.rstudio.com/products/rstudio/download/ and choose the RStudio Desktop free version installer (.exe) based on your machine’s operating system. Go through the wizard setup and the installation completes within a minute too. RStudio is an integrated development environment (IDE) with a graphical user interface, used by statisticians to write R code.

This is what RStudio looks like on a Windows machine. If time permits, I would love to share the content and things that I learn from the workshops in the near future.