Web Scraping with R

CodingGirls organised a 3-hour Web Scraping with R session. The speaker was Pang Long, a dedicated and attentive speaker who explained the material well throughout the event. The session mattered to me because I had not touched R programming for about a year and joined to refresh my R skills. The class is not meant for beginners who want to learn R.

The topic, web scraping with R, turned out to be very interesting once I attended the event.

When we browse a website, we sometimes wish to download some of the information from a webpage. One way is to save the page in .html format. However, if you only want, for example, the list of books on a website, you do not need to save the entire webpage and slowly extract the items one by one. R can extract them by identifying and exploiting patterns in the raw HTML source code and turning the data into a usable format. So, basic HTML knowledge is important here.

Throughout the session, the speaker used this tutorial, written by Ryan Thomas, as the example to explain how to do it: https://rpubs.com/ryanthomas/webscraping-with-rvest.

Prerequisite:
1. Install rvest
2. Install magrittr

In case you do not know how to install a package from RStudio, look for the Packages tab in the right-hand pane, as shown below. Click the Install icon and fill in the name of the package you want to install. If you are more comfortable with code, you can always use:

install.packages("rvest")
install.packages("magrittr")

After the packages were installed, we began to look at the website. In the tutorial linked above, the author gives a project link; we can right-click on that page and view the raw HTML code. This approach works on HTML content only.

Before that, we were introduced to some of the powerful rvest functions which we used throughout the session.

html_nodes(): identifies HTML wrappers.
html_nodes(".class"): calls node based on css class
html_nodes("#id"): calls node based on the element's id (for example, a <div> id)
html_nodes(xpath="xpath"): calls node based on xpath
html_attrs(): identifies attributes (useful for debugging)
html_table(): turns HTML tables into data frames
html_text(): strips the HTML tags and extracts only the text

It is important to tell R where to look for the information on the webpage based on the HTML tags.

Other functions which are useful,

gsub("^\\s+|\\s+$", "", .) %>% # inside a pipe: strips the white space from the beginning and end of a string
print(paste0("getting data for page: ", page)) # progress message when looping over pages
URL <- paste0("https://scistarter.com/finder?phrase=&lat=&lng=&activity=&topic=&search_filters=&search_audience=&page=", page, "#view-projects") # builds the URL for each page

The output can be written to a CSV file.

write.csv(co.names, "RE100_2016.csv", row.names = FALSE)

Goodbye My Past

There were many minor versions of myself before and after I resigned from my last position. Now there is going to be a completely new “product” of myself, with no further support for the older versions.

What does it mean?
I made a decision and took a drastic change in my life recently, before I embarked on my new journey. The decision was made after hearing a message from an ex-colleague. The contents of the message were shared by a close source, with details of the people involved. Surprisingly, this person shared what had been done, and eventually it reached my ears. Yes, I do believe this person would do such a thing.

I am a person who believes in karma: what goes around comes around. It does not require me to do anything, as the image below says.

Without further verification of the matter, I removed everyone related to this person, past or present, from all my contact lists. I have refrained from sharing information or updates on Facebook and Instagram recently. I stopped talking to all the people I knew from the past. Ending those past relationships brought me into a brand new life in which I am more open to accepting new people and things. Now the past is less of a concern and easier to let go.

More importantly, I look at each day and everything positively. Only with a positive mindset will things sail smoothly, with God’s blessing.

First Thought

I have no intention of sharing the details of my personal life. However, I wish to share some of the great things that happened to me. It began in late February 2017, when I received an email asking for my availability for a job interview with an airline company in Singapore. I was very excited back then. I wrote down on a piece of paper all the possibilities of working for an airline company, as well as what my daily routine would be. Besides that, my ex-colleague contacted me and said their CEO was looking for me and wanted to treat me to a meal.

I read Susan’s monthly astrology, and the horoscope said a great career opportunity would be coming my way. I thought good things were on their way, and I was pretty sure this airline company was the one it meant. I shared this with my cousin. Since she believes in Jesus, she quoted some of the Psalms and asked me to meditate on them throughout my interview process.

On the interview day, I received another email, this time from a ride-hailing company, asking me to attend an interview the next day. By then, I strongly believed that my next destination would be in the transportation industry. I had applied for both jobs and done the tests some time before they contacted me for interviews.

I was full of confidence and the interviews went smoothly. However, the interview with the ride-hailing company did not go all the way, because the interviewer was looking for someone to work in Product Services while I told them my preferred field was Data Engineering. In the end, I did not get the job offer.

Then I turned my attention to the airline company, with great hope that they would contact me with updates. Meanwhile, I went back to Malaysia to wait. I needed some buffer time for my two marathons in late March and early April, so I could not stay too long in Singapore.

Just before I went back to Malaysia, another ex-colleague contacted me to ask for a meet-up with the new IT manager. Instantly, I guessed that the IT manager and the CEO were both looking for me for the same reason, because I knew neither of them knew me. I contacted my source in Thailand and was told that it was the IT Director who had suggested me to both of them. I went to meet the new IT manager and the team. Unfortunately, the IT manager came unprepared, which gave me the chance to turn them down. I informed the Thailand office and apologized.

I took up a short introductory course on Cassandra, but I have not completed it yet. I was rather demotivated as there was no one pushing me to complete it, and I was getting more worried by then.

I did a tarot card reading. It really surprised me: it said I would not be able to get what I wanted because I had not let go and opened up to new opportunities. I was shocked, especially because it contradicted the horoscope. The tarot reading advised me to meditate, and I did, as below.

I seek to destroy whatever illusions have been influencing my life and rebuild from a position of strength. I seek to place my life upon firm and positive foundations. I seek to align my hopes and ambitions with higher spiritual purpose.

So, was there any news after that?

Yes, there was, but not from the airline company. A guy from an F&B company contacted me and wanted to meet up for a discussion. On 24th March, I went back to Singapore for the two marathons, Sundown and 2XU, and met up with two friends too.

Since I was at Bugis for the meetup with the F&B Business Development guy, I stayed back and detoured to the nearby temple to say my prayers. I then received a call asking me to go for an interview the next day. I was excited to receive the good news. It gave me hope after waiting so long.

On the interview day, my interviewer was not in and was replaced by two other interviewers. After a brief chat, one of them started to test me verbally. The interview ended with designing a database structure and writing a query based on their questions.

Throughout the interview, I felt really nervous and kept reciting Psalm 23, even after the interview. I tried not to think too much about the outcome. On my way home, I received a call asking me to go to the office again to meet the hiring manager. During the conversation, a salary negotiation took place.

After the phone call, I quickly went home and checked Susan’s astrology again. What the website mentioned was 90% accurate, including the career growth, job opportunity, salary negotiation and timeline. The last part was about the work permit, which had not been discussed during the interview. I had a confident feeling that I would get the job pretty soon.

For my last interview, I arrived on time but had to wait for the hiring manager to finish a meeting. When the meeting was over, I was invited into the room and, for the first time, I saw my hiring manager. There was no further testing, thank God!

We talked briefly and I shared my working experience and background. It seemed there would be a great learning opportunity, and they did not mind that I lacked experience in this field.

At the same time, the hiring manager offered me the job. Truly, I did look surprised, although I had been confident and felt I knew the outcome. Not only that, the hiring manager invited the CEO to join the interview and to get the CEO’s approval. Judging from our conversation, I believe the CEO is a technical person. They also introduced me to their CTO, who came across in the conversation as another technical guy. I knew for certain it was God’s arrangement, the best thing God has given me. Thank you God, thank you Lord, for all the goodwill. I will treasure it as I learn and grow with God’s and the Lord’s blessings.

SQL Server – Types of JOIN

The types of JOIN are quite often asked about during interviews, where we are asked to list the types of JOIN and the differences between them. I found a simple illustration of the JOIN command that gives a first-glance idea of what each JOIN is.

The types of JOIN include:-
1. INNER JOIN
2. LEFT JOIN (LEFT OUTER JOIN)
3. RIGHT JOIN (RIGHT OUTER JOIN)
4. FULL JOIN  (FULL OUTER JOIN)
5. CROSS JOIN

The picture above shows the result sets returned when these types of JOIN are used in our queries. Next time, whenever we are confused about which JOIN to use, we can refer to this image to clear our doubts.

However, let us also go through a short description of each command for better understanding.

1. INNER JOIN
Returns only the rows for which the JOIN condition finds a match in BOTH tables.

2. LEFT JOIN
Returns all rows from the left table, and the matched rows from the right table.
Eg: The results will contain all records from the left table, even if the JOIN condition does not find any matching records in the right table; for those rows, each column from the right table is NULL.

3. RIGHT JOIN
Returns all rows from the right table, and the matched rows from the left table. It is the exact opposite of the LEFT JOIN.
Eg: The results will contain all records from the right table, even if the JOIN condition does not find any matching records in the left table; for those rows, each column from the left table is NULL.

4. FULL JOIN
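Returns all rows from both tables, matching them where possible. Rows without a match in the other table are still returned, with NULL in each column from the other table.
Eg: Its result set is equivalent to performing a UNION of the results of the left and right outer joins (provided the rows are unique, since UNION removes duplicates).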

5. CROSS JOIN
Returns every combination of rows: each row from the first table is combined with each row from the second table. Refer to the image below, which I captured from the explanation by Pinal D. in his blog.
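To make the five descriptions more concrete, here is a small sketch that can be run in SSMS as a single batch. The @Customers and @Orders table variables, their columns and their rows are made up by me for illustration, and the row counts in the comments hold only for this sample data.

```sql
-- Hypothetical sample data; names and values are illustrative only.
DECLARE @Customers TABLE (CustomerId INT, Name VARCHAR(20));
DECLARE @Orders    TABLE (OrderId INT, CustomerId INT);

INSERT INTO @Customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat');
INSERT INTO @Orders    VALUES (10, 1), (11, 1), (12, 4);

-- INNER JOIN: only matching rows. 2 rows (Ann with orders 10 and 11).
SELECT c.Name, o.OrderId
FROM @Customers AS c
INNER JOIN @Orders AS o ON o.CustomerId = c.CustomerId;

-- LEFT JOIN: every customer. 4 rows; OrderId is NULL for Ben and Cat.
SELECT c.Name, o.OrderId
FROM @Customers AS c
LEFT JOIN @Orders AS o ON o.CustomerId = c.CustomerId;

-- RIGHT JOIN: every order. 3 rows; Name is NULL for order 12,
-- because customer 4 does not exist in @Customers.
SELECT c.Name, o.OrderId
FROM @Customers AS c
RIGHT JOIN @Orders AS o ON o.CustomerId = c.CustomerId;

-- FULL JOIN: every customer and every order. 5 rows;
-- unmatched rows carry NULL on the side that has no match.
SELECT c.Name, o.OrderId
FROM @Customers AS c
FULL JOIN @Orders AS o ON o.CustomerId = c.CustomerId;

-- CROSS JOIN: every combination, no ON clause. 3 x 3 = 9 rows.
SELECT c.Name, o.OrderId
FROM @Customers AS c
CROSS JOIN @Orders AS o;
```

Running the five queries one after another and comparing the row counts is a quick way to see the differences without relying on a diagram.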

SQL – NULL

Last year, Pinal Dave was invited to one of the SQL Server User Group events held by Microsoft. I am sure you know this guy, who has written many SQL articles on his blog, blog.sqlauthority.com.

One of the most interesting topics he shared that morning was SQL NULL. Sometimes we have a hard time dealing with null values when working on our queries. During one interview, I was asked what result the query below returns.

select case when null = null then 'yup' else 'nope' end

I cannot remember if I got the answer correct, but I did think the query was wrong. When I searched the Internet, the same question appeared in one of the links, with the explanation below:-

This query will yield “Nope”, seeming to imply that null is not equal to itself! The reason for this is that the proper way to compare a value to null in SQL is with the is operator, not with =.

It suggests that the correct way to compare with a null value is to use the IS operator. You can try running the query below in SSMS to see the result; it returns 'yup'.

select case when null is null then 'yup' else 'nope' end

Next, I wanted to find out a bit more about NULL values.

An article on the TechNet website says:-

A value of NULL indicates that the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

Anything compared to NULL evaluates to the third value in three-valued logic: UNKNOWN. There is much more about NULL values that can be read online. For now, we know that to check for NULL values, we should use IS NULL or IS NOT NULL in the WHERE clause.
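Here is a minimal sketch of how this plays out in a WHERE clause. The @People table variable and its two rows are made up for illustration, and the behaviour described in the comments assumes the default ANSI_NULLS ON setting, which the first line states explicitly.

```sql
-- Assumption: default setting; with ANSI_NULLS OFF, "= NULL" behaves differently.
SET ANSI_NULLS ON;

-- Hypothetical sample data; names and values are illustrative only.
DECLARE @People TABLE (Name VARCHAR(20), Phone VARCHAR(20));
INSERT INTO @People VALUES ('Ann', '555-0100'), ('Ben', NULL);

-- Returns no rows: Phone = NULL evaluates to UNKNOWN, never TRUE.
SELECT Name FROM @People WHERE Phone = NULL;

-- Returns Ben: IS NULL is the proper test for a missing value.
SELECT Name FROM @People WHERE Phone IS NULL;

-- Returns Ann: IS NOT NULL keeps only rows that have a value.
SELECT Name FROM @People WHERE Phone IS NOT NULL;

-- Returns no rows: Ann's phone is equal, and for Ben the comparison is UNKNOWN.
SELECT Name FROM @People WHERE Phone <> '555-0100';
```

The last query is the one that tends to surprise people: a row with NULL is skipped by both = and <>, so it has to be handled with IS NULL explicitly.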

Sundown Singapore & 2XU Singapore Marathon

It was last year when I was told about the Sundown Singapore Marathon’s 10th anniversary special rate and the 2XU early bird special rate, which came with an additional polo T-shirt giveaway for loyal runners from 2015 and 2016. The race dates of the two runs were just a week apart. My friend and I discussed whether to join both marathons. However, with the race dates so close to each other, I was not confident enough to run both half marathons.

Neither was my friend. So we both decided to run 10KM at the Sundown Marathon and 21.1KM at 2XU. We registered on the same day, at the same time.

Since I ran Sundown’s half marathon in 2015 (For the Tribe), it would be good enough for me to run 10KM this year just to complete the medal collection. I would not be able to get Sundown’s finisher T-shirt because it is meant for the full marathon.

Meanwhile, for 2XU, only those who signed up for the 21.1KM half marathon were entitled to the additional polo T-shirt, and that was the highest category for the 2XU marathon. Another surprise gift was a pair of socks for the super early birds. I think it was just to lure more people to sign up for 2XU rather than Sundown.

And I was definitely going for the half marathon to get this year’s 2XU finisher T-shirt, which is black with rose gold lining. I got my first 2XU finisher T-shirt in 2015, which was black with yellow lining.

I came down to Singapore just to run these two marathons and arrived on the evening of Friday, 24th March. I went to Marina Square hoping to collect the race pack for 2XU, but I was told the queue was very long and a waste of time. Indeed it was, so I gave up and walked to the F1 Pit Building to collect Sundown’s race pack instead. There was a queue too, and my side was pretty slow.

There were a lot of bad comments about Sundown’s management of this year’s event, especially the poor handling of the race pack collection. As for myself, the queue was fairly okay, as I spent 1.5 hours before my turn. However, I was pretty sad about the medal’s quality when I compared it to the 2015 finisher medal.

Let me show you the finisher T-shirt and the medal for the 2XU Marathon 2017. I will update this post with a picture of both finisher T-shirts; personally, I prefer the cut and design of the earlier one.

People have been asking about my reasons for paying so much to join marathons. After all, there are many parks in Singapore where running is free, and many park connectors that allow us to run for kilometers at no cost. Not only that, you will sometimes see people joining the Spartan race. Why?

To be honest, I do not have an answer that conveys to people the greatness of joining marathons. It is a kind of suffering whenever I have to run 10KM or more, and on my normal running routine I could easily give up after 5KM or 6KM. Then, why?

I think my answer is the satisfaction of running toward the finishing line while the people around are cheering you on to push harder for the next 10 meters or so. I think it is all about the spirit of sportsmanship that I feel when I run toward the finishing line. The shouts from strangers who share the same aims and goals motivate me to push to my limit and cross the line with my best timing.

Oh well, is it worth doing when it is just for a short 10-meter distance? Hmm, I do not have a perfect answer to that either, but it is worth it especially when you get an extra bonus, such as finishing in the top 3, winning a lucky draw prize if there is one, or getting to know a new friend who has the same hobby as you. Back in 2014, during the Standard Chartered KL Marathon, I met a new friend, who was also running the 10KM, at the 9KM signboard. As an amateur runner, I loved to stop to take pictures and post them to Instagram; this is one of the pictures I took during the run.

And I do love the feeling after the marathons, especially when I organize my medals and think about how to display my hall of fame. Yes, some people may say medals, trophies and certificates can be custom-made, but where is the fun in that? You must go through all the obstacles, the sweat and the pain of a marathon before you can understand the greatness of running one.

There is always a lesson to be learned from each run and a beautiful reward awaiting us ahead: the sunrise, the beginning of a new day.

SQL Server – Difference Between UNION And UNION ALL

I am trying to recall some of the interview questions I was asked during my job hunt, and I would like to share some findings I read from the experts regarding these questions. The first one I remember is the difference between UNION and UNION ALL.

I believe we use both of these commands in our queries in many cases. So what are their differences, and which one is better in terms of performance?

The UNION command combines the result sets of two or more SELECT statements. Unlike JOIN, which combines columns from different tables, UNION stacks rows, so each SELECT must return the same number of columns with compatible data types. It returns only the distinct rows, as if a SELECT DISTINCT were applied to the combined result set.

The UNION ALL command is the same as UNION except that it returns all the rows from the selected tables. In other words, UNION ALL keeps duplicate rows.

However, UNION ALL usually returns results faster when we execute the queries. You can compare the two in the execution plan. The reason is that UNION has to do extra work during execution to filter out the duplicates. If you are sure that the combined results are already unique, it is a good idea to use UNION ALL for better performance.
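A small sketch of the difference, using two made-up table variables (@A and @B and their city names are my own illustration, not from any real database). The row counts in the comments hold only for this sample data.

```sql
-- Hypothetical sample data; names and values are illustrative only.
DECLARE @A TABLE (City VARCHAR(20));
DECLARE @B TABLE (City VARCHAR(20));

INSERT INTO @A VALUES ('Singapore'), ('Kuala Lumpur');
INSERT INTO @B VALUES ('Singapore'), ('Bangkok');

-- UNION: duplicates are removed, so 'Singapore' appears once. 3 rows.
SELECT City FROM @A
UNION
SELECT City FROM @B;

-- UNION ALL: nothing is filtered, so 'Singapore' appears twice. 4 rows.
SELECT City FROM @A
UNION ALL
SELECT City FROM @B;
```

Turning on the actual execution plan in SSMS before running both queries should show the extra duplicate-removal step for the UNION version.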