Data Science & Analytics, Experience Sharing, NoSQL Databases, SQL Databases

Highly Scalable Systems

I was asked to look into a topic called highly scalable systems, so I immediately turned to my IT team for more insights. As far as I know, the topic covers many areas, from hardware to software: storage, processor cores, memory and so on. Given the volume of data today, knowing how to design a highly scalable system is important. So, where should I start?

Let me try to understand the term “highly scalable”. It means having the flexibility to scale. What does it mean to scale? It means being able to change capacity, growing or shrinking resources as the load changes. In short, scalability is the flexibility to change. Most cloud services today are scalable.

We need to build a solution that is tailored to the project.

Next, I focused on database scaling. The first topic that comes up is horizontal versus vertical scaling; if you have read the MongoDB documentation, you will find it there too. I picked one of the plain-language explanations from Stack Overflow:

Horizontal scaling means that you scale by adding more machines into your pool of resources whereas vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.

https://stackoverflow.com/questions/11707879/difference-between-scaling-horizontally-and-vertically-for-databases

Vertical scaling usually requires some downtime, because the server has to be restarted after its resources are increased. Horizontal scaling, by contrast, simply adds another machine to the system, which is why it is widely used in the tech industry. Because the load is spread across more machines, each server handles fewer requests per second.
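A back-of-the-envelope sketch of why adding machines lowers the per-server load. It assumes requests are spread perfectly evenly (real traffic rarely is), and the total of 12,000 requests per second is a made-up figure for illustration.

```python
def requests_per_server(total_rps: float, servers: int) -> float:
    """Average requests per second each server handles, assuming an even split."""
    if servers < 1:
        raise ValueError("need at least one server")
    return total_rps / servers


# Hypothetical total load, chosen only for illustration.
TOTAL_RPS = 12_000

# Horizontal scaling: the same load divided across more machines.
for n in (1, 2, 4, 8):
    print(f"{n} server(s): {requests_per_server(TOTAL_RPS, n):,.0f} requests/s each")
```

Doubling the number of servers halves the average load on each one, which is the intuition behind scaling out rather than up.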

To distribute requests evenly across the machines (for example, application servers), you add a load balancer, which acts as a reverse proxy in front of the web servers.
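Round-robin is one common way a load balancer spreads requests evenly: each new request goes to the next server in a fixed rotation. The class below is a minimal sketch of that idea only, not how any particular product is implemented, and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the server list endlessly: app-1, app-2, app-3, app-1, ...
        self._rotation = cycle(servers)

    def route(self, request_id):
        # The request contents are ignored; only the arrival order matters.
        return next(self._rotation)


# Hypothetical application server names.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.route(i) for i in range(9)]

print(assignments)
print(Counter(assignments))  # each server receives the same number of requests
```

Production load balancers usually add health checks and offer other strategies, such as sending each request to the server with the fewest active connections.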

Data Science & Analytics, Experience Sharing

Let’s Get Started with Python

After attending the event and workshops organized by CodingGirls, I became interested in learning Python, R, Tableau and SAS. One of my colleagues gave me some insights into the skills I should learn to move into the data analytics field. My colleagues and I planned to pick up the programming language through Udemy’s online courses, taking advantage of the Black Friday sales.

I made a small, quick start by trying out print statements. Although the syntax and some functionality differ between Python 2 and 3, the fundamentals are the same. Let’s begin with a baby step.
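A minimal first step in Python 3. Here print is a built-in function called with parentheses; in Python 2 it was a statement, written without them (print "Hello, world!"), which is one of the syntax differences mentioned above.

```python
# Python 3: print() is a function, so the parentheses are required.
print("Hello, world!")

# print() accepts several values and joins them with a space by default.
print("Python", 3)

# The sep keyword argument changes the separator placed between values.
print("a", "b", "c", sep="-")
```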

Double quotes (") and single quotes (') are both acceptable ways to define a string, that is, a piece of text. A string must be opened and closed with the same type of quote mark. In Python, text is of the string data type, which can contain letters, numbers and symbols. We can concatenate (combine) strings using +.
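A short sketch of the quoting and concatenation rules just described. The variable names are only for illustration.

```python
# Single and double quotes create exactly the same kind of str object.
single = 'Hello'
double = "Hello"
print(single == double)

# Using the other quote type avoids escaping a quote inside the text.
quote = "It's easy"

# + concatenates strings; a number must be converted with str() first.
greeting = single + ", " + "world"
label = "Python " + str(3)
print(greeting)
print(label)

# Mixing str and int with + raises a TypeError instead of converting silently.
try:
    bad = "Python " + 3
except TypeError:
    print("cannot concatenate str and int directly")
```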

We can use triple quotes (""") to make a string span multiple lines and assign it to a variable. Here is one of the examples I learned:

haiku = """The old pond,
A frog jumps in:
Plop!"""
print(haiku)

Printing it gives the expected output:

The old pond,
A frog jumps in:
Plop!

This looks pretty easy to start off, right? 

Next comes error handling. When we run code that contains a mistake, Python reports a SyntaxError, and the editor shows where it went wrong. For example, a missing quotation mark causes this error.
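A small sketch that triggers the missing-quotation-mark error without breaking the script itself. The broken code is kept in a string and checked with the built-in compile(), which parses code without running it. The exact error message differs between Python versions, so it is printed rather than assumed.

```python
# Code with a missing closing quotation mark, held in a string so that
# this script itself still runs.
broken_source = 'print("Hello, world!)'

caught = None
try:
    # compile() parses the source and raises SyntaxError if it is invalid.
    compile(broken_source, "<example>", "exec")
except SyntaxError as err:
    caught = err  # keep a reference; "err" is cleared after the except block
    print(f"SyntaxError on line {err.lineno}: {err.msg}")
```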

Then, I moved on to using variables. When my colleagues built web applications, they constantly dealt with changing data. I found it irritating to see source code with hard-coded data; it becomes inconvenient when we have to keep changing the text or data written into a script. Python uses variables to hold values that are subject to change. Each variable you define can store text, numbers or dates.
As when writing SQL scripts, wherever possible I use variables for values that are likely to change, so that the script can be reused dynamically.

Here are some samples of defining variables of different data types in Python. Strings and numbers are built-in types, while dates come from the standard datetime module. More data types are described in the Python documentation.
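A minimal sketch of variables holding a string, two kinds of number and a date. The values are hypothetical; note that the date type has to be imported from the standard datetime module.

```python
from datetime import date

# Hypothetical values, used only to show the different types.
course_name = "Introduction to Python"   # str (text)
lessons = 12                             # int (whole number)
fee = 19.99                              # float (decimal number)
start_date = date(2018, 11, 23)          # datetime.date (calendar date)

# type(...).__name__ shows which data type each variable holds.
for value in (course_name, lessons, fee, start_date):
    print(type(value).__name__, value)
```

Changing a value in one place, for example setting lessons = 15, updates every later use of that variable, which is the benefit over hard-coding the number throughout the script.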

Now we can look at arithmetic operations with variables, using a variable to hold the result of each operation. Arithmetic operations follow operator precedence; for example, multiplication and division are evaluated before addition and subtraction. The full precedence table is in the Python documentation.
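A short sketch of how precedence changes results, with each result stored in its own variable. The input values are arbitrary.

```python
a = 2
b = 3
c = 4

# * is evaluated before +, so this is 2 + (3 * 4).
result_default = a + b * c          # 14

# Parentheses override the default precedence.
result_grouped = (a + b) * c        # 20

# ** binds tighter than *, so this is (2 ** 3) * 4.
result_power = a ** b * c           # 32

# / is evaluated before -, and / always returns a float in Python 3.
result_divide = c - b / a           # 4 - 1.5 = 2.5

print(result_default, result_grouped, result_power, result_divide)
```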

Data Science & Analytics, Experience Sharing

Methodology of Data Mining

What is Methodology?
It means a set of methods used in a particular field, area of study or activity.

What is the Methodology of Data Mining?
It can be described in three stages: pre-modelling, modelling and post-modelling.

Data mining is an interactive and iterative process. It is necessary to move back and forth among the stages when developing a data mining application. Hence, data mining is not a strictly sequential process.

Pre-modelling

  • Identify the business problem. For example, increasing the response rate, or reducing the number of customers who leave the brand for another.
  • Translate the business problem into a data mining application. While data mining can help solve many business problems, not all of them can be tackled with a data mining application; organizational restructuring is one example.
  • Assess the data needed for the data mining application. Data can come from internal or external sources, be purchased from data providers or be generated during an event. If the data is not available, it is advisable to exclude it from the model.
  • Prepare the data for mining. In many situations, data come from different sources and formats, and some of it can be erroneous. Effort is needed to extract, clean and combine the data from the different sources (building a data warehouse).
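A toy sketch of the extract, clean and combine step in plain Python. The two sources, their field names and the cleaning rules (trim and title-case names, drop records with an impossible age) are all hypothetical; real projects would more likely use a library such as pandas or an ETL tool.

```python
# Source A: a CRM export with inconsistent formatting (hypothetical data).
crm_records = [
    {"customer_id": "C001", "name": "  alice tan ", "age": "34"},
    {"customer_id": "C002", "name": "BOB LIM", "age": "-5"},   # erroneous age
    {"customer_id": "C003", "name": "chen wei", "age": "41"},
]

# Source B: purchase totals from a separate sales system, keyed by customer ID.
sales_totals = {"C001": 250.0, "C003": 80.5, "C004": 40.0}


def clean(record):
    """Normalise one CRM record; return None if it fails a basic sanity check."""
    age = int(record["age"])
    if not 0 < age < 120:
        return None
    return {
        "customer_id": record["customer_id"],
        "name": record["name"].strip().title(),
        "age": age,
    }


def combine(records, totals):
    """Join cleaned CRM records with sales totals, defaulting to 0.0 if none."""
    combined = []
    for raw in records:
        row = clean(raw)
        if row is None:
            continue  # drop erroneous records rather than guess a value
        row["total_spent"] = totals.get(row["customer_id"], 0.0)
        combined.append(row)
    return combined


for row in combine(crm_records, sales_totals):
    print(row)
```

This keeps only customers present in the CRM source, so C004, which appears only in the sales data, is left out; whether that is right depends on the business problem.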

For example, to improve the response rate of a direct mail marketing campaign, it is important to understand the relationship between customer characteristics, such as age, gender and past purchase history, and the probability of purchasing the particular product being promoted through the mailing.
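A first, purely descriptive look at one such relationship: the response rate of a past mailing broken down by age band. The campaign data is invented, and this is only an exploratory summary, not a predictive model.

```python
from collections import defaultdict

# Hypothetical results of a past mailing: (age, responded?) for each customer.
campaign_results = [
    (23, False), (27, True), (29, False), (31, True),
    (35, True), (38, False), (44, False), (47, False),
    (52, True), (58, False),
]


def age_band(age):
    """Group ages into decades, e.g. 34 -> '30-39'."""
    start = (age // 10) * 10
    return f"{start}-{start + 9}"


def response_rate_by_band(results):
    """Return {band: fraction of mailed customers in that band who responded}."""
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for age, did_respond in results:
        band = age_band(age)
        mailed[band] += 1
        responded[band] += int(did_respond)
    return {band: responded[band] / mailed[band] for band in sorted(mailed)}


for band, rate in response_rate_by_band(campaign_results).items():
    print(f"{band}: {rate:.0%}")
```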

Modelling

  • Select the tool or technique. Which one is appropriate depends on the nature of the data mining application and of the data.
  • Construct the model. This involves calibrating the model parameters to produce a model with optimal performance.
  • Assess the result. This relates back to the objective of the data mining application: assess whether the predictive model gives accurate predictions. The assessment may lead to re-examining the data, re-selecting the variables to include, re-sampling the observations or re-running the analysis.
  • Compare the results of the different models to identify the final model. If more than one model gives acceptable results, the best one must be identified, which can be aided by data mining statistics such as the accuracy rate.
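A minimal sketch of comparing candidate models by accuracy rate, the fraction of predictions that match the actual outcome. The outcomes and the two models' predictions are hypothetical.

```python
# Actual outcomes on a hold-out set (1 = responded, 0 = did not); hypothetical.
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Predictions from two hypothetical candidate models for the same customers.
predictions = {
    "model_a": [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 1, 0, 0, 0, 1],
}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual outcome."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


scores = {name: accuracy(actual, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(scores, "->", best_model)
```

Accuracy alone can be misleading when one outcome is rare, as with low mail response rates, so it is usually read alongside other statistics.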

Post-modelling
This is the last stage of the methodology. It covers the actions to be taken after the data analysis is completed.

  • Deploy the data mining model or result. How this is done depends on the objectives of the data mining application.
  • Track the performance. The need to revisit the data mining application is not limited to predictive modelling.

Data Science & Analytics, Experience Sharing

Definition of Data Mining

As I mentioned, my friend shared a study guide on a subject called Data Mining, and the first thing that came to mind was a question.

What is Data Mining?
The process of finding previously unknown patterns and trends in a database and using that information to build predictive models. In other words, the process of selecting and exploring data and building models on vast data stores to uncover previously unknown patterns.

Question: Is it similar to the ETL (extract, transform and load) process for a data warehouse? Can anyone share an opinion?

I continued reading the study guide and found the following notes useful.

Firstly, data mining focuses on exploring and discovering previously unknown patterns and trends. This suggests that, when data mining is applied in the commercial world, it should help generate information unknown to competitors, so that organizations and managers can make better decisions.

Secondly, data mining is usually applied to large data sets, unlike traditional statistical methods, which are designed to analyze data sub-samples (smaller data sets).

Lastly, it is a means to an end. Its use should be to solve a business problem, capitalize on a business opportunity or respond to a business trend.

To make a good decision, we need good information. Information is derived by processing data; raw data by themselves are not useful until they are transformed into information.

Keywords: exploration, discovery, large data sets.

- The content is taken from the study guide provided by SUSS.

Data Science & Analytics, Experience Sharing

Study Mode On

After a long holiday, it is time to begin studying. I will try to write and share what I learn daily, in the hope that more people will share their opinions and comments. My friend shared a study guide from her university on a topic called Data Mining, which I am interested in reading during my free time.

Before my solo trip, I attended a trial class for Python and felt that I wanted to learn it to prepare for a move into the data analytics field. Three years ago, I took Coursera’s course, Introduction of Data Science, where I learned the R language. Python and R are both popular languages in this field, and I plan to re-learn and excel in them.

Besides that, I have been looking into going back to school, either for a master’s course or a graduate diploma (or specialized diploma) course in Business Analytics. I am based in Singapore and have shortlisted NUS and SMU for the master’s courses, and Temasek Polytechnic and Nanyang Polytechnic for the graduate diploma course. Is anyone here from these schools?

Data Science & Analytics, Experience Sharing, Web Development

Don’t be afraid to ask for what you want

After my first meeting with my mentor, I wrote an email to my CTO to seek his advice. He replied that it was not an email, it was a conversation. He took the time to reply with three emails, which I summarized into two things he wanted me to think about.

How long would I want to be a software developer?
After I switched my career to BI development, I was involved in some coding work; a few years earlier, I had worked as a .NET software engineer. I would not want to code forever. I can code once in a while, but I want to keep moving, learn new things and gain knowledge in the areas that interest me.

What do I want to do?
When I told him what I wanted to do, I gave him a vast range of interests. He went through each of them, except those related to data science and data analytics. So, in my reply, I made a clearer statement: I love working with data. I also narrowed down my scope by telling him what I expected to achieve in the near future.

So far, I am glad that he has listened to me and supported me. Articulating your interests and making your voice and presence felt is important for others to understand who you are. Being direct and open is essential for success.

Getting a mentor.
Find someone who appreciates you and your skills and who is in a position to support you. That person has to be well respected and senior, and their opinions need to carry a lot of weight.

Be yourself.
Lastly, do not be afraid to ask questions when you do not know something. When I first started the data science course on Coursera, I remember that the first word I learned was curiosity.

“There are going to be people who don’t like the way you do something or don’t like you for any particular reason. Don’t let the fear of that hold you back—there’s so much potential out there.” – https://techbeacon.com/women-software-development-8-success-stories-5-tips-advancement.

Data Science & Analytics, Experience Sharing

Mentorship Program

Back in February 2018, I saw the mentorship program run by the Data Science user group. Like everyone else who wanted to be mentored, I saw it as a golden opportunity to get advice and guidance from experienced people and move into this field. Two months later, I received an email from the user group informing me that I had been chosen to be part of the program.

Know your mentor before you meet.

Alright, I admit that I did not know anyone on the list of mentors, and I did not check their profiles to find the most suitable mentor before submitting my application. I also did not search extensively on my mentor’s profile, as I did not want to appear to be a stalker.

Know what you want.

Our first meeting was a bit awkward. I was unprepared and confused about what a mentor should do and what a mentee should do. I wanted to understand what my mentor had done so I could figure out which path to take. However, the mentor asked me what I wanted to do, and at that moment I was not clear.

Mutual mentorship.

Do not underestimate the mutual benefits of a constructive mentor-mentee relationship for both parties. For mentors, the satisfaction of helping someone else achieve his or her goals is undeniable. My mentor has asked me to read more articles and blogs to get a better sense of the available information and resources, and to attend the regular user group meetups.