Late June Kaohsiung Trip

I just came back from my first visit to a Chinese-speaking country, a trip to Kaohsiung, Taiwan. It was a four-day, three-night trip with a group of colleagues and their families. The flight departed from Singapore early on Thursday morning, at 6.00am. I went to the airport early, thinking I could spend some private time at Jewel Changi. Well, it was really private and quiet, as most of the stalls were closed. Huh, it was not a 24-hour mall at all. What a miss!

Well, Pikachu will have its dance shows at Jewel on weekends soon, so maybe I can plan a trip down to Jewel to watch the show and catch some good pictures.

Back to my trip. I hope I can quickly write up all 4 days, unlike my Busan-Seoul solo trip last year, which I never managed to finish writing about. I will try to keep it simple, with more pictures, because I did not manage to get the names in English and I do not write Chinese. Yep, bear with me if my pictures come without descriptions.

Default picture from Kaohsiung International Airport. The airport is not big at all and there is not much within the complex. We did not spend long at the airport: after clearing immigration, we waited for some of the group to get SIM cards for their mobile phones. There is a Family Mart convenience store and a foreign currency exchange at the arrival hall.

Our organizer had arranged the airport transfer, so the coach was waiting for us outside at the parking lot. Shortly after, we headed to our hotel, The Lees Boutique at Wufu 1st Road, Xinxing District.

Since we arrived earlier than the check-in time, we could not bring our luggage up to our hotel rooms. We left the hotel and headed to our first activity of the day: half a day of play at E7Play, located at Sanduo 1st Road, Lingya District. We got there by Uber. In Taiwan, they use Uber; there is no Grab.

Before the games started, my group had our first Taiwanese lunch at Dan Dan Hamburger, a fast food chain which originated in Kaohsiung. We tried some of the set meals and shared them among 5 people. One of the meals we tried was the mee suah. It was quite delicious, peppery and starchy. Even without the chilli sauce it was already tasty, and with a drink it was quite filling for me.

Then, we hit the games within the complex. It was kind of a shock when our driver said “huh, anything nice to play there?” 😅 Well, I ended up doing fine by anyhow throwing the darts, anyhow rolling the balls down the lawn and anyhow playing snooker.

A few hours later, we finally headed back to the hotel to wash up and get ready for our night event, the buffet dinner at Hi-Lai Harbour. It is located on level 43 of the Hi-Lai Grand Hotel, Chenggong 1st Road, Qianjin District. One of my colleagues took a nice picture of the sunset over the sea from our tables.

The weekday buffet dinner is quite affordable, and the place was quite crowded too. The food selection is good and always replenished. I loved their roast beef and ham, the sashimi section and their soups. The seafood is one of the recommended choices too, and you can find different types of seafood at the counters. Drinks are free flow, basically any kind of drink, and they serve a great range of desserts too.

After a great dinner, it is always nice to spend some time walking around before heading back. So, on our first night, we went to the Shinkuchan night market. It is the place where we started shopping like crazy for beauty products and so on. I tried to look for clothes and shoes, but in vain: they were not my style and the outlets were quite limited. I was a little disappointed because I thought it would be something like Myeongdong in Seoul or the night markets in Bangkok, Thailand.

After the shopping spree, we walked back to the hotel. It was a tiring night because I had not really slept the day before the departure. The hotel bed was waiting for me! One thing I wanted to share, and I do not know the reason for it: the air conditioner in the hotel was not really cold during the night, even when we lowered it to 24 degrees.

It is summer and so warm, yet the air-cond was not able to cool us down. I am not sure if this is something unique to Taiwan, that they do not set the air-cond to such a low temperature even during the summer season. Are they used to it already, or does it have something to do with environmental concerns?


Python: Introduction III

This is the last part of the Python Introduction, and I will cover functions, methods and packages in Python. For sure, there is a difference between a function and a method. I revisited my original post about the differences between functions and methods; you can read it before continuing here.

User-defined Functions

The simplest way I can explain what a function is comes from my original post:

A function is a block of code that carries out a task and is called by its name. A function may have zero or many arguments. The arguments are passed explicitly (directly). On exit, the function may or may not return a value or values.

There are some examples in this post that explain functions: how to define a function with and without arguments, how to use a default value for an argument, how to use the flexible arguments *args and **kwargs, and how to use the return statement in a function.
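As a quick refresher on those points, here is a minimal sketch. The function names and values are hypothetical and only for illustration; they are not taken from the original post.

```python
# Hypothetical example functions; the names are illustrative only.

def greet():
    """No arguments and no return value: it just prints a line."""
    print("Hello!")

def area(width, height=1):
    """Two arguments; height falls back to a default value of 1."""
    return width * height

def total(*args):
    """*args collects any number of positional arguments into a tuple."""
    return sum(args)

def describe(**kwargs):
    """**kwargs collects keyword arguments into a dict."""
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())

greet()
print(area(3))       # height uses its default value
print(area(3, 4))    # height is given explicitly
print(total(1, 2, 3))
print(describe(name="Joanne", height=1.67))
```

Each function is called by its name, and every argument is passed explicitly inside the parentheses.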


A method is like a function, except it is attached to an object (dependent). The object on which a method is invoked is passed to the method implicitly (indirectly). It may or may not return a value or values. The method has access to the data that is contained within the class.

For examples of methods, see this post.
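To make the difference concrete, here is a small sketch with a hypothetical Person class (the class and its fields are my own illustration, not from the linked post). Notice that the object is never passed in the parentheses; Python passes it as self.

```python
class Person:
    """Hypothetical class used only to illustrate a method."""

    def __init__(self, name, height):
        self.name = name
        self.height = height

    def describe(self):
        # self is the object the method was invoked on; it is passed implicitly
        return f"{self.name} is {self.height} m tall"

anna = Person("Anna", 1.73)
print(anna.describe())    # method call on our own object
print("hello".upper())    # upper() is a method attached to str objects
```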


Think of a package as a directory of Python scripts. Each .py script is a module, and a module defines functions, methods and types for solving a particular problem. I found a link which explains packages in Python in detail. Refer here for more reading.
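A minimal sketch of how this looks in code, using only the standard library. I picked json and math purely as examples: json is a package (a directory containing several modules), while math is a single module.

```python
import json              # a standard-library package
import json.decoder      # one module inside that package
from math import sqrt    # import a single function from a module

# A package has a __path__ attribute pointing at its directory.
print(hasattr(json, "__path__"))
print(json.decoder.__name__)
print(sqrt(16))
```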

In this part III, I know many external links are given, mainly to avoid rewriting entries which I wrote some time ago. This blog serves as a place to find the relevant resources for reading and examples, which I think are enough to cover a basic understanding of functions, methods and packages in Python.

Python: Introduction II

I continue from the Python: Introduction post I wrote yesterday, which gave a very basic idea of Python in terms of declaring variables, data types and how we can store data in a collection. So: variables, data types and collections. If you missed it or cannot recall them, here is the link to yesterday’s post. It has links to the various data types and collections, so I will not repeat them here.

For part 2, I have decided to concentrate on an introduction to control flow. I think it is good to cover control flow first, before we head to more Data Science oriented topics such as using the package called NumPy and creating our own functions and methods. So, I swapped the order of topics in my writing a little.

if, elif, else

It is a conditional statement which we can use to check certain conditions. There are a few ways to write this statement, and we do not always need elif and else. Let us look at the syntax below:

if condition :
  expression

This syntax checks a single condition only, which is why just “if” is used. Example:

z = 4
if z % 2 == 0 :
  print("z is even")

All control flow statements share a standard syntax, with indentation marking the expression, that is, what should be done when the condition is matched. That is why the print() statement is indented. In most IDEs, the next line is indented automatically after we type the colon (:) at the end of the condition, in this case z % 2 == 0.

The moment we need to handle the case where the condition does not match, the if-else statement is used. See the syntax below:

if condition :
  expression
else :
  expression

The else statement does not take a condition, because it is understood that when the if condition does not match, execution goes to the else statement and runs its code. In that sense, the else statement acts as a default. Some other programming languages have an explicit default clause (for example in a switch statement) which says: if nothing matches, run the default code.

We can omit the else statement when nothing needs to be done if the condition does not match. However, we might then miss important scenarios in which the if statement is skipped. Besides the legitimate case where the condition simply does not match, the value might be something we did not expect. It is recommended to use the else statement as a fallback for such cases, for example by printing a line to the console or a log file. This helps when debugging. Note that else does not catch exceptions raised while the condition is evaluated; those need a try/except block.

z = 5
if z % 2 == 0 :
  print("z is even")
else :
  print("z is odd")

The above is an example of using the if-else statement in an either-or situation. The variable z is 5, so the condition is false, execution goes to the else statement and “z is odd” is printed.

Next, the if-elif-else statement is used when there are several conditions, one of which may match. When the first condition does not match, the next elif condition is checked, and so on; if none match, execution ends at the else statement. You can have as many elif statements as you need. Below is the syntax:

if condition :
  expression
elif condition :
  expression
else :
  expression

Example of using the syntax:

z = 3
if z % 2 == 0 :
  print("z is divisible by 2")
elif z % 3 == 0 :
  print("z is divisible by 3")
else :
  print("z is neither divisible by 2 nor by 3")

The output is “z is divisible by 3”. As soon as one condition matches and its expression has run, the whole statement ends; the remaining conditions are not checked. With these 3 examples, I hope you have some idea about the if statement, the if-else statement and the if-elif-else statement.
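To see that only the first matching branch runs, here is a small sketch that wraps the same conditions in a function (the function name check is my own, for illustration) and tries a number that is divisible by both 2 and 3.

```python
def check(z):
    # Same conditions as the example, returned instead of printed
    if z % 2 == 0:
        return "z is divisible by 2"
    elif z % 3 == 0:
        return "z is divisible by 3"
    else:
        return "z is neither divisible by 2 nor by 3"

# 6 is divisible by both 2 and 3, but only the first match is reported
print(check(6))
print(check(3))
print(check(5))
```

Even though 6 is also divisible by 3, the elif branch is never checked because the if branch already matched.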


The while statement repeats an action as long as its condition is true. It is important to assess the code before running a while statement, because if the condition never becomes false, the statement keeps running; we call this an infinite loop, and you have to force the application to end manually. The syntax for the while statement:

while condition :
  expression

Example of using the while syntax:

x = 0
while x < 5:
     print(f'The number is {x}')
     x += 1  

The most crucial part here is the variable x, which works as a counter and ensures the condition eventually becomes false. Without the line x += 1, the condition is always true and the loop becomes infinite.

Executing the while statement prints “The number is 0” through “The number is 4”, one per line. When x = 5, the condition is false, so the loop exits without printing anything more.


Remember that in the previous blog I mentioned Python lists (collections)? The for statement is a good control flow for iterating (repeating) through a Python list to get each element.

for var in seq :
  expression

Without the for statement, we would have to repeat the print statement several times to print out the elements of the Python list below:

fam = [1.73, 1.68, 1.71, 1.89]


Although repeating print statements is correct syntax, it is not good practice. Below is how to use a for statement to iterate through the Python list and print the values out.

for height in fam :
  print(height)

Both versions of the code return the same output.
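Here are the two approaches side by side, as a minimal sketch. The repeated print statements are my reconstruction of the idea; the lists manual and looped are extra names I added only to compare the results.

```python
fam = [1.73, 1.68, 1.71, 1.89]

# Approach 1: one print statement per element
print(fam[0])
print(fam[1])
print(fam[2])
print(fam[3])

# Approach 2: a for statement visits each element in turn
for height in fam:
    print(height)

# Collect the values both ways to confirm they match
manual = [fam[0], fam[1], fam[2], fam[3]]
looped = [height for height in fam]
print(manual == looped)
```

The for statement keeps working unchanged if the list grows, while the first approach needs a new print line for every new element.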


The for statement works well with any type of collection, and even with a string. For example, we can use a string my_string as the “list” and a variable character as the “item_name” that represents each element. One by one, the loop prints out each character in my_string.
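A minimal sketch of such a loop. The value of my_string is an assumption, and the characters list is an extra I added to keep what the loop visited.

```python
my_string = "Python"   # assumed value, any string works

characters = []
for character in my_string:
    print(character)               # prints one character per line
    characters.append(character)   # keep a record of each character
```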

In a for statement, we can use enumerate(), a Python built-in function. enumerate() adds a counter to an iterable and returns it as an enumerate object. This enumerate object can then be used directly in for loops or converted into a list of tuples using list().

Why does enumerate return tuples?

An enumerate object yields its items as (index, value) pairs. enumerate() accepts a start parameter, which is the first value of the counter; by default it is 0. A simple illustration is below:

# signature: enumerate(iterable, start=0)
activities = ["eat", "sleep", "repeat"]
print(list(enumerate(activities)))

When we check the output in the console, it shows:

[(0, 'eat'), (1, 'sleep'), (2, 'repeat')]

It starts with index 0. Of course, this can be changed by passing a start value, such as enumerate(activities, 1); the index then begins with 1 instead of 0. The enumerate() function is useful when we want to list the elements of a collection with both index and value.

fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam) :
  print("index " + str(index) + ": " + str(height))

We reuse the example above, but now with enumerate(fam) in the for statement instead of “for height in fam”. In the print() statement, we convert the index and the height to strings and concatenate them. This may be useful when we want to print out the items in a shopping cart. The output shows:

index 0: 1.73
index 1: 1.68
index 2: 1.71
index 3: 1.89
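For a numbered list that starts at 1 rather than 0, the start parameter can be combined with the same loop. This is a small sketch; the variable names position and lines are my own.

```python
fam = [1.73, 1.68, 1.71, 1.89]

lines = []
for position, height in enumerate(fam, start=1):
    # An f-string avoids the manual str() conversions
    lines.append(f"{position}. {height}")

print("\n".join(lines))
```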

Mastering control flow helps at the later stage when we go into the data structure section. I have written separate blogs about if-else statements, while loops and for loops; you can refer to the links below:

Python: Introduction I

It has been a while since I stopped learning Python from DataCamp, due to my part-time classes and assignments, and work commitments. It is not easy to keep track of each of them every day. On top of that, I still have my volunteer work with TechLadies, where we regularly meet up to brainstorm and update each other.

Today’s topic is very much on Python, definitely. I want to concentrate on writing about Python for the next 2 weeks before I head off for a holiday. I am sure I will be lazy after my break. It would be great if I can write something to summarize or reorganize what I have been writing for the past few months about learning Python with DataCamp and Udemy.

Remember the very first day I started learning Python on Udemy? It taught the installation, and I went on to install IntelliJ. To date, I hardly use it; most of the time, I use the online version of Jupyter Notebook, which I find pretty easy to use. I understand that there are many other IDEs in the market and that no specific software is required to code in Python. For now, I will just keep it simple for my learning.

Checking version

The very first time, we always want to know whether everything we installed for Python works. Checking the version, to see that it is the up-to-date and correct one, is the first thing we might want to do:

python --version

Simply open your terminal or command line and type the above command. It returns the version information, such as below:

Python 3.7.0

print(‘Hello World’)

Next, we always start with a simple print statement, using the built-in function named print() to print out some lines. Most often, the first line we print is “Hello World”. Really, most people who start learning a programming language have this line printed. I use this function everywhere in my coding and it is very useful. It is similar to the PRINT statement in SQL Server, if you are coming from a database background. It does not matter whether you use single quotes or double quotes.

print('Hello World!')
print("Hello World!")

Variables and Types

Then, we touch on variables and types, important components in most programming languages. Variables and types are interrelated. I discussed the characteristics of a variable in my first post. Let me have them here too!

  • Specific and case-sensitive name; best practice is to use lowercase.
  • Defines things that are subject to change.
  • Can be used to store text, numbers or dates.
  • Cannot start with a number.
  • Cannot use spaces or symbols in the name; use _ instead.

Then, there are plenty of different data types as well; yes, those are the types I meant here. Remember, different types have different behaviours. I have written many posts about each of them before, and I will link them up whenever we revisit the topic.

  • Boolean operations: and, or, not (True, False)
  • Numeric types: int, float, complex (number, decimal)
  • Text sequence type: str (string)
  • Sequence type: list, tuple, range
  • Mapping type: dict 
  • Sets type: set

The simplest way to demonstrate this is to create some variables and assign values to them.

height = 1.67
weight = 180

name = 'Joanne'
gender = 'Female'

isStudent = True

The above shows the height and weight variables with float and int data types, then name and gender as strings, and a variable called isStudent with a Boolean value. Python does not require declaring a variable with any keyword or prefix, as we see in JavaScript or SQL Server, if you are familiar with those languages. Then you may ask: how does the interpreter know what data type a variable has?



type() is a built-in function which allows us to check the data type of the variables we created. Python infers the type from the assigned value, and type() shows us the answer to the above question.
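Applied to the variables declared earlier, a quick check could look like this:

```python
height = 1.67
weight = 180
name = 'Joanne'
isStudent = True

print(type(height))     # <class 'float'>
print(type(weight))     # <class 'int'>
print(type(name))       # <class 'str'>
print(type(isStudent))  # <class 'bool'>
```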

That completes the fundamentals of coding in Python. Now you know how to do the following:

  • use the print() function to print text.
  • use variables and data types.
  • use the type() function to check the data type of a variable.

Probably you now want to know what an integer, a string, a Boolean and so on are. I have some links here with basic explanations and examples:

Numbers and strings could be a topic of their own, as there are many interesting things about them, such as the use of the (+) sign. With strings, (+) is the concatenation sign, which combines two or more strings together; with numbers, it adds them. We also have to remember that in Python a string and an integer cannot be combined with the (+) sign directly: it throws an exception (a TypeError). Exception is programming jargon for an error raised while the code runs, and there is a topic of exception handling in Python too. To combine a string with an integer, we can use string formatting.
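A short sketch of the (+) sign in action. The variable names and values are made up for illustration.

```python
print(2 + 3)              # numbers: addition
print("data" + "base")    # strings: concatenation

age = 30
try:
    message = "Age: " + age          # str + int is not allowed
except TypeError as error:
    message = f"TypeError: {error}"  # keep the error text instead of crashing
print(message)

# Two ways to combine them correctly
print("Age: " + str(age))   # convert the integer to a string first
print(f"Age: {age}")        # or use string formatting (an f-string)
```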

Let us move into fundamental part two, Python List.

Python Collections

It is an interesting topic and an important part of Python. Almost every one of us will use a Python list in our daily coding life 🙂 A list is a collection of values whose elements may have different types, and it is one of the simplest and easiest collections. When it comes to the word “collection”, Python has four types of collections.
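A minimal sketch, assuming the four collection types meant here are list, tuple, set and dictionary. The variable names and prices are hypothetical.

```python
fruits_list = ['orange', 'apple', 'pear', 'apple']   # ordered, changeable, allows duplicates
fruits_tuple = ('orange', 'apple', 'pear')           # ordered, unchangeable
fruits_set = {'orange', 'apple', 'pear', 'apple'}    # unordered, duplicates are dropped
fruit_prices = {'orange': 0.5, 'apple': 0.4}         # key-value pairs (made-up prices)

print(len(fruits_list))       # the duplicate 'apple' is kept
print(len(fruits_set))        # the duplicate 'apple' is removed
print(fruit_prices['apple'])  # look up a value by its key
```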

You can read more about the basics of these collections here. Each of them has different characteristics, syntax, structure and usage. Along the way, we will use different collections to explain Python code and concepts. Below is an example of what a list looks like:

fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

Declaring a list is the same as declaring a variable; we just follow the list syntax to create one. As mentioned earlier, a list can hold any data types. So, you can declare a list as below too:

family = ['Anna', 1.73, 'Eddie', 1.68, 'Mother', 1.71, 'Father', 1.89]

We can use the lists above with control flow, iterating through them and/or checking conditions, then calculating a value and returning a result. I think I will cover that in the next post.

Up to now, this portion is still basic Python and does not involve any analytics or data science work, if you are looking for that.

MongoDB: The Best Way to Work With Data

Relational databases have a long-standing position in most organizations. This made them the default way to think about storing, using and enriching data. However, modern applications present new challenges that stretch the limits of what is possible with a relational database. A relational database uses a tabular data model: it stores data across many tables linked by foreign keys, because the data needs to be normalized.

Document Model

In contrast, MongoDB uses a document data model and presents data in a single structure, with related data embedded as sub-documents and arrays. For example, a customer object can be modeled as a single JSON document with embedded sub-documents and arrays.
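To picture such a document, here is a sketch written as a Python dictionary and printed as JSON. Every field name and value is hypothetical; it only illustrates the shape of an embedded sub-document (address) and arrays (phones, orders).

```python
import json

# Hypothetical customer document; all field names and values are made up.
customer = {
    "name": "Anna Lee",
    "email": "anna@example.com",
    "address": {"city": "Singapore", "postal_code": "123456"},   # sub-document
    "phones": [{"type": "mobile", "number": "+65 0000 0000"}],   # array
    "orders": [                                                   # array of sub-documents
        {"item": "notebook", "quantity": 2},
        {"item": "pen", "quantity": 5},
    ],
}

document = json.dumps(customer, indent=2)
print(document)

# Related data is read from the same document, no join needed
print(customer["address"]["city"])
print(len(customer["orders"]))
```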

Flexibility: Dynamically Adapting to Changes

The fields of MongoDB documents can vary from document to document within a single collection. There is no need to declare the structure of documents to the system: documents are self-describing. If a new field needs to be added to a document, it can be added without affecting the other documents in the collection, unlike in relational databases, where we need to run an ‘ALTER TABLE’ operation.

Schema Governance

While MongoDB allows a flexible schema, it also provides schema validation within the database, from MongoDB version 3.6 onward. The JSON schema validator allows us to define a fixed schema and validation rules directly in the database, freeing developers from having to take care of them at the application level. With this, we can apply data governance standards to the schema while maintaining the benefits of a flexible document model.

Below is a sample validation rule:

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   } }
} )

So, is it also possible to apply validation rules to an existing collection? Yes: we just need to use the collMod command instead of the createCollection command.

db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
      }
   },
   validationLevel: "moderate",
   validationAction: "warn"
} )

Having a Really Fixed Schema

MongoDB allows additional fields that are not in the validation rules to be inserted into the collection. If we would like to be more restrictive and have a really fixed schema for the collection, we need to add the following parameter to the validation rule:

additionalProperties: false

The below MongoDB script shows how to use the above parameter.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        required: ["name","age"],
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
   }
} )
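To get a feel for what additionalProperties: false means, here is a deliberately simplified stand-in written in plain Python. It is not MongoDB's validator and does not talk to a database; the violations function and the reduced schema dictionary are my own assumptions, and range checks such as 0-100 are left out.

```python
# Simplified, hypothetical schema: field name -> expected Python type
schema = {
    "required": ["name", "age"],
    "additionalProperties": False,
    "properties": {"_id": object, "name": str, "age": int},
}

def violations(document, schema):
    """Return a list of problems found in the document (empty if none)."""
    problems = []
    for field in schema["required"]:
        if field not in document:
            problems.append(f"missing required field: {field}")
    for field, value in document.items():
        expected = schema["properties"].get(field)
        if expected is None:
            # Field is not listed in properties: only allowed if the schema is open
            if not schema["additionalProperties"]:
                problems.append(f"unexpected field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for field: {field}")
    return problems

print(violations({"name": "Anna", "age": 30}, schema))
print(violations({"name": "Anna", "age": 30, "email": "a@example.com"}, schema))
```

With the parameter set to False, the extra email field is reported; if it were True, the same document would pass.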

Speed: Great Performance

Most MongoDB queries do not need to JOIN multiple records. Should your application require it, MongoDB does provide the equivalent of a JOIN, the $lookup stage, which was introduced in version 3.2. For more reading, you can find in this link.

I will stop here for now and shall return with more information in my next write up or I will continue from this post. Stay tuned.

Data Management

Data Management is the process of profiling, cleaning and transforming data sources into useful information. Data Management covers the areas of data profiling, data cleaning, data exploration, data integration and data transformation.

Currently, I am working on a project for my part-time course which uses Power BI to do data profiling and data exploration on different datasets. It made me understand the importance of understanding the working datasets and knowing what story we want to tell as a conclusion before deciding what to clean in the data cleaning process.

The project also made me understand that, although incorrect values are usually encouraged to be cleaned, it is not always required. I believe it is fine as long as we are able to justify, in our work, why certain data is cleaned or not cleaned and how that affects the overall data exploration.

Each step serves a different purpose in the ETL process.

  • Data Profiling – to get an overview structure of the data and assess the data quality.
  • Data cleaning – to improve the data quality.
  • Data exploration – to use statistical charts to get a sense of the distribution or correlation.

Data Profiling

  • Get a visual profile of the data to assess its structure and quality. Using Power BI, you get a table profile of summary statistics such as min and max values, distinct count, and error or missing value counts.
  • It allows us to identify missing values and inconsistencies in the data. Based on this information, we can further assess the quality and plan the data cleaning process.

The above view shows the Power BI editor once we have loaded the data into Power BI through the Get Data option. Power BI supports different types of data files.

One limitation of these summary statistics is that, by default, Power BI works on the first 1,000 records only. In other words, Power BI will not show us data errors and missing values from the 1,001st row onward.

Similar data profiling can be found in other software tools such as SAS, which allows its users to check the data quality using its tools.
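The same kind of summary statistics can be sketched in a few lines of plain Python. This is not how Power BI or SAS compute them; the rows and the profile function below are hypothetical and only show what min, max, distinct count and missing count mean for a column.

```python
# Hypothetical dataset: a list of rows, some with missing values
rows = [
    {"city": "Kaohsiung", "sales": 120},
    {"city": "Taipei", "sales": 95},
    {"city": "Kaohsiung", "sales": None},
    {"city": "", "sales": 210},
]

def profile(rows, column):
    """Return simple summary statistics for one column."""
    values = [row.get(column) for row in rows]
    present = [value for value in values if value not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

print(profile(rows, "sales"))
print(profile(rows, "city"))
```

Unlike the default view described above, this sketch reads every row, so missing values are counted no matter where they appear.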

Data Cleaning

Data cleaning, or data cleansing, is the process of detecting incorrect values or records in the datasets and correcting them by replacing, modifying or deleting the dirty data. Several things we can do during data cleaning are:

  • Remove duplication
  • Change text to lower/upper/proper case
  • Spell check
  • Remove extra spaces
  • Treat blank cells
  • Standardization
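Several of these steps can be sketched in plain Python. The sample names, the clean function and the "Unknown" placeholder are my own assumptions; in practice the same steps would be done in a tool such as Power BI.

```python
# Hypothetical dirty values: extra spaces, mixed case, blanks and duplicates
raw_names = ["  anna  lee ", "Anna Lee", "EDDIE tan", "", None, "eddie  tan"]

def clean(value, placeholder="Unknown"):
    """Treat blanks, remove extra spaces and change text to proper case."""
    if value is None or not value.strip():
        return placeholder                   # treat blank cells
    return " ".join(value.split()).title()   # collapse spaces, proper case

cleaned = [clean(value) for value in raw_names]

# Remove duplicates while keeping the original order
unique = list(dict.fromkeys(cleaned))

print(cleaned)
print(unique)
```

Standardizing the text first matters here: "EDDIE tan" and "eddie  tan" only become duplicates after the case and spacing are fixed.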

Are data cleaning techniques essential?

The answer is yes. It is often said that we spend 80% of our time on data cleaning. It is not only essential for data analytics and data science, it is also the most time-consuming part: ensuring that the data always matches the correct fields, interacts effectively and is easy to visualize.

Data Exploration

It is the part where the story-telling begins, after all the datasets are cleaned and integrated. How well the data is transformed from its raw values and integrated across all the datasets affects the overall quality of the data. It is important that we have a set of quality datasets before we begin the data exploration.

Many times, in my experience of data exploration, I found there was more data to be cleaned in order to get the right insights or stories that I wanted to tell in my dashboard. These issues should be identified during data profiling; any wrong outcomes from the data exploration may be due to outliers which we overlooked earlier.

Database Stability

This is one of the common questions asked either during a talk or during an interview. Personally, I regard this topic highly, and it is important for every database administrator to pay attention to it.

Slow performance means tasks take longer to complete. The longer they take, the more likely they are to overlap when multiple users or connections are active at the same time. This leads to frequent locks, deadlocks and resource contention, and eventually to errors and stability issues.

Poor scalability means there are limited options when demand exceeds capacity, such as queuing requests or rejecting requests. Rejecting requests results in errors or unexpected behaviour, and this is instability. Queuing requests leads to reduced performance and puts demands on resources such as CPU and memory. As these demands increase, they lead to further stability issues.

Poor stability affects performance. Partial success and partial failure must be handled, usually with database rollbacks or manual compensation logic. Whether we roll back or run the manual compensation logic, it places additional resource requirements on the system, and that in turn affects scalability.

On the MSDN website, I found that someone had shared an important point about designing either a database or an application: always consider performance, scalability and stability when architecting, building and testing your databases and applications.