## Intermediate Python for Data Science: Looping Data Structure

After matplotlib for visualization, an introduction to dictionaries and Pandas DataFrames, and then logical, Boolean and comparison operators with if-elif-else control flow, we now come to the last part: the while loop, the for loop and looping over different data structures.

In Python, some objects are iterable, which means you can loop through them to get each element, for example the items of a list or the characters of a string. A for loop iterates over a collection of things, while a while loop repeats a block of code for as long as some condition remains True.

## For Loop

The main keywords are for and in, used together with a colon (:) and indentation (whitespace). Below is the syntax:

```
# loop statement
my_iterable = [1, 2, 3]
for item_name in my_iterable:
    print(item_name)
```

Next, I used two iterator variables (index, area) with enumerate(), as in the sample code below. enumerate() loops over an iterable with an automatic counter and returns an enumerate object that yields (index, value) pairs.

```
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Change for loop to use enumerate() and update print()
for index, area in enumerate(areas):
    print("room " + str(index) + ": " + str(area))

# Output:
"""
room 0: 11.25
room 1: 18.0
room 2: 20.0
room 3: 10.75
room 4: 9.5
"""
```

Another example loops through each sublist of house and prints "the x is y sqm", where x is the name of the room and y is its area.

```
# house list of lists
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

# Build a for loop from scratch
for x in house:
    print("the " + str(x[0]) + " is " + str(x[1]) + " sqm")

# Output:
"""
the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm
"""
```

Definition of enumerate() can be found here. My post on for loop is here.

## While Loop

The main keyword is while, used together with a colon (:) and indentation (whitespace). Below is the syntax:

```
# while loop statement (general form)
# while some_boolean_condition:
#     do something

# Example
x = 0
while x < 5:
    print(f'The number is {x}')
    x += 1
```

An example of putting an if-else statement inside a while loop.

```
# Initialize offset
offset = -6

# Code the while loop
while offset != 0:
    print("correcting...")
    if offset > 0:
        offset = offset - 1
    else:
        offset = offset + 1
    print(offset)

# Output:
"""
correcting...
-5
correcting...
-4
correcting...
-3
correcting...
-2
correcting...
-1
correcting...
0
"""
```

My post on while loop is here.

## Loop Data Structure

Dictionary:
If you want to iterate over key-value pairs in a dictionary, use the items() method on the dictionary to define the sequence in the loop.

```
for key, value in my_dic.items():
```

Numpy Array:
If you want to iterate over all elements of a NumPy array, including every element of a multi-dimensional one, use the nditer() function to specify the sequence.

```
for val in np.nditer(my_array):
```

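Some examples are below:

```
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }

# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

# Output (order from the original run; Python 3.7+ prints in insertion order):
"""
the capital of austria is vienna
the capital of norway is oslo
the capital of italy is rome
the capital of spain is madrid
the capital of germany is berlin
the capital of poland is warsaw
the capital of france is paris
"""

# Import numpy as np
import numpy as np

# Small stand-in arrays (hypothetical values; the course supplies its own data)
np_height = np.array([74, 72, 69])
np_baseball = np.array([[74, 180], [72, 215], [69, 209]])

# For loop over np_height (1D array)
for x in np_height:
    print(str(x) + " inches")

# For loop over np_baseball (2D array): nditer visits every single element
for x in np.nditer(np_baseball):
    print(x)
```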

Loop over DataFrame explanation and example can be found in my post here.

## Intermediate Python for Data Science: Logic, Control Flow and Filtering

Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. Also learn to filter data in pandas DataFrames using logic.

In the early days when I started to learn Python, there was a topic on Boolean and comparison operators, where I studied Booleans (True and False), logical operators ('and', 'or', 'not') and comparison operators ('==', '!=', '<' and '>').

Comparison operators tell how two Python values relate, and the result is a Boolean. They let you compare two numbers, two strings, or any two values of compatible types. Ordering comparisons such as '<' and '>' between incompatible types, for example a string and an integer, raise a TypeError, because Python cannot tell how two objects of different types relate. An equality check with '==' between such values does not raise an error; it simply returns False.
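A minimal sketch of these rules, using made-up values. The variable names mixed_equal and ordering_failed are only for illustration.

```python
# Comparison operators return a Boolean
print(2 < 3)              # numbers
print("carl" < "chris")   # strings compare alphabetically
print(3 == 3.0)           # int and float are compatible

# == between unrelated types is allowed and gives False
mixed_equal = (3 == "3")
print(mixed_equal)

# Ordering unrelated types raises a TypeError in Python 3
try:
    3 < "3"
    ordering_failed = False
except TypeError as err:
    ordering_failed = True
    print("TypeError:", err)
```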

## Comparing a NumPy array with an integer

In an example from the DataCamp online course that I am currently taking, the variable bmi is a NumPy array, and the code checks whether each bmi value is greater than 23. It works perfectly and returns an array of Boolean values. Behind the scenes, NumPy treats the number 23 as an array of the same size as bmi and performs an element-wise comparison.
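The comparison can be sketched as follows. The bmi values here are hypothetical stand-ins, not the course data, and the np.full_like() line only spells out the idea of "an array of 23s"; NumPy does not need you to build that array yourself.

```python
import numpy as np

# Hypothetical BMI values, for illustration only
bmi = np.array([21.9, 20.9, 21.8, 24.7, 21.4])

# Element-wise comparison with a single number
is_high = bmi > 23
print(is_high)

# The same comparison written out against an array of 23s
explicit = bmi > np.full_like(bmi, 23)
print(np.array_equal(is_high, explicit))
```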

## Boolean operators with Numpy

To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here’s an example on the my_house and your_house arrays from before to give you an idea:

```
np.logical_and(your_house > 13,
               your_house < 15)
```

Refer to below for the sample code:

```
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
```

The first print statement checks the 'or' condition: if either of the two conditions is true, it returns True. The second print statement checks the 'and' condition: both comparisons have to be True for it to return True. The output is two Boolean arrays, as below:

```
[False  True False  True]
[False False False  True]
```

## Combining Boolean operators and comparison operators with conditional statements: if, else and elif

It follows the if statement syntax. The simplest code that can be used to explain the above:

```
z = 4
if z % 2 == 0:
    print('z is even')
```

The same goes for the if-else statement with a comparison operator. See the code below:

```
z = 5
if z % 2 == 0:
    print('z is even')
else:
    print('z is odd')
```

It also works if you are using an if, elif and else statement. See the code below:

```
z = 6
if z % 2 == 0:
    print('z is divisible by 2')
elif z % 3 == 0:
    print('z is divisible by 3')
else:
    print('z is neither divisible by 2 nor 3')
```

In the example above, both the first and the second condition match. However, in this control structure, once Python hits a condition that evaluates to True, it executes the corresponding code and then exits the control structure. It does not check the next condition, the one in the elif statement.

## Filtering Pandas DataFrame

As an example taken from DataCamp's tutorial, take a DataFrame called brics and select the countries with an area over 8 million km². There are 3 steps to achieve this.

Step 1: Select the area column from the DataFrame. Ideally, this gives a Pandas Series, not a Pandas DataFrame. Assuming the DataFrame is called brics, the area column is selected with:

```
brics["area"]

# alternatively, either of the below works too:
# brics.loc[:, "area"]
# brics.iloc[:, 2]
```

Step 2: Add a comparison operator to see which rows have an area greater than 8. This returns a Series containing Boolean values. The final step is to use this Boolean Series to subset the Pandas DataFrame.

Step 3: Store this Boolean Series as is_huge, as below:

```
is_huge = brics["area"] > 8
```

Then, create a subset of the DataFrame using the following code:

```
brics[is_huge]
```

It shows the countries with an area greater than 8 million km². The steps can be shortened into 1 line of code:

```
brics[brics["area"] > 8]
```

It also works with the Boolean operators np.logical_and(), np.logical_or() and np.logical_not(). For example, to look for areas between 8 and 10 million km², the single line of code can be:

```
brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
```

The result returned by the above code is Brazil and China.
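To try the steps end to end, here is a small self-contained sketch. The DataFrame is rebuilt by hand with approximate areas in million km², as a stand-in for the tutorial's brics data; the index labels and the names huge and between are only for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the tutorial's DataFrame (area in million km², approximate)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Steps 1 to 3: Boolean Series, then subset
is_huge = brics["area"] > 8
huge = brics[is_huge]
print(huge)

# Areas between 8 and 10 million km²
between = brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
print(between["country"].tolist())
```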

## Python: formatter

Below is how to do more complicated string formatting. Refer to the sample code:

```
formatter = "{} {} {} {}"

print(formatter.format(1, 2, 3, 4))
print(formatter.format("one", "two", "three", "four"))
print(formatter.format(False, False, True, False))
print(formatter.format(formatter, formatter, formatter, formatter))
print(formatter.format(
    "First thing",
    "that we can try",
    "maybe is having",
    "a line of sentence"))
```

This uses a method called format() to turn the formatter variable into other strings. When the code says formatter.format(...), it tells the Python interpreter to do the following:

• Take the formatter string declared in the first line.
• Call its format() function.
• Pass to format() four arguments, which match up with the four pairs of curly brackets {} in the formatter string.
• The result of calling format() on formatter is a new string in which each {} has been replaced by one of the four arguments.

This is what the print statements print:
1 2 3 4
one two three four
False False True False
{} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {}
First thing that we can try maybe is having a line of sentence

## Python: Introduction IV

It has been a while since I wrote Python Introduction I, II and III. Today, I am going to complete the last part of the introduction: NumPy. Months ago, while teaching myself Python, I wrote about NumPy; here is the link.

### NumPy

The NumPy array is an alternative to the Python list, and it helps us solve problems that list operations cannot. Calculations on Python lists cannot be done in the same way as we do them for two integers or strings. The package needs to be installed before we can import and use it.

In my blog post above, I wrote about the behaviour of the NumPy array. It does not allow different types of elements in the array. When a NumPy array is built, the elements' data types are changed so that the array ends up homogeneous. Suppose a list contains a string, a number and a Boolean: as an array, all of them become strings.

Also, operators such as "+" and "-", which we use with Python lists, behave differently with NumPy arrays. Refer below for an example:

```
import numpy as np

py_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(py_list + py_list)          # lists are concatenated: [1, 2, 3, 1, 2, 3]
print(numpy_array + numpy_array)  # arrays are added element-wise: [2 4 6]
```

The first output shows the two lists merged into a single list. The second output shows an array holding the element-wise sum of the numbers. I used Jupyter Notebook to execute this.

What is covered in the link above is enough to give us a basic understanding of NumPy. If you wish to learn more, there is another link I found on Medium that we can refer to.

### NumPy Subsetting

Specifically for NumPy, there is a way of subsetting that uses an array of Booleans. The DataCamp example shows how we can get all the BMI values above 23.

The comparison first returns a Boolean array, True wherever the BMI value is above 23. Then, you can use this Boolean array inside square brackets to do the subsetting. Wherever the Boolean is True, the corresponding value is selected.

In short, it uses the result of the comparison to make a selection of data.
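A short sketch of this selection, again with hypothetical BMI values rather than the course data:

```python
import numpy as np

# Hypothetical BMI values, for illustration only
bmi = np.array([21.9, 20.9, 21.8, 24.7, 21.4])

# Step 1: the comparison gives a Boolean array
mask = bmi > 23

# Step 2: the Boolean array inside square brackets selects the True positions
high = bmi[mask]
print(high)

# Both steps in one line
high_one_line = bmi[bmi > 23]
```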

### 2D NumPy Array

I covered the 2D NumPy array in this link, which shows how to declare a 2D NumPy array and how it works with subsetting, indexing, slicing and math operations.
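As a quick reminder of what that looks like, here is a minimal sketch. The array np_2d and its values are hypothetical (rows are people; columns are height in metres and weight in kilograms).

```python
import numpy as np

# Declare a 2D array from a list of lists
np_2d = np.array([[1.73, 65.4],
                  [1.68, 59.2],
                  [1.71, 63.6]])

print(np_2d.shape)       # (rows, columns)

first_row = np_2d[0]     # indexing: the whole first row
weights = np_2d[:, 1]    # slicing: every row, second column
one_value = np_2d[2, 0]  # a single element: third row, first column
doubled = np_2d * 2      # math is applied element-wise
print(weights)
```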

### NumPy: Basic Statistics

You can generate summary statistics of your data using NumPy. It has a few useful statistical functions that can be used for analytics, including min, max, average, standard deviation and variance of the elements in an array. Refer to my write-up on these basic statistics in this link.
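A minimal sketch of those functions. The heights are borrowed from the fam list used in the for-loop examples of this series, purely as sample data.

```python
import numpy as np

heights = np.array([1.73, 1.68, 1.71, 1.89])

smallest = np.min(heights)
largest = np.max(heights)
average = np.mean(heights)
middle = np.median(heights)
spread = np.std(heights)     # standard deviation
variance = np.var(heights)   # variance is the standard deviation squared

print(smallest, largest, average, middle, spread, variance)
```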

## Python: Introduction III

This is the last part of the Python Introduction, and I will cover functions, methods and packages in Python. There is certainly a difference between a function and a method. I revisited my original post about the differences between functions and methods; you can read it before continuing here.

## User-defined Functions

The simplest way I can explain what a function is comes from my original post:

A function is a block of code that carries out a task and is called by its name. A function may take zero or more arguments. The arguments are passed explicitly (directly). On exit, the function may or may not return one or more values.

That post has some examples that explain functions: how to define a function with and without arguments, how to use a default value for an argument, how to use the flexible arguments *args and **kwargs, and how to use the return statement.
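For quick reference, the cases listed above can be sketched like this. All function names here are made up for illustration and are not taken from the linked post.

```python
def greet():
    """No arguments."""
    return "Hello!"

def power(base, exponent=2):
    """One required argument and one with a default value."""
    return base ** exponent

def total(*args):
    """*args collects any number of positional arguments into a tuple."""
    return sum(args)

def describe(**kwargs):
    """**kwargs collects any number of keyword arguments into a dict."""
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())

def min_and_max(values):
    """return can hand back more than one value (as a tuple)."""
    return min(values), max(values)

print(greet())
print(power(3), power(2, 3))
print(total(1, 2, 3))
print(describe(name="Anna", height=1.73))
print(min_and_max([3, 1, 2]))
```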

## Methods

A method is like a function, except that it is attached to an object (dependent). The object on which the method is invoked is passed to it implicitly (indirectly). A method may or may not return one or more values, and it can access the data contained within its class.

For examples of methods, see this post.
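A small sketch of the idea, using built-in string and list methods plus a hypothetical Room class written only for this illustration:

```python
# Built-in objects come with methods
name = "joanne"
upper_name = name.upper()   # str method, returns a new string

fam = [1.73, 1.68, 1.71, 1.89]
idx = fam.index(1.71)       # list method, returns a value
fam.append(1.79)            # list method, changes the list and returns None

# A user-defined class: self is the object passed implicitly
class Room:
    def __init__(self, name, area):
        self.name = name
        self.area = area

    def describe(self):
        return f"the {self.name} is {self.area} sqm"

kitchen = Room("kitchen", 18.0)
print(kitchen.describe())         # the object is passed implicitly
print(Room.describe(kitchen))     # the same call with the object passed explicitly
```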

## Packages

Think of a package as a directory of Python scripts, where each .py script is a module. A module defines functions, methods and types for solving a particular problem. I found a link that explains packages in Python in detail. Refer here for more reading.
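As a minimal sketch of how importing looks in practice, the example below uses modules from Python's standard library, so nothing needs to be installed. The same import forms apply to installed packages such as NumPy.

```python
import math                 # import a whole module
from math import sqrt       # import one function from a module
import statistics as stats  # import a module under a shorter alias

area = math.pi * 2 ** 2     # use a name through the module
root = sqrt(16)             # use the imported function directly
average = stats.mean([1.73, 1.68, 1.71, 1.89])

print(area, root, average)
```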

I know this part III gives many external links. That is mainly to avoid rewriting entries I wrote some time ago. This blog serves as a place to find the relevant resources and examples, which I think are enough to cover a basic understanding of functions, methods and packages in Python.

## Python: Introduction II

I continue from the Python: Introduction post that I wrote yesterday, which gave a very basic idea of how Python handles declaring variables, data types, and storing data in a collection. So: variables, data types and collections. If you missed it or cannot recall them, here is the link to yesterday's post. It has links to the various data types and collections, so I will not repeat them here.

For part 2, I have decided to concentrate on an introduction to control flow. I think it is good to cover control flow first, before we head to more data-science-oriented topics such as using the NumPy package and creating our own functions and methods. So I have rearranged the order of topics in my writing a little.

## if, elif, else

It is a conditional statement that we can use to match certain conditions. There are a few ways to write this statement, and we do not always need elif and else. Let us look at the syntax below:

```
if condition :
    expression
```

This syntax has only a single condition to match, which is why only "if" is used. Example:

```
z = 4
if z % 2 == 0 :
    print("z is even")
```

All control flow statements share a standard syntax, with indentation marking the start of the expression, that is, what should happen when the condition is matched. That is why the print() statement is indented. In most IDEs, the next line is indented automatically when we type the colon (:) after the condition, in this case z % 2 == 0.

The moment we have one more case to handle in our code, the if-else statement is used. See the syntax below:

```
if condition :
    expression
else :
    expression
```

The else statement takes no condition of its own, because it is understood that when the if condition does not match, Python goes to the else statement and executes its code. In that sense, else acts as a default branch. I know that some other programming languages have a default statement, which says that if nothing matches, the default statement runs.

We can omit the else statement when nothing needs to happen if the condition does not match. However, we might then overlook cases we did not anticipate: besides the condition legitimately not matching, the value may be something unexpected. It can therefore help to use the else branch as a catch-all that prints a line to the console or writes to a log file, which helps with debugging. Note that actual exceptions raised while evaluating a condition are handled with try/except, not with else.

```
z = 5
if z % 2 == 0 :
    print("z is even")
else :
    print("z is odd")
```

The above is an example of using the if-else statement in an either-or situation. The variable z is 5, hence Python goes to the else statement and prints "z is odd".

Next, the if-elif-else statement is used when there are several conditions, any one of which may match. When the first condition does not match, Python checks the next elif condition, and so on; if nothing matches, it ends at the else statement. You can have many elif statements in your code. Below is the syntax:

```
if condition :
    expression
elif condition :
    expression
else :
    expression
```

Example of using the syntax:

```
z = 3
if z % 2 == 0 :
    print("z is divisible by 2")
elif z % 3 == 0 :
    print("z is divisible by 3")
else :
    print("z is neither divisible by 2 nor by 3")
```

The output is "z is divisible by 3". Once one branch has run, the whole statement ends; Python does not go on to check whether the next condition matches. With these 3 examples, I hope you have some idea of the if statement, the if-else statement and the if-elif-else statement.

## while

The while statement repeats an action for as long as its condition is true, and stops once the condition becomes false. It is important to check the code before running a while statement, because if the condition never becomes false, the statement keeps running. We call this an infinite loop, and you have to force the application to end manually. The syntax for the while statement:

```
while condition :
    expression
```

Example of using the while syntax:

```
x = 0
while x < 5:
    print(f'The number is {x}')
    x += 1
```

The most crucial part here is the variable x, which works as a counter so that the condition eventually becomes false. Without the line x += 1, the condition is always true and the loop becomes infinite.

Executing the while statement prints "The number is 0" through "The number is 4". When x = 5, the condition is no longer true, so the loop exits without printing anything more.

## for

Remember that in the previous blog post I mentioned Python lists (collections)? The for statement is a good control flow statement for iterating (repeating) through a Python list to get each element.

```
for var in seq :
    expression
```

Without a for statement, we would have to repeat the print statement several times to print the elements of the Python list below:

```
fam = [1.73, 1.68, 1.71, 1.89]

print(fam[0])
print(fam[1])
print(fam[2])
print(fam[3])
```

Although this is correct syntax, it is not good practice. Below is how to use a for statement to iterate through the Python list and print the values.

```
for height in fam :
    print(height)
```

Both pieces of code return the same output.

```
1.73
1.68
1.71
1.89
```

The for statement works well with any type of collection, and even with a string. Using my_string as the sequence and a variable named character as the loop variable, it prints each character in my_string, one by one.
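A minimal sketch of that string loop. The value of my_string is made up, and the characters list is only there to show what was visited.

```python
my_string = "hello"

characters = []
for character in my_string:
    print(character)
    characters.append(character)
```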

In a for statement, we can use enumerate(), a Python built-in function. enumerate() adds a counter to an iterable and returns it as an enumerate object. This enumerate object can then be used directly in for loops or converted into a list of tuples using list().

### Why does enumerate() return tuples?

An enumerate object yields its results as (index, value) pairs, which are tuples. enumerate() also accepts a start parameter, the first value of the counter, which is 0 by default. A simple illustration is below:

```
enumerate(iterable, start=0)
```
```
items = ["eat", "sleep", "repeat"]
print(list(enumerate(items)))
```

When we check the output in the console, it shows as below:

```
[(0, 'eat'), (1, 'sleep'), (2, 'repeat')]
```

The index starts at 0. Of course, it can be changed by passing a start value, such as enumerate(items, 1), and then the index begins with 1 instead of 0. The enumerate() function is useful when we want to list the elements of a collection with both index and value.

```
fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam) :
    print("index " + str(index) + ": " + str(height))
```

Reusing the example above, we have now added enumerate(fam) to the for statement instead of using "for height in fam". Then, in the print() statement, we convert the index and height values to strings and concatenate them. This may be useful when we want to print out the list of items in a shopping cart. Its output shows:

```
index 0: 1.73
index 1: 1.68
index 2: 1.71
index 3: 1.89
```

Mastering control flow helps at a later stage, when we go into the data structure section. I have written separate blog posts about if-else statements, while loops and for loops; you can refer to the links below:

## Python: Introduction I

It has been a while since I stopped learning Python on DataCamp, due to my part-time classes and assignments and my work commitments. It is not easy to keep track of each of them every day. On top of that, I still have my volunteer work with TechLadies, where we regularly meet up to brainstorm and update each other.

Today's topic is very much on Python, definitely. I want to concentrate on my Python writing for the next 2 weeks before I head off for a holiday. I am sure I will be lazy after my break. It would be great if I could write something to summarize or reorganize what I have been writing for the past few months about learning Python with DataCamp and Udemy.

I remember the very first day I started learning Python on Udemy: it taught the installation, and I went on to install IntelliJ. To date, I have hardly used it; most of the time I use the online version of Jupyter Notebook, which I find pretty easy to use. I understand that there are many other IDEs on the market and that no specific software is required to code in Python. For now, I will just keep it simple for my learning.

## How to begin?

After installing Python 3, I open the terminal (in Linux) or command prompt (Windows) and enter the Python shell by typing the following command:

```
python
```

In the terminal or command prompt, Python returns a message with the version number. There are Python 2 and Python 3, so be clear about which version is being used on the machine, because the syntax differs slightly between them.

## Checking version

The very first time, we always want to know whether everything we installed for Python works. Checking the version, to see whether it is the updated, latest and correct one to use, is the first thing we might want to do:

```
python --version
```

Simply open your terminal or command line and type the above command. It returns the version info, such as below:

```
Python 3.7.0
```

## print(‘Hello World’)

Next, we always start with a simple print statement, using the built-in function print() to print out some lines. Most often, the first line we print is "Hello World". Really, most people who start learning a programming language have printed this line. I use this function everywhere in my code and it is very useful. It is much the same as the PRINT statement in SQL Server, if you come from a database background. It does not matter whether you use single quotes or double quotes.

```
print('Hello World!')
print("Hello World!")
```

## Variables and Types

Then, we touch on variables and types, important components of most programming languages. Variables and types are interrelated. I discussed the characteristics of a variable in my first post. Let me have them here too!

• Has a specific, case-sensitive name; best practice is to use lowercase.
• Defines things that are subject to change.
• Can be used to store text, numbers or dates.
• Cannot have spaces or symbols in the name; use _ instead.

Then, there are plenty of different data types as well; yes, those are the types I meant here. Remember, different types have different behaviours. I wrote many posts about each of them before. I will link them up whenever we revisit the topic.

• Boolean type: bool (True, False), with the operations and, or, not
• Numeric types: int, float, complex (whole numbers, decimals, complex numbers)
• Text sequence type: str (string)
• Sequence types: list, tuple, range
• Mapping type: dict
• Set type: set
• None is frequently used to represent the absence of a value, as when default arguments are not passed to a function. It is a null value or no value at all which is different than empty string, 0 or False.

Below is the simplest way to demonstrate how we can create a variable and assign a value to it.

```
height = 1.67
weight = 180

name = 'Joanne'
gender = 'Female'

isStudent = True
```

The above shows the height and weight variables holding float and int values, name and gender holding strings, and a variable called isStudent holding a Boolean value. Python does not require a keyword or prefix to declare a variable, unlike JavaScript or SQL Server, if you are familiar with those languages. Then, you may ask how the interpreter (computer) knows which data type a variable has.

## What is the difference between (=) and (==)?

The single equal sign (=) assigns the value on the right to the variable on the left, whereas the double equal sign (==) tests whether two things have the same value. The two things can be two variables, or a variable and the result of a math expression.
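A tiny sketch of the difference, reusing the weight value from the variables example; the names is_180, matches_sum and is_200 are only for illustration.

```python
weight = 180                      # = assigns the value 180 to weight

is_180 = (weight == 180)          # == compares and gives a Boolean
matches_sum = (weight == 90 * 2)  # a variable compared with a math expression
is_200 = (weight == 200)

print(is_180, matches_sum, is_200)
```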

## type()

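```
print(type(height))     # <class 'float'>
print(type(weight))     # <class 'int'>
print(type(name))       # <class 'str'>
print(type(isStudent))  # <class 'bool'>
```

type() is a built-in function that lets us check the data types of the variables we created. Python infers the type from the assigned value, and type() helps to answer the question above.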

That completes the fundamentals of coding in Python. Now you know how to do the following:

• use the print() function to print text.
• use variables and data types.
• use the type() function to find out the data type of a variable.

Now you probably want to know what an integer, a string, a Boolean and so on are. I have some links here with basic explanations and examples:

Numbers and strings could be a topic of their own, as there are many interesting things about them, such as the use of the (+) sign. With strings, it is the concatenation sign, which combines two or more strings; with numbers, it adds them. So numbers and strings use the (+) sign differently. We also have to remember that in Python a string and an integer cannot be combined with the (+) sign; doing so throws an exception (error). Exception is programming jargon for an error, and there is a whole topic of exception handling in Python too. For this case, there are string formatting and type conversion.
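A minimal sketch of the (+) behaviour described above. The values are made up, and str() and f-strings are shown as two common ways around the error.

```python
# With numbers, + adds
total = 2 + 3

# With strings, + concatenates
greeting = "Hello" + " " + "World"

# Mixing a string and an integer raises a TypeError
try:
    "Age: " + 30
    mixing_failed = False
except TypeError as err:
    mixing_failed = True
    print("TypeError:", err)

# Convert the number first, or use string formatting
label = "Age: " + str(30)
formatted = f"Age: {30}"
print(total, greeting, label, formatted)
```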

Let us move into fundamental part two, Python List.

## Python Collections

This is an interesting topic and an important part of Python. Almost every one of us will use Python lists in our daily coding life 🙂 A list is a collection of values that allows different types among its elements, and it is one of the simplest and easiest collections. When it comes to the word "collection", Python has four types of collections: list, tuple, set and dictionary.

You can read more about the basics of these collections here. Each of them has different characteristics, syntax, structure and usage. Along the way, we use different collections to explain Python code and concepts. Below is an example of what a list looks like:

```
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
```

Declaring a list is the same as declaring a variable; it just has to follow the list syntax. As mentioned earlier, a list can hold any data types. So, you can declare a list as below too:

```
family = ['Anna', 1.73, 'Eddie', 1.68, 'Mother', 1.71, 'Father', 1.89]
```
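For comparison, here is a rough sketch of the four built-in collection types side by side. The variable names and values are made up for illustration.

```python
# list: ordered, changeable, allows duplicates
fruits_list = ["orange", "apple", "pear", "apple"]
fruits_list.append("kiwi")

# tuple: ordered, cannot be changed after creation
fruits_tuple = ("orange", "apple", "pear", "apple")

# set: unordered, duplicates are dropped
fruits_set = {"orange", "apple", "pear", "apple"}

# dictionary: maps keys to values
heights = {"Anna": 1.73, "Eddie": 1.68}

print(len(fruits_list), len(fruits_tuple), len(fruits_set), heights["Anna"])
```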

We can use the lists above with control flow: iterating through them and/or checking conditions, then calculating a value and returning a result. I think I will cover that in the next post.

Up to now, this is still basic Python and does not involve any analytics or data science work, if you are looking for that.