Python: Introduction III

This is the last part of the Python Introduction, and it covers functions, methods and packages in Python. There is a difference between a function and a method, so I revisit my original post about the differences between the two. You can read that post before continuing here.

User-defined Functions

The simplest way I can explain what a function is comes from my original post:

A function is a block of code that carries out a task and is called by its name. A function may take zero or more arguments, which are passed explicitly (directly). On exit, the function may or may not return one or more values.

That post has examples showing how to define a function with and without arguments, how to give an argument a default value, how to use the flexible arguments *args and **kwargs, and how to use the return statement in a function.
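As a quick reminder of those ideas, here is a minimal sketch. The function names greet, add_all and describe are made up for illustration and are not taken from the original post.

```python
def greet(name, greeting="Hello"):
    """Return a greeting; `greeting` falls back to a default value."""
    return f"{greeting}, {name}!"


def add_all(*args):
    """Accept any number of positional arguments and return their sum."""
    return sum(args)


def describe(**kwargs):
    """Accept any number of keyword arguments and return them as text."""
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())


print(greet("Ana"))                 # the default greeting is used
print(greet("Ana", greeting="Hi"))  # the default is overridden
print(add_all(1, 2, 3))
print(describe(lang="en", year=1960))
```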


Methods

A method is like a function, except that it is attached to an object (dependent). The object on which the method is invoked is passed to it implicitly (indirectly). A method may or may not return a value or values, and it can access the data contained within the class.

For examples of methods, see this post.
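A minimal sketch of the idea, assuming a made-up Counter class (it is not from the linked post). It shows that the object is passed implicitly as self and that the method works on data stored in the object.

```python
class Counter:
    """A tiny class that keeps a running count."""

    def __init__(self, start=0):
        self.count = start  # data stored on the object

    def increment(self, step=1):
        # `self` is the object the method was invoked on; Python passes it implicitly
        self.count += step
        return self.count


counter = Counter()
counter.increment()        # equivalent to Counter.increment(counter)
counter.increment(step=4)
print(counter.count)
```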


Packages

Think of a package as a directory of Python scripts, where each .py script is a module. A module defines functions, methods and types for solving a particular problem. I found a link that explains packages in Python in detail; refer to it for more reading.
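A small illustration using json from the standard library, which happens to be organised as a package with several modules inside it. This is only a sketch of how a package and its modules are imported, not a full guide to building one.

```python
import json                # `json` is a package: a directory of modules
from json import decoder   # `decoder` is one module (decoder.py) inside it

# A package exposes a __path__ attribute that points at its directory
print(hasattr(json, "__path__"))

# Functions and classes defined in the package are reached with dot notation
print(json.loads('{"Year": "1960"}'))
print(decoder.JSONDecoder().decode("[1, 2, 3]"))
```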

In this part III, I have given many external links, mainly to avoid rewriting entries that I wrote some time ago. This blog serves as a place to find the relevant resources and examples, which I think are enough to cover a basic understanding of functions, methods and packages in Python.


World Bank World Development Indicator Case Study with Python

DataCamp’s tutorial, Python Data Science Toolbox (Part 2), combines user-defined functions, iterators, list comprehensions and generators to wrangle and extract meaningful information from a real-world case study.

It uses the World Bank’s dataset, and the tutorial applies everything I have learned recently to work with this dataset.

Dictionaries for Data Science

The zip() function combines two lists into a zip object, which can then be converted into a dictionary.

Before I share the sample code, let me share again what the zip() function does.

Using zip()
It allows us to stitch together any number of iterables. In other words, it zips them together to create a zip object, which is an iterator of tuples.

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

# Output: {'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
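The snippet above relies on feature_names and row_vals, which the tutorial defines for us. Here is a self-contained sketch with shortened lists (only three of the fields from the output above) that also shows what "iterator of tuples" means in practice: the zip object can be consumed only once.

```python
# Shortened inputs for illustration: three of the fields shown in the output above
feature_names = ['CountryName', 'CountryCode', 'Year']
row_vals = ['Arab World', 'ARB', '1960']

zipped = zip(feature_names, row_vals)

# A zip object is an iterator: converting it to a list consumes it
pairs = list(zipped)
print(pairs)

# The iterator is now exhausted, so a second conversion gives an empty dict
leftover = dict(zipped)
print(leftover)

# Zipping again gives a fresh iterator that can be turned into a dictionary
rs_dict = dict(zip(feature_names, row_vals))
print(rs_dict)
```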

Next, the tutorial wants us to create a user-defined function with two parameters. I can reuse the above code, wrap it in a user-defined function and call it with two arguments, feature_names and row_vals.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

It should give the same result when the code is run. Next, the tutorial requires a list comprehension to turn a bunch of lists into a list of dictionaries, where the keys are the header names and the values are the row entries.

The syntax,
[[output expression] for iterator variable in iterable]
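To make the syntax concrete, here is a tiny example with made-up numbers, next to the for loop it replaces.

```python
nums = [1, 2, 3, 4]

# [output expression for iterator variable in iterable]
squares = [num ** 2 for num in nums]

# The equivalent for loop, for comparison
squares_loop = []
for num in nums:
    squares_loop.append(num ** 2)

print(squares)
print(squares == squares_loop)
```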

The question in the tutorial is,
Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.

As above, sublist is the iterator variable, so it goes between the “for” and “in” keywords. The instruction says,
for each sublist in row_lists

indirectly, it means,

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [--- for sublist in row_lists]

The lists2dict() function which I created above returns a dictionary. The question says,
generates a dictionary using lists2dict()

which indirectly means calling the lists2dict() function in the output expression. But if I code,

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, row_lists) for sublist in row_lists]

the output was wrong, and when I clicked the “Submit” button, it showed this error message,
Check your call of lists2dict(). Did you correctly specify the second argument? Expected sublist, but got row_lists.

It expected sublist, and yes, the for loop reads each list in row_lists one at a time. I added code to print the first two lists to check this.

It is more meaningful to use sublist as the second argument rather than row_lists. Therefore, the final code is,

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])

# Output:
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Age dependency ratio (% of working-age population)', 'IndicatorCode': 'SP.POP.DPND', 'Year': '1960', 'Value': '87.7976011532547'}
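To see why the second argument matters, here is a self-contained sketch with shortened, illustrative data of the same shape as the tutorial's (two fields and two rows only). Passing row_lists pairs each header with a whole row, while passing sublist pairs each header with one entry.

```python
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides the keys and list2 the values."""
    return dict(zip(list1, list2))


# Shortened data for illustration, same shape as in the tutorial
feature_names = ['CountryName', 'IndicatorCode']
row_lists = [['Arab World', 'SP.ADO.TFRT'],
             ['Arab World', 'SP.POP.DPND']]

# Wrong: every iteration zips the header with the whole list of rows
wrong = [lists2dict(feature_names, row_lists) for sublist in row_lists]
print(wrong[0])

# Right: each iteration zips the header with one row
right = [lists2dict(feature_names, sublist) for sublist in row_lists]
print(right[0])
```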

It took me quite some time to work out what to write and why my first attempt was wrong, but that did not stop me from continuing the tutorial.

Turning it into a DataFrame
Up to here in this case study, I have used the zip() function, put it into a user-defined function and used the newly created function in a list comprehension to generate a list of dictionaries.

Next, the tutorial wants to convert the list of dictionaries into a Pandas DataFrame. First and foremost, I need to import the Pandas package. Refer to the code below:

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())

Summary of the day:

  • The zip() function combines lists into a zip object.
  • Use a user-defined function in a list comprehension.
  • Convert a list of dictionaries into a DataFrame.

Day 39: Using Iterator for Big Data

This illustrates a real data science scenario, where we often need to load a large amount of data that is sometimes too big to fit in memory. Using the Pandas read_csv() function with the chunksize parameter, we can load the data in smaller chunks, process each chunk and store the result somewhere before discarding the chunk and loading the next one. This is where iterators become useful.

Examples as below:
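As a rough stand-in for the idea, here is a plain-Python sketch rather than pandas code. The helper read_in_chunks is a made-up name, and a range of numbers stands in for a file that is too large to load at once. Each chunk is processed and then discarded, and only the running total is kept.

```python
from itertools import islice


def read_in_chunks(iterable, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Made-up data standing in for a file too large to load at once
values = range(1, 11)

total = 0
for chunk in read_in_chunks(values, chunk_size=4):
    total += sum(chunk)   # process the chunk, keep only the result

print(total)
```

With pandas, read_csv() with chunksize plays the role of read_in_chunks here and yields DataFrames instead of lists.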

We can either use a variable “total” to hold the running sum, or create an empty dictionary to perform the same kind of computation; both give the same result. Below is the exercise I did on DataCamp’s online learning website using Twitter data.

# Import the pandas package
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

The code creates an empty dictionary and iterates over the csv file with a chunksize of 10. For each chunk, it reads the ‘lang’ column and iterates over it to count each ‘lang’ value in the .csv file. The output on the screen when executed is:

{'en': 97, 'et': 1, 'und': 2}

Let us convert the above code into a user-defined function that takes three parameters: the csv filename, the chunk size and the column name in the csv file. The updated version of the code looks as below:

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

It gives the same result as the previous code.
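As a side note, the if/else counting pattern can be shortened. The sketch below uses a small made-up list in place of chunk['lang'] and is only an alternative way to write the same count, not what the DataCamp exercise asks for.

```python
from collections import Counter

# Made-up column values standing in for chunk['lang']
langs = ['en', 'en', 'und', 'en', 'et']

counts_dict = {}
for entry in langs:
    # get() returns 0 for a key that is not in the dictionary yet
    counts_dict[entry] = counts_dict.get(entry, 0) + 1

print(counts_dict)

# collections.Counter does the same counting in one call
counter_result = dict(Counter(langs))
print(counter_result == counts_dict)
```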

Summary of the day:

  • Iterator for Big Data.
  • Using Pandas’ read_csv().
  • Using dictionaries and for loops to iterate over data.
  • Create a user-defined function for the above and call it to print out the result.

Day 36: Lambda Functions in Python

After mastering how to write our own functions, we can quickly write functions on the fly by using the keyword, lambda.

The syntax is,
lambda value1, value2 : expression

Main keywords for the above syntax:
lambda: indicates a shorthand function declaration.
argument: value1, value2, etc. are the names of the arguments.
colon (:): indicates the beginning of the expression.
expression: specifies what we wish the function to return.

Lambda functions allow us to write functions in a quick and simplified way. However, it is not advisable to use them all the time.

Sample code:

raise_to_power = lambda x, y: x ** y
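For comparison, here is the same lambda next to an equivalent function written with def. The name raise_to_power_def is made up for this sketch.

```python
raise_to_power = lambda x, y: x ** y


def raise_to_power_def(x, y):
    """Same behaviour as the lambda above, written as a regular function."""
    return x ** y


print(raise_to_power(2, 3))
print(raise_to_power_def(2, 3))
```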

Another example extracted from my exercise in DataCamp:

# Define echo_word as a lambda function: echo_word
echo_word = (lambda word1, echo: word1 * echo)

# Call echo_word: result
result = echo_word('hey', 5)

# Print result
print(result)

#Output: heyheyheyheyhey

To understand the idea of using lambda function, let us try to use the map() function.

The map() function applies a function to each element of an object such as a list. We can use a lambda function to define the function that map() applies. See the sample code below:

nums = [2,4,6,8,10]
result = map(lambda a: a ** 2, nums)

#Convert into a list and print
list_result = list(result)
print(list_result)

#Output: [4, 16, 36, 64, 100]

Here nums is a list. The map object that results from the call to map() is stored in the variable result. Next, we convert the map object into a list and print its values. How to do so?

The line of code below converts the map object into a list,
list_result = list(result)

and in the output of the above code, every number in the nums list is squared. Here is another simple example of using the map() function with a lambda,

# Create a list of strings: spells
spells = ["protego", "accio", "expecto patronum", "legilimens"]

# Use map() to apply a lambda function over spells: shout_spells
shout_spells = map(lambda item: item + '!!!', spells)

# Convert shout_spells to a list: shout_spells_list
shout_spells_list = list(shout_spells)

# Print the list
print(shout_spells_list)

By now, you will know what the output is: every item in the list is concatenated with ‘!!!’. Now we try something new!

In the exercise below, extracted from DataCamp, I used a lambda function with the filter() function. The filter() function offers a way to filter out elements of a list that do not satisfy certain criteria.

The question of the exercise:
In the filter() call, pass a lambda function and the list of strings, fellowship. The lambda function should check if the number of characters in a string member is greater than 6; use the len() function to do this. Assign the resulting filter object to result. Convert result to a list and print out the list.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn', 'boromir', 'legolas', 'gimli', 'gandalf']

# Use filter() to apply a lambda function over fellowship: result
result = filter(lambda member: len(member) > 6, fellowship)

# Convert result to a list: result_list
result_list = list(result)

# Print the list
print(result_list)

#Output: ['samwise', 'aragorn', 'boromir', 'legolas', 'gandalf']

The reduce() function is useful for performing some computation on a list and, unlike map() and filter(), returns a single value as a result. To use reduce(), you must import it from the functools module.

# Import reduce from functools
from functools import reduce

# Create a list of strings: stark
stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon']

# Use reduce() to apply a lambda function over stark: result
result = reduce(lambda item1, item2: item1 + item2, stark)

# Print the result
print(result)

#Output: robbsansaaryabrandonrickon

I am not sure in what situation the reduce() function will be used. If I come across it next time, I will share about it.
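One situation where it can fit, offered only as a sketch with made-up numbers, is folding a list into a single running value such as a product. The optional third argument of reduce() is a starting value, which is also what comes back for an empty list.

```python
from functools import reduce

nums = [2, 4, 6, 8, 10]

# Fold the list into one running product
product = reduce(lambda acc, num: acc * num, nums)
print(product)

# The third argument is the starting value; it is returned as-is for an empty list
empty_total = reduce(lambda acc, num: acc + num, [], 0)
print(empty_total)
```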

Summary of the day:

  • lambda function.
  • map() function.
  • filter() function.
  • reduce() function from functools module.

Day 35: Nested Functions in Python

In layman’s terms, a nested function is an inner function defined inside another function. There are a few reasons why we sometimes need nested functions.
We use inner functions to protect them from everything happening outside of the function, meaning that they are hidden from the global scope.

def outer(num1):
    def inner_increment(num1):  # Hidden from outer code
        return num1 + 1
    num2 = inner_increment(num1)
    print(num1, num2)

inner_increment(10)
# outer(10)

When we try to execute the above code, it throws an error,
name ‘inner_increment’ is not defined.

Now try again: comment out the line inner_increment(10), uncomment the line outer(10), and execute the code. It prints two values because the print statement has two arguments.

We cannot access the inner (nested) function by calling inner_increment() directly because it is hidden from the global scope. By calling the outer function, outer(), and passing in an argument, the inner function is called from inside it.

Another example,

When we call raise_val(), we do not need to write the same code twice.

#function call
square = raise_val(2)
cube = raise_val(3)
print(square(2), cube(4))

#4 64

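I had a question before proceeding: how does this line of code work?
print(square(2), cube(4))

The n argument of raise_val() is 2 and 3 respectively, and the variables square and cube each take an argument too. Each call to raise_val() returns an inner function that remembers its n, so square(2) computes 2 ** 2 and cube(4) computes 4 ** 3.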
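The definition of raise_val() is not shown above. A minimal sketch that is consistent with the calls and the output 4 64 would look like this; the name inner and the parameter x are assumptions.

```python
def raise_val(n):
    """Return a function that raises its argument to the power n."""

    def inner(x):
        # n comes from the enclosing call to raise_val() and is remembered
        return x ** n

    return inner


square = raise_val(2)   # inner() that remembers n = 2
cube = raise_val(3)     # inner() that remembers n = 3
print(square(2), cube(4))
```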

Keeping it DRY
Maybe you have a function that performs the same chunk of code in numerous places. DRY means “don’t repeat yourself”. In an example I found online, you might write a function that processes a file, and you want it to accept either an open file object or a file name.

The code looks like,

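def process(file_name):
    def do_stuff(file_process):
        for line in file_process:
            print(line)
    if isinstance(file_name, str):
        with open(file_name, 'r') as f:
            do_stuff(f)
    else:
        # An open file object was passed in, so use it directly
        do_stuff(file_name)

Another example uses an inner function as a helper that is called three times:
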
# Define three_shouts
def three_shouts(word1, word2, word3):
    """Returns a tuple of strings
    concatenated with '!!!'."""

    # Define inner
    def inner(word):
        """Returns a string concatenated with '!!!'."""
        return word + '!!!'

    # Return a tuple of strings
    return (inner(word1), inner(word2), inner(word3))

# Call three_shouts() and print
print(three_shouts('a', 'b', 'c'))

#Output returns a tuple of 3 elements
#('a!!!', 'b!!!', 'c!!!') 

Remember that assigning to a name only creates or changes a local name, unless the name is declared with the global or nonlocal keyword.

The syntax,

def outer():
  n = 1
  def inner():
    nonlocal n
    n = 2
    print(n)
  inner()
  print(n)

The above code alters the value of n in the enclosing scope. When outer() is called, inner() changes n from 1 to 2 because of the nonlocal n declaration. Therefore, both print statements print 2.


Python Scope Rules in Function

Scope Rules in Functions

Functions provide a nested namespace (sometimes called a scope), which localizes the names they use, such that names inside the function will not clash with those outside (in a module or other function). We usually say that functions define a local scope, and modules define a global scope.

The LGB Rule:
– Name references search at most three scopes: local, then global, then built-in.
– Name assignments create or change local names by default.
– “Global” declarations map assigned names to an enclosing module’s scope.

In other words, all names assigned inside a function def statement are locals by default; functions can use globals, but they must declare globals to change them.

When you use an unqualified name inside a function, Python searches three scopes—the local (L), then the global (G), and then the built-in (B)—and stops at the first place the name is found.

When you assign a name in a function (instead of just referring to it in an expression), Python always creates or changes the name in the local scope, unless it’s declared to be global in that function.

An example code is given to further illustrate the concept.

# global scope
X = 99                # X and func assigned in module: global
def func(Y):          # Y and Z assigned in function: locals
    # local scope
    Z = X + Y         # X is not assigned, so it's a global
    return Z

func(1)               # func in module: result=100

X and func are global names because they are assigned at the top level of the module.
Y and Z are local names within the function.

Local scope vs global scope

The first example below shows that new_val in the local scope of the function is unrelated to the global new_val = 10. The function uses the value 3 to calculate the square of the number. As a result, it returns 9 when square() is called, and the global new_val does not change.
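The original sample is not reproduced in this text. A minimal sketch consistent with the description, assuming the function is named square and takes a parameter called value, is:

```python
new_val = 10               # global name


def square(value):
    """Return the square of a number."""
    new_val = value ** 2   # the assignment creates a separate, local new_val
    return new_val


print(square(3))
print(new_val)             # the global is unchanged
```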

As mentioned earlier, “functions can use globals, but they must declare globals to change them.” The next sample illustrates this, where

global new_val

is declared within the function. Now the function starts from the global new_val = 10 and calculates its square, which returns 100. It does not use the parameter passed to square(). Here, the global new_val has been modified.
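Again, the original sample is not shown here. A sketch that matches the description, under the same naming assumptions (square and value), is:

```python
new_val = 10


def square(value):
    """Square the global new_val; the value parameter is not used."""
    global new_val
    new_val = new_val ** 2   # rebinds the global name
    return new_val


result = square(3)
print(result)
print(new_val)               # the global has been modified
```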

This is different from the following sample, where the function only reads the global variable new_val = 10 and does not change it during the calculation. The code then reassigns new_val = 20 at the top level and repeats the function call.
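A sketch of that third case follows. The local name new_value2 is an assumption, and the expected values follow from squaring 10 and then 20.

```python
new_val = 10


def square(value):
    """Return the square of the global new_val without changing it."""
    new_value2 = new_val ** 2   # reads the global, assigns only a local name
    return new_value2


first = square(3)    # uses new_val = 10
new_val = 20         # reassigned at the top level, outside the function
second = square(3)   # uses new_val = 20
print(first, second)
```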

Built-in scope

To look at the built-in scope in Python, we need a module called builtins. After executing the code,
import builtins

we can execute dir(builtins),

which gives a list of all the names in the module builtins. You can find some familiar names within the list.
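A short sketch of that check is shown here. The exact length of the list depends on the Python version, so it is printed without an expected value.

```python
import builtins

names = dir(builtins)

# Familiar built-in functions and exceptions live in this module
print('print' in names, 'len' in names, 'ValueError' in names)
print(len(names))
```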


Summary of the day:

  • Python’s scope rules
  • global scopes
  • local scopes
  • built-in scopes

Python: Pig Latin

  • If word starts with a vowel, add ‘ay’ to end.
  • If word does not start with a vowel, put first letter at the end, then add ‘ay’
  • Example:
    • word > ordway
    • apple > appleay

To further understand and practise functions in Python, the tutor shared a case study from the online course. The case study details are above, and the approach is to take the first character and check whether it is a vowel.

If yes, append ‘ay’ to the end.
If no, take the second character onward, append the first character and then append ‘ay’ to the end.

Sample code (originally shown as a screenshot):
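The screenshot is not available in this text, so the following is a minimal sketch of the rules above and not the tutor's code. The function name pig_latin is an assumption, and the vowel check treats only a, e, i, o and u (in either case) as vowels.

```python
def pig_latin(word):
    """Translate a single word into Pig Latin using the two rules above."""
    first_letter = word[0]

    if first_letter.lower() in 'aeiou':
        # Starts with a vowel: just add 'ay'
        return word + 'ay'

    # Otherwise move the first letter to the end, then add 'ay'
    return word[1:] + first_letter + 'ay'


print(pig_latin('word'))
print(pig_latin('apple'))
```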