World Bank World Development Indicator Case Study with Python

DataCamp’s Python Data Science Toolbox (Part 2) course ends with a case study that combines user-defined functions, iterators, list comprehensions and generators to wrangle and extract meaningful information from a real-world dataset.

The case study uses a World Bank dataset, and it draws on everything I have learned recently to work with it.

Dictionaries for Data Science

The zip() function combines two lists into a zip object, which can then be converted into a dictionary.

Before I share the sample code, let me recap what the zip() function does.

Using zip()
It allows us to stitch together an arbitrary number of iterables. In other words, it zips them together to create a zip object, which is an iterator of tuples.

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

# Output: {'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
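The snippet above relies on feature_names and row_vals, which the tutorial preloads. Here is a minimal self-contained sketch, assuming the two lists hold exactly the values shown in the output above:

```python
# Assumed sample data, copied from the output shown above
feature_names = ['CountryName', 'CountryCode', 'IndicatorName',
                 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB',
            'Adolescent fertility rate (births per 1,000 women ages 15-19)',
            'SP.ADO.TFRT', '1960', '133.56090740552298']

# zip() pairs the items by position; dict() turns each pair into key: value
rs_dict = dict(zip(feature_names, row_vals))

print(rs_dict['CountryCode'])  # ARB
print(rs_dict['Year'])         # 1960
```

Note that every value, including the year and the indicator value, is still a string at this point.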

Next, the tutorial wants us to create a user-defined function with two parameters. I can reuse the code above by wrapping it in a function and calling it with two arguments, feature_names and row_vals.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

Running this code should give the same result. Next, the tutorial requires me to use a list comprehension to turn a bunch of lists into a list of dictionaries, where the keys are the header names and the values are the row entries.

The syntax,
[[output expression] for iterator variable in iterable]
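As a small illustration of that syntax (the numbers here are my own example, not part of the tutorial), the comprehension below is shown next to the for loop it replaces:

```python
# Output expression: n * n; iterator variable: n; iterable: range(5)
squares = [n * n for n in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# The equivalent for loop builds the same list step by step
squares_loop = []
for n in range(5):
    squares_loop.append(n * n)

print(squares == squares_loop)  # True
```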

The question in the tutorial is,
Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.

Here is how I worked from the code on the screen before I wrote my own. As stated above, sublist is the iterator variable, so it goes between the “for” and “in” keywords. The instruction says,
for each sublist in row_lists

which translates to the following skeleton, with --- standing in for the output expression:

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [--- for sublist in row_lists]

The lists2dict() function which I created above returns a dictionary. The question says,
generates a dictionary using lists2dict()

which means calling the lists2dict() function in the output expression. But when I wrote,

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, row_lists) for sublist in row_lists]

the output was wrong, and when I clicked the “Submit” button, I got this error message,
Check your call of lists2dict(). Did you correctly specify the second argument? Expected sublist, but got row_lists.

It expected sublist, which makes sense: the for clause reads one list from row_lists at a time. Printing a single row shows what each sublist looks like,
print(row_lists[0])

So the second argument should be sublist, not row_lists. The final code is,

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])
#Output:
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Age dependency ratio (% of working-age population)', 'IndicatorCode': 'SP.POP.DPND', 'Year': '1960', 'Value': '87.7976011532547'}
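To see why the first attempt went wrong, here is a sketch with hypothetical, shortened data (three columns and two rows instead of the real dataset). Passing row_lists zips the header against the whole list of rows, so each key receives an entire row as its value:

```python
def lists2dict(list1, list2):
    """Return a dictionary with keys from list1 and values from list2."""
    return dict(zip(list1, list2))

# Hypothetical, shortened data for illustration
feature_names = ['CountryName', 'CountryCode', 'Year']
row_lists = [['Arab World', 'ARB', '1960'],
             ['Arab World', 'ARB', '1961']]

# Wrong: every iteration zips the header with the whole list of rows
wrong = [lists2dict(feature_names, row_lists) for sublist in row_lists]
print(wrong[0])
# {'CountryName': ['Arab World', 'ARB', '1960'], 'CountryCode': ['Arab World', 'ARB', '1961']}

# Right: each iteration zips the header with one row
right = [lists2dict(feature_names, sublist) for sublist in row_lists]
print(right[0])
# {'CountryName': 'Arab World', 'CountryCode': 'ARB', 'Year': '1960'}
```

In the wrong version the loop variable sublist is never used, so every dictionary in the list is identical.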

It took me quite some time to work out what to write and why my first attempt was wrong, but that did not stop me from continuing the tutorial.

Turning it into a DataFrame
So far in this case study, I have used the zip() function, wrapped it in a user-defined function, and called that function in a list comprehension to generate a list of dictionaries.

Next, the tutorial converts the list of dictionaries into a pandas DataFrame. First, I need to import the pandas package. Refer to the code below:

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())
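The code above depends on the tutorial's preloaded data. As a self-contained sketch with hypothetical, shortened records in the same shape, this shows how the dictionary keys become column names. The pd.to_numeric() step is my own addition, since the values arrive as strings:

```python
import pandas as pd

# Hypothetical, shortened records in the same shape as list_of_dicts
list_of_dicts = [
    {'CountryName': 'Arab World', 'Year': '1960', 'Value': '133.56090740552298'},
    {'CountryName': 'Arab World', 'Year': '1960', 'Value': '87.7976011532547'},
]

# Each dictionary becomes a row; the keys become the column names
df = pd.DataFrame(list_of_dicts)
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['CountryName', 'Year', 'Value']

# The values are still strings; pd.to_numeric converts a column to numbers
df['Value'] = pd.to_numeric(df['Value'])
print(df['Value'].max())
```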

Summary of the day:

  • The zip() function combines lists into a zip object.
  • Use a user-defined function in a list comprehension.
  • Convert a list of dictionaries into a DataFrame.

Day 38: Introduction to Iterators

Today’s topic is about using iterables and iterating with a for loop. First, let us understand two terms: iterators and iterables.

Iterable
It is an object that can be passed to the iter() function to obtain an iterator. Examples are lists, dictionaries, strings and file connections.

Iterator
Applying iter() to an iterable creates an iterator. An iterator produces its next value each time next() is called on it.
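One way to check the difference is with the abstract base classes in the standard library's collections.abc module. This is a small sketch of my own, not part of the DataCamp exercise:

```python
from collections.abc import Iterable, Iterator

names = ['jay garrick', 'barry allen']

print(isinstance(names, Iterable))  # True: a list can be looped over
print(isinstance(names, Iterator))  # False: next() cannot be called on a list

# iter() turns the iterable into an iterator
names_iter = iter(names)
print(isinstance(names_iter, Iterator))  # True
print(next(names_iter))                  # jay garrick
```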

Let me move on to the exercises I did on DataCamp to show some examples. The first example uses a for loop to iterate over the list ‘flash’ and print each value, with ‘person’ as the loop variable. It then creates an iterator for flash using iter() and assigns the result to ‘superspeed’. Lastly, it prints each item from ‘superspeed’ by calling next() four times.

Below is the code:

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for flash: superspeed
superspeed = iter(flash)

# Print each item from the iterator
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))

The code above prints the four names in the list twice: once with the for loop and once with iter() and next().
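The exercise stops at four calls because the list has four items. As a sketch of what happens beyond that, a fifth next() call raises StopIteration, unless a default value is passed as the second argument:

```python
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
superspeed = iter(flash)

# Consume all four items
for _ in range(4):
    next(superspeed)

# A fifth call has nothing left to return
try:
    next(superspeed)
except StopIteration:
    print('iterator is exhausted')

# next() also accepts a default to return instead of raising
print(next(superspeed, 'no more names'))  # no more names
```

A for loop handles StopIteration internally, which is why it simply ends after the last item.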

Next, not all iterables are actual lists. We can use the range() function in a loop. Recall that range() does not actually create a list. Instead, it creates a range object whose iterator produces the values one at a time until it reaches the limit.

Below is the code. It prints the numbers 0 to 2, first with next() and then with a for loop.

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

#Output:
#0
#1
#2
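The range object and the iterator made from it behave differently when reused. A short sketch of my own to illustrate:

```python
small_range = range(3)

# A range object can be looped over as many times as we like
print(list(small_range))  # [0, 1, 2]
print(list(small_range))  # [0, 1, 2]

# An iterator created from it can only be walked once
small_value = iter(small_range)
first_pass = list(small_value)
second_pass = list(small_value)
print(first_pass)   # [0, 1, 2]
print(second_pass)  # []
```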

There are also functions that take iterables as arguments. The following exercise uses range(start, stop) together with the sum() function to add up the values from 10 to 20.

# Create a range object: values
values = range(10,21)

# Print the range object
print(values) #range(10, 21)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list) #[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

# Get the sum of values: values_sum
values_sum = sum(range(10,21)) #sum(values_list)

# Print values_sum
print(values_sum)

#Output: 165
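sum() is not the only built-in that consumes an iterable directly. As a small sketch beyond the exercise, min() and max() work on the range object too, and sum() also accepts a generator expression:

```python
values = range(10, 21)

# These built-ins read the range directly; no list conversion is needed
print(sum(values))  # 165
print(min(values))  # 10
print(max(values))  # 20

# A generator expression works too: sum of the even values only
even_sum = sum(v for v in values if v % 2 == 0)
print(even_sum)     # 90
```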

Using enumerate()
It allows us to add a counter to any iterable. It takes an iterable such as a list as its argument and returns a special enumerate object, which yields pairs containing each element of the original iterable along with its index.

As in the code above, we can use the list() function to turn the enumerate object into a list of tuples and print it to see what it contains.

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))
e_list = list(e)
print(e_list)

#Output: <class 'enumerate'>
#[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]

Alternatively, we can combine enumerate() and list() in a single line. Let us look at another example below:

# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

#Output:
[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]

By default, enumerate() starts counting from 0. However, we can change the starting number with its second argument, ‘start’.

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

for index, value in enumerate(avengers):
    print(index, value)

#Output:
#0 hawkeye
#1 iron man
#2 thor
#3 quicksilver

for index, value in enumerate(avengers, start=10):
    print(index, value)

#Output:
#10 hawkeye
#11 iron man
#12 thor
#13 quicksilver
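A common use of start is numbering items from 1 for display. The sketch below (my own example) also shows that start only changes the counter, not the positions in the list itself:

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

# Number the names from 1 for display
lines = [f'{position}. {name}' for position, name in enumerate(avengers, start=1)]
print(lines)
# ['1. hawkeye', '2. iron man', '3. thor', '4. quicksilver']

# The list indexes themselves are unchanged
print(avengers[0])  # hawkeye
```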

Using zip()
It allows us to stitch together an arbitrary number of iterables. In other words, it zips them together to create a zip object, which is an iterator of tuples.

As with enumerate(), we can turn the zip object into a list and print it. We can also use a for loop to iterate over the zip object and unpack the tuples.

See example:

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odison', 'maximoff']
z = zip(avengers, names)

print(type(z))
z_list = list(z)
print(z_list)

for z1, z2 in zip(avengers, names):
    print(z1, z2)

The output looks as below:
print(type(z))
<class 'zip'>

print(z_list)
[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odison'), ('quicksilver', 'maximoff')]

print(z1, z2)
hawkeye barton
iron man stark
thor odison
quicksilver maximoff
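The two lists above have the same length. When they do not, zip() stops at the shortest one. The sketch below uses a deliberately shortened names list to show this, along with itertools.zip_longest() from the standard library, which pads the gaps instead:

```python
from itertools import zip_longest

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark']  # deliberately shorter, for illustration

# zip() silently drops the unmatched items
short = list(zip(avengers, names))
print(short)  # [('hawkeye', 'barton'), ('iron man', 'stark')]

# zip_longest() keeps going and fills the gaps
padded = list(zip_longest(avengers, names, fillvalue='unknown'))
print(padded[-1])  # ('quicksilver', 'unknown')
```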

As with enumerate(), zip() and list() can be combined in a single line. See the example below:

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1,value2,value3 in mutant_zip:
    print(value1, value2, value3)

When we execute the above code, the first print() statement shows “mutant_data”, the list of tuples created by zipping the three lists together. The second print() statement prints the zip object itself.

Printing a zip object shows only its type and memory address (something like <zip object at 0x...>), not its contents. That is not readable output, which is why we convert the zip object into a list or loop over it to see the data.

Next, I used a different way to write the for loop. The first example passes zip() directly to the for loop, while the second loops over the variable named “mutant_zip”. Both work the same way and return the expected result.

We can also use the * operator to unpack an iterable. It is called a gather parameter when used in a function definition and a scatter operator when used in a function call (see Python/Tuples/Variable-length argument tuples). I believe it is most commonly called the “splat operator.”
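To make the gather and scatter roles concrete, here is a minimal sketch. The function names describe() and count_args() are hypothetical, made up for this illustration:

```python
def describe(*heroes):
    # Gather: all positional arguments are collected into one tuple
    return f'{len(heroes)} heroes: ' + ', '.join(heroes)

def count_args(*args):
    # Report how many separate arguments were received
    return len(args)

avengers = ['hawkeye', 'iron man', 'thor']

# Scatter: the list is unpacked into three separate arguments
print(describe(*avengers))    # 3 heroes: hawkeye, iron man, thor

# Without the *, the whole list arrives as a single argument
print(count_args(avengers))   # 1
print(count_args(*avengers))  # 3
```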

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odison', 'maximoff']
z = zip(avengers, names)
print(*z)

#output: ('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odison') ('quicksilver', 'maximoff')

The print() call above exhausts the elements in z, so unpacking z again prints nothing but an empty line. I confirmed this by running the code in Jupyter Notebook.

Another example from DataCamp creates a zip object from two lists and assigns it to a variable, then uses the splat operator to unpack and print the tuples. In the second part of the code, we unzip the zip object “z1” and assign the results to two variables to compare them with the original data.

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
# (True only when mutants and powers are tuples; a tuple never equals a list)
print(result1 == mutants)
print(result2 == powers)
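The comparison above depends on the type of the original data, because zip(*...) always hands back tuples. A self-contained sketch, assuming hypothetical sample values for mutants and powers:

```python
# Hypothetical sample data; tuples are used so == compares like with like
mutants = ('charles xavier', 'bobby drake', 'kurt wagner')
powers = ('telepathy', 'thermokinesis', 'teleportation')

# Zip, then unzip with * and zip()
z1 = zip(mutants, powers)
result1, result2 = zip(*z1)

print(result1 == mutants)  # True
print(result2 == powers)   # True

# If the originals were lists, convert before comparing
mutants_list = list(mutants)
print(result1 == mutants_list)        # False: a tuple never equals a list
print(list(result1) == mutants_list)  # True
```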

Summary of the day:

  • Iterable, Iterator.
  • Using enumerate().
  • Using zip().
  • Splat operator.