World Bank World Development Indicator Case Study with Python

In DataCamp’s tutorial, Python Data Science Toolbox (Part 2), a real-world case study combines user-defined functions, iterators, list comprehensions and generators to wrangle the data and extract meaningful information from it.

The case study uses a World Bank dataset, and it draws on everything I have learned recently to work with this data.

Dictionaries for Data Science

The zip() function combines two lists into a zip object, which can then be converted into a dictionary.

Before I share the sample code, let me share again what the zip() function does.

Using zip()
It allows us to stitch together an arbitrary number of iterables. In other words, it zips them together to create a zip object, which is an iterator of tuples.

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

# Output: {'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
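
The snippet above assumes that feature_names and row_vals are already defined by the exercise. As a minimal self-contained sketch, the two lists below are reconstructed from the printed output (the tutorial may define them differently). It also shows that a zip object is an iterator, so it is empty once it has been consumed.

```python
# Assumed inputs, reconstructed from the printed output above
feature_names = ['CountryName', 'CountryCode', 'IndicatorName',
                 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB',
            'Adolescent fertility rate (births per 1,000 women ages 15-19)',
            'SP.ADO.TFRT', '1960', '133.56090740552298']

# zip() pairs the items position by position and returns an iterator
zipped_lists = zip(feature_names, row_vals)

# dict() consumes the iterator of (key, value) tuples
rs_dict = dict(zipped_lists)
print(rs_dict['CountryCode'])   # ARB

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(zipped_lists)
print(leftover)                 # []
```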

Next, the tutorial wants us to create a user-defined function with two parameters. I can reuse the code above, wrap it in a user-defined function, and call it with two arguments, feature_names and row_vals.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)
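
One property of zip() worth knowing when using lists2dict(): it stops at the shortest input, so a row with a missing value produces a shorter dictionary without raising an error. The sketch below illustrates this with made-up short lists (keys and short_row are hypothetical names, not from the tutorial).

```python
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""
    return dict(zip(list1, list2))

# Hypothetical inputs: three keys but only two values
keys = ['CountryName', 'CountryCode', 'Year']
short_row = ['Arab World', 'ARB']

# zip() stops after the second pair, so 'Year' is silently dropped
result = lists2dict(keys, short_row)
print(result)  # {'CountryName': 'Arab World', 'CountryCode': 'ARB'}
```

If silently dropping keys is a concern, comparing len(list1) and len(list2) before zipping is one simple guard. On Python 3.10 and later, zip(list1, list2, strict=True) raises a ValueError instead.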

Calling lists2dict(feature_names, row_vals) should give the same result as the earlier zip() code. Next, the tutorial requires me to use a list comprehension to turn a bunch of lists into a list of dictionaries, where the keys are the header names and the values are the row entries.

The syntax,
[[output expression] for iterator variable in iterable]
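
As a small illustration of this syntax, unrelated to the World Bank data, the sketch below builds a list of squares and includes the equivalent for loop for comparison. The names nums, squares and squares_loop are made up for this example.

```python
nums = [1, 2, 3, 4]

# [output expression for iterator variable in iterable]
squares = [n ** 2 for n in nums]

# The same result with an explicit for loop
squares_loop = []
for n in nums:
    squares_loop.append(n ** 2)

print(squares)  # [1, 4, 9, 16]
```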

The question in the tutorial is,
Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.

Since sublist is the iterator variable, it goes between the “for” and “in” keywords. The instruction says,
for each sublist in row_lists

which means the code on the screen, before I filled in the output expression, looked like this:

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [--- for sublist in row_lists]

The lists2dict() function that I created above returns a dictionary. The question says,
generates a dictionary using lists2dict()

which means calling the lists2dict() function in the output expression. But if I code,

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, row_lists) for sublist in row_lists]

The output was wrong, and when I clicked the “Submit” button, I got this error message,
Check your call of lists2dict(). Did you correctly specify the second argument? Expected sublist, but got row_lists.
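
To see why the first attempt goes wrong, it helps to look at what zip() pairs up in each case. This is a minimal sketch with shortened stand-ins for feature_names and row_lists (two columns and two rows, with values taken from the case study's output, instead of the tutorial's full data).

```python
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""
    return dict(zip(list1, list2))

# Shortened stand-ins for the tutorial's data (assumed for illustration)
feature_names = ['CountryCode', 'IndicatorCode']
row_lists = [['ARB', 'SP.ADO.TFRT'],
             ['ARB', 'SP.POP.DPND']]

# Wrong: every feature name is paired with an entire row
wrong = [lists2dict(feature_names, row_lists) for sublist in row_lists]
print(wrong[0])
# {'CountryCode': ['ARB', 'SP.ADO.TFRT'], 'IndicatorCode': ['ARB', 'SP.POP.DPND']}

# Right: each feature name is paired with one entry of the current row
right = [lists2dict(feature_names, sublist) for sublist in row_lists]
print(right[0])
# {'CountryCode': 'ARB', 'IndicatorCode': 'SP.ADO.TFRT'}
```

In the wrong version the iterator variable sublist is never used, so every dictionary in the list comes out identical.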

It expected sublist because the for loop reads the lists in row_lists one at a time. I can print a single list to check,
print(row_lists[0])

Each sublist is one row of values, so it is sublist, not the whole row_lists, that should be the second argument. Therefore, the final code is,

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])
# Output:
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'IndicatorCode': 'SP.ADO.TFRT', 'Year': '1960', 'Value': '133.56090740552298'}
{'CountryName': 'Arab World', 'CountryCode': 'ARB', 'IndicatorName': 'Age dependency ratio (% of working-age population)', 'IndicatorCode': 'SP.POP.DPND', 'Year': '1960', 'Value': '87.7976011532547'}

It took me quite some time to figure out what I should write and why my first attempt was not right. That did not stop me from continuing the tutorial.

Turning it into a DataFrame
Up to here in this case study, I used the zip() function, put it into a user-defined function, and used the newly created function in a list comprehension to generate a list of dictionaries.

Next, the tutorial wants to convert the list of dictionaries into a pandas DataFrame. First and foremost, I need to import the pandas package. Let's refer to the code below:

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())
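
Because every value in the dictionaries is a string, the resulting DataFrame columns hold strings as well, including Year and Value. Below is a minimal sketch, assuming pandas is installed and using two shortened rows taken from the output above in place of the full row_lists. Converting with pd.to_numeric() is an extra step of my own, not part of the tutorial.

```python
import pandas as pd

# Shortened stand-in for list_of_dicts, based on the output above
list_of_dicts = [
    {'CountryCode': 'ARB', 'IndicatorCode': 'SP.ADO.TFRT',
     'Year': '1960', 'Value': '133.56090740552298'},
    {'CountryCode': 'ARB', 'IndicatorCode': 'SP.POP.DPND',
     'Year': '1960', 'Value': '87.7976011532547'},
]

# Each dictionary becomes a row; the keys become the column names
df = pd.DataFrame(list_of_dicts)
print(df.shape)   # (2, 4)

# Convert the numeric columns from strings to numbers
df['Year'] = pd.to_numeric(df['Year'])
df['Value'] = pd.to_numeric(df['Value'])
print(df.dtypes)
```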

Summary of the day:

  • The zip() function combines lists into a zip object.
  • A user-defined function can be called inside a list comprehension.
  • A list of dictionaries built with a list comprehension can be converted into a DataFrame.