Explore Power BI Desktop

I recently updated my Power BI Desktop via the Microsoft Store, so now is a good time to share the new user interface (UI) of Power BI after the installation. Back in 2019, during my Specialized Diploma study, the Power BI Desktop skin was in dark mode. I am not sure when the Microsoft team changed the Power BI Desktop skin to light mode and moved the Filters pane to the right side.

Another new feature I spotted is that Power BI has theme options for dashboards and reports. These themes do not refer to the Power BI Desktop skin itself. You have to enable this feature from the Power BI settings, and it allows you to change the theme to suit your dashboard and report presentation.

To do so, navigate to the File menu, select Options and Settings, then Options. Next, in the Preview features section, select Customize current theme.

Click the OK button to proceed. You may be prompted to restart Power BI Desktop so that the change takes effect and the theme feature is enabled. There is a list of built-in themes available in Power BI Desktop, and you can refer to this link for more detail. Furthermore, you can optionally export a theme's JSON file and make amendments by manually modifying the settings in that file. You can rename the fine-tuned JSON file and import it later. This gives users more control to customize the theme for their dashboards and reports.
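As a sketch of that manual fine-tuning, assuming you exported a theme to a file called mytheme.json with the standard report-theme fields (name, dataColors, and so on; the file name and colour values here are my own illustration), you could tweak it programmatically like this:

import json

# Load an exported theme file (file name assumed for illustration)
with open("mytheme.json", "r", encoding="utf-8") as f:
    theme = json.load(f)

# Rename the theme and swap in a custom colour palette
theme["name"] = "My Fine-Tuned Theme"
theme["dataColors"] = ["#118DFF", "#12239E", "#E66C37", "#6B007B"]

# Save under a new name, ready to be imported back into Power BI Desktop
with open("mytheme-custom.json", "w", encoding="utf-8") as f:
    json.dump(theme, f, indent=2)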

Getting familiar with the interface

The Microsoft website details each pane, labelled below. I extracted the picture and its explanation from there.

  1. Ribbon – Displays common tasks that are associated with reports and visualizations.
  2. Report view, or canvas – This is where visualizations are created and arranged. You can switch between the Report, Data, and Model views by selecting the icons in the left column.
  3. Pages tab – This area is where you would select or add a report page.
  4. Visualizations pane – It is the pane where you can change visualizations, customize colours or axes, apply filters, drag fields, and more.
  5. Fields pane – It is the pane where query elements and filters can be dragged onto the Report view or dragged to the Filters area of the Visualizations pane.

You can collapse the Visualizations and Fields panes to provide more space in the Report view by selecting the small arrow.

The screenshot above shows an example of collapsing the Filters pane. It works for the Visualizations and Fields panes too.

Connect to data sources

Power BI Desktop connects to many types of data sources; you can choose from local databases, Excel sheets, or data in the cloud. There are about 70 different types of data sources available. Go to Get Data on the Home tab of the ribbon to begin accessing the data. Then, select a source to establish a connection. For some data source connections, you may be required to enter user credentials to authenticate before accessing the data. Here is the list of data connectors available in Power BI's Get Data function.

This brings you to the Navigator window, which displays the entities (tables) of your data source and gives you a preview of the selected data. In the same window, you can choose to Load or Transform Data. If you are not making any changes, formatting, or data transformation, you can click the Load button; otherwise, Transform Data allows you to perform data cleaning and conversion before importing the data into Power BI Desktop. You are allowed to edit the data after importing too.

Transform data to include in a report

Power BI Desktop includes the Power Query Editor tool, which helps you shape and transform data so that it is ready for your visualizations. There are two ways to bring up the Power Query Editor window:

  1. Use the Transform Data button on the Home ribbon. [For the April 2020 version]
  2. Use the Edit Queries button on the Home ribbon. [For older versions]

If you click the Enter Data button on the Home ribbon (as shown above), a Create Table window pops up. From this window, click the Edit button and it brings up the Power Query Editor tool. Remember the Load and Transform Data buttons I mentioned earlier when we loaded data via Get Data? The Transform Data button brings up the Power Query Editor too, with a function similar to the Create Table window's Edit button. I am not going to cover any data transformation in this blog. It is a big topic to discuss, so I think it is better to share it with some good examples and a dataset in the next article.

Connect from multiple sources

Most of the time, we deal with more than one data source when we build a report. You can use the Power Query Editor tool to combine data from multiple sources into a single report. How does it combine them into a single table? Power BI Desktop has a feature called Append Queries that adds the data from a new table to an existing query.
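Append Queries itself is a Power BI feature, but as a rough analogy of what appending does, here is the same row-stacking idea sketched in pandas (the table names and values are made up):

import pandas as pd

# Two tables with the same columns, e.g. sales captured by two regional systems
sales_east = pd.DataFrame({"product": ["A", "B"], "amount": [100, 200]})
sales_west = pd.DataFrame({"product": ["C"], "amount": [150]})

# Appending stacks the rows of the new table under the existing one
combined = pd.concat([sales_east, sales_west], ignore_index=True)
print(combined)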

Create a visual

If I remember correctly, when fields are selected in Tableau, it suggests suitable visualizations to use in the dashboard or report. I am not sure whether Power BI has a similar feature. In the Report view, when you drag a field onto the canvas, Power BI Desktop automatically creates a table visual as the default. This visual acts like a report listing because it lists the selected fields in tabular form. You can choose a different visual, such as a bar chart or line graph, if you wish.

To create a visual, select a field from the Fields pane; you can drag the field into the data field (Values) in the Visualizations pane, or you can click on its checkbox. A table visual displays on the screen, and you can choose another type of visual from the Visualizations pane. There is no required order when creating a visual; you can select a visual before selecting the fields. Each visual has a different set of options in the Visualizations pane: for example, if you choose a combination (line and column) chart, the following screenshot shows shared axis, column, and line values. When you choose a pie chart, it displays legend and values.

Publish a report

After all the hard work on the dashboard or reports, you will want to publish and share them with other people. You can do so in Power BI Desktop by clicking the Publish button on the Home ribbon. You will be prompted to sign in to Power BI; follow the steps and you will see the published reports after that.

At the time of writing, I do not have any published report to show. Therefore, I cannot put up the steps here and show how to pin a visual to a dashboard. That feature allows you to choose whether to pin the visual to an existing dashboard or to create a new dashboard.

Conclusion

This article is a high-level walkthrough of Power BI Desktop that explains how to use it to create visuals and publish dashboards and reports. I do not cover visualization and publication in depth in this article; I will include them in a future article.

I hope this article gives a good impression of Power BI Desktop's features and a feel for the tool. Furthermore, Power BI Desktop's buttons are self-explanatory, so you should have no trouble using it and navigating around. Besides that, people who have been using Microsoft Excel and Tableau for data analysis may find that Power BI Desktop has some similar functions, because Power BI Desktop is another data visualization tool too.

Reference: microsoft.com

Power BI – Learning new skill

Recently, I got access to an abundance of online learning resources for Microsoft Power BI. I learned the fundamentals of using Power BI in my Specialized Diploma course. Now is a good time to recap what I have learned.

So, what is Power BI? It is a Microsoft product: a business analytics service that delivers insights to enable fast, informed decisions. The software has both a free version and paid versions, Pro and Premium (with different subscription fees and features). A small introduction to Power BI and its versions follows below.

What is Power BI Desktop?

Power BI Free/Desktop enables you to connect to 70+ data sources, analyse data, publish to the web, export to Excel, and much more. The free version gives you the basic features of Power BI.

What is Power BI Pro?

Power BI Pro is the full version of Power BI. It comes complete with the ability to build dashboards and reports, plus unlimited viewing, sharing, and consumption of your created reports (and reports shared by others), which is not possible with Power BI Desktop.

What is the difference?

  • Power BI Pro can share data, reports, and dashboards with a large number of other users who also have a Power BI Pro license.
  • Power BI Pro can create an app-based workspace.
  • Power BI Pro has a 10 GB per Pro user data storage limit.

Maybe these differences are a little irrelevant if you just want to learn Power BI for leisure instead of using it commercially. For personal learning, I did not need anywhere near 10 GB of data. As long as my email account is valid, I can start using Power BI.

What is the Power BI App?

All Power BI versions can be accessed via mobile applications. Furthermore, the Power BI Mobile applications are available for multiple platforms, including Android, iOS, and Windows devices.

What is Power BI Report Server?

Power BI Report Server is an on-premises (at your own location) server that publishes and shares Power BI reports via a website within your organisation's firewall (infrastructure). Power BI On-Premises/Report Server is an option included with Power BI Premium, and it is ideal for your business if you want to establish reporting infrastructure on-premises and have it operate under your own policies and rules. The server allows you to seamlessly scale up and move to the cloud if you wish to do so.

The visual above helps to make sense of all this. There are three elements: Power BI Desktop, the Power BI service, and the Power BI Mobile apps. Power BI Desktop accesses the data and creates the dashboards and reports. These are then published to the Power BI service and shared with users, who can access them via Power BI Mobile too.

By now, you may be getting familiar with some of the terms used in Power BI. Here are some of them:

  • Dashboard, visualization, and tile. A tile is a single visualization on a dashboard or report. A visualization is a visual representation of data, like a chart. A dashboard is a collection of visuals on a single page.
  • Reports. A report is a collection of visualizations that appear together on one or more pages.
  • Datasets. A dataset is a collection of data that Power BI uses to create its visualizations.

The example above shows a dashboard containing bar charts, a line graph, and cards. These are different visualizations available in Power BI, and the red box marks a single tile.

Limitations: Power BI Free/Desktop

As most of us at the learning stage will use the Power BI Free version, note that there are some feature limitations with Power BI Desktop.

  • Can’t share created reports with non-Power BI Pro users
  • No App Workspaces
  • No API embedding
  • No email subscriptions
  • No peer-to-peer sharing
  • No Analyze in Excel support within Power BI Desktop

However, there are useful features available for Power BI Free/Desktop users.

Advantages: Power BI Free/Desktop

  • You can connect and import data from over 70 cloud-based and on-premises sources
  • The same rich visualisations and filters from Power BI Pro
  • Auto-detect that finds and creates data relationships between tables and formats
  • Export your reports to CSV, Microsoft Excel, Microsoft PowerPoint and PDF
  • Python support
  • Save, upload and publish your reports to the Web and the full Power BI service
  • Storage limit of 10 GB per user

I will be sharing more about Power BI Desktop from time to time as part of my learning objectives and to improve my technical writing. I hope to hear some feedback from my readers. Please help me fill in the survey form so that I can improve my next blog.

References:
https://dynamics.folio3.com/blog/difference-between-power-bi-pro-vs-free-vs-premium/
https://docs.microsoft.com/en-gb/learn/modules/get-started-with-power-bi/1-introduction

Data Management: Data Wrangling Versus ETL

Data management (DM) consists of the practices, architectural techniques, and tools for achieving consistent access to and delivery of data across the spectrum of data subject areas and data structure types in the enterprise, to meet the data consumption requirements of all applications and business processes.

Data Wrangling Versus ETL: What’s the Difference?

The top three major differences between the two technologies.

1. The Users Are Different

The core idea of data wrangling technologies is that the people who know the data best should be exploring and preparing that data. This means business analysts, line-of-business users, and managers (among others) are the intended users of data wrangling tools. I can personally attest to the painstaking amount of design and engineering effort that has gone into developing a product that enables business people to intuitively do this work themselves.

In comparison, ETL technologies are focused on IT as the end-users. IT employees receive requirements from their business counterparts and implement pipelines or workflows using ETL tools to deliver the desired data to the systems in the required formats.

Business users rarely see or leverage ETL technologies when working with data. Before data wrangling tools were available, these users’ interactions with data would only occur in spreadsheets or business intelligence tools.

2. The Data Is Different

The rise of data wrangling software solutions came out of necessity. A growing variety of data sources can now be analyzed, but analysts didn’t have the right tools to understand, clean, and organize this data in the appropriate format. Much of the data business analysts must deal with today comes in a growing variety of shapes and sizes that are either too big or too complex to work with in traditional self-service tools such as Excel. Data wrangling solutions are specifically designed and architected to handle diverse, complex data at any scale.

ETL is designed to handle data that is generally well-structured, often originating from a variety of operational systems or databases the organization wants to report against. Large-scale data or complex raw sources that require substantial extraction and derivation to structure are not among the strengths of ETL tools.

Additionally, a growing amount of analysis occurs in environments where the schema of data is not defined or known ahead of time. This means the analyst doing the wrangling is determining how the data can be leveraged for analysis as well as the schema required to perform that analysis.

3. The Use Cases Are Different

The use cases we see among users of data wrangling solutions tend to be more exploratory in nature and are often conducted by small teams or departments before being rolled out across the organization. Users of data wrangling technologies typically are trying to work with a new data source or a new combination of data sources for an analytics initiative. We also see data wrangling solutions making existing analytics processes more efficient and accurate as users can always have their eyes on their data as they prepare it.

ETL technologies initially gained popularity in the 1970s as tools primarily focused on extracting, transforming, and loading data into a centralized enterprise data warehouse for reporting and analysis via business intelligence applications. This continues to be the primary use case for ETL tools and one that they are extremely good at.

With some customers, we see data wrangling and ETL solutions deployed as complementary elements of an organization’s data platform. IT leverages ETL tools to move and manage data, so business users have access to explore and prepare the appropriate data with data wrangling solutions.

Reference: https://tdwi.org/articles/2017/02/10/data-wrangling-and-etl-differences.aspx

January 2020

I hope it is not too late to write out my plans for the year 2020. My volunteer work with TechLadies will come to an end this March. TechLadies is recruiting a new core team for the year 2020. The upcoming boot-camp graduation will introduce the new team to the community. Then, the 2019 core team will pass the baton to the new team.

Will I still continue volunteering with TechLadies?

I have had this question in my mind lately, and I am not sure how TechLadies plans for it. I am quite sure it would be a great idea to let a new team lead the community. New team, new ideas and directions.

I may consider taking a side role to continue the study group sessions. But I also hope that someone is going to plan and run the study group sessions together with me. If not, I will slowly run the events as and when I am available. I am not sure whether a mobile study group will work in Singapore.

Besides TechLadies, what else?

Good question. I have a plan to run a learn-and-teach program, after being inspired by my classmate. This program teaches the community (not necessarily within TechLadies) what I have learned recently.

I will randomly pick a topic to learn and share it with the community via my blog or private meet-ups. I hope to get more interaction between community members, instead of just giving input without receiving feedback from the community.

I hope I will write and share more technical stuff through my blog here, as well as my posts on Medium.

New focuses

I am looking out for other communities in Singapore that work closely on master data management (MDM), focus on SQL and NoSQL databases, work on data engineering, and use Power BI for data visualization.

I am not going away from my core interest, databases. Also, I want to go in-depth into master data management and will consider taking some courses or certifications in this area. Next, I need to upskill and gain essential experience in the data engineering field while continuing to explore data visualization with Power BI. I am still looking out for a data engineering meetup or user group in Singapore. Do you know any?

Not to forget, I am doing data analytics in my final module at Temasek Poly. It is going to be an end-to-end data specialization when I graduate with my Specialized Diploma in Business Analytics this April.

Complete my Python course!

Last but not least, I want to complete my Python course before I graduate too, so that everything is fresh in my mind. Right now, I have completed 10 of 26 modules. I still need to complete some Pandas, statistics, and machine learning topics before the end of February. Maybe I will take a bit of time off from other activities to focus on study and work.

Continuous Intelligence

As part of the assignment, I would like to write something on the chosen topic, Continuous Intelligence. Continuous intelligence plays a major role in most digital business transformation projects. It is a growing part of enterprise analytics and BI strategies.

Definition

Continuous intelligence is a design pattern in which real-time analytics are integrated into a business operation, processing current and historical data to prescribe actions in response to business moments and other events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management (BRM), and machine learning (ML). This definition is extracted from Gartner Research.

What can you do with Continuous Intelligence?

Continuous intelligence enables companies to deliver better outcomes from a broad range of operational decisions since it involves more relevant, real-time data in decision-making algorithms. It can make sense of extreme volumes of data in milliseconds, evaluating more alternatives in greater detail than is humanly possible without access to real-time data and processing.

Gartner estimates that, within three years, more than 50% of all business initiatives will require continuous intelligence, leveraging streaming data to enhance real-time decision-making.

Combining all these forms of artificial intelligence (AI) with continuous intelligence drawing from geospatial, real-time, and historical analytics can further enhance a business's ability to know where assets and people are at all times and help predict what might occur next.

Adding rules engines and programmatic logic to AI and location data enables organizations to automate many decisions that previously required human insight. From predictive maintenance based on actual driving conditions to deciding the best next action to take with customers to improve loyalty, leading companies are decreasing costs and improving revenues to become more successful.

What are the Challenges?

What makes continuous intelligence difficult is feeding a business’s analytics systems with high volumes of real-time streaming data in a way that is robust, secure, and yet highly consumable. The ability to combine “always-on,” streaming data ingestion and integration with real-time complex event processing, enrichment with rules and optimization logic, and streaming analytics is key to enabling Continuous Intelligence.

Many data analytics organizations lack experience with Continuous Intelligence, or are unsure how to start their Continuous Intelligence journey to keep up with growing business demand.

Continuous Intelligence requires the building of new capabilities, skills and technologies. The challenge for data and analytics leaders is to understand how these differ from existing practice.

Why Use Continuous Intelligence in DevOps/DataOps

If you are considering DevOps as a strategy to adopt continuous innovation, your data strategy has to evolve, too. Traditional BI has too many silos and too much human intervention to support your move to an agile system.

Up to this point, I would like to add that in my current project, some of my team members who work in an agile system try to implement ETL (Extract, Transform, Load) processes following agile methodology. Some time ago, I attended an agile workshop, but I have forgotten some of the concepts. It is a good time to read up on them again.

According to the Open Data Science article entitled “Why Use Continuous Intelligence in DevOps/DataOps,” businesses look out for continuous innovation; those who do not may put out shoddy products. Your data strategy, therefore, has to be seamless, frictionless, and automated.

Artificial Intelligence

The article adds, “Artificial Intelligence is capable of continually combing data, looking for patterns as data updates. Continuous intelligence allows you to analyze this data accurately and in real-time. The other piece could be letting go of data wrangling. Until you have deployed Continuous Intelligence, data wrangling remains a huge and functional part of your data management plan.”

Gartner identifies six defining features of CI.

  1. Fast: Real-time insight keeps up with the pace of change in the modern age.
  2. Smart: The platform is capable of processing the type of data you get, not the type you wish you had.
  3. Automated: Human intervention is rife with mistakes and wastes your team’s time.
  4. Continuous: Real-time analytics requires a system that works around the clock.
  5. Embedded: It’s integral to your current system.
  6. Results-focused: It should go without saying, but data means nothing without insight. Your program should deliver those insights. Don’t forget the results in the search for more data.

Once you let go of batch processing and silos, moving towards an agile framework is a reality with CI.

The Open Data Science article continues: “Your team has access to these insights to direct new inquiries and drive brainstorming, pivot during sprints, and reach a frictionless state in which data flows in and insights become the next iteration of a product or a new product altogether.”

With this information, I have a vision; I wish to move into Continuous Intelligence and bring this agile methodology into my project.

References:
https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/
https://www.rtinsights.com/what-can-you-do-with-continuous-intelligence/
https://medium.com/@ODSC/why-use-continuous-intelligence-in-devops-dataops-b6bc0a448b7a

Assignment Topic: Continuous Intelligence

My assignment topic is Continuous Intelligence, and I was told to refer to Gartner Research. Gartner is a global research and advisory firm providing information, advice, and tools for businesses in IT, finance, HR, customer service and support, legal and compliance, marketing, sales, and supply chain functions.

My lecturer advised us to refer to this website to complete our research papers. My school has a link to the Gartner research papers, available to all students.

A small introduction to Continuous Intelligence: Gartner identified Continuous Intelligence as one of the top 10 technology trends for data and analytics for 2019.

On the website, Gartner defines Continuous Intelligence as “a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management, and machine learning.”

Reference: https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/

Intermediate Python for Data Science: Looping Data Structure

After matplotlib for visualization, the introduction to dictionaries and the Pandas DataFrame, followed by logical, Boolean, and comparison operators with if-elif-else control flow, now comes the last part: the while loop, the for loop, and looping over different data structures.

In Python, some objects are iterable, which means you can loop through the object (a list, for example) to get each element. Looping through a string captures each character in the string. A for loop iterates over a collection of things, and a while loop can repeat a block of code while some condition remains True.

For Loop

The main keywords are for and in. They are used along with a colon (:) and indentation (whitespace). Below is the syntax:

# for loop statement
my_iterable = [1,2,3]
for item_name in my_iterable:
    print(item_name)

In the sample code below, I used two iterator variables (index, area) with enumerate(). enumerate() loops over something, keeps an automatic counter, and returns an enumerate object.

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Change for loop to use enumerate() and update print()
for index, area in enumerate(areas) :
    print("room " + str(index) + ": " + str(area))

# Output:
"""
room 0: 11.25
room 1: 18.0
room 2: 20.0
room 3: 10.75
room 4: 9.5
"""

Another example uses a loop that goes through each sublist of house and prints out "the x is y sqm", where x is the name of the room and y is the area of the room.

# house list of lists
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]
         
# Build a for loop from scratch
for x in house:
    print("the " + str(x[0]) + " is " + str(x[1]) + " sqm")

# Output:
"""
the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm
"""

Definition of enumerate() can be found here. My post on for loop is here.

While Loop

The main keyword is while, used along with a colon (:) and indentation (whitespace). Below is the syntax:

# while loop statement
while some_boolean_condition:
    # do something

# Example
x = 0
while x < 5:
    print(f'The number is {x}')
    x += 1

Here is an example of putting an if-else statement inside a while loop:

# Initialize offset
offset = -6

# Code the while loop
while offset != 0 :
    print("correcting...")
    if offset > 0:
        offset = offset - 1
    else:
        offset = offset + 1
    print(offset)

# Output:
"""
correcting...
-5
correcting...
-4
correcting...
-3
correcting...
-2
correcting...
-1
correcting...
0
"""

My post on while loop is here.

Loop Data Structure

Dictionary:
If you want to iterate over key-value pairs in a dictionary, use the items() method on the dictionary to define the sequence in the loop.

for key, value in my_dic.items():

Numpy Array:
If you want to iterate over all elements in a Numpy array, use the nditer() function to specify the sequence.

for val in np.nditer(my_array):

Some examples are below:

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
          
# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

# Output:
"""
the capital of austria is vienna
the capital of norway is oslo
the capital of italy is rome
the capital of spain is madrid
the capital of germany is berlin
the capital of poland is warsaw
the capital of france is paris
"""

# Import numpy as np
import numpy as np

# Sample data (the original DataCamp exercise provides these arrays;
# the values below are assumed for illustration)
np_height = np.array([74, 74, 72])
np_baseball = np.array([[74, 180], [74, 215], [72, 210]])

# For loop over np_height (a 1D array)
for x in np_height:
    print(str(x) + " inches")

# For loop over np_baseball (a 2D array; np.nditer visits every element)
for x in np.nditer(np_baseball):
    print(x)

Loop over DataFrame explanation and example can be found in my post here.
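As a small taste before that post, here is a minimal sketch using pandas' iterrows(), with a made-up DataFrame:

import pandas as pd

# A small, made-up DataFrame for illustration
df = pd.DataFrame({"country": ["Brazil", "Russia"],
                   "capital": ["Brasilia", "Moscow"]})

# iterrows() yields (label, row) pairs, where row is a Pandas Series
for label, row in df.iterrows():
    print(str(label) + ": " + row["capital"])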

Data Architecture

Recently, I changed my job, and my new workplace enforces good Data Architecture practices. I tried to understand data architecture better on my own, and I found a good link that explains it.

Data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated and put to use in data systems and the organization. Data architecture describes how data is processed, stored and utilized in an information system.

Data architecture provides criteria for data processing operations, makes it possible to design data flows and also controls the flow of data in the system. Data architecture should be defined in the planning phase of the design of a new data processing and storage system.

Data Modeling and Design is defined as “the process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model.” Data models illustrate and enable an organization to understand its data assets through core building blocks such as entities, relationships, and attributes. These represent the core concepts of the business, such as customer, product, employee, and more.

Data architecture and data modeling should align with the core business processes and activities of the organization, and they need to be integrated into the entire architecture. Without knowing what the existing data import and export processes are, it is difficult to know whether a new platform will be a good fit. A model entails developing simple business rules about what the business has: customers, products, parts, etc. A toy sketch of these building blocks appears below.
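As a toy illustration of those building blocks (the entities and attributes here are my own example, not from the article), two entities with a relationship between them might look like this:

from dataclasses import dataclass

@dataclass
class Customer:          # entity
    customer_id: int     # attribute (identifier)
    name: str            # attribute

@dataclass
class Order:             # entity
    order_id: int
    customer_id: int     # relationship: each Order references one Customer
    amount: float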

Link: https://www.dataversity.net/data-modeling-vs-data-architecture/

Charting Guideline on Tableau: How to decide what chart to be used

The following is a sharing made by the instructor of a Udemy online learning course that I subscribed to. The course is called Tableau for Beginners: Get CA Certified, Grow Your Career.

Okay, now back to the original question, which I think most people always ask: how to decide what chart to use in different situations. The instructor shares some information which I think may help us understand and practice more in Tableau, so that we can familiarize ourselves with the tool and pick the right chart next time.

Most of the time, when you want to show how a numeric value differs according to different categories, bar charts are the way to go. The eye is very good at making comparisons based on length (as compared with differences in angle, color, etc.).

If you are showing change over a date range, you will want to use a line chart.

Histograms and box plots show the distribution of data.

Scatter plots show how two continuous variables are related.

There is also more detail in this guide: https://www.tableau.com/learn/whitepapers/tableau-visual-guidebook. It goes into how to use color and other visual elements to add more information to your chart.

Intermediate Python for Data Science: Logic, Control Flow and Filtering

Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. Also learn to filter data in pandas DataFrames using logic.

In the earlier days when I started to learn Python, there was a topic on Boolean and comparison operators, where I studied Booleans (True and False), logical operators ('and', 'or', 'not'), and comparison operators ('==', '!=', '<', and '>').

Comparison operators can tell how two Python values relate, and they result in a Boolean. They allow you to compare two numbers, strings, or any two variables of the same type. Python throws an exception or error message when comparing variables of different data types, because it cannot tell how two objects of different types relate.
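A quick illustration of this behaviour in Python 3 (the values are arbitrary):

# Comparing values of the same type returns a Boolean
print(2 < 3)                # True
print("carl" < "chris")     # True, strings compare alphabetically

# Comparing unrelated types raises a TypeError
try:
    print(3 < "chris")
except TypeError as e:
    print(e)                # '<' not supported between instances of 'int' and 'str'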

Comparing a Numpy array with an integer

The example in this section is taken from a tutorial in a DataCamp online learning course that I am currently taking. The variable bmi is a Numpy array, and we check whether each bmi value is greater than 23. It works perfectly and returns Boolean values. Behind the scenes, Numpy builds a Numpy array of the same size and performs an element-wise comparison, filtering against the number 23.
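A minimal sketch of that comparison (the bmi values here are illustrative, not the exact DataCamp data):

import numpy as np

# Illustrative BMI values
bmi = np.array([21.852, 20.975, 21.75, 24.747, 21.441])

# Element-wise comparison returns a Boolean array of the same size
print(bmi > 23)        # [False False False  True False]

# The Boolean array can also subset the original array
print(bmi[bmi > 23])   # [24.747]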

Boolean operators with Numpy

To use these operators with Numpy, you will need np.logical_and(), np.logical_or(), and np.logical_not(). Here's an example using the my_house and your_house arrays (defined in the code below) to give you an idea:

np.logical_and(your_house > 13, 
               your_house < 15)

Refer to below for the sample code:

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))

The first print statement checks the 'or' condition: if either of the two conditions is True, it returns True. The second print statement checks the 'and' condition: both comparisons have to be True for it to return True. The output of the execution is returned as Boolean arrays, as below:

[False  True False  True]
[False False False  True]

Combining Boolean operators and comparison operators with conditional statements: if, else, and elif.

These follow the if statement syntax. The simplest code that can be used to explain the above:

z = 4
if z % 2 == 0:
  print('z is even')

The same goes for the if-else statement with a comparison operator; see the code below:

z = 5
if z % 2 == 0:
  print('z is even')
else:
  print('z is odd')

Or, if you are working with an if-elif-else statement, that works too. See the code below:

z = 6
if z % 2 == 0:
  print('z is divisible by 2')
elif z % 3 == 0:
  print('z is divisible by 3')
else:
  print('z is neither divisible by 2 nor 3')

In the example above, both the first and second conditions match. However, in this control structure, once Python hits a condition that returns True, it executes the corresponding code and then exits the control structure. It will not evaluate the next condition, corresponding to the elif statement.

Filtering Pandas DataFrame

For an example taken from DataCamp's tutorial, using the BRICS DataFrame, select the countries with an area over 8 million square kilometres. There are three steps to achieve this.
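Since the DataFrame itself is shown as a screenshot in the original tutorial, here is a rough reconstruction of brics for anyone following along (the figures are approximate, in millions of square kilometres and millions of people):

import pandas as pd

# Approximate reconstruction of the BRICS DataFrame
brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252.0, 1357.0, 52.98]
}, index=["BR", "RU", "IN", "CH", "SA"])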

Step 1: select the area column from the DataFrame. Ideally, this returns a Pandas Series, not a Pandas DataFrame. Assuming the DataFrame is called brics, the area column is selected using:

brics["area"]

# Alternatively, you can use the below too:
# brics.loc[:, "area"]
# brics.iloc[:, 2]

Step 2: add the comparison operator to see which rows have an area greater than 8; this returns a Series containing Boolean values. The final step is to use this Boolean Series to subset the Pandas DataFrame.

Step 3: Store this Boolean Series as ‘is_huge’ as below:

is_huge = brics["area"] > 8

Then, create a subset of the DataFrame using the following code; the result returns only the matching rows:

brics[is_huge]

It shows the countries with areas greater than 8 million square kilometres. The steps can be shortened into one line of code:

brics[brics["area"] > 8]

It also works with the Boolean operators (np.logical_and(), np.logical_or() and np.logical_not()). For example, to look for areas between 8 and 10 million square kilometres, the single line of code can be:

brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]

The result returned from the above code is Brazil and China.