Continuous Intelligence

As part of the assignment, I would like to write something on my chosen topic, Continuous Intelligence. Continuous intelligence plays a major role in most digital business transformation projects and is a growing part of enterprise analytics and BI strategies.

Definition

Continuous intelligence is a design pattern in which real-time analytics are integrated into a business operation, processing current and historical data to prescribe actions in response to business moments and other events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management (BRM), and machine learning (ML). This definition is taken from Gartner Research.

What can you do with Continuous Intelligence?

Continuous intelligence enables companies to deliver better outcomes from a broad range of operational decisions, since it brings more relevant, real-time data into decision-making algorithms. It can make sense of extreme volumes of data in milliseconds, evaluating more alternatives in greater detail than is humanly possible without access to real-time data and processing.

Gartner estimates that, within three years, more than 50% of all business initiatives will require continuous intelligence, leveraging streaming data to enhance real-time decision-making.

Combining these forms of artificial intelligence (AI) with continuous intelligence, drawing from geospatial, real-time, and historical analytics, can further enhance a business's ability to know where assets and people are at all times and help predict what might occur next.

Adding rules engines and programmatic logic to AI and location data enables organizations to automate many decisions that previously required human insight. From predictive maintenance based on actual driving conditions to deciding the best next action to take with customers to improve loyalty, leading companies are decreasing costs and improving revenues to become more successful.

What are the Challenges?

What makes continuous intelligence difficult is feeding a business’s analytics systems with high volumes of real-time streaming data in a way that is robust, secure, and yet highly consumable. The ability to combine “always-on,” streaming data ingestion and integration with real-time complex event processing, enrichment with rules and optimization logic, and streaming analytics is key to enabling Continuous Intelligence.

Many data analytics organizations lack experience with Continuous Intelligence, or are unsure how to start their Continuous Intelligence journey to keep up with growing business demand.

Continuous Intelligence requires the building of new capabilities, skills and technologies. The challenge for data and analytics leaders is to understand how these differ from existing practice.

Why Use Continuous Intelligence in DevOps/DataOps

If you are considering DevOps as a strategy to adopt continuous innovation, your data strategy has to evolve, too. Traditional BI has too many silos and too much human intervention to support your move to an agile system.

At this point, I would like to add that in my current project, some of my team members working in the agile system try to implement ETL (Extract, Transform, Load) processes by following agile methodology. Some time ago, I attended an agile workshop, and I have forgotten some of the concepts. It is a good time to read up on them again.

According to Open Data Science’s article entitled “Why Use Continuous Intelligence in DevOps/DataOps,” businesses must look out for continuous innovation; those who do not may put out shoddy products. Your data strategy, therefore, has to be seamless, frictionless, and automated.

Artificial Intelligence

The article adds, “Artificial Intelligence is capable of continually combing data, looking for patterns as data updates. Continuous intelligence allows you to analyze this data accurately and in real-time. The other piece could be letting go of data wrangling. Until you have deployed Continuous Intelligence, data wrangling remains a huge and functional part of your data management plan.”

Gartner identifies six defining features of CI.

  1. Fast: Real-time insight keeps up with the pace of change in the modern age.
  2. Smart: The platform is capable of processing the type of data you get, not the type you wish you had.
  3. Automated: Human intervention is rife with mistakes and wastes your team’s time.
  4. Continuous: Real-time analytics requires a system that works around the clock.
  5. Embedded: It’s integral to your current system.
  6. Results-focused: It should go without saying, but data means nothing without insight. Your program should deliver those insights. Don’t forget the results in the search for more data.

Once you let go of batch processing and silos, moving towards an agile framework is a reality with CI.

Open Data Science

Your team has access to these insights to direct new inquiries and drive brainstorming, pivot during sprints, and reach a frictionless state in which data flows in and insights become the next iteration of a product or a new product altogether.

With this information, I have a vision: I wish to move into Continuous Intelligence and bring this agile methodology into my project.

References:
https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/
https://www.rtinsights.com/what-can-you-do-with-continuous-intelligence/
https://medium.com/@ODSC/why-use-continuous-intelligence-in-devops-dataops-b6bc0a448b7a

Assignment Topic: Continuous Intelligence

My assignment topic is Continuous Intelligence, and I was told to refer to Gartner Research. Gartner is a global research and advisory firm providing information, advice, and tools for businesses in IT, finance, HR, customer service and support, legal and compliance, marketing, sales, and supply chain functions.

My lecturer advised us to refer to this website to complete our research papers. My school has a link to the Gartner Research papers available to all students.

A small introduction to Continuous Intelligence: Gartner identified Continuous Intelligence as one of the top 10 technology trends for data and analytics in 2019.

On the website, Gartner defines Continuous Intelligence as “a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management, and machine learning.”

Reference: https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/

Intermediate Python for Data Science: Looping Data Structure

After matplotlib for visualization, the introduction to dictionaries and the Pandas DataFrame, followed by logical, Boolean and comparison operators with if-elif-else control flow, now comes the last part: the while loop, the for loop, and looping over different data structures.

In Python, some objects are iterable, which means you can loop through the object, a list for example, to get each element, or loop through a string to capture each character in the string. A for loop iterates over a collection of things, and a while loop can repeat a block of code for as long as some condition remains True.
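To make this concrete, here is a small sketch of my own (not from the course) that loops over a string with a for loop, and then does the same job with a while loop:

# A for loop visits each character of the string directly
word = "data"
for ch in word:
    print(ch)

# A while loop gives the same result, but we manage the counter ourselves
i = 0
while i < len(word):
    print(word[i])
    i += 1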

For Loop

The main keywords are for and in. They are used along with a colon (:) and indentation (whitespace). Below is the syntax:

#loop statement
my_iterable = [1,2,3]
for item_name in my_iterable:
    print(item_name)

I used two iterator variables (index, area) with enumerate(), as in the sample code below. enumerate() loops over something, keeps an automatic counter, and returns an enumerate object.

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Change for loop to use enumerate() and update print()
for index, area in enumerate(areas):
    print("room " + str(index) + ": " + str(area))

#Output:
"""
room 0: 11.25
room 1: 18.0
room 2: 20.0
room 3: 10.75
room 4: 9.5
"""

Another example uses a loop that goes through each sublist of house and prints out “the x is y sqm”, where x is the name of the room and y is the area of the room.

# house list of lists
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]
         
# Build a for loop from scratch
for x in house:
    print("the " + str(x[0]) + " is " + str(x[1]) + " sqm")

# Output:
"""
the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm
"""

Definition of enumerate() can be found here. My post on for loop is here.

While Loop

The main keyword is while, used with a colon (:) and indentation (whitespace). Below is the syntax:

# while loop statement
while some_boolean_condition:
    pass  # do something

# Examples
x = 0
while x < 5:
    print(f'The number is {x}')
    x += 1

An example of putting an if-else statement inside a while loop:

# Initialize offset
offset = -6

# Code the while loop
while offset != 0:
    print("correcting...")
    if offset > 0:
        offset = offset - 1
    else:
        offset = offset + 1
    print(offset)

# Output:
"""
correcting...
-5
correcting...
-4
correcting...
-3
correcting...
-2
correcting...
-1
correcting...
0
"""

My post on while loop is here.

Loop Data Structure

Dictionary:
If you want to iterate over key-value pairs in a dictionary, use the items() method on the dictionary to define the sequence in the loop.

for key, value in my_dic.items():

Numpy Array:
If you want to iterate all elements in a Numpy array, use the nditer() function to specify the sequence.

for val in np.nditer(my_array):

Some examples are below:

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
          
# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

# Output:
"""
the capital of austria is vienna
the capital of norway is oslo
the capital of italy is rome
the capital of spain is madrid
the capital of germany is berlin
the capital of poland is warsaw
the capital of france is paris
"""

# Import numpy as np
import numpy as np

# np_height and np_baseball are pre-loaded in the DataCamp exercise;
# these are assumed sample values so the code can run on its own
np_height = np.array([74, 72, 73])
np_baseball = np.array([[74, 180], [72, 215], [73, 210]])

# For loop over np_height
for x in np_height:
    print(str(x) + " inches")

# For loop over np_baseball: np.nditer() visits every element of the 2D array
for x in np.nditer(np_baseball):
    print(x)

Loop over DataFrame explanation and example can be found in my post here.

Data Architecture

Recently, I changed my job, and my new workplace enforces good Data Architecture practice. I tried to understand data architecture better on my own, and I found a good link that explains it.

Data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated and put to use in data systems and the organization. Data architecture describes how data is processed, stored and utilized in an information system.

Data architecture provides criteria for data processing operations, makes it possible to design data flows and also controls the flow of data in the system. Data architecture should be defined in the planning phase of the design of a new data processing and storage system.

Data Modeling and Design is defined as “the process of discovering, analyzing, representing and communicating data requirements in a precise form called the data model.” Data models illustrate and enable an organization to understand its data assets through core building blocks such as entities, relationships, and attributes. These represent the core concepts of the business, such as customer, product, and employee.

Data architecture and data modeling should align with the core business processes and activities of the organization, and they need to be integrated into the entire architecture. Without knowing what the existing data import and export processes are, it is difficult to know whether a new platform will be a good fit. A model entails developing simple business rules about what the business has: customer, product, part, etc., as sketched below.
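As a toy illustration of my own (not from the article), the entities, attributes and relationships of such a simple model could be captured in Python like this:

from dataclasses import dataclass

# Entities with their attributes
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Product:
    product_id: int
    name: str
    price: float

# A relationship between the two entities: a customer orders a product
@dataclass
class Order:
    order_id: int
    customer: Customer
    product: Product
    quantity: int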

Link: https://www.dataversity.net/data-modeling-vs-data-architecture/

Charting Guideline on Tableau: How to decide what chart to be used

The following is shared by the instructor of a Udemy online learning course which I subscribed to, called Tableau for Beginners: Get CA Certified, Grow Your Career.

Okay, now back to the original question, which I think most people always ask: how to decide what chart to use in different situations. The instructor shares some information which I think may help us to understand and practice more in Tableau, so that we can become familiar with the tool and pick the right chart next time.

Most of the time, when you want to show how a numeric value differs according to different categories, bar charts are the way to go. The eye is very good at making comparisons based on length (as compared with differences in angle, color, etc.).

If you are showing change over a date range, you will want to use a line chart.

Histograms and box plots show the distribution of data.

Scatter plots show how two continuous variables are related; a small sketch of all four chart types follows.
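Since I have been practising these charts in Matplotlib as well, here is a small sketch of my own, with made-up data, covering the four chart types mentioned above:

import matplotlib.pyplot as plt

# Bar chart: a numeric value compared across categories
plt.bar(["north", "south", "east"], [120, 95, 140])
plt.show()

# Line chart: change over a date range
plt.plot([2016, 2017, 2018, 2019], [1.2, 1.8, 2.4, 3.1])
plt.show()

# Histogram: the distribution of a single variable
plt.hist([21, 22, 22, 23, 25, 25, 26, 30], bins=4)
plt.show()

# Scatter plot: how two continuous variables are related
plt.scatter([1.5, 2.0, 2.5, 3.0], [3.1, 4.2, 4.8, 6.0])
plt.show()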

There is also more detail in this guide: https://www.tableau.com/learn/whitepapers/tableau-visual-guidebook. It talks about how to use color and other visual elements to add more information to your chart.

Intermediate Python for Data Science: Logic, Control Flow and Filtering

Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. Also learn to filter data in pandas DataFrames using logic.

In the earlier days when I started to learn Python, there was a topic on Boolean and comparison operators, where I studied Booleans (True and False), logical operators (‘and’, ‘or’, ‘not’) and comparison operators (‘==’, ‘!=’, ‘<’ and ‘>’).

Comparison operators tell how two Python values relate and result in a Boolean. They allow you to compare two numbers, strings, or any two variables of the same type. Python throws an exception or error message when comparing variables of different data types, because it cannot tell how two objects of different types relate.

Comparing a Numpy array with an integer

In an example from the DataCamp online learning course that I am currently taking, the variable bmi is a Numpy array, and the code checks whether bmi is greater than 23. It works perfectly and returns Boolean values. Behind the scenes, Numpy builds an array of the same size and performs an element-wise comparison against the number 23.
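The original screenshot is not reproduced here, so below is a minimal sketch of that comparison, with sample values assumed to be similar to the course's:

# Create arrays
import numpy as np

# bmi values (assumed sample data)
bmi = np.array([21.852, 20.975, 21.75, 24.747, 21.441])

# Element-wise comparison: returns a Boolean array of the same size
print(bmi > 23)
# [False False False  True False]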

Boolean operators with Numpy

To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here’s an example with the my_house and your_house arrays (defined in the sample code below) to give you an idea:

np.logical_and(your_house > 13, 
               your_house < 15)

Refer to below for the sample code:

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))

The first print statement checks the ‘or’ condition, meaning if any one of the two conditions is true, it returns True. The second print statement checks the ‘and’ condition, meaning both comparisons have to be True for it to return True. The execution returns Boolean arrays as below:

[False  True False  True]
[False False False  True]

Combining Boolean operators and comparison operators with conditional statements: if, else and elif.

It follows the if statement syntax. The simplest code which can be used to explain the above:

z = 4
if z % 2 == 0:
  print('z is even')

The same goes for the if-else statement with a comparison operator; see the code below:

z = 5
if z % 2 == 0:
  print('z is even')
else:
  print('z is odd')

It works with the if, elif and else statement too. See the code below:

z = 6
if z % 2 == 0:
  print('z is divisible by 2')
elif z % 3 == 0:
  print('z is divisible by 3')
else:
  print('z is neither divisible by 2 nor 3')

In the example above, both the first and second conditions match (6 is divisible by both 2 and 3); however, in this control structure, once Python hits a condition that returns True, it executes the corresponding code and exits the control structure. It will not evaluate the next condition, corresponding to the elif statement.

Filtering Pandas DataFrame

For an example taken from DataCamp's tutorial, using the brics DataFrame (re-created below, since the original screenshot is not reproduced here), select the countries with an area over 8 million km². There are 3 steps to achieve this.
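Here is a sketch of the brics DataFrame, with values assumed to be close to the course's dataset:

import numpy as np
import pandas as pd

# brics DataFrame (values assumed, close to the DataCamp dataset)
brics = pd.DataFrame({
    "country":    ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital":    ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area":       [8.516, 17.10, 3.286, 9.597, 1.221],   # millions of km²
    "population": [200.4, 143.5, 1252, 1357, 52.98]      # millions
}, index=["BR", "RU", "IN", "CH", "SA"])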

Step 1: select the area column from the DataFrame. Ideally, we want a Pandas Series, not a Pandas DataFrame. Given that the DataFrame is called brics, we call the area column using:

brics["area"]

#alternatively it can use the below too:
# brics.loc[:, "area"]
# brics.iloc[:, 2]

Step 2: when the code adds the comparison operator to see which rows have an area greater than 8, it returns a Series containing Boolean values. The final step is to use this Boolean Series to subset the Pandas DataFrame.

Step 3: store this Boolean Series as is_huge, as below:

is_huge = brics["area"] > 8

Then, create a subset of the DataFrame using the following code; the result returns only the matching rows:

brics[is_huge]

It shows the countries with areas greater than 8 million km². The steps can be shortened into 1 line of code:

brics[brics["area"] > 8]

It also works with the Boolean operators (np.logical_and(), np.logical_or() and np.logical_not()). For example, to look for areas between 8 and 10 million km², the single line of code can be:

brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]

The result returned from the above code is Brazil and China.

Intermediate Python for Data Science

The subjects in this DataCamp’s track, Intermediate Python for Data Science include:

  • Matplotlib
  • Dictionaries and Pandas
  • Logic, Control Flow and Filtering
  • Loops

It looks at data visualization (how to visualize data) and data structures (how to store data). Along the way, it shows how control structures customize the flow of your scripts (code).

Data Visualization

It is one of the key skills for data scientists, and Matplotlib makes it easy to create meaningful and informative charts. Matplotlib allows us to build various charts and customize them to make them more visually interpretable. It is not a hard thing to do, and it is pretty interesting to work on. In my previous write-up, I wrote about how to use Matplotlib to build a line chart, scatter plot and histogram.

Data visualization is a very important part of data analysis. It helps to explore the dataset and extract insights. I call this data profiling: the process of examining a dataset coming from an existing data source, such as a database, and producing statistics or summaries of it. The purpose is to find out whether the existing data can be used for other purposes, and to determine the accuracy, completeness and validity of the dataset. I relate this to “performing a body check on the dataset to ensure it is healthy”.

One of the methods I learned from my school for data profiling is the use of histograms, scatter plots and boxplots to examine the dataset and find the outliers. I can use either Python's Matplotlib, Excel, Power BI or Tableau to perform this action.

It does not end here…

Python allows us to customize charts to suit our data. There are many types of charts and customizations one can do with Python, from changing colours and labels to axes' tick sizes. It depends on the data and the story one wants to tell. Refer to the links above to read my write-ups on those charts; a small sketch of such customization follows.
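As a quick reminder, here is a small sketch of my own, with made-up data, showing how colours, labels and ticks can be customized:

import matplotlib.pyplot as plt

year = [2016, 2017, 2018, 2019]
sales = [1.2, 1.8, 2.4, 3.1]

# Change the line colour, then label the axes and the chart
plt.plot(year, sales, color="green")
plt.xlabel("Year")
plt.ylabel("Sales (millions)")
plt.title("Yearly Sales")

# Customize the axis ticks: positions, labels and font size
plt.xticks(year, fontsize=12)
plt.yticks([0, 1, 2, 3, 4], ["0M", "1M", "2M", "3M", "4M"])
plt.show()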

Dictionaries

We can use lists to store a collection of data and access the values using indexes. This can be troublesome and inefficient when it comes to a large dataset; therefore, the use of dictionaries in data analysis is important, as they represent data in the form of key-value pairs. Creating a dictionary from lists of data can be found in this link, which has one simple example demonstrating how to convert it. However, I do have a question: what about converting long lists to a dictionary? I assumed it is not going to be the same method as in this simple example; one common approach is sketched below. Does anyone have a better example to share?
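For what it is worth, here is a minimal sketch of one common approach (not from the linked post), using zip() to pair up two lists of matching length, however long they are:

# zip() pairs the two lists element by element;
# dict() then builds the key-value pairs in one go
countries = ["spain", "france", "germany", "norway"]
capitals = ["madrid", "paris", "berlin", "oslo"]

europe = dict(zip(countries, capitals))
print(europe)
# {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}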

If you have questions about dictionaries, then you can refer to my blog, where I wrote a fairly comprehensive introduction to dictionaries in Python.

What is the difference between lists and dictionaries?

If you have a collection of values where order matters, and you want to easily select entire subsets, you will want to go with a list. On the other hand, if you need some sort of lookup table where looking up data should be fast, by specifying unique keys, a dictionary is the preferred option.

Lastly, Pandas

Pandas is a high-level data manipulation tool built on top of the NumPy package. Since a NumPy 2D array allows only one data type for its elements, it may not be suitable for data structures that comprise more than one data type. In Pandas, data is stored in a tabular form called a DataFrame, for example the sketch below.
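Here is a minimal sketch of my own, with assumed values, of a DataFrame mixing strings, floats and integers in one table:

import pandas as pd

# A DataFrame happily mixes data types across columns,
# which a single NumPy 2D array cannot do
df = pd.DataFrame({
    "country": ["Brazil", "Russia"],      # strings
    "area": [8.516, 17.10],               # floats
    "population": [200, 144]              # integers
})
print(df)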

How to build a DataFrame?

There are a few ways to build a Pandas DataFrame, and we need to import the Pandas package before we begin. In my blog, two methods are shared: using dictionaries, and using an external file such as a .csv file. You can find the examples from the given link. Reading from a dictionary can be done by converting the dictionary into a DataFrame using DataFrame(), and reading from an external file can be done using Pandas' read_csv(), as sketched after the list below.

  • Converting a dictionary using DataFrame()
  • Reading from an external file using read_csv()
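A short sketch of both methods; the file name brics.csv is hypothetical:

import pandas as pd

# Method 1: convert a dictionary into a DataFrame with DataFrame()
data = {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]}
brics = pd.DataFrame(data)

# Method 2: read from an external .csv file (hypothetical file name);
# index_col=0 uses the first, unnamed column as the row labels
brics = pd.read_csv("brics.csv", index_col=0)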

How to read from a DataFrame?

A Pandas DataFrame is in the form of rows and columns. If you wonder why the first column goes without a name: in the .csv file it has no column name. It appears to be an identifier for each row, just like an index of the table or a row label. I have no idea whether the content of the file was written with this purpose or has some other meaning.

Index and Select Data

There are two methods by which you can select data:

  • Using square bracket []
  • Advanced methods: loc and iloc.

The advanced methods, loc and iloc, are Pandas' powerful, advanced data access. To access a column using the square brackets, with reference to the brics DataFrame again, the following code demonstrates how to select the country column:

brics["country"]

The result shows the row labels together with the country column. This is how a DataFrame is read: it returns an object called a Pandas Series. You can think of a Series as a one-dimensional labelled array, and when a bunch of Series come together, it is called a DataFrame.

If you want to do the same selection of the country column and keep the data as a DataFrame, then using double square brackets does the magic with the following code:

brics[["country"]]

If you check the type of the object, it returns DataFrame. You can define more than one column to be returned. To access rows using the square brackets and slices, the code below is used:

brics[1:4]

The result returns rows 2 to 4, or index 1 to 3, which contain Russia, India and China. If you still remember the characteristics of a slice: the stop value (end value) of a slice is exclusive (not included in the output).

However, this method has a limitation. For example, it cannot access data the way a 2D NumPy array can, using square brackets with a specific row and column:

my_array[row, column]

Hence, Pandas has the powerful and advanced data access methods loc and iloc, where loc is label-based and iloc is position-based. Let us look into the usage of loc. The first example reads rows with loc, followed by another example that reads rows and columns with loc. With the same concept as above, single square brackets return a Series and double square brackets return a DataFrame, just as below:

brics.loc["RU"]                 # Series, single row
brics.loc[["RU"]]               # DataFrame, single row
brics.loc[["RU", "IN", "CH"]]   # DataFrame, multiple rows

Let us extend the above code to read the country and capital columns using rows and columns with loc. The first part mentions the rows and the second part mentions the column labels. The code below returns a DataFrame.

brics.loc[["RU, "IN", "CH"], ["country", "capital"]]

The row values above can be replaced with a slice, just like the sample code below:

brics.loc[:, ["country", "capital"]]

The above code does not specify start and end indexes, which means it returns all the rows, with only the country and capital columns. This is the label-based (loc) counterpart of the square-bracket selection shown earlier.

Using iloc is similar to loc; the only difference is how you refer to columns and rows, using integer indexes instead of row and column labels.
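For completeness, here are the iloc equivalents of the loc examples above, assuming the same brics DataFrame with row order BR, RU, IN, CH, SA:

brics.iloc[1]                    # Series, single row (RU)
brics.iloc[[1]]                  # DataFrame, single row
brics.iloc[[1, 2, 3]]            # DataFrame, multiple rows (RU, IN, CH)
brics.iloc[[1, 2, 3], [0, 1]]    # rows RU, IN, CH with country and capital columns
brics.iloc[:, [0, 1]]            # all rows, country and capital columns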