What is the difference between calculated columns and measures in DAX

Since my school time in Temasek Polytechnic, I have been confused between the calculated columns and measures that available in Power BI. There was no distinguish example to explain them. I try my luck to search around and get a better understanding of these two fields.

The calculated columns and measures are one of the few concepts to learn and master in DAX.

What is DAX?

It stands for Data Analysis Expressions (DAX). It uses to perform basic calculation and solve data analysis problems. According to the Microsoft website, DAX is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values. Stated more simply, DAX helps you create new information from data already in your model.

This may be similar to the Microsoft Excel formulas. With that knowledge, it will be useful for DAX understanding.

What is the measure?

According to the Microsoft website, it explains that the measures are used in some of the most common data analyses. Simple summarizations such as sums, averages, minimum, maximum and counts can be set through the Fields list well. The calculated results of measures are always changing in response to your interaction with your reports, allowing for fast and dynamic ad-hoc data exploration. 

It is useful whenever you do not want to compute values for each row but, rather, you want to aggregate values from many rows in a table.

How?

In Power BI Desktop, a measure is created in Report view or Data view. The new measures appear in the Fields list just like other fields, but it will have a “calculator” icon showing its values. You can name the measures, and add them to a visualization just like other fields.

Measures calculate a result from a DAX formula. An example of how to create a measure can be found in this link. It uses the DAX formula to sum the sales of the year, and calculate sales projections for the coming year. It is expected 6% increase in business.

Projected Sales = SUM('Sales'[Last Years Sales])*1.06

What is a calculated column?

With calculated columns, you can add new data to a table already in your model. But instead of querying and loading values into your new column from a data source, you create a DAX formula that defines the column’s values.

How?

Unlike custom columns that are created as part of a query by using Add Custom Column in Query Editor, calculated columns that are created by using the New Column feature in Report view or Data view are based on data you’ve already loaded into the model.

For example, you might choose to concatenate values, do addition, or extract substrings from two different columns in two different but related tables. The calculated columns appear in the Fields list just like other fields, but it will have a different icon showing its values are the result of a formula. You can name the calculated columns, and add them to a visualization just like other fields.

An example on how to create a calculated field can be found in this link. It uses the DAX formula to create a concatenated string from two columns.

One important concept that you need to remember about calculated columns is that they are computed during the database processing and then stored in the model. In more technical explanation, in data models for DAX, all calculated columns occupy space in memory and are computed during table processing. If you familiar with SQL computed columns, they are computed at query time and do not use memory.

Therefore, when computing a complex calculated column, the time required to compute is always process time and not query time, resulting in a better user experience. However, you have to remember it uses our precious RAM.

The differences

The value of a calculated column is computed during data refresh and uses the current row as a context; it does not depend on user interaction in the report. 

A measure operates on aggregations of data defined by the current context, that depends on the filter applied in the report, such as slicer, rows, and columns selection in a pivot table, or axes and filters applied to a chart.

References:
https://www.sqlbi.com/articles/calculated-columns-and-measures-in-dax/
https://docs.microsoft.com/en-us/power-bi/desktop-quickstart-learn-dax-basics

I would like to get feedback from my reader

As per title, I would like to gather some feedback from the reader regarding my blog. I have been writing on food reviews and some technical knowledge sharing. Appreciate your kind help to give some feedback to improve my writing skill and the content of the blog. Below is the link.

https://drive.google.com/open?id=1khYP9ayDGGlnjEHnospW7hWpdGu90ZKYEOzdrntRWN4

[SQL] Test Database Connection String Using Notepad

It is good to re-post this topic which I wrote it somewhere in Year 2017. It was my interview question. The interviewer asked how to test the database connection without logging into the SQL Server Management Studio (SSMS).

We are able to test a database connection string by creating and configuring a Universal Data Link (.udl) with Notepad. Here is the steps:

  1. Create an empty text file in Notepad and save it with a filename, TestConnection.udl on your desktop.
  2. In your desktop, double click the TestConnection.udl that you just created. A Data Link Properties box will popup.
  3. Select the Provider tab, and find the provider that you want to connect with. Click the “Next” button to proceed.
  4. Select the Connection tab, and enter your source server name and its credential to log onto the server. Also, select the database on the server.
  5. Click on the “Test Connection” and click “OK” to save the file.

References: https://www.gotknowhow.com/articles/test-a-database-connection-string-using-notepad

Agile

Agile is an iterative approach to project management and software development that helps teams deliver value to their customers faster and with fewer headaches.  In an agile team, they deliver work in smaller, and workable, but consumable, increments. Requirements, plans, and results are evaluated continuously so teams have a natural mechanism for responding to change quickly.

During my first internal Agile training, I was introduced to the Agile Manifesto. Many scrum masters and trainers use it for Agile 101 Introduction.

Manifesto for Agile Software Development

While there is value in the items on the right, we value the items on the left more.

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Agile Project Management (APM) is an iterative approach to planning and guiding project processes. Just as in Agile Software Development, an Agile project is completed in small sections. These sections are called iterations.

Here is the definition of Scrum and Kanban. My current projects are using Scrum methodology. I am writing this blog to give me better understanding about Agile and how I can apply it into my work and personal life goals.

What is Scrum?

Scrum is a framework that helps teams work together. Scrum encourages teams to learn through experiences, self-organize while working on a problem, and reflect on their wins and losses to continuously improve. While the Scrum I’m talking about is most frequently used by software development teams, its principles and lessons can be applied to all kinds of teamwork. 

Often thought of as an agile project management framework, Scrum describes a set of meetings, tools, and roles that work in concert to help teams structure and manage their work.

https://www.atlassian.com/agile/scrum

Keywords in Scrum

  • Sprints
  • Sprint Planning
  • Ceremonies
  • Backlogs
  • Sprint Reviews
  • Standups
  • Scrum Master
  • Roles
  • Retrospective

What is Kanban?

Kanban is a popular framework used to implement agile software development. It requires real-time communication of capacity and full transparency of work. Work items are represented visually on a kanban board, allowing team members to see the state of every piece of work at any time.

Kanban is all about visualizing your work, limiting work in progress, and maximizing efficiency or flow. Kanban teams focus on reducing the time it takes to take a project or user story from start to finish.

Keywords in Kanban

  • Cards
  • Boards
  • Work In Progress (WIP) limits

Scrum or Kanban?

Reference: https://www.atlassian.com/agile/kanban/kanban-vs-scrum

How are you doing so far?

It is two weeks since Singapore began the COVID-19 Circuit Breaker (CCB). Most of us are working from home now. There have been a lot of challenges for me to overcome when I started telecommuting on 31 Mar 2020. The days before the CCB began, I was looking around at some IT shops in my neighbourhood to find cables for my new monitor. Most of the IT shop experienced a high surge of IT equipment and service’s demand.

My first week of working from home was full of Skype calls and messaging. My Skype meetings were scheduled back to back on one of the weekdays from morning until evening. It began with daily stand-ups and then Skype calls, followed by meetings and discussions in the afternoon. It ended in the evening, some 1.5 hours over timed from my usual working hour. There were some unpleasant moments during the Skype meetings when we ran through the sprint reviews and retrospective. It led me to feel so tired to face my work the next day.

I had an e-lunch with my colleague cum scrum master. She listened to my concerns and feedback. She advised me how to move on and looked forward to a new week. That evening, I took a picture of the beautiful skyline in the evening. I have not been stepping out of the house since Monday, 6th April.

Just lovely, Good Friday’s public holiday fell on the first week of CCB. The week cut short, and I could take a rest and collect my thought. Also, I shared my thoughts with my team lead, and I felt better after sharing out my feeling and thoughts. The bad mood has gone during the weekend.

While taking a break from the public holiday, I did some work on and off because I got the ideas from my team lead and I did not want to wait until next Monday to continue on it. Besides work, I took up a new topic to read, Agile project management. I started to explore this area together with my scrum master and her friends. They have a study group, but I have not started any sessions with them.

How about you?

I would like to hear from you too. How do you overcome the telecommuting challenges? How is your life so far and do you pick up anything to read or learn during this period?

Apache Kafka Theory

Apache Kafka is a data processing engine specifically designed for the high-speed, real-time information processing. It makes Artificial Intelligence and Big Data possible. In my previous blog entries, I wrote the Apache Kafka introduction.

  • Introduction to Kafka (Link)
  • Part 2: Introduction to Kafka (Link)

For this blog, I am going to write about the Apache Kafka Theory. Mainly, I will try to share:

  • What is a topic?
  • What is a broker?
  • How does a topic work with a broker?
  • Topic Replication Factor.
  • Concept of Leader for a Partition.
  • What roles do Replicas and the ISR play?
  • Zookeeper Roles.

Topics

It is going to be the base of everything. All Kafka records are organized into topics. In other words, it presents a particular stream of data.

It is similar to a table in the database without any constraints. It has a name, the same way a table does have a name. You can have as many topics as you want in Kafka, just like as many tables as you want in a database. There are no restrictions on how to name a topic, but usually, it is named by the stream of data. A topic is identified by its name.

Topics in Kafka are always multi-subscriber, that means a topic can have zero, one or many consumers (subscribers) that subscribe to the data written to it.

Topics are split into partitions. Each partition is ordered, immutable sequence of records that append to – a structured commit log. Each message within a partition gets an incremental ID, called offset that uniquely identifies each record within the partition.

The consumer controls the offset. A consumer will advance its offset (position) linearly as it reads records. In fact, since the consumer controls the position, the consumer can consume records in any order it likes.

According to the Apache Kafka’s website, it gives an example, a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from “now”.

In my previous blog, I wrote about the core API. The Producer API does the write portion. The Consumer API does the read portion.

According to the Apache Kafka’s website, the Kafka cluster publishes the records whether or not they are being consumed by using a configured retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. 

Example

An example from the Udemy course is tracking a fleet of trucks’ GPS position to Kafka. The dashboard or notification system consumes the data.

You can name the topic as truck_gps that contains all the GPS positions (longitude and latitude) of all the trucks. Each truck sends a message consisting of the truck ID and truck position to Kafka with an interval time, for example, 20 seconds. Here, in the case, it creates 10 partitions.

Do you need to specify the number of partitions when creating a topic?

The answer is yes. You have to specify how many partitions are required and you can change them later on.

Brokers

A Kafka cluster is composed of multiple brokers, which is servers. Each broker is identified with its ID which is an integer. Each broker contains only certain topic partitions. So, each broker has some of its data, but not all the data. It is because Kafka is distributed.

How brokers are connected?

Each broker uses the bootstrap broker to connect with any brokers. The bootstrap broker connects the brokers to the entire cluster. So, in Kafka, when you connect to one Kafka broker, you are connected to a cluster, even you have 100 brokers. A good number to get started is 3 brokers for a cluster.

How do topics and brokers work?

This is an example I extracted from the Udemy’s website. It has 3 brokers in a Kafka cluster. There are two topics named Topic_A and Topic_B and each of them has three and two partitions respectively.

Based on the screenshot above, there is no relationships between the brokers and topics. It can be in whatever order. When you create a topics, Kafka will automatically assign the topic and distribute it across all your brokers.

If you were to create another topic with four partitions, one of the brokers will have two partitions.

Topic Replication Factor

When there is a distributed system in the Big Data world, we need to have replication, just in case one of the brokers (servers) is down, things are still working fine.

There is a replication factor to follow, three is a standard, two is the minimum. The example below, Partition 0 Topic-A replicates from Broker 101 to Broker 102 and Partition 1 Topic-A from Broker 102 to Broker 103.

If the Broker 102 is down, at least data in Broker 101 and Broker 103 are still up and available. Topic replication ensures that the data would not be lost.

Concept of Leader for a Partition

In Kafka, there is a concept of leader for a partition. The golden rule is at any time only one broker can be a leader for a given partition, and none or more servers that act as Followers. Only the leader can receive and serve (read and write) the data for a partition. The other brokers just are passive replicas (in-sync replica – ISR) and it will synchronize the data. In the event of the Leader failing, one of the Followers will take on the role of the Leader. This ensures load balancing of the server.

What roles do Replicas and the ISR play?

Replicas are essentially a list of nodes that replicate the log for a particular partition irrespective of whether they play the role of the Leader. On the other hand, ISR stands for In-Sync Replicas. It is essentially a set of message replicas that are synced to the leaders.

Replication ensures that published messages are not lost and can be consumed in the event of any machine error, program error or frequent software upgrades.

How to manage the leader and replicas?

The Zookeeper manages the leader and replicas. If the leader goes down, for example, the Partition 0 Topic-A in the Broker 101 is the leader, the Partition 0 Topic-A in Broker 102 becomes the leader because it was an in-sync replica. When the Broker 101 comes back, it becomes the leader again after it replicates the data. The promotion of a leader and its management happens in the background and handles by the Kafka. Kafka uses Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group. It is not possible to bypass Zookeeper and connect directly to the Kafka server. If for some reason, ZooKeeper is down, you cannot service any client request.

Question, if a Replica stays out of the ISR for a long time, what does it signify? It means that the Follower is unable to fetch data as fast as data accumulated by the Leader.

References:
https://www.udemy.com/
https://kafka.apache.org/intro

White Radish, Carrot and Chicken Soup

I bought a big white radish and a packet of carrots to cook with the chicken bones. The white radish has plenty of health benefits:

  • Boost immunity – Radish is chock full of dietary fibre, which helps with digestion issues.
  • Help the skin – They also have phosphorous and zinc. These, when combined, can help get rid of dryness, acne and rashes.
  • Aids digestion – Chock full of vitamin A, C, E, B6, potassium, and other minerals, radishes can give your whole body an immunity boost.
  • Regulate blood pressure – Rich in potassium, radishes can help keep blood pressure under control by maintaining the sodium-potassium balance in the body.

During this COVID-19 period, I buy the white radish and carrots quite often to boil soup. For Chinese, this an ordinary soup that every Chinese family cook at home. It is my favourite soup too.

I used the chicken bone as the soup based, and it is nutritious too. The chicken bone helps protect the joints, reduce inflammation and heal the gut. Also, it helps fight osteoarthritis and may aid sleep.

There are plenty of recipes that you can find online. I used the rice cooker to cook the soup by cutting the white radish and carrot into a smaller size. Together with the chicken bone, I put all the ingredients into the rice pot and add the chicken bone broth to cook for 1.5 hours to 2 hours using the slow cook function.

About 30 minutes before I turn off the fire, I add some chicken meat and goji berries into the soup. You can add the red dates if you want too and serve the soup while still hot.

Reference: https://www.huangkitchen.com/chinese-white-radish-soup/