Cardinality in Databases

Recently, I read an article on cardinality in databases. When you do a Google search to define cardinality in general terms, it returns a definition as “the number of elements in a set or other grouping, as a property of that grouping“. It may sound a bit difficult to visualize and understand.

In another search in Stackoverflow website, a contributor named Oded shared, the cardinality can be described in two different contexts, data modelling and data query optimization.

In term of data modelling, cardinality means how one table relates to another, for example, one to one relationship, one to many relationship or many to many relationship. Below diagram is extracted from lucidchart website which shows the different types of relationship in the database. It is used in ER diagram, entity-relationship diagram.

In term of data query optimization, cardinality means the data in a column of a table, specifically how many unique values are in it. If you have done data profiling before using the Microsoft Power BI, for example, you notice there is a summary statistics of the table loaded into the application. This information helps with planning queries and optimizing the execution plans.

Big Data, A Long Introduction

One of my colleagues from the Business Operation texted me on one morning and asked me where she can get insights, understand some of the terminology, difference between the SQL and NoSQL, and how to make decision which type of database to be used. Instantly, I replied, “get it from me!” I was pretty confident that I could give her an answer and I wanted to explain databases in a more interesting way.

What is SQL?

Structured Query Language (SQL) is computer language for database management systems and data manipulation. SQL is used to perform insertion, updation, deletion. It allows us accessing and modifying data. It stored in a relational model, with rows and columns. Rows contain all of the information about one specific entry and columns are the separate data points.

What is NoSQL?

NoSQL encompasses a wide range of database technologies that are designed to cater to the demands of modern apps. It stored a wide range of data types, each with different data storage models. The main ones are document, graph, key-value and columnar. 

This explains the above picture. Apps such as Facebook, Twitter, search engine (web) and IoT applications generate huge amount of data, both structured and unstructured. The best examples to explain what is unstructured data are photos and videos. Therefore, it needs different method to store the data. NoSQL databases do not store data in rows and columns (table) format.

Differences between SQL and NoSQL

There are a lot of websites which we can search online to give us the differences and I referred to this website.

NoSQL is also known as schema-less databases. The above screenshot uses the word, dynamic schema, which means the same, it does not have a fixed schema which locked same number of the columns (fields) for data entry. NoSQL data allow to have different number of columns when data is added.

Image: https://www.guru99.com/nosql-tutorial.html

Another major difference is scalability, SQL is vertical scaling and NoSQL is horizontal scaling. Let’s use a picture to explain scalability.

Relational databases are designed to run on single server in order to maintain integrity of the table mappings and avoid the problems of distributed computing. Often, we will look into more RAM, more CPU and more HDD, ways to upsize our system by upgrading our hardware specification. It is scale up or vertical scaling. This process is expensive.

NoSQL databases is non-relational, making it easy to scale out or horizontal scaling, meaning that it runs on multiple servers that work together, each sharing part of the load. It can be done on inexpensive commodity hardware.

Question: SQL or NoSQL?

Let’s refer to this article, the choice of the database between SQL and NoSQL cannot be concluded on the differences between them but the project requirements. If your application has a fixed structure and does not need frequent modifications, SQL is a preferable database. Conversely, if you have applications where data is changing frequently and growing rapidly, like in Big Data analytics, NoSQL is the best option for you. And remember, SQL is not deceased and can never be superseded by NoSQL or any other database technology.

In short, it depends on what type of applications or project requirements and type of query result as well.

Big Data

Big data is used to refer not just to the total amount of data generated and stored electronically (volume) but also to specific datasets that are large in both size and complexity which algorithms are required in order to extract useful information from them. Example sources such as search engine data, healthcare data and real-time data. In my previous article about What is Big Data?, I shared that Big Data has 3 V’s:

  • Volume of data. Amount of data from myriad sources.
  • Variety of data. Types of data; structured, semi-structured and unstructured.
  • Velocity of data. The speed and time at which the Big Data is generated.

Yes, based on all the above, we have covered 2 of the 3 V’s, the volume and variety. Velocity is how fast data is generated and processed. Although, there are more V’s out there and some are relevant to Big Data’s description. During my visit to the Big Data World 2018 in Singapore, I realized that my understanding of Big Data was limited to the understanding of the volume and variety. In this blog, I am going to write more.

Storing Big Data

Unstructured data storage which cannot be stored in the normal RDBMS for some reasons and often Big Data is related to real-time data and required real-time processing requirements.

Hadoop Distributed File System (HDFS)

It provides efficient and reliable storage for big data across many computers. It is one of the popular distributed file systems (DFS) which stored both unstructured and semi-structured data for data analysis.

Big Data Analytics

There are not many tools for NoSQL analytics in the markets at the moment. One of the popular method dealing with Big Data is MapReduce by dividing it up into small chunks and process each of these individually. In other words, MapReduce spread the required processing or queries over many computers (many processors).

This Big Data does not limited to search engine and healthcare, it can be data e-commerce websites where we want to perform targeted advertising and provide recommendations systems which we can often see in websites such as Amazon, Spotify or Netflix.

Big Data Security

Securing a network and the data it holds are the key issues, a basic measurement such as firewall and encryption should be taken to safeguard networks against unauthorized access.

Big Data and AI

While smart home has became a reality in the recent years, the successful invention of smart vehicles which allows vehicles drive in auto-mode, gives us a big hope that one day smart city can be realized. Countries such as Singapore, Korea, China and European countries such as Ireland and UK are planning smart cities, using the implementation of IoTs and Big Data management techniques to develop the smart cities.

I am looking forward.

Reference:
Dawn E. Holmes (2017) Big Data A Very Short Introduction.

[SQL] Grant User Permissions

The task which I was doing today required me to prepare a SQL script which grants the user to execute the user defined functions and stored procedures. While SQL Server Management Studio allows us to set the permission by a few clicks, it is good to prepare them using the script so we can re-use them in case we have run them in the development and production databases.

A simple script to grant user with execute permission for stored procedures. It applies to all stored procedures in the selected database.

/* GRANT EXECUTE TO THE ROLE */
GRANT EXECUTE TO <user>

Next, I found out that my application needs to set permission for the user defined functions. I felt I could have spent the same amount of time to search the script to automate the process and manually wrote the script one by one. The script below can help,

/* GRANT EXECUTE TO THE USER DEFINED FUNCTIONS */
SELECT 'GRANT EXECUTE ON [' + SCHEMA_NAME(schema_id) + '].[' + [name] + '] TO MyRole;'
FROM sys.objects
WHERE type IN ('TF','FN');

There is another script which could help,

declare @name varchar(100), @qry varchar(2000)
declare cursor cursor_temp
for select name from dbo.sysobjects where OBJECTPROPERTY (id, IsProcedure) = 1
open cursor_temp
fetch next from cursor_temp into @name
while @@fetch_status = 0
begin
set @qry = ' grant execute on [dbo].[' + @name + ' ] to [ <user_name> ]'
exec (@qry)
fetch next from cursor_temp into @name
end
close cursor_temp
deallocate cursor_temp

I hope this could help.

SQL Server Export Data into Excel

How to export data from the SQL Server Management Studio into Excel file?

I used to save the data into .csv file and transfer the file out into my local machine via FTP or use the copy and paste command to transfer the data into the Excel directly from the database. This happened when you have control on the servers and security is not a big concern. Another method is using the SSIS.

Due to various reasons, my client did not allow installation of Excel into the database server and recommended me to use the Export Data function in the SSMS. I seldom used the Export Data function, anyway. Instantly, I gave a try after I was guided.

The steps are simple, guided and straight forward to follow. Here is the step:-
1. Right click the database > Task > Export Data.

2. Choose a data source. It means where the source location is. For this example, we select SQL Server Native Client 11.0. Enter the user credentials if you are using SQL authentication, otherwise, you can just continue with Windows Authentication.

Select the database which you want to export the data from.

3. Choose a destination. It means select the location where you want to save or export the data to. Destination selection, we choose Microsoft Excel, then enter the Excel file path where you want to save the file at. As for the Excel version, I believe it is based on the driver version we installed. If you want to keep the first row with column names, please check the checkbox.

Then, we can proceed with Next button.

4. Specify table copy or query. Here, we can choose either to select which table or tables we want to copy all its data into the Excel or we can write a SQL query to specify the dataset to be copied or exported. Let see how we can export a table out from the SSMS.

Choose the first option, Copy data from one or more tables or views. This allows copying the entire dataset from the selected tables or views into the Excel file. Select the tables by checking the checkbox at the Source.

The destination will name the Excel sheet according to the table’s name. If you want to change the name, you can click on each destination table’s name and edit it. If you do not want to export the entire table, you can select to write a query by choosing the second option from the previous screen.

Write your SQL statement in the next screen, just like below, which I selected top 1000 rows of records from the selected table. You can rename the table name at the Destination similarly to what I describe earlier in the next screen.

The table name will appear in the Excel sheet. One sheet for one table and as you know Excel has maximum number of rows, 65,536 for Excel 2003 and 1,048,576 for Excel 2007. I believe there is no concern about maximum column.

Next, we move to the Edit Mappings… button which is important to look into it especially when we want to format the Excel’s cell format.

In the Edit Mappings, you can select the data type of each field by clicking the dropdown selection list under the Type column. And, click OK to save the mappings. Then, you can view the dataset by clicking on the Preview button. A limited row of data will display in the next screen.

After editing the mappings and preview the dataset, it is time to review the data type mapping

5. Review Data Type Mapping. Click Next button to proceed and run the package.

6. Run Package. Nothing much you can choose, either run immediately or cancel the process. Click Next or Finish button to complete the wizard, it will run the package and save data into the Excel file.

Lastly, navigate to your Excel file path to retrieve the Excel file and check the data inside it.

SQL Server – Types of JOIN

Type of JOIN is quite often being asked during the interview where we are asked to list out the type of JOIN and the differences of each of them. I found a simple illustration of the JOIN command to give the first glance idea of what JOIN is.

Type of JOIN includes:-
1. INNER JOIN
2. LEFT JOIN (LEFT OUTER JOIN)
3. RIGHT JOIN (RIGHT OUTER JOIN)
4. FULL JOIN  (FULL OUTER JOIN)
5. CROSS JOIN

The picture above explains the results set to be returned when these type of JOIN are used in our queries. Next time, whenever there is a confusion of which JOIN command to be used, refer to this image to clarify our doubts.

However, let us have some short description of each command too, for our better understanding.

1. INNER JOIN
Returns all rows for which there is at least one match in BOTH tables.

2. LEFT JOIN
Returns all rows from the left table, and the matched rows from the right table.
Eg: The results will contain all records from the left table, even if the JOIN condition does not find any matching records in the right table, but with NULL in each column from the right table.

3. RIGHT JOIN
Returns all rows from the right table, and the matched rows from the left table. It is exactly opposite of the LEFT JOIN.
Eg: The results will contain all records from the right table, even if the JOIN condition does not find any matching records in the left table, but with NULL in each column from the left table.

4. FULL JOIN
Returns all rows for which there is a match in EITHER of the tables.
Eg: Its result set is equivalent to performing a UNION of the results of left and right outer queries.

5. CROSS JOIN
Returns all records where each row from the first table is combined with each row from the second table. Refer to the image below which I captured from the explanation made by Pinal D. in his blog.

SQL – NULL

Last year during one of the SQL Server User Group event brought by Microsoft, they invited Pinal Dave. I am sure you know this guy who wrote many SQL articles in his blog page, blog.sqlauthority.com.

One of the most interesting topics that he shared that morning was the SQL NULL. Sometimes, we have hard times dealing with the null values when we are working on our queries. During one of the interview, I was being asked about the returned result when we execute the below query.

select case when null = null then 'yup' else 'nope' end

I could not remember if I get the answer correct but I did think the query is wrong. According to my search in the Internet, the same question appeared in one of the links where the explanation is as below:-

This query will yield “Nope”, seeming to imply that null is not equal to itself! The reason for this is that the proper way to compare a value to null in SQL is with the is operator, not with =.

It suggests the correct way to compare null value is using the IS operator. You can give a try running it from the SSMS to see the returned result set which I am going to share here as well.

select case when null is null then 'yup' else 'nope' end

Next, I wish to search a bit more about NULL values.

An article from TechNet website, it says below:-

A value of NULL indicates that the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

Anything compared to NULL evaluates to the third value in three-valued logic: UNKNOWN. Interesting as it is, there are many more things about NULL values can be read from the websites. As for now, we know that to check the NULL values, we shall use IS or IS NOT in WHERE clause.

SQL Server – Difference Between UNION And UNION ALL

I am trying to recall some of the interview questions that I was being asked during my job hunting and would like to share some studies or findings which I read from the experts regarding these questions. The first one that I remember is the differences between UNION and UNION ALL.

I believe in many cases we do use both of these commands in our queries. And, what are their differences and in term of performance which one is better?

UNION command is used in the queries to select data from two or more tables similarly to JOIN command, except UNION command selected columns need to be same data types and same number of columns to be selected. It returns the distinct data of both tables. It does the SELECT DISTINCT on the results set.

UNION ALL command is same as the UNION command except it returns all the data from the selected tables. In other words, UNION ALL includes duplicated rows.

However, the UNION ALL gives the faster results when we execute the queries. You can check it from the execution plan to compare both. Reason is UNION command will do the filtering during the execution. If you are sure that your return results are unique, it would be a good idea to use UNION ALL for better performance.