Data Warehouse Design

The data warehouse is the core of a Business Intelligence system. As a database administrator in my previous job, I managed various data warehouses that provided meaningful insights for business decision making, data analysis and reporting, for example, sales reports and sales trends. A data warehouse is not a product; it is an environment that holds current and historical data, both structured and unstructured. Data warehousing is the process of transforming data into information and making it available to business users. The data is ingested, cleaned and transformed so that business users can access the processed data using Business Intelligence tools such as Tableau, Power BI or SQL clients. Note that data warehousing is different from data mining: warehousing is about storing and organizing data for analysis, while mining is about discovering patterns in that data.

Types of Data Warehouse

Enterprise Data Warehouse:

  • It is a centralized warehouse.
  • It provides decision support service across the enterprise.
  • It offers a unified approach for organizing and representing data.
  • It provides the ability to classify data according to subject and gives access according to those divisions.

Operational Data Store:

  • An operational data store is required when neither the data warehouse nor the OLTP systems can support an organization's reporting needs.
  • The data store is refreshed in real time.

Data Mart:

  • A data mart is a subset of the data warehouse.
  • It is specially designed for a particular line of business, such as sales or finance.
  • Data can be collected directly from source systems.

Star Schema vs. Snowflake Schema

Knowing the differences between star and snowflake schema is one of the most common interview questions. Star schema is a mature modelling approach widely adopted by relational data warehouses. It requires modellers to classify their model tables as either dimension or fact. Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can hold a huge number of rows and continue to grow over time.

The snowflake schema is another type of multidimensional model. It contains the fact table, dimension tables, and one or more additional tables for each dimension table. A snowflake schema is a normalized form of the star schema, which reduces redundancy and saves significant storage. The trade-off is that it requires more joins between tables, so queries against a snowflake schema tend to be more complex than the simpler, less complex queries used against a star schema.
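
To make the distinction concrete, below is a minimal sketch of a star schema in T-SQL. The table and column names are my own illustration, not from any particular warehouse. In a snowflake schema, the Category column would be normalized out into its own dimension table referenced from DimProduct.

-- Dimension table: relatively small, holds descriptive attributes
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50) NOT NULL          -- denormalized here; a separate table in a snowflake schema
);

-- Fact table: large and growing, references the dimension by its surrogate key
CREATE TABLE dbo.FactSales
(
    SalesKey    INT IDENTITY(1,1) PRIMARY KEY,
    ProductKey  INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    OrderDate   DATE NOT NULL,
    SalesAmount DECIMAL(10,2) NOT NULL
);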

Metadata

Metadata is also known as a data dictionary or data specification document. It provides details of the data, such as the name of each column (attribute), the data type of the column (whether it is a string, a number, etc.), the order of the columns, and the definition of nullable columns. Datasets are made up of data objects (entities). After we identify the metadata, we can design the conceptual, logical and physical data models.
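
As a quick illustration, SQL Server exposes much of this metadata through the INFORMATION_SCHEMA views. The table name below is hypothetical; substitute one of your own tables:

-- List each column's name, position, data type and nullability (a simple data dictionary)
SELECT COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'FactSales' -- hypothetical table name
ORDER BY ORDINAL_POSITION;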

Relationship and Cardinality

Relationship refers to how tables (entities) are linked together. According to the Visual Paradigm website (a data modelling tool website), cardinality defines the possible number of occurrences in one entity which is associated with the number of occurrences in another. A Google search for a general definition of cardinality returns “the number of elements in a set or other grouping, as a property of that grouping”. For example, ONE team has MANY players. The diagram below, extracted from the Lucidchart website, shows the different types of relationships in a database; this notation is used in the entity-relationship (ER) diagram.

Generally, the common types of cardinality are:

  • One to one
  • One to many
  • Many to many

In terms of query optimization, cardinality refers to the data in a column of a table, specifically how many unique values it contains. If you have done data profiling using Microsoft Power BI, for example, you will notice that Power BI provides summary statistics for a table that show whether the attributes contain unique values. This information helps with planning queries and optimizing execution plans.
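
If you want to check column cardinality yourself without a profiling tool, a simple count of distinct values works. The table and column names below are illustrative:

-- Compare total rows against distinct values to gauge the column's cardinality
SELECT COUNT(*) AS TotalRows,
       COUNT(DISTINCT CustomerID) AS DistinctCustomers -- illustrative column
FROM dbo.Orders; -- illustrative table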

Conceptual, Logical and Physical Data Model

A general understanding of the three data models is that a business analyst uses the conceptual and logical models to model the business objects that exist in the system, while a database designer or database engineer elaborates the conceptual and logical ER models to produce the physical model that represents the physical database structure, ready for database creation.

ERD feature      Conceptual   Logical    Physical
Entity (name)    Yes          Yes        Yes
Relationship     Yes          Yes        Yes
Column           No           Yes        Yes
Column’s Type    No           Optional   Yes
Primary Key      No           No         Yes
Foreign Key      No           No         Yes

Comparison of Conceptual, Logical and Physical data model

In summary,

Enterprise data modelling, metadata management and data warehousing are part of the Data Governance framework (DMBOK) that I mentioned in my previous posts. I learned this in my current job, and I think it is useful for those who are serious about building an enterprise data warehouse. Part of my job scope includes:

  • Design and develop semantic data models.
  • Design metadata and harmonized code references.
  • Identify business keys and relationships among the tables.
  • Involve all stakeholders, including business users, in the data warehouse implementation process.

References:
https://www.datacamp.com/community/blog/data-lakes-vs-data-warehouses
https://www.guru99.com/data-warehousing.html
https://www.geeksforgeeks.org/difference-between-snowflake-schema-and-fact-constellation-schema/
https://www.lucidchart.com/pages/ER-diagram-symbols-and-meaning

SQL Server: Reading the SQL Server log files using TSQL

Recently my colleague asked if I could assist with extracting the SQL Server log files using an SSIS ETL package. The log files are stored on the database server, and they are recycled whenever the database server restarts. My colleague wants to keep the server log files in the data warehouse for further analysis and IT auditing.

System Stored Procedure sp_readerrorlog

This stored procedure allows you to read the contents of the SQL Server error log files directly from a query window and enables you to search for specific keywords when reading the error file. It is not new; it has been available since SQL Server 2005.

Example of using the stored procedure

This procedure takes four parameters:

  1. Value of error log file you want to read: 0 = current, 1 = Archive #1, 2 = Archive #2, etc…
  2. Log file type: 1 or NULL = error log, 2 = SQL Agent log
  3. Search string 1: String one you want to search for
  4. Search string 2: String two you want to search for to further refine the results

If you do not pass any parameters, this will return the contents of the current error log.

CREATE PROC [sys].[sp_readerrorlog]( 
   @p1 INT = 0, 
   @p2 INT = NULL, 
   @p3 VARCHAR(255) = NULL, 
   @p4 VARCHAR(255) = NULL) 
AS 
BEGIN 

   IF (NOT IS_SRVROLEMEMBER(N'securityadmin') = 1) 
   BEGIN 
      RAISERROR(15003,-1,-1, N'securityadmin') 
      RETURN (1) 
   END 
    
   IF (@p2 IS NULL) 
       EXEC sys.xp_readerrorlog @p1 
   ELSE 
       EXEC sys.xp_readerrorlog @p1,@p2,@p3,@p4 
END 

-- Call the stored procedure as below
EXEC sp_readerrorlog 6 
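
To use the search parameters, pass the log file type and one or two search strings. The keyword values below are only examples:

-- Read archive #6 of the error log, filtering on one keyword
EXEC sp_readerrorlog 6, 1, '2005'

-- Refine the results further with a second search string
EXEC sp_readerrorlog 6, 1, '2005', 'exec'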

Note: the examples here show the search values in single quotes, but for later versions of SQL Server you may need to use double quotes or you might get an error.

In conclusion,

I will run this on my local machine to verify it and extract the SQL Server logs from both SQL Server Management Studio and an SSIS ETL package. I will keep developing the content of this article to show the results of the above stored procedure.

References: https://www.mssqltips.com/sqlservertip/1476/reading-the-sql-server-log-files-using-tsql/

Difference between OLTP and OLAP

Transaction processing vs. Analytics processing.

What is OLTP?

OLTP stands for Online Transaction Processing. OLTP systems are the source of data for OLAP systems. The primary purpose of a typical OLTP system is to run the operational database, recording routine business data (through insert, update or delete requests) and serving as the source from which the data warehouse is loaded for data analysis. It usually involves simple queries. Online transaction processing provides transaction-oriented applications in a 3-tier architecture. A typical OLTP database stores current (and detailed) data. The entity model (table model) is the schema used for storing such a transactional database, and 3rd Normal Form (3NF) normalization is commonly applied to OLTP databases. An OLTP system needs regular data backups without fail because it stores all the business's functional data.

What is OLAP?

OLAP stands for Online Analytical Processing. An OLAP database reads historical data that has been captured by one or more OLTP systems and rarely writes to the database. It usually involves large, complex queries used for business decisions, and aggregation across multiple databases is often required. An OLAP system allows users to view a variety of summaries of the stored multidimensional data. It extracts information from a bigger database (typically the data warehouse) and analyzes it with the intent of decision-making. Queries can be relatively slow when they involve large datasets. Data is denormalized to improve query performance when aggregation is required. OLAP systems also need less frequent backups than OLTP systems: if data recovery is needed, the data can be reloaded from the OLTP sources or restored from OLTP backups.
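
To make the contrast concrete, here is a small sketch of the two workloads. The table and column names are illustrative, not from a specific system:

-- Typical OLTP statement: short, touches a single current record
UPDATE dbo.Orders
SET OrderStatus = 'Shipped'
WHERE OrderID = 10308;

-- Typical OLAP query: scans and aggregates large volumes of historical data
SELECT YEAR(OrderDate) AS OrderYear,
       SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY YEAR(OrderDate)
ORDER BY OrderYear;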

The picture above shows the distinct difference between OLTP and OLAP and their relation to data analysis, data mining and Big Data. Both types of databases have their own data structures and algorithms: ones that work for operational, transactional data in OLTP, and ones that work for data analytics in OLAP.

Moving forward, I will compile some of the useful SQL interview questions in my blog. Hope to get some feedback from the readers who are working on databases as well.

SQL Server: Find MAX value from multiple columns

A community member from the .Net Developer Community Singapore shared an excellent article about using MAX(DateTime) in SSIS to get unique records when there are duplicated records. I often use the same approach when writing SQL statements in SQL Server Management Studio. However, the article mentions that the SSIS DateTime variable is missing milliseconds, so most of the time you will get duplicated records. How did the writer resolve the issue? You can follow the link to read the article; in the same article, there is a reference link to further content worth reading.

Recently, I explored finding the maximum value from multiple DateTime columns in a SQL table. As it is part of my data extraction requirements, I came across a good example with the execution plan; you can refer to this link for the online article. It shows how to find the maximum or minimum value across different columns of the same data type in a table.

Below is a snippet of the SQL statements to create a temporary table with values:

IF (OBJECT_ID('tempdb..##TestTable') IS NOT NULL)
	DROP TABLE ##TestTable

CREATE TABLE ##TestTable
(
	ID INT IDENTITY(1,1) PRIMARY KEY,
	Name NVARCHAR(40),
	UpdateByApp1Date DATETIME,
	UpdateByApp2Date DATETIME,
	UpdateByApp3Date DATETIME

)

INSERT INTO ##TestTable(Name, UpdateByApp1Date, UpdateByApp2Date, UpdateByApp3Date )
VALUES('ABC', '2015-08-05','2015-08-04', '2015-08-06'),
	  ('NewCompany', '2014-07-05','2012-12-09', '2015-08-14'),
	  ('MyCompany', '2015-03-05','2015-01-14', '2015-07-26')
	  
SELECT * FROM ##TestTable

Based on the above SQL statements, you can see that the temporary table called TestTable has 5 columns, 3 of which share the same data type, DateTime. As the title suggests, the expected result is a column that contains the maximum date value alongside the ID column. The writer provides 3 different solutions.

Solution 1: Using MAX over a VALUES table constructor (aliased as UpdateDate)

SELECT 
   ID, 
   (SELECT MAX(LastUpdateDate)
      FROM (VALUES (UpdateByApp1Date),(UpdateByApp2Date),(UpdateByApp3Date)) AS UpdateDate(LastUpdateDate)) 
   AS LastUpdateDate
FROM ##TestTable

Solution 2: Using UNPIVOT

SELECT ID, MAX(UpdateDate) AS LastUpdateDate 
FROM ##TestTable
UNPIVOT ( UpdateDate FOR DateVal IN ( UpdateByApp1Date, UpdateByApp2Date, UpdateByApp3Date ) ) AS u
GROUP BY ID, Name 

Solution 3: Using UNION

SELECT ID, MAX(UpdateDate) AS LastUpdateDate
FROM
(
	SELECT ID, UpdateByApp1Date AS UpdateDate
	FROM ##TestTable
	UNION 
	SELECT ID, UpdateByApp2Date AS UpdateDate
	FROM ##TestTable
	UNION 
	SELECT ID, UpdateByApp3Date AS UpdateDate
	FROM ##TestTable
) ud
GROUP BY ID

The writer provided a performance comparison for these solutions and mentioned that solution 1 has the most optimized performance. You can refer to the execution plan in the article and the explanation of the findings. Choose the solution that best fits your query, table structure, etc. For more information, see the reference links below.

References:
https://www.mssqltips.com/sqlservertip/4067/find-max-value-from-multiple-columns-in-a-sql-server-table/
https://stackoverflow.com/questions/71022/sql-max-of-multiple-columns

SQL Server: LEFT JOIN conditions in ON vs WHERE clause

One of the new tasks in my project required me to check the difference between placing conditions in the ON clause versus the WHERE clause of a SQL Server LEFT JOIN. Before I go into the details and examples of these conditions, let me begin with the basic function of the LEFT JOIN keyword in SQL Server.

LEFT JOIN

The simplest way I can put it: the LEFT JOIN keyword returns all records from the left table and the matched records from the right table. In the resultset, after executing the query in SQL Server, it shows NULL where there is no matched record from the right table. The syntax for LEFT JOIN:

SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name

Here is a simple example I extracted from w3school,

SELECT c.CustomerName, o.OrderID
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID

Customers is table1 and Orders is table2. In the example above, I used aliases: c for Customers and o for Orders. SQL aliases give a table, or a column in a table, a temporary name. The two tables join on the same key, which in this example is CustomerID. For each row from the Customers table, the query compares it with the rows from the Orders table. If the CustomerID is present in both tables, the column values from the two tables are combined to form a new row, which is then included in the resultset. If a row from the Customers table has no matching row in the Orders table, the query combines the column values of the Customers row with NULL for each column from the Orders table. Sometimes I use the picture below, extracted from Stack Overflow, to help me identify the resultset I want to get from my query.

A sample resultset is shown below. As you can see, it contains some NULL values in the OrderID column. This means those customers have no associated orders.

CustomerName                          OrderID
Alfreds Futterkiste                   NULL
Ana Trujillo Emparedados y helados    10308
Antonio Moreno Taquería               10365
Around the Horn                       10355
Around the Horn                       10383
B’s Beverages                         10289
Berglunds snabbköp                    10278
Berglunds snabbköp                    10280
Berglunds snabbköp                    10384

You can perform LEFT JOINs across more than two tables. In some databases, the keyword is LEFT OUTER JOIN.

With WHERE condition

The WHERE clause can be used to limit the resultset, for example by returning only the customers that have ordered something. In the query and table below, the customer without an OrderID has been removed.

SELECT c.CustomerName, o.OrderID
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderID IS NOT NULL

CustomerName                          OrderID
Ana Trujillo Emparedados y helados    10308
Antonio Moreno Taquería               10365
Around the Horn                       10355
Around the Horn                       10383
B’s Beverages                         10289
Berglunds snabbköp                    10278
Berglunds snabbköp                    10280
Berglunds snabbköp                    10384

LEFT JOIN: conditions in ON vs. WHERE clause

Let's observe the difference between the above query and the query below:

SELECT c.CustomerName, o.OrderID
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID AND c.Country = 'Mexico'

When the condition moves from the WHERE clause to the ON clause, as shown above, the query returns the entire set of customer records. However, only the customers from Mexico have associated order information; for every other customer, the OrderID column is NULL. The resultset is shown below:

CustomerName                          OrderID
Alfreds Futterkiste                   NULL
Ana Trujillo Emparedados y helados    10308
Antonio Moreno Taquería               10365
Around the Horn                       NULL
B’s Beverages                         NULL
Berglunds snabbköp                    NULL
Blauer See Delikatessen               NULL
Blondel père et fils                  NULL
Bon app’                              NULL

This write-up shows how to use the SQL Server LEFT JOIN keyword to retrieve data from multiple related tables.

Note: For the INNER JOIN clause, a condition in the ON clause is functionally equivalent to the same condition placed in the WHERE clause.

Reference: https://www.sqlservertutorial.net/sql-server-basics/sql-server-left-join/

Concept of Keys in DBMS

In a database management system (DBMS), keys help to uniquely identify rows in a table that may otherwise contain duplicated records. Besides that, keys help to manage the relationships between tables. There are many different kinds of keys in a DBMS, and each of them has its own meaning. Let's explore them, with some illustrations taken from the Internet to help us understand and differentiate the keys.

Super Key

A super key is an attribute (key) or a set of attributes (keys) that can be used to identify rows of data in a table. A super key may contain additional attributes that are not needed for unique identification.

Candidate Key

A candidate key is a minimal subset of a super key which can be used to uniquely identify rows of data in a table; in other words, a super key with no redundant attributes. According to the Educa website, there are a few rules to follow when selecting a candidate key:

  • A candidate key should comprise distinct values.
  • A candidate key can have multiple attributes.
  • A candidate key cannot contain null values.
  • A candidate key must uniquely identify each row in the table.

Then, the Primary Key is carefully chosen from the given Candidate keys.

Primary Key

A primary key is a column or group of columns in a table that uniquely identifies every row in that table. A primary key value cannot be duplicated: the same value cannot appear more than once in the table, and a table cannot have more than one primary key. This constraint is also known as the entity integrity rule.

Rules for defining Primary key:

  • Two rows cannot have the same primary key value.
  • Every row must have a primary key value.
  • The primary key field cannot be null.
  • The value in a primary key column can never be modified or updated if any foreign key refers to it.

Alternate Key

An alternate key is a column or group of columns in a table that uniquely identifies every row in that table. A table can have multiple candidates for a primary key, but only one of them can be set as the primary key. All the candidate keys that are not chosen as the primary key are called alternate keys.

The illustration below, taken from guru99.com, clearly differentiates the candidate key, primary key and alternate key. In the example table, StudID, Roll_No and Email are identified as the candidate keys; StudID is chosen as the primary key, mainly because it is a unique running number that can be used to identify a student in the table. Roll_No and Email are then the alternate keys for this table.

Foreign Key

The foreign key is a column that creates a relationship between two tables. It is added into a table to establish this relationship. It acts as a cross-reference between two tables as it references the primary key of another table. The purpose of Foreign keys is to maintain data integrity and allow navigation between two different instances of an entity. It is also known as Referential Integrity.

The differences between primary key and foreign key

  • A primary key helps to uniquely identify a record in the table, while a foreign key is a field in the table that is the primary key of another table.
  • A primary key cannot contain null values, while a foreign key may accept multiple null values.
  • In SQL Server, a primary key creates a clustered index by default, and the data in the table is physically organized in the sequence of that clustered index. A foreign key does not automatically create an index, clustered or non-clustered; however, you can manually create an index on the foreign key.
  • A table can have only a single primary key but can have multiple foreign keys, as sketched below.
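
To put both keys side by side, here is a minimal sketch in T-SQL reusing the team-and-player example from earlier; all names are illustrative:

-- Parent table: TeamID is the primary key (clustered index by default in SQL Server)
CREATE TABLE dbo.Team
(
    TeamID   INT PRIMARY KEY,
    TeamName NVARCHAR(50) NOT NULL
);

-- Child table: TeamID is a foreign key; it may be NULL and gets no automatic index
CREATE TABLE dbo.Player
(
    PlayerID   INT PRIMARY KEY,
    PlayerName NVARCHAR(50) NOT NULL,
    TeamID     INT NULL REFERENCES dbo.Team (TeamID)
);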

Composite Key

A composite key is a combination of two or more columns that uniquely identifies rows in a table. The combination of columns guarantees uniqueness even though the individual columns do not. Hence, they are combined to uniquely identify records in a table, as in the sketch below.
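
A minimal sketch of a composite key, assuming a hypothetical enrollment table where neither column is unique on its own:

-- StudentID repeats across courses and CourseID repeats across students,
-- but the pair uniquely identifies each enrollment record
CREATE TABLE dbo.Enrollment
(
    StudentID INT NOT NULL,
    CourseID  INT NOT NULL,
    EnrolDate DATE NOT NULL,
    CONSTRAINT PK_Enrollment PRIMARY KEY (StudentID, CourseID)
);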

Compound Key

A compound key has two or more attributes that allow you to uniquely recognize a specific record. In other words, it is a primary key that does not consist of a single column but of two or more columns that together uniquely identify a row. Each column may not be unique by itself within the table; however, when combined with the other column or columns, the combination becomes unique. The purpose of the compound key is to uniquely identify each record in the table.

The differences between composite key and compound key

The difference between a compound key and a composite key is that any part of a compound key can be a foreign key, whereas a composite key may or may not contain a foreign key.

Surrogate Key

An artificial key which aims to uniquely identify each record is called a surrogate key. This kind of key is unique because it is created when you do not have a natural primary key. Surrogate keys do not have any relationship to the data in the table; they are usually ordered integers, as in the sketch below.
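
A minimal sketch of a surrogate key using an IDENTITY column; the table and column names are hypothetical:

-- CustomerKey carries no business meaning; the system generates it as an ordered integer
CREATE TABLE dbo.Customer
(
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    CustomerName NVARCHAR(100) NOT NULL,
    Email        NVARCHAR(100) NOT NULL         -- a natural candidate key, kept as an alternate key
);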

Integrity Rules

  • Entity Integrity Rules.
  • Referential Integrity Rules.
  • Business Logic Integrity.

Business logic integrity refers to validation pertaining to the business rules; for example, a postal code shall be a 6-digit number within a certain range.


Modifying Data in Temporal Table – SQL Server

After an hour of discussion with the Business Analyst and System Data Administrator on the topic of the temporal table, I decided to take a step back and study more about the characteristics of a temporal table in Microsoft SQL Server. I recently created three temporal tables in Microsoft SQL Server 2016 for my project. And what is the difference between a temporal table and a temporary table? In one of my posts, I wrote the Introduction to Temporal Table, where I shared that:

Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns. SysStartTime and SysEndTime (in the UTC time zone) represent the current and historical data for every row of the table.

Period columns cannot be modified

The temporal table has period columns that are used to audit SQL data, whereas the temporary table does not have such columns. The Microsoft website explains that when you run a data modification using regular DML (Data Manipulation Language) statements on a temporal table, the period column data cannot be modified. For example, when you perform an insert statement, you can exclude the period columns from the insert statement. If you want to include the period columns in the insert statement, you can set their values to default. For example, from the Microsoft website, the code below shows the SysStartTime and SysEndTime columns assigned the default values.

INSERT INTO [dbo].[Department]
  (  [DeptID]
        , [DeptName]
        , [ManagerID]
        , [ParentDeptID]
        , SysStartTime
        , SysEndTime
  )
     VALUES
       (  11
        , 'Sales'
        , 101
        , 1
        , default
        , default) ;

To further support the above statement, the create table statement that I shared in my introduction post shows that these period columns are declared with “GENERATED ALWAYS AS ROW START/END NOT NULL”. This means the period columns are always generated automatically by SQL Server rather than being populated manually.

CREATE TABLE dbo.employee(
  EMPNO   INT,
  ENAME    VARCHAR(10),
  JOB      VARCHAR(9),
  MGR      INT,
  HIREDATE DATE,
  SAL      NUMERIC(7,2),
  COMM     NUMERIC(7,2),
  DEPTNO   INT,
     CONSTRAINT EMP_PK PRIMARY KEY (EMPNO),
   SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL
  ,SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL
  ,PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime)) WITH (SYSTEM_VERSIONING = ON);

An updated row is inserted into the history table

When you perform update statements, the data is updated and versioned, with the previous version of each updated row inserted into the history table. The update statements are regular update statements, and the period columns must not be included. If you include them, SQL Server throws an error message such as the one below:

Msg 13537, Level 16, State 1, Line 3 Cannot update GENERATED ALWAYS columns in table 'dbo.tablename'

Important note: You cannot update PERIOD columns, and you cannot directly update data in the history table while SYSTEM_VERSIONING = ON.

The deleted rows are moved into the history table

When you perform a delete statement, the data is deleted, but the deletion is logical: the row is moved from the current table into the history table; it is not permanently deleted.

Important note: You cannot directly delete rows from the history table while SYSTEM_VERSIONING = ON.
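
As a sketch against the dbo.employee temporal table created earlier (the EMPNO value is illustrative), you can verify this behaviour by deleting a row and then querying across all row versions:

-- Logically delete a row from the current table
DELETE FROM dbo.employee WHERE EMPNO = 7369; -- illustrative value

-- The deleted row version is still visible via the history table
SELECT EMPNO, ENAME, SysStartTime, SysEndTime
FROM dbo.employee FOR SYSTEM_TIME ALL
WHERE EMPNO = 7369;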

There is an interesting part about modifying data in a temporal table: the Microsoft website shows how we can use the MERGE operation to modify the data. For example:

CREATE TABLE DepartmentStaging (DeptId INT, DeptName varchar(50));
GO
INSERT INTO DepartmentStaging VALUES (1, 'Company Management');
INSERT INTO DepartmentStaging VALUES (10, 'Science & Research');
INSERT INTO DepartmentStaging VALUES (15, 'Process Management');

MERGE dbo.Department AS target
USING (SELECT DeptId, DeptName FROM DepartmentStaging) AS source (DeptId, DeptName)
ON (target.DeptId = source.DeptId)
WHEN MATCHED THEN
    UPDATE
   SET DeptName = source.DeptName
WHEN NOT MATCHED THEN
   INSERT (DeptId, DeptName)
   VALUES (source.DeptId, source.DeptName);

The MERGE statement works best when the two tables have a complex mixture of matching characteristics, for example, inserting a row if it does not exist or updating a row if it matches. This link helps in understanding how the MERGE operation works. If you are familiar with MongoDB, this operation is similar to an upsert.

I am going to explore further with the dataset I have on my local machine to examine the behaviour of a temporal table and how it helps to solve the delta detection mentioned by the Business Analyst and System Data Administrator.

SQL Server: Temporal Table

I am working on a project that requires the ETL to generate daily delta changes. The delta change is based on the last extracted date-time stored in the log table, filtering the records from the main table that meet the criteria. Based on the performance testing done by my teammate, it was not optimized for the millions of records that we need to handle daily. My teammate told me that my manager suggested using a temporal table.

SQL Server 2016 introduced support for temporal tables, also known as system-versioned temporal tables. A temporal table is a database feature that brings built-in support for providing information about the data stored in the table at any point in time, rather than only the data that is correct at the current moment in time.

The evolution of data tracking mechanisms started with Change Tracking (CT), then Change Data Capture (CDC), and now temporal tables. I have no experience using temporal tables, so I did some research to find out more.

Introduction

The temporal table allows SQL Server to maintain and manage the history of the data in the table automatically, and it allows historical time analysis. It is a critical feature used to audit SQL Server data. Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns. SysStartTime and SysEndTime (in the UTC time zone) represent the current and historical data for every row of the table; together, they define the date-time range during which each row version is valid.

How does it work?

By querying the table, we can easily see the transitioning of the data to its various states within the specific date-time intervals. In addition to these period columns, a temporal table also contains a reference to another table with a mirrored schema. The system uses this table to automatically store the previous version of the row each time a row in the temporal table gets updated or deleted. This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table. During temporal table creation, users can specify the existing history table (must be schema compliant) or let the system create a default history table.

Why temporal tables

Below are the use cases of the temporal tables besides auditing SQL Server data changes and performing data forensics when necessary:

  • Rebuilding state of the data as of any time in the past.
  • Calculating trends over time (historical trend analysis).
  • Maintaining a slowly changing dimension for decision support applications.
  • Recovering from accidental data changes and application errors.

Examples

From the Microsoft website, here is a simple example to illustrate a scenario with Employee information in a database.

CREATE TABLE dbo.Employee
(
  [EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED
  , [Name] nvarchar(100) NOT NULL
  , [Position] varchar(100) NOT NULL
  , [Department] varchar(100) NOT NULL
  , [Address] nvarchar(1024) NOT NULL
  , [AnnualSalary] decimal (10,2) NOT NULL
  , [ValidFrom] datetime2 GENERATED ALWAYS AS ROW START
  , [ValidTo] datetime2 GENERATED ALWAYS AS ROW END
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
 )
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

I would like to share another example from sqlshack.com, which creates an employee temporal table.

CREATE TABLE dbo.employee(
  EMPNO   INT,
  ENAME    VARCHAR(10),
  JOB      VARCHAR(9),
  MGR      INT,
  HIREDATE DATE,
  SAL      NUMERIC(7,2),
  COMM     NUMERIC(7,2),
  DEPTNO   INT,
     CONSTRAINT EMP_PK PRIMARY KEY (EMPNO),
   SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL
  ,SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL
  ,PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime)) WITH (SYSTEM_VERSIONING = ON);

SQL Server creates the history table with the following naming format: dbo.MSSQL_TemporalHistoryFor_<Object_ID>.

There is no direct drop option on a temporal table

To drop a temporal table, we first need to turn off the SYSTEM_VERSIONING setting using the ALTER TABLE statement. For example:

ALTER TABLE [dbo].employee SET ( SYSTEM_VERSIONING = OFF )
GO

Once the SQL statement is executed, the temporal table and its history table become regular tables. Then we can use the DROP statement to delete them, as sketched below.
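
Continuing from the statement above, a sketch of dropping both tables; the history table name below is a placeholder, since the actual generated name includes the object ID:

DROP TABLE dbo.employee;
DROP TABLE dbo.MSSQL_TemporalHistoryFor_xxxxxxxxx; -- placeholder; use the generated name from your database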

Temporal, Temporary Table and Table Variable

A temporal table is not the same as a temporary table or a table variable. I found an article that explains how these two can be created in SQL Server; refer to the article written by Pinal Dave, SQL SERVER – How to Create Table Variable and Temporary Table? Another article shares a performance comparison of the temporary table and the table variable, which I think is also a good read: SQL SERVER – Table Variable or Temp Table – Performance Comparison – INSERT.

Summary

In this article, I covered a basic introduction to the temporal table, how temporal tables work, and some examples of creating one. Data retrieval becomes simple by using the FOR SYSTEM_TIME clause along with its associated sub-clauses. The history table can grow big, as every transaction on the temporal table takes up space; periodically purging the data from the history table can help to release the space.
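
For example, a point-in-time query against the dbo.employee table created above might look like this; the timestamp is illustrative:

-- Rows as they existed at the given UTC instant
SELECT EMPNO, ENAME, SAL
FROM dbo.employee
FOR SYSTEM_TIME AS OF '2021-01-01T00:00:00'; -- illustrative timestamp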

I will continuously update this entry whenever I make any successful use cases in my current project. I will use the available dataset to perform the walkthrough and explain the concept.

References:
https://www.sqlshack.com/temporal-tables-in-sql-server/
https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15

[SQL] Test Database Connection String Using Notepad

It is good to re-post this topic, which I wrote somewhere around 2017. It was one of my interview questions: the interviewer asked how to test a database connection without logging into SQL Server Management Studio (SSMS).

We are able to test a database connection string by creating and configuring a Universal Data Link (.udl) file with Notepad. Here are the steps:

  1. Create an empty text file in Notepad and save it with the filename TestConnection.udl on your desktop.
  2. On your desktop, double-click the TestConnection.udl file that you just created. A Data Link Properties box will pop up.
  3. Select the Provider tab and find the provider that you want to connect with. Click the “Next” button to proceed.
  4. Select the Connection tab, enter your source server name and the credentials to log onto the server, and select the database on the server.
  5. Click “Test Connection”, then click “OK” to save the file.

References: https://www.gotknowhow.com/articles/test-a-database-connection-string-using-notepad