MSSQL: Database Partitioning

mssql

I was being called into a meeting room to discuss with the support team regarding an issue faced by the customer. I did not have the full context of the issue and could only think of some of the queries made by the customer to build the Business Intelligence’s cubes caused some serious performance issues.

And, during the meeting, my boss said the word, database partitioning. For one moment, I did mistakenly mixed up between database partitioning and database normalization. I believed both are used to organize the data in the database but both are targeting at different areas.

Database normalization
According to Microsoft, normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data appears in more than 1 table, we need to make changes exactly the same way in all the other tables.

In general, normalization requires additional tables. There are a few rules for database normalization. Each rule is called a “normal form”. I will write more about it in the next blog.

Database Partitioning
Again, according to Microsoft, partitioning a database improves performance and simplifies maintenance. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan. Maintenance tasks, such as rebuilding indexes or backing up a table, can run more quickly.

a.Hardware Partitioning
Partitioning can be achieved without splitting tables by physically putting tables on individual disk drives. Putting a table on one physical drive and related tables on a separate drive can improve query performance because, when queries that involve joins between the tables are run, multiple disk heads read data at the same time. SQL Server filegroups can be used to specify on which disks to put the tables.

Multiprocessors that enable multiple threads of operations, permitting many queries to run at the same time. Alternatively, a single query may be able to run faster on multiple processors by letting components of the query run at the same time. For example, each table referenced in the query can be scanned at the same time by a different thread.

b.Horizontal Partitioning
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a specific year. Any queries requiring data for a specific month only reference the appropriate table.

Partitioning data horizontally based on age and use is common. For example, a table may contain data for the last five years, but only data from the current year is regularly accessed. In this case, you may consider partitioning the data into five tables, with each table containing data from only one year.

c.Vertical Partitioning
Vertical partitioning divides a table into multiple tables that contain fewer columns. The two types of vertical partitioning are normalization and row splitting.

Normalization is the standard database process of removing redundant columns from a table and putting them in secondary tables that are linked to the primary table by primary key and foreign key relationships.

Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the other tables as identified by a UNIQUE KEY column that is identical in all of the partitioned tables. For example, joining the row with ID 712 from each split table re-creates the original row.

Sources:
https://support.microsoft.com/en-sg/help/283878/description-of-the-database-normalization-basics
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms178148(v=sql.105)

Advertisements

MSSQL: Change DATA Folder Directory

mssql

My colleague came and asked me if the client wanted to change their current SQL Server’s DATA folder’s directory to another directory in their server, can it be done?

The answer is it can be done.

Then, the next question is how? There is a method I found online and tried it and it worked for my local machine. The procedure of changing is not difficult and it is simple to understand. If you are using Window machine, the steps are as below:

Step 1: Identify the mdf and ldf files.
If you did not change the file path during the installation, by default it will set at, C:\Program Files\Microsoft SQL Server\<server_version>\MSSQL\DATA.

So, your filenames are:
– database_name.mdf
– database_name_log.ldf

Just make sure there are .mdf and .ldf files with the matching database name. For example, in my case,
MoveDatabase.mdf
MoveDatabase_log.ldf

Step 2: Set the new file path for the SQL database files.
For my case, I will move the current file path to another drive which I have created a folder to hold the data files. My path is E:\Data. Open the SQL Server Management Studio (SSMS) and execute the below script after you have changed the database name and new file path.

Then, what are the things to be changed?

ALTER DATABASE MoveDatabase   
    MODIFY FILE ( NAME = MoveDatabase,   
                  FILENAME = 'E:\Data\MoveDatabase.mdf');  
 
ALTER DATABASE MoveDatabase
    MODIFY FILE ( NAME = MoveDatabase_log,   
                  FILENAME = 'E:\Data\MoveDatabase_log.ldf');  

There is a couple of things to be done before you can execute the above script in SSMS.

  1. Change the database name “MoveDatabase” to your database name.
  2. In some cases, the .mdf file name is with “_data” at the end of the database name. It could be something like, “MoveDatabase_data”. Just check your database name from step 1. It uses the same name from the step 1.

To execute the above SQL script, the SQL server’s service does not need to be stopped and the database does not need to be offline at this point of time, however, it is advised to ensure your production’s database is not in-used during this point of time to avoid any data losses.

Note: It is advisable to keep database folder in a different directory or best to be in different drive in a machine. Same goes to the database backup files.

Step 3: Set database to offline.
Run the following SQL script in the SSMS to set the database to offline.

ALTER DATABASE MoveDatabase   SET OFFLINE; 

This script is important from this point onward. Change the database name according to your database name. If the database is not set to offline, the following steps cannot be done.

Step 4: Move the database files to new location.
Move .mdf and .ldf files of the specific SQL database to a new location specified in step 2 either by using copy and paste or cut and paste function. If you are familiar with using command, it works the same.

Note: Moving files to another location, there is one important thing to look into, the folder or file permission access.

Step 5: Set database to online.
Run the following SQL script in the SSMS to set the database to online.

ALTER DATABASE MoveDatabase   SET ONLINE;

From this point onward, the specific database starts pointing to new location. However, if you received an error message reads as below:

Msg 5120, Level 16, State 101, Line 13
Unable to open the physical file “E:\Data\MoveDatabase.mdf”. Operating system error 5: “5(Access is denied.)”.


Then, there is a need to set the folder or file permission access with the user account which you can do it at the Permission Setting of the folder. For more details, you can refer to this link, https://www.sqlshack.com/move-sql-database-files-mdf-ldf-another-location/.

MSSQL : Import CSV file into a Table

Recently, I am working on the data stored in the Excel files. Those data are needed to be imported into the database in order to allow the stored procedures to execute and tabulate data back to the users. During the process of importing the Excel files using the “Import Data” (Right click the database > Task), I found the process was tedious because of setting the correct data type for each columns and always cause me to have an error related to truncation text.

Since, the Excel files have a lot of columns and rows, I decided to try another method to import the Excel data into the table, by using the BULK INSERT method. This method requires the Excel files to be in .csv format to begin with. It is easy to convert the .xlsx into .csv file.

Below is the script on how to use the “BULK INSERT” script. Beforehand, a table called dbo.upload_file is created. Based on the delimiter set in the local machine, place the value accordingly at the “FieldTerminator”.

BULK INSERT [dbo].[upload_file] FROM 'C:\Users\<filename>.csv'
    WITH (
       FIELDTERMINATOR = ',',
       ROWTERMINATOR = '\n'
 );

The nightmare began when I realized the data in the Excel contains a lot of comma (,). The data cleaning process began by removing them altogether from the Excel file. Besides that, you want to check if you have inserted the header into the database’ table.

We can set the first row to be inserted into the table.

BULK INSERT [dbo].[upload_file] FROM 'C:\Users\<filename>.csv'
    WITH (
       FIRSTROW = 2,
       FIELDTERMINATOR = ',',
       ROWTERMINATOR = '\n'
 );

By default, Excel saves the .csv file using delimiter comma (,), however it can be changed at the Control Panel setting. Open up Control Panel –> Region and Language, and then click the Additional settings button on the bottom. Now look very closely at the “List separator” item, which normally has a comma in the field, change it to your preferred delimiter such as pipes “|”. Then, hit “Apply” and “Save” button to save the changes made.

Then, next time when saving a .csv file, it will use the new delimiter. Either way can solve the issues mentioned above and the data can be inserted into the database’s table successfully.

MSSQL: Customize theme for SSMS Express 2014

The newer versions of SQL Server Management Studio (SSMS) allows users to change the theme easily from the option menu, however, the version 2014 has different approach and I googled online to find a method to customize it according to my preference of dark theme.

This link gives a good example of how to customize dark theme in SSMS 2014, https://blogs.sentryone.com/aaronbertrand/making-ssms-pretty-my-dark-theme/.

It is simple to follow and you can change the RBG according to your preferences without having to follow exactly every colours mentioned by the writer.

Enjoy!

Highly Scalable Systems

I was being asked to look into this topic called highly scalable systems. Instantly, I diverted my attention to my IT team for more insights. As far as I know, this topic covers many areas, it could be storage, processor cores , memory and etc, from hardware to software. Looking at the data volume nowadays, how to design a highly scalable system is important. So, from where I should start first?

Let me try to understand the term “highly scalable”.  It means flexibility to scale. What is scale? It means can change. In short, it is flexibility to change. In most of the cloud services are now scalable. 

We need to build a solution, uniquely and dedicated to the project.

Next, I moved to focus on database scaling. The first thing it prompts up is the horizontal and vertical scaling. If you have read the MongoDB document, you will find this topic too. I picked up one of the plain text explanations from the stackoverflow website.

Horizontal scaling means that you scale by adding more machines into your pool of resources whereas vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.

https://stackoverflow.com/questions/11707879/difference-between-scaling-horizontally-and-vertically-for-databases

Usually, when we do vertical scaling, it requires some downtime to increase the resources and restart server to complete this process. Meanwhile, horizontal scaling just adding another machine in the system, it is highly used in the tech industry. This will decrease the request per second rate in each server.

To segregate the requests, equally to each of the machine (example, application server), you need to add load balancer which would act as reverse proxy to the web servers.

‘Microsoft.ACE.OLEDB.12.0’ provider is not registered on the local machine

I was working on a complex query and required a dataset to be uploaded into my local machine’s database. The file format of the dataset was excel file and I was using the SQL Server Management Studio (SSMS) to upload the dataset into the database. 

Unfortunately, the SSMS returned an error message after I have selected my source as excel file. The error message was ‘Microsoft.ACE.OLEDB.12.0’ provider is not registered on the local machine.

For my case, the fixes for this error is done by installing the Microsoft Access Database Engine driver. The installation is simple, straight forward and hassle free. It does not require the machine to be restarted in order to continue importing the dataset using excel file.

Reference link: https://social.msdn.microsoft.com/Forums/en-US/1d5c04c7-157f-4955-a14b-41d912d50a64/how-to-fix-error-quotthe-microsoftaceoledb120-provider-is-not-registered-on-the-local?forum=vstsdb

[SQL] Grant User Permissions

The task which I was doing today required me to prepare a SQL script which grants the user to execute the user defined functions and stored procedures. While SQL Server Management Studio allows us to set the permission by a few clicks, it is good to prepare them using the script so we can re-use them in case we have run them in the development and production databases.

A simple script to grant user with execute permission for stored procedures. It applies to all stored procedures in the selected database.

/* GRANT EXECUTE TO THE ROLE */
GRANT EXECUTE TO <user>

Next, I found out that my application needs to set permission for the user defined functions. I felt I could have spent the same amount of time to search the script to automate the process and manually wrote the script one by one. The script below can help,

/* GRANT EXECUTE TO THE USER DEFINED FUNCTIONS */
SELECT 'GRANT EXECUTE ON [' + SCHEMA_NAME(schema_id) + '].[' + [name] + '] TO MyRole;'
FROM sys.objects
WHERE type IN ('TF','FN');

There is another script which could help,

declare @name varchar(100), @qry varchar(2000)
declare cursor cursor_temp
for select name from dbo.sysobjects where OBJECTPROPERTY (id, IsProcedure) = 1
open cursor_temp
fetch next from cursor_temp into @name
while @@fetch_status = 0
begin
set @qry = ' grant execute on [dbo].[' + @name + ' ] to [ <user_name> ]'
exec (@qry)
fetch next from cursor_temp into @name
end
close cursor_temp
deallocate cursor_temp

I hope this could help.