SSIS: How to reset metadata in SSIS packages

This is the first time I am writing about SSIS (SQL Server Integration Services) packages. In older versions of SQL Server, it was known as Data Transformation Services (DTS) in BIDS (Business Intelligence Development Studio). SSIS was introduced in SQL Server 2005 to perform extraction, transformation and loading of data. Back then, this service was available in the Standard and Enterprise editions. Now, in Visual Studio 2019, you may notice that the “New Project” dialogue does not include SSIS. However, you can install it from within Visual Studio through Manage Extensions. When the Manage Extensions window loads, type “Integration Services” in the search box. Alternatively, you can go to the marketplace to download the installer.

With Visual Studio 2019, the required functionality to enable Analysis Services, Integration Services, and Reporting Services projects has moved into the respective Visual Studio (VSIX) extensions only.

I started this post because I faced a situation whereby a column changed its data type in the OLE DB Source node without my knowing it. It happened when I was updating the data mart columns with new extraction criteria, and I was working on some columns in date and date-time formats.

I found out about the change from date format to date-time format when the generated CSV file contained data with timestamp values. I tried to cast the column to date format and even converted it to a string; still, it returned a timestamp. I checked the SQL query that extracted the data from the data mart, and it was clear enough that the SQL query did not return timestamp values.

So, where did it go wrong?

The change was not initiated by me, and I was clueless as to how it got changed in the first place. After investigating, I realized the column changed to a timestamp after the SQL command text executed in the OLE DB Source node.

Then, I googled and found that I had to check the metadata in the Data Flow Path Editor (the grey arrow) that connects to the next node. If you have not explored each node and arrow in the SSIS package, you probably will not know this metadata properties box exists. The properties dialogue has a tab called Metadata which shows the column name, the data type, and other details. Here, I found that the column had a timestamp data type. I wanted to edit the row of the column that caused the issue; however, I could not make a direct data type change from this properties dialogue.

How did I fix it?

What I did was remove the affected column(s) from the SQL command text and redraw the arrow to the next node.

From here, I noticed that the affected column was wholly removed from the metadata. Next, I went into the SQL command text again and restored the query with the column that I had removed earlier. Then, I checked the metadata again, and the column showed the correct data type, the date format. Maybe many of us already know how to deal with this error, and I do not find my solution a particularly good one. I am wondering if there is another workaround or a better way to resolve metadata changes in an SSIS package?


SSIS: Specify Development Target Version

Continuing the topic of SSIS (SQL Server Integration Services). Recently, I installed Visual Studio 2019 and separately enabled SSIS, which is not part of the default installation. You can check out my previous article with this link.

Many times we work on different Visual Studio versions between local development and live production servers, and encounter problems such as scripts not being supported in the server’s release of Integration Services. Recently, during a deployment, I faced this issue because I did not change the development target version before uploading the SSIS solution to TFS.

Today, I am going to share how we can set the development target version in Visual Studio.

Right-click on the solution project and choose Properties. When the property box loads, go to Configuration Properties and select General. You will see the property box as above. Then, select the correct target server version from the dropdown.

Besides that, when you are dealing with older versions, you may want to check whether you need to switch the package from 64-bit to 32-bit for connections that require Access, Excel, or ODBC. This will also allow you to debug a Script Task, because that does not work in 64-bit mode either. To switch to 32-bit, go to the Debugging option in Configuration Properties and change Run64BitRuntime to False.

If you face the same problems, you can try changing the deployment target version and switching to the 32-bit runtime before uploading to the server.

SSIS Deployment

Recently, the project that I am working on started to go into the UAT deployment phase, and I got to learn some of the standard procedures for application and database object deployments in this company. It is a little late that I am discovering this now, when I am almost at the end of my current employment contract. If time permits, I will try to share my experience of the application and database object deployment processes, as well as the bug-fixing process that we plan to go through in the next sprint.


Application deployment is the process of putting your application, either a system application or a web application, onto a server located on-premises or on a cloud platform. Application deployment uploads the packages of software components and configuration files to the server. Meanwhile, database object deployment creates the database schema, tables, views, functions, and any necessary data needed for lookup tables. To implement the application logic, we need to create these database objects.

In my company, the deployment process includes getting reviews and approvals from the Data Architect for all the database objects and application packages. Besides that, we manage our data models using ER Studio. Previously, I used Visual Paradigm in one of my projects. It had become routine for me to update the data models and get them reviewed before I started to create the database objects in my local database environment and begin coding.

After that, I request approvals to deploy the application packages and database objects to the server, whether it is a UAT (User Acceptance Testing) server, Pre-Production server or Production server. The UAT server is where the business users and Quality Engineers conduct their testing using test data. In some projects, there is a Pre-Production server which uses live data for testing before the actual deployment to the live production server accessed by the users.

For some deployments, I need to get a Budget Quotation (BQ) from the Facilities Management (FM) teams. There is a form to fill in with the activities involved in the deployment, such as creating folders, uploading scripts, setting up configurations, and other steps included in the implementation. Some companies do not require a BQ but ask you to raise a ticket to perform the deployment.

The project team that I worked with does not have Continuous Integration/Continuous Deployment (CI/CD) implemented; therefore, I need to perform manual deployments by accessing the server, uploading the scripts and files, and setting up the configuration. CI/CD is a set of modern automation practices for delivering applications to users.

The main activities:

  • Design the data models
  • Design and develop an application
  • Review of work
  • Application and database objects deployment

SSIS Deployment

I used the example from an article, Package Deployment in SSIS by Example, written by MSBI Tutorial, to share the same steps that I went through during my project deployment, because I cannot share the full implementation or screenshots of my project on my blog. The deployment uses the SSISDB catalogue under Integration Services Catalogs in SQL Server Management Studio.

Firstly, create a new folder and name it with a good description that tells other people what the package does. This folder stores the deployment packages.

Next, launch Visual Studio to access the SSIS packages that we built. Go to the Solution Explorer pane on the right side and follow the steps below.

  • Right-click the solution in Solution Explorer.
  • Select “Open Folder in File Explorer”.
  • Locate the Bin folder -> Development folder -> folder_name.ispac (Integration Services project deployment file).
  • Double-click it and the Integration Services Deployment Wizard will open. Follow through the wizard, and select the .ispac file as the Project Deployment File on the Select Source page.
  • On the Select Destination page, select the server name of the database, and the folder that we created in SSISDB.
  • Click the Deploy button.

Below is the solution that we set up based on the above steps.

Package Execution

Right-click the package name (.dtsx file) and select Execute. We can view the execution information through the Integration Services Dashboard and reports from the SSISDB node: right-click the solution folder in SSISDB -> Reports -> All Executions.

The report allows you to track the execution status, package name, task, start date-time, duration, and path. If you have SSIS audit tables created in the SQL Server database, you can check the SSIS log tables to ensure the package ran successfully. Note that the Integration Services Dashboard may show the package execution as succeeded even though the SSIS log recorded errors in some of the nodes, because the execution job can be set to use “go next” or “quit when failed” options.



Updated pip and Installed Jupyter Notebook

This is just a quick blog entry before the end of Sunday. I have been trying to write some blog entries since last week, when I started to clear my annual leave, but I could not find the right topic to kick-start. I did a small update just now on my laptop. I used to code Python using the Jupyter Notebook that I installed on another laptop. I have returned that laptop to my friend, so it is a good time to update my own laptop to launch Jupyter Notebook. When I first started self-learning Python, I learned it through the Jupyter Notebook web version. It is convenient to use, quick, and no installation is needed. Now, I want to use Jupyter Notebook without going to the web version, hosting it directly on my localhost. However, the pip version on my laptop was too old to install Jupyter Notebook.

You are using pip version 8.1.1, however version 20.2.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

I copied the above message from the command prompt. It had been just a year; how did the version number jump so much? (As far as I know, pip switched to calendar-based version numbers in the YY.N style starting with pip 18.0, which explains the apparent jump.) It is easy to update pip because the command is provided directly on the command prompt screen. Just follow it!
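As a side note, version strings like these must not be compared as plain strings: "8.1.1" actually sorts after "20.2.2" lexicographically. A minimal Python sketch of numeric, tuple-based comparison (real tools use a proper version-parsing library for this):

```python
def parse_version(v):
    # Split "8.1.1" into an integer tuple (8, 1, 1) so comparison is numeric.
    return tuple(int(part) for part in v.split("."))

old, new = "8.1.1", "20.2.2"

# Naive string comparison gets the ordering wrong ("8" > "2" as characters):
assert old > new

# Tuple comparison compares element by element, numerically:
assert parse_version(old) < parse_version(new)
```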

C:\Users\Li Yen\AppData\Local\Programs\Python\Python35-32\Scripts>python -m pip install --upgrade pip
Cache entry deserialization failed, entry ignored
Collecting pip
  Cache entry deserialization failed, entry ignored
  Downloading (1.5MB)
    100% |################################| 1.5MB 603kB/s
Installing collected packages: pip
  Found existing installation: pip 8.1.1
    Uninstalling pip-8.1.1:
      Successfully uninstalled pip-8.1.1
Successfully installed pip-20.2.2

After successfully installing the updated version of pip, I ran the next command to install Jupyter Notebook:

C:\Users\Li Yen\AppData\Local\Programs\Python\Python35-32\Scripts>pip3 install jupyter

It took a few minutes to complete the whole installation without any problems on my laptop, and Jupyter Notebook 6.1.3 was installed. Lastly, launch Jupyter Notebook with the jupyter notebook command; the server output below appears.

[I 18:56:08.259 NotebookApp] Writing notebook server cookie secret to C:\Users\Li Yen\AppData\Roaming\jupyter\runtime\notebook_cookie_secret
[I 18:56:09.241 NotebookApp] Serving notebooks from local directory: C:\Users\Li Yen\AppData\Local\Programs\Python\Python35-32
[I 18:56:09.241 NotebookApp] Jupyter Notebook 6.1.3 is running at:
[I 18:56:09.244 NotebookApp] http://localhost:8888/?token=4cda80785065046f320111b2788cd0057428c4112abe6915
[I 18:56:09.247 NotebookApp]  or
[I 18:56:09.247 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 18:56:09.272 NotebookApp]

    To access the notebook, open this file in a browser:
    Or copy and paste one of these URLs:

How many of us really read what is on the black screen? It gives some information that I can also see in the browser, such as that localhost is running on port 8888. Another useful piece of information: it tells us how to stop the server (yes, you are running a Jupyter server, and the web browser acts as the Jupyter client where you write code) by pressing Ctrl+C to stop the server and shut down all the kernels.

What is a kernel? It receives the code sent by the Jupyter client (our browser), executes it, and returns the results to the client for display. If you wish to read a bit more about Jupyter Notebook, you can refer to the link below.
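In spirit, a kernel is just a read-execute-return loop. The hypothetical sketch below ignores the real Jupyter messaging protocol entirely, but it shows the idea:

```python
def toy_kernel(requests):
    """Execute each code string sent by a 'client' and return the results."""
    results = []
    namespace = {}  # state persists across requests, like a real kernel session
    for code in requests:
        try:
            results.append(eval(code, namespace))
        except Exception as exc:
            results.append(repr(exc))  # a real kernel returns error messages too
    return results

print(toy_kernel(["1 + 1", "max(3, 7)"]))  # [2, 7]
```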

If you know a better way to do the above, please share with me. I am learning every single thing from everyone daily 🙂


Data Warehouse Design

The data warehouse is the core of a Business Intelligence system. As a database administrator in my previous job, I managed various data warehouses used to provide meaningful insights for business decision making, data analysis and reporting, for example, sales reports and sales trends. A data warehouse is not a product; it is an environment that consists of current and historical data, both structured and unstructured. Data warehousing is the process of transforming data into information and making it available to business users. The data is ingested, cleaned and transformed so that business users can access the processed data using Business Intelligence tools such as Tableau, Power BI or SQL clients to analyse it. There is a difference between data warehousing and data mining.

Data warehousing is different from data mining

Types of Data Warehouse

Enterprise Data Warehouse:

  • It is a centralized warehouse.
  • It provides decision support service across the enterprise.
  • It offers a unified approach for organizing and representing data.
  • It provides the ability to classify data according to the subject and give access according to those divisions.

Operational Data Store:

  • A data store required when neither the data warehouse nor OLTP systems support an organization's reporting needs.
  • The ODS is refreshed in real time.

Data Mart:

  • A data mart is a subset of the data warehouse.
  • It is specially designed for a particular line of business, such as sales or finance.
  • Data can be collected directly from sources.

Star Schema vs. Snowflake Schema

Knowing the differences between the star and snowflake schemas is one of the most common interview questions. The star schema is a mature modelling approach widely adopted by relational data warehouses. It requires modellers to classify their model tables as either dimension or fact. Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can hold a huge number of rows and continue to grow over time.

The snowflake schema is another type of multidimensional model. A snowflake schema contains the fact table, the dimension tables, and one or more additional tables for each dimension table. It is a normalized form of the star schema, which reduces redundancy and saves significant storage. The trade-off is that querying a snowflake schema involves a larger number of joins between the tables, so its queries tend to be more complex than the simple queries used against a star schema.
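The extra join is easier to see with a toy product dimension (hypothetical data). In the star schema the category name is denormalized into the dimension row; in the snowflake schema it is normalized into its own table, so answering the same question takes one more lookup (one more join in SQL):

```python
# Star schema: the category name is duplicated in every dimension row.
star_dim_product = {
    1: {"name": "Laptop", "category_name": "Electronics"},
    2: {"name": "Phone",  "category_name": "Electronics"},
}

# Snowflake schema: the category is normalized into its own table.
snow_dim_product = {1: {"name": "Laptop", "category_id": 10},
                    2: {"name": "Phone",  "category_id": 10}}
snow_dim_category = {10: {"category_name": "Electronics"}}

# Star: one lookup answers "what category is product 1?".
star_answer = star_dim_product[1]["category_name"]

# Snowflake: two lookups (the extra "join").
cat_id = snow_dim_product[1]["category_id"]
snow_answer = snow_dim_category[cat_id]["category_name"]

assert star_answer == snow_answer == "Electronics"
```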


Metadata is also known as a data dictionary or data specification document. It provides the details of the data, such as the name of the column (attribute), the data type of the column (whether it is a string, a number, etc.), the order of the columns, and the definition of nullable columns. Datasets are made up of data objects (entities). After we identify the metadata, we can design the conceptual, logical and physical data models.
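As a hypothetical illustration (the field names here are my own, not from any standard), a single data-dictionary entry could be recorded like this:

```python
# One metadata (data dictionary) entry for a column, with made-up values.
column_spec = {
    "entity": "Customer",            # the data object the column belongs to
    "column_name": "date_of_birth",  # attribute name
    "ordinal_position": 3,           # order of the column in the table
    "data_type": "DATE",             # string, number, date, etc.
    "nullable": True,                # whether NULLs are allowed
    "description": "Customer's date of birth",
}

assert column_spec["data_type"] == "DATE"
```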

Relationship and Cardinality

A relationship refers to how the tables (entities) link together. According to the online Visual Paradigm website (a data modelling tool website), cardinality defines the possible number of occurrences in one entity which is associated with the number of occurrences in another. A Google search for a general definition of cardinality returns “the number of elements in a set or other grouping, as a property of that grouping”. For example, ONE team has MANY players. The diagram below, extracted from the Lucidchart website, shows the different types of relationships in a database. It is used in the ER (entity-relationship) diagram.

Generally, the common types of cardinality are:

  • One to one
  • One to many
  • Many to many

In terms of data query optimization, cardinality refers to the data in a column of a table, specifically how many unique values it contains. If you have done data profiling using Microsoft Power BI, for example, you will notice that Power BI provides summary statistics of the table that show whether the attributes contain unique values. This information helps with planning queries and optimizing execution plans.
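In code, this sense of cardinality is simply the count of distinct values in a column. A quick sketch with made-up data:

```python
# Cardinality = number of unique values in a column.
team_column = ["Red", "Blue", "Red", "Green", "Blue", "Red"]
player_id_column = [101, 102, 103, 104, 105, 106]

assert len(set(team_column)) == 3        # low cardinality: few distinct teams
assert len(set(player_id_column)) == 6   # high cardinality: unique per row
```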

Conceptual, Logical and Physical Data Model

A general understanding of the three data models is that a business analyst uses the conceptual and logical models to model the business objects that exist in the system, while the database designer or database engineer elaborates the conceptual and logical ER models to produce the physical model, which presents the physical database structure ready for database creation.

ERD feature      Conceptual   Logical    Physical
Entity (name)    Yes          Yes        Yes
Column's Type    -            Optional   Yes
Primary Key      -            -          Yes
Foreign Key      -            -          Yes

Comparison of the conceptual, logical and physical data models

In summary,

Enterprise data modelling, metadata management and data warehousing are part of Data Governance (DMBOK), which I mentioned in my previous posts. I learned this in my current job, and I think it is useful for those who are serious about building an enterprise data warehouse. Part of my job scope includes:

  • Design and develop semantic data models.
  • Design metadata and harmonized code references.
  • Identify business keys and relationships among the tables.
  • Ensure all stakeholders, including business users, are involved in the data warehouse implementation process.


SQL Server: Reading the SQL Server log files using TSQL

Recently my colleague asked if I could assist with SQL Server log file extraction using an SSIS ETL package. The log files are stored on the database server, and they get purged whenever the database server restarts. My colleague wants to keep the server log files in the data warehouse for further analysis and IT auditing.

System Stored Procedure sp_readerrorlog

This stored procedure allows you to read the contents of the SQL Server error log files directly from a query window, and it enables you to search for specific keywords when reading the error file. It is not new; it has been available since SQL Server 2005.

Example of using the stored procedure

This procedure takes four parameters:

  1. Value of error log file you want to read: 0 = current, 1 = Archive #1, 2 = Archive #2, etc…
  2. Log file type: 1 or NULL = error log, 2 = SQL Agent log
  3. Search string 1: String one you want to search for
  4. Search string 2: String two you want to search for to further refine the results

If you do not pass any parameters, this will return the contents of the current error log.

CREATE PROC [sys].[sp_readerrorlog](
   @p1 INT = 0,
   @p2 INT = NULL,
   @p3 VARCHAR(255) = NULL,
   @p4 VARCHAR(255) = NULL)
AS
BEGIN
   IF (NOT IS_SRVROLEMEMBER(N'securityadmin') = 1)
   BEGIN
      RAISERROR(15003,-1,-1, N'securityadmin')
      RETURN (1)
   END
   IF (@p2 IS NULL)
      EXEC sys.xp_readerrorlog @p1
   ELSE
      EXEC sys.xp_readerrorlog @p1, @p2, @p3, @p4
END

-- Call the stored procedure as below
EXEC sp_readerrorlog 6 
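Purely as an illustration (this helper is my own, not part of SQL Server or any library), the four positional parameters can be composed into the EXEC statement with a small Python function before sending it through your database driver of choice:

```python
def build_readerrorlog_call(log_number=0, log_type=1, search1=None, search2=None):
    """Compose the EXEC statement for sp_readerrorlog.

    log_number: 0 = current log, 1 = Archive #1, and so on.
    log_type:   1 = SQL Server error log, 2 = SQL Agent log.
    search1/2:  optional keyword filters.
    """
    parts = [str(log_number), str(log_type)]
    for s in (search1, search2):
        if s is not None:
            # Double up single quotes so the search string is safely quoted.
            parts.append("N'{}'".format(s.replace("'", "''")))
    return "EXEC sp_readerrorlog " + ", ".join(parts)

print(build_readerrorlog_call(6))                    # EXEC sp_readerrorlog 6, 1
print(build_readerrorlog_call(0, 1, "Login failed")) # EXEC sp_readerrorlog 0, 1, N'Login failed'
```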

Note: the example here shows the values in single quotes, but for later versions of SQL Server you may need to use double quotes or you might get an error.

In Conclusion,

I will try to run this on my local machine to verify it and extract the SQL Server logs from SQL Server Management Studio and an SSIS ETL package. I will keep developing the content of this article to show the results of the above stored procedure.


Business Analytics Framework

I am looking through the slides from my Specialized Diploma in Business Analytics today, since it is a public holiday in Singapore, and I am taking this opportunity to focus on reading. The diagram above brought me to re-look into Data Management, which my boss told me about some time ago. Recently, my teammate told me that there was supposed to be a CDMP (Certified Data Management Professional) examination in August. However, he is busy with work and unable to prepare for it. Last year, I thought about taking the same certification, a step towards the professional level.

According to DMBOK, data management is vital for every organization. Whether known as Data Management, Data Resource Management, or Enterprise Information Management, organizations increasingly recognize that the data they possess is a valuable asset which must be appropriately managed to ensure success.

DMBOK is the Data Management Body of Knowledge, similar to other professions’ bodies of knowledge, for example, project management (PMBOK) and software engineering (SWEBOK).

Why Do I link It with the DMBOK?

Based on the above diagram, I realized that I learned about data integration, central metadata management, and data warehouses during the Diploma, and this relates to some of the topics in the DMBOK. Besides that, the Diploma covered data governance in general.

I extracted some of the points from the 11 Data Management Knowledge Areas above:

  • Data Governance – plan, oversight, and control over the management of data and the use of data and data-related resources. While we understand that governance covers ‘processes’, not ‘things’, the common term for Data Management Governance is Data Governance.
  • Data Integration & Interoperability – acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support (a Knowledge Area new in DMBOK2).
  • Metadata – collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata.
  • Data Warehousing & Business Intelligence – managing analytical data processing and enabling access to decision support data for reporting and analysis

Reading through the descriptions above, we can see that the business analytics framework consists of some of the data management knowledge areas. These knowledge areas give us ideas about the industry standards, terminology, and common best practices, without going into implementation details.

Data Governance in BI

Furthermore, the Data Governance knowledge area is a big topic in itself. In Business Intelligence, it governs all activities within the environment. The guiding principles ensure the information is managed as a corporate asset: standardized, integrated, and reused across the organization.

Objectives of BI Governance

  • Clearly defined authority and accountability, roles and responsibilities.
  • Program planning, prioritization, and funding processes.
  • Communicating strategic business opportunities to IT.
  • Transparent decision-making processes for development activities.
  • Tracking value and reporting results.

Benefits of BI and Data Governance

  • Generate greater ROI.
  • Balance business needs and IT imperatives.

Regardless of whether you work in the public or private sector, data governance in the business intelligence or business analytics context plays an important role in every organization. Hence, revising what I learned during my Specialized Diploma helped me understand what the industry needs and make the linkage to DMBOK topics. I am not sure how many of us are into this topic area, but I believe it is a good one to discuss with the community, to share the best or common practices and learn from each other to improve standards and guidelines.

Difference between OLTP and OLAP

Transaction processing vs. Analytics processing.

What is OLTP?

OLTP stands for Online Transaction Processing. OLTP systems record routine business transactions and become the source of data for OLAP systems. The primary purpose of a typical OLTP system is to handle the operational database and record the routine business data (through insertion, update or deletion requests). It usually involves simple queries. Online transaction processing provides transaction-oriented applications in a 3-tier architecture. A typical OLTP database stores current (and detailed) data. The entity model (table model) is the schema used for storing such a transactional database, and 3rd Normal Form (3NF) normalization is commonly applied to OLTP databases. An OLTP database needs to be backed up regularly without fail because it stores all the business's functional data.

What is OLAP?

OLAP stands for Online Analytical Processing. An OLAP database reads past data that was captured by one or more OLTP systems and rarely writes to the database. It usually involves large, complex queries used for business decisions, and aggregation across multiple databases is often required. An OLAP system allows users to view a variety of summaries of stored multidimensional data, extracting information from a bigger database (typically the data warehouse) and analysing it with the intent of decision-making. It is relatively slow if it involves a large dataset. Data is denormalized to improve query performance when aggregation is required. OLAP systems are not usually the system of record; if data recovery is needed, the data can be reloaded from the OLTP sources or restored from OLTP backups.
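As a toy contrast of the two workloads, here is a sketch using Python's built-in sqlite3 with made-up data (real OLTP and OLAP systems differ enormously in scale, but the query shapes look like this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style workload: many small, transaction-oriented writes.
rows = [("North", 100.0), ("South", 250.0), ("North", 75.0)]
cur.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
conn.commit()

# OLAP-style workload: a read-only aggregation across the data.
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
result = cur.fetchall()
print(result)  # [('North', 175.0), ('South', 250.0)]
```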

The picture above shows the distinct difference between OLTP and OLAP in relation to data analysis, data mining and Big Data. Each set of databases has its own data structures and algorithms that work for operational, transactional data in OLTP, and for data analytics in OLAP.

Moving forward, I will compile some useful SQL interview questions on my blog. I hope to get some feedback from readers who work on databases as well.

Data Structure & Algorithm

A data structure is a way of collecting and organising data such that we can perform operations on the data effectively. Data structures are about rendering data elements in terms of some relationship, for better organization and storage. For example, we have some data for a player: the name “Virat” and the age 26. Here “Virat” is of string data type and 26 is of integer data type.

We can organize this data as a record, such as a Player record, which holds both the player’s name and age. Now we can collect and store players’ records in a file or database as a data structure, for example: “Dhoni” 30, “Gambhir” 31, “Sehwag” 33.

In simple language, data structures are structures programmed to store ordered data, so that various operations can be performed on it easily. They represent how data is to be organized in memory. A data structure should be designed and implemented in such a way that it reduces complexity and increases efficiency.

Data structures are the programmatic way of storing data so that it can be used efficiently. Almost every enterprise application uses various types of data structures in one way or another.

Basic types of Data Structures

As discussed above, anything that can store data can be called a data structure; hence integer, float, boolean, char, etc., are all data structures. They are known as primitive data structures.

Then we also have some complex data structures, which are used to store large and connected data. Some examples of abstract data structures are:

  • Linked List
  • Tree
  • Graph
  • Stack, Queue etc.

All these data structures allow us to perform different operations on data. We select a data structure based on the type of operation required.
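For instance, in Python a built-in list serves as a stack, while collections.deque is the idiomatic queue; picking between them is exactly this "which operation is required" decision:

```python
from collections import deque

# Stack: last in, first out (LIFO). list.append/pop are O(1) at the end.
stack = []
stack.append("a")
stack.append("b")
assert stack.pop() == "b"

# Queue: first in, first out (FIFO). deque gives O(1) pops from the left,
# whereas list.pop(0) would be O(n).
queue = deque()
queue.append("a")
queue.append("b")
assert queue.popleft() == "a"
```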

Data structures are used to organize code and information, and they are a crucial part of designing efficient software. In SQL, we choose and use the proper data structure for each entity. The same goes for a programming language such as Python or JavaScript. Below is the data structure hierarchy in Python that I extracted from a reference; Python has both built-in and user-defined data structures.

What is an Algorithm?

An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task. An algorithm is not the complete code or program; it is just the core logic (solution) of a problem, which can be expressed either as an informal high-level description, as pseudocode, or using a flowchart.

Every Algorithm must satisfy the following properties:

  1. Input - There should be 0 or more inputs supplied externally to the algorithm.
  2. Output - There should be at least 1 output obtained.
  3. Definiteness - Every step of the algorithm should be clear and well defined.
  4. Finiteness - The algorithm should have a finite number of steps.
  5. Correctness - Every step of the algorithm must generate a correct output.
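A minimal example that satisfies all five properties is Euclid's GCD algorithm: two inputs, one output, clear steps, guaranteed termination, and a correct result:

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b != 0:          # finiteness: b strictly decreases toward 0
        a, b = b, a % b    # definiteness: one clear, well-defined step
    return a               # output: exactly one result

assert gcd(48, 18) == 6
```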

How to Write an Algorithm?

There are no well-defined standards for writing algorithms. Rather, it is problem- and resource-dependent. Algorithms are never written to support one particular programming language.

As we know, all programming languages share basic code constructs like loops (do, for, while), flow control (if-else), etc. These common constructs can be used to write an algorithm.

We usually write algorithms in a step-by-step manner, but that is not always the case. Algorithm writing is a process that is carried out after the problem domain is well defined; that is, we should know the problem domain for which we are designing a solution.

Algorithm Complexity

Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors which decide the efficiency of X.

  • Time Factor - Time is measured by counting the number of key operations, such as comparisons in a sorting algorithm.
  • Space Factor - Space is measured by counting the maximum memory space required by the algorithm.
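To make the time factor concrete, here is a small sketch (with made-up data) that counts the key operations, comparisons, of linear versus binary search over the same sorted list:

```python
def linear_search(items, target):
    """Scan left to right, counting comparisons: O(n) time."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons

def binary_search(items, target):
    """Halve the sorted search range each step: O(log n) time."""
    comparisons, lo, hi = 0, 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        comparisons += 1
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

data = list(range(1000))  # sorted input
_, linear_cost = linear_search(data, 999)
_, binary_cost = binary_search(data, 999)
assert binary_cost < linear_cost  # far fewer key operations for binary search
```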

Data structures and algorithms are a popular topic in both technical programming and database administrator interviews. There are a few online learning websites that teach data structures and algorithms, and they are worth exploring for learning materials.
