SSIS: Binary Code Is Not Found

Background

In my recent SSIS deployment to PolyBase servers running Windows Server 2019, I encountered the error message “The binary code is not found” when the data pipeline began executing the first package. The SSIS execution then failed.

More information came with the above error message:

Error at Script Task: The binary code for the script is not found. Please open the script in the designer by clicking the Edit Script button and make sure it builds successfully.
Error at Script Task: There were errors during task validation.

I use Visual Studio 2015 to build and compile the SSIS packages on my local machine, and there were no errors during local execution. The same set of SSIS packages was deployed to another server running an older version, either Windows Server 2016 or Windows Server 2012, and it has been running normally every day without encountering similar error messages.

What is this bug?

I searched on Google to find out more. Sometimes this error is caused by a compilation failure due to a bug in the script, for example a syntax error, and usually we can spot such errors quite early. This is resolved by simply building the component and reviewing the errors found.

Other times this error is caused by the bugginess of SSIS itself, and there is no good reason why the script fails to precompile: the script has no bugs and has actually run perfectly in the past.

Another cause my colleague shared is copying and pasting the Script Task component from one package to another, or within the same SSIS package; this can introduce errors.

What did I do in the beginning?

Based on the above error message, I opened the Script Task component in the SSIS package and clicked “Edit Script”. I hoped that a simple rebuild would fix the error. However, it did not.

I compared the settings between this SSIS solution and another SSIS solution that I deployed to the PolyBase server a month ago, looking for ideas to fix it. I noticed that TargetServerVersion differed between the two solutions; one uses 2016 and the other 2017. My colleague said it should not affect me much even though I deployed to a PolyBase server running Windows Server 2019, because it should be backwards compatible.

Changing TargetServerVersion to 2017

One thing to note when changing the target server version: SSIS may remove the existing variables and/or script inside the Script Task. When I tried it, this happened to my SSIS packages at random. So, take note.

Rebuild Script Task and SSIS Solution

Inside the Script Task, we can rebuild the script after checking through all the syntax, and at the project level, we can rebuild the SSIS solution. Both rebuilds were successful, which means there was no error in my C# scripts or my packages. However, neither change helped; the SSIS execution still failed on the first package.

So, I searched Google for more help and came across an article that suggests selecting the Clean option from the Build menu inside the Script Task. (Refer to the first link below.)

The Build menu can build, rebuild, or clean all or some of the project items in a solution. Personally, I do not think it helped much, especially since the SSIS package had some stubborn errors that the rebuild and clean options could not remove.

Downgrade the server

In some of the articles I read, people shared that they downgraded the server or the Visual Studio version. That is not a feasible plan in my case because I am in the midst of migrating to a newer server.

Other things that people tried that did not work

  • Changed DelayValidation to True.
  • Reinstalled Visual Studio.
  • Set PrecompileScriptIntoBinaryCode to False (the default is True).

Something to note for when I upgrade to Visual Studio 2019: there is a regression in VS 2019 16.9 that causes the Script Task/Component to not work properly when editing the script.

What worked for me?

Yes, I deleted the Script Task and redid it from scratch, without copying the same script to another part of the SSIS package or solution. I re-added the Script Task and went through the tedious procedure of re-adding the output columns that the Script Task used previously.

References


[SQL] Why is an insert on a certain table slow?

It is a good database question to discuss. There are many reasons why an insert on a certain table may take longer than usual. The online article that I found lists a few of them. Let me look into it:

  • Existence of INSERT TRIGGERs on the table
  • Lots of enforced constraints that have to be checked (usually foreign keys)
  • Page splits in the clustered index when a row is inserted in the middle of the table
  • Updating all the related non-clustered indexes
  • Blocking from other activities on the table
  • Poor IO write response time
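
To check a couple of these suspects, we can query the system catalog. The sketch below lists the triggers and foreign keys defined on a table; dbo.MyTable is a placeholder name:

```sql
-- List triggers that fire on the table (placeholder table name)
SELECT name AS TriggerName
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.MyTable');

-- List foreign key constraints that must be checked on each insert
SELECT name AS ForeignKeyName
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('dbo.MyTable');
```

A long list of foreign keys or non-trivial INSERT triggers here is a hint about where the per-row cost is going.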

In the same article, some of the replies suggested doing the following:

  • Reduce the batch size from 10000 to something smaller, like 2000 or 1000. In SSIS, reducing the batch size according to the allocated buffer memory size is a great help. It may take a while longer, but at least it lowers the chances of batch processing errors resulting from a failed insertion.
  • Try turning on IO Stats to see just how much IO the FK lookups are taking.
  • Run the query with SET STATISTICS IO ON and SET STATISTICS TIME ON to see the execution time and the read/write IO for each insert.
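
As a sketch of that last suggestion, the statistics can be switched on in the same session before running the insert; the table, columns, and values below are hypothetical:

```sql
-- Show IO and timing details for the statements that follow
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

INSERT INTO dbo.MyTable (Col1, Col2)  -- hypothetical table and columns
VALUES (1, 'test');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab will then show logical reads per table (including the foreign-key lookup tables) and the CPU/elapsed time of the insert.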

References

https://dba.stackexchange.com/questions/482/how-can-i-tell-why-an-insert-on-a-certain-table-is-slow

Describe the difference between batch and streaming data

Data processing is simply converting raw data into meaningful information through a process. Depending on how the data is ingested into your system, you could process each data item as it arrives, or buffer the raw data and process it in groups. Processing data as it arrives is called streaming. Buffering and processing the data in groups is called batch processing.

Understand batch processing

In batch processing, newly arriving data elements are collected into a group. The whole group is then processed at a future time as a batch. Exactly when each group is processed can be determined in several ways. For example, data can be processed based on a scheduled time interval (for example, every hour), or processing could be triggered when a certain amount of data has arrived, or as the result of some other event.
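
As a rough illustration in T-SQL, a scheduled batch job might drain a staging table in fixed-size groups; the table and column names here are hypothetical:

```sql
-- Move staged rows into a processed table, 1000 rows at a time
WHILE EXISTS (SELECT 1 FROM dbo.StagingQueue)
BEGIN
    DELETE TOP (1000)
    FROM dbo.StagingQueue
    OUTPUT deleted.Id, deleted.Payload
    INTO dbo.ProcessedRows (Id, Payload);
END
```

The whole loop only runs when the job is triggered, which is exactly the batch model: data accumulates until the scheduled time, then everything collected so far is processed at once.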

Advantages of batch processing include:

  • Large volumes of data can be processed at a convenient time.
  • It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight or during off-peak hours.

Disadvantages of batch processing include:

  • There is a time delay between ingesting the data and getting the results, because the batch only runs at its scheduled time.
  • All of a batch job’s input data must be ready before the batch can be processed, which means the data must be carefully checked. Problems with the data, errors, and program crashes during a batch job halt the whole process, and the input must be checked again before the job can be rerun. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.

Understand streaming and real-time data

In stream processing, each new piece of data is processed when it arrives. For example, data ingestion is inherently a streaming process.

Streaming handles data in real time. Unlike batch processing, there is no waiting until the next batch processing interval, and data is processed as individual pieces rather than one batch at a time. Streaming data processing is beneficial in most scenarios where new, dynamic data is generated continually.

Examples of streaming data include:

  • A financial institution tracks changes in the stock market in real-time, computes value-at-risk, and automatically rebalances portfolios based on stock price movements.
  • An online gaming company collects real-time data about player-game interactions and feeds the data into its gaming platform. It then analyses the data in real-time and offers incentives and dynamic experiences to engage its players.
  • A real-estate website tracks a subset of data from consumers’ mobile devices and makes real-time recommendations of properties to visit based on their geo-location.
  • Stream processing is ideal for time-critical operations that require an instant real-time response. For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.

Understand the differences between batch and streaming data

Apart from how batch processing and streaming processing handle data, there are other differences:

  • Data Scope: Batch processing can process all the data in the dataset. Stream processing typically only has access to the most recent data received or within a rolling time window (the last 30 seconds, for example).
  • Data Size: Batch processing is suitable for handling large datasets efficiently. Stream processing is intended for individual records or micro-batches consisting of a few records.
  • Performance: The latency for batch processing is typically a few hours. Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds. Latency is the time taken for the data to be received and processed.
  • Analysis: You typically use batch processing for performing complex analytics. Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
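
The “rolling averages” mentioned above can also be expressed in T-SQL with a window function, which is handy for checking streaming-style logic against stored data; the table and columns below are hypothetical:

```sql
-- Rolling average over the last 10 readings per sensor
SELECT SensorId,
       ReadingTime,
       AVG(ReadingValue) OVER (
           PARTITION BY SensorId
           ORDER BY ReadingTime
           ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
       ) AS RollingAvg
FROM dbo.SensorReadings;
```

A true stream processor computes the same kind of aggregate incrementally as each event arrives, instead of re-querying the full dataset.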

[SSIS] Implement Timer Delay

Every day I am working and learning new things in SSIS development. Today’s topic is relatively short because it is a simple task to complete.

Background

I was testing my new SSIS solution project for generating CSV data files. There was an update statement I wanted to perform at the end of the SSIS package execution. However, it kept failing because the SQL Task returned a count that did not match my set condition. The reason was that the package raised its PackageStart event, checked the condition using a SQL query, and returned the result before the previous package had logged its PackageEnd event.

Therefore, the result did not match the condition, and the SSIS solution failed. I am glad that I could test the failure scenario, because the script I wrote sets the SSIS DTS task result to failure when the condition is unmatched.

How to Resolve?

The first idea that came to mind was: is there something we can do to delay the package from running the SQL query in the SQL Task? I searched online for the syntax to do it; the SSIS Toolbox does not come with a timer component.

To achieve that, I added another Execute SQL Task right before the Execute SQL Task that runs the condition-checking query. The additional SQL Task runs a WAITFOR statement, whose DELAY argument specifies the amount of time to pass before proceeding. I set it to 20 seconds for testing. For details about the WAITFOR statement, refer to the reference link below.
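
The statement in that additional Execute SQL Task is a one-liner; the sketch below pauses for the 20 seconds I used in testing:

```sql
-- Pause for 20 seconds before the next task in the control flow runs
WAITFOR DELAY '00:00:20';
```

WAITFOR also accepts a TIME argument to wait until a specific clock time instead of a fixed duration.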

It is a pretty cool SQL statement that I learned today.

Reference

  

[SSIS] If A condition is not matched stop execution

I am developing another SSIS solution to perform data extraction only. Another SSIS solution was created to handle the ETL portion from the source and write it into the destination tables. The new SSIS solution will extract data and write it into a CSV file if a condition is matched; otherwise, it will stop execution and return a failure.

Background

Why do I need to return a failure when a condition is not matched? It is because I have created a set of batch jobs that depend on one another. The first batch job runs the SSIS solution that performs the ETL. The second batch job runs only if the first batch job completes with success; it depends on the SSIS execution result ‘success’ to proceed. The second batch job then runs the new SSIS solution that I am developing, and its execution result determines whether the third batch job can run. In other words, the third batch job depends on the second job’s execution result being ‘success’. If the new SSIS solution’s end result does not match the expected condition to proceed to the third batch job, I want the SSIS solution to return a failure. That stops the third batch job from being executed.

Setting the Conditions

I used the Execute SQL Task to write a SQL query that checks the conditions I want to enforce, and I set the task to return a single-row result. Both the SQL statement and the result set type can be set in the Execute SQL Task Editor under the General section. The result of the SQL query maps to a variable [User::<my_variable_name>] that I added to the SSIS package; the mapping is done in the Result Set section.
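
As a sketch, the single-row result set could come from a simple count query like the one below; the table and column names are hypothetical, and the count is what gets mapped to the SSIS variable in the Result Set section:

```sql
-- Count the rows that satisfy the business condition
SELECT COUNT(*) AS RowCnt
FROM dbo.ExecutionLog  -- hypothetical logging table
WHERE EventName = 'PackageEnd';
```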

Here, I set a condition: if the count is greater than 1, it proceeds to Task A. If the count is less than 1, it proceeds to Task B, where Task B stops execution and returns a failure.

If I did not set up Task B, then when the count is less than 1, SSIS would do nothing and return a ‘success’ execution.

Script Component

I used a Script Component to set up Task B and make the SSIS solution return ‘failure’. That means I have to write code; the SSIS solution uses C#. It is not hard to find samples to start with. All I need to do is click Edit Script in the Script Component to launch the Visual Studio editor and start coding.

One of the examples I found online was useful for my use case, so I took it and modified it.

public void Main()
{
    // The variable name must be wrapped in quotes; <my_variable_name> is a placeholder.
    if (Dts.Variables["User::<my_variable_name>"].Value != null)
    {
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
    else
    {
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}

Whatever condition is stored in the variable [User::<my_variable_name>] is checked against my logic, and the simple if-else statement sets the task result to success or failure. Save and build the script.

Test SSIS Package

Once it built successfully, I tested the package to ensure both the success and failure conditions behave properly before deploying it to the QA or Production server.

References

[SSIS] Errorcode: 0x80004005

One of my SSIS packages recently encountered errors with the error code 0x80004005 in the SSISDB Catalog execution result. It came with different error messages:

  • Login timeout expired.
  • Communication link failure.
  • TCP Provider: An existing connection was forcibly closed by the remote host. 

Set Connect Timeout = 0

Open the SSIS Connection Manager in view code mode; the XML configuration page shows where to set the connection string. Setting Connect Timeout to 0 means an unlimited timeout.

Data Source=;Initial Catalog=Test;Provider=SQLNCLI11.1; Integrated Security=SSPI;Connect Timeout=0;Auto Translate=False;

I tried to set it from the connection manager property box by looking for the connect timeout there. However, the setting could not be saved permanently. The best method so far is to use the XML configuration file.

Below is another example of using an XML configuration file.

SQLOLEDB.1 Instead of SQLNCLI.1

Someone replied in a forum stating that changing the Provider from SQLNCLI.1 to SQLOLEDB.1 helped resolve the issue. The initial configuration:

//Master.dts.Config file

<Configuration ConfiguredType="Property" Path="\Package.Connections[Batch].Properties[ConnectionString]" ValueType="String">
  <ConfiguredValue>Data Source=RemoteServerName;Initial Catalog=DataWarehouse;Provider=SQLNCLI.1;Integrated Security=SSPI;</ConfiguredValue>
 </Configuration>

I guess the change below is what needs to be made in the configuration file. I have not tried this configuration myself.

<Configuration ConfiguredType="Property" Path="\Package.Connections[Batch].Properties[ConnectionString]" ValueType="String">
  <ConfiguredValue>Data Source=RemoteServerName;Initial Catalog=DataWarehouse;Provider=SQLOLEDB;Integrated Security=SSPI;</ConfiguredValue>
 </Configuration>

RetainSameConnection

Another solution that could help is setting RetainSameConnection = TRUE in the connection manager of the SSIS package. I tried it before, and the error still occurred.

VM Snapshot/Backup

In another article, one of the replies mentioned that their VM backup solution requests VMware to create a Quiesce=true snapshot (in their backup software, this is called an “application consistent” backup). The reply said that when the VMware snapshot is created and later “committed”, they get “Communication Link Failure” errors, so they learned to schedule their SSIS packages not to run while the backups run.

I checked with my FM team; the VM snapshot (incremental backup) is taken at midnight and lasts for 30 minutes to an hour. Comparing the backup time with my batch job schedule, they do not seem related.

Still Looking for Solutions

I am still working on a solution. If you have a workable one, please share it. Thank you.

Reference

[SSIS] AcquireConnection method call failed

This is a SQL Server Integration Services (SSIS) topic. A few months ago, I received a user request to perform a series of data syncs from the source into our data warehouse in the UAT environment. Once I received access to the server, I opened the SSIS solution in Visual Studio and loaded the packages that I needed to execute.

You may wonder why I had to execute the packages manually instead of using the SQL Agent to trigger SSISDB to execute them. Yes, I could do so. However, my colleague told me that some of the solutions had not been re-deployed to SSISDB after the password updates. These solutions connect to Oracle and MSSQL databases.

Therefore, I chose to execute it via Visual Studio.

Problem Faced

Upon execution, SSIS hit an error. The error message contained the line “AcquireConnection method call failed”, and the whole message was rather confusing. I asked my colleague what happened to the connection after the password updates. First, we checked with the database administrator that the user account was not locked. Then, some Google searches revealed that our UAT environment does not support 64-bit solutions.

Therefore, the SSIS solution running in 64-bit mode gave a problem.

What I Did to Resolve?

Many people on the Internet mentioned that setting Run64bitRuntime to False in the project configuration properties would help, because my UAT environment runs the 32-bit version.

I hope this post can help some of you who are searching for solutions; there are plenty of different ones out there. If your environment uses the 32-bit version and runs SSIS packages that connect to an Oracle database that is also 32-bit, this might help.

[SSIS] The ‘Microsoft.ACE.OLEDB.15.0’ provider is not registered on the local machine

This is a SQL Server Integration Services (SSIS) topic. I have been developing an SSIS package recently, and it requires me to write some C# code in the Script Component to read the columns from an Excel file and write them into a System.Object variable. After that, the data from the System.Object is used to insert into my local database.

It does sound quite straightforward: as long as I read from the proper tab and the proper columns in the Excel file, the data will be inserted correctly into the table in my local database based on the column names.

However, I was unfamiliar with C# scripting during the development, and I spent quite some time understanding how the BufferWrapper.cs and ComponentWrapper.cs files work. I knew that my main code would reside in the main.cs file. I cover the story of my discovery in another post; you may want to look at it if you are a beginner or have no experience using the Script Component in SSIS.

What is My Problem?

Besides that, I faced another problem on my local machine when I tried to execute the SSIS package. The error returned was “The ‘Microsoft.ACE.OLEDB.15.0’ provider is not registered on the local machine”. It is an issue with running a 32-bit versus 64-bit application. I am going to share my solution in this post.

At first glance, I thought I had to check the driver for the ‘Microsoft.ACE.OLEDB.15.0’ provider on my local machine. I did not manage to find the information in the Control Panel’s Programs and Features. Next, I went to the ODBC Data Source Administrator and found that the Excel drivers are 64-bit.

What Did I Do Next?

The first thing that came to mind was to set the debug option Run64BitRuntime to True on the property page.

It did not immediately resolve my problem; when I executed the package again, it gave the same error. I googled for other solutions, including changing the Excel file extension from .xlsx to .xls and executing it with an older Microsoft Excel version. That did not seem to work.

In the test server environment, I can execute the package using the .xlsx file extension. Hence, I changed the Excel file connection manager to point to the .xlsx file instead. Then, I saw the dropdown option to select the Excel version, and I tried changing it from Microsoft Excel 2013 to Microsoft Excel 2016 (to match my local machine setup).

Execute The Package Again

Tada, the execution completed successfully, and I checked the database table to ensure the logic worked, the control flow executed correctly, and the expected result was met. My solution was basically to set Run64BitRuntime to True in the debugging options and to select the correct Excel version.

There are plenty of other solutions provided to resolve different use cases that give the same or similar errors in SSIS. I hope this post may help to give you some ideas on how to fix it.



Apache Flink

It is the first time I have come across Apache Flink; I saw it in a job description. Today, I will try to write down some Apache Flink information in this entry, from zero to something. My post may not be accurate because I am sharing what I read and what I understand. Hopefully, I can get some feedback and learn more.

What is Apache Flink?

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, and to perform computations at in-memory speed and at any scale. The definition is taken from the website, https://flink.apache.org/flink-architecture.html

Unbounded and Bounded Data Streams

Credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application: all of this data is generated as streams. Let me do a quick summary of bounded and unbounded data.

Bounded Data

  • Bounded data is finite; it has a beginning and an end.
  • Bounded data equates to batch processing. It can be processed by ingesting all the data before performing any computations.
  • Ordered ingestion is not required to process bounded streams, because a bounded data set can always be sorted.

Unbounded Data

  • Unbounded data is an ever-growing, essentially infinite data set.
  • Unbounded data is equated to stream processing. It must be continuously processed. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time.
  • Processing unbounded data often requires that events are ingested in a specific order.

While researching Apache Flink, I saw an article about Apache Apex, which says it can be used to process (and is very capable of processing) both unbounded and bounded data. For more on this, refer to the reference link below.

Run In All Common Cluster Environment

Apache Flink is a distributed system and requires compute resources to execute applications. It integrates with all common cluster resource managers such as Hadoop YARN, Apache Mesos, and Kubernetes. (I did not know what Apache Mesos is; I will Google it and share something about it.) Apache Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls.

Run Applications at Any Scale and at In-Memory Speed

Apache Flink runs stateful streaming applications at any scale and optimizes for local state access. Applications are parallelized into tasks that are distributed and concurrently executed in a cluster, so an application can leverage virtually unlimited CPUs, main memory, disk, and network IO. Moreover, Apache Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm, which persists the local state to durable storage, ensures minimal impact on processing latencies while guaranteeing exactly-once state consistency.

My apologies that I cannot explain much at this point, because I am still in the midst of understanding this “new” technology that I only started Googling today. I will update this entry once I become more knowledgeable.

References

Deploy Is Not The Same As Release – Software Development

In my recent product backlog grooming session with my project team, the business analyst shared a new user story about the upcoming deployment to the production server and generating the datasets for the business users. In one of the acceptance criteria, I spotted the word “release”, and I pointed out that “deploy” and “release” have different meanings in the software development context. I believe they differ in the English context too, yet the two terms are often used interchangeably.

I am not purposely making the business analyst’s life difficult, but using the correct terms in a user story is important. The business analyst asked me: what is the difference? Well, I was not keen to explain, because I expected the business analyst to know the difference before using the term.

For my readers, please continue reading as I try to explain them in the simplest way. I hope I explain it the right way.

In English,

Release (noun) is a tested version published and made available for public users to access and get information from. Release (verb) is the act of making a release version available. Deploying (verb) means to push, install, and configure a new release (noun) to one or more environments.

In Software Development,

Deployment involves moving software or an application from one controlled environment to another. When a new version of a software or application is deployed, it means the software has been successfully installed, configured, and is running and able to handle production traffic, but public users may not actually be able to access the new version yet. There are plenty of deployment techniques; one of the common techniques is the blue-green approach. There are a few mainstream automation solutions in the market, such as Jenkins, Chef, Bamboo, and GitLab.

“Deployed” doesn’t necessarily mean “Released”.

Release (verb), on the other hand, means serving actual production traffic with the new version of the software or application. Release has its own techniques too; some release techniques used in software development include the canary release and dark launching.

I would like to share the key difference, and I found a website that explains what I just mentioned above well. I re-post the section where it describes the key difference below.

Deployment vs Release: The key difference

The key distinction between deployment and release is the business rationale. Deployment doesn’t necessarily mean users have access to features, as can clearly be seen from the different environments involved. Some companies will release at the same time as the deployment to production is taking place.

Others will choose to wait, thereby having the new features in production but not availed to users until the business decides.

References: