December

December brings the year almost to its end. It is a good time to look back at what I have gone through this year. 2019 has been a bumpy year for me. There are a few topics I want to highlight here.

People assume you know everything, and I presume Google answers everything. In one of my recent tasks, two colleagues and I were asked to work on some reports. We were briefly walked through the task. As usual, it was a simple introduction. There was no documentation of the requirements specification, no heads-up about how the users use the reports, and the instructions were blurry. It seemed to be freestyle, but it ended up with rework. In the end, I received a familiar question, “When is the report going to be delivered?”

Working for the first time with limited information about what I was supposed to present to the users and what the standard for the work was frustrated me a lot. I did a lot of googling to get my questions answered. I realized I need to join other user groups to get community support, because I have nobody to ask. Things got worse when one person told me to use method A and another person told me to use method B. There were no clear instructions. The same thing happened to me six years ago.

I have a dilemma when the talk slowly becomes cheap and unconvincing and the expected result cannot be delivered. Also, the word “easy” is often uttered, and it has become a toxic word to me. For example, I went through a series of processes to identify all the needed database tables. I had to link up all these tables to form their relationships before the work could be done. Someone who had done nothing just used the word “easy” to describe the entire episode. It demotivated me further.

I have dealt with red tape and drawn lines. Processes are useful when standard procedures help to streamline the work. Learning the processes and procedures was one of the things I did recently when I joined my current company. However, I found time and effort wasted because of these processes too. Processes can be painful. Are we humans so rigid about the words used in request forms or emails that we will not do extra work, even when that work is still within our means and budget?

Lessons learned
No one is permanent.
I changed jobs about four months ago, leaving a job where I thought I would spend my next five years working with the team as a family. Things did change, and life has to move on. When I started my new job, I saw colleagues leaving the team as well. It brought me to the point where I began to think that I am not permanent in my team either. My dictionary suddenly removed the word ‘permanent’. However, I feel happier and lighter with this thought in my mind.

I have started to appreciate the great people I met along the way more. One is a senior who taught me how to solve problems and guided me throughout the project that we both worked on. He is a good advisor with knowledge of multiple domains. I am glad that I know someone similar in my current team, and I hope to work with him closely.

Next year's goal

I am not a person who likes to make New Year's resolutions. However, I have a wish board that lists the things I want. I am inspired to conduct, learn and teach. I have a few topics in mind that I want to try. I have done none so far, even with my volunteer group. I found that https://www.mci.gov.sg/digitalparticipationpledge may be a good starting point for me to begin contributing.

Continuous Intelligence

As part of the assignment, I would like to write something on the chosen topic, Continuous Intelligence. Continuous intelligence plays a major role in most digital business transformation projects. It is a growing part of enterprise analytics and BI strategies.

Definition

Continuous intelligence is a design pattern in which real-time analytics are integrated into a business operation, processing current and historical data to prescribe actions in response to business moments and other events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management (BRM), and machine learning (ML). This definition is taken from Gartner Research.

What can you do with Continuous Intelligence?

Continuous intelligence enables companies to deliver better outcomes from a broad range of operational decisions since it involves more relevant, real-time data in decision-making algorithms. Individuals can make sense of extreme volumes of data in milliseconds, evaluating more alternatives in greater detail than humanly possible without access to real-time data and processing.

Gartner estimates that, within 3 years, more than 50% of all business initiatives will require continuous intelligence, leveraging streaming data to enhance real-time decision-making.

Combining all these forms of artificial intelligence (AI) with continuous intelligence drawing from geospatial, real-time, and historical analytics can further enhance a business's ability to know where assets and people are at all times and help predict what might occur next.

Adding rules engines and programmatic logic to AI and location data enables organizations to automate many decisions that previously required human insight. From predictive maintenance based on actual driving conditions to deciding the best next action to take with customers to improve loyalty, leading companies are decreasing costs and improving revenues to become more successful.

What are the Challenges?

What makes continuous intelligence difficult is feeding a business’s analytics systems with high volumes of real-time streaming data in a way that is robust, secure, and yet highly consumable. The ability to combine “always-on,” streaming data ingestion and integration with real-time complex event processing, enrichment with rules and optimization logic, and streaming analytics is key to enabling Continuous Intelligence.
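To make the idea of combining streaming data with rules a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name prescribe_actions, the window size, the threshold, and the sample readings are all made up, and a real platform would use a proper event stream processing engine instead of a plain loop.

```python
from collections import deque
from statistics import mean


def prescribe_actions(events, window_size=3, threshold=1.5):
    """Yield (value, action) for each event in a stream.

    Hypothetical rule: flag a reading that exceeds `threshold` times
    the average of the most recent `window_size` readings.
    """
    history = deque(maxlen=window_size)  # the recent "historical" data
    for value in events:
        if len(history) == window_size and value > threshold * mean(history):
            action = "alert"
        else:
            action = "ok"
        yield value, action
        history.append(value)  # the current event becomes history


# Made-up sensor readings standing in for a real-time stream.
readings = [10, 11, 9, 10, 30, 12]
results = list(prescribe_actions(readings))
actions = [action for _, action in results]
```

Each event is compared against recent history as it arrives, which is the "current and historical data" part of the definition in a very small form.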

Many data analytics organizations lack experience with Continuous Intelligence, or are unsure how to start their Continuous Intelligence journey to keep up with growing business demand.

Continuous Intelligence requires the building of new capabilities, skills and technologies. The challenge for data and analytics leaders is to understand how these differ from existing practice.

Why Use Continuous Intelligence in DevOps/DataOps

If you are considering DevOps as a strategy to adopt continuous innovation, your data strategy has to evolve, too. Traditional BI has too many silos and too much human intervention to support your move to an agile system.

At this point, I would like to add that in my current project, some of my team members, who work in the agile system, try to implement the ETL (Extract, Transform, Load) processes by following the agile methodology. Some time ago, I went to an agile workshop, and I have forgotten some of the concepts. It is a good time to read up on them again.

Open Data Science’s article “Why Use Continuous Intelligence in DevOps/DataOps” says that businesses look for continuous innovation. Those who do not may put out shoddy products. Your data strategy, therefore, has to be seamless, frictionless, and automated.

Artificial Intelligence

The article adds, “Artificial Intelligence is capable of continually combing data, looking for patterns as data updates. Continuous intelligence allows you to analyze this data accurately and in real-time. The other piece could be letting go of data wrangling. Until you have deployed Continuous Intelligence, data wrangling remains a huge and functional part of your data management plan.”

Gartner identifies six defining features of CI.

  1. Fast: Real-time insight keeps up with the pace of change in the modern age.
  2. Smart: The platform is capable of processing the type of data you get, not the type you wish you had.
  3. Automated: Human intervention is rife with mistakes and wastes your team’s time.
  4. Continuous: Real-time analytics requires a system that works around the clock.
  5. Embedded: It’s integral to your current system.
  6. Results-focused: It should go without saying, but data means nothing without insight. Your program should deliver those insights. Don’t forget the results in the search for more data.

Once you let go of batch processing and silos, moving towards an agile framework is a reality with CI.

Open Data Science

Your team has access to these insights to direct new inquiries and drive brainstorming, pivot during sprints, and reach a frictionless state in which data flows in and insights become the next iteration of a product or a new product altogether.

With this information, I have a vision; I wish to move into Continuous Intelligence and bring this agile methodology into my project.

References:
https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/
https://www.rtinsights.com/what-can-you-do-with-continuous-intelligence/
https://medium.com/@ODSC/why-use-continuous-intelligence-in-devops-dataops-b6bc0a448b7a

Assignment Topic: Continuous Intelligence

My assignment topic is Continuous Intelligence, and I was told to refer to Gartner Research. Gartner is a global research and advisory firm providing information, advice, and tools for businesses in IT, finance, HR, customer service and support, legal and compliance, marketing, sales, and supply chain functions.

My lecturer advised us to refer to this website to complete our research papers. My school provides a link to the Gartner Research papers for all students.

A small introduction to Continuous Intelligence: Gartner identified Continuous Intelligence as one of the top 10 technology trends for data and analytics for 2019.

On the website, Gartner defines Continuous Intelligence as “a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. It provides decision automation or decision support. Continuous intelligence leverages multiple technologies such as augmented analytics, event stream processing, optimization, business rule management, and Machine Learning.”

Reference: https://www.striim.com/blog/2019/05/gartner-identifies-continuous-intelligence-as-top-10-trend-for-2019/

Database Model

While checking the DB-Engines ranking just now, I found that the site lists the available database management systems according to their popularity. The website is updated monthly, and you are able to see the current ranking, the previous month's ranking, and the ranking from one year ago.

The picture above shows the list of the top 10 database management systems. Another exciting part of the website is the list of database models.

Relational DBMS

Relational database management systems (RDBMS) support the relational or table-oriented data model. The schema of a table (the relation schema) is defined by the table name and a fixed number of columns (attributes) with fixed data types. A record corresponds to a row in the table (entity) and consists of the values of each column. A relation thus consists of a set of uniform records, according to the website.

Table schemas are generated by normalization in the process of data modeling. Certain basic operations are defined on relations. For example:

  • Classical set operations (union, intersection, and difference)
  • Selection (selection of a subset of records according to certain filter criteria for the attribute values)
  • Projection (selecting a subset of attributes/columns of the table)
  • Join: special conjunction of multiple tables as a combination of the Cartesian product with selection and projection.
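The operations listed above can be tried out with the sqlite3 module that ships with Python. This is only a small sketch: the staff and dept tables and their rows are made up for illustration, and the SQL shown is one possible way of expressing each operation.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE dept  (dept_id INTEGER, dept_name TEXT);
    INSERT INTO staff VALUES (1, 'Ann', 10), (2, 'Ben', 20), (3, 'Cat', 10);
    INSERT INTO dept  VALUES (10, 'Data'), (20, 'Ops');
""")

# Selection: keep only the records that match a filter on attribute values.
selection = conn.execute(
    "SELECT * FROM staff WHERE dept_id = 10 ORDER BY id"
).fetchall()

# Projection: keep only a subset of the columns.
projection = conn.execute("SELECT name FROM staff ORDER BY id").fetchall()

# Join: combine two tables on a shared column.
joined = conn.execute("""
    SELECT s.name, d.dept_name
    FROM staff AS s
    JOIN dept AS d ON s.dept_id = d.dept_id
    ORDER BY s.id
""").fetchall()

# Set operation: union of the dept_id values found in both tables.
union = conn.execute(
    "SELECT dept_id FROM staff UNION SELECT dept_id FROM dept ORDER BY dept_id"
).fetchall()

conn.close()
```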

Document Stores

Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data. According to the website, that means:

  • Records do not need to have a uniform structure, i.e. different records may have different columns.
  • The types of values of individual columns can be different for each record.
  • Columns can have more than one value (arrays).
  • Records can have a nested structure.

Document stores often use internal notations, which can be processed directly in applications, mostly JSON. JSON documents, of course, can also be stored as pure text in key-value stores or relational database systems.
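To picture what schema-free means, here is a small sketch using plain Python dictionaries and the json module. The records are made up, and a real document store adds indexing and querying on top of this idea.

```python
import json

# Made-up records: each one has a different shape.
documents = [
    # A column holding more than one value (an array).
    {"_id": 1, "name": "Ann", "skills": ["SQL", "Python"]},
    # A nested structure, plus a column the first record does not have.
    {"_id": 2, "name": "Ben", "age": 41,
     "address": {"city": "Singapore", "zip": "123456"}},
    # The same column ("age") holding a different type of value.
    {"_id": 3, "name": "Cat", "age": "unknown"},
]

# JSON text is what could also be stored as pure text elsewhere.
text = json.dumps(documents)
restored = json.loads(text)

# Applications can walk the nested structure directly.
cities = [doc["address"]["city"] for doc in restored if "address" in doc]
```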

Key-value Stores

Key-value stores are probably the simplest form of database management systems. They can only store pairs of keys and values, as well as retrieve values when a key is known.

These simple systems usually are not adequate for complex applications. On the other hand, it is exactly this simplicity that makes such systems attractive in certain circumstances. For example, resource-efficient key-value stores are often applied in embedded systems or as high-performance in-process databases.

Advanced Forms

An extended form of key-value stores is able to sort the keys, and thus enables range queries as well as ordered processing of keys. Many systems provide further extensions so that we see a fairly seamless transition to document stores and wide column stores.
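A rough sketch of the extended form could look like the class below. The name SortedKeyValueStore and the sample keys are my own assumptions for illustration; real systems use much more efficient structures than a Python list.

```python
from bisect import bisect_left, bisect_right


class SortedKeyValueStore:
    """Toy key-value store that also keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []   # keys kept sorted, to allow range queries
        self._data = {}   # plain key -> value lookup

    def put(self, key, value):
        if key not in self._data:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._data[key] = value

    def get(self, key, default=None):
        # The basic operation: retrieve a value when the key is known.
        return self._data.get(key, default)

    def range(self, start, end):
        # The extension: all pairs with start <= key <= end, in key order.
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


store = SortedKeyValueStore()
store.put("user:003", "Cat")
store.put("user:001", "Ann")
store.put("order:9", "pending")
store.put("user:002", "Ben")

first_two_users = store.range("user:001", "user:002")
```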

Search Engines

Search engines are NoSQL database management systems dedicated to the search for data content. In addition to general optimization for this type of application, the specialization consists of typically offering the following features:

  • Support for complex search expressions
  • Full text search
  • Stemming (reducing inflected words to their stem)
  • Ranking and grouping of search results
  • Geospatial search
  • Distributed search for high scalability
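Some of the features in the list (full text search, stemming, and ranking) can be sketched with a tiny inverted index. Everything here is a simplification of my own: the stemmer only strips a few English suffixes, the ranking is a plain count of matching terms, and the sample documents are made up.

```python
import re
from collections import defaultdict


def stem(word):
    """Very crude stemmer (an assumption, not a real algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    # term -> {doc_id: number of occurrences}
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    # Rank documents by how often the query terms occur in them.
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    1: "Importing csv files",
    2: "Import a file, then import another file",
    3: "Enabling SSL",
}
index = build_index(docs)
hits = search(index, "imported files")
```

Because of the stemming step, the query "imported files" still finds documents that only contain "import", "importing" or "file".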

Wide Column Stores

As mentioned above, wide column stores, also called extensible record stores, store data in records that can hold huge numbers of dynamic columns. Since the column names, as well as the record keys, are not fixed, and since a record can have billions of columns, wide column stores can be seen as two-dimensional key-value stores.

Wide column stores share the characteristic of being schema-free with document stores. However, the implementation is very different. Wide column stores must not be confused with the column-oriented storage in some relational systems. The latter is an internal concept for improving the performance of an RDBMS for OLAP (Online Analytical Processing) workloads, and it stores the data of a table not record after record but column by column.
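The "two-dimensional key-value store" view can be pictured as a dictionary of dictionaries. This is only a sketch with made-up row keys and column names; it says nothing about how a real wide column store lays out its data.

```python
from collections import defaultdict

# Outer key: record (row) key. Inner key: column name. Neither is fixed.
table = defaultdict(dict)


def put(row_key, column, value):
    table[row_key][column] = value


# Each record can carry its own set of dynamic columns.
put("user:1", "name", "Ann")
put("user:1", "login:2019-12-01", "web")
put("user:1", "login:2019-12-05", "mobile")
put("user:2", "name", "Ben")

columns_user1 = sorted(table["user:1"])
columns_user2 = sorted(table["user:2"])
```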

Graph DBMS

Graph DBMS, also called graph-oriented DBMS or graph databases, represent data in graph structures as nodes and edges, where the edges are the relationships between nodes. They allow easy processing of data in that form and simple calculation of specific properties of the graph, such as the number of steps needed to get from one node to another. Graph DBMS usually do not provide indexes on all nodes, so direct access to nodes based on attribute values is not possible in these cases.
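The "number of steps" example can be sketched with a breadth-first search over nodes and edges. The function name steps_between and the sample people are assumptions for illustration, and the edges are treated as undirected; a graph DBMS would offer this through its own query language.

```python
from collections import deque


def steps_between(edges, start, goal):
    """Return the fewest edges between two nodes, or None if unreachable."""
    # Build an adjacency map, treating every edge as undirected.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Made-up "knows" relationships between people.
edges = [("Ann", "Ben"), ("Ben", "Cat"), ("Cat", "Dan"), ("Ann", "Eve")]
ann_to_dan = steps_between(edges, "Ann", "Dan")
```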

Time Series DBMS

A Time Series DBMS is a database management system that is optimized for handling time series data, in which each entry is associated with a timestamp.

A Time Series DBMS is designed to efficiently collect, store, and query various time series with high transaction volumes. Although time series data can be managed with other categories of DBMS (from key-value stores to relational systems), the specific challenges often require specialized systems.
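As a small illustration of entries tied to timestamps, the sketch below keeps readings ordered by time and answers a time-range query. The class name TimeSeries and the temperature values are made up, and a specialized system would do this at far higher volumes.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class TimeSeries:
    """Toy time series: values kept in timestamp order."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        # Insert in order, so late-arriving entries still land correctly.
        i = bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._values.insert(i, value)

    def between(self, start, end):
        # All entries with start <= timestamp <= end.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


series = TimeSeries()
series.append(datetime(2019, 12, 1, 9, 0), 21.5)
series.append(datetime(2019, 12, 1, 9, 10), 22.5)
series.append(datetime(2019, 12, 1, 9, 5), 22.0)  # arrives late

window = series.between(datetime(2019, 12, 1, 9, 0),
                        datetime(2019, 12, 1, 9, 5))
```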

I hope the information extracted from the website is able to help us understand the differences between the database models.

References: https://db-engines.com/en/ranking

Life in this November

It has been a while since the last write-up in my blog. My work has been piling up, and I am busy with my school assignments too. This month has been quite impressive, both at work and in my studies. I did not realize that my school had signed up for a Grammarly premium package for all its students. I had wanted to purchase the premium package before this, but I did not proceed (even with a 40% discount) because I stopped writing technical documents.

It came quite suddenly: they decided not to renew the contract. It was a great two months of experience writing the technical guides and documents for a product. This experience improved my English and writing skills. I am going to continue writing for myself.

I learned what the passive voice and the active voice are in writing. I went from not performing well to getting the feedback, ‘I am there, but I take a longer route to reach.’ However, I felt quite relieved because I wanted to concentrate on my studies and assignments. This month, there are three assignments. I have completed one of them, and two more are to be completed by the first and second weeks of December, just before the school break.

As for work, I will concentrate on my role of getting things standardized. I have been working on data models, standard codes, etc. I met many people in the last two months. I managed to catch up with some of them again on occasions such as meetings, and with others during team bonding sessions or town hall meetups.

I will try to continue writing until the end of the year. Hopefully, you can give me some feedback about what I should share more of next year. If you have any feedback, please write to me in this Google form.

MongoDB: Importing csv files with mongoimport

Someone asked for my help to upload some .csv files to a MongoDB database and back up the database before sending the file to the next person. I completed the import with the command below. It imports a .csv file into the selected database and collection by specifying the file type, the file location, and whether the file has a header line. It runs on both Linux and Windows machines using the terminal or command prompt.
mongoimport -d mydb -c things --type csv --file locations.csv --headerline
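To show what the --headerline option means, here is a small Python sketch that treats the first line of a CSV file as the field names, which is roughly how each row ends up as one document. The sample rows and the function name rows_as_documents are made up, and the sketch does not talk to MongoDB at all. Values stay as strings here, while mongoimport applies its own type handling.

```python
import csv
import io

# Made-up content standing in for locations.csv.
sample = (
    "name,lat,lng\n"
    "Merlion,1.2868,103.8545\n"
    "Changi,1.3644,103.9915\n"
)


def rows_as_documents(csv_text):
    """Use the first line as field names and return one dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


docs = rows_as_documents(sample)
```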

MongoDB: Enabling SSL for MongoDB

Some time ago, I was told to prepare and work with a team of people to explore TLS/SSL for MongoDB and to use Windows Active Directory users to access MongoDB via LDAP authentication.

It is an administrative task on MongoDB that enhances security between client and server during data transmission. By default, connections to MongoDB servers are not encrypted. It is highly advisable to ensure all connections to the mongod are encrypted with Transport Layer Security (TLS), which is often still referred to as SSL.

The definition of TLS/SSL from Google is: TLS and its now-deprecated predecessor, Secure Sockets Layer (SSL), are protocols designed to provide communications security over a computer network.

For a development server, you can enable TLS using a self-signed certificate. OpenSSL allows us to generate the self-signed certificate on the server itself. On a Windows machine, you need to install OpenSSL before you can generate the self-signed certificate. Then, you need to install the self-signed certificate. On a Linux Ubuntu machine, no installation is required for OpenSSL or the self-signed certificate.

You can enable TLS/SSL mode in the MongoDB configuration file (mongod.conf on a Linux machine or mongod.cfg on a Windows machine). You need to specify the path to the .pem file. If you are working on a production server, you are required to include the CAFile path too. A CA, or certificate authority, is an entity that issues digital certificates.

net:
   tls:
      mode: requireTLS
      certificateKeyFile: /etc/ssl/mongodb.pem
      CAFile: /etc/ssl/caToValidateClientCertificates.pem
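Once the server requires TLS, the client has to ask for it as well. One way is through the tls and tlsCAFile options of the MongoDB connection string. The helper below only builds such a string and does not connect to anything. The function name build_tls_uri, the host, the port, and the CA file path are my own assumptions for illustration.

```python
from urllib.parse import urlencode


def build_tls_uri(host, port, ca_file, database="admin"):
    """Build a MongoDB connection string that asks for TLS.

    `ca_file` is the CA certificate the client uses to validate the
    server certificate (hypothetical path in the example below).
    """
    # urlencode percent-encodes the path, e.g. "/" becomes "%2F".
    options = urlencode({"tls": "true", "tlsCAFile": ca_file})
    return f"mongodb://{host}:{port}/{database}?{options}"


uri = build_tls_uri("localhost", 27017, "/etc/ssl/ca.pem")
```

A driver or the shell could then be given this string, but please check the documentation of the client you use for the exact options it supports.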

For more reading about MongoDB TLS/SSL, refer to this document: https://docs.mongodb.com/manual/tutorial/configure-ssl/