The data warehouse is the core of the Business Intelligence system. Being a database administrator in my previous job, I managed various data warehouses that used to provide meaningful insights into business decision making, data analysis and reporting—for example, the sales reports and sales trends. The data warehouse is not a product, but it is an environment that consists of the current and historical, in both structured and unstructured data. The data warehousing is a process transforming data into information and making it available for the business users. The data is ingested, cleaned and transformed so that the business users can access the processed data by using any Business Intelligence tools such as Tableau, Power BI or SQL clients to analyze data. There is a difference between data warehousing and data mining.
Data warehousing is different than data mining
Types of Data Warehouse
Enterprise Data Warehouse:
- It is a centralized warehouse.
- It provides decision support service across the enterprise.
- It offers a unified approach for organizing and representing data.
- It provides the ability to classify data according to the subject and give access according to those divisions.
Operational Data Store:
- Data store required when neither Data warehouse nor OLTP systems support organizations reporting needs.
- The data warehouse is refreshed in real-time.
- A data mart is a subset of the data warehouse.
- It specially designed for a particular line of business, such as sales, finance, sales or finance.
- Data can collect directly from sources.
Star Schema vs. Snowflake Schema
Knowing the differences between star and snowflake schema is one of the most common interview questions. Star schema is a mature modelling approach widely adopted by relational data warehouses. It requires modellers to classify their model tables as either dimension or fact. Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can hold a huge number of rows and continue to grow over time.
Snowflake Schema is a type of multidimensional model. In snowflake schema contains the fact table, dimension tables and one or more than tables for each dimension table. Snowflake schema is a normalized form of star schema which reduce the redundancy and saves the significant storage. It is easy to operate because it has less number of joins between the tables and in this simple and less complex query is used for accessing the data from database.
Metadata is also known as a data dictionary or data specification document. It provides the detail of the data such as the name of the column (attribute), the data type of the column whether it is a string, a number, etc, the order of the columns, the definition of nullable columns and etc. Datasets are made up of data objects (entity). After we identify the metadata, we shall design the conceptual, logical and physical data model.
Relationship and Cardinality
Relationship refers to how to link the tables (entities) together. According to the online Visual Paradigm website (a data modelling tool website), cardinality defines the possible number of occurrences in one entity which is associated with the number of occurrences in another. When you do a Google search to define cardinality in general terms, it returns a definition as “the number of elements in a set or other grouping, as a property of that grouping”. For example, ONE team has MANY players. Below diagram is extracted from lucidchart website which shows the different types of relationship in the database. It is used in the ER diagram, entity-relationship diagram.
Generally, the common types of cardinality are:
- One to one
- One to many
- Many to many
In term of data query optimization, cardinality means the data in a column of a table, specifically how many unique values are in it. If you have done data profiling by using the Microsoft Power BI, for example, you notice Power BI provides a summary statistics of the table that show if the attributes contain unique values. This information helps with planning queries and optimizing the execution plans.
Conceptual, Logical and Physical Data Model
A general understanding to the three data models is that business analyst uses a conceptual and logical model to model the business objects exist in the system, while database designer or database engineer elaborates the conceptual and logical ER model to produce the physical model that presents the physical database structure ready for database creation.
The enterprise data modelling, metadata management and data warehousing are part of the Data Governance (DMBOK) that I mentioned in my previous posts. I learned it in my current job, and I think it is useful for those who are serious in building an enterprise data warehouses. A part of my job scope includes:
- Design and develop semantic data models.
- Design metadata and harmonized code references.
- Identify business keys and relationships among the tables.
- Ensure to involve all stakeholders including business users in Data warehouse implementation process.