MongoDB: The Best Way to Work With Data

Relational databases have a long-standing position in most organizations, which made them the default way to think about storing, using and enriching data. However, modern applications present new challenges that stretch the limits of what is possible with a relational database. A relational database uses a tabular data model: data is normalized, stored across many tables and linked by foreign keys.

Document Model

In contrast, MongoDB uses a document data model and presents data in a single structure, with related data embedded as sub-documents and arrays. A customer object, for example, can be modeled as a single document with embedded sub-documents and arrays.
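As a rough illustration of the idea, a customer document might look like the sketch below. The field names and values are invented for this example and are not taken from any real schema.

```javascript
// Hypothetical customer document: all field names and values are made up.
const customer = {
  _id: 1001,
  name: { first: "Jane", last: "Doe" },
  email: "jane.doe@example.com",
  // Embedded sub-document: lives inside the customer, not in a separate table.
  loyalty: { tier: "gold", points: 1250 },
  // Array of embedded sub-documents: one-to-many without foreign keys.
  addresses: [
    { type: "home", city: "Kuala Lumpur", country: "MY" },
    { type: "work", city: "Singapore", country: "SG" }
  ],
  // Array of plain values.
  phones: ["+60-12-000-0000", "+65-0000-0000"]
};

// Related data is reached by walking the structure, not by joining tables.
const cities = customer.addresses.map((a) => a.city);
console.log(cities.join(", "));
```

In a relational model, the addresses and phone numbers would typically sit in their own tables and be linked back to the customer by foreign keys.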

Flexibility: Dynamically Adapting to Changes

The fields of MongoDB documents can vary from document to document within a single collection. There is no need to declare the structure of documents to the system; documents are self-describing. If a new field needs to be added to a document, it can be added without affecting any other document in the collection. In a relational database, by contrast, we would need to run an ‘ALTER TABLE’ operation.
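The sketch below shows two documents of different shapes going into the same collection. The collection name `contacts`, the field names and the helper `insertPeople` are assumptions made for this example; the helper is meant to be called from mongosh, where `db` is the current database handle.

```javascript
// Hypothetical documents for one collection; the second has an extra field.
const people = [
  { name: "Ann", surname: "Lee" },
  { name: "Bob", surname: "Tan", twitter: "@bobtan" }
];

// Intended for mongosh: pass in the `db` handle the shell provides.
function insertPeople(db) {
  // insertMany accepts documents of different shapes in one call.
  return db.getCollection("contacts").insertMany(people);
}

// Each document describes itself: collect the distinct field names in use.
const fieldNames = [...new Set(people.flatMap((p) => Object.keys(p)))];
console.log(fieldNames);
```

No schema change is issued before the second document is inserted; the extra `twitter` field simply exists on that one document.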

Schema Governance

While MongoDB allows a flexible schema, it also provides schema validation within the database. JSON Schema validation is available from MongoDB version 3.6 onward. The JSON Schema validator allows us to define a fixed schema and validation rules directly in the database, which frees developers from enforcing them at the application level. With this, we can apply data governance standards to the schema while keeping the benefits of a flexible document model.

Below is a sample validation rule:

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   }
}})

So, is it also possible to apply validation rules to an existing collection? Yes: we just need to use the collMod command instead of the createCollection command.

db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
      }
   },
   validationLevel: "moderate",
   validationAction: "warn"
})

Having a Really Fixed Schema

By default, MongoDB allows fields that are not mentioned in the validation rules to be inserted into the collection. If we would like to be more restrictive and have a really fixed schema for the collection, we need to add the following parameter to the validation rule:

additionalProperties: false

The MongoDB script below shows how to use this parameter.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        required: [ "name", "age" ],
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
}})
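To see the effect of such a validator, one option is to attempt an insert and observe whether the server accepts it. The helper below is a minimal sketch, assuming it is called from mongosh with the shell's `db` handle; the name `tryInsert` is made up for this example.

```javascript
// Attempt an insert and report whether the server accepted the document.
// `db` is the database handle that mongosh provides.
function tryInsert(db, collectionName, doc) {
  try {
    db.getCollection(collectionName).insertOne(doc);
    return { accepted: true, reason: null };
  } catch (err) {
    // A validation failure surfaces as a thrown write error.
    return { accepted: false, reason: err.message };
  }
}
```

In mongosh, a call such as `tryInsert(db, "people2", { name: "Ann", age: NumberInt(30), nickname: "A" })` would be expected to come back with `accepted: false`, because `nickname` is not listed under `properties`, while the same document without `nickname` should be accepted.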

Speed: Great Performance

Most MongoDB queries do not need to JOIN multiple records, because related data is embedded in the same document. Should your application require it, MongoDB does provide the equivalent of a JOIN: the $lookup aggregation stage, which was introduced in version 3.2. See the MongoDB documentation on $lookup for further reading.

I will stop here for now and return with more information in my next write-up, or continue from this post. Stay tuned.
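A $lookup stage can be sketched as follows. The collections `orders` and `customers`, the field `customer_id` and the helper `findOrdersWithCustomer` are hypothetical names chosen for this example; the helper is meant to be called from mongosh with the shell's `db` handle.

```javascript
// Hypothetical collections: `orders` documents carry a `customer_id`
// that matches `_id` in `customers`.
const ordersWithCustomer = [
  {
    $lookup: {
      from: "customers",         // collection to join with
      localField: "customer_id", // field in the orders documents
      foreignField: "_id",       // field in the customers documents
      as: "customer"             // name of the output array field
    }
  }
];

// Intended for mongosh: pass in the `db` handle the shell provides.
function findOrdersWithCustomer(db) {
  return db.getCollection("orders").aggregate(ordersWithCustomer).toArray();
}
```

Each returned order carries a `customer` array holding the matching customer documents, which plays the role of the joined row in SQL.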

Database Stability

This is one of the common questions asked, whether during a talk or in an interview. Personally, I rate this topic highly, and I think it is important for every database administrator to pay attention to it.

Slow performance means tasks take longer to complete. The longer a task takes, the more likely it is to overlap with others when multiple users or connections are active at the same time. This leads to frequent locks, deadlocks and resource contention, and eventually to errors and stability issues.

Poor scalability means there are limited options when demand exceeds capacity: either queue requests or reject them. Rejecting requests results in errors or unexpected behaviour, which is instability. Queuing requests reduces performance and puts extra demand on resources such as CPU and memory. As demand increases, this leads to further stability issues.

Poor stability affects performance. Partial success and partial failure must be handled, usually with database rollbacks or manual compensation logic. Either way, rolling back or running the compensation logic places additional resource requirements on the system, and that in turn affects scalability.

On the MSDN website, I found that someone had shared an important point about designing a database or an application: always consider performance, scalability and stability when architecting, building and testing your databases and applications.

MongoDB Indexes

Indexes

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB performs a collection scan: it scans every document in a collection to select the documents that match the query statement.

Default _id Index

MongoDB creates a unique index on the _id field when a collection is created. This index prevents two documents from having the same value for the _id field. MongoDB also supports the creation of user-defined ascending/descending indexes.

Index Types

  • Single Field Index – a single field.
  • Compound Index – multiple fields. The order of fields in a compound index is significant.
  • Multikey Index – indexes the content stored in arrays.
  • Geospatial Index – supports efficient queries of geospatial coordinate data.
  • Text Index – supports searching for string content in a collection.
  • Hashed Index – supports hash-based sharding.

The syntax to create MongoDB indexes based on the index types above is shown below:

// Single field index
// Syntax: db.collection.createIndex( <key and index type specification>, <options> )
db.collection.createIndex( { name: -1 } )

// Compound index
// Syntax: db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
db.collection.createIndex( { "item": 1, "stock": 1 } )

// Multikey index: used when any indexed field is an array
// Syntax: db.collection.createIndex( { <field>: < 1 or -1 > } )
db.collection.createIndex( { ratings: 1 } )

// Multikey index on fields of embedded documents
db.collection.createIndex( { "stock.size": 1, "stock.quantity": 1 } )

// Text index, using the keyword "text"
// Syntax: db.collection.createIndex( { <field>: "text" } )
db.collection.createIndex(
   {
     subject: "text",
     comments: "text"
   }
 )

// Hashed index, using the keyword "hashed"
db.collection.createIndex( { _id: "hashed" } )

A value of ‘-1’ creates a descending index on the field, while a value of ‘1’ creates an ascending index.

MongoDB: Schema Planning Tips

MongoDB is advertised as being “schemaless”. That does not mean you do not need to design your database schema, or that no schema applies to MongoDB. It is a good idea to enforce some schema validation when data is inserted into the collections, for better performance and scalability. Designing the schema can be tedious, yet it can be fun too.

Avoid Growing Documents

MongoDB limits a single document to 16MB. It is advisable not to let your documents grow in size continuously, because:

  • It can degrade database and I/O performance.
  • A badly designed schema can sometimes cause queries to fail.

Avoid Updating Whole Documents

When you update, try to avoid replacing the whole document, because MongoDB will rewrite the whole document elsewhere in memory, which degrades write performance in your database. Instead, you can use field modifiers to update only specific fields in the document. This triggers an in-place update in memory and so improves performance.
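A field-level update can be sketched with the $set and $inc update operators. The collection `customers`, the field names and the helper `updateFields` are assumptions made for this example; the helper is meant to be called from mongosh with the shell's `db` handle.

```javascript
// Which document to change (hypothetical _id value).
const filter = { _id: 1001 };

// Field modifiers: only the named fields are touched.
const fieldUpdate = {
  $set: { email: "jane.new@example.com" }, // overwrite one field
  $inc: { loginCount: 1 }                  // increment a counter
};

// Intended for mongosh: pass in the `db` handle the shell provides.
function updateFields(db) {
  // By contrast, replaceOne(filter, wholeDocument) would send and
  // replace the entire document.
  return db.getCollection("customers").updateOne(filter, fieldUpdate);
}
```

Only the email and the counter travel to the server here; the rest of the document is left as it is.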

Avoid Application-Level Joins

Apart from the $lookup aggregation stage, MongoDB does not support server-side joins, so related data often has to be fetched with several queries and joined at the application level. If we are working with a large amount of data, calling the database several times to get the necessary data obviously takes more time. When your application relies heavily on joins, it makes more sense to denormalize the schema. You can use embedded documents to get all the required data in a single query.

A typical use case for an embedded document is putting a person's addresses in an array inside the Person object.

The advantage of an embedded document is that you do not have to perform a separate query to get the embedded details. The disadvantage is that you have no way to access the embedded details as standalone entities.
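The Person case can be sketched as below. The field names, the collection `people` and the helper `findPeopleInCity` are invented for this example; the helper is meant to be called from mongosh with the shell's `db` handle.

```javascript
// Hypothetical Person document with its addresses embedded as an array.
const person = {
  name: "Ann Lee",
  addresses: [
    { type: "home", street: "1 Jalan Example", city: "Kuala Lumpur" },
    { type: "work", street: "2 Example Road", city: "Singapore" }
  ]
};

// Once the person is loaded, the addresses are already there.
const homeAddress = person.addresses.find((a) => a.type === "home");
console.log(homeAddress.city);

// Intended for mongosh: dot notation reaches into the embedded array,
// but the result is still whole person documents, not bare addresses.
function findPeopleInCity(db, city) {
  return db.getCollection("people").find({ "addresses.city": city }).toArray();
}
```

The query helper illustrates the trade-off: an address can be used as a search condition, but it always comes back wrapped in the person it belongs to.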

Field Names Take Up Space

The length of field names matters little at first. When you get up to billions of records, however, it significantly affects your index size. Disk space is cheap but RAM is not.

Use Proper Indexing

If no index is available on the sort field, MongoDB is forced to sort without an index. There is a memory limit of 32MB on the total size of all documents involved in the sort operation. If MongoDB hits that limit, it may either produce an error or return an empty dataset. It is also important not to add unnecessary indexes, because every index has to be updated whenever documents are updated in the database. Unnecessary indexes:

  • degrade database performance.
  • occupy disk space and memory.
  • can lead to storage-related problems as their number grows.

One more way to optimize index use is to override the default _id field. The only purpose of this field is to keep one unique field per document. If your data already contains a unique timestamp or id field, you can use it as the _id field and save one extra index.

If you create an index that contains all the fields you query on and all the fields returned by that query, MongoDB never needs to read the documents, because everything is contained within the index. This significantly reduces the need to fit all data into memory for maximum performance. Such queries are called covered queries.
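A covered query can be sketched as follows. The collection `people`, the field names and both helpers are assumptions made for this example. `looksCovered` is only a simplified check of the rule described above (it ignores cases such as array fields), and `runCoveredQuery` is meant to be called from mongosh with the shell's `db` handle.

```javascript
// Index, filter and projection that all use the same two fields.
const indexSpec = { surname: 1, name: 1 };
const query = { surname: "Lee" };
// _id is returned by default, so it is excluded to keep the query covered.
const projection = { _id: 0, surname: 1, name: 1 };

// Simplified check: every queried and returned field must be in the index.
function looksCovered(index, filter, fields) {
  const indexed = new Set(Object.keys(index));
  const returned = Object.keys(fields).filter((f) => fields[f] === 1);
  const idHandled = fields._id === 0 || indexed.has("_id");
  return (
    idHandled &&
    returned.length > 0 &&
    Object.keys(filter).every((f) => indexed.has(f)) &&
    returned.every((f) => indexed.has(f))
  );
}

// Intended for mongosh: pass in the `db` handle the shell provides.
function runCoveredQuery(db) {
  const people = db.getCollection("people");
  people.createIndex(indexSpec);
  return people.find(query, projection).toArray();
}

console.log(looksCovered(indexSpec, query, projection));
```

Leaving `_id: 0` out of the projection is a common way to lose coverage, since `_id` would then have to be read from the document unless it is part of the index.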

Read vs Write Ratio

The right schema for an application depends on whether it is read-heavy or write-heavy. For example, when we build a dashboard to display time-series data and a constant stream of data is loaded into the database, we should design the schema to maximize write throughput. If most of the operations in the application are reads, we should use a denormalized schema to reduce the number of calls to the database for getting data.

BSON Data Types

Make sure you define the BSON data types for all fields correctly while designing the schema, because when the data type of a field changes, MongoDB rewrites the whole document in a new memory space (which can cause the document to be moved).

SysOps By Trials and Errors

During a discussion with my big boss and two other colleagues, my big boss mentioned that the MongoDB sales account manager had not come back to him with the quotation for MongoDB Enterprise. One project may need the enterprise version for security reasons. He then suggested that I try out Percona Server for MongoDB.

According to Percona’s website, “Percona Server for MongoDB is our free and open-source drop-in replacement for MongoDB Community Edition. It offers all the features and benefits of MongoDB Community Edition, plus additional enterprise-grade functionality.”

Percona has made a feature comparison, which is available on their website.

Since I was curious too, I started setting up a virtual machine on my Linux machine on Friday. My IT guy told me to use VBox (VirtualBox), and so I did. I used the installer to complete the installation. It can be done from the command line too.

sudo apt-get update
sudo apt-get install virtualbox-6.0

For VBox installation, you can refer to this link.

Just as I was about to look for where to download the Ubuntu disk image, he sent me a message telling me where I could get it. Alternatively, you can get it from this link.

The setup and installation of Ubuntu 16.04 were all done in VBox using the .vdi image. I do not recall any complications during the Ubuntu installation. It was straightforward all the way.

Next, I needed to get Percona Server installed. I registered on Percona’s website to obtain a PDF copy of the documentation, which contains the installation guide and the setup of each feature. It was a useful document for setting up the server. On Monday, by trial and error, I installed the latest Percona Server, configured it and ran the service in the virtual machine.

The documentation describes plenty of features, with guides to implement them. I completed the configuration to use the Percona Memory Engine as the storage engine. I am not sure whether I configured it correctly, especially since the virtual machine runs on 1GB of memory and I set the Percona memory to 3GB. It is something I need to revisit after this.

Besides that, I tried to enable authorization. Immediately after it was enabled, I tried to launch my company’s product using the default authentication method, and the system returned errors because it was not authorized to use some of the databases. Authorization and authentication are different things, even though I had created a user credential in MongoDB to access those databases. It is also a good topic to revisit.

It gave me a day or two of experience as a system engineer, or SysOps as most people call them nowadays. Although it was not a full SysOps cycle, the experience of installing and setting up Ubuntu in the virtual machine, followed by installing Percona Server and Robo 3T for MongoDB, and lastly configuring and using the Percona server with my company’s product, was so great that I wanted to share it here today.

I always have the special privilege being a woman in the industry to have men to do this dirty job, but when a woman tries her hand to work on it, it is a beautiful piece of art!

Great thanks to my colleagues who were willing to help and guide me through this trial and error. At least, I did it for myself once!

MongoDB – MacOS Installation

I covered both the Windows and Linux installations of MongoDB in my recently updated blog posts. Unfortunately, I am not able to write much about the MacOS installation because I do not have the environment to try it on.

Nevertheless, there is plenty of material on the Internet that we can search for and follow. One source that is always kept up to date is MongoDB’s website, https://docs.mongodb.com/manual. It has a comprehensive guide on how to complete the installation.

From what I see, MacOS users first need to download the MongoDB tarball (.tgz). Next, extract the downloaded file using a command such as:

tar -zxvf mongodb-osx-ssl-x86_64-4.0.5.tgz

After that, a couple of setup steps need to be done. In two examples I saw online, the contents of the extracted folder were moved to another folder using a command similar to:

sudo mv mongodb-osx-ssl-x86_64-4.0.5 /usr/local/mongodb

By default, the mongod process uses the /data/db directory to store data. Use the command below to create the directory:

sudo mkdir -p /data/db 

If you wish to use a different directory, you must specify it with the dbpath option when starting the mongod process. The command is shown later in this post.

Let’s assume we keep the default location in this post.

Then, give your user ownership of the directory so that it can access it.
To check your machine’s username, use the command whoami, which returns the username. With this, we can set the permission using the command below:

sudo chown <username> /data/db

Lastly, we add mongodb/bin to the PATH in ~/.bash_profile. The steps are:
1. Type cd to go back to the home directory.
2. Type pwd to make sure you are in the directory /Users/<username>.
3. Type ls -al to list all the files in the directory, including hidden files. The .bash_profile is a hidden file.
4. If the .bash_profile file is not found, type touch .bash_profile to create it.
5. Type open .bash_profile to open the file.
6. Append these two lines at the end of the opened file:
export MONGO_PATH=/usr/local/mongodb
export PATH=$PATH:$MONGO_PATH/bin

7. Save the file.
8. Type source .bash_profile to reload the file.

Start the MongoDB server using the command mongod. You can then see whether MongoDB is running by looking for this output line in the terminal:
[initandlisten] waiting for connections on port 27017.

#Run without specifying path
mongod

#Run with specifying path
mongod --dbpath <data directory path>

Again, the default port for MongoDB is 27017.

MongoDB Installation in Windows 10

After being told by my senior last month, in December, to look into the NoSQL database called MongoDB, today my senior reminded me to start mastering it. Haha, I had stopped using the database after I completed a project which used MongoDB.

Prior to this, I installed MongoDB on Ubuntu, running in a VM. This time, I want to install it on Windows 10. The installation guide can be found on MongoDB’s website, https://docs.mongodb.com/manual/installation/#tutorials. The page shows how to install for different environments.

Download MongoDB Community Edition
For Windows, installation is done through the installer (.msi), which can be easily downloaded from the MongoDB Download Centre. The link to it can be found on the page given above.

Install MongoDB Community Edition
Run the installer file and follow the wizard to get the installation done. I chose the ‘Complete’ setup type, which is recommended for most users like you and me. It brings us to the Service Customization screen. According to the MongoDB website, from MongoDB 4.0 onward we can install MongoDB as a service during the installation. I did not change any of the presets on this screen. If you do not want to install MongoDB as a service, uncheck the checkbox.

What is the difference between installing as a service and not? Installing as a service:
– Lets us choose whether the service runs as the Network Service user (a Windows user account) or as a local or domain user (specify the username and password). It is similar to Microsoft SQL Server (MSSQL). However, it does not have a mixed mode that allows both.

– Lets us change the Service name; the default name is MongoDB. If we have another instance running with the same name, or simply want to use a different name, we can change it during the installation.

– Lets us choose the Data Directory and Log Directory, another part that is similar to MSSQL. Quite often we do not want to keep the data and logs on C:\. Here is where we can set the directories to another disk.

Install MongoDB Compass

Newer versions of the MongoDB installer install MongoDB Compass, a graphical user interface (GUI) for MongoDB, by default. Users who are familiar with the mongo shell or other GUIs can skip this by unchecking the checkbox. Then, proceed to complete the installation. It takes a few minutes.

Upon completion, MongoDB Compass loads if we chose to install it. In the earlier setup I did not use a local or domain user for the credential, so we can keep Authentication as “None”. The Hostname is “localhost”, as I am connecting locally on my machine, and the default MongoDB port is 27017.

Click Connect, and it should bring you to the database list screen. By default, it shows 3 default databases.

Alternatively, we can check whether the MongoDB service is running through the Windows Services Manager. To access Services:

1. Open the Run dialog box with Windows key + R.
2. Type services.msc.
3. A window listing the services appears (it could be quite a long list of services running on the machine).
4. Search for MongoDB Server.
5. Its status should show “Running”.

We can also go to the directory below and run the file with Administrator privileges.

“C:\Program Files\MongoDB\Server\4.0\bin\mongo.exe”.

Consider creating a shortcut to this file for easier access in the future.