MongoDB Indexes

Indexes

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan: it scans every document in a collection to select the documents that match the query statement.
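To make the difference concrete, here is a toy model in plain Python. It is only an illustration, not how MongoDB actually stores indexes (MongoDB uses B-tree structures). A collection scan checks every document, while a toy "index" kept as sorted (key, position) pairs lets us binary-search straight to the matching documents. The collection contents and field names are hypothetical.

```python
import bisect

# A hypothetical "collection": a list of documents (dicts).
collection = [
    {"_id": 1, "name": "Ann", "age": 31},
    {"_id": 2, "name": "Ben", "age": 25},
    {"_id": 3, "name": "Cat", "age": 31},
    {"_id": 4, "name": "Dan", "age": 40},
]


def collection_scan(docs, field, value):
    """Examine every document, like a query that has no usable index."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Toy ascending single-field index: sorted keys plus document positions."""
    pairs = sorted((doc[field], pos) for pos, doc in enumerate(docs) if field in doc)
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def index_lookup(docs, index, value):
    """Binary-search the sorted keys, then fetch only the matching documents."""
    keys, positions = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [docs[pos] for pos in positions[lo:hi]]


age_index = build_index(collection, "age")
hits, examined = collection_scan(collection, "age", 31)
print(examined)  # 4: every document was checked
print([d["_id"] for d in index_lookup(collection, age_index, 31)])  # [1, 3]
```

Both approaches return the same documents; the difference is how much of the collection has to be examined to find them.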

Default _id Index

As mentioned, MongoDB creates a unique index on the _id field when a collection is created. This index prevents two documents from having the same value for the _id field. MongoDB also supports creating user-defined ascending and descending indexes.

Index Types

  • Single Field Index – indexes a single field.
  • Compound Index – indexes multiple fields. The order of the fields in a compound index matters.
  • Multikey Index – indexes the content stored in arrays.
  • Geospatial Index – supports efficient queries on geospatial coordinate data.
  • Text Index – supports searching for string content in a collection.
  • Hashed Index – supports hash-based sharding.

The syntax to create MongoDB indexes based on the index types above is shown below:

// Single field index
// Syntax: db.collection.createIndex( <key and index type specification>, <options> )
db.collection.createIndex( { name: -1 } )

// Compound index
// Syntax: db.collection.createIndex( { <field1>: <type1>, <field2>: <type2>, ... } )
db.collection.createIndex( { "item": 1, "stock": 1 } )

// Multikey index: same syntax; MongoDB makes the index multikey when an indexed field holds an array
// Syntax: db.collection.createIndex( { <field>: <1 or -1> } )
db.collection.createIndex( { ratings: 1 } )

// Multikey index on fields of embedded documents in an array
db.collection.createIndex( { "stock.size": 1, "stock.quantity": 1 } )

// Text index, using the keyword "text"
// Syntax: db.collection.createIndex( { <field>: "text" } )
db.collection.createIndex(
   {
     subject: "text",
     comments: "text"
   }
)

// Hashed index, using the keyword "hashed"
db.collection.createIndex( { _id: "hashed" } )

A value of -1 creates a descending index on that field, while a value of 1 creates an ascending index.
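If you work from Python instead of the mongo shell, the same index definitions can be expressed with PyMongo's Collection.create_index, which accepts a list of (field, direction) pairs and returns the new index's name. This is a sketch that assumes a PyMongo collection object is passed in (so it needs no running server to import); the function name is hypothetical, and the field names are the ones from the shell examples.

```python
# Same values as pymongo.ASCENDING / pymongo.DESCENDING; "text" and "hashed"
# are the same strings as pymongo.TEXT / pymongo.HASHED.
ASCENDING, DESCENDING = 1, -1


def create_example_indexes(collection):
    """Mirror the mongo shell createIndex examples with PyMongo (hypothetical helper)."""
    names = []
    # Single field index, descending on "name"
    names.append(collection.create_index([("name", DESCENDING)]))
    # Compound index: the order of the pairs is the order of the index fields
    names.append(collection.create_index([("item", ASCENDING), ("stock", ASCENDING)]))
    # Multikey index: same syntax; it becomes multikey if "ratings" holds arrays
    names.append(collection.create_index([("ratings", ASCENDING)]))
    # Index on fields of embedded documents
    names.append(collection.create_index([("stock.size", ASCENDING), ("stock.quantity", ASCENDING)]))
    # Text index on two string fields
    names.append(collection.create_index([("subject", "text"), ("comments", "text")]))
    # Hashed index on _id
    names.append(collection.create_index([("_id", "hashed")]))
    return names
```

Passing the collection in as a parameter keeps the index definitions in one place and makes them easy to reuse when setting up a fresh database.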

MongoDB: Schema Planning Tips

MongoDB is often advertised as “schemaless”. That does not mean you do not need to design your database schema, or that no schema applies to MongoDB. It is a good idea to enforce some schema validation when inserting data into collections, and to design the schema with performance and scalability in mind. Designing the schema can be tedious, yet it can be fun too.

Avoid Growing Documents

MongoDB limits each document to 16MB. If your documents would keep growing in size continuously, it is advisable to avoid that design, because:

  • It can degrade database and I/O performance.
  • A badly designed schema can sometimes cause queries to fail.

Avoid Updating Whole Documents

When you update a document, try to avoid replacing the whole document, because MongoDB may have to rewrite the entire document elsewhere in memory, which degrades the write performance of your database. Instead, use field modifiers (update operators such as $set) to update only specific fields in the document. This allows an in-place update in memory, which improves performance.
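Here is a sketch of the two styles from Python, assuming `users` is a PyMongo collection and that the `email` field is hypothetical. PyMongo's replace_one sends a complete replacement document, while update_one with the $set modifier sends only the field that changes.

```python
# Assumes `users` is a PyMongo Collection; the "email" field is hypothetical.


def update_email_whole_document(users, user_id, new_email):
    """What to avoid: fetch the whole document, change it, send all of it back."""
    doc = users.find_one({"_id": user_id})
    if doc is None:
        return None  # no such document
    doc["email"] = new_email
    return users.replace_one({"_id": user_id}, doc)


def update_email_with_modifier(users, user_id, new_email):
    """Preferred: the $set modifier names only the field that changes."""
    return users.update_one({"_id": user_id}, {"$set": {"email": new_email}})
```

The modifier version also avoids the extra read, and other operators such as $inc follow the same pattern.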

Avoid Application-Level Joins

MongoDB's server-side join support is limited (the $lookup stage of the aggregation pipeline, added in MongoDB 3.2, is the main option), so applications often fetch all the data from the database and perform the join at the application level. When working with a large amount of data, calling the database several times to get the necessary data takes noticeably more time. If your application relies heavily on joins, it makes more sense to denormalize the schema: embedded documents let you get all the required data in a single query.

A typical use case for embedded documents is putting a person's addresses in an array inside the Person document.
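As a sketch, such a Person document could look like the dict below; all field names and values are hypothetical. The query helper assumes a PyMongo collection and uses dot notation ("addresses.city") to match a field inside any element of the embedded array, so one query returns each person together with all of their addresses.

```python
# A hypothetical Person document with its addresses embedded in an array.
person = {
    "_id": "p001",
    "name": "Jane Tan",
    "addresses": [
        {"type": "home", "street": "1 Example Road", "city": "Singapore"},
        {"type": "office", "street": "2 Sample Avenue", "city": "Singapore"},
    ],
}


def find_people_in_city(people, city):
    """Return matching people with their embedded addresses in one query.

    Assumes `people` is a PyMongo Collection.
    """
    return list(people.find({"addresses.city": city}))
```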

The advantage of embedded documents is that you do not have to perform a separate query to get the embedded details. The disadvantage is that you cannot access the embedded details as standalone entities.

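Field Names Take Up Space

This is less important than the other tips. However, every document stores its own copy of each field name, so when you get up to billions of records, long field names significantly affect the size of your data. Disk space is cheap, but RAM is not.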
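A rough back-of-the-envelope sketch of the effect: it uses compact JSON length as a stand-in for BSON size, which is an approximation, but both formats repeat every field name inside every document. The documents are hypothetical, and storage-engine compression will shrink the real numbers.

```python
import json

# Hypothetical documents with the same values but different field-name lengths.
long_names = {"customer_full_name": "Ann Lee", "customer_email_address": "ann@example.com"}
short_names = {"name": "Ann Lee", "email": "ann@example.com"}


def bytes_per_doc(doc):
    """Approximate document size as compact JSON (a proxy for BSON size)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


saved_per_doc = bytes_per_doc(long_names) - bytes_per_doc(short_names)
print(saved_per_doc)  # 31
# Scale the per-document saving up to one billion documents (in GB).
print(saved_per_doc * 1_000_000_000 / 1e9, "GB per billion documents")
```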

Use Proper Indexing

If there is no index on the sort field, MongoDB is forced to sort without an index, in memory. There is a 32MB memory limit on the total size of all documents involved in such a sort operation. If MongoDB hits that limit, it may either return an error or an empty result set. It is also important not to add unnecessary indexes, because every time you write a document, MongoDB has to update each index on the collection. Unnecessary indexes:

  • degrade database write performance.
  • occupy disk space and memory.
  • can cause storage-related problems as their number grows.
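A minimal sketch of indexing the sort field from Python, assuming `orders` is a PyMongo collection and `created_at` is a hypothetical field. With the index in place, MongoDB can return documents in index order instead of sorting them in memory; calling explain() on the cursor is one way to check which plan a query used.

```python
# Assumes `orders` is a PyMongo Collection; "created_at" is hypothetical.
DESCENDING = -1  # same value as pymongo.DESCENDING


def ensure_sort_index(orders):
    """Index the sort field so sorted queries can walk the index."""
    return orders.create_index([("created_at", DESCENDING)])


def latest_orders(orders, limit=20):
    # A single-field index can serve this sort in either direction.
    return list(orders.find({}).sort("created_at", DESCENDING).limit(limit))
```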

Another way to make better use of indexes is to override the default _id field. The only purpose of this field is to hold one unique value per document. If your data already contains a unique value, such as an ID field (or a timestamp that is guaranteed to be unique), you can store it in _id instead and save one extra index.
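Here is a sketch of that idea, assuming `employees` is a PyMongo collection and `employee_number` is a hypothetical key that is already unique per document. Because _id always has a unique index, inserting a duplicate would fail (PyMongo raises DuplicateKeyError), so no second unique index is needed.

```python
# Assumes `employees` is a PyMongo Collection and that "employee_number"
# (hypothetical) is already unique for every document.


def insert_employee(employees, employee_number, name):
    # Store the natural unique key in _id. The automatic unique _id index then
    # enforces uniqueness and serves lookups, so there is no need for
    # {"_id": ObjectId(...), "employee_number": ...} plus a second index such as
    # employees.create_index("employee_number", unique=True).
    return employees.insert_one({"_id": employee_number, "name": name})


def find_employee(employees, employee_number):
    # This lookup is served by the built-in _id index.
    return employees.find_one({"_id": employee_number})
```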

If you create an index that contains all the fields used in a query's filter and all the fields that the query returns, MongoDB never needs to read the documents themselves, because everything is contained in the index. This significantly reduces the need to fit all the data into memory for maximum performance. Such queries are called covered queries.
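A sketch of a covered query from Python, assuming `users` is a PyMongo collection with hypothetical `username` and `email` fields. The projection must exclude _id, because _id is returned by default and is not part of this index; with that, the filter and the returned fields all come from the compound index.

```python
# Assumes `users` is a PyMongo Collection; the field names are hypothetical.
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_username_email_index(users):
    return users.create_index([("username", ASCENDING), ("email", ASCENDING)])


def email_for(users, username):
    # Filter and projection touch only indexed fields, and _id is excluded
    # (it is not in this index), so the index alone can answer the query.
    return users.find_one({"username": username}, {"_id": 0, "username": 1, "email": 1})
```

If the query is covered, its explain output should report that no documents were examined.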

Read vs Write Ratio

When designing the schema for any application, it matters whether the application is read-heavy or write-heavy. For example, when we build a dashboard that displays time-series data with a constant stream of data loaded into the database, we should design the schema to maximize write throughput. If most of the application's operations are reads, we should use a denormalized schema to reduce the number of calls to the database needed to get the data.

BSON Data Types

Make sure you define the BSON data types of all fields correctly while designing the schema, because if you later change a field's data type, MongoDB rewrites the whole document in a new memory space, which can cause the document to be moved.
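One way to pin down field types, and to enforce the schema validation mentioned at the start of these tips, is a $jsonSchema validator with bsonType rules, set when the collection is created. This is a sketch assuming `db` is a PyMongo Database; the collection name, fields and required list are hypothetical. With the default validation settings, MongoDB rejects inserts and updates that break these rules.

```python
# Assumes `db` is a PyMongo Database; the collection and field names are hypothetical.
person_schema = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "birth_date"],
        "properties": {
            "name": {"bsonType": "string"},
            "birth_date": {"bsonType": "date"},  # Python datetime values map to BSON dates
            "age": {"bsonType": "int"},
            "addresses": {"bsonType": "array"},
        },
    }
}


def create_people_collection(db):
    # The validator is stored with the collection and checked on every write.
    return db.create_collection("people", validator=person_schema)
```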

Day 42: Importing Data in Python

Introduction

Data can be imported from various sources, such as:
– flat files, e.g. .txt and .csv files.
– files from other software, e.g. Excel, SAS and MATLAB files.
– databases, e.g. relational databases such as SQL Server and MySQL, as well as NoSQL databases.

Reading a text file
To read a text file, use the open() function to open a connection to the file, passing two parameters:
– the filename.
– the mode, e.g. 'r' is read and 'w' is write.

Then call read() on the file and assign the result to a variable. Lastly, call close() once you are done, to close the connection to the file.

To print out the text of the file, use print().

Syntax:
filename = 'moby_dick.txt'
file = open(filename, mode='r')
text = file.read()
file.close()

print(text)

The filename can be stored in a variable instead of being written directly in the open() call. It is always best practice to close() an opened connection.

To avoid forgetting to close a file connection (leaving out file.close() at the end of our code), it is advisable to use a context manager, which is written with the with statement.

Some of the tutorials in my previous posts used a context manager to open a file connection and read .csv files. The with statement executes the open() command and creates a context in which commands can run while the file is open. Once execution leaves this clause or context, the file is no longer open, which is why this construct is called a context manager.

What happens here is called ‘binding’ a variable in the context manager construct: within this construct, the variable file is bound to open(filename, 'r').
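Here is a minimal runnable sketch of that pattern. So that it does not depend on moby_dick.txt being present, it first writes a small temporary file with made-up content; the function name is hypothetical.

```python
import os
import tempfile


def read_with_context_manager(filename):
    # Inside the with block, `file` is bound to open(filename, 'r').
    with open(filename, 'r') as file:
        text = file.read()
        print(file.closed)  # False: still inside the context
    # Leaving the block closes the file automatically, even if an error occurred.
    print(file.closed)  # True
    return text


# Demo with a temporary file so the example runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Call me Ishmael.\n')
text = read_with_context_manager(tmp.name)
os.remove(tmp.name)
print(text, end='')  # Call me Ishmael.
```

No explicit close() call is needed: the with statement takes care of it when the block ends.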

Let me share the tutorials I did on DataCamp’s website.

# Open a file: file
file = open('moby_dick.txt', mode='r')

# Print it
print(file.read())

# Check whether file is closed
print(file.closed)

# Close file
file.close()

# Check whether file is closed
print(file.closed)

First, the code opens the file using the open() function, passing the filename and read mode, and then reads the file. The first print() statement prints out the contents of the moby_dick.txt text file. The second print() statement prints file.closed, a Boolean that checks whether the file is closed; at this point it is False. Lastly, the code closes the file using the close() function and checks the Boolean again. This time, it is True.

Importing text files line by line
For a larger file, we may not want to print out everything inside the file, only the first few lines. To do this, use the readline() method, which reads a single line each time it is called.

Once the file is open, call file.readline() once for each line you want. The code below uses the with statement and file.readline() to print the first three lines of the file.

# Read & print the first 3 lines
with open('moby_dick.txt') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())
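Each string returned by readline() keeps its trailing newline, and print() adds another one, so the output above has a blank line between lines. Here is a sketch of two ways to get the first few lines without that: a readline() loop that stops at end of file (readline() returns an empty string there), and a loop over the file object using itertools.islice. The function names and the temporary file's content are hypothetical.

```python
import os
import tempfile
from itertools import islice


def first_lines(filename, n=3):
    """Return up to n lines using readline(); '' signals end of file."""
    lines = []
    with open(filename) as file:
        for _ in range(n):
            line = file.readline()
            if not line:
                break
            lines.append(line)
    return lines


def print_first_lines(filename, n=3):
    """Same idea by iterating over the file object, without reading it all."""
    with open(filename) as file:
        for line in islice(file, n):
            print(line, end='')  # each line already ends with '\n'


# Demo on a small temporary file with made-up content.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('line 1\nline 2\nline 3\nline 4\n')
print_first_lines(tmp.name)  # prints line 1, line 2, line 3 with no blank lines between
sample = first_lines(tmp.name)
os.remove(tmp.name)
```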

Summary of the day:

  • Importing data from different sources.
  • Reading from a text file using open() and read().
  • Importing text line by line using readline() method.
  • with statement as context manager.
  • close() to close a file.
  • file.closed is a Boolean attribute that is True once the file has been closed.

EAT

EAT is a local Singapore food outlet serving traditional local food such as minced pork noodles with mushroom (“Bak Chor Mee”, famously known as BCM), fishball noodles, laksa and many others.

You can find their food outlets in some shopping malls around the neighbourhood. I had tried this food outlet twice before my recent visit to the one at City Square Mall in Farrer Park, when I went there to collect my race pack.

As mentioned, they serve local Singaporean food, and one of my local favourites is the fishball noodles. As usual, I went for the set meal, which I think is quite worth ordering.

It comes with 3 big, fresh fish balls, some slices of fish cake, thick and springy egg noodles, and a bowl of clear soup that has all the essence of the fish and meat. According to their website, the fish balls are made of yellowtail fish. The set also comes with a cup of coffee or tea of your choice.

Their mee pok (egg noodles) has a thicker texture than the usual one, and it was still springy when I tried it, even though I took quite a while before digging into the noodles and mixing them with the sauce, which includes some chili and vinegar.

This is a pretty standard sauce in Singapore if you go for the dry version of the noodles. If you do not want chili, I would recommend trying their soup version.

Address: 180 Kitchener Road, B2-K5/K5A/K6 City Square Mall, Singapore 208539.

Su Food

Taking a break from writing about Python, Linux and other stuff, let us focus on food this weekend.

Some time back in February this year, just after I came back from my Chinese New Year holiday, I met up with my teammates from the TechLadies volunteer team for the first time.

We met at Su Food, a meat-free dining concept restaurant. Little did I know that Su Food hails from Taiwan and is quite popular among vegetarians and non-vegetarians alike. It is located at Raffles City Shopping Centre.

I ordered the Kimchi Stone Pot 5-grain Rice on my first try. It came with some side dishes.

Being a rice dish, it was filling. I do not have any special comments about this meal; it tasted normal. If you want a meal without meat, this is one of the choices, instead of going to a salad place.

We spent a good time introducing ourselves and sharing our daily work lives with each other.

Address: 02-19 Raffles City, 252 Bridge Road, Singapore 179103.

Saizeriya

Little did I know that Saizeriya is a Japanese chain of Italian family-style restaurants, with a few branches in Singapore. It is a wallet-friendly restaurant that serves fusion food blending Asian and Western dishes.

It had been a while since my last visit. This time they had something new: chili crab spaghetti. It tasted only so-so. For the best satisfaction, you would be better off ordering a real chili crab and eating the gravy with spaghetti.

Another spaghetti I always order whenever I visit their branch at Aperia Mall is the vongole spicy tomato soup, in other words, spicy tomato clam spaghetti. It is a tasty, sweet and spicy, soupy type of spaghetti.

Other than the spaghetti, I think the rest of the food is not quite worth it, as the portions are quite small. If you wish, you can order a set, which comes with a bowl of salad and a drink that you can help yourself to at their drink section. Otherwise, plain water is free-flow.

Easy Noodle Bar

It is located on Foch Road, Singapore, close to Jalan Besar, an old district in Singapore. It is now accessible via Bendemeer MRT station (Downtown Line). This road features old, small shop lots, and Easy Noodle Bar is a Japanese restaurant located between the famous pig organ soup stall and a Vietnamese restaurant, Lang Nuong Vietnam.

I found this restaurant through an introduction by a fellow Instagram friend, and I brought two colleagues along to try it for dinner together.

Their menu is simple and easy, as the restaurant's name suggests. The selection of food is not large, but it is good enough to be worth a try.

They have taken pictures of the food and put them into a photo album displayed on each table. They are not all the best shots, but they are good enough to give you some idea of the food; otherwise, you can ask the waitress to explain the dishes before you order.

During my first visit, I tried plenty of good dishes.

Nikujaga
My favourite of all the dishes we ordered: Japanese beef stewed with vegetables, in a sweet and tasty stock. According to Sethlui.com, they use grain-fed Angus beef marinated with shoyu, mirin and sake. The beef was tender and melted easily in my mouth.

Another recommended dish is the Yang Chun noodle: simple noodles in a good soup base. I think the soup must have been good, or my colleague would not have tried to finish it. Generally, it has a homely feel, and it is meatless except for a soft-boiled egg.

We tried Tori Nanban, a fried chicken cutlet: crisp on the outside, juicy on the inside, and dipped in their sauce. A perfect match.

The tempura prawns were great too, served hot, and the breadcrumb coating did not look oily. They used fresh prawns, and the meat had a crunchy texture.

Lastly, the Chicken Roulade, which I think is minced meat rolled inside a chicken breast. It was very meaty. It comes in two portions, nicely grilled, still tender and delicious.

I would also recommend trying their lunch sets, because the prices are quite reasonable and the portions are just right, letting us try many different dishes.

The ambience at night is a little dark and dimly lit; however, it is still comfortable for enjoying dinner together.

Address: 20 Foch Rd, Singapore 209261