How to use SSH to connect to a remote server

To connect to a remote machine, the two most commonly used protocols, depending on the operating system the machine runs, are:

  • Secure Shell (SSH) for Linux-based machines.
  • Remote Desktop Protocol (RDP) for Windows-based machines.

These protocols use client and server applications to establish a remote connection, which allows you to gain access to other machines and manage them remotely. Today’s topic focuses on Linux-based machines, and I am using Ubuntu 16.04.

How to install an OpenSSH Client

On Ubuntu, the SSH client is provided by the openssh-client package, which is usually installed by default. If it is missing, you can install it with the command below. On a Windows machine, you can install PuTTY or any other client of your choice.

sudo apt-get install openssh-client

OpenSSH defaults

  • TCP port: 22
  • OpenSSH server config file is called sshd_config which is located at /etc/ssh/

How to install an OpenSSH Server

In order to accept SSH connections, a machine needs to have the server-side part of SSH installed. First, check whether the OpenSSH server is already available on the remote Ubuntu machine that needs to accept SSH connections. To do so, try to connect to localhost from that machine.

ssh localhost

On an Ubuntu machine without the SSH server installed, the screen may show:

username@host:~$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
username@host:~$

Nothing is listening on port 22, therefore the connection is refused. In that case, you can install the SSH server:

sudo apt-get install openssh-server

After the installation is completed, you can check whether the SSH server is running on the Ubuntu machine by using the command below:

sudo service ssh status

If the output shows the status as active (running), the SSH server is up. Then, we can move back to the client machine, which can be our local machine, and try to ssh to the remote server machine.

How to connect via SSH

Open a terminal on your machine and run the command: ssh username@host_ip_address

Key in the password to connect. If you are connecting for the first time, SSH shows the server's host key fingerprint (an ECDSA key in my case) and asks you to confirm it, so just follow the instructions on the terminal and you are connected to the remote server. If the server does not use the default port, you can specify the port after the IP address with -p, for example: ssh username@host_ip_address -p 9876
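The command above can also be assembled from a script. Below is a minimal sketch in Python that only builds the argument list; the function name build_ssh_command and the sample address 192.0.2.10 are my own placeholders, not something from a library.

```python
import shlex


def build_ssh_command(username, host, port=22):
    """Return the ssh argument list, adding -p only for a non-default port."""
    command = ["ssh"]
    if port != 22:
        command += ["-p", str(port)]
    command.append(f"{username}@{host}")
    return command


# Hypothetical user name and address, for illustration only.
print(shlex.join(build_ssh_command("username", "192.0.2.10")))
print(shlex.join(build_ssh_command("username", "192.0.2.10", port=9876)))
```

If you want to actually start the session from Python, the returned list could be passed to subprocess.run(), but typing the command in the terminal is usually simpler.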

Firewall rules

Next, you may want to look at the firewall settings on the server machine to ensure the port is listening, is not blocked by the firewall and is forwarded correctly.
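A quick way to see whether the SSH port is reachable from the client side is to try a plain TCP connection. This is only a rough sketch: the helper name is_port_open is my own, and a failed connection does not tell you whether the cause is the firewall, port forwarding or a stopped SSH server.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


# Check the default SSH port on the local machine.
print("port 22 reachable on localhost:", is_port_open("localhost", 22))
```

To test the remote server, replace localhost with its IP address, and 22 with the custom port if the default has been changed.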


Creating a Linux service

I think creating a Linux service is a fairly easy task: systemd lets us take a program we have written and turn it into a service. A service can be started or stopped from a terminal or, on Windows, from a GUI (Graphical User Interface). I am going to use the terminal here, as we are going to create a Linux service that runs my program.

Another reason I want to run my program as a service is that it allows my program to be restarted if it terminates for unforeseen reasons.

How to begin?

I started by creating a script file with the .service file extension. I got a sample copy from my colleague and began to modify the script. I had to be careful with the directory paths. I thought it would just be a directory change from /opt/ to /home/ for the WorkingDirectory and ExecStart in the script, but it did not work when I enabled and started the service. I will share the silly mistake I made.

Once you have the script file ready, save it and copy it to the directory /etc/systemd/system. Writing to this directory requires root privileges, so an ordinary copy-paste of the file will not work; use sudo cp from the terminal instead.

sudo cp mytest_service.service /etc/systemd/system

Permission and Executable File

Next, set the file permissions and make the file executable. You can look up the chmod command for the detailed permission settings for the user (owner), group and others. To make the file executable, the command below does the work. (Strictly speaking, systemd does not need the unit file itself to be executable; it is the program named in ExecStart that must be.)

sudo chmod +x mytest_service.service

Enable and Start Service

Once the .service file is in the said directory and has been made executable, we are ready for the next step, which is to enable the service and start it.

sudo systemctl enable mytest_service.service
sudo systemctl start mytest_service.service

Created symlink from /etc/systemd/system/multi-user.target.wants/mytest_service.service to /etc/systemd/system/mytest_service.service.

Upon running the above command to enable the service, the symlink is created as shown above.

If you wish to know whether the service is started correctly, you can use the following command to check the running status.

sudo systemctl status mytest_service.service

Checking the status gives us the information we want to know: whether the service is active and running, or has failed with an error. The error message is an important keyword for searching online for solutions.

If you search on the Internet, you may find that some commands use service instead of systemctl. service is a fairly simple wrapper which supports only limited actions, such as starting, stopping and checking the status of a service. For more complex tasks, the actual command to use is systemctl.

For more reading about systemctl and service command, you can find it in this link.

What is the error I received?

The “Active” status on the screen showed “failed” when I ran the command to start the service, with the error code 200/CHDIR. I googled the error and found out that it indicates the path was not found or not accessible at the time the service attempted to run.

Since I had set the file permission earlier, access to the file should not have been the problem. Hence, my path had to be wrong.

My program is saved in my home directory, and I had left out part of my full home directory path in “WorkingDirectory” and “ExecStart”. Running “pwd” in the terminal prints the current working directory, which helps to get the correct absolute path so that the service can execute the program. The first line below is the correct path; the commented-out line is the one I had wrongly used.

/home/myname/program/run.sh
##/home/program/run.sh
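To catch this kind of mistake before starting the service, the two paths can be checked with a small script. The sketch below is my own and makes several assumptions: the unit file content is a hypothetical sample (my real file is not shown here), the helper name check_unit_paths is made up, and it only handles a plain ExecStart line without systemd's special prefixes.

```python
import configparser
import os

# Hypothetical unit file, modelled on the paths discussed above.
SAMPLE_UNIT = """\
[Unit]
Description=My test service

[Service]
WorkingDirectory=/home/myname/program
ExecStart=/home/myname/program/run.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def check_unit_paths(unit_text):
    """Return a list of problems found in the [Service] paths of a unit file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names such as ExecStart case-sensitive
    parser.read_string(unit_text)
    service = parser["Service"]
    problems = []

    workdir = service.get("WorkingDirectory")
    if workdir is not None:
        if not os.path.isabs(workdir):
            problems.append(f"WorkingDirectory is not an absolute path: {workdir}")
        elif not os.path.isdir(workdir):
            problems.append(f"WorkingDirectory does not exist: {workdir}")

    # The first word of ExecStart is the program; the rest are its arguments.
    exec_parts = service.get("ExecStart", "").split()
    program = exec_parts[0] if exec_parts else ""
    if not os.path.isabs(program):
        problems.append(f"ExecStart is not an absolute path: {program}")
    elif not os.path.isfile(program):
        problems.append(f"ExecStart program not found: {program}")
    elif not os.access(program, os.X_OK):
        problems.append(f"ExecStart program is not executable: {program}")
    return problems


for problem in check_unit_paths(SAMPLE_UNIT):
    print(problem)
```

An empty result means both paths exist on the machine where the script runs. Restart=on-failure in the sample is the systemd setting that restarts the program when it exits with an error.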

Data Types : Statistics

There are two main data types in statistics: qualitative data and quantitative data. In the previous topic, we discussed the “Level of Data Measurement”, in which we talked about nominal, ordinal, interval and ratio. How do these measurements relate to qualitative and quantitative data?

Qualitative data covers the nominal and ordinal measurements, which describe a “feature” of a data object. Quantitative data refers to data that can be counted or measured with numbers; it covers the interval and ratio measurements.

For example, gender (nominal) and satisfaction level (ordinal) are qualitative data, while temperature (interval) and height or weight (ratio) are quantitative data.

Going further, quantitative data has a sub-level of data types called discrete and continuous. The two differ in a few areas:

Discrete data is a whole number (integer) and it cannot be subdivided into smaller and smaller parts.

Continuous data can take any value within a range and can be subdivided into ever finer parts, such as height or temperature.

Source:
https://www.mymarketresearchmethods.com/data-types-in-statistics/

Levels of Data Measurement : Statistics

Last week, during my 2nd class in Business Intelligence, a statistics topic on levels of measurement was being discussed. The lecturer tried her very best to explain to us the differences between each of the levels.

Nominal, Ordinal, Interval or Ratio.

In statistics, there are four levels of data measurement: nominal, ordinal, interval and ratio (interval and ratio are sometimes referred to by other terms, such as continuous or scale).

I think it is important for researchers to understand this theoretical part of statistics in order to determine which statistical analysis is suitable for their problem statements. For students like me, I think it is good enough if I can differentiate them. I was told the exam paper would not ask us to differentiate them, but we still have to understand in theory what each of them is.

There are a number of statistics articles online which explain it, and I found that the website http://www.mymarketresearchmethods.com gave me a better understanding. You can refer to the link below for the write-up, and I will explain a bit here too.

It is quite easy to distinguish the nominal and ordinal measurements.

Nominal & Ordinal

The first level of measurement is nominal. The numbers in the variable are used only to classify the data. Words, letters and alphanumeric symbols can be used as the values in the variable (without quantitative value). The best example is gender, male or female.

Nominal data can be ordered or unordered; gender is unordered. Ordered nominal data can be something like cold, warm, hot and very hot.

The second level of measurement is ordinal. With ordinal scales, the order of the values is what is important and significant, but the differences between each one are not really known. An example is the ranking No.1, No.2 and No.3 for students’ scores, where the highest score is No.1, followed by the second highest score, which gets No.2.

However, I get a bit confused by the above statement that “the differences between each one is not really known”. Scores and ranks do tell us the differences, unless we use exam grades such as A+, A and A-. Do you agree with this example?

Maybe I should follow the example the website gives, the satisfaction level, where we cannot quantify how much better one level is than another.
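My confusion above becomes clearer with a small experiment: the scores themselves carry the differences, but once they are turned into ranks, the size of each gap is lost. The sketch below uses made-up scores and a helper name (rank_scores) of my own, and it does not handle ties.

```python
def rank_scores(scores):
    """Return {name: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Made-up exam scores, for illustration only.
scores = {"Student A": 95, "Student B": 94, "Student C": 60}
ranks = rank_scores(scores)

# Gaps between neighbouring scores, from highest to lowest.
ordered_scores = sorted(scores.values(), reverse=True)
score_gaps = [a - b for a, b in zip(ordered_scores, ordered_scores[1:])]

# Gaps between neighbouring ranks are always 1, whatever the scores were.
ordered_ranks = sorted(ranks.values())
rank_gaps = [b - a for a, b in zip(ordered_ranks, ordered_ranks[1:])]

print(ranks)
print("score gaps:", score_gaps)
print("rank gaps:", rank_gaps)
```

Here No.1 beats No.2 by one mark while No.2 beats No.3 by 34 marks, yet the ranks alone cannot tell us that. That is the sense in which the differences are "not really known" on an ordinal scale.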

Interval & Ratio

The third level of measurement is interval. Interval scales are numeric scales in which we know the classification, the order and the exact differences between the values. I picked up the explanation below from the same website.

Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval” itself means “space in between,” which is the important thing to remember–interval scales not only tell us about order, but also about the value between each item.

For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.

Here’s the problem with interval scales: they don’t have a “true zero.” For example, there is no such thing as “no temperature,” at least not with celsius. In the case of interval scales, zero doesn’t mean the absence of value, but is actually another number used on the scale, like 0 degrees celsius. Negative numbers also have meaning. Without a true zero, it is impossible to compute ratios. With interval data, we can add and subtract, but cannot multiply or divide.

Consider this: 10 degrees C + 10 degrees C = 20 degrees C. No problem there. 20 degrees C is not twice as hot as 10 degrees C, however, because there is no such thing as “no temperature” when it comes to the Celsius scale. When converted to Fahrenheit, it’s clear: 10C=50F and 20C=68F, which is clearly not twice as hot.
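The arithmetic in the quote can be checked in a few lines of Python. The Kelvin conversion is my own addition for comparison (Kelvin has a true zero, so it behaves as a ratio scale); the function names are arbitrary.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins, a scale with a true zero."""
    return celsius + 273.15


low, high = 10, 20  # the two temperatures from the example, in Celsius

# Differences are meaningful on an interval scale.
print("difference in C:", high - low)

# Ratios are not: the same two temperatures give a different "ratio"
# depending on which scale the numbers are written in.
ratio_c = high / low
ratio_f = celsius_to_fahrenheit(high) / celsius_to_fahrenheit(low)
ratio_k = celsius_to_kelvin(high) / celsius_to_kelvin(low)
print(f"ratio in C: {ratio_c:.3f}")
print(f"ratio in F: {ratio_f:.3f}")
print(f"ratio in K: {ratio_k:.3f}")
```

The Celsius numbers suggest "twice", the Fahrenheit numbers suggest something else, and only the Kelvin ratio has a physical meaning, because only Kelvin starts from a true zero.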

The fourth level of measurement is the ratio. Ratio scales tell us about the order, they tell us the exact value between units, AND they also have an absolute zero–which allows for a wide range of both descriptive and inferential statistics to be applied. Good examples of ratio variables include height, weight, and duration. These variables can be meaningfully added, subtracted, multiplied, divided (ratios).

Sources:
https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
https://www.statisticssolutions.com/data-levels-of-measurement/

MongoDB Indexes

Indexes

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB performs a collection scan: it scans every document in a collection to select the documents that match the query statement.

Default _id Index

MongoDB creates a unique index on the _id field when a collection is created. This index prevents two documents from having the same value for the _id field. MongoDB also supports the creation of user-defined ascending/descending indexes.

Index Types

  • Single Index – single field.
  • Compound Index – multiple fields. The order of fields in a compounded index has significance.
  • Multikey Index – to index the content stored in arrays.
  • Geospatial Index – to support efficient queries of geospatial coordinate data.
  • Text Indexes – provides a text index type that supports searching for string content in a collection.
  • Hashed Indexes – to support hash based sharding.

The syntax to create MongoDB indexes based on the index types above is shown below:

// Single Index
db.collection.createIndex( <key and index type specification>, <options> )
db.collection.createIndex( { name: -1 } )

// Compound Index
db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
db.collection.createIndex( { "item": 1, "stock": 1 } )

// Multikey Index is used when any indexed field is an array
db.collection.createIndex( { <field>: < 1 or -1 > } )
db.collection.createIndex( { ratings: 1 } )

// Multikey Index in embedded document
db.collection.createIndex( { "stock.size": 1, "stock.quantity": 1 } )

// Text Index with keyword "text"
db.collection.createIndex( { <field>: "text" } )
db.collection.createIndex(
   {
     subject: "text",
     comments: "text"
   }
 )

// Hashed Indexes with keyword "hashed"
db.collection.createIndex( { _id: "hashed" } )

A value of ‘-1’ creates a descending index on the field, while a value of ‘1’ creates an ascending index.
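Why does the order of fields in a compound index matter? The sketch below is a plain Python simulation, not MongoDB's actual implementation: it models the index { "item": 1, "stock": 1 } as a list of entries sorted by item first and stock second. The sample documents and the helper names are made up.

```python
import bisect

# Made-up documents, for illustration only.
documents = [
    {"item": "pen", "stock": 40},
    {"item": "ink", "stock": 5},
    {"item": "pen", "stock": 12},
    {"item": "cap", "stock": 7},
]

# Simulated compound index: (item, stock, position of the document),
# kept sorted by item first and by stock second.
index = sorted((doc["item"], doc["stock"], pos) for pos, doc in enumerate(documents))


def find_by_item(index, item):
    """Use the sort order: binary-search to the first entry for this item."""
    start = bisect.bisect_left(index, (item,))
    matches = []
    for entry in index[start:]:
        if entry[0] != item:
            break  # entries are sorted, so no later entry can match
        matches.append(entry)
    return matches


def find_by_stock(index, stock):
    """No shortcut: stock values are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == stock]


print(find_by_item(index, "pen"))
print(find_by_stock(index, 7))
```

A query on the leading field (item) can jump straight to the right place, while a query on stock alone gets no help from the ordering. This is the idea behind putting the field you filter on most often first in a compound index.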

MongoDB: Schema Planning Tips

MongoDB is advertised as being “schemaless”. This does not mean that you do not need to design your database schema, or that no database schema applies to MongoDB. It is a good idea to enforce some schema validation when data is inserted into the collections, for better performance and scalability. Designing the schema can be tedious, yet it can be fun too.

Avoid Growing Documents

MongoDB limits each document to a maximum size of 16MB. Letting your documents grow in size continuously is best avoided because:

  • It can lead to degradation of database and I/O performance.
  • A badly designed schema can sometimes cause queries to fail.

Avoid Updating Whole Documents

When you update, try to avoid updating the whole document, because MongoDB will rewrite the whole document elsewhere in memory, which degrades the write performance of your database. Instead, you can use field modifiers (update operators such as $set) to update only specific fields in the document. This triggers an in-place update in memory and hence improves performance.
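To show what a field-level update looks like, here is a small sketch in Python. It does not talk to a database: apply_set is my own helper that only emulates what a top-level $set does to a document, and the sample person is made up. With PyMongo, the same filter and update dictionaries would be passed to collection.update_one().

```python
import copy


def apply_set(document, update):
    """Emulate a top-level $set locally: change only the named fields."""
    updated = copy.deepcopy(document)
    updated.update(update["$set"])
    return updated


# Made-up document, for illustration only.
person = {"_id": 1, "name": "Ann", "city": "Penang", "visits": 3}

# The filter selects the document; the update names only the field to change.
query_filter = {"_id": 1}
update = {"$set": {"city": "Ipoh"}}

print(apply_set(person, update))
```

Only city is mentioned in the update, so name and visits are never sent to the server. Note that my helper ignores dotted field names such as "address.city", which the real $set operator supports.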

Avoid Application-Level Joins

MongoDB has only limited support for server-side joins (the $lookup aggregation stage, available since version 3.2), so we often have to get all the data from the database and then perform the join at the application level. If we are working with a large amount of data, calling the database several times to get the necessary data obviously requires more time. Denormalizing the schema makes more sense when your application relies heavily on joins. You can use embedded documents to get all the required data in a single query.

A typical use case for an embedded document is putting the addresses in an array inside the Person object.

The advantage of embedded document is you do not have to perform a separate query to get the embedded details. The disadvantage is you have no way to access the embedded details as standalone entities.
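The Person example can be sketched with plain Python dictionaries, which is also how documents look when you work with MongoDB from Python. All names and values below are made up; the second layout is my own contrast case showing what the application has to do when addresses live in a separate collection.

```python
# Embedded design: the addresses live inside the person document.
person_embedded = {
    "_id": 1,
    "name": "Ann",
    "addresses": [
        {"type": "home", "city": "Penang"},
        {"type": "work", "city": "Kuala Lumpur"},
    ],
}

# Referenced design: addresses are separate documents pointing to the person.
person_referenced = {"_id": 1, "name": "Ann"}
address_collection = [
    {"_id": 10, "person_id": 1, "type": "home", "city": "Penang"},
    {"_id": 11, "person_id": 1, "type": "work", "city": "Kuala Lumpur"},
    {"_id": 12, "person_id": 2, "type": "home", "city": "Ipoh"},
]


def cities_embedded(person):
    """One document already holds everything we need."""
    return [address["city"] for address in person["addresses"]]


def cities_referenced(person, addresses):
    """Application-level join: a second lookup filtered by person_id."""
    return [a["city"] for a in addresses if a["person_id"] == person["_id"]]


print(cities_embedded(person_embedded))
print(cities_referenced(person_referenced, address_collection))
```

Both functions return the same cities, but the referenced design needs a second query against the address collection, while the embedded design gets everything in one read.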

Field names Take Up Space

This is less important at a small scale. However, field names are stored in every document, so when you get up to billions of records, they significantly affect your data and index size. Disk space is cheap but RAM is not.

Use Proper Indexing

If no index is available on the sorting field, MongoDB is forced to sort without an index. There is a memory limit of 32MB on the total size of all documents involved in the sort operation (the exact limit depends on the MongoDB version). If MongoDB hits that limit, it may either produce an error or return an empty dataset. It is also important not to add unnecessary indexes, because every index you add has to be updated whenever documents in the database are updated. Too many indexes will:

  • degrade database performance.
  • occupy space and memory.
  • lead to storage-related problems.

One more way to optimize the use of indexes is to override the default _id field. The only purpose of this field is to keep one unique field per document. If your data already contains a unique timestamp or id field, you can use it as the _id field and save one extra index.

If you create an index which contains all the fields that you query on and all the fields that the query returns, MongoDB never needs to read the documents themselves, because everything is contained within the index. This significantly reduces the need to fit all the data into memory for maximum performance. Such queries are called covered queries.
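The condition for a covered query can be written down as a small check. This is a simplified sketch of my own, not a MongoDB API: is_covered only looks at top-level field names, and it ignores other restrictions such as array fields. The index, query and projection below are made-up examples.

```python
def is_covered(index_fields, query, projection):
    """Return True if the index alone could answer the query (simplified)."""
    included = {field for field, flag in projection.items() if flag}
    if not included:
        # Without an inclusion projection the whole document is returned.
        return False
    # _id is returned by default unless the projection switches it off.
    if projection.get("_id", 1):
        included.add("_id")
    needed = set(query) | included
    return needed <= set(index_fields)


index_fields = ["item", "stock"]  # fields of the index { item: 1, stock: 1 }
query = {"item": "pen"}

print(is_covered(index_fields, query, {"item": 1, "stock": 1, "_id": 0}))
print(is_covered(index_fields, query, {"item": 1, "stock": 1}))
```

The second call is not covered because _id is returned by default and is not part of the index. This is why covered queries usually exclude _id explicitly in the projection.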

Read vs Write Ratio

The schema design for any application depends on whether the application is read heavy or write heavy. For example, when we build a dashboard to display time-series data, where a stream of data is constantly loaded into the database, we should design the schema in a way that maximizes write throughput. If most of the operations in the application are reads, then you should use a denormalized schema to reduce the number of calls to the database for getting data.

BSON Data Types

Make sure you define the BSON data types of all fields correctly while designing the schema, because when the data type of a field changes, MongoDB rewrites the whole document in a new memory space (which can cause the document to be moved).

Day 42: Importing Data in Python

Introduction

Data can be imported from various sources, such as:
– flat files, e.g. txt, csv.
– files from other software, e.g. Excel, SAS, Matlab files.
– databases, e.g. SQL Server, MySQL, NoSQL stores.

Reading a text file
We use the open() function to open a connection to a file, passing two parameters:
– filename.
– mode. Eg: r is read, w is write.

Then, call read() on the file and assign the result to a variable. Lastly, close() the file after the process is done, to close the connection to the file.

If you wish to print out the text in the file, use the print() statement.

Syntax:
file = open(filename, mode='r')
text = file.read()
file.close()

print(text)

The filename can be assigned to a variable instead of being written directly as the open() function’s parameter. It is always best practice to close() an opened connection.

To avoid forgetting to close a file connection (leaving out file.close() at the end of our code), it is advisable to use a context manager, which uses the with statement.

Some of the tutorials in my previous posts used a context manager to open a file connection and read .csv files. The with statement executes the open() command and creates a context in which commands can be executed while the file is open. Once out of this clause or context, the file is no longer open, and for this reason it is called a context manager.

What it does here is called ‘binding’ a variable in the context manager construct: within this construct, the variable file is bound to open(filename, 'r').
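Here is a minimal sketch of the with statement in action. So that it runs anywhere, it first writes a small temporary file; the file content is just a placeholder of mine, not the DataCamp data.

```python
import os
import tempfile

# Create a small throwaway text file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Call me Ishmael.\n")
    filename = tmp.name

# Inside the with block, the variable file is bound to the open file.
with open(filename, mode="r") as file:
    text = file.read()
    print(file.closed)  # the file is still open here

# Leaving the block closes the file; no file.close() call is needed.
print(file.closed)
print(text)

os.remove(filename)  # clean up the temporary file
```

The first print shows False and the second shows True, even though close() is never called explicitly. The file is closed even if an error happens inside the block, which is the main reason to prefer this form.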

Let me share the tutorials I did on DataCamp’s website.

# Open a file: file
file = open('moby_dick.txt', mode='r')

# Print it
print(file.read())

# Check whether file is closed
print(file.closed)

# Close file
file.close()

# Check whether file is closed
print(file.closed)

First, it opens a file using the open() function, passing the filename and the read mode. It then reads the file. The first print() statement prints out the content of the moby_dick text file. The second print() statement prints a Boolean which tells whether the file is closed. In this case, it is False. Lastly, it closes the file using the close() function and checks the Boolean again. This time, it is True.

Importing text files line by line
For a larger file, we do not want to print out everything inside the file. We may want to print out only several lines of the content. To do this, we use the readline() method.

When the file is open, each call to file.readline() returns the next line. See the code below, which uses the with statement and file.readline() to print the first three lines of the file.

# Read & print the first 3 lines
with open('moby_dick.txt') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())
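Repeating print(file.readline()) works for three lines, but for "the first n lines" a loop is handier. Below is a sketch of one way to do it; read_first_lines is a helper name of my own, and the sample file is created on the fly so the code does not depend on moby_dick.txt.

```python
import itertools
import os
import tempfile


def read_first_lines(path, n):
    """Return the first n lines of a text file, without the trailing newline."""
    with open(path, mode="r") as file:
        # Iterating over a file yields one line at a time, like readline().
        return [line.rstrip("\n") for line in itertools.islice(file, n)]


# Build a small sample file with five lines, for illustration only.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\nline 4\nline 5\n")
    sample_path = tmp.name

for line in read_first_lines(sample_path, 3):
    print(line)

os.remove(sample_path)  # clean up the sample file
```

Each line returned by readline() still ends with a newline character, which is why print(file.readline()) shows a blank line after every line. Stripping it with rstrip("\n") avoids that.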

Summary of the day:

  • Importing data from different sources.
  • Reading from a text file using open() and read().
  • Importing text line by line using the readline() method.
  • The with statement as a context manager.
  • close() to close a file.
  • file.closed is an attribute (not a method) which is True once the file has been closed and False otherwise.