1. How is Debian GNU/Linux different from Ubuntu? Name two aspects.
Ubuntu is based on a snapshot of Debian, so the two distributions have many similarities.
However, there are still significant differences between them. The first is their suitability
for beginners. Ubuntu is recommended for beginners because of its ease of use, while Debian is
recommended for more advanced users. The main reason is that Debian asks the user for more
complex configuration during installation, which Ubuntu does not require.
Another difference is the stability of each distribution. Debian is considered to be more
stable than Ubuntu because it receives fewer updates, and those updates are tested in detail
before release. Ubuntu, on the other hand, lets the user run the latest software releases and
the newest technologies.
2. What are the most common environments/platforms Linux is used for? Name three different
environments/platforms and name one distribution you can use for each.
Three common environments/platforms are smartphones, desktops and servers. On smartphones,
Linux is used in the form of distributions such as Android. On desktops and servers, any
distribution that suits the purpose of the machine can be used, from Debian and Ubuntu to
CentOS and Red Hat Enterprise Linux.
3. You are planning to install a Linux distribution in a new environment. Name four things that
you should consider when choosing a distribution.
When choosing a distribution, some of the main things to consider are cost, performance,
scalability, stability and the hardware demands of the system.
4. Name three devices that the Android OS runs on, other than smartphones.
Other devices that Android runs on include smart TVs, tablet computers, cars (through Android
Auto) and smartwatches.
5. Explain three major advantages of cloud computing.
The major advantages of cloud computing are flexibility, ease of recovery and low cost of use.
Cloud-based services are easy to implement and scale, depending on the business requirements.
They offer a major advantage for backup and recovery, as they enable businesses to recover from
incidents faster and with fewer repercussions. Furthermore, they reduce operating costs, because
a business pays only for the resources it uses, on a subscription-based model.
Answers to Explorational Exercises
1. Considering cost and performance, which distributions are most suitable for a business that
aims to reduce licensing costs, while keeping performance at its highest? Explain why.
One of the most suitable distributions for business use is CentOS. This is because it
incorporates the Red Hat software that is also used in Red Hat’s commercial operating
system, while being free to use. Similarly, Ubuntu LTS releases guarantee support for a longer
period of time. The stable versions of Debian GNU/Linux are also often used in enterprise
environments.
2. What are the major advantages of the Raspberry Pi and which functions can it take on in
business?
The Raspberry Pi is small in size yet works as a normal computer. Furthermore, it is low cost
and can handle web traffic and many other tasks. It can be used as a server, as a firewall, or
as the main board for robots and many other small devices.
3. What range of distributions do Amazon Cloud Services and Google Cloud offer? Name at least
three common ones and two different ones.
The distributions common to Amazon and Google Cloud Services are Ubuntu, CentOS and
Red Hat Enterprise Linux. Each cloud provider also offers specific systems that the other
one doesn’t. Amazon has Amazon Linux and Kali Linux, while Google offers FreeBSD
and Windows Server.
Debian, Ubuntu and Linux Mint use the dpkg, apt-get and apt tools to install software
packages, generally referred to as DEB packages. Distributions such as Red Hat, Fedora and CentOS
use the rpm, yum and dnf commands instead, which in turn install RPM packages. The application
packaging is different for each distribution family.
If your distribution works with DEB packages, you can search its repositories using
apt-cache search package_name or apt search package_name. The apt-cache command is used to
search for packages and to list information about available packages. The following command
looks for any occurrences of the term “figlet” in the packages’ names and descriptions:
$ apt-cache search figlet
The installation and removal of a package require special permissions granted only to the system’s
administrator: the user named root. On desktop systems, ordinary users can install or remove
packages by prepending the command sudo to the installation/removal commands. That will require
you to type your password to proceed. For DEB packages, the installation is performed with the
command apt-get install package_name or apt install package_name.
In distributions based on RPM packages, searches are performed using yum search package_name or
dnf search package_name. Let’s say you want to display some text in a more irreverent way,
followed by a cartoonish cow, but you are not sure which package can perform that task. As
with the DEB packages, the RPM search commands accept descriptive terms:
$ yum search speaking cow
Once the cowsay package has been found and installed, the program can be used:
$ cowsay "Brought to you by yum"
The same commands used to install packages are used to remove them. All the commands accept the
remove keyword to uninstall an installed package: apt-get remove package_name or apt remove
package_name for DEB packages and yum remove package_name or dnf remove package_name for
RPM packages. The sudo command is also needed to perform the removal. For example, to remove
the previously installed package figlet from a DEB-based distribution:
$ sudo apt-get remove figlet
A similar procedure is performed on an RPM-based system. For example, to remove the previously
installed package cowsay from an RPM-based distribution:
$ sudo yum remove cowsay
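Since the DEB and RPM commands above differ mainly in the name of the tool, a script can pick
the right one at run time. The following is only a sketch under stated assumptions: the function
names detect_package_manager and package_command are made up for this illustration, only apt,
dnf and yum are considered, and the command is printed rather than executed.

```shell
# Hypothetical helper: print the first known package tool found in PATH.
detect_package_manager() {
    for tool in apt dnf yum; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "unknown"
}

# Hypothetical helper: print (not run) the command for an action such as
# "install" or "remove" on the given package.
package_command() {
    manager=$(detect_package_manager)
    if [ "$manager" = "unknown" ]; then
        echo "no supported package manager found"
    else
        echo "sudo $manager $1 $2"
    fi
}

package_command install figlet
package_command remove figlet
```

Printing the command first makes it possible to review it before anything is changed on the
system.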
Apache OpenOffice and LibreOffice have the same basic features and are compatible with the document formats from
Microsoft Office. However, the preferred document format is the Open Document Format, a fully open
and ISO standardized file format. The use of ODF files ensures that documents can be transferred
between operating systems and applications from different vendors, such as Microsoft Office. The
main applications offered by OpenOffice/LibreOffice are:
• Writer: text editor
• Calc: spreadsheets
• Impress: presentations
• Draw: vector drawing
• Math: math formulas
• Base: database
Both LibreOffice and Apache OpenOffice are open source software, but LibreOffice is licensed under
LGPLv3 and Apache OpenOffice is licensed under Apache License 2.0. The licensing distinction
implies that LibreOffice can incorporate improvements made by Apache OpenOffice, but Apache
OpenOffice cannot incorporate improvements made by LibreOffice. That, and a more active
community of developers, are the reason most distributions adopt LibreOffice as their default office
suite.
One of the main differences between the GNU General Public License (GNU GPL or simply GPL), the
most common type of license for free software, and a BSD license is that the GPL is a copyleft
license, meaning that any derivative work produced from the original free and open source code
must remain free. BSD licenses allow derivative works to become proprietary software.
Linux has a modular approach where different parts of the system are developed by different
projects and developers, each one filling a specific need or objective. Because of that, there are
several options of desktop environments to choose from and together with package managers, the
default desktop environment is one of the main differences among the many distributions out there.
Unlike proprietary operating systems like Windows and macOS, where the users are restricted to the
desktop environment that comes with their OS, Linux makes it possible to install multiple
environments and pick the one that best suits you and your needs.
Basically, there are two major desktop environments in the Linux world: GNOME and KDE. They are
both very complete, with a large community behind them, and aim for the same purpose but with
slightly divergent approaches. In a nutshell, GNOME tries to follow the KISS (“keep it simple, stupid”)
principle, with very streamlined and clean applications. KDE, on the other hand, offers a larger
selection of applications and gives the user the opportunity to change every configuration
setting in the environment.
While GNOME applications are based on the GTK toolkit (written in the C language), KDE
applications make use of the Qt library (written in C++). One of the most practical aspects of writing
applications with the same graphical toolkit is that applications will tend to share a similar look and
feel, which is responsible for giving the user a sense of unity during their experience. Another
important characteristic is that having the same shared graphical library for many frequently used
applications may save some memory space at the same time that it will speed up loading time once
the library has been loaded for the first time.
Industry Uses of Linux
Linux is heavily used in the software and Internet industries. Sites like W3Techs report that
about 68% of the web servers on the Internet are powered by Unix, and the largest portion of
those are known to run Linux.
This wide adoption is due not only to the free nature of Linux (free as in both free beer and
freedom of speech) but also to its stability, flexibility and performance. These characteristics
allow vendors to offer their services at a lower cost and with greater scalability. A significant
portion of Linux systems nowadays run in the cloud, on an IaaS (Infrastructure as a Service),
PaaS (Platform as a Service) or SaaS (Software as a Service) model.
IaaS is a way to share the resources of a large server by offering customers access to virtual
machines that are, in fact, multiple operating systems running as guests on a host machine on
top of an important piece of software called a hypervisor. The hypervisor makes it possible for
these guest OSs to run by segregating the resources available on the host machine and managing
their allocation to those guests. That’s what we call virtualization. In the IaaS model, you pay
only for the fraction of resources your infrastructure uses.
Linux has three well-known open source hypervisors: Xen, KVM and VirtualBox. Xen is probably the
oldest of them. KVM has overtaken Xen as the most prominent Linux hypervisor. Its development is
sponsored by Red Hat, and it is used by them and other players, both in public cloud services and in
private cloud setups. VirtualBox has belonged to Oracle since its acquisition of Sun Microsystems and is
usually used by end users because of its ease of use and administration.
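KVM relies on the hardware virtualization extensions of the CPU. On an x86 Linux host, one
common way to see whether the processor advertises them is to look for the vmx (Intel) or svm
(AMD) flag in /proc/cpuinfo. The sketch below assumes that file format; the function name
has_virtualization_flags is made up for this illustration, and other architectures or guests
running inside a virtual machine may report nothing here.

```shell
# Hypothetical helper: succeed if a cpuinfo-style file lists vmx or svm
# in its "flags" line. The file argument defaults to /proc/cpuinfo.
has_virtualization_flags() {
    grep -Eq '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_virtualization_flags; then
    echo "CPU advertises hardware virtualization (vmx or svm)"
else
    echo "No vmx/svm flag found (or /proc/cpuinfo is unavailable)"
fi
```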
PaaS and SaaS, on the other hand, build on the IaaS model, both technically and conceptually. In
PaaS, instead of a virtual machine, the users have access to a platform on which they can
deploy and run their application. The goal here is to ease the burden of dealing with system
administration tasks and operating system updates. Heroku is a common PaaS example where
program code can just be run without taking care of the underlying containers and virtual machines.
Lastly, SaaS is the model where you usually pay for a subscription in order to just use a piece of
software without worrying about anything else. Dropbox and Salesforce are two good examples of
SaaS. Most of these services are accessed through a web browser.
A project like OpenStack is a collection of open source software that can make use of different
hypervisors and other tools in order to offer a complete IaaS cloud environment on premises, by
leveraging the power of a computer cluster in your own datacenter. However, the setup of such
infrastructure is not trivial.
Privacy Issues when using the Internet
The web browser is a fundamental piece of software on any desktop these days, but some people still
lack the knowledge to use it securely. While more and more services are accessed through a web
browser, almost all actions done through a browser are tracked and analyzed by various parties.
Securing access to internet services and preventing tracking is an important aspect of using the
internet in a safe manner.
Cookie Tracking
Let’s assume you have browsed an e-commerce website, selected a product you wanted and placed
it in the shopping cart. But at the last second, you decided to think a little longer about
whether you really needed it. After a while, you start seeing ads for that same product
following you around the web. When clicking on the ads, you are immediately sent to the product
page of that store again. It’s not uncommon that the products you placed in the shopping cart are
still there, just waiting for you to decide to check them out. Have you ever wondered how they do
that? How they show you the right ad on another web page? The answer to these questions is called
cookie tracking.
Cookies are small files a website can save on your computer in order to store and retrieve some kind
of information that can be useful for your navigation. They have been in use for many years and are
one of the oldest ways to store data on the client side. One good example of their use is unique
shopping cart IDs. That way, if you ever come back to the same website in a few days, the store can
remind you of the products you placed in your cart during your last visit and save you the time
of finding them again.
That’s usually okay, since the website is offering you a useful feature and not sharing any data with
third parties. But what about the ads that are shown to you while you surf on other web pages?
That’s where the ad networks come in. Ad networks are companies that offer ads for e-commerce
sites like the one in our example on one side, and monetization for websites, on the other side.
Content creators like bloggers, for example, can make some space available for those ad networks on
their blog, in exchange for a commission related to the sales generated by that ad.
But how do they know what product to show you? They usually do that by also saving a cookie
from the ad network at the moment you visit or search for a certain product on the e-commerce
website. By doing that, the network is able to retrieve the information in that cookie wherever the
network has ads, correlating it with the products you were interested in. This is one
of the most common ways to track someone over the Internet. The example we gave above makes
use of an e-commerce site to make things more tangible, but social media platforms do the same with
their “Like” or “Share” buttons and their social login.
One way you can get rid of that is by not allowing third-party websites to store cookies in your
browser. This way, only the website you visit can store its cookies. But be aware that some
“legitimate” features may not work well if you do that, because many sites today rely on third-party
services to work. So, you can look for a cookie manager in your browser’s add-on repository in order
to have fine-grained control over which cookies are stored on your machine.
Do Not Track (DNT)
Another common misconception is related to a certain browser configuration better known as DNT.
That’s an acronym for “Do Not Track” and it can be turned on in practically any current browser.
As with private mode, it’s not hard to find people who believe they will not be tracked if they
have this configuration on. Unfortunately, that’s not always true. Currently, DNT is just a way for
you to tell the websites you visit that you do not want them to track you. But, in fact, they are the
ones who decide whether to respect your choice. In other words, DNT is a way to opt out
of website tracking, but there is no guarantee that your choice will be honored.
Technically, this is done by simply sending an extra field in the header of the HTTP request
(DNT: 1) when requesting data from a web server. If you want to know more about this topic, the
website https://allaboutdnt.com is a good starting point.
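To make the header field more concrete, the following sketch prints the text of a minimal
HTTP/1.1 request that carries it. Nothing is sent over the network; the function name
build_dnt_request and the host example.com are placeholders chosen for this illustration.

```shell
# Hypothetical helper: print a minimal GET request for host $1 and path $2
# that includes the DNT field. HTTP header lines end with CR LF.
build_dnt_request() {
    printf 'GET %s HTTP/1.1\r\n' "$2"
    printf 'Host: %s\r\n' "$1"
    printf 'DNT: 1\r\n'
    printf '\r\n'
}

build_dnt_request example.com /
```

If curl is installed, a command such as curl -H 'DNT: 1' followed by a URL adds the same
field to a real request. Whether the server honors it is still up to the server.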
“Private” Windows
You might have noticed the quotes in the heading above. This is because those windows are not as
private as most people think. The names may vary but they can be called “private mode”, “incognito”
or “anonymous” tab, depending on which browser you are using.
In Firefox, you can easily use it by pressing Ctrl + Shift + P keys. In Chrome, just press Ctrl + Shift +
N . What it actually does is open a brand new session, which usually doesn’t share any configuration
or data from your standard profile. When you close the private window, the browser will
automatically delete all the data generated by that session, leaving no trace on the computer used.
This means that no personal data, like history, passwords or cookies are stored on that computer.
Thus, many people misunderstand this concept by believing that they can browse anonymously on the
Internet, which is not completely true. One thing that the private or incognito mode does is avoid
what we call cookie tracking. When you visit a website, it can store a small file on your computer
which may contain an ID that can be used to track you. Unless you configure your browser to not
accept third-party cookies, ad networks or other companies can store and retrieve that ID and
actually track your browsing across websites. But since the cookies stored on a private mode session
are deleted right after you close that session, that information is forever lost.
Besides that, websites and other peers on the Internet can still use plenty of other techniques in
order to track you. So, private mode brings you some level of anonymity, but it’s completely
private only on the computer you are using. If you are accessing your email account or banking
website from a public computer, like in an airport or a hotel, you should definitely access those
using the browser’s private mode. In other situations, there can be benefits, but you should know
exactly which risks you are avoiding and which ones are unaffected. Whenever you use a publicly
accessible computer, be aware that other security threats such as malware or key loggers might
exist. Be careful whenever you enter personal information, including usernames and passwords, on
such computers or when you download or copy confidential data.
Choosing the Right Password
One of the most difficult situations any user faces is choosing a secure password for the services
they use. You have certainly heard before that you should not use common combinations
like qwerty, 123456 or 654321, nor easily guessable numbers like your (or a relative’s) birthday or zip
code. The reason is that those are all very obvious combinations and the first ones an
attacker will try in order to gain access to your account.
There are known techniques for creating a safe password. One of the most famous is making up a
sentence which reminds you of that service and picking the first letters of each word. Let’s assume I
want to create a good password for Facebook, for example. In this case, I could come up with a
sentence like “I would be happy if I had a 1000 friends like Mike”. Pick the first letter of each word
and the final password would be IwbhiIha1000flM. This results in a 15-character password
which is long enough to be hard to guess and easy to remember at the same time (as long as I can
remember the sentence and the “algorithm” for retrieving the password).
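The “algorithm” can be written down as a few lines of shell. This is only a sketch of the rule
used in the example (first character of every word, numbers kept whole); the function name
sentence_to_password is made up for this illustration.

```shell
# Hypothetical helper: build a password from the initials of a sentence.
# The body runs in a subshell, so its variables and "set -f" do not leak.
sentence_to_password() (
    set -f          # no filename expansion while splitting the sentence
    password=""
    # $1 is left unquoted on purpose so that it is split into words.
    for word in $1; do
        case $word in
            *[!0-9]*)
                # Ordinary word: keep only its first character.
                rest=${word#?}
                first=${word%"$rest"}
                password="$password$first"
                ;;
            *)
                # Purely numeric word: keep the whole number.
                password="$password$word"
                ;;
        esac
    done
    echo "$password"
)

sentence_to_password "I would be happy if I had a 1000 friends like Mike"
# prints: IwbhiIha1000flM
```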
Sentences are usually easier to remember than the passwords but even this method has its
limitations. We have to create passwords for so many services nowadays and as we use them with
different frequencies, it will eventually be very difficult to remember all the sentences at the time we
need them. So what can we do? You may answer that the wisest thing to do in this case is to create a
couple of good passwords and reuse them on similar services, right?
Unfortunately, that’s also not a good idea. You have probably also heard that you should not reuse
the same password across different services. The problem with doing so is that a specific service
may leak your password (yes, it happens all the time) and any person who has access to it will try
to use the same email and password combination on other popular services on the Internet in the
hope that you have done exactly that: recycled passwords. And guess what? If they are right, you
will end up having a problem not on just one service but on several of them. And believe me, we
tend to think it’s not going to happen to us until it’s too late.
So, what can we do in order to protect ourselves? One of the most secure approaches available today
is using what is called a password manager. A password manager is a piece of software that
essentially stores all your passwords and usernames in an encrypted format which can be decrypted
with a master password. This way you only need to remember one good password, since the manager
will keep all the others safe for you.
KeePass is one of the most famous and feature-rich open source password managers available. It
stores your passwords in an encrypted file within your file system. The fact that it’s open source
is important for this kind of software: any developer can audit the code and know exactly how it
works, which makes it much harder for the software to misuse your data unnoticed. This brings a
level of transparency that’s impossible to reach with proprietary code. KeePass has ports for most
operating systems, including Windows, Linux and macOS, as well as mobile ones like iOS and Android.
It also includes a plugin system that is able to extend its functionality far beyond the defaults.
Bitwarden is another open source solution that has a similar approach, but instead of storing your
data in a file, it makes use of a cloud server. This way, it’s easier to keep all your devices
synchronized and your passwords easily accessible through the web. Bitwarden is one of the few
projects that make not only the clients, but also the cloud server available as open source
software. This means you can host your own instance of Bitwarden and make it available to anyone,
like your family or your company’s employees. This gives you flexibility but also total control over
how their passwords are stored and used.
One of the most important things to keep in mind when using a password manager is to create a
random password for each service, since you will not need to remember them anyway. It
would be pointless to use a password manager to store recycled or easily guessable passwords.
Thus, most of them offer a random password generator you can use to create those for you.
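To give an idea of what such a generator does, here is a minimal sketch that draws random
bytes from /dev/urandom and keeps only letters and digits. The function name generate_password
and the default length of 20 are assumptions made for this illustration; in practice, the
generator built into your password manager is the more convenient choice.

```shell
# Hypothetical helper: print a random alphanumeric password.
# $1 is the desired length (default 20). A fixed chunk of 4096 random
# bytes is read, which is plenty for lengths up to a few hundred.
generate_password() {
    head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "${1:-20}"
    echo
}

generate_password 24
```

Each call prints a different string, so the result has to be stored in the password manager
right away.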
Encryption
Whenever data is transferred or stored, precautions need to be taken to ensure that third parties may
not access the data. Data transferred over the internet passes through a series of routers and networks
where third parties might be able to access the network traffic. Likewise, data stored on physical
media might be read by anyone who comes into possession of that media. To avoid this kind of
access, confidential information should be encrypted before it leaves a computing device.
TLS
Transport Layer Security (TLS) is a protocol that secures network connections by making
use of cryptography. TLS is the successor of the Secure Sockets Layer (SSL), which has been
deprecated because of serious flaws. TLS has also evolved a couple of times in order to adapt
and become more secure; its current version is 1.3. It can provide both privacy and
authenticity by making use of what is called symmetric and public-key cryptography. This
means that once it is in use, you can be sure that nobody will be able to eavesdrop on or alter your
communication with that server during that session.
The most important lesson here is recognizing whether a website is trustworthy. You should look for the
“lock” symbol on the browser’s address bar. If you wish, you can click on it to inspect the certificate
that plays an important role in the HTTPS protocol.
TLS is what is used in the HTTPS protocol (HTTP over TLS) in order to make it possible to send
sensitive data (like your credit card number) through the web. Explaining how TLS works goes way
beyond the purpose of this article, but you can find more information on Wikipedia and on the
Mozilla wiki.
File and E-mail Encryption With GnuPG
There are plenty of tools for securing emails but one of the most important of them is certainly
GnuPG. GnuPG stands for GNU Privacy Guard and it is an open source implementation of OpenPGP
which is an international standard codified within RFC 4880.
GnuPG can be used to sign, encrypt, and decrypt texts, e-mails, files, directories, and even whole
disk partitions. It works with public-key cryptography and is widely available. In a nutshell, GnuPG
creates a pair of files which contain your public and private keys. As the name implies, the public
key can be available to anyone and the private key needs to be kept secret. People will use your
public key to encrypt data which only your private key will be able to decrypt.
You can also use your private key to sign any file or e-mail, and the signature can be validated
against the corresponding public key. This digital signature works analogously to a real-world
signature. As long as you are the only one who possesses your private key, the receiver can be sure
that it was you who authored it. By making use of cryptographic hash functions, GnuPG also
guarantees that no changes have been made after signing, because any change to the content would
invalidate the signature.
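The role of the hash can be illustrated without GnuPG. The sketch below is not a digital
signature: it only shows the hash comparison that a signature builds on, using the sha256sum
tool from GNU coreutils. A real signature additionally ties the hash to the signer’s private
key. The file names, the message text and the helper check_message are invented for this
illustration.

```shell
# Throwaway directory and file, invented for this sketch.
workdir=$(mktemp -d)
printf 'Pay Carol 10 euros\n' > "$workdir/message.txt"

# Record the SHA-256 digest of the original content.
sha256sum "$workdir/message.txt" > "$workdir/message.sha256"

# Hypothetical helper: report whether the file still matches the digest.
check_message() {
    if sha256sum -c "$workdir/message.sha256" >/dev/null 2>&1; then
        echo "unchanged"
    else
        echo "modified"
    fi
}

before=$(check_message)

# Change a few characters and check again.
printf 'Pay Carol 1000 euros\n' > "$workdir/message.txt"
after=$(check_message)

echo "before edit: $before, after edit: $after"
rm -r "$workdir"
```

Even a small edit produces a completely different digest, which is why a signature made over
the hash stops validating as soon as the content changes.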
GnuPG is a very powerful tool and, to a certain extent, also a complex one. You can find more
information on its website and on the Arch Linux wiki (the Arch Linux wiki is a very good source of
information, even if you don’t use Arch Linux).
Disk Encryption
A good way to secure your data is to encrypt your whole disk or partition. There are many open
source programs you can use for this purpose. How they work and what level of
encryption they offer vary significantly. There are basically two methods available: stacked
filesystem encryption and block device encryption.
Stacked filesystem solutions are implemented on top of an existing filesystem. When using this method,
the files and directories are encrypted before being stored on the filesystem and decrypted after
reading them. This means the files are stored on the host filesystem in an encrypted form (meaning
that their contents, and usually also their file/folder names, are replaced by random-looking data),
but other than that, they still exist in that filesystem as they would without encryption, as normal
files, symlinks, hardlinks, etc.
On the other hand, block device encryption happens below the filesystem layer, making sure
everything that is written to a block device is encrypted. If you look at the block device while
it’s offline, it will look like a large section of random data and you won’t even be able to tell
what type of filesystem is there without decrypting it first. This means you can’t tell what is a
file or a directory, how big they are or what kind of data they hold, because metadata, directory
structure and permissions are also encrypted.
Both methods have their own pros and cons. Among all the options available, you should take a look
at dm-crypt, which is the de facto standard for block encryption on Linux systems, since it’s native
to the kernel. It can be used with the LUKS (Linux Unified Key Setup) extension, which is a
specification that implements a platform-independent standard for use with various tools.
If you want to try a stackable method, you should take a look at EncFS, which is probably the easiest
way to secure data on Linux because it does not require root privileges to implement and it can work
on an existing filesystem without modifications.
Finally, if you need to access data on various platforms, check out VeraCrypt. It is the successor of
TrueCrypt and allows the creation of encrypted media and files, which can be used on Linux as well
as on macOS and Windows.
Use your web browser to navigate to https://haveibeenpwned.com/. Find out the purpose of the
website and check if your email address was included in some data leaks.
The website maintains a database of login information whose passwords were affected by a
password leak. It allows searching for an email address and shows whether that email address was
included in a public database of stolen credentials. Chances are that your email address has
been affected by one leak or another, too. If that is the case, make sure you have updated your
passwords recently. If you don’t already use a password manager, take a look at the ones
recommended in this lesson.
Command Line Basics
The shell is a program that enables text-based communication between the operating system and
the user. It is usually a text mode program that reads the user’s input and interprets it as
commands to the system.
There are several different shells on Linux. These are just a few:
• Bourne-again shell (Bash)
• C shell (csh or tcsh, the enhanced csh)
• Korn shell (ksh)
• Z shell (zsh)
On Linux the most common one is the Bash shell.
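Most commands are followed by options and arguments. Take the following command as an example:
$ ls -l /home
Option(s)/Parameter(s)
A switch that changes the behavior of the command, such as -l in the above example.
Argument(s)
Additional data that is required by the program, like a filename or path, such as /home in the
above example.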
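The split between options and arguments in the ls -l /home example can be sketched as a small
shell function. It only mirrors the common convention that options start with a dash; each
program decides for itself how to interpret the words it receives. The function name
describe_command_line is made up for this illustration.

```shell
# Hypothetical helper: label the words of a command line. The first word
# is the command; later words starting with "-" are treated as options,
# everything else as arguments.
describe_command_line() {
    echo "command: $1"
    shift
    for word in "$@"; do
        case $word in
            -*) echo "option: $word" ;;
            *)  echo "argument: $word" ;;
        esac
    done
}

describe_command_line ls -l /home
# prints:
# command: ls
# option: -l
# argument: /home
```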
The shell supports two types of commands:
Internal
These commands are part of the shell itself and are not separate programs. There are around 30
such commands. Their main purpose is executing tasks inside the shell (e.g. cd, set, export).
cd command compgen complete compopt continue declare dirs disown echo enable eval exec exit
export false fc fg getopts hash help history jobs kill let local logout mapfile popd printf
pushd pwd read readarray readonly return set shift shopt
External
These commands reside in individual files. These files are usually binary programs or scripts.
When a command which is not a shell builtin is run, the shell uses the PATH variable to search for
an executable file with the same name as the command. In addition to programs which are installed
with the distribution’s package manager, users can create their own external commands as well.
The command type shows what type a specific command is:
$ type echo
echo is a shell builtin
$ type man
man is /usr/bin/man
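The PATH lookup can be tried out with a self-made external command. The sketch below creates a
tiny script in a temporary directory and adds that directory to PATH for the current shell
session only; the script name hello_lpi and the variable bindir are invented for this
illustration.

```shell
# Create a throwaway directory holding a small executable script.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "Hello from an external command"\n' > "$bindir/hello_lpi"
chmod +x "$bindir/hello_lpi"

# Put the directory at the front of PATH so the shell finds the script.
PATH="$bindir:$PATH"

type hello_lpi   # reports the file found through PATH
type cd          # still reported as a shell builtin
hello_lpi        # runs the script by name
```

The change to PATH is lost when the shell session ends, and the directory can be removed
afterwards with rm -r "$bindir".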
Quoting
As a Linux user, you will have to create or manipulate files or variables in various ways. This is easy
when working with short filenames and single values, but it becomes more complicated when, for
example, spaces, special characters and variables are involved. Shells provide a feature called quoting
which encapsulates such data using various kinds of quotes (" ", ' '). In Bash, there are three types of
quotes:
• Double quotes
• Single quotes
• Escape characters
For example, the following commands do not act in the same way due to quoting:
$ TWOWORDS="two words"
$ touch $TWOWORDS
$ ls -l
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 two
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 words
$ touch "$TWOWORDS"
$ ls -l
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 two
-rw-r--r-- 1 carol carol 0 Mar 10 14:58 'two words'
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 words
$ touch '$TWOWORDS'
$ ls -l
-rw-r--r-- 1 carol carol 0 Mar 10 15:00 '$TWOWORDS'
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 two
-rw-r--r-- 1 carol carol 0 Mar 10 14:58 'two words'
-rw-r--r-- 1 carol carol 0 Mar 10 14:56 words
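The same effect can be observed without creating any files, by counting the arguments a
command actually receives. In this sketch, show_args is a made-up helper that prints the
number of arguments followed by each argument in square brackets.

```shell
# Hypothetical helper: print the argument count and each argument.
show_args() {
    printf '%s:' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

TWOWORDS="two words"
show_args $TWOWORDS      # prints: 2: [two] [words]
show_args "$TWOWORDS"    # prints: 1: [two words]
show_args '$TWOWORDS'    # prints: 1: [$TWOWORDS]
```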
Double Quotes
Double quotes tell the shell to take the text in between the quote marks ("...") as regular characters.
All special characters lose their meaning, except the $ (dollar sign), \ (backslash) and ` (backquote).
This means that variables, command substitution and arithmetic expansion can still be used.
For example, the substitution of the $USER variable is not affected by the double quotes:
$ echo I am $USER
I am tom
$ echo "I am $USER"
I am tom
A space character, on the other hand, loses its meaning as an argument separator:
$ touch new file
$ ls -l
-rw-rw-r-- 1 tom students 0 Oct 8 15:18 file
-rw-rw-r-- 1 tom students 0 Oct 8 15:18 new
$ touch "new file"
$ ls -l
-rw-rw-r-- 1 tom students 0 Oct 8 15:19 new file
As you can see, in the first example the touch command creates two individual files because the
two strings are passed as separate arguments. In the second example, both strings are passed as
one argument, so only one file is created. It is, however, best practice to avoid
the space character in filenames. Instead, an underscore (_) or a dot (.) could be used.
Single Quotes
Single quotes don’t have the exceptions of the double quotes. They revoke any special meaning from
each character. Let’s take one of the first examples from above:
$ echo I am $USER
I am tom
When applying the single quotes you see a different result:
$ echo 'I am $USER'
I am $USER
The command now displays the exact string without substituting the variable.
Escape Characters
We can use escape characters to remove special meanings of characters from Bash. Going back to the
$USER environment variable:
$ echo $USER
carol
We see that by default, the contents of the variable are displayed in the terminal. However, if we
were to precede the dollar sign with a backslash character (\) then the special meaning of the dollar
sign will be negated. This in turn will not let Bash expand the variable’s value to the username of the
person running the command, but will instead interpret the variable name literally:
$ echo \$USER
$USER
If you recall, we can get similar results to this using the single quote, which prints the literal
contents of whatever is between the single quotes. However the escape character works differently
by instructing Bash to ignore whatever special meaning the character it precedes may possess.
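The three quoting styles can be compared side by side. The following is a minimal sketch, assuming the default word splitting on spaces; the variable names are made up for illustration. printf wraps every argument it receives in brackets, which makes the argument boundaries visible.

```shell
name="two words"

# Unquoted: the value is split at the space into two arguments.
unquoted=$(printf '[%s]' $name)

# Double quotes: the variable is expanded, but the space is kept.
double=$(printf '[%s]' "$name")

# Single quotes: nothing is expanded.
single=$(printf '[%s]' '$name')

# Backslash: only the character after it loses its special meaning.
escaped=$(printf '[%s]' \$name)

printf '%s\n' "$unquoted" "$double" "$single" "$escaped"
```

The first line of output shows two bracketed arguments, the second shows one, and the last two show the literal text $name.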
In most Linux shells, there are two types of variables:
Local variables
These variables are available to the current shell process only.
Environment variables
These variables are available both in a specific shell session and in subprocesses spawned from
that shell session. These variables can be used to pass configuration data to commands which
are run. Because these programs can access these variables, they are called environment variables.
Working with Local Variables
You can set up a local variable by using the = (equal) operator. A simple assignment will create a
local variable:
$ greeting=hello
NOTE -> Don’t put any space before or after the = operator
To remove a variable, use the unset command followed by the variable name (without the $ sign).
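A minimal sketch of removing a variable, reusing the greeting variable from the assignment above. The ${greeting:-<not set>} form is only used here to print a placeholder text when the variable has no value.

```shell
greeting=hello
echo "before: ${greeting:-<not set>}"

# unset takes the variable name without the leading $ sign.
unset greeting
echo "after: ${greeting:-<not set>}"
```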
Working with Global Variables
To make a variable available to subprocesses, turn it from a local into an environment variable. This
is done by the command export. When it is invoked with the variable name, this variable is added to
the shell’s environment:
$ greeting=hello
$ export greeting
An easier way to create the environment variable is to combine both of the above methods, by
assigning the variable value in the argument part of the command.
$ export greeting=hey
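The effect of export can be observed by starting a child shell before and after exporting. This is a sketch under the assumption that sh is available as a child shell; the placeholder text <not set> is made up for the example.

```shell
unset greeting
greeting=hello

# A child shell does not see a local variable...
before=$(sh -c 'echo "${greeting:-<not set>}"')

# ...but it does see the variable once it has been exported.
export greeting
after=$(sh -c 'echo "${greeting:-<not set>}"')

echo "before export: $before"
echo "after export:  $after"
```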
The PATH Variable
The PATH variable is one of the most important environment variables in a Linux system. It stores a
list of directories, separated by a colon, that contain executable programs eligible as commands from
the Linux shell.
$ echo $PATH
/home/user/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
To append a new directory to the variable, you will need to use the colon sign (:).
$ PATH=$PATH:new_directory
Here is an example:
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
$ PATH=$PATH:/home/user/bin
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/user/bin
As you see, $PATH is used in the new value assigned to PATH. This variable is resolved during the
command execution and makes sure that the original content of the variable is preserved. Of course,
you can use other variables in the assignment as well:
$ mybin=/opt/bin
$ PATH=$PATH:$mybin
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/user/bin:/opt/bin
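Appending the same directory repeatedly makes PATH grow with duplicate entries. One possible way to avoid that, which is not taken from the text above, is to check whether the directory is already listed before appending it. The sketch below reuses the mybin variable and the /opt/bin directory from the example.

```shell
mybin=/opt/bin   # example directory from the text above

# Wrap both sides in colons so that only whole entries match.
case ":$PATH:" in
    *":$mybin:"*) ;;                 # already listed: leave PATH unchanged
    *) PATH=$PATH:$mybin ;;          # not listed yet: append it
esac

echo "$PATH"
```

Running the snippet a second time leaves PATH unchanged, because the first branch of the case statement matches.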
The PATH variable needs to be handled with caution, as it is crucial for working on the command line.
Let’s consider the following PATH variable:
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
To find out how the shell invokes a specific command, the which command can be run with the command’s name
as an argument. We can, for example, try to find out where nano is stored:
$ which nano
/usr/bin/nano
Command substitution stores the output of a command in a variable:
$ today1=$(date)
$ echo $today1
Thu 31 Jan 10:07:35 EST 2019
Getting Help on the Command Line
Built-in Help
When started with the --help parameter, most commands display some brief instructions about
their usage. Although not all commands provide this switch, it is still a good first try to learn more
about the parameters of a command. Be aware that the instructions from --help often are rather
brief compared to the other sources of documentation which we will discuss in the rest of this
lesson.
Man Pages
Most commands provide a manual page or “man” page. This documentation is usually installed with
the software and can be accessed with the man command. The command whose man page should be
displayed is added to man as an argument:
$ man mkdir
Info Pages
Another resource that will help you while working with the Linux system is the info pages. The info
pages are usually more detailed than the man pages and are formatted in hypertext, similar to web
pages on the Internet.
$ info mkdir
Locating files
The locate command
A Linux system is built from numerous directories and files. Linux has many tools to locate a
particular file within a system. The quickest one is the command locate.
locate searches within a database and then outputs every name that matches the given string.
A newly created file has no record in this database until the database is updated with updatedb:
$ sudo updatedb
The find command
find is another very popular tool that is used to search for files. This command takes a different
approach from the locate command. The find command searches a directory tree recursively,
including its subdirectories. find performs such a search at each invocation; it does not maintain a
database like locate. Similar to locate, find also supports wildcards and regular expressions.
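The recursive search can be tried safely in a throwaway directory. The following is a minimal sketch; all file and directory names are made up for the example.

```shell
# Build a small temporary directory tree.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/old"
touch "$workdir/notes.txt" "$workdir/docs/report.txt" "$workdir/docs/old/draft.md"

# Search the tree recursively for regular files whose names end in .txt.
# The pattern is quoted so that find, not the shell, expands the wildcard.
found=$(find "$workdir" -type f -name '*.txt' | sort)
echo "$found"
```

Both .txt files are reported, including the one in the subdirectory, while draft.md is not.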
File and Directory Names
Filenames can contain a suffix which comes after the
period (.). Unlike Windows, this suffix has no special meaning in Linux; it is there for human
understanding. For example, .txt indicates to us that a file is a plaintext file, although it could
technically contain any kind of data.
2.3 Lesson 2
One last thing to note: we can specify the home directories of other users by specifying the username
after the tilde.
root@Quantiphi-2550:~# cd ~vicky
root@Quantiphi-2550:/home/vicky# ll
With the recursive option, we get a far longer list of files:
$ ls -R ~
The find program is usually used to search for files and directories, but without any options, it will
show you a listing of all the files, directories, and sub-directories of your current directory.
Since touch creates empty files, you should get no output. You can use echo with > to create simple
text files. Try it:
$ echo hello > question15
$ cat question15
hello
Be careful when using >! If the named file already exists, it will be overwritten!
Use the -i option to make mv prompt you if you are about to overwrite an existing file.
The rm command can delete files and directories, while the rmdir command can only delete directories.
By default rmdir can only delete empty directories, therefore we had to use rm to delete a regular file.
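The difference between the two commands can be sketched in a temporary directory. The directory names are made up, and the -r option of rm (remove directories and their contents recursively) is an addition that the text above does not mention.

```shell
d=$(mktemp -d)
mkdir "$d/empty" "$d/full"
touch "$d/full/file"

# rmdir removes an empty directory...
rmdir "$d/empty"

# ...but refuses a directory that still has content.
rmdir "$d/full" 2>/dev/null || echo "rmdir refused: directory not empty"

# rm -r removes the directory together with everything in it.
rm -r "$d/full"
```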
Globbing
*
Matches any number of any character, including no characters
?
Matches any one character
[]
Matches a class of characters
The [] brackets are used to match ranges or classes of characters. The [] brackets work like they do
in POSIX regular expressions, except that in globs ! is the standard negation character where regular
expressions use ^ (Bash accepts both).
The ? expands to any single character. Try the following commands to see for yourself:
$ ls
question1 question14 question2012 star10 star2002
question13 question15 question23 star1100 star2013
$ ls question?
question1
$ ls question1?
question13 question14 question15
$ ls question?3
question13 question23
$ ls question13?
ls: cannot access question13?: No such file or directory
Ranges within [] brackets are expressed using a -:
$ ls
file1 file2 file3 file4 file5 file6 file7 filea fileb filec
$ ls file[1-2]
file1 file2
$ ls file[1-3]
file1 file2 file3
$ ls file[1-25-7]
file1 file2 file5 file6 file7
$ ls file[1-35-6a-c]
file1 file2 file3 file5 file6 filea fileb filec
You can also use the ^ character as the first character to match everything except certain characters.
$ ls file[^a]
file1 file2 file3 file4 file5 file6 file7 fileb filec
Character classes such as [:digit:] can also be used inside the brackets:
$ ls file[[:digit:]a]
file1 file2 file3 file4 file5 file6 file7 filea
$ ls file[[:digit:]]a
file1a
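The patterns above can be tried without touching real data. This is a minimal sketch in a temporary directory; the file names mirror the style of the examples but are otherwise arbitrary.

```shell
# Work in a throwaway directory with a known set of files.
d=$(mktemp -d)
cd "$d" || exit 1
touch file1 file2 file3 filea fileb

range=$(echo file[1-2])         # characters in the range 1 to 2
negated=$(echo file[!a])        # any single character except a
class=$(echo file[[:digit:]])   # any single digit
single=$(echo file?)            # any single character

printf '%s\n' "$range" "$negated" "$class" "$single"
```

Using echo instead of ls shows exactly what the shell expands each pattern to before any command is run.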
Archiving Files on the Command Line
Compression is used to reduce the amount of space a specific set of data consumes. Compression is
commonly used for reducing the amount of space that is needed to store a file. Another common use
is to reduce the amount of data sent over a network connection.
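Compression works by replacing repetitive patterns in data. Suppose you have a novel. Some words
are extremely common but have multiple characters, such as the word “the”. Replacing each occurrence
of such a word with a shorter symbol that is not used elsewhere in the text would reduce the size
of the novel.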
Compression comes in two varieties, lossless and lossy. Things compressed with a lossless algorithm
can be decompressed back into their original form. Data compressed with a lossy algorithm cannot
be restored exactly to its original form. Lossy algorithms are often used for images, video, and audio where the quality loss is
imperceptible to humans, irrelevant to the context, or the loss is worth the saved space or network
throughput.
Archiving and compression are commonly used together. Some archiving tools even compress their
contents by default. Others can optionally compress their contents. A few archive tools must be used
in conjunction with stand-alone compression tools if you wish to compress the contents.
The most common tool for archiving files on Linux systems is tar. Most Linux distributions ship
with the GNU version of tar, so it is the one that will be covered in this lesson. tar on its own only
manages the archiving of files but does not compress them.
There are lots of compression tools available on Linux. Some common lossless ones are bzip2, gzip,
and xz. You will find all three on most systems. You may encounter an old or very minimal system
where xz or bzip2 is not installed. If you become a regular Linux user, you will likely encounter files
compressed with all three of these. All three of them use different algorithms, so a file compressed
with one tool can’t be decompressed by another. Compression tools have a trade-off. If you want a
high compression ratio, it will take longer to compress and decompress the file. This is because
higher compression requires more work finding more complex patterns. All of these tools compress
data but cannot create archives containing multiple files.
Stand-alone compression tools aren’t typically available on Windows systems. Windows archiving
and compression tools are usually bundled together. Keep this in mind if you have Linux and
Windows systems that need to share files.
Linux systems also have tools for handling .zip files commonly used on Windows systems. They are
called zip and unzip. These tools are not installed by default on all systems, so if you need to use
them you may have to install them. Fortunately, they are typically found in distributions' package
repositories.
Compression Tools
How much disk space is saved by compressing files depends on a few factors: the nature of the data
you are compressing, the algorithm used to compress the data, and the compression level.
Not all algorithms support different compression levels. For tools that do, a higher compression level usually
requires more memory and CPU cycles, but results in a smaller compressed file. The opposite is true
for a lower compression level.
It is not necessary to decompress a file every time you use it. Compression tools typically come with
special versions of common tools used to read text files. For example, gzip has a version of cat, grep,
diff, less, more, and a few others. For gzip, the tools are prefixed with a z, while the prefix bz exists
for bzip2 and xz exists for xz. Below is an example of using zcat to display a file compressed
with gzip:
$ cp /etc/hosts ./
$ gzip hosts
$ zcat hosts.gz
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Archiving Tools
The tar program is probably the most widely used archiving tool on Linux systems. In case you are
wondering why it is named how it is, it is an abbreviation for “tape archive”. Files created with tar
are often called tarballs. It is very common for applications distributed as source code to be in
tarballs.
The tar program can also manage compression and decompression of archives on the fly. tar does
so by calling one of the compression tools discussed earlier in this section. It is as simple as adding
the option appropriate to the compression algorithm. The most commonly used ones are j, J, and z
for bzip2, xz, and gzip, respectively. Below are examples using the aforementioned algorithms:
$ cd ~/linux_essentials-3.1/compression
$ ls
bigfile bigfile3 bigfile-gz1.gz bigfile-xz1.xz hosts.gz
bigfile2 bigfile4 bigfile-gz9.gz bigfile-xz9.xz
$ tar -czf gzip.tar.gz bigfile bigfile2 bigfile3
$ tar -cjf bzip2.tar.bz2 bigfile bigfile2 bigfile3
$ tar -cJf xz.tar.xz bigfile bigfile2 bigfile3
$ ls -l | grep tar
-rw-r--r-- 1 emma emma 450202 Jun 27 05:56 bzip2.tar.bz2
-rw-r--r-- 1 emma emma 548656 Jun 27 05:55 gzip.tar.gz
-rw-r--r-- 1 emma emma 147068 Jun 27 05:56 xz.tar.xz
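A complete round trip (create, list, extract) can be sketched as follows. The directory and file names are made up for the example; the t and x options and -C are standard tar options that the examples above do not show.

```shell
d=$(mktemp -d)
cd "$d" || exit 1

mkdir project
echo "hello" > project/a.txt
echo "world" > project/b.txt

# c = create, z = compress with gzip, f = archive file name
tar -czf project.tar.gz project

# t = list the contents without extracting
listing=$(tar -tzf project.tar.gz)
echo "$listing"

# x = extract; -C selects the target directory
mkdir restore
tar -xzf project.tar.gz -C restore
cat restore/project/a.txt
```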
Compression takes some input data and uses an algorithm to transform it into an equivalent representation that takes up less space.
This is useful if you want to store more data in less space (space is always a limited resource), or if you want faster file transfers over a network.
Popular compression utilities on Linux distributions are:
gzip (frequently used);
bzip2 (less frequently used, yet typically produces a smaller output file than gzip);
xz (typically the most space-efficient of the three);
zip (often used for decompressing data that was compressed with zip on other systems, such as Windows).
Note that, in general, the more efficient a compression method is, the more time it takes.
Archiving, on the other hand, can be thought of as putting several files into one box. If you have 5 files of 10 KB each, archiving them gives you roughly 5 x 10 = 50 KB (plus a small amount of bookkeeping data), and that is it.
Note that on Linux the tar program, when given a compression option such as z, j or J, does both:
archives the input (first step);
and then compresses that archive.
The zip and unzip programs can be used to work with ZIP files on Linux systems.
$ zip -r zipfile.zip dir
$ unzip zipfile.zip
Searching and Extracting Data from Files
I/O Redirection
I/O redirection enables the user to redirect information from or to a command by using a text file. As
described earlier, the standard input, output and error output can be redirected, and the information
can be taken from text files.
Redirecting Standard Output
To redirect standard output to a file, instead of the screen, we need to use the > operator followed by
the name of the file. If the file doesn’t exist, a new one will be created, otherwise, the information
will overwrite the existing file.
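The overwrite behaviour can be seen in a short sketch. The file name is made up, and the >> operator, which appends instead of overwriting, is an addition not covered in the text above.

```shell
d=$(mktemp -d)

echo "first" > "$d/out.txt"     # creates the file
echo "second" > "$d/out.txt"    # overwrites the previous content
overwritten=$(cat "$d/out.txt")

echo "third" >> "$d/out.txt"    # >> appends instead of overwriting
appended=$(cat "$d/out.txt")

echo "$overwritten"
echo "$appended"
```

After the second echo only the word second remains in the file; the append adds third on a new line.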
Redirecting Standard Error
In order to redirect just the error messages, a user will need to employ the 2> operator followed by
the name of the file in which the errors will be written. If the file doesn’t exist, a new one will be
created, otherwise the file will be overwritten.
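Standard output and standard error can be sent to different files in the same command. The following is a minimal sketch; the file names are made up, and the exact error text depends on the system.

```shell
d=$(mktemp -d)

# Listing a path that does not exist makes ls print an error message.
# Normal output goes to out.txt, the error message to err.txt.
ls "$d/does-not-exist" > "$d/out.txt" 2> "$d/err.txt" || status=$?

echo "exit status: $status"
echo "captured error: $(cat "$d/err.txt")"
```

Here out.txt stays empty, because the command produced no regular output, while err.txt contains the error message.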
Redirecting Standard Input
This type of redirection is used to input data to a command, from a specified file instead of a
keyboard. In this case the < operator is used as shown in the example:
$ cat < text
Hello!
Hello to you too!