Pipes, Redirection, and REGEX
A large number of the files in a typical filesystem are text files. Text files contain only text, without the formatting features you might see in a word-processing file.
Because there are so many of these files on a typical Linux system, a great number of commands exist to help users manipulate text files. There are commands to both view and modify these files in various ways.
In addition, the shell provides features to control the output of commands: instead of being placed in the terminal window, the output can be redirected into a file or sent to another command. These redirection features provide users with a much more flexible and powerful environment to work within.
Contents:
- 1.1 Command Line Pipes
- 1.2 I/O Redirection
- 1.2.1 STDIN
- 1.2.2 STDOUT
- 1.2.3 STDERR
- 1.2.4 Redirecting STDOUT
- 1.2.5 Redirecting STDERR
- 1.2.6 Redirecting Multiple Streams
- 1.2.7 Redirecting STDIN
- 1.3 Searching for Files Using the Find Command
- 1.3.1 Search by File Name
- 1.3.2 Displaying File Detail
- 1.3.3 Searching for Files by Size
- 1.3.4 Additional Useful Search Options
- 1.3.5 Using Multiple Options
- 1.4 Viewing Files Using the less Command
- 1.4.1 Help Screen in less
- 1.4.2 less Movement Commands
- 1.4.3 less Searching Commands
- 1.5 Revisiting the head and tail Commands
- 1.5.1 Negative Value with the -n Option
- 1.5.2 Positive Value With the tail Command
- 1.5.3 Following Changes to a File
- 1.6 Sorting Files or Input
- 1.6.1 Fields and Sort Options
- 1.7 Viewing File Statistics With the wc Command
- 1.8 Using the cut Command to Filter File Contents
- 1.9 Using the grep Command to Filter File Contents
- 1.10 Basic Regular Expressions
- 1.10.1 Basic Regular Expressions - the . Character
- 1.10.2 Basic Regular Expressions - the [ ] Characters
- 1.10.3 Basic Regular Expressions - the * Character
- 1.10.4 Basic Regular Expressions - the ^ and $ Characters
- 1.10.5 Basic Regular Expressions - the \ Character
- 1.11 Extended Regular Expressions
- 1.12 xargs Command
1.1 Command Line Pipes
Previous chapters discussed how to use individual commands to perform actions on the operating system, including how to create, move and delete files and how to navigate the system. Typically, when a command produces output or generates an error, that output is displayed on the screen; however, this does not have to be the case.
The pipe | character can be used to send the output of one command to another. Instead of being printed to the screen, the output of one command becomes input for the next command. This can be a powerful tool, especially when looking for specific data; piping is often used to refine the results of an initial command.
The head and tail commands will be used in many examples below to illustrate the use of pipes. These commands can be used to display only the first few or last few lines of a file (or, when used with a pipe, the output of a previous command).
By default, the head and tail commands display ten lines. For example, the following command will display the first ten lines of the /etc/sysctl.conf file:
sysadmin@localhost:~$ head /etc/sysctl.conf
#
# /etc/sysctl.conf - Configuration file for setting system variables
# See /etc/sysctl.d/ for additional system variables
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
# Uncomment the following to stop low-level messages on console
#kernel.printk = 3 4 1 3
sysadmin@localhost:~$
In the next example, the last ten lines of the file will be displayed:
sysadmin@localhost:~$ tail /etc/sysctl.conf
# Do not send ICMP redirects (we are not a router)
#net.ipv4.conf.all.send_redirects = 0
#
# Do not accept IP source route packets (we are not a router)
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv6.conf.all.accept_source_route = 0
#
# Log Martian Packets
#net.ipv4.conf.all.log_martians = 1
#
sysadmin@localhost:~$
The pipe character allows users to use these commands not only on files, but also on the output of other commands. This can be useful when listing a large directory, such as /etc. Running ls /etc produces more output than fits in the terminal window; the part that remains visible looks like this:
ca-certificates       insserv              nanorc          services
ca-certificates.conf  insserv.conf         network         sgml
calendar              insserv.conf.d       networks        shadow
cron.d                iproute2             nologin         shadow-
cron.daily            issue                nsswitch.conf   shells
cron.hourly           issue.net            opt             skel
cron.monthly          kernel               os-release      ssh
cron.weekly           ld.so.cache          pam.conf        ssl
crontab               ld.so.conf           pam.d           sudoers
dbus-1                ld.so.conf.d         passwd          sudoers.d
debconf.conf          ldap                 passwd-         sysctl.conf
debian_version        legal                perl            sysctl.d
default               locale.alias         pinforc         systemd
deluser.conf          localtime            ppp             terminfo
depmod.d              logcheck             profile         timezone
dpkg                  login.defs           profile.d       ucf.conf
environment           logrotate.conf       protocols       udev
fstab                 logrotate.d          python2.7       ufw
fstab.d               lsb-base             rc.local        update-motd.d
gai.conf              lsb-base-logging.sh  rc0.d           updatedb.conf
groff                 lsb-release          rc1.d           vim
group                 magic                rc2.d           wgetrc
group-                magic.mime           rc3.d           xml
sysadmin@localhost:~$
If you look at the output of the previous command, you will notice that the first filename is ca-certificates. But there are other files listed "above" that can only be viewed by using the scroll bar. What if you just wanted to list the first few files of the /etc directory?
Instead of displaying the full output of the above command, piping it to the head command will display only the first ten lines:
sysadmin@localhost:~$ ls /etc | head
adduser.conf
adjtime
alternatives
apparmor.d
apt
bash.bashrc
bash_completion.d
bind
bindresvport.blacklist
blkid.conf
sysadmin@localhost:~$
The full output of the ls command is passed to the head command by the shell instead of being printed to the screen. The head command takes this output (from ls) as "input data", and the output of head is then printed to the screen.
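The contents of /etc differ from system to system, so input you can predict makes the ten-line default easier to check. This minimal sketch assumes the seq command is available. seq is part of GNU coreutils and is not otherwise covered in this chapter. It generates fifteen numbered lines, and head and tail each keep ten of them:

```shell
#!/bin/sh
# seq 15 prints the numbers 1 through 15, one per line.
# head keeps only the first ten of those lines (1 through 10).
first=$(seq 15 | head)

# tail keeps only the last ten lines (6 through 15).
last=$(seq 15 | tail)

echo "head kept:"; echo "$first"
echo "tail kept:"; echo "$last"
```

Changing 15 to any larger number still leaves exactly ten lines in each result.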
Multiple pipes can be used consecutively to link multiple commands together. If three commands are piped together, the first command's output is passed to the second command. The output of the second command is then passed to the third command, and the output of the third command is printed to the screen.
It is important to carefully choose the order in which commands are piped, as the third command will only see input from the output of the second. The examples below illustrate this using the nl command. In the first example, the nl command is used to number the lines of the output of a previous command:
sysadmin@localhost:~$ ls -l /etc/ppp | nl
     1  total 44
     2  -rw------- 1 root root   78 Aug 22  2010 chap-secrets
     3  -rwxr-xr-x 1 root root  386 Apr 27  2012 ip-down
     4  -rwxr-xr-x 1 root root 3262 Apr 27  2012 ip-down.ipv6to4
     5  -rwxr-xr-x 1 root root  430 Apr 27  2012 ip-up
     6  -rwxr-xr-x 1 root root 6517 Apr 27  2012 ip-up.ipv6to4
     7  -rwxr-xr-x 1 root root 1687 Apr 27  2012 ipv6-down
     8  -rwxr-xr-x 1 root root 3196 Apr 27  2012 ipv6-up
     9  -rw-r--r-- 1 root root    5 Aug 22  2010 options
    10  -rw------- 1 root root   77 Aug 22  2010 pap-secrets
    11  drwxr-xr-x 2 root root 4096 Jun 22  2012 peers
sysadmin@localhost:~$
In the next example, note that the ls command is executed first and its output is sent to the nl command, which numbers all of the lines from the output of the ls command. Then the tail command is executed, displaying the last five lines from the output of the nl command:
sysadmin@localhost:~$ ls -l /etc/ppp | nl | tail -5
     7  -rwxr-xr-x 1 root root 1687 Apr 27  2012 ipv6-down
     8  -rwxr-xr-x 1 root root 3196 Apr 27  2012 ipv6-up
     9  -rw-r--r-- 1 root root    5 Aug 22  2010 options
    10  -rw------- 1 root root   77 Aug 22  2010 pap-secrets
    11  drwxr-xr-x 2 root root 4096 Jun 22  2012 peers
sysadmin@localhost:~$
Compare the output above with the next example:
sysadmin@localhost:~$ ls -l /etc/ppp | tail -5 | nl
     1  -rwxr-xr-x 1 root root 1687 Apr 27  2012 ipv6-down
     2  -rwxr-xr-x 1 root root 3196 Apr 27  2012 ipv6-up
     3  -rw-r--r-- 1 root root    5 Aug 22  2010 options
     4  -rw------- 1 root root   77 Aug 22  2010 pap-secrets
     5  drwxr-xr-x 2 root root 4096 Jun 22  2012 peers
sysadmin@localhost:~$
Notice how the line numbers are different. Why is this?
In the second example, the output of the ls command is first sent to the tail command, which "grabs" only the last five lines of the output. Then the tail command sends those five lines to the nl command, which numbers them 1-5.
Pipes can be powerful, but it is important to consider how commands are piped to ensure that the desired output is displayed.
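The ordering effect can also be reproduced without depending on the contents of /etc/ppp. This sketch again assumes seq is available. It uses twenty generated lines in place of the ls -l output, and nl's column layout is left at its defaults:

```shell
#!/bin/sh
# Number all twenty lines first, then keep the last three:
# the original line numbers 18, 19 and 20 survive.
number_then_trim=$(seq 20 | nl | tail -n 3)

# Keep the last three lines first, then number them:
# nl only ever sees three lines, so it numbers them 1, 2 and 3.
trim_then_number=$(seq 20 | tail -n 3 | nl)

echo "$number_then_trim"
echo "---"
echo "$trim_then_number"
```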
1.2 I/O Redirection
Input/Output (I/O) redirection allows for command line information to be passed to different streams. Before discussing redirection, it is important to understand standard streams.
1.2.1 STDIN
Standard input, or STDIN, is information normally entered by the user via the keyboard. When a command reads data from STDIN, the shell lets the user type input that, in turn, is sent to the command as STDIN.
1.2.2 STDOUT
Standard output, or STDOUT, is the normal output of commands. When a command functions correctly (without errors) the output it produces is called STDOUT. By default, STDOUT is displayed in the terminal window (screen) where the command is executing.
1.2.3 STDERR
Standard error, or STDERR, carries the error messages generated by commands. By default, STDERR is displayed in the terminal window (screen) where the command is executing.
I/O redirection allows the user to redirect STDIN so data comes from a file, and STDOUT/STDERR so output goes to a file. Redirection is achieved by using the arrow characters: < and >.
1.2.4 Redirecting STDOUT
STDOUT can be directed to files. To begin, observe the output of the following command which will display to the screen:
sysadmin@localhost:~$ echo "Line 1"
Line 1
sysadmin@localhost:~$
Using the > character, the output can be redirected to a file:
sysadmin@localhost:~$ echo "Line 1" > example.txt
sysadmin@localhost:~$ ls
Desktop    Downloads  Pictures  Templates  example.txt  test
Documents  Music      Public    Videos     sample.txt
sysadmin@localhost:~$ cat example.txt
Line 1
sysadmin@localhost:~$
The echo command displays no output, because STDOUT was sent to the file example.txt instead of the screen. You can see the new file in the output of the ls command. When the file is viewed with the cat command, it contains the output of the echo command.
It is important to realize that the single arrow will overwrite any contents of an existing file:
sysadmin@localhost:~$ cat example.txt
Line 1
sysadmin@localhost:~$ echo "New line 1" > example.txt
sysadmin@localhost:~$ cat example.txt
New line 1
sysadmin@localhost:~$
The original contents of the file are gone, replaced with the output of the new echo command.
It is also possible to preserve the contents of an existing file by appending to it. Use the "double arrow" >> to append to a file instead of overwriting it:
sysadmin@localhost:~$ cat example.txt
New line 1
sysadmin@localhost:~$ echo "Another line" >> example.txt
sysadmin@localhost:~$ cat example.txt
New line 1
Another line
sysadmin@localhost:~$
Instead of being overwritten, the file has the output of the most recent echo command added to its bottom.
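The difference between > and >> can be tried safely in a temporary directory. This sketch assumes the mktemp command is available, as it is on most Linux distributions. It repeats the steps above on a throwaway example.txt:

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
file="$dir/example.txt"

echo "Line 1" > "$file"         # > creates the file (or empties an existing one)
echo "New line 1" > "$file"     # > again: "Line 1" is replaced, not kept
echo "Another line" >> "$file"  # >> adds to the end instead of replacing

content=$(cat "$file")
echo "$content"                 # prints "New line 1" then "Another line"

rm -r "$dir"                    # clean up the temporary directory
```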
1.2.5 Redirecting STDERR
STDERR can be redirected in a similar fashion to STDOUT. STDOUT is also known as stream (or channel) #1. STDERR is assigned stream #2.
When using arrows to redirect, stream #1 is assumed unless another stream is specified. Thus, stream #2 must be specified when redirecting STDERR.
To demonstrate redirecting STDERR, first observe the following command which will produce an error because the specified directory does not exist:
sysadmin@localhost:~$ ls /fake
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$
Note that there is nothing in the example above that implies that the output is STDERR. The output is clearly an error message, but how could you tell that it is being sent to STDERR? One easy way to determine this is to redirect STDOUT:
sysadmin@localhost:~$ ls /fake > output.txt
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$
In the example above, STDOUT was redirected to the output.txt file. So the output that is displayed can't be STDOUT, because it would have been placed in the output.txt file. Because all command output goes either to STDOUT or STDERR, the output displayed above must be STDERR.
The STDERR output of a command can be sent to a file:
sysadmin@localhost:~$ ls /fake 2> error.txt
sysadmin@localhost:~$ more error.txt
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$
In the command above, the 2> indicates that all error messages should be sent to the file error.txt.
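The stream numbers are easier to see with output you control. This sketch defines a small hypothetical shell function named report that writes one line to each stream. Inside it, >&2 sends echo's output to stream #2 instead of stream #1. The redirections then split the two lines apart:

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical helper: one line on STDOUT (stream #1), one on STDERR (stream #2).
report() {
    echo "normal output"
    echo "error message" >&2
}

# 2> captures only stream #2; "normal output" still reaches the screen.
report 2> "$dir/error.txt"

# > is shorthand for 1>, so only stream #1 goes to the file here;
# "error message" still reaches the screen.
report > "$dir/output.txt"

errors=$(cat "$dir/error.txt")
output=$(cat "$dir/output.txt")
rm -r "$dir"
```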
1.2.6 Redirecting Multiple Streams
It is possible to direct both the STDOUT and STDERR of a command at the same time. The following command will produce both STDOUT and STDERR because one of the specified directories exists and the other does not:
sysadmin@localhost:~$ ls /fake /etc/ppp
ls: cannot access /fake: No such file or directory
/etc/ppp:
chap-secrets  ip-down  ip-down.ipv6to4  ip-up  ip-up.ipv6to4  ipv6-down  ipv6-up  options  pap-secrets  peers
If only the STDOUT is sent to a file, STDERR will still be printed to the screen:
sysadmin@localhost:~$ ls /fake /etc/ppp > example.txt
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$ cat example.txt
/etc/ppp:
chap-secrets
ip-down
ip-down.ipv6to4
ip-up
ip-up.ipv6to4
ipv6-down
ipv6-up
options
pap-secrets
peers
sysadmin@localhost:~$
If only the STDERR is sent to a file, STDOUT will still be printed to the screen:
sysadmin@localhost:~$ ls /fake /etc/ppp 2> error.txt
/etc/ppp:
chap-secrets  ip-down  ip-down.ipv6to4  ip-up  ip-up.ipv6to4  ipv6-down  ipv6-up  options  pap-secrets  peers
sysadmin@localhost:~$ cat error.txt
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$
Both STDOUT and STDERR can be sent to a file by using &>, a character sequence that means "both 1> and 2>":
sysadmin@localhost:~$ ls /fake /etc/ppp &> all.txt
sysadmin@localhost:~$ cat all.txt
ls: cannot access /fake: No such file or directory
/etc/ppp:
chap-secrets
ip-down
ip-down.ipv6to4
ip-up
ip-up.ipv6to4
ipv6-down
ipv6-up
options
pap-secrets
peers
sysadmin@localhost:~$
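Note that in the following example, the file ends up with all of the STDERR messages at the top and all of the STDOUT messages below them. This ordering comes from ls itself, which reports missing arguments before it lists any directories; &> does not sort the two streams: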
sysadmin@localhost:~$ ls /fake /etc/ppp /junk /etc/sound &> all.txt
sysadmin@localhost:~$ cat all.txt
ls: cannot access /fake: No such file or directory
ls: cannot access /junk: No such file or directory
/etc/ppp:
chap-secrets
ip-down
ip-down.ipv6to4
ip-up
ip-up.ipv6to4
ipv6-down
ipv6-up
options
pap-secrets
peers
/etc/sound:
events
sysadmin@localhost:~$
If you don't want STDERR and STDOUT to go to the same file, they can be redirected to different files by using both > and 2>. For example:
sysadmin@localhost:~$ rm error.txt example.txt
sysadmin@localhost:~$ ls
Desktop    Downloads  Pictures  Templates  all.txt
Documents  Music      Public    Videos
sysadmin@localhost:~$ ls /fake /etc/ppp > example.txt 2> error.txt
sysadmin@localhost:~$ ls
Desktop    Downloads  Pictures  Templates  all.txt    example.txt
Documents  Music      Public    Videos     error.txt
sysadmin@localhost:~$ cat error.txt
ls: cannot access /fake: No such file or directory
sysadmin@localhost:~$ cat example.txt
/etc/ppp:
chap-secrets
ip-down
ip-down.ipv6to4
ip-up
ip-up.ipv6to4
ipv6-down
ipv6-up
options
pap-secrets
peers
sysadmin@localhost:~$
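The &> form is specific to bash. A plain POSIX shell such as dash does not treat it as a redirection. This sketch defines a small hypothetical function named report that writes to both streams. It shows the separate-file form used above alongside the portable single-file form > file 2>&1. That form first points STDOUT at the file and then makes STDERR (2) a copy of STDOUT (&1):

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical helper that writes one line to each stream.
report() {
    echo "normal output"
    echo "error message" >&2
}

# Separate files: stream #1 to one file, stream #2 to another.
report > "$dir/example.txt" 2> "$dir/error.txt"

# One file, portable form: send stream #1 to the file, then
# duplicate stream #2 onto wherever stream #1 now points.
report > "$dir/all.txt" 2>&1

stdout_part=$(cat "$dir/example.txt")
stderr_part=$(cat "$dir/error.txt")
combined=$(cat "$dir/all.txt")
rm -r "$dir"

echo "$combined"
```

The order of the two redirections matters. Writing 2>&1 > file would copy STDERR to wherever STDOUT pointed at that moment (the screen), so only STDOUT would go to the file.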
1.2.7 Redirecting STDIN
Redirecting STDIN is a harder concept to grasp, mainly because it is less obvious why you would want to do it. With STDOUT and STDERR, the answer to why is fairly easy: sometimes you want to store the output in a file for future use.
Most Linux users end up redirecting STDOUT routinely, STDERR on occasion and STDIN...well, very rarely. Very few commands require you to redirect STDIN. With most commands, if you want to read data from a file, you can just specify the filename as an argument, and the command then reads the file itself.
For some commands, if you don't specify a filename as an argument, they will revert to using STDIN to get data. For example, consider the following cat command:
sysadmin@localhost:~$ cat
hello
hello
how are you?
how are you?
goodbye
goodbye
sysadmin@localhost:~$
In the example above, the cat command wasn't provided a filename as an argument. So it read the data to display on the screen from STDIN. The user typed hello, and then the cat command displayed hello on the screen. Pressing Ctrl+D at the beginning of a line ends the input and returns you to the prompt. Perhaps this is useful for lonely people, but it is not really a good use of the cat command.
However, if the output of the cat command is redirected to a file, this method can be used either to add to an existing file or to place text into a new file:
sysadmin@localhost:~$ cat > new.txt
Hello
How are you?
Goodbye
sysadmin@localhost:~$ cat new.txt
Hello
How are you?
Goodbye
sysadmin@localhost:~$
While the previous example demonstrates another advantage of redirecting STDOUT, it doesn't address why or how STDIN can be redirected. To understand this, first consider a new command called tr. This command takes a set of characters and translates them into another set of characters.
For example, suppose you wanted to capitalize a line of text. You could use the tr command as follows:
sysadmin@localhost:~$ tr 'a-z' 'A-Z'
watch how this works
WATCH HOW THIS WORKS
sysadmin@localhost:~$
The tr command took STDIN from the keyboard (watch how this works) and converted all lowercase letters to uppercase before sending STDOUT to the screen (WATCH HOW THIS WORKS).
It would seem that a better use of the tr command would be to perform translation on a file, not on keyboard input. However, the tr command does not support filename arguments:
sysadmin@localhost:~$ more example.txt
/etc/ppp:
chap-secrets
ip-down
ip-down.ipv6to4
ip-up
ip-up.ipv6to4
ipv6-down
ipv6-up
options
pap-secrets
peers
sysadmin@localhost:~$ tr 'a-z' 'A-Z' example.txt
tr: extra operand `example.txt'
Try `tr --help' for more information
sysadmin@localhost:~$
You can, however, tell the shell to get STDIN from a file instead of from the keyboard by using the < character:
sysadmin@localhost:~$ tr 'a-z' 'A-Z' < example.txt
/ETC/PPP:
CHAP-SECRETS
IP-DOWN
IP-DOWN.IPV6TO4
IP-UP
IP-UP.IPV6TO4
IPV6-DOWN
IPV6-UP
OPTIONS
PAP-SECRETS
PEERS
sysadmin@localhost:~$
This is fairly rare because most commands do accept filenames as arguments. But, for those that do not, this method could be used to have the shell read from the file instead of relying on the command to have this ability.
One last note: In most cases you probably want to take the resulting output and place it back into another file:
sysadmin@localhost:~$ tr 'a-z' 'A-Z' < example.txt > newexample.txt
sysadmin@localhost:~$ more newexample.txt
/ETC/PPP:
CHAP-SECRETS
IP-DOWN
IP-DOWN.IPV6TO4
IP-UP
IP-UP.IPV6TO4
IPV6-DOWN
IPV6-UP
OPTIONS
PAP-SECRETS
PEERS
sysadmin@localhost:~$
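Redirecting STDIN with < and piping with | are two ways of filling the same STDIN. This sketch writes a small sample file with made-up contents and shows that tr receives the same input either way. The < form simply avoids starting an extra cat process:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'chap-secrets\nip-down\n' > "$dir/example.txt"   # made-up sample data

# The shell opens the file and connects it to tr's STDIN.
redirected=$(tr 'a-z' 'A-Z' < "$dir/example.txt")

# Same result: cat reads the file and a pipe feeds tr's STDIN.
piped=$(cat "$dir/example.txt" | tr 'a-z' 'A-Z')

echo "$redirected"    # CHAP-SECRETS, then IP-DOWN
rm -r "$dir"
```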
1.3 Searching for Files Using the Find Command
One of the challenges that users face when working with the filesystem is recalling where files are stored. There are thousands of files and hundreds of directories on a typical Linux filesystem, so remembering where these files are located can be difficult.
Keep in mind that most of the files that you will work with are ones that you create. As a result, you often will be looking in your own home directory to find files. However, sometimes you may need to search in other places on the filesystem to find files created by other users.
The find command is a very powerful tool that you can use to search for files on the filesystem. This command can search for files by name, including using wildcard characters for when you are not certain of the exact filename. Additionally, you can search for files based on file metadata, such as file type, file size and file ownership.
The syntax of the find command is:
find [starting directory] [search option] [search criteria] [result option]
A description of all of these components:
Component | Description |
---|---|
[starting directory] | This is where the user specifies where to start searching. The find command will search this directory and all of its subdirectories. If no starting directory is provided, then the current directory is used for the starting point. |
[search option] | This is where the user specifies an option to determine what sort of metadata to search for; there are options for file name, file size and many other file attributes. |
[search criteria] | This is an argument that complements the search option. For example, if the user uses the option to search for a file name, the search criteria would be the filename. |
[result option] | This option is used to specify what action should be taken once the file is found. If no option is provided, the file name will be printed to STDOUT. |
1.3.1 Search by File Name
To search for a file by name, use the -name option to the find command:
sysadmin@localhost:~$ find /etc -name hosts
find: `/etc/dhcp': Permission denied
find: `/etc/cups/ssl': Permission denied
find: `/etc/pki/CA/private': Permission denied
find: `/etc/pki/rsyslog': Permission denied
find: `/etc/audisp': Permission denied
find: `/etc/named': Permission denied
find: `/etc/lvm/cache': Permission denied
find: `/etc/lvm/backup': Permission denied
find: `/etc/lvm/archive': Permission denied
/etc/hosts
find: `/etc/ntp/crypto': Permission denied
find: `/etc/polkit-1/localauthority': Permission denied
find: `/etc/sudoers.d': Permission denied
find: `/etc/sssd': Permission denied
/etc/avahi/hosts
find: `/etc/selinux/targeted/modules/active': Permission denied
find: `/etc/audit': Permission denied
sysadmin@localhost:~$
Note that two files were found: /etc/hosts and /etc/avahi/hosts. The rest of the output consists of STDERR messages, because the user who ran the command didn't have permission to access certain subdirectories.
Recall that you can redirect STDERR to a file so you don't need to see these error messages on the screen:
sysadmin@localhost:~$ find /etc -name hosts 2> errors.txt
/etc/hosts
/etc/avahi/hosts
sysadmin@localhost:~$
While the output is easier to read, there really is no purpose in storing the error messages in the errors.txt file. The developers of Linux realized that it would be good to have a "junk file" for unnecessary data: anything that you send to the /dev/null file is discarded:
sysadmin@localhost:~$ find /etc -name hosts 2> /dev/null
/etc/hosts
/etc/avahi/hosts
sysadmin@localhost:~$
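You can experiment with -name and /dev/null without depending on what is in /etc. This sketch builds a small hypothetical directory tree under a temporary directory. It includes one unreadable subdirectory that provokes a Permission denied message when run as a regular user; root can read it anyway, so the results are the same either way:

```shell
#!/bin/sh
# Build a small, hypothetical tree to search.
top=$(mktemp -d)
mkdir -p "$top/etc/avahi" "$top/etc/locked"
touch "$top/etc/hosts" "$top/etc/avahi/hosts" "$top/etc/hostname"
chmod 000 "$top/etc/locked"   # unreadable for regular users

# -name matches the whole file name, so "hostname" is not listed.
# Any "Permission denied" messages go to /dev/null and are discarded.
found=$(find "$top/etc" -name hosts 2> /dev/null | sort)
echo "$found"

chmod 755 "$top/etc/locked"   # restore permissions so cleanup works
rm -r "$top"
```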
1.3.2 Displaying File Detail
It can be useful to obtain file details when using the find command, because the file name itself might not be enough information for you to find the correct file.
For example, there might be seven files named hosts; if you knew that the hosts file that you needed had been modified recently, then the modification timestamp of the file would be useful to see.
To see these file details, use the -ls option to the find command:
sysadmin@localhost:~$ find /etc -name hosts -ls 2> /dev/null
    41    4 -rw-r--r--   1 root     root          158 Jan 12  2010 /etc/hosts
  6549    4 -rw-r--r--   1 root     root         1130 Jul 19  2011 /etc/avahi/hosts
sysadmin@localhost:~$
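The first two columns of this output are the inode number of the file and the number of kilobyte blocks it uses. The remaining columns show the same fields as the ls -l command: file type, permissions, hard link count, user owner, group owner, file size, modification timestamp and file name.
1.3.3 Searching for Files by Size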
One of the many useful search options allows you to search for files by size. The -size option allows you to search for files that are either larger or smaller than a specified size, as well as for an exact file size.
When you specify a file size, you can give the size in bytes (c), kilobytes (k), megabytes (M) or gigabytes (G). For example, the following will search for files in the /etc directory structure that are exactly 10 bytes in size:
sysadmin@localhost:~$ find /etc -size 10c -ls 2>/dev/null
   432    4 -rw-r--r--   1 root root   10 Jan 28  2015 /etc/adjtime
  8814    0 drwxr-xr-x   1 root root   10 Jan 29  2015 /etc/ppp/ip-down.d
  8816    0 drwxr-xr-x   1 root root   10 Jan 29  2015 /etc/ppp/ip-up.d
  8921    0 lrwxrwxrwx   1 root root   10 Jan 29  2015 /etc/ssl/certs/349f2832.0 -> EC-ACC.pem
  9234    0 lrwxrwxrwx   1 root root   10 Jan 29  2015 /etc/ssl/certs/aeb67534.0 -> EC-ACC.pem
 73468    4 -rw-r--r--   1 root root   10 Nov 16 20:42 /etc/hostname
sysadmin@localhost:~$
If you want to search for files that are larger than a specified size, place a + character before the size. For example, the following will look for all files in the /usr directory structure that are over 100 megabytes in size:
sysadmin@localhost:~$ find /usr -size +100M -ls 2> /dev/null
574683 104652 -rw-r--r--   1 root root 107158256 Aug  7 11:06 /usr/share/icons/oxygen/icon-theme.cache
sysadmin@localhost:~$
To search for files that are smaller than a specified size, place a - character before the file size.
1.3.4 Additional Useful Search Options
There are many search options. The following table illustrates a few of these options:
Option | Meaning |
---|---|
-maxdepth | Allows the user to specify how deep in the directory structure to search. For example, -maxdepth 1 would examine only the specified directory and the files and subdirectories directly inside it, without descending any further. |
-group | Returns files owned by a specified group. For example, -group payroll would return files owned by the payroll group. |
-iname | Returns files that match a specified filename, but unlike -name, -iname is case insensitive. For example, -iname hosts would match files named hosts, Hosts, HOSTS, etc. |
-mmin | Returns files that were modified based on modification time in minutes. For example, -mmin 10 would match files that were modified 10 minutes ago. |
-type | Returns files that match file type. For example, -type f would return files that are regular files. |
-user | Returns files owned by a specified user. For example, -user bob would return files owned by the bob user. |
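Several of these options can be tried on a throwaway tree. This sketch uses hypothetical file and directory names to contrast -iname with -maxdepth and -type. The three files are kept in different directories so the example also works on case-insensitive filesystems:

```shell
#!/bin/sh
# Hypothetical tree:  top/Hosts  top/sub/hosts  top/sub/deeper/HOSTS
top=$(mktemp -d)
mkdir -p "$top/sub/deeper"
touch "$top/Hosts" "$top/sub/hosts" "$top/sub/deeper/HOSTS"

# -iname ignores case, so all three files match.
all=$(find "$top" -iname hosts | wc -l)

# -maxdepth 1 looks only at entries directly inside $top,
# so only top/Hosts is found.
shallow=$(find "$top" -maxdepth 1 -iname hosts)

# -type d lists directories only: top, top/sub and top/sub/deeper.
dirs=$(find "$top" -type d | wc -l)

echo "matches: $all, shallow: $shallow, directories: $dirs"
rm -r "$top"
```

GNU find expects -maxdepth to appear before tests such as -iname and prints a warning if it comes later, which is why it is placed first here.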
1.3.5 Using Multiple Options
If you use multiple options, they act as an "and", meaning that for a match to occur, all of the criteria must match, not just one. For example, the following command will display all files in the /etc directory structure that are 10 bytes in size and are regular files:
sysadmin@localhost:~$ find /etc -size 10c -type f -ls 2>/dev/null
   432    4 -rw-r--r--   1 root root   10 Jan 28  2015 /etc/adjtime
 73468    4 -rw-r--r--   1 root root   10 Nov 16 20:42 /etc/hostname
sysadmin@localhost:~$
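Size tests can be checked with files of known length. This sketch uses head -c to create hypothetical files of exactly 10 and 500 bytes, then combines -type f with -size. -type f keeps the temporary directory itself out of the results:

```shell
#!/bin/sh
top=$(mktemp -d)
# Create files of known sizes (in bytes) by copying bytes from /dev/zero.
head -c 10  /dev/zero > "$top/ten.bin"
head -c 500 /dev/zero > "$top/five-hundred.bin"
head -c 10  /dev/zero > "$top/also-ten.bin"

# Both criteria must hold: exactly 10 bytes AND a regular file.
exact=$(find "$top" -size 10c -type f | wc -l)

# + means larger than; - means smaller than.
big=$(find "$top" -type f -size +100c)
small=$(find "$top" -type f -size -100c | wc -l)

echo "exactly 10 bytes: $exact; over 100 bytes: $big; under 100 bytes: $small"
rm -r "$top"
```

Byte units (c) give exact comparisons. With k, M or G, GNU find rounds each file's size up to a whole unit, so -size -1M matches only empty files.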
1.4 Viewing Files Using the less Command
While viewing small files with the cat command poses no problems, it is not an ideal choice for large files. The cat command doesn't provide any way to easily pause and restart the display, so the entire file contents are dumped to the screen.
For larger files, you will want to use a pager command to view the contents. Pager commands will display one page of data at a time, allowing you to move forward and backwards in the file by using movement keys.
There are two commonly used pager commands:
- The less command: This command provides a very advanced paging capability. It is normally the default pager used by commands like the man command.
- The more command: This command has been around since the early days of UNIX. While it has fewer features than the less command, it does have one important advantage: the less command isn't always included with all Linux distributions (and on some distributions, it isn't installed by default). The more command is always available.
When you use the more or less commands, they allow you to "move around" a document by using keystroke commands. Because the developers of the less command based it on the functionality of the more command, all of the keystroke commands available in the more command also work in the less command.
For the purpose of this manual, the focus will be on the more advanced command (less). The more command is still useful to remember for times when the less command isn't available. Remember that most of the keystroke commands provided work for both commands.
1.4.1 Help Screen in less
When you view a file with the less command, you can use the h key to display a help screen. The help screen allows you to see which other commands are available. In the following example, the less /usr/share/dict/words command is executed. Once the document is displayed, the h key is pressed, displaying the help screen:
                   SUMMARY OF LESS COMMANDS
      Commands marked with * may be preceded by a number, N.
      Notes in parentheses indicate the behavior if N is given.
  h  H                 Display this help.
  q  :q  Q  :Q  ZZ     Exit.
 ------------------------------------------------------------------------
                           MOVING
  e  ^E  j  ^N  CR  *  Forward  one line   (or N lines).
  y  ^Y  k  ^K  ^P  *  Backward one line   (or N lines).
  f  ^F  ^V  SPACE  *  Forward  one window (or N lines).
  b  ^B  ESC-v      *  Backward one window (or N lines).
  z                 *  Forward  one window (and set window to N).
  w                 *  Backward one window (and set window to N).
  ESC-SPACE         *  Forward  one window, but don't stop at end-of-file.
  d  ^D             *  Forward  one half-window (and set half-window to N).
  u  ^U             *  Backward one half-window (and set half-window to N).
  ESC-)  RightArrow *  Left  one half screen width (or N positions).
  ESC-(  LeftArrow  *  Right one half screen width (or N positions).
HELP -- Press RETURN for more, or q when done
1.4.2 less Movement Commands
There are many movement commands for the less command, each with multiple possible keys or key combinations. While this may seem intimidating, remember that you don't need to memorize all of these movement commands; you can always press the h key whenever you need help.
The first group of movement commands that you may want to focus on are the most commonly used ones. To make this even easier to learn, the keys that are identical in more and less are summarized below. In this way, you will be learning how to move in more and less at the same time:
Movement | Key |
---|---|
Window forward | Spacebar |
Window backward | b |
Line forward | Enter |
Exit | q |
Help | h |
When simply using less as a pager, the easiest way to advance forward a page is to press the spacebar.
1.4.3 less Searching Commands
There are two ways to search in the less command: you can search either forward or backward from your current position, using patterns called regular expressions. More details regarding regular expressions are provided later in this chapter.
To start a search to look forward from your current position, use the / key. Then, type the text or pattern to match and press the Enter key.
If a match can be found, then your cursor will move in the document to the match. For example, in the following screen the expression "frog" was searched for in the /usr/share/dict/words file:
bullfrog
bullfrog's
bullfrogs
bullheaded
bullhorn
bullhorn's
bullhorns
bullied
bullies
bulling
bullion
bullion's
bullish
bullock
bullock's
bullocks
bullpen
bullpen's
bullpens
bullring
bullring's
bullrings
bulls
:
Notice that "frog" didn't have to be a word by itself. Also notice that while the less command took you to the first match from the current position, all matches were highlighted.
If no matches forward from your current position can be found, then the last line of the screen will report "Pattern not found":
Pattern not found  (press RETURN)
To start a search to look backwards from your current position, press the ? key, then type the text or pattern to match and press the Enter key. Your cursor will move backward to the first match it can find or report that the pattern cannot be found.
If more than one match can be found by a search, then using the n key will allow you to move to the next match and using the N key will allow you to go to a previous match.
1.5 Revisiting the head and tail Commands
Recall that the head and tail commands are used to filter files to show a limited number of lines. If you want to view a select number of lines from the top of a file, use the head command; if you want to view a select number of lines at the bottom of a file, use the tail command.
By default, both commands display ten lines from the file. The following table provides some examples:
Command Example | Explanation of Displayed Text |
---|---|
head /etc/passwd | First ten lines of /etc/passwd |
head -3 /etc/group | First three lines of /etc/group |
head -n 3 /etc/group | First three lines of /etc/group |
help | head | First ten lines of output piped from the help command |
tail /etc/group | Last ten lines of /etc/group |
tail -5 /etc/passwd | Last five lines of /etc/passwd |
tail -n 5 /etc/passwd | Last five lines of /etc/passwd |
help | tail | Last ten lines of output piped from the help command |
As seen from the above examples, both commands will output text either from a regular file or from the output of any command sent through a pipe. They both use the -n option to indicate how many lines to output.
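The following is a minimal sketch of the table above, assuming GNU coreutils. The seq command is used only to generate a predictable stream of numbered lines, and numbers.txt is a hypothetical scratch file created in a throwaway directory:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"          # work in a throwaway directory
seq 1 20 > numbers.txt     # hypothetical file containing the lines 1 through 20

head numbers.txt           # default: the first ten lines (1-10)
head -n 3 numbers.txt      # the first three lines (1-3)
tail -n 5 numbers.txt      # the last five lines (16-20)
seq 1 20 | tail -n 2       # piped input works the same way (19-20)
```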
1.5.1 Negative Value with the -n Option
Traditionally in UNIX, the number of lines to output would be specified as an option with either command, so -3 meant show three lines. For the tail command, either -3 or -n -3 still means show three lines. However, the GNU version of the head command recognizes -n -3 as show all but the last three lines, and yet the head command still recognizes the option -3 as show the first three lines.
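The difference between the two forms can be checked with a generated list of ten numbers. This sketch assumes the GNU versions of head and tail, because other implementations may reject a negative count for head:

```shell
seq 1 10 | head -3       # traditional syntax: the first three lines (1-3)
seq 1 10 | head -n -3    # GNU head: all but the last three lines (1-7)
seq 1 10 | tail -n -3    # tail: still the last three lines (8-10)
```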
1.5.2 Positive Value With the tail Command
The GNU version of the tail command allows for a variation of how to specify the number of lines to be printed. If you use the -n option with a number prefixed by the plus sign, then the tail command displays the contents starting at the specified line and continuing all the way to the end.
For example, the following will display from line 22 to the end of the output of the nl command:
sysadmin@localhost:~$ nl /etc/passwd | tail -n +22
    22  sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin
    23  operator:x:1000:37::/root:/bin/sh
    24  sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash
sysadmin@localhost:~$
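The plus-sign form is often used to skip header lines. This is a small sketch assuming GNU tail, and the comma-separated data in it is made up for illustration:

```shell
seq 1 25 | tail -n +22    # from line 22 to the end: 22 23 24 25

# Skip a one-line header by starting the output at line 2.
printf 'name,age\nbob,23\nsue,67\n' | tail -n +2
```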
1.5.3 Following Changes to a File
You can view live file changes by using the -f option to the tail command. This is useful when you want to see changes to a file as they are happening.
A good example of this would be when viewing log files as a system administrator. Log files can be used to troubleshoot problems, and administrators will often view them "interactively" with the tail command while performing, in a separate window, the commands they are trying to troubleshoot.
For example, if you were logged in as the root user, you could troubleshoot issues with the email server by viewing live changes to its log file with the following command (press Ctrl+C to stop following the file):
tail -f /var/log/mail.log
1.6 Sorting Files or Input
The sort command can be used to rearrange the lines of files or input in either dictionary or numeric order based upon the contents of one or more fields. Fields are determined by a field separator contained on each line, which defaults to whitespace (spaces and tabs).
The following example creates a small file, using the head command to grab the first 5 lines of the /etc/passwd file and send the output to a file called mypasswd.
sysadmin@localhost:~$ head -5 /etc/passwd > mypasswd
sysadmin@localhost:~$ cat mypasswd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
sysadmin@localhost:~$
Now we will sort the mypasswd file:
sysadmin@localhost:~$ sort mypasswd
bin:x:2:2:bin:/bin:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
root:x:0:0:root:/root:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
sys:x:3:3:sys:/dev:/bin/sh
sysadmin@localhost:~$
1.6.1 Fields and Sort Options
If the file or input is separated by another delimiter, such as a comma or colon, the -t option allows another field separator to be specified. To specify fields to sort by, use the -k option with an argument indicating the field number (starting with 1 for the first field).
The other commonly used options for the sort command are -n to perform a numeric sort and -r to perform a reverse sort.
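The difference between dictionary and numeric order appears as soon as numbers have different lengths. In this sketch, LC_ALL=C is set for the dictionary sort only to make the comparison byte-based and predictable across locales:

```shell
printf '10\n9\n100\n' | LC_ALL=C sort   # dictionary order: 10, 100, 9
printf '10\n9\n100\n' | sort -n         # numeric order: 9, 10, 100
printf '10\n9\n100\n' | sort -n -r      # reversed numeric order: 100, 10, 9
```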
In the next example, the -t option is used to separate fields by a colon character, and a numeric sort is performed using the third field of each line:
sysadmin@localhost:~$ sort -t: -n -k3 mypasswd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
sysadmin@localhost:~$
Note that the -r option could have been used to reverse the sort, making the higher numbers in the third field appear at the top of the output:
sysadmin@localhost:~$ sort -t: -n -r -k3 mypasswd
sync:x:4:65534:sync:/bin:/bin/sync
sys:x:3:3:sys:/dev:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
root:x:0:0:root:/root:/bin/bash
sysadmin@localhost:~$
Lastly, you may want to perform more complex sorts, such as sorting by a primary field and then by a secondary field. For example, consider the following data:
bob:smith:23
nick:jones:56
sue:smith:67
You might want to sort first by the last name (field 2), then by first name (field 1), and then by age (field 3). A key given as -k2 runs from field 2 to the end of the line, so limit each key to a single field with the -kN,N form. Add n to the age key for a numeric comparison:
sysadmin@localhost:~$ sort -t: -k2,2 -k1,1 -k3,3n filename
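A key written as -k2 extends from field 2 to the end of the line, while -k2,2 covers field 2 only. The sketch below uses a hypothetical file, people.txt, to show how that changes a multi-key sort. LC_ALL=C is assumed so that the ordering is byte-based and does not depend on the locale:

```shell
cd "$(mktemp -d)"
printf 'bob:smith:23\nnick:jones:56\namy:smith:67\n' > people.txt   # hypothetical data

# -k2 alone: the key is "smith:23" vs "smith:67", so bob sorts before amy.
LC_ALL=C sort -t: -k2 people.txt

# -k2,2: ties on the last name fall through to the first name (-k1,1),
# and then to the age compared numerically (-k3,3n).
LC_ALL=C sort -t: -k2,2 -k1,1 -k3,3n people.txt
```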
1.7 Viewing File Statistics With the wc Command
The wc command can print up to three statistics for each file provided. If more than one filename is provided, it also prints the total of these statistics. By default, the wc command provides the number of lines, words and bytes. In a plain ASCII text file, one byte equals one character. Multibyte characters, such as many UTF-8 characters, take more than one byte:
sysadmin@localhost:~$ wc /etc/passwd /etc/passwd-
  35   56 1710 /etc/passwd
  34   55 1665 /etc/passwd-
  69  111 3375 total
sysadmin@localhost:~$
The above example shows the output from executing wc /etc/passwd /etc/passwd-. The output has four columns: the number of lines in the file, the number of words in the file, the number of bytes in the file, and the file name or total.
If you are interested in viewing just specific statistics, you can use -l to show just the number of lines, -w to show just the number of words and -c to show just the number of bytes.
The wc command can be useful for counting the number of lines output by some other command through a pipe. For example, if you wanted to know the total number of files in the /etc directory, you could execute ls /etc | wc -l:
sysadmin@localhost:~$ ls /etc/ | wc -l
136
sysadmin@localhost:~$
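A quick way to see what each wc option counts is to feed it a known string. The input text in this sketch is made up. The last line shows that -c counts bytes rather than characters when the text contains a multibyte UTF-8 character; é is written here as the two octal escapes \303\251:

```shell
printf 'one two\nthree\n' | wc -l     # 2 lines
printf 'one two\nthree\n' | wc -w     # 3 words
printf 'one two\nthree\n' | wc -c     # 14 bytes, including the two newlines
printf 'caf\303\251\n' | wc -c        # "café" plus a newline: 6 bytes for 5 characters
```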
1.8 Using the cut Command to Filter File Contents
The cut command can extract columns of text from a file or standard input. A primary use of the cut command is for working with delimited database files, which are very common on Linux systems.
By default, it considers its input to be separated by the Tab character, but the -d option can specify alternative delimiters such as the colon or comma.
Using the -f option, you can specify which fields to display, either as a hyphenated range or a comma-separated list.
In the following example, the first, fifth, sixth and seventh fields from the mypasswd database file are displayed:
sysadmin@localhost:~$ cut -d: -f1,5-7 mypasswd
root:root:/root:/bin/bash
daemon:daemon:/usr/sbin:/bin/sh
bin:bin:/bin:/bin/sh
sys:sys:/dev:/bin/sh
sync:sync:/bin:/bin/sync
sysadmin@localhost:~$
Using the cut command, you can also extract columns of text based upon character position with the -c option. This can be useful for extracting fields from fixed-width database files. For example, the following will display just the file type (character 1), permissions (characters 2-10) and filename (characters 50 onward) of the output of the ls -l command:
sysadmin@localhost:~$ ls -l | cut -c1-11,50-
total 12
drwxr-xr-x Desktop
drwxr-xr-x Documents
drwxr-xr-x Downloads
drwxr-xr-x Music
drwxr-xr-x Pictures
drwxr-xr-x Public
drwxr-xr-x Templates
drwxr-xr-x Videos
-rw-rw-r-- errors.txt
-rw-rw-r-- mypasswd
-rw-rw-r-- new.txt
sysadmin@localhost:~$
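The following sketch applies the same options to a single hypothetical /etc/passwd-style line so the results are easy to check. The date string in the last command is also made up. It is there only to show fixed character positions:

```shell
line='sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash'

echo "$line" | cut -d: -f1,7               # fields 1 and 7: sysadmin:/bin/bash
echo "$line" | cut -d: -f5 | cut -d, -f1   # field 5, then its first comma field: System Administrator
printf 'a\tb\tc\n' | cut -f2               # Tab is the default delimiter: b
echo '2024-01-15' | cut -c6-7              # characters 6-7: 01
```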
1.9 Using the grep Command to Filter File Contents
The grep command can be used to filter lines in a file or the output of another command based on matching a pattern. That pattern can be as simple as the exact text that you want to match, or it can be much more advanced through the use of regular expressions (discussed later in this chapter).
For example, you may want to find all the users who can log in to the system with the BASH shell. To do this, you could use the grep command to filter the /etc/passwd file for the lines containing the characters bash:
sysadmin@localhost:~$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash
sysadmin@localhost:~$
To make it easier to see exactly what is matched, use the --color option. This option will highlight the matched items in red:
sysadmin@localhost:~$ grep --color bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash
sysadmin@localhost:~$
In some cases you don't care about the specific lines that match the pattern, but rather how many lines match it. With the -c option, you can get a count of the matching lines:
sysadmin@localhost:~$ grep -c bash /etc/passwd
2
sysadmin@localhost:~$
When you are viewing the output from the grep command, it can be hard to determine the original line numbers. This information is useful when you go back into the file (perhaps to edit it), because it lets you quickly find one of the matched lines.
The -n option to the grep command will display original line numbers:
sysadmin@localhost:~$ grep -n bash /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
24:sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash
sysadmin@localhost:~$
Some additional useful grep options:
Examples | Output |
---|---|
grep -v nologin /etc/passwd | All lines not containing nologin in the /etc/passwd file |
grep -l linux /etc/* | List of files in the /etc directory containing linux |
grep -i linux /etc/* | Listing of lines from files in the /etc directory containing any case (capital or lower) of the character pattern linux |
grep -w linux /etc/* | Listing of lines from files in the /etc directory containing the word pattern linux |
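To see several of these options side by side, the sketch below runs them against a small made-up file, users.txt, written in the style of /etc/passwd. The bashful account is invented to show the difference between matching a string and matching a whole word with -w:

```shell
cd "$(mktemp -d)"
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:2:2:bin:/bin:/usr/sbin/nologin' \
  'sysadmin:x:1001:1001::/home/sysadmin:/bin/bash' \
  'bashful:x:1002:1002::/home/bashful:/bin/sh' > users.txt   # hypothetical data

grep -c bash users.txt     # 3: "bashful" also contains the string bash
grep -cw bash users.txt    # 2: -w requires bash to be a whole word
grep -ci BASH users.txt    # 3: -i ignores case
grep -n nologin users.txt  # 2:bin:x:2:2:bin:/bin:/usr/sbin/nologin
grep -v nologin users.txt  # every line except the bin line
```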
1.10 Basic Regular Expressions
A Regular Expression is a collection of "normal" and "special" characters that are used to match simple or complex patterns. Normal characters are alphanumeric characters, which match themselves. For example, an a would match an a.
Some characters have special meanings when used within patterns by commands like the grep command. There are both Basic Regular Expressions (available to a wide variety of Linux commands) and Extended Regular Expressions (available to more advanced Linux commands). Basic Regular Expressions include the following:
Regular Expression | Matches |
---|---|
. | Any single character |
[ ] | A list or range of characters to match one character, unless the first character is the caret ^ , and then it means any character not in the list |
* | Previous character repeated zero or more times |
^ | Following text must appear at beginning of line |
$ | Preceding text must appear at the end of the line |
The grep command is just one of many commands that support regular expressions; others include the more and less commands. Single quotes are not strictly necessary around some of the regular expressions in the following examples. Even so, it is good practice to put single quotes around your regular expressions to prevent the shell from interpreting any special characters in them.
1.10.1 Basic Regular Expressions - the . Character
In the example below, a simple file is first created using redirection. Then the grep command is used to demonstrate a simple pattern match:
sysadmin@localhost:~$ echo 'abcddd' > example.txt
sysadmin@localhost:~$ cat example.txt
abcddd
sysadmin@localhost:~$ grep --color 'a..' example.txt
abcddd
sysadmin@localhost:~$
In the previous example, you can see that the pattern a.. matched abc. The first . character matched the b and the second matched the c.
In the next example, the pattern a..c won't match anything, so the grep command will not produce any output. For the match to be successful, example.txt would need to contain exactly two characters between an a and a c:
sysadmin@localhost:~$ grep --color 'a..c' example.txt
sysadmin@localhost:~$
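The same idea can be checked without a file by piping text into grep. This sketch uses the -o option in place of color highlighting. The -o option prints only the matched part of each line:

```shell
echo 'abcddd' | grep -o 'a..'          # abc
echo 'abcddd' | grep -c 'a..c'         # 0: no a, then two characters, then c
echo 'a-c axc abc' | grep -o 'a.c'     # a-c, axc and abc, each on its own line
```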
1.10.2 Basic Regular Expressions - the [ ] Characters
If you use the . character, any possible character could match. In some cases you want to specify exactly which characters you want to match. For example, you might want to match only a lowercase letter or a digit. For this, you can use the [ ] regular expression characters and list the valid characters inside them.
For example, the following command matches two characters. The first is either an a or a b, and the second is either an a, b, c or d:
sysadmin@localhost:~$ grep --color '[ab][a-d]' example.txt
abcddd
sysadmin@localhost:~$
Note that you can either list out each possible character [abcd] or provide a range [a-d], as long as the range is in the correct order. For example, [d-a] wouldn't work because it isn't a valid range:
sysadmin@localhost:~$ grep --color '[d-a]' example.txt
grep: Invalid range end
sysadmin@localhost:~$
The order used for ranges comes from the ASCII table, which assigns a number to each character. You can see the ASCII table with the man ascii command. Each entry lists the octal, decimal and hexadecimal value, followed by the character. Here is a small excerpt:
041   33    21    !          141   97    61    a
042   34    22    "          142   98    62    b
043   35    23    #          143   99    63    c
044   36    24    $          144   100   64    d
045   37    25    %          145   101   65    e
046   38    26    &          146   102   66    f
Since a has a smaller numeric value (octal 141) than d (octal 144), the range a-d includes all characters from a to d.
What if you want to match a character that can be anything but an x, y or z? You wouldn't want to have to provide a [ ] set with all of the characters except x, y or z.
To indicate that you want to match a character that is not one of the listed characters, start your [ ] set with a ^ symbol. For example, the following matches a pattern made of a character that isn't an a, b or c, followed by a d:
sysadmin@localhost:~$ grep --color '[^abc]d' example.txt
abcddd
sysadmin@localhost:~$
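The bracket expressions above can also be checked with grep -o, which prints only the matched text. The word list in the last command is made up for illustration:

```shell
echo 'abcddd' | grep -o '[ab][a-d]'          # ab
echo 'abcddd' | grep -o '[^abc]d'            # dd
printf 'cat\ncot\ncut\n' | grep 'c[ao]t'     # cat and cot, but not cut
```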
1.10.3 Basic Regular Expressions - the * Character
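The * character can be used to match "zero or more of the previous character". For example, the following will match zero or more d characters:
sysadmin@localhost:~$ grep --color 'd*' example.txt
abcddd
sysadmin@localhost:~$
Because zero occurrences also count as a match, the pattern d* on its own matches every line, even a line that contains no d at all.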
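Because * also accepts zero occurrences, a pattern built around it can match more than expected. This is a short sketch using grep -o, which prints only the matched text, and grep -c, which counts matching lines:

```shell
echo 'abcddd' | grep -o 'cd*'   # cddd: c followed by three d characters
echo 'abc' | grep -o 'cd*'      # c: zero d characters still match
echo 'xyz' | grep -c 'd*'       # 1: even a line with no d matches d*
```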
1.10.4 Basic Regular Expressions - the ^ and $ Characters
When you perform a pattern match, the match could occur anywhere on the line. You may want to specify that the match occurs at the beginning or the end of the line. To match at the beginning of the line, begin the pattern with a ^ symbol.
In the following example, another line is added to the example.txt file to demonstrate the use of the ^ symbol:
sysadmin@localhost:~$ echo "xyzabc" >> example.txt
sysadmin@localhost:~$ cat example.txt
abcddd
xyzabc
sysadmin@localhost:~$ grep --color "a" example.txt
abcddd
xyzabc
sysadmin@localhost:~$ grep --color "^a" example.txt
abcddd
sysadmin@localhost:~$
Note that in the first grep output, both lines match because they both contain the letter a. In the second grep output, only the line that began with the letter a matched.
To specify that the match occurs at the end of the line, end the pattern with the $ character. For example, to find only lines that end with the letter c:
sysadmin@localhost:~$ grep "c$" example.txt
xyzabc
sysadmin@localhost:~$
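Combining both anchors forces a pattern to match an entire line. This small sketch uses made-up input. The last command uses ^$ to count empty lines:

```shell
printf 'abcddd\nxyzabc\n' | grep '^a'     # abcddd
printf 'abcddd\nxyzabc\n' | grep 'c$'     # xyzabc
printf 'abc\nabcd\n' | grep '^abc$'       # abc: the whole line must be exactly abc
printf 'one\n\ntwo\n' | grep -c '^$'      # 1 empty line
```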
1.10.5 Basic Regular Expressions - the \ Character
In some cases you may want to match a character that happens to be a special Regular Expression character. For example, consider the following:
sysadmin@localhost:~$ echo "abcd*" >> example.txt
sysadmin@localhost:~$ cat example.txt
abcddd
xyzabc
abcd*
sysadmin@localhost:~$ grep --color "cd*" example.txt
abcddd
xyzabc
abcd*
sysadmin@localhost:~$
In the output of the grep command above, every line matches because you are looking for a c character followed by zero or more d characters. If you want to look for an actual * character, place a \ character before the * character:
sysadmin@localhost:~$ grep --color "cd\*" example.txt
abcd*
sysadmin@localhost:~$
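Escaping is not limited to *. A period is just as easy to match by accident, as this sketch with made-up text shows. The grep -o option prints each match on its own line:

```shell
echo 'version 1.5 and 1x5' | grep -o '1.5'    # 1.5 and 1x5: . matches any character
echo 'version 1.5 and 1x5' | grep -o '1\.5'   # 1.5 only: \. matches a literal period
```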
1.11 Extended Regular Expressions
Extended Regular Expressions often require a special option so that the command recognizes them. Historically, a command called egrep, similar to grep, was used to understand them. The egrep command is now deprecated in favor of grep with the -E option.
The following regular expressions are considered "extended":
RE | Meaning |
---|---|
? | Matches previous character zero or one time, so it is an optional character |
+ | Matches previous character repeated one or more times |
\| | Alternation, like a logical OR operator (matches the expression on either side) |
Some extended regular expressions examples:
Command | Meaning | Matches |
---|---|---|
grep -E 'colou?r' 2.txt | Match colo followed by zero or one u character, followed by r | color colour |
grep -E 'd+' 2.txt | Match one or more d characters | d dd ddd dddd |
grep -E 'gray|grey' 2.txt | Match either gray or grey | gray grey |
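The contents of 2.txt are not shown, so this sketch runs the table's patterns against piped, made-up input instead. The last command, which assumes GNU grep, shows why -E matters: without it, + is treated as an ordinary character:

```shell
printf 'color\ncolour\ncolouur\n' | grep -E 'colou?r'   # color, colour
printf 'a\nd\ndd\n' | grep -E 'd+'                      # d, dd
printf 'gray\ngrey\ngreen\n' | grep -E 'gray|grey'      # gray, grey
printf 'd\ndd\n' | grep -c 'd+'                         # 0: in a basic regex, + is literal
```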
1.12 xargs Command
The xargs command is used to build and execute command lines from standard input. It is very helpful when you need to run a command with a very long list of arguments, because a list that is too long can cause an error.
The xargs command has a -0 option, which makes it treat input items as separated by a null character instead of whitespace and also disables the end-of-file string. This allows arguments that contain spaces, quotes, or backslashes, and it is commonly paired with find -print0, which outputs null-separated file names.
The xargs command also allows commands to be executed more efficiently. Its goal is to build the command line so that the command runs as few times as possible with as many arguments as possible, rather than running the command many times with one argument each time.
The xargs command works by breaking up the list of arguments into sublists and executing the command once for each sublist. Each sublist is kept within the system's limit on command-line length, which avoids an "Argument list too long" error.
The following example shows a scenario where using a normal wildcard (glob) character failed, but the xargs command allowed the files to be removed:
sysadmin@localhost:~/many$ rm *
bash: /bin/rm: Argument list too long
sysadmin@localhost:~/many$ ls | xargs rm
sysadmin@localhost:~/many$
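This is a small sketch of how xargs splits arguments into sublists and handles unusual file names. It assumes GNU findutils and coreutils, and it runs in a throwaway directory with made-up file names. The -n option used in the first command limits how many arguments go into each sublist, which makes the splitting visible:

```shell
seq 1 5 | xargs -n 2 echo          # echo runs three times: "1 2", "3 4", "5"

cd "$(mktemp -d)"
touch a.txt b.txt 'my notes.txt'   # hypothetical files; one name contains a space

# Null-separated names from find -print0 keep the space intact with xargs -0.
find . -name '*.txt' -print0 | xargs -0 rm --
ls | wc -l                         # 0: all three files were removed
```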
All materials are taken from the official NDG Linux Essential course.