Text Processing
grep
grep, short for Global Regular Expression Print, is a command-line utility used to search for patterns within text. It scans input line by line and prints lines that match a given regular expression, making it essential for text searching and filtering.
-i
case-insensitive search (case-sensitive by default)
-v
invert the match: print only the lines that do not match the pattern
-o
print only the matched part of each line (the whole line by default)
-q
quiet mode (no output; only the exit status matters)
-E
extended regular expressions (ERE), where +, ?, |, () and {} work without backslashes
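The options above can be tried on a few lines of made-up input. This is only a sketch: the sample text and the variable names (sample, ci, inv, only, status) are illustrative assumptions.

```shell
# Hypothetical sample input, one entry per line
sample='Error: disk full
warning: low memory
error: timeout after 30s'

# -i: match "error" regardless of case
ci=$(printf '%s\n' "$sample" | grep -i 'error')

# -v: keep only the lines that do NOT contain "warning"
inv=$(printf '%s\n' "$sample" | grep -v 'warning')

# -o with -E: print only the matched digits, not the whole line
only=$(printf '%s\n' "$sample" | grep -o -E '[0-9]+')

# -q: print nothing; the exit status is 0 when a match is found
if printf '%s\n' "$sample" | grep -q 'timeout'; then
  status="found"
else
  status="missing"
fi

printf '%s\n' "$ci" "$inv" "$only" "$status"
```

Because -q produces no output, it fits naturally inside if or && chains where only success or failure is of interest.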
Multiple patterns:
grep "<pattern1>\|<pattern2>" <file>
grep -e "<pattern1>" -e "<pattern2>" <file>
Test whether a string contains a substring, using only the exit status:
echo "<string>" | grep -q -E "<substring>"
Extract whatever comes after Host:.
-o prints only the matched part of the line, not the entire line. -P (GNU grep) activates Perl-compatible regular expressions (PCRE), which provide \K. \K resets the starting point of the match, so only the characters after \K are included in the output.
grep -oP 'Host:\s*\K[^\s;]*' "${NMAP_FILE}"
awk
awk is named after its creators: Aho, Weinberger, and Kernighan. It's a powerful Unix tool for pattern scanning and processing — often used to extract and manipulate text from files or input streams.
By default, it treats spaces/tabs as delimiters. This can be changed using -F'<del>'.
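As a sketch of changing the delimiter, the snippet below splits colon-separated records with -F':'. The records imitate a passwd-like layout but are invented for illustration, as are the variable names.

```shell
# Made-up records in a passwd-like, colon-separated layout
records='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# -F':' makes awk split each line on colons instead of whitespace
users=$(printf '%s\n' "$records" | awk -F':' '{print $1}')

# $NF is the last field of each record, here the login shell
shells=$(printf '%s\n' "$records" | awk -F':' '{print $1, $NF}')

printf '%s\n' "$users" "$shells"
```

The comma in print $1, $NF inserts the output field separator, which is a single space unless changed.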
# Extract the first field
awk '{print $1}' <file>
# Extract multiple fields
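awk '{print $1,$4}' <file>
NF holds the number of fields in the current record, so $NF refers to the last field. NR holds the number of the current record (the line number, by default); inside an END block it equals the total number of records read: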
# Skip the first 3 lines, then print the last field of every remaining line
awk 'NR > 3 {print $NF}' <file>
Whatever is within {} is the awk action, executed for each input line that satisfies the preceding condition (every line when no condition is given). printf is a built-in awk statement that formats and prints data. "%s " is the format string: the argument is printed as a string (%s) followed by a space. $0 represents the entire current input line:
# For each input line, print the entire line and add a space after it
awk '{printf "%s ", $0}'
We can add a newline character at the end of the output using an END block. The actions within END {...} are executed once, after all input lines have been processed:
# Join all lines with spaces, then print a single newline at the end
awk '{printf "%s ", $0} END {print ""}'
sed
sed, which stands for Stream Editor, is a powerful tool for parsing and transforming text in a data stream. It reads input line by line, applies editing commands, and outputs the result, often used for non-interactive text processing and automation.
s
substitution (replace)
g
global (replace every match on a line, not just the first)
d
delete line
p
print line
-i
edit the file in place instead of printing the result to stdout
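The commands and flags above can be sketched against a scratch file. The file content and variable names are invented for illustration. Two details go beyond the list: -n suppresses sed's default output so that p prints only the selected lines, and -i.bak is used here because that spelling keeps a backup and is accepted by both GNU and BSD sed.

```shell
# Scratch file with invented content, removed at the end
tmp=$(mktemp)
printf '%s\n' 'foo bar foo' 'drop me' 'keep foo' > "$tmp"

# s without g: replace only the first "foo" on each line
first=$(sed 's/foo/baz/' "$tmp")

# s with g: replace every "foo" on each line
all=$(sed 's/foo/baz/g' "$tmp")

# d: delete the lines that match /drop/
deleted=$(sed '/drop/d' "$tmp")

# p with -n: print only the lines that match /keep/
printed=$(sed -n '/keep/p' "$tmp")

# -i.bak: rewrite the file itself and keep the original as "$tmp.bak"
sed -i.bak 's/foo/baz/g' "$tmp"
inplace=$(cat "$tmp")

rm -f "$tmp" "$tmp.bak"
printf '%s\n' "$first" "$printed"
```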
tr
tr is a Unix command used to translate, delete, or squeeze characters from input text.
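A short sketch of the three modes, one per command. The input strings and variable names are made up for illustration; tr reads only from stdin, so the text is piped in.

```shell
# Translate: map lowercase letters to uppercase
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Delete (-d): remove every digit
nodigits=$(printf 'a1b2c3\n' | tr -d '0-9')

# Squeeze (-s): collapse runs of repeated spaces into a single space
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

printf '%s\n' "$upper" "$nodigits" "$squeezed"
```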
printf
printf is a command in Unix/Linux used to format and print text to the terminal.
%c
Print the argument as a single character.
%d
Print the argument as a decimal (base 10) integer.
%e
Print the argument as a floating-point number in exponential notation.
%f
Print the argument as a floating-point number.
%i
Print the argument as a decimal (base 10) integer.
%o
Print the argument as an octal (base 8) number.
%s
Print the argument as a string of characters.
%u
Print the argument as an unsigned decimal integer.
%x
Print the argument as a hexadecimal (base 16) number.
%%
Print a percent sign.
%Wd
Print the integer argument at least W characters wide, padded on the left with spaces.
%(format)T
Outputs a date-time string resulting from using format as a format string for strftime. The corresponding argument can be the number of seconds since Epoch (January 1, 1970, 00:00), -1 (the current time), or -2 (shell startup time). Not specifying an argument uses the current time as the default value.
\%
Print a percent sign.
\n
Print a newline character.
\t
Print a tab character.
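A few of these specifiers and escapes, sketched with arbitrary values (255, 42, and 100 are chosen only for illustration, as are the variable names):

```shell
# The same integer rendered in different bases
dec=$(printf '%d' 255)   # decimal
hex=$(printf '%x' 255)   # hexadecimal
oct=$(printf '%o' 255)   # octal

# %5d pads the number with spaces to a minimum width of 5
padded=$(printf '[%5d]' 42)

# %% prints a literal percent sign; \t is a tab
line=$(printf '%d%%\tdone' 100)

printf '%s\n' "$dec" "$hex" "$oct" "$padded" "$line"
```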
When multiple specifiers are used, they are replaced by a corresponding argument:
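A minimal sketch of that pairing; the name and age values are hypothetical.

```shell
name="Alice"   # hypothetical values for illustration
age=30

# %s takes the first argument, %d the second
printf '%s is %d years old\n' "$name" "$age"

# Leftover arguments make printf reuse the whole format string
printf '%s=%d\n' a 1 b 2
```

The second command prints two lines, because the format string is applied once per pair of arguments.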
Assign output to a variable:
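One way to do this is command substitution, which works in any POSIX shell. Bash additionally offers printf -v, which writes into a variable without spawning a subshell. The sketch below guards the Bash-only form so that it also runs under other shells; the variable names are arbitrary.

```shell
# Portable: command substitution captures what printf writes to stdout
greeting=$(printf 'Hello, %s!' "world")

# Bash only: -v stores the formatted text in the named variable directly
if [ -n "${BASH_VERSION:-}" ]; then
  printf -v padded '%03d' 7
else
  padded=$(printf '%03d' 7)
fi

printf '%s\n' "$greeting" "$padded"
```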
tee
tee, named after the T-splitter used in plumbing, reads input from stdin, writes it to one or more files, and also copies it to stdout, effectively "splitting" the stream.
Print the string to stdout and save it into a file:
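A sketch using a scratch file created with mktemp; the strings and variable names are illustrative. The -a flag, which appends rather than overwrites, is shown as well.

```shell
# Scratch file, removed at the end
out=$(mktemp)

# tee copies stdin to the file and to stdout at the same time
shown=$(printf 'hello\n' | tee "$out")
saved=$(cat "$out")

# -a appends to the file instead of overwriting it
printf 'again\n' | tee -a "$out" > /dev/null
appended=$(cat "$out")

rm -f "$out"
printf '%s\n' "$shown" "$appended"
```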