Text Processing
grep
grep, short for Global Regular Expression Print, is a command-line utility used to search for patterns within text. It scans input line by line and prints lines that match a given regular expression, making it essential for text searching and filtering.
-i
case-insensitive search (case-sensitive by default)
-v
invert the match: print only the lines that do not match the pattern
-o
print only the matched pattern (line by default)
-q
quiet mode (no output, interested in the return status)
-E
extended regular expressions (ERE): operators such as |, +, ? and () work without backslashes
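The options above can be tried on a few lines of text. This is a minimal sketch; the sample text and the variable names (text, ci, inv, only, found) are made up for illustration.

```shell
# Hypothetical sample text, used only for illustration.
text='Error: disk full
warning: low memory
error: timeout'

# -i: match "error" regardless of case
ci=$(printf '%s\n' "$text" | grep -i 'error')

# -v: keep the lines that do NOT match
inv=$(printf '%s\n' "$text" | grep -v -i 'error')

# -o with -E: print only the matched text; | needs no backslash with -E
only=$(printf '%s\n' "$text" | grep -o -E 'disk|timeout')

# -q: no output; the exit status says whether anything matched
if printf '%s\n' "$text" | grep -q 'timeout'; then
  found=yes
else
  found=no
fi

printf '%s\n' "$ci" "$inv" "$only" "$found"
```

Because -q only sets the exit status, it fits naturally inside if conditions in scripts.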
Multiple patterns:
grep "<pattern1>\|<pattern2>" <file>
grep -e "<pattern1>" -e "<pattern2>" <file>
echo "<string>" | grep -q -E "<substring>"
Extract whatever comes after "Host:".
-o prints only the matched parts of the line and not the entire line.
-P activates Perl-compatible regular expressions (PCRE), which are needed here for \K. \K resets the starting point of the match, that is, only the characters after \K will be included in the output.
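To see the effect of -o and \K on concrete input, here is a small sketch. The sample lines are invented, not real scan output, and -P is assumed to be available (GNU grep built with PCRE support).

```shell
# Hypothetical sample input; the format is invented for illustration.
sample='Host: 10.0.0.5 (gw.example); Status: Up
Host: 10.0.0.9; Status: Up'

# -o keeps only the matched text; \K drops "Host: " from what is reported.
# [^\s;]* then matches up to the next whitespace character or semicolon.
hosts=$(printf '%s\n' "$sample" | grep -oP 'Host:\s*\K[^\s;]*')

printf '%s\n' "$hosts"
```

Each input line yields just the value that follows Host:, one per output line.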
grep -oP 'Host:\s*\K[^\s;]*' "${NMAP_FILE}"
awk
awk is named after its creators: Aho, Weinberger, and Kernighan. It's a powerful Unix tool for pattern scanning and processing — often used to extract and manipulate text from files or input streams.
By default, it treats spaces/tabs as delimiters. This can be changed using -F'<del>'.
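A short sketch of changing the delimiter with -F. The colon-separated records and the variable names are hypothetical.

```shell
# Hypothetical colon-separated records, for illustration only.
records='alice:x:1001:/home/alice
bob:x:1002:/home/bob'

# -F':' makes ":" the field separator, so $1 is the first column
# and $4 the fourth.
names=$(printf '%s\n' "$records" | awk -F':' '{print $1}')
pairs=$(printf '%s\n' "$records" | awk -F':' '{print $1, $4}')

printf '%s\n' "$names" "$pairs"
```

The comma in print $1, $4 inserts the output field separator, a single space by default.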
# Extract the first field
awk '{print $1}' <file>
# Extract multiple fields
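awk '{print $1,$4}' <file>
NF holds the number of fields in the current record, so $NF is the last field. NR holds the number of the current record (line), so it can be used to select lines by position; inside an END block it equals the total number of records.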
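A minimal sketch of how NF and NR behave, using made-up input. NF is the field count of the current line (so $NF is its last field), and NR is the running line number, which only equals the total once all input has been read.

```shell
# Hypothetical input with a different number of fields on each line.
data='a b c
d e
f g h i'

# Per line: the line number (NR), the field count (NF), the last field ($NF).
counts=$(printf '%s\n' "$data" | awk '{print NR, NF, $NF}')

# In an END block, NR holds the total number of records read.
total=$(printf '%s\n' "$data" | awk 'END {print NR}')

printf '%s\n' "$counts" "$total"
```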
# Skip the first 3 lines and print the last field of each remaining line
awk 'NR > 3 {print $NF}' <file>
Whatever is within {} is the awk action that is executed for each input line. printf is a built-in awk statement that formats and prints data. "%s " is the format string: the argument is printed as a string (%s) followed by a space. $0 represents the entire current input line:
# For each input line, print the entire line and add a space after it
awk '{printf "%s ", $0}'
We can add a newline character at the end using an END block. Actions within an END {...} block are executed after all input lines have been processed:
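As a further sketch of END, the block can also report something that is only known after the last line, such as a sum. The input values and variable names are hypothetical.

```shell
# Hypothetical numeric input, one value per line.
nums='3
4
5'

# The main action runs once per line; END runs once after the last line.
sum=$(printf '%s\n' "$nums" | awk '{total += $1} END {print total}')

# Join lines with commas and no trailing separator, then end with a newline.
joined=$(printf '%s\n' "$nums" | awk '{printf "%s%s", (NR > 1 ? "," : ""), $0} END {print ""}')

printf '%s\n' "$sum" "$joined"
```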
# Join all lines with spaces and finish the output with a single newline
awk '{printf "%s ", $0} END {print ""}'
sed
sed, which stands for Stream Editor, is a powerful tool for parsing and transforming text in a data stream. It reads input line by line, applies editing commands, and outputs the result, often used for non-interactive text processing and automation.
s
substitution (replace)
g
global: replace every match on a line, not just the first one
d
delete line
p
print line
-i
edit the file in place instead of printing the result to stdout
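The commands and flags above can be compared side by side on a small input. This is a sketch with invented text; -n (suppress automatic printing) is a standard sed option that pairs with p.

```shell
# Hypothetical sample text, used only for illustration.
text='foo bar foo
1 skip me
keep foo'

# s without g: only the first match on each line is replaced
first=$(printf '%s\n' "$text" | sed 's/foo/X/')

# s with g: every match on each line is replaced
all=$(printf '%s\n' "$text" | sed 's/foo/X/g')

# d: delete the lines selected by the address (here, lines starting with 1)
deleted=$(printf '%s\n' "$text" | sed '/^1/d')

# -n with p: print only the selected lines (here, lines 1 to 2)
range=$(printf '%s\n' "$text" | sed -n '1,2p')

printf '%s\n' "$first" "$all" "$deleted" "$range"
```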
# Substitution
sed 's/<to-be-replaced>/<to-replace-with>/g' <file>
# Remove white space with substitution
sed 's/ //g' <file>
# Remove the pattern and everything after it on each line
sed 's/<pattern>.*//' <file>
# Replace each run of whitespace with a . (\s and \+ are GNU sed extensions)
sed 's/\s\+/./g' <file>
# Delete the first line
sed '1d' <file>
# Delete the last line
sed '$d' <file>
# Delete multiple lines
sed '1,7d' <file>
# Delete every line that starts with '1', in place
sed -i '/^1/d' <file>
# Print specific line ranges
sed -n '1,3p' <file>
tr
tr is a Unix command used to translate, delete, or squeeze characters from input text.
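The three modes (translate, delete, squeeze) can be sketched as follows; the inputs and variable names are made up for illustration.

```shell
# Translate: every newline becomes a comma (including the last one).
csv=$(printf 'a\nb\nc\n' | tr '\n' ',')

# Delete (-d): remove every [ and ] character.
bare=$(printf '[1, 2, 3]' | tr -d '[]')

# Squeeze (-s): collapse each run of spaces into a single space.
tidy=$(printf 'a   b    c' | tr -s ' ')

printf '%s\n' "$csv" "$bare" "$tidy"
```

tr reads only from stdin, which is why it always appears at the end of a pipe or with input redirection.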
# Replace newlines with commas
cat <file> | tr '\n' ','
# Delete the characters [ and ]
cat <file> | tr -d '[]'
printf
printf is a command in Unix/Linux used to format and print text to the terminal.
%c
Treat the arguments as a single character.
%d
Treat the input as a decimal (integer) number (base 10).
%e
Treat the input as an exponential floating-point number.
%f
Treat the input as a floating-point number.
%i
Treat the input as an integer number (base 10).
%o
Treat the input as an octal number (base 8).
%s
Treat the input as a string of characters.
%u
Treat the input as an unsigned decimal (integer) number.
%x
Treat the input as a hexadecimal number (base 16).
%%
Print a percent sign.
%Wd
Print the integer argument in a field at least W characters wide (right-aligned, padded with spaces).
%(format)T
Outputs a date-time string resulting from using format as a format string for strftime. The corresponding argument can be the number of seconds since Epoch (January 1, 1970, 00:00), -1 (the current time), or -2 (shell startup time). Not specifying an argument uses the current time as the default value.
\%
Print a percent sign.
\n
Print a newline character.
\t
Print a tab character.
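A few of the specifiers above in action. This sketch sticks to specifiers that the common shell printf implementations support; the values and variable names are arbitrary.

```shell
# %5d: integer right-aligned in a field 5 characters wide
padded=$(printf '%5d' 42)

# %x and %o: the same kind of integer argument shown in base 16 and base 8
hex=$(printf '%x' 255)
oct=$(printf '%o' 8)

# %%: a literal percent sign after a number
pct=$(printf '%d%%' 50)

# %s with \t: two strings separated by a tab
row=$(printf '%s\t%s' name value)

printf '[%s] %s %s %s %s\n' "$padded" "$hex" "$oct" "$pct" "$row"
```

Wrapping the padded value in brackets makes the leading spaces visible in the output.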
# Syntax
printf "<specifier(s)>" "string"
# New line
printf "%s\n" "str1" "str2" "str3"
str1
str2
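str3
When multiple specifiers are used, each one is replaced by the corresponding argument. If there are more arguments than specifiers, the format string is reused for the remaining arguments: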
printf "%s\n%s\t%s\n" "str1" "str2" "str3" "str4"
str1
str2 str3
str4
Assign output to a variable (-v is specific to the Bash builtin):
printf -v var "%s\n" hello
echo $var
hello
tee
tee, named after the T-splitter used in plumbing, reads input from stdin, writes it to a file, and also outputs it to stdout, effectively "splitting" the stream.
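The split can be observed by capturing both sides. This is a minimal sketch that writes to a temporary directory; the file name and variable names are hypothetical.

```shell
# Work in a throwaway directory so no real file is touched.
tmpdir=$(mktemp -d)
log="$tmpdir/out.txt"

# tee writes stdin to the file AND passes it on to stdout,
# so the same text ends up in $shown and in the file.
shown=$(echo "first" | tee "$log")

# -a appends to the file instead of overwriting it.
echo "second" | tee -a "$log" >/dev/null

contents=$(cat "$log")
rm -rf "$tmpdir"

printf '%s\n' "$shown" "$contents"
```

Redirecting tee's stdout to /dev/null keeps only the file side, which is handy when the copy on the terminal is not wanted.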
Print the string to stdout and save it into a file:
echo "hello" | tee file.txt
# Print file contents
$ cat test.txt
test
# Display & append string to the file
$ echo "test1" | tee -a test.txt
test1
$ cat test.txt
test
test1