Text Processing

grep

grep, short for Global Regular Expression Print, is a command-line utility used to search for patterns within text. It scans input line by line and prints lines that match a given regular expression, making it essential for text searching and filtering.

Option  Function
-i      case-insensitive search (case-sensitive by default)
-v      invert the match: print only the lines that do not match the pattern
-o      print only the matched part of each line (the whole line by default)
-q      quiet mode: no output, only the exit status (0 if a match was found)
-E      extended regular expressions (ERE) instead of the default basic ones (BRE)
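
A quick way to see these options side by side is to run them against a small throwaway file. The sketch below assumes a hypothetical sample file created with mktemp; its contents are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file.
sample="$(mktemp)"
printf '%s\n' 'Error: disk full' 'warning: low memory' \
    'error: timeout after 30s' 'info: ok' > "$sample"

# -i: match "error" regardless of case.
insensitive="$(grep -i 'error' "$sample")"

# -v (combined with -i): keep only the lines that do NOT contain "error".
inverted="$(grep -iv 'error' "$sample")"

# -o with -E: print only the matched digits, not the whole line.
digits="$(grep -oE '[0-9]+' "$sample")"

printf '%s\n' "$insensitive" '---' "$inverted" '---' "$digits"
rm -f "$sample"
```

Each result is captured in a variable only so that it can be reused; running the grep commands directly prints the same lines.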

Multiple patterns:

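# \| is alternation in basic regex (a GNU extension); with -E, write a plain |
grep "<pattern1>\|<pattern2>" <file>
grep -e "<pattern1>" -e "<pattern2>" <file>

Test whether a string contains a pattern, using only the exit status:

echo "<string>" | grep -q -E "<substring>"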
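
Because -q produces no output, it is typically used inside a condition. A minimal sketch, assuming a made-up banner string and hypothetical variable names:

```shell
# Hypothetical input string; in practice this might come from a scan result.
banner="Server: nginx/1.24.0"

# grep -q prints nothing; its exit status is 0 on a match and 1 otherwise,
# so it can drive an if statement directly.
if printf '%s\n' "$banner" | grep -q -E 'nginx|apache'; then
    result="known web server"
else
    result="unknown"
fi
echo "$result"
```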

Extract whatever comes after Host: on each line.

-o prints only the matched part of the line rather than the entire line.
-P enables Perl-compatible regular expressions (PCRE), which are needed here for \K. \K resets the starting point of the match, so only the characters matched after \K are included in the output.

grep -oP 'Host:\s*\K[^\s;]*' "${NMAP_FILE}"
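
The -P option is a GNU grep extension and may be missing elsewhere (BSD or BusyBox grep, for example). Where it is unavailable, roughly the same extraction can be sketched with sed and a capture group. The input line below is made up in the style of Nmap's grepable output, and the sketch assumes at most one Host: entry per line.

```shell
# Hypothetical line in the style of Nmap grepable output.
line='Host: 192.168.1.10 () Status: Up'

# Capture the run of characters after "Host:" up to the next whitespace or ';'
# and print only that capture (-n suppresses lines without a match).
host="$(printf '%s\n' "$line" |
    sed -n 's/.*Host:[[:space:]]*\([^[:space:];]*\).*/\1/p')"
echo "$host"
```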

awk

awk is named after its creators: Aho, Weinberger, and Kernighan. It's a powerful Unix tool for pattern scanning and processing — often used to extract and manipulate text from files or input streams.

By default, awk splits each line into fields on runs of spaces and tabs. The delimiter can be changed with -F'<delimiter>'.
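
As an illustration of -F, the sketch below splits a made-up passwd-style record on colons. The record and the variable names are hypothetical.

```shell
# Hypothetical passwd-style record: fields are separated by colons.
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# -F: sets the field separator, so $1 is the login name and $NF the shell.
fields="$(printf '%s\n' "$record" | awk -F: '{print $1, $NF}')"
echo "$fields"
```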

# Extract the first field
awk '{print $1}' <file>

# Extract multiple fields
awk '{print $1,$4}' <file>

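NF holds the number of fields in the current record, so $NF is the last field. NR holds the number of records read so far, that is, the current line number; inside an END block it equals the total number of records:

# Skip the first 3 lines, then print the last field of each remaining line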
awk 'NR > 3 {print $NF}' <file>
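
To make the difference between NR, NF, and $NF concrete, here is a small sketch on three made-up records with a varying number of fields:

```shell
# For each record print its number (NR), its field count (NF), and its last
# field ($NF); in the END block NR holds the total number of records.
report="$(printf '%s\n' 'a b c' 'd e' 'f g h i' |
    awk '{ print NR ": " NF " fields, last = " $NF }
         END { print "total records: " NR }')"
echo "$report"
```

This prints one summary line per record (for example "2: 2 fields, last = e") followed by "total records: 3".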

Whatever is within {} is the action awk runs for each input line. printf is a built-in awk statement that formats and prints data without appending a newline. The format "%s " prints a string (%s) followed by a space, and $0 is the entire current input line:

# For each input line, print the entire line and add a space after it
awk '{printf "%s ", $0}'

Because printf does not emit a newline here, the joined output ends without one. An END block fixes that: its actions run once, after all input lines have been processed:

# Join all lines with spaces, then finish with a single newline
awk '{printf "%s ", $0} END {print ""}'
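
Note that this approach leaves a trailing space after the last line. One possible variant, sketched here on made-up input, prints the separator before every line except the first:

```shell
# Original approach: every line is followed by a space, including the last.
joined="$(printf '%s\n' one two three | awk '{printf "%s ", $0} END {print ""}')"
echo "[$joined]"

# Variant: emit the space before each line except the first (NR > 1),
# so no trailing separator is left over.
trimmed="$(printf '%s\n' one two three |
    awk '{printf "%s%s", (NR > 1 ? " " : ""), $0} END {print ""}')"
echo "[$trimmed]"
```

The brackets in the echo lines are only there to make the trailing space visible.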

sed

sed, which stands for Stream Editor, is a powerful tool for parsing and transforming text in a data stream. It reads input line by line, applies editing commands, and writes the result to stdout, which makes it well suited to non-interactive text processing and automation.

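Option  Description
s       substitute (replace) the text that matches a pattern
g       flag for s: replace every match on a line, not only the first
d       delete the line
p       print the line (usually combined with -n)
-i      edit the file in place instead of writing to stdout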

# Substitution
sed 's/<to-be-replaced>/<to-replace-with>/g' <file>
 
# Remove white space with substitution
sed 's/ //g' <file>

# Remove the pattern and everything after it on each line
sed 's/<pattern>.*//' <file>

# Delete the first line
sed '1d' <file>

# Delete the last line
sed '$d' <file>

# Delete multiple lines
sed '1,7d' <file>

# Delete every line that starts with '1', editing the file in place
sed -i '/^1/d' <file>

# Print specific line ranges
sed -n '1,3p' <file>
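
The commands above can be tried on a small throwaway file. The sketch below uses made-up sample lines and hypothetical variable names, and avoids -i so that nothing is modified on disk.

```shell
# Hypothetical sample data in a temporary file.
sample="$(mktemp)"
printf '%s\n' '1 apple pie' '2 banana split # note' '10 cherry' > "$sample"

# Without g only the first "a" on the line changes; with g every "a" does.
first_only="$(sed -n '2p' "$sample" | sed 's/a/A/')"
all="$(sed -n '2p' "$sample" | sed 's/a/A/g')"

# Remove " #" and everything after it.
no_comment="$(sed -n '2p' "$sample" | sed 's/ #.*//')"

# /^1/d deletes whole lines that begin with "1" (here the first and third).
kept="$(sed '/^1/d' "$sample")"

printf '%s\n' "$first_only" "$all" "$no_comment" "$kept"
rm -f "$sample"
```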

tr

tr is a Unix command used to translate, delete, or squeeze characters from input text.

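# Replace every newline with a comma (tr reads only from stdin)
cat <file> | tr '\n' ','

# Delete every '[' and ']' character
cat <file> | tr -d '[]'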
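
The three modes (translate, delete, squeeze) can be sketched on short made-up strings as follows; the variable names are illustrative only.

```shell
# Squeeze: -s collapses each run of spaces into a single space.
squeezed="$(printf 'a   b      c\n' | tr -s ' ')"

# Translate: character classes map lower case to upper case.
upper="$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')"

# Delete: -d removes every listed character, here the square brackets.
stripped="$(printf '[80,443]\n' | tr -d '[]')"

printf '%s\n' "$squeezed" "$upper" "$stripped"
```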

printf

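printf formats its arguments according to a format string and writes the result to stdout. Some features in this section, namely the -v option and the %(format)T specifier, are specific to the Bash builtin.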

Format Specifier  Description
%c                Treat the argument as a single character.
%d                Treat the argument as a decimal (integer) number (base 10).
%e                Treat the argument as a floating-point number in exponential notation.
%f                Treat the argument as a floating-point number.
%i                Treat the argument as an integer number (base 10).
%o                Treat the argument as an octal number (base 8).
%s                Treat the argument as a string of characters.
%u                Treat the argument as an unsigned decimal (integer) number.
%x                Treat the argument as a hexadecimal number (base 16).
%%                Print a percent sign.
%Wd               Print the integer in a field at least W characters wide.
%(format)T        Output a date-time string produced by using format as a format string for strftime. The corresponding argument can be the number of seconds since the Epoch (January 1, 1970, 00:00), -1 (the current time), or -2 (shell startup time). If no argument is given, the current time is used.
\%                Print a percent sign.
\n                Print a newline character.
\t                Print a tab character.

# Syntax
printf "<specifier(s)>" "string"
 
# New line
printf "%s\n" "str1" "str2" "str3"
str1
str2
str3

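When multiple specifiers are used, each one is replaced by the corresponding argument, in order. If there are more arguments than specifiers, the format string is reused until all arguments are consumed: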

printf "%s\n%s\t%s\n" "str1" "str2" "str3" "str4"
str1
str2    str3
str4

Assign the output to a variable instead of printing it:

printf -v var "%s" "hello"
echo "$var"
hello
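
Field widths and padding from the table above can be combined in a single format string. A small sketch with made-up values:

```shell
# %-10s left-aligns a string in 10 columns, %5d right-aligns an integer in 5,
# %05d pads with zeros instead of spaces, and %x prints hexadecimal.
row="$(printf '%-10s|%5d|%05d|%x' 'ssh' 22 22 255)"
echo "$row"
```

The | characters are literal text in the format string, used here only to make the column boundaries visible.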

tee

tee, named after the T-splitter used in plumbing, reads from stdin and writes the data both to stdout and to one or more files, effectively "splitting" the stream.

Print the string to stdout and save it into a file:

echo "hello" | tee file.txt

# Print file contents
$ cat test.txt
test

# Print the string and append it to the file (-a appends instead of overwriting)
$ echo "test1" | tee -a test.txt
test1

$ cat test.txt
test
test1
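
Since tee passes its input through to stdout, it can also sit in the middle of a pipeline to keep a copy of the intermediate data. A minimal sketch, assuming a hypothetical temporary log file:

```shell
# Hypothetical log file for the saved copy.
log="$(mktemp)"

# tee writes the full stream to the file; the copy on stdout continues
# down the pipeline, where wc counts the lines.
count="$(printf '%s\n' alpha beta gamma | tee "$log" | wc -l | tr -d ' ')"
saved="$(cat "$log")"

echo "lines: $count"
echo "$saved"
rm -f "$log"
```

The trailing tr -d ' ' only strips the padding that some wc implementations add around the number.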
