Text Processing

grep

grep, short for Global Regular Expression Print, is a command-line utility used to search for patterns within text. It scans input line by line and prints lines that match a given regular expression, making it essential for text searching and filtering.

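| Option | Function |
| --- | --- |
| -i | case-insensitive search (case-sensitive by default) |
| -v | invert the match: print only the lines that do not match the pattern |
| -o | print only the matched part of each line (the whole line by default) |
| -q | quiet mode: no output, only the exit status (0 if a match was found) |
| -E | extended regular expressions (ERE) instead of the default basic ones (BRE) |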

Multiple patterns:

# Alternation with \| in a basic regex (a GNU extension)
grep "<pattern1>\|<pattern2>" <file>
# One -e per pattern (portable)
grep -e "<pattern1>" -e "<pattern2>" <file>
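
The options above can be tried on a small throwaway file. The following is a minimal sketch, not a definitive recipe: the file contents and the variable names (sample, grep_i, grep_v and so on) are made up for illustration.

```shell
# Build a small sample file to search (contents are illustrative).
sample=$(mktemp)
printf '%s\n' 'Error: disk full' 'warning: low memory' 'error: timeout' 'info: ok' > "$sample"

# -i: match "error" regardless of case
grep_i=$(grep -i 'error' "$sample")

# -v: keep only the lines that do NOT match
grep_v=$(grep -v -i 'error' "$sample")

# -o: print only the matched text instead of the whole line
grep_o=$(grep -o 'low' "$sample")

# -E: extended regex, so | means alternation without a backslash
grep_E=$(grep -E '^(warning|info):' "$sample")

# -q: print nothing and branch on the exit status instead
if grep -q 'timeout' "$sample"; then
  grep_q='found'
else
  grep_q='missing'
fi

printf '%s\n' "$grep_i" "$grep_v" "$grep_o" "$grep_E" "$grep_q"
rm -f "$sample"
```

Because -q produces no output, it is mostly useful inside if statements and && / || chains in scripts.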

awk

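awk is named after its creators: Aho, Weinberger, and Kernighan. It is a powerful Unix tool for pattern scanning and processing, often used to extract and manipulate text from files or input streams. By default it splits each input line into whitespace-separated fields, which are referenced as $1, $2, and so on.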

# Extract the first field
awk '{print $1}' <file>

# Extract multiple fields
awk '{print $1,$4}' <file>
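
To see field extraction on concrete input, here is a small sketch. The sample data and variable names are invented for illustration. It also uses a pattern before the action and the -F option, two standard awk features that this page does not otherwise cover.

```shell
# Sample input: whitespace-separated fields (illustrative data).
log='alice 200 GET /index.html
bob 404 GET /missing
carol 200 POST /login'

# $1 is the first field and $4 the fourth; the comma inserts the
# output field separator, which is a single space by default.
awk_fields=$(printf '%s\n' "$log" | awk '{print $1, $4}')

# A pattern before the action restricts it to matching lines:
# here, only lines whose second field equals 200.
awk_ok=$(printf '%s\n' "$log" | awk '$2 == 200 {print $1}')

# -F sets the input field separator, e.g. ":" for colon-delimited data.
awk_colon=$(printf 'root:x:0:0\n' | awk -F: '{print $1, $3}')

printf '%s\n' "$awk_fields" "$awk_ok" "$awk_colon"
```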

sed

sed, which stands for Stream Editor, is a powerful tool for parsing and transforming text in a data stream. It reads input line by line, applies editing commands, and outputs the result, often used for non-interactive text processing and automation.

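| Command / option | Description |
| --- | --- |
| s | substitute (replace) the text matched by a pattern |
| g | flag for s: replace every match on a line, not only the first |
| d | delete the line |
| p | print the line |
| -i | edit the file in place instead of writing the result to stdout |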

# Substitution
sed 's/<to-be-replaced>/<to-replace-with>/g' <file>
 
# Remove all spaces with substitution
sed 's/ //g' <file>

# Remove the pattern and everything after it on each line
sed 's/<pattern>.*//' <file>
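
The difference between s with and without the g flag, and the effect of d and p, is easiest to see on a few lines of input. This is a sketch with made-up sample text and variable names. It relies on the standard -n option (suppress automatic printing), which the table above does not list, and it leaves out -i because its syntax differs between GNU and BSD sed.

```shell
text='foo bar foo
keep this line
drop this line'

# s without g: only the first match on each line is replaced
sed_first=$(printf '%s\n' "$text" | sed 's/foo/X/')

# s with g: every match on each line is replaced
sed_all=$(printf '%s\n' "$text" | sed 's/foo/X/g')

# d: delete the lines selected by the /drop/ address
sed_del=$(printf '%s\n' "$text" | sed '/drop/d')

# -n turns off automatic printing, so p prints only the selected lines
sed_print=$(printf '%s\n' "$text" | sed -n '/keep/p')

# Remove a pattern and everything after it (here: a trailing comment)
sed_trunc=$(printf 'key=value # comment\n' | sed 's/ #.*//')

printf '%s\n' "$sed_first" "$sed_all" "$sed_del" "$sed_print" "$sed_trunc"
```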

tr

tr is a Unix command used to translate, delete, or squeeze characters from input text.

# Replace every newline with a comma
cat <file> | tr '\n' ','
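
The three modes mentioned above (translate, delete, squeeze) can be sketched as follows. The input strings and variable names are illustrative only; -d and -s are the standard tr options for deleting and squeezing.

```shell
# Translate: map each lowercase letter to its uppercase counterpart
tr_upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# Delete (-d): remove every digit
tr_delete=$(printf 'a1b2c3\n' | tr -d '0-9')

# Squeeze (-s): collapse each run of spaces into a single space
tr_squeeze=$(printf 'a   b    c\n' | tr -s ' ')

# Newlines to commas; note that the final newline also becomes a comma
tr_join=$(printf '%s\n' one two three | tr '\n' ',')

printf '%s\n' "$tr_upper" "$tr_delete" "$tr_squeeze" "$tr_join"
```

Since tr only reads standard input, a file is passed with a pipe or with input redirection.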

printf

printf is a Unix/Linux command that formats its arguments according to a format string and writes the result to standard output.

| Format specifier or escape | Description |
| --- | --- |
| %c | Print the first character of the argument. |
| %d | Treat the argument as a signed decimal integer (base 10). |
| %e | Print the argument as a floating-point number in exponential notation. |
| %f | Treat the argument as a floating-point number. |
| %i | Treat the argument as a signed decimal integer (base 10); same as %d. |
| %o | Print the argument as an octal number (base 8). |
| %s | Treat the argument as a string of characters. |
| %u | Treat the argument as an unsigned decimal integer. |
| %x | Print the argument as a hexadecimal number (base 16). |
| %% | Print a literal percent sign. |
| %Wd | Print the integer in a field at least W characters wide, where W is a number (for example, %5d). |
| %(format)T | Supported by Bash (not POSIX). Outputs a date-time string resulting from using format as a format string for strftime. The corresponding argument can be the number of seconds since the Epoch (January 1, 1970, 00:00), -1 (the current time), or -2 (shell startup time). If no argument is given, the current time is used. |
| \% | Not a portable escape sequence; use %% to print a percent sign. |
| \n | Print a newline character. |
| \t | Print a tab character. |

# Syntax
printf "<format>" "<argument>" ...

# New line after each argument (the format is reused until all arguments are consumed)
printf "%s\n" "str1" "str2" "str3"
str1
str2
str3
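
A few of the specifiers from the table, sketched with made-up values and variable names. Only POSIX specifiers are used, so %(format)T and the floating-point conversions (whose output depends on the locale) are left out.

```shell
# %5d: right-align the integer in a field 5 characters wide
p_width=$(printf '[%5d]' 42)

# %x and %o: print the same kind of integer in base 16 and base 8
p_hex=$(printf '%x' 255)
p_oct=$(printf '%o' 8)

# %%: literal percent sign after a number
p_pct=$(printf '%d%%' 50)

# %c: only the first character of the argument is printed
p_char=$(printf '%c' 'hello')

# \t in the format: tab between two strings
p_row=$(printf '%s\t%s' name value)

printf '%s\n' "$p_width" "$p_hex" "$p_oct" "$p_pct" "$p_char" "$p_row"
```

Unlike echo, printf adds no trailing newline on its own, so \n has to be written explicitly in the format wherever a line break is wanted.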

tee

tee, named after the T-splitter used in plumbing, reads input from stdin, writes it to one or more files, and also copies it to stdout, effectively "splitting" the stream.

Print the string to stdout and save it into a file:

echo "hello" | tee file.txt
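
Because tee also passes its input through to stdout, it can sit in the middle of a pipeline. The sketch below uses a temporary file and invented variable names. The -a option (append instead of overwrite) is a standard tee flag not mentioned above.

```shell
out_file=$(mktemp)

# tee saves a copy to the file; the next command still receives the text
tee_stdout=$(echo "hello" | tee "$out_file" | tr '[:lower:]' '[:upper:]')
tee_saved=$(cat "$out_file")

# -a appends to the file instead of overwriting it
echo "world" | tee -a "$out_file" > /dev/null
tee_appended=$(cat "$out_file")

printf '%s\n' "$tee_stdout" "$tee_saved" "$tee_appended"
rm -f "$out_file"
```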
