Text Processing
grep
grep, short for Global Regular Expression Print, is a command-line utility that searches text for patterns. It scans input line by line and prints the lines that match a given regular expression, which makes it essential for searching and filtering text.
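-i    case-insensitive search (matching is case-sensitive by default)
-v    invert the match: print only the lines that do not match the pattern
-o    print only the matched text (the whole line is printed by default)
-q    quiet mode: print nothing and report the result through the exit status
-E    extended regular expressions (ERE), where +, ?, |, and () work without backslashes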
Multiple patterns:
grep "<pattern1>\|<pattern2>" <file>
grep -e "<pattern1>" -e "<pattern2>" <file>
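The options above can be tried on a small sample. The following is a minimal sketch that assumes a hypothetical four-line log written to a temporary file; the file contents and variable names are invented for illustration.

```shell
# Sample log lines (hypothetical contents, for illustration only).
log=$(mktemp)
printf '%s\n' 'Error: disk full' 'warning: low memory' \
    'error: timeout after 30s' 'info: ok' > "$log"

# -i: match "error" regardless of case.
errors=$(grep -i 'error' "$log")

# -v: keep only the lines that do NOT match.
others=$(grep -i -v 'error' "$log")

# -o with -E: print just the matched text, here a run of digits.
digits=$(grep -o -E '[0-9]+' "$log")

# -q: no output; branch on the exit status instead.
if grep -q 'warning' "$log"; then
    has_warning=yes
else
    has_warning=no
fi

# Alternation with -E, equivalent to the two-pattern -e form.
either=$(grep -E 'warning|info' "$log")

printf '%s\n' "$errors" "$others" "$digits" "$has_warning" "$either"
rm -f "$log"
```

Because -q reports through the exit status only, it is usually placed directly in an if or && chain rather than captured.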
awk
awk is named after its creators: Aho, Weinberger, and Kernighan. It is a powerful Unix tool for pattern scanning and processing, often used to extract and manipulate text from files or input streams.
By default, awk splits each line into fields on runs of spaces and tabs. The delimiter can be changed with -F'<del>'.
# Extract the first field
awk '{print $1}' <file>
# Extract multiple fields
awk '{print $1,$4}' <file>
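To show the -F option in practice, here is a short sketch that assumes two hypothetical colon-separated records in the style of /etc/passwd. The record contents and variable names are made up for the example.

```shell
# Hypothetical colon-separated records, for illustration only.
records='alice:x:1001:1001:/home/alice
bob:x:1002:1002:/home/bob'

# -F: sets the field separator to a colon; $1 is the name, $3 the numeric ID.
# The comma in print joins the fields with a single space.
pairs=$(printf '%s\n' "$records" | awk -F: '{print $1, $3}')

# A condition before the action limits it to matching lines.
# NF is the number of fields, so $NF is the last field.
last=$(printf '%s\n' "$records" | awk -F: '$3 > 1001 {print $NF}')

printf '%s\n' "$pairs" "$last"
```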
sed
sed, which stands for Stream Editor, is a powerful tool for parsing and transforming text in a data stream. It reads input line by line, applies editing commands, and outputs the result. It is often used for non-interactive text processing and automation.
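s     substitute (replace) the text matched by a pattern
g     flag for s: replace every match on a line, not only the first
d     delete the line
p     print the line
-i    edit the file in place instead of writing the result to stdout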
# Substitution
sed 's/<to-be-replaced>/<to-replace-with>/g' <file>
# Remove all spaces with substitution
sed 's/ //g' <file>
# Remove everything from the first match of <pattern> to the end of the line
sed 's/<pattern>.*//' <file>
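The d and p commands and the effect of the g flag are easier to see side by side. The sketch below assumes a hypothetical three-line config file in a temporary location. It leaves out -i because the option's syntax differs between GNU and BSD sed.

```shell
# Hypothetical config file used only for this illustration.
conf=$(mktemp)
printf '%s\n' '# comment' 'host = localhost   # default' 'port = 8080' > "$conf"

# d: delete lines that start with '#'.
no_comments=$(sed '/^#/d' "$conf")

# -n with p: suppress automatic printing and print only matching lines.
port_line=$(sed -n '/^port/p' "$conf")

# s without g replaces only the first match on each line; with g, every match.
first_only=$(echo 'a-b-c' | sed 's/-/_/')
all=$(echo 'a-b-c' | sed 's/-/_/g')

# Strip a trailing comment (optional spaces, '#', and the rest of the line),
# then print only the line that starts with "host".
host_line=$(sed -n 's/ *#.*//; /^host/p' "$conf")

printf '%s\n' "$no_comments" "$port_line" "$first_only" "$all" "$host_line"
rm -f "$conf"
```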
tr
tr is a Unix command used to translate, delete, or squeeze characters from input text.
# Replace every newline with a comma
cat <file> | tr '\n' ','
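The three modes mentioned above (translate, delete, squeeze) can be sketched as follows. The input strings and variable names are arbitrary examples.

```shell
# Translate: map lowercase letters to uppercase.
upper=$(echo 'hello world' | tr '[:lower:]' '[:upper:]')

# Delete (-d): strip every digit.
no_digits=$(echo 'a1b22c333' | tr -d '0-9')

# Squeeze (-s): collapse each run of repeated spaces into one.
squeezed=$(echo 'too    many   spaces' | tr -s ' ')

# tr reads only from stdin, so input arrives through a pipe or a redirection.
# Every newline becomes a comma, including the final one.
joined=$(printf 'x\ny\nz\n' | tr '\n' ',')

printf '%s\n' "$upper" "$no_digits" "$squeezed" "$joined"
```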
printf
printf is a Unix/Linux command used to format and print text to the terminal.
%c    Print the first character of the argument.
%d    Print the argument as a signed decimal integer (base 10).
%e    Print the argument as a floating-point number in exponential notation.
%f    Print the argument as a floating-point number.
%i    Print the argument as a signed decimal integer (base 10); same as %d.
%o    Print the argument as an octal number (base 8).
%s    Print the argument as a string of characters.
%u    Print the argument as an unsigned decimal integer.
%x    Print the argument as a hexadecimal number (base 16).
%%    Print a percent sign.
%Wd   Print the integer argument at least W characters wide, padded with spaces on the left (for example, %5d).
%(format)T    Output a date-time string produced by using format as a format string for strftime (an extension of Bash's built-in printf, not part of POSIX printf). The corresponding argument can be the number of seconds since the Epoch (January 1, 1970, 00:00), -1 (the current time), or -2 (shell startup time). If no argument is given, the current time is used.
\%    Not one of the escape sequences POSIX defines for printf; use %% to print a percent sign.
\n    Print a newline character.
\t    Print a tab character.
# Syntax
printf "<format>" <argument(s)>
# Print each argument on its own line (the format is reused for every argument)
printf "%s\n" "str1" "str2" "str3"
# Output:
str1
str2
str3
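A few of the specifiers can be combined with width and precision modifiers. The following sketch uses arbitrary sample values and assumes a locale whose decimal separator is a period.

```shell
# %5d pads an integer to a minimum width of 5 (right-aligned); %-5d left-aligns.
padded=$(printf '[%5d][%-5d]' 42 42)

# %05d pads with zeros instead of spaces.
zeros=$(printf '%05d' 42)

# %x and %o print integers in hexadecimal and octal.
bases=$(printf '%x %o' 255 8)

# %.2f rounds to two decimal places; %% prints a literal percent sign.
percent=$(printf '%.2f%%' 12.3456)

# The format string is reused until all arguments are consumed.
table=$(printf '%-6s|%3d\n' apple 3 kiwi 12)

printf '%s\n' "$padded" "$zeros" "$bases" "$percent" "$table"
```

Single quotes around the format keep the shell from interpreting backslashes or dollar signs before printf sees them.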
tee
tee, named after the T-splitter used in plumbing, reads input from stdin, writes it to a file, and also outputs it to stdout, effectively "splitting" the stream.
Print the string to stdout and save it into a file:
echo "hello" | tee file.txt
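Because tee passes its input through, it can sit in the middle of a pipeline. The sketch below assumes a throwaway directory and a hypothetical file name, all.txt, and also shows the -a option for appending.

```shell
# Work in a throwaway directory (the file name is hypothetical).
dir=$(mktemp -d)

# tee writes stdin to the file and passes it on, so the pipeline can continue.
count=$(printf '%s\n' one two three | tee "$dir/all.txt" | wc -l)

# -a appends to the file instead of overwriting it.
echo 'four' | tee -a "$dir/all.txt" > /dev/null

saved=$(cat "$dir/all.txt")
printf '%s\n' "$count" "$saved"
rm -rf "$dir"
```

Redirecting tee's stdout to /dev/null keeps the file write while discarding the copy that would otherwise reach the terminal.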