Shell notes¶
Tutorial reference
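shell variables¶
- When defining a variable, write the variable name without a dollar sign.
- There must be no spaces between the variable name and the equals sign.
- The braces around a variable name are optional. They help the interpreter identify where the variable name ends, as in the ${skill} example in the for loop below.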
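A minimal sketch of why the braces matter, reusing the skill variable from the for loop example. The names with_braces and without_braces are only illustrative.

```shell
skill="Java"

# With braces, the variable name ends at the closing brace.
with_braces="I am good at ${skill}Script"

# Without braces, the shell looks up a variable named skillScript,
# which is unset here, so it expands to an empty string.
without_braces="I am good at $skillScript"

echo "$with_braces"       # I am good at JavaScript
echo "[$without_braces]"  # [I am good at ]
```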
iterators in for loop¶
- separated by space
for skill in Ada Coffe Action Java; do
echo "I am good at ${skill}Script"
done
- {start..stop..step}, where the last number is the increment, not a length
for i in {5..50..5}; do
echo $i
done
- construct an array
brace expansion can also be used to construct an array,
arr=({1..10..2})
echo ${arr[@]}
for i in ${arr[@]}; do
echo $i
done
- seq
alternatively, we can use seq,
for i in $(seq 5 5 50); do
echo $i
done
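shell strings¶
- Every character inside single quotes is output as-is; variables are not expanded in a single-quoted string.
- A single-quoted string cannot contain a single quote, not even one escaped with a backslash.
- Double quotes may contain variables.
- Double quotes may contain escape sequences.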
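The quoting rules above can be sketched as follows. The variable names are illustrative, and the 'it'\''s' idiom is a common workaround rather than something taken from the tutorial.

```shell
name="world"

single='hello $name'    # no expansion inside single quotes
double="hello $name"    # variables are expanded inside double quotes
quoted="say \"hi\""     # escaped double quotes inside double quotes

# A single quote cannot appear inside single quotes, so close the string,
# add an escaped quote, and reopen it.
with_quote='it'\''s'

printf '%s\n' "$single"      # hello $name
printf '%s\n' "$double"      # hello world
printf '%s\n' "$quoted"      # say "hi"
printf '%s\n' "$with_quote"  # it's
```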
shell arrays¶
- In the shell, an array is written with parentheses, and its elements are separated by spaces.
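sed usage¶
Reference
- print a specific line, e.g. line 10: sed '10!d' file.txt, see Get specific line from text file using just shell script
- print a range of lines: sed -n '10,20p' file.txt; line 10 alone can then also be printed with sed -n '10p' file.txt. With a semicolon ; the selection is not a continuous range but only the listed lines, see sed之打印特定行与连续行
  - first line through last line: sed -n '1,$p'
  - first line and last line: sed -n '1p;$p', not sed -n '1;$p'
- delete the last line: sed -i '$ d' file.txt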
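A small sketch that tries these commands on a throwaway file of 30 numbered lines. The temp file and variable names are illustrative, and the delete example omits -i so the file is left untouched.

```shell
tmp=$(mktemp)
seq 1 30 > "$tmp"

line10=$(sed -n '10p' "$tmp")                         # print only line 10
also10=$(sed '10!d' "$tmp")                           # delete everything except line 10
range=$(sed -n '10,12p' "$tmp" | paste -sd ' ' -)     # continuous range
first_last=$(sed -n '1p;$p' "$tmp" | paste -sd ' ' -) # only the listed lines
new_last=$(sed '$ d' "$tmp" | tail -n 1)              # last line after dropping line 30

rm -f "$tmp"
echo "$line10 | $also10 | $range | $first_last | $new_last"
# 10 | 10 | 10 11 12 | 1 30 | 29
```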
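The role of |¶
The vertical bar (|) metacharacter is part of the extended set of metacharacters and specifies alternation between regular expressions: a line matches the pattern if it matches any one of the alternatives.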
The role of -r¶
It enables extended regular expressions.
See Extended regexps - sed, a stream editor.
An excerpt:
The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.
In other words, in basic mode these characters must be escaped to act as special characters, whereas in extended mode it is the other way round: escaping them yields the literal character.
For example:
- abc? becomes abc\? when using extended regular expressions. It matches the literal string ‘abc?’.
- c\+ becomes c+ when using extended regular expressions. It matches one or more ‘c’s.
- a\{3,\} becomes a{3,} when using extended regular expressions. It matches three or more ‘a’s.
- \(abc\)\{2,3\} becomes (abc){2,3} when using extended regular expressions. It matches either abcabc or abcabcabc.
- \(abc*\)\1 becomes (abc*)\1 when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions.
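The pairs above can be checked side by side. This sketch assumes GNU sed, since \+ in basic mode and the -r flag are GNU features; the variable names are illustrative.

```shell
# Each pair runs the same substitution in basic and in extended (-r) syntax.
bre_plus=$(printf 'accc\n' | sed 's/c\+/X/')
ere_plus=$(printf 'accc\n' | sed -r 's/c+/X/')

bre_brace=$(printf 'aaaa\n' | sed 's/a\{3,\}/X/')
ere_brace=$(printf 'aaaa\n' | sed -r 's/a{3,}/X/')

# A literal question mark: plain in basic mode, escaped in extended mode.
bre_literal=$(printf 'abc?\n' | sed 's/abc?/X/')
ere_literal=$(printf 'abc?\n' | sed -r 's/abc\?/X/')

# Groups lose their backslashes in extended mode, backreferences keep them.
ere_group=$(printf 'abcabc\n' | sed -r 's/(abc){2,3}/X/')
ere_backref=$(printf 'abab\n' | sed -r 's/(ab)\1/X/')

echo "$bre_plus $ere_plus $bre_brace $ere_brace $bre_literal $ere_literal $ere_group $ere_backref"
# aX aX X X X X X X
```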
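Practical example 1¶
Replace
![IMG_0802](https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg)
with
[IMG_0802]: https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg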
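Solution
sed -i "s/\!\[IMG_\([0-9]\{4\}\)\](\(.*\))/\[IMG_\1\]\: \2/g" FILENAME
- \(\) captures a substring, which can then be referenced with \1, \2
- \! needs to be escaped
- the space before \2 does not need to be written as [ ], otherwise a literal [ ] appears in the output; on an earlier occasion, matching several spaces in the pattern required [ ]*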
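To try the substitution without touching a file, the same idea can be run on a single line from stdin. This is a sketch: the sample line is reconstructed from the pattern and the target shown above, and the expression is single-quoted with only the escapes that basic regular expressions need, so it is not character-for-character the command above.

```shell
line='![IMG_0802](https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg)'

# \( \) capture the four digits and the URL; \1 and \2 reuse them.
converted=$(printf '%s\n' "$line" |
  sed 's/!\[IMG_\([0-9]\{4\}\)](\(.*\))/[IMG_\1]: \2/')

printf '%s\n' "$converted"
# [IMG_0802]: https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg
```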
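People are fickle: after a while I wanted to download these images to a local folder, but the files I had processed earlier were deleted and only the copies uploaded to GitHub remained, so I first had to download the files to the right location and rename them. For example, the file _posts/2019-12-21-quant-genetics.md only keeps the https://user-images.githubusercontent.com/ links; the script below downloads them to the right location and renames them,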
grep -E "https://user-images." _posts/2019-12-21-quant-genetics.md | while read -a ADDR; do if [ ${#ADDR[@]} -eq 2 ]; then proxychains wget ${ADDR[1]} -O images/2019-12-21-quant-genetics/${ADDR[0]:1:8}.jpg; fi; done
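where
- ADDR[0]:1:8 is a so-called “Parameter Expansion”, ${parameter:offset:length}, which extracts the substring in the given range
- wget -O renames the download, and here also moves it to the right location
- proxychains routes the download through a proxy
- read -a ADDR puts the split words (split on whitespace by default, or on whatever IFS= specifies) into the array ADDR; see help read for details, since man read does not list the options. Also note that for an array, $ADDR returns ${ADDR[0]}.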
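A minimal sketch of read -a together with the substring expansion, without any download. The sample line and its example.com URL are made up for illustration.

```shell
line='[IMG_0802]: https://example.com/a.jpg'   # hypothetical reference-style link

# Split the line on whitespace into the array ADDR.
read -r -a ADDR <<< "$line"

count=${#ADDR[@]}     # number of words
name=${ADDR[0]:1:8}   # 8 characters of "[IMG_0802]:" starting at offset 1
first=$ADDR           # an array name without an index means element 0

echo "$count $name $first"
# 2 IMG_0802 [IMG_0802]:
```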
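Batch renaming¶
Sometimes a website does not distinguish files with the same name, so after downloading there is both A.zip
and A (1).zip
locally. The two are not the same file, so to avoid deleting one by mistake later I decided to rename them. There were several such files, so the batch command is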
$ ls -1 | grep "(1)" | while read -a ADDR; do mv "${ADDR[0]} (1).zip" "${ADDR[0]}_SOMETHING.zip"; done
awk¶
Count the requests per IP in an access log¶
#!/bin/bash
cat access.log |sed -rn '/28\/Jan\/2015/p' > a.txt
cat a.txt |awk '{print $1}'|sort |uniq > ipnum.txt
for i in `cat ipnum.txt`; do
iptj=`cat access.log |grep $i | grep -v 400 |wc -l`
echo "IP $i made $iptj successful requests on 2015-01-28 (24 hours), an average of $(($iptj/1440)) per minute" >> result.txt
done
Refer to 用shell统计访问日志里每个ip访问次数
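The counting can also be done in a single awk pass with an associative array. This is a sketch of an alternative, not the referenced script: it assumes a combined-format log where field 1 is the IP, field 4 the timestamp, and field 9 the status code, and the sample log lines are made up.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [28/Jan/2015:10:00:00 +0800] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [28/Jan/2015:10:00:01 +0800] "GET /a HTTP/1.1" 200 128
10.0.0.1 - - [28/Jan/2015:10:00:02 +0800] "GET /b HTTP/1.1" 400 0
10.0.0.1 - - [29/Jan/2015:09:00:00 +0800] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [28/Jan/2015:11:00:00 +0800] "GET /c HTTP/1.1" 200 64
EOF

# Keep requests from 28 Jan 2015 whose status is not 400, count them per IP.
counts=$(awk '$4 ~ /28\/Jan\/2015/ && $9 != 400 { n[$1]++ }
              END { for (ip in n) print ip, n[ip] }' "$log" | sort)

rm -f "$log"
printf '%s\n' "$counts"
# 10.0.0.1 2
# 10.0.0.2 1
```

Unlike grep -v 400, the status test here only looks at the status field, so an IP or byte count that happens to contain 400 is not excluded.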
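Number of columns¶
If the columns are separated by spaces, the following command gives the number of columns directly,
awk '{print NF; exit}' file.txt
For another separator, such as |, specify -F'|'.
See unix - count of columns in file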
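Both cases can be sketched on inline data instead of file.txt; the sample rows are made up.

```shell
# Whitespace-separated columns: NF is the field count of the first line.
n_space=$(printf 'a b c\n1 2 3\n' | awk '{print NF; exit}')

# Pipe-separated columns: set the field separator with -F.
n_pipe=$(printf 'a|b|c|d\n1|2|3|4\n' | awk -F'|' '{print NF; exit}')

echo "$n_space $n_pipe"
# 3 4
```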
split string while reading files¶
specify IFS=.
- How to split a tab-delimited string in bash script WITHOUT collapsing blanks?
- Split String in shell script while reading from file
- Read a file line by line assigning the value to a variable
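A minimal sketch of setting IFS only for the read command while looping over input. The comma-separated records and the names array are illustrative.

```shell
names=()

# IFS=',' applies to this read only, so each line is split at the comma.
while IFS=',' read -r name score; do
  names+=("$name:$score")
done <<'EOF'
alice,90
bob,85
EOF

echo "${names[*]}"
# alice:90 bob:85
```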
distribute jobs into queues¶
since different queues have different quotas, try to assign the jobs to the available nodes.
queue=(bigmem large batch)
queues=()
for ((i=0;i<12;i++)) do queues+=(${queue[0]}); done;
for ((i=0;i<20;i++)) do queues+=(${queue[1]}); done;
for ((i=0;i<15;i++)) do queues+=(${queue[2]}); done;
refer to
- Add a new element to an array without specifying the index in Bash
- Repeat an element n number of times in an array
- The Double-Parentheses Construct
- Increment variable value by 1 ( shell programming)
- Shell 数组
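The three loops can be folded into one by keeping the quotas in a second array. This is a sketch under the same numbers as above; the quota array is a hypothetical addition.

```shell
queue=(bigmem large batch)
quota=(12 20 15)   # hypothetical array holding the slot count of each queue
queues=()

# Repeat each queue name as many times as its quota allows.
for q in "${!queue[@]}"; do
  for ((i = 0; i < quota[q]; i++)); do
    queues+=("${queue[q]}")
  done
done

total=${#queues[@]}
echo "$total ${queues[0]} ${queues[12]} ${queues[32]}"
# 47 bigmem large batch
```

The k-th job can then be submitted to ${queues[k]}.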
Command line arguments¶
refer to Taking Command Line Arguments in Bash
join elements of an array in Bash¶
arr=(a b c)
printf '%s\n' "$(IFS=,; printf '%s' "${arr[*]}")"
# a,b,c
where * or @ returns all elements of the array; inside double quotes, only * joins them with the first character of IFS.
refer to How can I join elements of an array in Bash?
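The same idea wrapped in a small function, as a sketch; join_by is a hypothetical helper name.

```shell
# join_by SEP ITEM...: join the items with the single-character separator SEP.
join_by() {
  local IFS="$1"   # "$*" joins the arguments with the first character of IFS
  shift
  printf '%s' "$*"
}

arr=(a b c)
joined=$(join_by , "${arr[@]}")
concatenated=$(printf '%s' "${arr[@]}")   # @ passes separate words, so no separator

echo "$joined $concatenated"
# a,b,c abc
```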
A more complex way¶
list=
for nc in {2..10}; do
for nf in 5 10 15; do
if [ -z "$list" ]
then
list=acc-$nc-$nf
else
list=$list,acc-$nc-$nf
fi
done
done
echo $list
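The nested loops can also be replaced by brace expansion combined with the IFS join from the previous section. A sketch, where items is an illustrative name:

```shell
# Brace expansion produces the same order as the nested loops:
# acc-2-5 acc-2-10 acc-2-15 acc-3-5 ...
items=(acc-{2..10}-{5,10,15})

# Join the elements with commas in a subshell so IFS is not changed globally.
list=$(IFS=,; printf '%s' "${items[*]}")

echo "${#items[@]}"
# 27
```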
timestamp¶
timestamp=$(date +"%Y-%m-%dT%H:%M:%S")
echo $timestamp
# 2020-02-11T10:51:42
compare two timestamps¶
d1=$(date -d "2019-09-22 20:07:25" +'%s')
d2=$(date -d "2019-09-22 20:08:25" +'%s')
if [ $d1 -gt $d2 ]
then
echo "d1 > d2"
else
echo "d1 <= d2"
fi
where
- -d: display time described by STRING, not ‘now’ (from man date)
- +%[format-option]: format specifiers (for the detailed formats refer to man date; I was curious why + is needed and found no hints in man date, but there is one in date command in Linux with examples)
- -gt: greater than, -lt: less than; with equality, -ge and -le (from Shell 基本运算符)
- a conditional expression must be placed inside square brackets, with spaces around it (from Shell 基本运算符)
refer to How to compare two time stamps?
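A sketch that also handles the equal case and reports the difference in seconds. It assumes GNU date for the -d option; to_epoch is a hypothetical helper, and -u is added only so the result does not depend on the local time zone.

```shell
# Convert a timestamp string to seconds since the epoch (GNU date).
to_epoch() { date -u -d "$1" +%s; }

d1=$(to_epoch "2019-09-22 20:07:25")
d2=$(to_epoch "2019-09-22 20:08:25")
diff_s=$(( d2 - d1 ))

if [ "$d1" -gt "$d2" ]; then
  order="d1 > d2"
elif [ "$d1" -lt "$d2" ]; then
  order="d1 < d2"
else
  order="d1 = d2"
fi

echo "$order, difference: ${diff_s}s"
# d1 < d2, difference: 60s
```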
globbing for ls vs regular expression for find¶
Suppose we want to get abc2.txt. As stated in Listing with ls and regular expression, ls does not support regular expressions, but it can work with globbing, or filename expressions.
ls *[!0-9][0-9].txt
where ! is the complement.
Alternatively, we can use find -regex,
find . -maxdepth 1 -regex '\./.*[^0-9][0-9]\.txt'
where
- -maxdepth 1 disables recursion, so only files in the current directory are found
We can also add -exec ls to get the output of ls, and change the regex type with -regextype egrep.
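Both approaches can be compared in a scratch directory. A sketch with made-up file names: only abc2.txt has a non-digit followed by exactly one digit before .txt.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/abc2.txt" "$dir/abc12.txt" "$dir/abc.txt"

# Globbing is done by the shell before ls runs.
glob_hits=$(cd "$dir" && ls *[!0-9][0-9].txt)

# find matches the regular expression against the whole path, hence the \./ prefix.
find_hits=$(cd "$dir" && find . -maxdepth 1 -regex '\./.*[^0-9][0-9]\.txt')

echo "$glob_hits $find_hits"
# abc2.txt ./abc2.txt
```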
strip the first 2 characters from a string¶
simplest way:
${string:2}
some alternatives refer to How can I strip first X characters from string using sed?, or Remove first character of a string in Bash
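A short sketch comparing the parameter expansion with cut and sed alternatives on a sample string; the variable names are illustrative.

```shell
string="abcdef"

rest=${string:2}                                     # substring from offset 2
via_cut=$(printf '%s\n' "$string" | cut -c3-)        # characters 3 to the end
via_sed=$(printf '%s\n' "$string" | sed 's/^..//')   # drop the first two characters

echo "$rest $via_cut $via_sed"
# cdef cdef cdef
```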
select the first field¶
given the filename file.txt, we want to get the string file_test.
a=$(cut -d'.' -f1 <<< "$1")_test
echo "$a"
where -d'.' defines the delimiter, and -f1 gets the first field.
If we need to get the last field, we can use rev, i.e.,
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
refer to How to find the last field using ‘cut’
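As a sketch of an alternative that stays inside the shell, parameter expansion can strip everything after the first dot or before the last dot, next to the cut and rev versions; the variable names are illustrative.

```shell
filename="file.txt"
host="maps.google.com"

first=${filename%%.*}   # remove the longest suffix starting at a dot
last=${host##*.}        # remove the longest prefix ending at a dot

via_cut=$(cut -d'.' -f1 <<< "$filename")_test
via_rev=$(echo "$host" | rev | cut -d'.' -f 1 | rev)

echo "$first $last $via_cut $via_rev"
# file com file_test com
```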
Multiple IFS¶
while IFS= read -a ADDR; do
IFS=':' read -a Line <<< $ADDR
echo ${Line[0]};
done < <(grep -nE "finished" slurm-37985.out)
This prints the line numbers of the lines containing finished, since grep -n prefixes each match with its line number and a colon.
- <() is called process substitution
- <<< is known as a here string, and is different from << and <
refer to How can I store the “find” command results as an array in Bash
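A self-contained sketch of the same pattern, with printf standing in for the slurm output file; the sample lines and the line_numbers array are illustrative.

```shell
line_numbers=()

# The outer read keeps each grep line intact; the inner read splits it at ':'.
while IFS= read -r entry; do
  IFS=':' read -r -a fields <<< "$entry"
  line_numbers+=("${fields[0]}")
done < <(printf 'start\nfinished A\nrunning\nfinished B\n' | grep -n 'finished')

echo "${line_numbers[*]}"
# 2 4
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell and line_numbers is still set afterwards.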
my working case:
files=()
start_time=$(date -d "2019-09-21T14:11:16" +'%s')
end_time=$(date -d "2019-09-22T20:07:00" +'%s')
while IFS= read -r -d $'\0'; do
IFS='_' read -ra ADDR <<< "$REPLY"
timestamp=$(date -d ${ADDR[2]} +'%s')
if [ $timestamp -ge $start_time -a $timestamp -lt $end_time ]; then
curr_folder="${ADDR[0]}_${ADDR[1]}_${ADDR[2]}"
files+=("${ADDR[0]}_${ADDR[1]}_${ADDR[2]}")
qsub -v folder=${curr_folder} revisit_sil_parallel.job
fi
done < <(find . -maxdepth 1 -regex "\./oracle_setting_2019-09-.*recall\.pdf" -print0)
Automatic link push¶
find -regex "\./.*\.html" | sed -n "s#\./#https://esl.hohoweiya.xyz/#p" >> ../urls.txt
Only show directory¶
ls -d */
refer to Listing only directories using ls in Bash?
My application: TeXtemplates: create a tex template
Check whether a certain file type/extension exists in directory¶
if ls *.bib &>/dev/null; then
#
fi
refer to Check whether a certain file type/extension exists in directory
My application: TeXtemplates: create a tex template
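A sketch of an alternative check that avoids calling ls, using bash's nullglob option so that an unmatched pattern expands to nothing. has_ext is a hypothetical helper, and it leaves nullglob switched off when it returns.

```shell
# has_ext DIR EXT: succeed if DIR contains at least one file ending in .EXT
has_ext() {
  local dir=$1 ext=$2 files
  shopt -s nullglob
  files=("$dir"/*."$ext")
  shopt -u nullglob
  (( ${#files[@]} > 0 ))
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/refs.bib"

if has_ext "$dir" bib; then bib_found=yes; else bib_found=no; fi
if has_ext "$dir" tex; then tex_found=yes; else tex_found=no; fi

echo "$bib_found $tex_found"
# yes no
```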
path of the script¶
get the path of the current script
CURDIR=`/bin/pwd`
BASEDIR=$(dirname $0)
ABSPATH=$(readlink -f $0)
ABSDIR=$(dirname $ABSPATH)
refer to darrenderidder/bashpath.sh
if¶
We can add &> /dev/null to hide the output of the command used as the condition of if. For example, check if a user exists,
#!/bin/bash
# refer to https://blog.51cto.com/64314491/1629175
if id $1 &> /dev/null; then
echo "$1 exists"
else
echo "$1 does not exist"
fi
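File tests¶
The format is [ EXPR FILE ], where common values of EXPR are
- -f: tests whether it is a regular file, i.e. a file whose type is - in ls -l
- -d: tests whether it is a directory, i.e. a file whose type is d in ls -l
- -e: tests whether the file exists; true if it exists, false otherwise
- -r: tests whether the file is readable by the current user
- -w: tests whether the file is writable by the current user
- -x: tests whether the file is executable by the current user
- -s: tests whether the file is non-empty; true if non-empty, false if empty
Example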
if [ ! -e /tmp/test ]; then
mkdir /tmp/test
fi
refer to bash条件判断之if语句
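Several of these tests can be combined in a small function. A sketch on scratch files, where describe is a hypothetical helper:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

empty="$dir/empty.txt"
: > "$empty"                                  # create an empty regular file
script="$dir/run.sh"
printf '#!/bin/sh\necho hi\n' > "$script"
chmod +x "$script"

# describe PATH: list which of the tests succeed for PATH.
describe() {
  local path=$1 out=""
  [ -e "$path" ] || { echo "missing"; return; }
  [ -f "$path" ] && out+="file "
  [ -d "$path" ] && out+="dir "
  [ -s "$path" ] && out+="nonempty "
  [ -x "$path" ] && out+="executable "
  echo "${out% }"
}

describe "$empty"      # file
describe "$script"     # file nonempty executable
describe "$dir/nope"   # missing
```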
[ (aka test) vs [[¶
Refer to What is the difference between test, [ and [[ ?
Both are used to evaluate expressions, but they differ in availability:
- [[ works only in Korn shell, Bash, Zsh, and recent versions of Yash and busybox sh
- [ is a POSIX utility (generally a builtin)
They also differ in behavior:
- no word splitting or glob expansion is done inside [[, i.e., many arguments need not be quoted, while arguments to [ usually should be quoted
- parentheses in [[ do not need to be escaped
also see
- What do square brackets mean without the “if” on the left?
- Is double square brackets preferable over single square brackets in Bash?
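The quoting difference can be seen with a value that contains a space. A sketch for bash; the variable names are illustrative.

```shell
name="two words"

# Inside [[ ]] the unquoted variable is not split into words.
if [[ $name = "two words" ]]; then double_result=match; else double_result=nomatch; fi

# With [ the variable should be quoted ...
if [ "$name" = "two words" ]; then single_result=match; else single_result=nomatch; fi

# ... because unquoted it is split into two arguments and [ reports an error.
if [ $name = "two words" ] 2>/dev/null; then unquoted_status=0; else unquoted_status=$?; fi

# Parentheses group conditions in [[ ]] without escaping; an unquoted
# right-hand side of = is treated as a glob pattern.
if [[ ( -n $name ) && ( $name = two* ) ]]; then grouped=yes; else grouped=no; fi

echo "$double_result $single_result $grouped"
# match match yes
```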
$[¶
Refer to What does a dollar sign followed by a square bracket mean in bash?
With $, [ can also be used for arithmetic expansion, such as
$ echo $[ $RANDOM % 2 ]
0  # or 1
$ echo $[ 1+2 ]
3
Actually, the $[ syntax is an early syntax that was deprecated in favor of $((, although it has not been completely removed yet.
= vs == vs -eq¶
from the above discussion:
- == is a bash-ism
- = is POSIX
In bash the two are equivalent, but in plain sh = is the only one guaranteed to work. Both are for string comparisons, while -eq is for numerical ones.
refer to Shell equality operators (=, ==, -eq)
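A minimal sketch of the difference between string and numeric comparison, using two values that are equal as numbers but not as strings:

```shell
a="01"
b="1"

# = compares the strings character by character.
if [ "$a" = "$b" ]; then string_equal=yes; else string_equal=no; fi

# -eq compares the values as integers.
if [ "$a" -eq "$b" ]; then numeric_equal=yes; else numeric_equal=no; fi

echo "$string_equal $numeric_equal"
# no yes
```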
compare string¶
grep keep the first line (use sed instead)¶
Refer to Include header in the ‘grep’ result
I am using
$ sinfo -o "%P %N %C %G" -N | grep gpu
to get the GPU status of the nodes on the cluster, but the header cannot be kept, then I tried
$ sinfo -o "%P %N %C %G" -N | { head -1; grep gpu; }
but it only shows the header.
Next I got the excellent solution via sed,
$ sinfo -o "%P %N %C %G" -N | sed -n "1p;/gpu/p"
and it also avoids the color highlighting of gpu in the output.
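The sed filter can be tried on a small table without a cluster. In this sketch the table is made up and is not real sinfo output; the awk line is an equivalent alternative. The head attempt most likely fails because head reads more than the first line from the pipe, leaving nothing for grep.

```shell
table='PARTITION NODELIST GRES
batch node01 (null)
gpu node02 gpu:4
gpu node03 gpu:2'

# Print line 1 unconditionally, plus every line that matches gpu.
with_header=$(printf '%s\n' "$table" | sed -n '1p;/gpu/p')

# The same selection written as an awk condition.
via_awk=$(printf '%s\n' "$table" | awk 'NR == 1 || /gpu/')

printf '%s\n' "$with_header"
# PARTITION NODELIST GRES
# gpu node02 gpu:4
# gpu node03 gpu:2
```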
compare two blocks in a txt file¶
for example, compare L82-95 with L108-123,
$ diff <(sed -n "82,95p" measure.jl) <(sed -n "108,123p" measure.jl)
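A self-contained sketch of the same comparison on a six-line scratch file, where the second block differs from the first in one line; the file contents are made up.

```shell
f=$(mktemp)
printf '%s\n' a b c a B c > "$f"

# diff exits with status 1 when the two inputs differ.
status=0
changes=$(diff <(sed -n '1,3p' "$f") <(sed -n '4,6p' "$f")) || status=$?

rm -f "$f"
printf '%s\n' "$changes"
# 2c2
# < b
# ---
# > B
```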
default value¶
- ${1:-foo}: if the parameter is unset or null, the expansion of word (here foo) is substituted.
- ${1-foo}: only substitute if the parameter is unset.
refer to How to write a bash script that takes optional input arguments?
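The two forms differ only when the argument is given but empty. A sketch, where greet_colon and greet_plain are hypothetical function names:

```shell
# :- falls back to the default when $1 is unset or empty.
greet_colon() { local name=${1:-foo}; echo "hello $name"; }

# - falls back to the default only when $1 is unset.
greet_plain() { local name=${1-foo}; echo "hello $name"; }

greet_colon          # hello foo
greet_colon ""       # hello foo
greet_plain          # hello foo
greet_plain ""       # hello (followed by a space, the empty value is kept)
greet_plain bar      # hello bar
```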
applications: