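Shell notes

Tutorial references

  1. Runoob (菜鸟教程)

Shell variables

  1. When defining a variable, do not put a dollar sign in front of its name.
  2. There must be no spaces between the variable name and the equals sign.
  3. Curly braces around the variable name are optional. They help the interpreter find where the variable name ends, as in the ${skill} example in the for loop below.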
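
A minimal sketch of the three rules above. The variable names are made up for illustration, and skillScript is assumed to be unset.

```bash
# No dollar sign and no spaces around = when defining a variable.
skill="Java"

# Braces mark where the variable name ends.
with_braces="I am good at ${skill}Script"
# Without braces the shell looks up a variable named skillScript (unset here).
without_braces="I am good at $skillScript"

echo "$with_braces"
echo "$without_braces"
```

The second line loses the word entirely, which is why the braces matter whenever text follows the variable directly.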

iterators in for loop

  • separated by space
for skill in Ada Coffe Action Java; do
    echo "I am good at ${skill}Script"
done
  • {start..stop..step}
for i in {5..50..5}; do
  echo $i
done
  • construct an array

Brace expansion can also be used to construct an array:

arr=({1..10..2})
echo ${arr[@]}
for i in ${arr[@]}; do
  echo $i
done
  • seq

Alternatively, we can use seq, whose arguments are first, increment, last:

for i in $(seq 5 5 50); do
    echo $i
done

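Shell strings

  1. Characters inside single quotes are output verbatim; variables inside a single-quoted string are not expanded.
  2. A single quote cannot appear inside a single-quoted string, not even when escaped with a backslash.
  3. Double-quoted strings can contain variables.
  4. Double-quoted strings can contain escape sequences.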
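
The quoting rules above can be sketched as follows. The variable names are illustrative only.

```bash
name="world"

single='hello $name'        # stays literal
double="hello $name"        # the variable is expanded
escaped="she said \"hi\""   # \" is an escape sequence inside double quotes

# A single quote cannot be escaped inside single quotes; a common workaround
# is to close the string, add an escaped quote, and reopen the string.
with_quote='it'\''s'

printf '%s\n' "$single" "$double" "$escaped" "$with_quote"
```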

Shell arrays

  1. In shell, an array is written with parentheses, and its elements are separated by spaces.

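sed usage

References

  1. sed命令_Linux sed 命令用法详解:功能强大的流式文本编辑器 (a detailed guide to the Linux sed command, a powerful stream text editor)
  2. sed & awk常用正则表达式 - 菲一打 - 博客园 (common regular expressions for sed & awk, on cnblogs)
  • Print a specific line, e.g. line 10: sed '10!d' file.txt, see Get specific line from text file using just shell script
  • Print a range of lines: sed -n '10,20p' file.txt. Line 10 alone can likewise be printed with sed -n '10p' file.txt. A semicolon ; selects individual lines rather than a contiguous range, see sed之打印特定行与连续行 (printing specific lines and contiguous lines with sed)
    • from the first line to the last line: sed -n '1,$p'
    • the first line and the last line: sed -n '1p;$p', not sed -n '1;$p'
  • Delete the last line: sed -i '$ d' file.txt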
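
A small sketch that runs the commands above on a throwaway file. It assumes GNU sed (for -i without a suffix); the file is generated with seq purely for illustration.

```bash
tmp=$(mktemp)
seq 1 30 > "$tmp"              # sample file: one number per line

line10=$(sed '10!d' "$tmp")                               # only line 10
range=$(sed -n '10,12p' "$tmp" | paste -s -d ' ' -)       # lines 10 to 12
first_last=$(sed -n '1p;$p' "$tmp" | paste -s -d ' ' -)   # first and last line

sed -i '$ d' "$tmp"            # delete the last line in place (GNU sed)
remaining=$(wc -l < "$tmp")

echo "$line10 | $range | $first_last | $remaining"
rm -f "$tmp"
```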

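What | does

The vertical bar (|) metacharacter belongs to the extended set of metacharacters and specifies an alternation of regular expressions: a line matches the pattern if it matches any one of the alternatives.

What -r does

It enables extended regular expressions.

Refer to Extended regexps - sed, a stream editor

Excerpt: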

The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.

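In other words, in basic mode these characters must be escaped to act as special characters, whereas in extended mode it is the reverse: escaping them yields the literal character.

For example: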

  1. abc? becomes abc\? when using extended regular expressions. It matches the literal string ‘abc?’.
  2. c\+ becomes c+ when using extended regular expressions. It matches one or more ‘c’s.
  3. a\{3,\} becomes a{3,} when using extended regular expressions. It matches three or more ‘a’s.
  4. \(abc\)\{2,3\} becomes (abc){2,3} when using extended regular expressions. It matches either abcabc or abcabcabc.
  5. \(abc*\)\1 becomes (abc*)\1 when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions.
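
The difference can be checked directly on the command line. This is a minimal sketch assuming GNU sed; the input strings are made up.

```bash
# Basic regex (default): + must be escaped to mean "one or more" (\+ is a GNU extension).
bre=$(echo "abccc" | sed 's/c\+/X/')
# Extended regex (-r, or -E): + is special without a backslash.
ere=$(echo "abccc" | sed -r 's/c+/X/')
# In basic mode an unescaped + is a literal plus sign.
literal=$(echo "abc+" | sed 's/c+/X/')
# Alternation with | in extended mode (GNU basic mode would need \|).
alt=$(echo "cat dog bird" | sed -r 's/cat|bird/X/g')

printf '%s\n' "$bre" "$ere" "$literal" "$alt"
```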

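Practice 1

![IMG_0802](https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg)

should be replaced with

[IMG_0802]: https://user-images.githubusercontent.com/13688320/72489850-733ae480-3850-11ea-8e51-15021588a7e6.jpg

Solution

sed -i "s/\!\[IMG_\([0-9]\{4\}\)\](\(.*\))/\[IMG_\1\]\: \2/g" FILENAME
  • \( \) captures a substring, which can then be referenced as \1, \2
  • ! needs to be escaped as \! (inside double quotes in an interactive shell, a bare ! would trigger history expansion)
  • The space before \2 in the replacement must not be written as [ ], otherwise a literal [ ] appears in the output; [ ]* belongs on the pattern side, where I once needed it to match multiple spaces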
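
The same substitution can be tried on a single line from stdin before touching a file. This is a sketch with a made-up URL, assuming GNU sed; it uses single quotes, so ! needs no escaping, and shows the extended-regex spelling next to the basic one.

```bash
line='![IMG_0802](https://example.com/a.jpg)'   # made-up sample input

# Basic regex: \( \) capture, \{4\} repeats, plain ( ) are literal parentheses.
bre=$(printf '%s\n' "$line" | sed 's/!\[IMG_\([0-9]\{4\}\)](\(.*\))/[IMG_\1]: \2/')
# Extended regex: ( ) capture, {4} repeats, \( \) are literal parentheses.
ere=$(printf '%s\n' "$line" | sed -E 's/!\[IMG_([0-9]{4})]\((.*)\)/[IMG_\1]: \2/')

printf '%s\n' "$bre" "$ere"
```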

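People are fickle: some time later I wanted to download these images to a local folder, but the processed files had been deleted and only the copies uploaded to GitHub remained, so I first had to download the files to a suitable location and rename them. For example, the file _posts/2019-12-21-quant-genetics.md only keeps the https://user-images.githubusercontent.com/ links; the following script downloads the images to a suitable location and renames them,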

grep -E "https://user-images." _posts/2019-12-21-quant-genetics.md | while read -a ADDR; do if [ ${#ADDR[@]} -eq 2 ]; then proxychains wget ${ADDR[1]} -O images/2019-12-21-quant-genetics/${ADDR[0]:1:8}.jpg; fi; done

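where

  • ADDR[0]:1:8 is the so-called “Parameter Expansion” ${parameter:offset:length}, which extracts a substring at the given range
  • wget -O sets the output file name, which here also moves the file to the right location
  • proxychains routes the download through a proxy
  • read -a ADDR splits the input line (on whitespace by default, or on the characters given with IFS=) and stores the pieces in the array ADDR; see help read, since man read does not list the options. Also note that $ADDR on its own expands to ${ADDR[0]}.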
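
A minimal sketch of read -a together with the substring expansion, using a made-up input line in the same shape as the converted image references.

```bash
line='[IMG_0802]: https://example.com/a.jpg'   # made-up sample line

read -r -a ADDR <<< "$line"    # split on whitespace into the array ADDR

count=${#ADDR[@]}              # number of fields
name=${ADDR[0]:1:8}            # 8 characters starting at offset 1 of the first field
first=$ADDR                    # same as ${ADDR[0]}

echo "$count $name $first"
```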

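Batch renaming

Sometimes a website does not distinguish files with the same name, so after downloading we end up with A.zip and A (1).zip, which are not the same file. To avoid deleting one of them by mistake later, I decided to rename them. There are several such files, so the batch command is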

$ ls -1 | grep "(1)" | while read -a ADDR; do mv "${ADDR[0]} (1).zip" "${ADDR[0]}_SOMETHING.zip"; done
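
An alternative sketch that loops over a glob instead of parsing ls output, which also copes with base names containing spaces. The directory, file names, and the _copy suffix are made up for illustration.

```bash
workdir=$(mktemp -d)
touch "$workdir/A.zip" "$workdir/A (1).zip" "$workdir/B (1).zip"

for f in "$workdir"/*" (1).zip"; do
  [ -e "$f" ] || continue            # the glob matched nothing
  base=${f%" (1).zip"}               # strip the " (1).zip" suffix
  mv -- "$f" "${base}_copy.zip"
done

renamed=$(cd "$workdir" && printf '%s\n' *_copy.zip | paste -s -d ' ' -)
echo "$renamed"
rm -rf "$workdir"
```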

awk

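Refer to 技术|如何在Linux中使用awk命令 (How to use the awk command in Linux)

Count the number of requests per IP in an access log

#!/bin/bash
# Keep only the requests made on 28 Jan 2015.
sed -rn '/28\/Jan\/2015/p' access.log > a.txt
# Unique client IPs (first field).
awk '{print $1}' a.txt | sort -u > ipnum.txt
while read -r ip; do
    # Count that day's requests from this IP, crudely skipping lines that contain 400.
    iptj=$(grep -wF -- "$ip" a.txt | grep -vc 400)
    echo "IP $ip made $iptj successful requests on 2015-01-28 (24 hours), on average $((iptj / 1440)) per minute" >> result.txt
done < ipnum.txt

Refer to 用shell统计访问日志里每个ip访问次数 (counting the requests per IP in an access log with shell)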
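
The per-IP count can also be done in a single awk pass with an associative array. This is a sketch under the assumption that the log uses the common layout with the status code in field 9; the log entries are made up.

```bash
log=$(mktemp)
# Made-up entries; the status code is assumed to be field 9.
cat > "$log" <<'EOF'
1.1.1.1 - - [28/Jan/2015:10:00:00 +0800] "GET / HTTP/1.1" 200 12
1.1.1.1 - - [28/Jan/2015:10:00:05 +0800] "GET /a HTTP/1.1" 400 0
2.2.2.2 - - [28/Jan/2015:11:00:00 +0800] "GET / HTTP/1.1" 200 12
1.1.1.1 - - [28/Jan/2015:12:00:00 +0800] "GET /b HTTP/1.1" 200 34
2.2.2.2 - - [29/Jan/2015:09:00:00 +0800] "GET / HTTP/1.1" 200 12
EOF

# Count non-400 requests per IP for the given day in one pass.
counts=$(awk '/28\/Jan\/2015/ && $9 != 400 { n[$1]++ }
              END { for (ip in n) print ip, n[ip] }' "$log" | sort)

echo "$counts"
rm -f "$log"
```

Comparing the status field avoids dropping lines that merely contain the digits 400 somewhere else, such as in the response size.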

Number of columns

If the fields are separated by spaces, the following command gives the number of columns directly,

awk '{print NF; exit}' file.txt

For other delimiters, such as |, specify -F'|'.

Refer to unix - count of columns in file
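
Both variants on made-up one-line inputs, as a quick check:

```bash
ncol_space=$(echo "a b c" | awk '{print NF; exit}')          # default whitespace separator
ncol_pipe=$(echo "a|b|c|d" | awk -F'|' '{print NF; exit}')   # explicit | separator

echo "$ncol_space $ncol_pipe"
```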

split string while reading files

Set IFS= for the read command to control how each line is split.

  1. How to split a tab-delimited string in bash script WITHOUT collapsing blanks?
  2. Split String in shell script while reading from file
  3. Read a file line by line assigning the value to a variable
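
A minimal sketch of splitting each line while reading a file. The data and variable names are made up. A comma is used as the delimiter because it is not whitespace, so empty fields are preserved; whitespace delimiters such as tab are collapsed by read, which is the problem the first link discusses.

```bash
data=$(mktemp)
printf 'apple,,red\nkiwi,3,green\n' > "$data"   # field 2 of line 1 is empty

names=(); counts=()
# IFS=, applies only to this read; -r keeps backslashes literal.
while IFS=, read -r name count color; do
  names+=("$name")
  counts+=("${count:-none}")    # make empty fields visible
done < "$data"

echo "${names[*]} / ${counts[*]}"
rm -f "$data"
```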

distribute jobs into queues

Since different queues have different quotas, build a list that repeats each queue name according to its quota, then assign jobs to the entries of that list.

queue=(bigmem large batch)
queues=()
for ((i=0;i<12;i++)) do queues+=(${queue[0]}); done;
for ((i=0;i<20;i++)) do queues+=(${queue[1]}); done;
for ((i=0;i<15;i++)) do queues+=(${queue[2]}); done;
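
The three loops can be folded into one by keeping the quotas in a second array. This is a sketch; the quota array and the idea that job number k goes to queues[k] are assumptions made for illustration.

```bash
queue=(bigmem large batch)
quota=(12 20 15)               # same numbers as in the loops above

queues=()
for q in "${!queue[@]}"; do
  for ((i = 0; i < quota[q]; i++)); do
    queues+=("${queue[q]}")
  done
done

# Hypothetical use: job number k is sent to queues[k].
total=${#queues[@]}
first_job=${queues[0]}
job_13=${queues[12]}
last_job=${queues[total-1]}
echo "$total slots: $first_job ... $job_13 ... $last_job"
```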

refer to

Command line arguments

refer to Taking Command Line Arguments in Bash

join elements of an array in Bash

arr=(a b c)
printf '%s\n' "$(IFS=,; printf '%s' "${arr[*]}")"
# a,b,c

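where "${arr[*]}" expands to all elements joined by the first character of IFS, while "${arr[@]}" expands each element to a separate word, so only * gives the joined string here.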

refer to How can I join elements of an array in Bash?
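
The same idea wrapped in a small function, as a sketch. join_by is a hypothetical helper name, and it only supports a single-character separator.

```bash
# join_by SEP ELEMENT... : join the elements with the one-character separator SEP.
join_by() {
  local IFS="$1"
  shift
  printf '%s\n' "$*"           # "$*" joins the arguments with the first character of IFS
}

arr=(a b c)
joined=$(join_by , "${arr[@]}")
count_star=$(set -- "${arr[*]}"; echo $#)   # * gives one word
count_at=$(set -- "${arr[@]}"; echo $#)     # @ gives one word per element
echo "$joined $count_star $count_at"
```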

A more complex way

list=
for nc in {2..10}; do
  for nf in 5 10 15; do
    if [ -z "$list" ]
    then
        list=acc-$nc-$nf
    else
        list=$list,acc-$nc-$nf
    fi
  done
done
echo $list

timestamp

timestamp=$(date +"%Y-%m-%dT%H:%M:%S")
echo $timestamp
# 2020-02-11T10:51:42

compare two timestamps

d1=$(date -d "2019-09-22 20:07:25" +'%s')
d2=$(date -d "2019-09-22 20:08:25" +'%s')
if [ "$d1" -gt "$d2" ]
then
  echo "d1 > d2"
else
  echo "d1 <= d2"
fi

where

  • -d: display time described by STRING, not ‘now’ (from man date)
  • +%[format-option]: output format specifiers (see man date for the full list). The leading + marks the argument as a format string, as in the synopsis date [OPTION]... [+FORMAT]; see also date command in Linux with examples
  • -gt: greater than, -lt: less than; with equality, -ge and -le (from Shell 基本运算符, i.e. Shell basic operators)
  • The conditional expression must be placed inside square brackets, with spaces around it (from Shell 基本运算符)

refer to How to compare two time stamps?
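
Once both timestamps are epoch seconds, the gap between them is plain integer arithmetic. This is a sketch assuming GNU date; -u is added here so the result does not depend on the local time zone.

```bash
# -u interprets the strings as UTC.
d1=$(date -u -d "2019-09-22 20:07:25" +%s)
d2=$(date -u -d "2019-09-22 20:08:25" +%s)

diff_sec=$(( d2 - d1 ))
if [ "$d1" -lt "$d2" ]; then order="d1 < d2"; else order="d1 >= d2"; fi
echo "$order, difference: ${diff_sec}s"
```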

globbing for ls vs regular expression for find

Suppose we want to get abc2.txt, as described in Listing with ls and regular expression.

ls does not support regular expressions, but it works with globbing, i.e. filename patterns.

ls *[!0-9][0-9].txt

where ! negates the character class.

Alternatively, we can use find -regex,

find . -maxdepth 1 -regex '\./.*[^0-9][0-9]\.txt'

where

  • -maxdepth 1 disables recursion, so only files in the current directory are found

We can also add -exec ls to get ls-style output, and change the regex flavor with -regextype egrep.
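
Both approaches side by side in a scratch directory. This is a sketch assuming GNU find (for -regextype); the file names are made up.

```bash
dir=$(mktemp -d)
touch "$dir/abc2.txt" "$dir/abc12.txt" "$dir/x.txt"   # made-up file names

# Each command runs in a subshell so the cd does not affect the caller.
by_glob=$(cd "$dir" && ls *[!0-9][0-9].txt)
by_find=$(cd "$dir" && find . -maxdepth 1 -regextype egrep -regex '\./.*[^0-9][0-9]\.txt')

echo "$by_glob"
echo "$by_find"
rm -rf "$dir"
```

Note that find -regex matches against the whole path, which is why the pattern starts with \./.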

strip the first 2 characters from a string

simplest way:

${string:2}

For alternatives, refer to How can I strip first X characters from string using sed?, or Remove first character of a string in Bash
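
A short sketch of the expansion with a made-up string, next to a POSIX-style equivalent:

```bash
string="abcdef"

rest=${string:2}          # drop the first 2 characters (bash substring expansion)
rest_posix=${string#??}   # same result via pattern removal: ?? matches any 2 characters
first_two=${string:0:2}   # the part that was stripped

echo "$rest $rest_posix $first_two"
```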

select the first field

Given the filename file.txt, we want to get the string file_test.

a=$(cut -d'.' -f1 <<< "$1")_test
echo "$a"

where -d'.' sets the delimiter and -f1 selects the first field.

If we need to get the last field, we can use rev, i.e.,

echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev

refer to How to find the last field using ‘cut’
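
As an alternative to cut and rev, the first and last fields can also be taken with parameter expansion, without starting any external process. This is a swapped-in technique, not the method from the linked answer.

```bash
fname="file.txt"
host="maps.google.com"

first=${fname%%.*}_test    # remove the longest suffix starting at a dot, then append _test
last=${host##*.}           # remove the longest prefix ending at a dot

echo "$first $last"
```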

Multiple IFS

while IFS= read -a ADDR; do
        IFS=':' read -a Line <<< $ADDR
        echo ${Line[0]};
done < <(grep -nE "finished" slurm-37985.out)

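This prints the line numbers of the lines that contain "finished": grep -n prefixes each match with its line number and a colon, and the inner read splits on that colon.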
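
A self-contained variant of the loop, run on a made-up log file instead of the slurm output, collecting the line numbers in an array:

```bash
logf=$(mktemp)
printf 'start\njob 1 finished\nrunning\njob 2 finished\n' > "$logf"   # made-up log

nums=()
while IFS= read -r line; do              # IFS= keeps the whole line intact
  IFS=':' read -r num rest <<< "$line"   # split "N:text" at the first colon
  nums+=("$num")
done < <(grep -n "finished" "$logf")

echo "${nums[*]}"
rm -f "$logf"
```

Each IFS assignment is scoped to its own read command, so the two settings do not interfere.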

refer to How can I store the “find” command results as an array in Bash

my working case:

files=()
start_time=$(date -d "2019-09-21T14:11:16" +'%s')
end_time=$(date -d "2019-09-22T20:07:00" +'%s')
while IFS=  read -r -d $'\0'; do
  IFS='_' read -ra ADDR <<< "$REPLY"
  timestamp=$(date -d ${ADDR[2]} +'%s')
  if [ $timestamp -ge $start_time -a $timestamp -lt $end_time ]; then
    curr_folder="${ADDR[0]}_${ADDR[1]}_${ADDR[2]}"
    files+=("${ADDR[0]}_${ADDR[1]}_${ADDR[2]}")
    qsub -v folder=${curr_folder} revisit_sil_parallel.job
  fi
done < <(find . -maxdepth 1 -regex "\./oracle_setting_2019-09-.*recall\.pdf" -print0)

Automatic link submission

find -regex "\./.*\.html" | sed -n "s#\./#https://esl.hohoweiya.xyz/#p" >> ../urls.txt

Only show directory

ls -d */

refer to Listing only directories using ls in Bash?

My application: TeXtemplates: create a tex template

Check whether a certain file type/extension exists in directory

if ls *.bib &>/dev/null; then
  : # at least one .bib file exists; do the work here
fi

refer to Check whether a certain file type/extension exists in directory

My application: TeXtemplates: create a tex template
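
A possible alternative that avoids running ls is the bash builtin compgen -G, which succeeds only if the glob matches something. This is a sketch; has_ext is a hypothetical helper name and the file is made up.

```bash
dir=$(mktemp -d)
touch "$dir/refs.bib"          # made-up file

has_ext() {                    # hypothetical helper: has_ext DIR EXT
  compgen -G "$1/*.$2" > /dev/null
}

if has_ext "$dir" bib; then bib_found=yes; else bib_found=no; fi
if has_ext "$dir" tex; then tex_found=yes; else tex_found=no; fi
echo "bib: $bib_found, tex: $tex_found"
rm -rf "$dir"
```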

path of the script

Get the path of the current script:

CURDIR=$(/bin/pwd)
BASEDIR=$(dirname "$0")
ABSPATH=$(readlink -f "$0")
ABSDIR=$(dirname "$ABSPATH")

refer to darrenderidder/bashpath.sh

if

We can append &> /dev/null to the command in the if condition to hide its output. For example, to check whether a user exists:

#!/bin/bash
# refer to https://blog.51cto.com/64314491/1629175
if id "$1" &> /dev/null; then
    echo "$1 exists"
else
    echo "$1 does not exist"
fi
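
The same check as a reusable function, as a sketch. user_exists is a hypothetical name, and the example assumes that a root account exists and that no_such_user_12345 does not.

```bash
user_exists() {                # hypothetical helper name
  id "$1" &> /dev/null         # the exit status of id becomes the function's status
}

if user_exists root; then root_status="exists"; else root_status="missing"; fi
if user_exists no_such_user_12345; then fake_status="exists"; else fake_status="missing"; fi
echo "root: $root_status, no_such_user_12345: $fake_status"
```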

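File tests

The format is [ EXPR FILE ]; common values of EXPR are

  • -f: true if the file is a regular file, i.e. its type shown by ls -l is -
  • -d: true if the file is a directory, i.e. its type shown by ls -l is d
  • -e: true if the file exists, false otherwise
  • -r: true if the file is readable by the current user
  • -w: true if the file is writable by the current user
  • -x: true if the file is executable by the current user
  • -s: true if the file is not empty (size greater than zero), false otherwise

Example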

if [ ! -e /tmp/test ]; then
  mkdir /tmp/test
fi

refer to bash条件判断之if语句 (bash conditionals: the if statement)
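
A few of the tests above applied to scratch files, as a sketch. The file names and result labels are made up.

```bash
dir=$(mktemp -d)
empty="$dir/empty.txt"; full="$dir/full.txt"
: > "$empty"                   # create an empty file
echo "hello" > "$full"

results=()
[ -e "$empty" ] && results+=("empty:exists")
[ -f "$empty" ] && results+=("empty:regular")
[ -s "$empty" ] || results+=("empty:size0")
[ -s "$full" ]  && results+=("full:nonempty")
[ -d "$dir" ]   && results+=("dir:directory")
[ -e "$dir/missing" ] || results+=("missing:absent")

printf '%s\n' "${results[@]}"
rm -rf "$dir"
```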

[ (aka test) vs [[

Refer to What is the difference between test, [ and [[ ?

Both are used to evaluate expressions, but

  • [[ works only in Korn shell, Bash, Zsh, and recent versions of Yash and busybox sh
  • [ is a POSIX utility (generally a builtin)

They also behave differently:

  • [[ performs no word splitting or glob expansion, so many arguments need not be quoted, whereas arguments to [ usually should be quoted
  • parentheses in [[ do not need to be escaped
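
The two differences can be sketched as follows in bash. The variable names are illustrative.

```bash
var="a b"

# [[ does not split $var into words, so the unquoted form is safe.
if [[ $var = "a b" ]]; then double_ok=yes; else double_ok=no; fi

# With [ the variable must be quoted; unquoted it would expand to two words
# and [ would complain about too many arguments.
if [ "$var" = "a b" ]; then single_ok=yes; else single_ok=no; fi

# Parentheses and && work inside [[ without escaping.
if [[ ( -n $var && $var == a* ) ]]; then group_ok=yes; else group_ok=no; fi

echo "$double_ok $single_ok $group_ok"
```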

also see

$[

Refer to What does a dollar sign followed by a square bracket mean in bash?

With a leading $, [ can also be used for arithmetic expansion, such as

$ echo $[ $RANDOM % 2 ]
0 # or 1
$ echo $[ 1+2 ]
3

In fact, $[ is an early syntax that was deprecated in favor of $((, although it has not been completely removed yet.
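
The same two computations in the preferred $(( )) form, as a minimal sketch:

```bash
sum=$(( 1 + 2 ))
coin=$(( RANDOM % 2 ))     # 0 or 1; variables need no $ inside $(( ))

echo "$sum $coin"
```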

= vs == vs -eq

from the above discussion:

  • == is a bash-ism
  • = is POSIX

In bash the two are equivalent, but in plain sh = is the only one guaranteed to work. And these two are for string comparisons, while -eq is for numerical ones.

refer to Shell equality operators (=, ==, -eq)
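
A small sketch of why the distinction matters: the strings 01 and 1 differ, but the integers are equal.

```bash
if [ "01" = "1" ];   then str_eq=yes; else str_eq=no; fi        # string comparison
if [ "01" -eq "1" ]; then num_eq=yes; else num_eq=no; fi        # integer comparison
if [[ "abc" == "abc" ]]; then bash_eq=yes; else bash_eq=no; fi  # == is fine in bash

echo "$str_eq $num_eq $bash_eq"
```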

compare string

grep keep the first line (use sed instead)

Refer to Include header in the ‘grep’ result

I am using

$ sinfo -o "%P %N %C %G" -N | grep gpu

to get the GPU status of the nodes on the cluster, but this drops the header. Then I tried

$ sinfo -o "%P %N %C %G" -N | { head -1; grep gpu; }

but it only shows the header, presumably because head reads ahead in the pipe and leaves nothing for grep.

Next I found an excellent solution via sed,

$ sinfo -o "%P %N %C %G" -N | sed -n "1p;/gpu/p"

which also drops grep's color highlighting of gpu.
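
The sed filter can be tried without a cluster by feeding it a made-up table in place of the sinfo output. An awk equivalent is shown next to it as an alternative.

```bash
# Made-up stand-in for the sinfo output.
table=$(printf 'PARTITION NODELIST GRES\ncpu node01 (null)\ngpu node02 gpu:4\n')

with_sed=$(printf '%s\n' "$table" | sed -n '1p;/gpu/p')      # line 1 plus matching lines
with_awk=$(printf '%s\n' "$table" | awk 'NR == 1 || /gpu/')  # same selection in awk

echo "$with_sed"
```

If the header itself matched the pattern, the sed version would print it twice, while the awk version would not.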

compare two blocks in a txt file

For example, to compare lines 82-95 with lines 108-123 of the same file:

$ diff <(sed -n "82,95p" measure.jl) <(sed -n "108,123p" measure.jl)
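
The same pattern on a small made-up file, as a sketch. diff exits with status 1 when the two blocks differ, which is worth capturing in scripts.

```bash
f=$(mktemp)
printf 'a = 1\nb = 2\na = 1\nb = 3\n' > "$f"    # made-up file with two similar blocks

# Compare lines 1-2 with lines 3-4 via process substitution.
out=$(diff <(sed -n '1,2p' "$f") <(sed -n '3,4p' "$f"))
status=$?                                       # 0: identical, 1: different

echo "$out"
rm -f "$f"
```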

default value

  • ${1:-foo}: substitutes foo if the parameter is unset or null (empty).
  • ${1-foo}: substitutes foo only if the parameter is unset.

refer to How to write a bash script that takes optional input arguments?
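
The difference between the two forms shows up only for an empty argument. This is a minimal sketch; show is a hypothetical helper.

```bash
show() {                       # hypothetical helper to compare the two forms
  echo "${1:-foo}|${1-foo}"
}

unset_case=$(show)             # no argument: both forms fall back to foo
null_case=$(show "")           # empty argument: only :- falls back
set_case=$(show bar)           # non-empty argument: neither falls back

printf '%s\n' "$unset_case" "$null_case" "$set_case"
```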

applications: