Command-line Tools¶
aria2¶
Idle
It is a lightweight multi-protocol & multi-source command-line download utility.
Homepage: https://aria2.github.io/
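A minimal usage sketch (the URL below is just a placeholder): -x sets the maximum number of connections per server,
$ aria2c -x 8 https://example.com/big-file.iso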
awk¶
basic usage¶
The basic usage of awk is
awk '/pattern/ {print $1}'
where /pattern/ can be a regular expression or one of two special patterns:
- BEGIN: execute the action(s) before any input lines are read
- END: execute the action(s) after all input lines have been read, before awk actually exits
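For instance (my own illustration), an END block can report how many lines were read:
$ seq 3 | awk 'BEGIN {print "counting..."} {n++} END {print n, "lines"}'
counting...
3 lines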
The default field separator is whitespace; another separator can be specified via -F,
$ echo 'a b' | awk '{print $2}'
b
$ echo 'a,b' | awk -F, '{print $2}'
b
where $i refers to the i-th field and $0 refers to the whole record (including leading and trailing whitespace); see man awk for details.
In addition, the script above relies on the fact that awk applies the action to every input line, i.e.,
$ echo -e 'a b \n c d' | awk '{print $2}'
b
d
where -e makes echo interpret the escaped newline \n. The last field can also be referred to as $NF,
$ echo -e 'a b \n c d' | awk '{print $NF}'
b
d
To get the number of fields, use NF,
$ echo -e 'a b \n c d' | awk '{print NF}'
2
2
If every line has the same number of fields and we only want that number once, we can use
$ echo -e 'a b \n c d' | awk '{print NF; exit}'
2
If the data file is file.txt, we can directly run
awk '{print NF; exit}' file.txt
To print the line number of each line, use
awk '{print NR, ":", $0}' file.txt
To start counting from 0, change NR to NR-1.
processing two files¶
$ awk 'NR == FNR {# some actions; next} # other condition {# other actions}' file1.txt file2.txt
where
- NR: stores the total number of input records read so far, regardless of how many files have been read.
- FNR: stores the number of records read from the current file being processed.
- so NR == FNR is only true when reading the first file
- next prevents the other condition/actions when reading the first file
Info
A single number 1 can also serve as the condition, which means True; see also awk '{sub(/pattern/, "foobar")} 1'.
"1" is an always-true pattern; the action is missing, which means that it is {print} (the default action that is executed if the pattern is true).
Refer to Idiomatic awk (a fantastic tutorial with many examples).
For example, print the specific lines in test.md whose row numbers are listed in lines.txt,
$ awk 'FNR==NR{wanted[$0]; next} FNR in wanted' lines.txt test.md
I first came across this in selecting a large number of (specific) rows in file - Stack Overflow, but it used wanted[$0]++, which makes no difference here.
FPAT: split fields that are enclosed in double quotes
Some double-quoted fields may contain commas, so splitting on every comma fails; FPAT defines what a field looks like instead,
$ head -n1 pheno_eur.csv | awk 'BEGIN{ FPAT="([^,]+)|(\"[^\"]+\")" } {print $1 $2 $3 $1356 $1987 $1986}'
"eid""sex""age_recruit""Body mass index (BMI)""Systolic blood pressure, automated reading""Diastolic blood pressure, automated reading"
skip the first row
awk 'NR > 1{print $8}'
split strings
echo "1:2:3" | awk '{split($0, a, ":"); print a[1]}'
sum of a column of numbers¶
awk '{s+=$1} END {print s}' data.txt
refer to Bash command to sum a column of numbers - Stack Overflow for other approaches.
For example, sum up the memory usage,
$ ps -e -o pid,cmd,%mem --sort=-%mem | awk 'NR > 1{s+=$NF} END {print s}'
select lines with conditions¶
- select lines whose 2nd column is not empty: usage of !~
$ echo -e 'a \n c d' | awk '$2 !~ /^$/{print $2}'
d
^M character needs to use \r
The ^M character is also invisible; we can check for it via cat -v. To match such a character, we need \r.
- select lines whose 2nd column is neither empty nor -: usage of |
$ echo -e 'a -\n c d' | awk '$2 !~ /-|^$/{print $2}'
d
- select lines whose 2nd and 3rd columns are both non-empty: usage of &&
$ echo -e 'a - 1 \n c d 3 \n 5 6 ' | awk '$2 !~ /-|^$/ && $3 !~ /^$/ {print $2}'
d
cat¶
- add text to the beginning of a file
echo 'task goes here' | cat - todo.txt > temp && mv temp todo.txt
where - represents the standard input. Alternatively, we can use sed.
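A minimal sed alternative (my own sketch, assuming GNU sed for in-place editing and its one-line i command): insert the text before the first line,
sed -i '1i task goes here' todo.txt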
column¶
display a CSV file nicely aligned,
$ head file1.txt | column -s, -t
if two files share the same columns,
$ (head file1.txt; head file2.txt) | column -s, -t
where
- -s,: specify the delimiter as ,
- -t: print in a table
refer to View tabular file such as CSV from command line
convert¶
Crop images¶
With an external monitor connected, a full-screen screenshot contains one redundant screen; such screenshots can be cropped in batch,
# keep the right screen
$ ls -1 | xargs -I {} convert {} -crop 1920x1200+1920+0 crop_{}
# keep the left screen
$ ls -1 | xargs -I {} convert {} -crop 1920x1200+0+0 crop_{}
where the format of the -crop argument is {width}x{height}+{left}+{top}.
Tip
Ctrl+Alt+PrtSc captures only the screen where the mouse currently is.
Stitch images¶
# horizontally
convert +append *.png out.png
# vertically
convert -append *.png out.png
-resize x<height>: same height
If we also want the two images to have the same height, add the -resize x<height> option, e.g.,
# does NOT work
$ convert map.png IMG_20210808_172104.jpg +append -resize x600 /tmp/p1.png
# works
$ convert +append map.png IMG_20210808_172104.jpg -resize x600 /tmp/p1.png
Note that +append has to come first here, i.e., -resize must immediately follow the images.
Refer to Merge Images Side by Side (Horizontally) - Stack Overflow
Note, however, that -resize invalidates the orientation metadata, so the image may end up rotated; refer to ImageMagick convert rotates images during resize. Adding the -auto-orient option avoids losing the orientation information of the image,
$ convert -auto-orient +append map.png IMG_20210808_172104.jpg -resize x600 /tmp/p2.png
run in Julia
fignames = "/tmp/1-cv_optim_" .* string.(σs) .* ".png"
run(`convert $fignames +append /tmp/cv_optim.png`)
Reduce image size¶
# only specify the width as 1024 pixels to keep the aspect ratio
convert input.png -resize 1024x out.png
convert input.png -quality 50% out.png
Merge JPGs into a PDF¶
Refer to convert images to pdf: How to make PDF Pages same size
Directly using
pdftk A.pdf B.pdf cat output merge.pdf
yields a PDF whose page sizes are inconsistent, so use the following command instead
convert a.png b.png -compress jpeg -resize 1240x1753 \
-extent 1240x1753 -gravity center \
-units PixelsPerInch -density 150x150 multipage.pdf
The key point is -density 150x150; if this option is dropped, the pages still end up with different sizes.
Also, the command above is written for .png, but .jpg works just as well.
Note that the separator in 1240x1753 is the letter x.
Convert PDF to JPG¶
- -quality 100 controls the quality
- -density 600x600 controls the resolution
- note that these options must be placed before the input file
A better command for converting PDF to PNG is pdftoppm; refer to How to convert PDF to Image?
pdftoppm alg.pdf alg -png -singlefile
The image quality is much better than with convert!
convert imgs to pdf¶
ls -1 ./*jpg | xargs -L1 -I {} img2pdf {} -o {}.pdf
pdftk likelihoodfree-design-a-discussion-{1..13}-1024.jpg.pdf cat output likelihoodfree-design-a-discussion.pdf
Note that ls -1 is needed here: with ll the first line is the total xxx summary, i.e., ll | wc -l equals ls -1 | wc -l + 1. Moreover, on my Ubuntu 18.04, ll even lists ./ and ../, which I did not see on the server.
adjust brightness and contrast¶
Info
Here is one example used in my project.
$ convert -brightness-contrast 10x5 input.jpg output.jpg
where 10x5 increases the brightness by 10 percent and the contrast by 5 percent.
These two values range from -100 to 100, and
- negative value: decrease
- zero: leave it off
- positive value: increase
For more details, refer to ImageMagick: Annotated List of Command-line Options
As an alternative, the GUI program Shotwell provides similar functionality; just click enhance.
cd¶
cd "$(dirname "$0")"
: cd current directory
cp¶
The common usage is cp SOURCE DEST, but if we want to copy multiple files into a single folder at once, we can use
cp -t DIRECTORY SOURCE
where SOURCE can be multiple files; inspired by Copying multiple specific files from one folder to another - Ask Ubuntu
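For example (hypothetical file names), copy three files into backup/ in one go,
cp -t backup/ a.txt b.txt c.txt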
curl¶
- -O: save locally with the same remote name
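A minimal sketch (placeholder URL); the file is saved as file.tar.gz in the current directory,
$ curl -O https://example.com/file.tar.gz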
cut¶
get the first field
To select the first field (before the first .) of a string, e.g., a file name like file.txt passed as the script argument $1,
a=$(cut -d'.' -f1 <<< $1)_test
echo $a
where -d'.' defines the delimiter and -f1 takes the first field.
get the last field
If we need to get the last field, we can use rev, i.e.,
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
refer to How to find the last field using ‘cut’ and 10 command-line tools for data analysis in Linux
get multiple fields
-f 1-10
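For example (my own illustration), take fields 1 through 3 of a comma-separated line,
$ echo 'a,b,c,d,e' | cut -d',' -f 1-3
a,b,c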
date¶
timestamp=$(date +"%Y-%m-%dT%H:%M:%S")
echo $timestamp
# 2020-02-11T10:51:42
we can compare two timestamps as follows
d1=$(date -d "2019-09-22 20:07:25" +'%s')
d2=$(date -d "2019-09-22 20:08:25" +'%s')
if [ $d1 -gt $d2 ]
then
echo "d1 > d2"
else
echo "d1 < d2"
fi
where
- -d: display time described by STRING, not 'now' (from man date). Alternatively, we can use a format like -d "-10 days -8 hours" -Iseconds to get a timestamp relative to the current date, where -Iseconds specifies the unit as seconds; see the application in Git: change-commit-time.
- +%[format-option]: format specifiers (for detailed formats refer to man date, but I am curious why +; no hints from man date, but here is one explanation in date command in Linux with examples)
- -gt: greater than, -lt: less than; with equality, -ge and -le (from Shell 基本运算符)
- the conditional expression must be placed inside square brackets, with spaces around it (from Shell 基本运算符)
refer to How to compare two time stamps?
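For instance (a sketch; the actual output depends on the current time and timezone),
$ date -d "-10 days -8 hours" -Iseconds
# e.g. 2023-05-01T03:21:07+08:00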
diff¶
$ diff folder1 folder2
du¶
- list sizes of subdirectories/files: du -shc *, where -c outputs a grand total. If sorting by size with | sort -n, the option -h is problematic since sort cannot recognize the units K, M. Fortunately, sort also supports an -h option, so just use | sort -h.
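Putting it together (assuming GNU coreutils, whose sort supports -h),
$ du -shc * | sort -h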
emacs¶
Uninstalled
Common commands¶
- switch buffer: C-o
- split horizontally (new buffer): C-2
- split vertically (new buffer): C-3
- close the current buffer: C-0
- kill a buffer: C-k
- keep only the current buffer: C-1
Using Fcitx for Chinese input in Emacs¶
Reference blog: fcitx-emacs
- Step 1: check which character sets the system currently supports
locale -a
If zh_CN.utf8 is listed, the Chinese character set is already available.
- Step 2: set the environment variable
emacs ~/.bashrc
export LC_CTYPE=zh_CN.utf8
source ~/.bashrc
echo¶
string starting with -¶
$ a="-n 1"
$ echo $a
1
$ echo "$a"
-n 1
Double quotes are necessary; otherwise the string would be treated as options for echo. But if the string is exactly an option such as -n, even the double quotes fail,
$ b="-n"
$ echo "$b"
Using printf is more appropriate,
$ printf "%s\n" "$b"
-n
$ printf "%s\n" $b
-n
and no need to add double quotes.
refer to Bash: echo string that starts with “-”
print bytes¶
echo -n -e '\x66\x6f\x6f'
Do not omit the quotes; -e is also necessary. Refer to echo bytes to a file.
different save behavior¶
A column of elements captured into a shell variable and then saved via (unquoted) echo ends up on a single line.
$ awk '{print $1}' duplicated.idx > t1.txt
$ cat t1.txt
2
2
$ t1=$(awk '{print $1}' duplicated.idx)
$ echo $t1 > t2.txt
$ cat t2.txt
2 2
ffmpeg¶
Extract audio¶
After downloading a music video from Bilibili, extract the audio,
ffmpeg -i input.mp4 output.mp3
How to download Bilibili videos
The simplest way is to use the you-get tool, so one line does the job,
you-get URL
Alternatively, download it manually:
- Press F12 to open the developer tools and switch to the mobile version of the page
- Go to Network > Media, then press F5 to refresh
- Click play on the page and wait for it to buffer
- Right-click the buffered file and choose Copy > Copy link address to get the video link
Remove audio¶
ffmpeg -i .\input.mp4 -map 0:0 -vcodec copy out.mp4
Slow down and speed up playback¶
# 2 times faster
$ ffmpeg -i input.mkv -filter:v "setpts=0.5*PTS" output.mkv
However, if only the video stream is sped up while the audio is left untouched, the total duration of the file stays the same. If only the video matters, remove the audio first and then change the speed.
Alternatively, one can first convert to a raw bitstream file (not tried); see How to speed up / slow down a video – FFmpeg for details.
For GIF files, the speed can also be changed with the -delay option of convert. The difference is that the former drops frames while the latter does not,
$ convert -delay 10 input.gif output.gif
refer to https://infoheap.com/imagemagick-convert-edit-animated-gif-speed-fps/
Rotate video¶
Directly running
ffmpeg -i in.mov -vf "transpose=1" out.mov
throws the error "The encoder 'aac' is experimental but experimental codecs are not enabled".
Note the placement when adding -strict -2
At first I simply appended it at the very end of the above command, which failed; it should be written as
ffmpeg -i in.mov -vf "transpose=1" -strict -2 out.mov
Cut video¶
ffmpeg -ss 00:00:30.0 -i input.wmv -c copy -t 00:00:10.0 output.wmv
where
- (optional) -ss specifies the start timestamp, in the format HH:MM:SS.xxx
- (optional) -t specifies the duration, or use -to to specify the end timestamp
refer to Using ffmpeg to cut up video
concat¶
$ ffmpeg -f concat -safe 0 -i <(echo file $PWD/8xonlyVID_20210808_170208.mp4; echo file $PWD/8xonlyVID_20210808_170328.mp4) -c copy 8xonlyVID_20210808_170208+328.mp4
note that $PWD is necessary, otherwise it throws
Impossible to open ‘/dev/fd/8xonlyVID_20210808_170328.mp4’ /dev/fd/63: No such file or directory
Also note that & seems to print the outputs in reverse order,
$ echo "1" & echo "2"
[5] 5142
2
[4] Done echo "1"
1
$ echo "1"; echo "2"
1
2
refer to How to concatenate two MP4 files using FFmpeg? - Stack Overflow
find¶
$ find . -group group
$ find . -user user
refer to list files with specific group and user name
$ find . -name '*.md'
alternatively,
$ ls -R | grep '\.md$'
where . needs to be escaped and $ is necessary; otherwise it would match strings like rmd.
- list all symbolic links
$ find . -type l -ls
usage of -exec
find /path [args] -exec [cmd] {} \;
where
- {} is a placeholder, similar to the one in xargs.
- \; indicates that for each found result, the command cmd is executed once with that result.
For example, convert the file encoding in szcf-weiya/Matlab30IAs
find . -name '*.m' -ls -exec iconv -f GB18030 {} -t UTF8 -o {} \;
grep¶
- -P: Perl-style regex
- -o: only print the matched part instead of the whole line
- -v: invert the match
$ grep -oP "hello \K\w+" <<< "hello world"
world
where \K is the short form of (?<=pattern) as a zero-width look-behind assertion before the text to output, and (?=pattern) can be used as a zero-width look-ahead assertion after the text to output. For example, extract the text between hello and weiya.
$ grep -oP "hello \K(.*)(?=, weiya)" <<< "hello world, weiya!"
world
or equivalently,
$ grep -oP "(?<=hello )(.*)(?=, weiya)" <<< "hello world, weiya!"world
world
note that the space is also counted,
$ grep -oP "(?<=hello)(.*)(?=, weiya)" <<< "hello world, weiya!"
 world
refer to Can grep output only specified groupings that match? - Unix & Linux Stack Exchange
Info
- find all files given keywords, refer to How do I find all files containing specific text on Linux? - Stack Overflow
grep -rnw '/path/to/somewhere/' -e 'pattern'
For example, J asked me about a situation where Python failed to print to the log file in real time. I remembered that I had come across this situation before, but could not find the relevant notes. So I tried to search the files for possible keywords, such as real time and print, and finally I got the results with
$ grep -rnw docs/*/*.md -e '输出'
docs/julia/index.md:765:> HASH函数是这么一种函数,他接受一段数据作为输入,然后生成一串数据作为输出,从理论上说,设计良好的HASH函数,对于任何不同的输入数据,都应该以极高的概率生成不同的输出数据,因此可以作为“指纹”使用,来判断两个文件是否相同。
docs/Linux/index.md:588:发现一件很迷的事情,要加上 `-u` 才能实现实时查看输出。
docs/shell/index.md:125:1. 单引号里的任何字符都会原样输出,单引号字符串中的变量是无效的;
As a comparison, the search function provided by GitHub is not as powerful, since no related results are returned from the search link https://github.com/szcf-weiya/techNotes/search?q=%E8%BE%93%E5%87%BA&type=issues
When I ran it on syslog, it did not return all matched results, and printed,
$ grep -i failed syslog
Jul 24 13:17:11 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Jul 24 14:02:53 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Jul 24 14:02:53 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Binary file syslog matches
Refer to https://stackoverflow.com/questions/23512852/grep-binary-file-matches-how-to-get-normal-grep-output: add the -a option.
htop¶
A much more powerful command than top; refer to Find out what processes are running in the background on Linux
- setup: modify the information layout when there are many CPUs. First select the panel in the rightmost column, and then press Left, Right, Up, Down to move it. The resulting configuration file is written to ~/.config/htop/htoprc.
journalctl¶
- list log of previous boots
$ journalctl --list-boots
- display last boot log
$ journalctl -b-1
ln¶
- with -s, create a soft (symbolic) link
- without -s, create a hard link
A "hard link" is actually two directory entries pointing to the same file; they really are the same file. The count next to the permissions in ll output also shows the number of hard links, such as 2 in -rw-rw-r-- 2.
The way to tell whether one file is the same as another is that they have the same inode number; no other file will have that.
We can get the inode as follows,
$ stat resolve_utf.py | grep -i inode
Device: 811h/2065d Inode: 14716809 Links: 2
refer to How to find out a file is hard link or symlink?
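A small illustration (hypothetical file names): the hard link shares the inode with the original file, while the symbolic link gets its own,
$ touch orig.txt
$ ln orig.txt hard.txt
$ ln -s orig.txt soft.txt
$ ls -li orig.txt hard.txt soft.txt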
ls¶
- -S: sort by file size
# only show directory
ls -d */
refer to Listing only directories using ls in Bash?
My application: TeXtemplates: create a tex template
check whether a certain file type/extension exists in directory¶
if ls *.bib &>/dev/null; then
#
fi
refer to Check whether a certain file type/extension exists in directory
My application: TeXtemplates: create a tex template
mkdir¶
notify-send¶
- use the critical level: by default, hovering over a message in the notification list does not show the full text; the full message is only displayed when it pops up. So an alternative is to extend the time it stays on screen. However, the manual of notify-send says that Ubuntu's Notify OSD and GNOME Shell both ignore the expire-time parameter -t. Fortunately, we can set -u critical to raise the urgency level, and it turns out that the pop-up window then does not disappear until you click it.
- show the whole message: leave the summary empty and only pass the body, but the full text still only shows while the mouse hovers over the pop-up window.
- escape - in the string, otherwise it throws Unknown option.
- specify an icon with -i your_icon_path; note that the path should be a full path instead of a relative one.
A combined example is sketched below.
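A minimal sketch combining the options above (the icon path and messages are hypothetical),
$ notify-send -u critical -i "$HOME/Pictures/alert.png" "Job finished" "The long-running job has completed; check the log for details."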
location of banner notification
Gnome Extension > Just Perfection > Customize > Notification Banner Position
paste¶
Concatenate text files by column,
### by column
paste file1 file2 > outputfile
### by row
cat file1 file2 > outputfile
See also cat
convert a column to a row with delimiter ,
$ for i in {1..10}; do echo $i; done | paste -s -d','
1,2,3,4,5,6,7,8,9,10
where -s pastes one file at a time instead of in parallel, which results in one line. Refer to how to concatenate lines into one string
Example
Paste files from list of paths into single output file
paste `cat filelist.txt` > output.txt
and
touch buffer.txt
cat filelist.txt | xargs -iXX bash -c 'paste buffer.txt XX > output.txt; mv output.txt buffer.txt';
mv buffer.txt output.txt
pdftk¶
- split pdf pages: pdftk all.pdf cat 1 output first.pdf, see also arXiv. Alternatively, one can use the print to file function provided by a pdf viewer such as evince, particularly when pdftk fails as in Issue 45.
- modify pdf metadata via pdftk
pdftk input.pdf dump_data output metadata
# edit metadata
pdftk input.pdf update_info metadata output output.pdf
ps¶
$ ps -aef
weiya 793892 7821 0 19:33 pts/10 00:00:00 /bin/bash
root 794217 2 0 19:35 ? 00:00:00 [kworker/3:0]
root 794336 2 0 19:35 ? 00:00:00 [kworker/u8:3]
ps axo user:20,pid,pcpu,pmem,vsz,rss,tty,stat,start,time,comm
alias psaux='ps axo user:20,pid,pcpu,pmem,vsz,rss,tty,stat,start,time,comm'
ps2pdf, pdf2ps¶
reduce pdf file size¶
It can be used to reduce the pdf size.
Generally, there are two major reasons why PDF file size can be unexpectedly large (refer to Understanding PDF File Size).
- one or more fonts are stored inside the PDF document.
- images are used to create the PDF file.
I just got a large non-scanned pdf of 136M, probably due to the many embedded fonts, which can be checked in its properties.
Then I tried the ps2pdf command mentioned in Reduce PDF File Size in Linux, and the file size was significantly reduced, to only 5.5M!
$ ps2pdf -dPDFSETTINGS=/ebook Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf
$ pdfinfo Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf
Creator:
Producer: Acrobat Distiller 8.0.0(Windows)
CreationDate: Tue Jul 26 20:43:43 2011 CST
ModDate: Fri Aug 19 19:57:50 2011 CST
Tagged: no
UserProperties: no
Suspects: no
Form: AcroForm
JavaScript: no
Pages: 504
Encrypted: no
Page size: 439.37 x 666.142 pts
Page rot: 0
File size: 142394146 bytes
Optimized: no
PDF version: 1.3
$ pdfinfo Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf
Creator:
Producer: GPL Ghostscript 9.26
CreationDate: Tue Apr 13 18:04:57 2021 CST
ModDate: Tue Apr 13 18:04:57 2021 CST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 504
Encrypted: no
Page size: 439.37 x 666.14 pts
Page rot: 0
File size: 5766050 bytes
Optimized: no
PDF version: 1.4
We can compare the fonts before/after reducing,
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf | wc -l
57
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | wc -l
125
It seems the fonts are not directly removed; instead, most font names have been modified. Besides, there are duplicated font names (column one) in the original file, such as
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c
1 AMJQSV+LMSans8-Regular
...
13 OELTPO+LMMathItalic10-Regular
...
1 Times
2 TimesNewRoman
1 TimesNewRoman,Italic
3 Times-Roman
...
15 YCQSHP+LMRoman10-Bold
4 YWGCMO+LMMathSymbols7-Regular
16 ZMWYHT+LMRoman10-Regular
Count the number of unique names,
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c | wc -l
50
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c | wc -l
55
which shows that the reduced pdf has no duplicated font names (here the first two header lines are removed, 57 = 55 + 2).
pdffonts
Another application of pdffonts is to check whether a font has been embedded. If not, it might cause display issues, such as the non-embedded Symbol in The PDF viewer 'Evince' on Linux can not display some math symbols correctly. In contrast, Adobe Reader ships with application-embedded instances of some fonts, such as Symbol, so it can render the pdf properly.
A remedy is to use gs; see more details in the above reference.
flatten pdf file¶
Flattening a PDF means to merge separated contents of the document into one so that,
- Interactive elements in PDF forms such as checkboxes, text boxes, radio buttons, and drop-down lists are no longer fillable
- Annotations become “native text”
- Multiple layers of text, images, page numbers, and header styles turn into one single layer.
An easy way is
pdf2ps orig.pdf - | ps2pdf - flattened.pdf
some alternatives can be found in is-there-a-way-to-flatten-a-pdf-image-from-the-command-line.
rename¶
The rename on Ubuntu 18.04 and the one on CentOS 7 are different,
# Ubuntu 18.04
$ rename -V
/usr/bin/rename using File::Rename version 0.20
# CentOS 7
$ rename -V
rename from util-linux 2.23.2
Their usage also differs: the former uses a sed-like substitution expression,
rename -n 's/Sam3/Stm32/' *.nc # preview which files would be renamed
rename -v 's/Sam3/Stm32/' *.nc # perform the renaming and list the renamed files
while the latter takes the strings as plain arguments and only replaces the first occurrence, i.e.,
rename Sam3 Stm32 *.nc
sar¶
A tool for checking I/O wait; refer to https://unix.stackexchange.com/questions/55212/how-can-i-monitor-disk-io
sar
# read more history
sar -f /var/log/sa/sa04
sed¶
- Print a specific line, e.g., line 10: sed '10!d' file.txt; refer to Get specific line from text file using just shell script
- Print a range of lines: sed -n '10,20p' file.txt; printing only line 10 can thus be done with sed -n '10p' file.txt. With a semicolon ; the selection is not a continuous range but only the specified lines; refer to sed之打印特定行与连续行
  - from the first line to the last line: sed -n '1,$p'
  - the first line and the last line: sed -n '1p;$p', not sed -n '1;$p'
- Delete the last line: sed -i '$ d' file.txt
- Comment out multiple lines in vi: press v to select the lines, then type :s/^/#/g to add the comment; uncomment with :s/^#//g. See also VI.
- print lines between two matching patterns: /^pattern1/,/^pattern2/p, and to print only once, use /^pattern1/,${p;/^pattern2/q}
- insertion
  - insert before the line matching an expression: sed '/expr/i something-to-insert'
  - insert after the line: replace i with a
  - insert multiple lines: add \n in the text to insert, or add \ at the end of each line
  - insert at the beginning of the first line without a new line: sed -i '1s/^/<added text> /' file
  - r: read a file and append it at the current point, sed '/EOF/r $thingToAdd' $fileToAddItTo
- The vertical bar | metacharacter is part of the extended metacharacter set and specifies a union of regular expressions: a line matches the pattern if it matches any one of the alternatives.
- directly replace a hex string, such as 's/\xee\x81\xab/合/g'
- replace a multi-line string
- swap two texts, using \x0 as temporary storage,
~$ echo "abbc" | sed 's/ab/\x0/g; s/bc/ab/g; s/\x0/bc/g'
bcab
-r: extended regular expressions¶
Refer to Extended regexps - sed, a stream editor, excerpted below:
The only difference between basic and extended regular expressions is in the behavior of a few characters: '?', '+', parentheses, and braces ('{}'). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.
That is, in basic mode these characters must be escaped to act as special (regex) characters, whereas in extended mode it is the opposite: escaping them makes them match the literal characters.
For example:
- abc? becomes abc\? when using extended regular expressions. It matches the literal string 'abc?'.
- c\+ becomes c+ when using extended regular expressions. It matches one or more 'c's.
- a\{3,\} becomes a{3,} when using extended regular expressions. It matches three or more 'a's.
- \(abc\)\{2,3\} becomes (abc){2,3} when using extended regular expressions. It matches either abcabc or abcabcabc.
- \(abc*\)\1 becomes (abc*)\1 when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions.
single or double quotes¶
When using double quotes, the string is first interpreted by the shell before being passed to sed. As a result,
- more backslashes are needed (see also my answer)
$ echo "\alpha" | sed 's/\\alpha/\\beta/'
\beta
$ echo "\alpha" | sed "s/\\\alpha/\\\beta/"
\beta
- command substitutions (and dollar expressions) are evaluated first
$ echo '`date`' | sed 's/`date`/`uptime`/'
`uptime`
$ echo '`date`' | sed "s/`date`/`uptime`/"
`date`
refer to single quote and double quotes in sed - Ask Ubuntu
sendmail¶
Send mail from the command line. On the stapc-WSL machine, install it via
$ sudo apt install sendmail
Monitor updates to /var/log/apache2/error.log: if the modification time is recent, send an alert email.
# check every minute whether error.log has been modified within the last minute
while true; do
    sleep 1m
    last_date=$(date -r /var/log/apache2/error.log +%s)
    curr_date=$(date -d "-1 mins" +%s)
    if [[ $last_date -gt $curr_date ]]; then
        (
        cat <<-EOT
TO: ${TO}
From: ${FROM}
Subject: ${SUBJ}

There is some update on /var/log/apache2/error.log
EOT
        ) | sendmail -v ${TO}
    fi
done
where date -r file returns the last modification time of the file, +%s converts the time to seconds since the epoch so that two times can be compared easily, and -d "-1 mins" shifts the time backward (or forward) relative to now.
sort¶
- sort according to the third column: sort -k 3,3 file.txt
- sort with header: cat your_data | (sed -u 1q; sort)
tail¶
- -f: output appended data as the file grows, powerful for checking the log file in real time.
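For example (the log path is a common choice on Ubuntu and may differ on other systems),
$ tail -f /var/log/syslog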
tesseract¶
OCR text extraction: Tesseract OCR
$ tesseract tmp.png stdout -l eng+chi_sim quiet
where
- quiet suppresses the warning messages
- stdout outputs the results directly instead of writing them into another text file
For more details, refer to man tesseract.
tmux¶
It can split the local terminal into multiple panes.
Info
I have now switched to Terminator, also known as X-terminal-emulator.
tmux can also detach sessions and keep them running in the background, which is particularly convenient when working on servers.
Info
I previously used a similar tool, screen,
screen -list # or screen -r
screen -r [pid] # attach to a session
### Ctrl+A, then type ":quit"
For more usage, see linux screen 命令详解 and Kill detached screen session - Stack Overflow
Common operations
# new a shell
tmux
# new a shell with name
tmux new -s NAME
# view all shell
tmux ls
# go back
tmux attach-session -t [NUM]
# simplify
tmux attach -t [NUM]
# more simplify
tmux a -t [NUM]
# via name
tmux a -t NAME
# complete reset: https://stackoverflow.com/questions/38295615/complete-tmux-reset
tmux kill-server
# rename: https://superuser.com/questions/428016/how-do-i-rename-a-session-in-tmux
Ctrl + B, $
# kill the current session
Ctrl + B, x
refer to
- How do I access tmux session after I leave it?
- Getting started with Tmux
- tmux cheatsheet
- see also: Tmux copy paste
type¶
which vs type¶
On a CentOS 7 server,
$ which -v
GNU which v2.20, Copyright (C) 1999 - 2008 Carlo Wood.
GNU which comes with ABSOLUTELY NO WARRANTY;
This program is free software; your freedom to use, change
and distribute this program is protected by the GPL.
which can resolve commands defined as aliases; more specifically, man which shows that the options --read-alias and --skip-alias control whether aliases are included.
On my local Ubuntu 18.04 machine, however, which supports neither -v nor --version to show the version, and its man which page is rather brief; the rough version can be inferred from its date, 29 Jun 2016.
So how can aliases be displayed? type solves this problem; note that its documentation is viewed via help rather than man.
$ type scp_to_chpc
scp_to_chpc is a function
scp_to_chpc ()
{
scp -r $1 user@host:~/$2
}
uchardet¶
$ uchardet FILENAME
detect the file encoding
unar¶
If a zip archive extracts with garbled file names, try unar,
unar your.zip
Refer to Linux文件乱码
Although unar detects the encoding automatically, Chinese archives sometimes still come out garbled, e.g., when extracting https://uploads.cosx.org/2011/03/SongPoem.tar.gz. In that case, specify the encoding via -e,
unar -e GB18030 ~/PDownloads/SongPoem.tar.gz
/home/weiya/PDownloads/SongPoem.tar.gz: Tar in Gzip
SongPoem.csv (4171055 B)... OK.
宋词.R (583 B)... OK.
Successfully extracted to "SongPoem".
However, this only ensures that the file names in the archive are decoded with the specified encoding; the file contents are still garbled, so the encoding still has to be dealt with. To fix it once and for all, convert the files to UTF8 directly,
$ iconv -f GB18030 SongPoem.csv -t UTF8 -o SongPoem.csv.utf8
$ iconv -f GB18030 宋词.R -t UTF8 -o script.R
uniq¶
- count the frequency: cat file.txt | sort | uniq -c
  - note that sort is necessary, otherwise uniq only collapses adjacent duplicates
  - examples: classificacaoFinal
unzip¶
unzip all .zip files in a directory¶
I tried unzip *.zip but it does not work. It seems I missed something, although I had checked man unzip, in which * is indeed allowed. Then I found
unzip \*.zip
in Unzip All Files In A Directory
Otherwise, use quotes: "*.zip". More advanced: only unzip the zip files whose names contain the character 3,
unzip "*3*.zip"
Difference between unzip and right-click Extract Here¶
For A.zip with the internal structure dir/file, unzip A.zip directly yields dir/file, while right-click extraction yields A/dir/file.
vi¶
- u: undo, Ctrl+R: redo
Copy¶
- Copy a single line: in normal mode, move the cursor to the line to copy and press yy;
- Copy multiple lines: in normal mode, nyy, then paste with p
  - :6,9 co 12: copy lines 6 through 9 to after line 12.
  - Using marks: move the cursor to the start line (end line, paste line) and type ma (mb, mc), then :'a,'b co 'c.
Tip
Changing co to m turns the copy into a cut (move).
Delete¶
- d$: delete from the cursor to the end of the line
- :.,$d: delete from the current line to the last line
Refer to How to Delete Lines in Vim / Vi
Remove BOM¶
BOM (byte-order mark) is the name of the Unicode character at code point U+FEFF.
In UTF-8, although the Unicode standard allows a byte-order mark, it is not actually required. A UTF-8-encoded BOM only marks a file as UTF-8; it says nothing about byte order. Many Windows programs (including Notepad) add a BOM to UTF-8 files and otherwise fail to detect the encoding, showing garbled text. On Unix-like systems, however, which use text files heavily for file formats and inter-process communication, this practice is discouraged: it interferes with the correct handling of important constructs such as the shebang at the beginning of interpreter scripts, and it trips up programming languages that do not recognize it, e.g., gcc reports unrecognized characters at the beginning of a source file.
To remove the BOM, open the file with vim and run
:set nobomb
:wq
Ctrl+S freeze¶
Vim is not actually dead; it has merely stopped writing output to the terminal. To leave this state, just press Ctrl+Q and it returns to normal.
Run the current script¶
:!%
where % expands to the current file name. In addition,
:! %:p
uses the absolute path, and if the path contains spaces, use
:! "%:p"
write with sudo¶
For example, as said in How does the vim “write with sudo” trick work?
:w !sudo tee %
and that reference gives a more detailed explanation of the trick.
Open another file¶
Prepend or replace at the beginning of each line¶
Press v or V to select the lines, then enter : mode and type a normal sed-style command, e.g.,
s/^/#/g
Select all: VggG or ggVG, where gg jumps to the first line and G jumps to the last line.
Refer to what is the command for "Select All" in vim and VsVim?
wget¶
wget a series of files in order¶
Download a sequence of consecutively numbered files, e.g.,
wget http://work.caltech.edu/slides/slides{01..18}.pdf
Refer to Wget a series of files in order
wget vs curl¶
wget saves the downloaded file without needing -O, whereas curl does not save the download to a local file by default unless the -o option is given; wget's -O only changes the output file name.
For example, here the downloaded content is piped directly into the next command,
curl -sL https://dl.winehq.org/wine-builds/winehq.key | apt-key add -
For a more detailed comparison, see What is the difference between curl and wget?
xargs¶
mv files¶
use -I {} to replace some string.
ls | grep 'config[0-9].txt' | xargs -I {} mv {} configs/
see more details in mv files with | xargs
rm files¶
It is safer to check the files before appending rm to the pipeline.
ls | grep ".txt" | xargs -I {} rm -rf {}
An application asked by @van1yu3: "In the current directory there are 10 subdirectories dir1-dir10, and the files inside dir1-dir10 all share the same names; I want to keep 5 files with specific names and delete the rest."
In this case find is more suitable, since a full path is required,
~/tmp4$ for i in {1..4}; do sh -c "mkdir $i; touch $i/foo.txt $i/bar.txt"; done
~/tmp4$ ls
1 2 3 4
~/tmp4$ tree
.
├── 1
│ ├── bar.txt
│ └── foo.txt
├── 2
│ ├── bar.txt
│ └── foo.txt
├── 3
│ ├── bar.txt
│ └── foo.txt
└── 4
├── bar.txt
└── foo.txt
4 directories, 8 files
~/tmp4$ find . -type f | grep -v "foo"
./2/bar.txt
./4/bar.txt
./1/bar.txt
./3/bar.txt
~/tmp4$ find . -type f | grep -v "foo" | xargs -I {} rm -f {}
~/tmp4$ tree
.
├── 1
│ └── foo.txt
├── 2
│ └── foo.txt
├── 3
│ └── foo.txt
└── 4
└── foo.txt
4 directories, 4 files
BTW, I also tried asking the question to the popular ChatGPT.