
Command-line Tools

awk

basic usage

The standard form of an awk command is

awk '/pattern/ {print $1}'

/pattern/ can be a regular expression, or one of two special patterns,

  • BEGIN: execute the action(s) before any input lines are read
  • END: execute the action(s) after all input has been read, just before awk exits
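
As a small illustration of the two special patterns (a sketch with made-up input, not taken from the original notes), the following prints a header before reading anything and a sum after the last line:

```shell
# BEGIN runs once before any input is read, END once after the last line.
# The three input numbers are made up for the demonstration.
result=$(printf '1\n2\n3\n' | awk 'BEGIN {print "start"} {s += $1} END {print "sum:", s}')
echo "$result"
```

The middle action has no pattern, so it runs for every input line and accumulates the first field in s.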

By default, fields are separated by whitespace; use -F to specify another separator,

$ echo 'a b' | awk '{print $2}'
b
$ echo 'a,b' | awk -F, '{print $2}'
b

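Here $i refers to the i-th field, and $0 refers to the whole record (including leading and trailing whitespace); see man awk for details.

awk processes the input line by line, so the action runs once for every line, i.e.,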

$ echo -e 'a b \n c d' | awk '{print $2}'
b
d

Here -e makes echo interpret the \n escape as a newline. The last field can also be written as $NF,

$ echo -e 'a b \n c d' | awk '{print $NF}'
b
d

To get the number of fields, use NF,

$ echo -e 'a b \n c d' | awk '{print NF}'
2
2

If every line is assumed to have the same number of fields and only that number is wanted, use

$ echo -e 'a b \n c d' | awk '{print NF; exit}'
2

If the data is stored in file.txt, the file can be passed directly,

awk '{print NF; exit}' file.txt

To prefix each line with its line number, use

awk '{print NR, ":", $0}' file.txt

To start counting from 0, change NR to NR-1.

processing two files

$ awk 'NR == FNR {# some actions; next} # other condition {# other actions}' file1.txt file2.txt

where

  • NR: stores the total number of input records read so far, regardless of how many files have been read.
  • FNR: stores the number of records read from the current file being processed.
  • so NR == FNR is only true when reading the first file
  • next prevents the other condition/actions when reading the first file

Info

A single number 1 can also serve as the condition, which means true; see also awk '{sub(/pattern/, "foobar")} 1'.

“1” is an always-true pattern; the action is missing, which means that it is {print} (the default action that is executed if the pattern is true).

refer to Idiomatic awk (A fantastic tutorial with many examples)

For example, print the specific lines in test.md whose row numbers are listed in lines.txt,

$ awk 'FNR==NR{wanted[$0]; next} FNR in wanted' lines.txt test.md

I first came across this idiom in selecting a large number of (specific) rows in file - Stack Overflow; that answer uses wanted[$0]++, which makes no difference here.
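
To try the idiom without touching real files, here is a self-contained sketch; the temporary directory and the file contents are made up for the demonstration:

```shell
# Hypothetical demo files in a temporary directory.
tmpdir=$(mktemp -d)
printf '2\n4\n' > "$tmpdir/lines.txt"
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmpdir/test.md"
# First file (FNR==NR): remember the wanted line numbers as array keys.
# Second file: print the lines whose per-file line number is one of the keys.
selected=$(awk 'FNR==NR {wanted[$0]; next} FNR in wanted' "$tmpdir/lines.txt" "$tmpdir/test.md")
echo "$selected"
rm -r "$tmpdir"
```

Merely referencing wanted[$0] creates the key, which is why the ++ is not required.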

sum of a column of numbers

awk '{s+=$1} END {print s}' data.txt

refer to Bash command to sum a column of numbers - Stack Overflow for other approaches.

column

display the csv beautifully,

$ head file1.txt | column -s, -t

if two files share the same columns,

$ (head file1.txt; head file2.txt) | column -s, -t

where

  • -s,: specify the delimiter as ,
  • -t: print in a table

refer to View tabular file such as CSV from command line

convert

joining images

# horizontally
convert +append *.png out.png
# vertically
convert -append *.png out.png

refer to How do I join two images in Ubuntu?

To also give the two images the same height, add -resize xH (where H is the target height in pixels), e.g.,

# does NOT work
$ convert map.png IMG_20210808_172104.jpg +append -resize x600 /tmp/p1.png
# works
$ convert +append map.png IMG_20210808_172104.jpg -resize x600 /tmp/p1.png

Note that +append has to come first here, so that -resize directly follows the images.

refer to Merge Images Side by Side(Horizontally) - Stack Overflow

However, -resize invalidates the orientation, and the image may end up rotated; see ImageMagick convert rotates images during resize. Adding the -auto-orient option avoids losing the orientation information stored in the image,

$ convert -auto-orient +append map.png IMG_20210808_172104.jpg -resize x600 /tmp/p2.png

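reducing image size

# specify only the width (1024 pixels) so that the aspect ratio is kept
convert input.png -resize 1024x out.png
convert input.png -quality 50% out.png

refer to How can I compress images?

merging jpg files into a pdf

refer to convert images to pdf: How to make PDF Pages same size

Merging directly with

pdftk A.pdf B.pdf cat output merge.pdf

gives a pdf whose pages differ in size, so I used the following command instead,

convert a.png b.png -compress jpeg -resize 1240x1753 \
                      -extent 1240x1753 -gravity center \
                      -units PixelsPerInch -density 150x150 multipage.pdf

The key option is -density 150x150; without it, the pages still do not come out with the same size.

The command above uses .png files, but .jpg files work just as well.

Also note that the character in the middle of 1240x1753 is the letter x.

pdf to jpg

-quality 100 controls the quality, and -density 600x600 controls the resolution.

Note that these options should be placed before the input file.

A better tool for converting pdf to png is pdftoppm, refer to How to convert PDF to Image?

pdftoppm alg.pdf alg -png -singlefile

The image quality is much better than with convert!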

convert imgs to pdf

ls -1 ./*jpg | xargs -L1 -I {} img2pdf {} -o {}.pdf
pdftk likelihoodfree-design-a-discussion-{1..13}-1024.jpg.pdf cat output likelihoodfree-design-a-discussion.pdf

Note that ls -1 is needed here. With ll, the first line is a total xxx summary, i.e. ll | wc -l equals ls -1 | wc -l + 1. Moreover, on my Ubuntu 18.04, ll even lists

./
../

which I did not see on the server.

adjust brightness and contrast

Info

Here is one example used in my project.

$ convert -brightness-contrast 10x5 input.jpg output.jpg

where 10x5 increases the brightness 10 percent and the contrast 5 percent.

These two values range from -100 to 100, and

  • negative value: decrease
  • zero: leave it off
  • positive value: increase

more details refer to ImageMagick: Annotated List of Command-line Options

As an alternative, the GUI software Shotwell also provides similar functions, just clicking enhance.

cd

cp

the common usage is cp SOURCE DEST, but if we want to copy multiple files into a single folder at once, we can use

cp -t DIRECTORY SOURCE

where SOURCE can be multiple files, inspired by Copying multiple specific files from one folder to another - Ask Ubuntu

cut

To take the first field of a string (here the first argument $1 of a script), using . as the delimiter,

a=$(cut -d'.' -f1 <<< "$1")_test
echo "$a"

where -d'.' defines the delimiter, and -f1 gets the first field.

If we need to get the last field, we can use rev, i.e.,

echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev

refer to How to find the last field using ‘cut’ and 10 command-line tools for data analysis in Linux
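
As an aside, the last field can also be obtained without rev. The sketch below shows two alternatives that are not from the references above: awk's $NF, and the shell's own suffix-stripping parameter expansion.

```shell
host='maps.google.com'
# awk: split on "." and print the last field
last_awk=$(echo "$host" | awk -F. '{print $NF}')
# parameter expansion: remove the longest prefix matching "*."
last_pe=${host##*.}
echo "$last_awk $last_pe"
```

The parameter expansion avoids starting any external process, which matters mostly inside loops.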

date

timestamp=$(date +"%Y-%m-%dT%H:%M:%S")
echo $timestamp
# 2020-02-11T10:51:42

we can compare two timestamps as follows

d1=$(date -d "2019-09-22 20:07:25" +'%s')
d2=$(date -d "2019-09-22 20:08:25" +'%s')
if [ "$d1" -gt "$d2" ]
then
  echo "d1 > d2"
else
  echo "d1 <= d2"
fi

where

  • -d: display time described by STRING, not ‘now’ (from man date). A relative description such as -d "-10 days -8 hours" gives a timestamp relative to the current time; adding the separate option -Iseconds prints it in ISO 8601 format with precision up to seconds, see the application in Git: change-commit-time.
  • +%[format-option]: format specifiers (for the detailed formats refer to man date; I was curious why the + is needed and found no hint in man date, but there is one in date command in Linux with examples)
  • -gt: greater than, -lt: less than; with equality, -ge and -le (from Shell 基本运算符)
  • the condition must be placed between square brackets, separated from them by spaces (from Shell 基本运算符)

refer to How to compare two time stamps?
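
The comparison can be wrapped in a small helper. This is only a sketch: the function name newer_than is made up here, and it relies on the -d option of GNU date.

```shell
# newer_than A B: succeed if timestamp A is strictly later than timestamp B.
# Both arguments are converted to seconds since the epoch via GNU date -d.
newer_than() {
    [ "$(date -d "$1" +%s)" -gt "$(date -d "$2" +%s)" ]
}

if newer_than "2019-09-22 20:08:25" "2019-09-22 20:07:25"; then
    cmp_result="later"
else
    cmp_result="not later"
fi
echo "$cmp_result"
```

Because the helper returns an exit status, it can be used directly in if, while, or && chains.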

du

  • list size of subdirectories/files: du -shc *, where -c outputs the total

echo

string started with -

$ a="-n 1"
$ echo $a
1
$ echo "$a"
-n 1

Double quotes are necessary; otherwise -n would be treated as an option of echo. But if the string is exactly -n, double quotes do not help either,

$ b="-n"
$ echo "$b"

Using printf is more robust,

$ printf "%s\n" "$b"
-n
$ printf "%s\n" $b
-n

and the double quotes are not needed in this case.

refer to Bash: echo string that starts with “-”

echo -n -e '\x66\x6f\x6f'

do not miss quotes, and -e is also necessary, refer to echo bytes to a file

different save behavior

Command substitution stores the column as a single string with embedded newlines; an unquoted echo $t1 then undergoes word splitting, so everything is saved on one line.

$ awk '{print $1}' duplicated.idx > t1.txt
$ cat t1.txt 
2
2
$ t1=$(awk '{print $1}' duplicated.idx)
$ echo $t1 > t2.txt
$ cat t2.txt 
2 2
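
Quoting the variable keeps the newlines. A minimal sketch, with printf standing in for the awk output above:

```shell
# printf stands in for the column produced by awk in the example above.
t1=$(printf '2\n2\n')
unquoted=$(echo $t1)     # word splitting joins the lines with single spaces
quoted=$(echo "$t1")     # the quotes preserve the embedded newline
echo "$unquoted"
echo "$quoted"
```

So echo "$t1" > t2.txt would reproduce the two-line layout of t1.txt.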

ffmpeg: video processing

remove audio

refer to 如何使用ffmpeg去除视频声音?

ffmpeg -i input.mp4 -map 0:0 -vcodec copy out.mp4

slow down and speed up

# 2 times faster
$ ffmpeg -i input.mkv -filter:v "setpts=0.5*PTS" output.mkv

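However, if only the video stream is sped up and the audio is left untouched, the total duration of the file stays the same. If only the video matters, remove the audio first and then change the speed.

refer to ffmpeg 视频倍速播放 和 慢速播放

rotate video

refer to How can I rotate a video?

Running

ffmpeg -i in.mov -vf "transpose=1" out.mov

directly fails with the error “The encoder ‘aac’ is experimental but experimental codecs are not enabled”

The fix is to add -strict -2, but its position matters: appending it to the end of the command above failed; it has to come before the output file,

ffmpeg -i in.mov -vf "transpose=1" -strict -2 out.mov

cut video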

ffmpeg -ss 00:00:30.0 -i input.wmv -c copy -t 00:00:10.0 output.wmv

where

  • (optional) -ss specifies the start timestamp, the format is HH:MM:SS.xxx
  • (optional) -t specifies the duration, or use -to to specify the end timestamp

refer to Using ffmpeg to cut up video

concat

$ ffmpeg -f concat -safe 0 -i <(echo file $PWD/8xonlyVID_20210808_170208.mp4; echo file $PWD/8xonlyVID_20210808_170328.mp4) -c copy 8xonlyVID_20210808_170208+328.mp4

note that $PWD is necessary, otherwise it throws

Impossible to open ‘/dev/fd/8xonlyVID_20210808_170328.mp4’ /dev/fd/63: No such file or directory

Also note that with & instead of ;, the first echo runs in the background, so the two outputs may appear in reverse order,

$ echo "1" & echo "2"
[5] 5142
2
[4]   Done                    echo "1"
1
$ echo "1"; echo "2"
1
2

refer to How to concatenate two MP4 files using FFmpeg? - Stack Overflow

find

$ find . -group group
$ find . -user user

refer to list files with specific group and user name

grep

  • -P: perl-style regex
  • -o: only print the matched part instead of the whole line
$ grep -oP "hello \K\w+" <<< "hello world"
world

where \K discards everything matched so far from the output, so it acts as a short form of the zero-width look-behind assertion (?<=pattern) before the text to output, and (?=pattern) can be used as a zero-width look-ahead assertion after the text to output. For example, extract the text between hello and weiya,

$ grep -oP "hello \K(.*)(?=, weiya)" <<< "hello world, weiya!"
world

or equivalently,

$ grep -oP "(?<=hello )(.*)(?=, weiya)" <<< "hello world, weiya!"
world

note that the space is also counted,

$ grep -oP "(?<=hello)(.*)(?=, weiya)" <<< "hello world, weiya!"
 world

refer to Can grep output only specified groupings that match? - Unix & Linux Stack Exchange

grep -rnw '/path/to/somewhere/' -e 'pattern'

For example, J asked me about a case where python failed to print to the log file in real time. I remembered that I had come across this situation before, but could not find the related notes. So I searched my files for possible keywords, such as real time and print, and finally got the results

$ grep -rnw docs/*/*.md -e '输出'
docs/julia/index.md:765:> HASH函数是这么一种函数,他接受一段数据作为输入,然后生成一串数据作为输出,从理论上说,设计良好的HASH函数,对于任何不同的输入数据,都应该以极高的概率生成不同的输出数据,因此可以作为“指纹”使用,来判断两个文件是否相同。
docs/Linux/index.md:588:发现一件很迷的事情,要加上 `-u` 才能实现实时查看输出。
docs/shell/index.md:125:1. 单引号里的任何字符都会原样输出,单引号字符串中的变量是无效的;

As a comparison, the search function provided by GitHub is less powerful, since no related results are returned for the search link https://github.com/szcf-weiya/techNotes/search?q=%E8%BE%93%E5%87%BA&type=issues

When I ran grep on syslog, it did not return all the matched lines and printed,

$ grep -i failed syslog
Jul 24 13:17:11 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Jul 24 14:02:53 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Jul 24 14:02:53 weiya-ThinkPad-T460p gvfsd-metadata[13786]: g_udev_device_has_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
Binary file syslog matches

The fix is to add the -a option, which makes grep treat the file as text; refer to https://stackoverflow.com/questions/23512852/grep-binary-file-matches-how-to-get-normal-grep-output.

htop

A much more powerful command than top, refer to Find out what processes are running in the background on Linux

ln

  • with -s, create a soft link
  • without -s, create a hard link

A “hard link” is actually a relation between two directory entries that are really the same file. The number after the permission string in the output of ll is the number of hard links, such as the 2 in -rw-rw-r-- 2.

Two names refer to the same file exactly when they have the same inode number (on the same filesystem); no other file has that number.

We can get the inode as follows,

$ stat resolve_utf.py | grep -i inode
Device: 811h/2065d  Inode: 14716809    Links: 2

refer to How to find out a file is hard link or symlink?
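
The following sketch makes the difference visible on throwaway files. The file names are made up, and it assumes GNU stat, whose -c option prints selected fields (%i inode, %h hard-link count, %F file type):

```shell
tmpdir=$(mktemp -d)
echo 'hello' > "$tmpdir/a.txt"
ln "$tmpdir/a.txt" "$tmpdir/hard.txt"       # hard link: a second name for the same inode
ln -s "$tmpdir/a.txt" "$tmpdir/soft.txt"    # soft link: a separate file pointing to a path
inode_a=$(stat -c %i "$tmpdir/a.txt")
inode_hard=$(stat -c %i "$tmpdir/hard.txt")
nlinks=$(stat -c %h "$tmpdir/a.txt")
soft_type=$(stat -c %F "$tmpdir/soft.txt")
echo "$inode_a $inode_hard $nlinks $soft_type"
rm -r "$tmpdir"
```

The two hard-linked names report the same inode and a link count of 2, while the soft link is reported as its own file type.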

ls

  • -S: sort by filesize

modify and change time

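While looking for some study materials, I was suddenly unsure whether I had already been using this laptop back then, so I wanted to find out when the system was installed. Following How can I tell what date Ubuntu was installed?, the idea is to check the last modification time of some files, e.g.,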

$ ls -lt /var/log/installer/
total 1200
-rw-rw-r-- 1 root   root 464905 Dec  2  2016 initial-status.gz
-rw-r--r-- 1 root   root     60 Dec  2  2016 media-info
-rw------- 1 syslog adm  334743 Dec  2  2016 syslog
-rw------- 1 root   root   2467 Dec  2  2016 debug
-rw------- 1 root   root 407422 Dec  2  2016 partman
-rw------- 1 root   root     17 Dec  2  2016 version
-rw------- 1 root   root    956 Dec  2  2016 casper.log

As another example,

$ ls -lt /
...
drwxrwxr-x   2 root root       4096 Dec  2  2016 cdrom
drwx------   2 root root      16384 Dec  2  2016 lost+found
drwxr-xr-x   2 root root       4096 Apr 21  2016 srv

There is an entry dated 2016.04.21. But if I add -c, the result is different,

$ ls -clt /
...
drwxrwxr-x   2 root root       4096 Dec  2  2016 cdrom
drwxr-xr-x   2 root root       4096 Dec  2  2016 srv
drwx------   2 root root      16384 Dec  2  2016 lost+found

Does ls not show the last modification time by default? Also note that srv is actually an empty directory.

So I checked further with stat,

$ stat /srv
  File: /srv
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: 825h/2085d  Inode: 1179649     Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-05-05 08:43:20.955106697 +0800
Modify: 2016-04-21 06:07:49.000000000 +0800
Change: 2016-12-02 02:46:47.363728274 +0800
 Birth: -

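It turns out that there are two kinds of modification time, Modify and Change; ls shows Modify by default and Change with -c. They differ as follows,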

  • Modify: the last time the file was modified (content has been modified)
  • Change: the last time meta data of the file was changed (e.g. permissions)
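
A quick way to see the two timestamps diverge is to change only the permissions of a scratch file. This sketch assumes GNU stat, where %Y is the Modify time and %Z the Change time, both in seconds since the epoch:

```shell
tmpfile=$(mktemp)
mtime_before=$(stat -c %Y "$tmpfile")
ctime_before=$(stat -c %Z "$tmpfile")
sleep 1
chmod 644 "$tmpfile"     # touches the metadata only, not the content
mtime_after=$(stat -c %Y "$tmpfile")
ctime_after=$(stat -c %Z "$tmpfile")
echo "Modify: $mtime_before -> $mtime_after"
echo "Change: $ctime_before -> $ctime_after"
rm "$tmpfile"
```

Only the Change time should move, which matches the /srv case above, where the metadata was updated later than the content.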

Then I checked the timestamps on the Windows partition,

$ ll -clt
...
drwxrwxrwx  1 weiya weiya       4096 Oct  1  2016 '$Recycle.Bin'/
drwxrwxrwx  1 weiya weiya          0 Sep 29  2016  FFOutput/
-rwxrwxrwx  2 weiya weiya   15151172 Jul  2  2016  WindowsDENGL.tt2*
-rwxrwxrwx  2 weiya weiya   16092228 Jul  2  2016  WindowsDENG.tt2*
-rwxrwxrwx  2 weiya weiya   16217976 Jul  2  2016  WindowsDENGB.tt2*
-rwxrwxrwx  1 weiya weiya     400228 Mar 19  2016  bootmgr*
-rwxrwxrwx  1 weiya weiya          1 Mar 19  2016  BOOTNXT*
drwxrwxrwx  1 weiya weiya       8192 Mar 18  2016  Boot/

The earliest entry dates back to 2016.03.18.

only show directory

ls -d */

refer to Listing only directories using ls in Bash?

My application: TeXtemplates: create a tex template

check whether a certain file type/extension exists in directory

if ls *.bib &>/dev/null; then
  : # at least one .bib file exists; replace ":" with the actual commands
fi

refer to Check whether a certain file type/extension exists in directory
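
An alternative that does not depend on the output of ls is to let the shell expand the glob itself. This is only a sketch; the helper name has_ext and the demo files are made up:

```shell
# has_ext DIR EXT: succeed if DIR contains at least one file named *.EXT
has_ext() {
    # set -- replaces the positional parameters with the glob matches;
    # if nothing matches, $1 remains the literal pattern, which does not exist
    set -- "$1"/*."$2"
    [ -e "$1" ]
}

tmpdir=$(mktemp -d)
touch "$tmpdir/refs.bib"
if has_ext "$tmpdir" bib; then bib_found=yes; else bib_found=no; fi
if has_ext "$tmpdir" tex; then tex_found=yes; else tex_found=no; fi
echo "bib: $bib_found, tex: $tex_found"
rm -r "$tmpdir"
```

The helper uses only POSIX shell features, so it does not require bash.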

My application: TeXtemplates: create a tex template

mkdir

mv

mv files with xargs

use -I {} to define a placeholder {} that is replaced by each input item.

ls | grep 'config[0-9].txt' | xargs -I {} mv {} configs/

see more details in mv files with | xargs

see also: xargs命令_Linux xargs 命令用法详解:给其他命令传递参数的一个过滤器

paste, cat: concatenating text files

# by column (side by side)
paste file1 file2 > outputfile
# by row (one after another)
cat file1 file2 > outputfile

convert a column to a row with delimiter ,

$ for i in {1..10}; do echo $i; done | paste -s -d','
1,2,3,4,5,6,7,8,9,10

where -s aims to paste one file at a time instead of in parallel, which results in one line. Refer to how to concatenate lines into one string

pdftk

  • split pdf pages: pdftk all.pdf cat 1 output first.pdf, see also arXiv.
  • modify pdf metadata via pdftk
pdftk input.pdf dump_data output metadata
# edit metadata
pdftk input.pdf update_info metadata output output.pdf

ps2pdf, pdf2ps

reduce pdf file size

It can be used to reduce the pdf size.

Generally, there are two major reasons why PDF file size can be unexpectedly large (refer to Understanding PDF File Size).

  • one or more fonts are stored inside PDF document.
  • using images for creating PDF file.

I just got a large non-scanned pdf with size 136M, and it probably is due to many embedded fonts which can be checked in the properties.

Then I tried the command ps2pdf mentioned in Reduce PDF File Size in Linux, and the file size is significantly reduced, to only 5.5M!

$ ps2pdf -dPDFSETTINGS=/ebook Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf
$ pdfinfo Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf 
Creator:        
Producer:       Acrobat Distiller 8.0.0(Windows)
CreationDate:   Tue Jul 26 20:43:43 2011 CST
ModDate:        Fri Aug 19 19:57:50 2011 CST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           AcroForm
JavaScript:     no
Pages:          504
Encrypted:      no
Page size:      439.37 x 666.142 pts
Page rot:       0
File size:      142394146 bytes
Optimized:      no
PDF version:    1.3
$ pdfinfo Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf 
Creator:        
Producer:       GPL Ghostscript 9.26
CreationDate:   Tue Apr 13 18:04:57 2021 CST
ModDate:        Tue Apr 13 18:04:57 2021 CST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          504
Encrypted:      no
Page size:      439.37 x 666.14 pts
Page rot:       0
File size:      5766050 bytes
Optimized:      no
PDF version:    1.4

We can compare the fonts before/after reducing,

$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf | wc -l
57
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | wc -l
125

So ps2pdf does not seem to remove fonts directly; instead, most font names have been modified. Besides, there are duplicated font names (column one) in the original file, such as

$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c
      1 AMJQSV+LMSans8-Regular
        ...
     13 OELTPO+LMMathItalic10-Regular
        ...
      1 Times
      2 TimesNewRoman
      1 TimesNewRoman,Italic
      3 Times-Roman
        ...
     15 YCQSHP+LMRoman10-Bold
      4 YWGCMO+LMMathSymbols7-Regular
     16 ZMWYHT+LMRoman10-Regular

Count the number of unique names,

$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c | wc -l
50
$ pdffonts Puntanen2011_Book_MatrixTricksForLinearStatistic_reduced.pdf | sed 1,2d - | awk '{print $1}' | sort | uniq -c | wc -l
55

it shows that the reduced pdf does not have duplicated font names, (here the first two lines are removed, 57 = 55 + 2).

pdffonts

Another application of pdffonts is to check if the font has been embedded. If not, it might cause some display issue, such as the non-embedded Symbol in The PDF viewer ‘Evince’ on Linux can not display some math symbols correctly. In contrast, Adobe Reader already ships with application-embedded instances of some fonts, such as Symbol, so it can render the pdf properly. A remedy is to use gs, see more details in the above reference.

flatten pdf file

Flattening a PDF means to merge separated contents of the document into one so that,

  • Interactive elements in PDF forms such as checkboxes, text boxes, radio buttons, drop-down lists are no longer fillable
  • Annotations become “native text”
  • Multiple layers of text, images, page numbers, and header styles turn into one single layer.

An easy way is

pdf2ps orig.pdf - | ps2pdf - flattened.pdf

some alternatives can be found in is-there-a-way-to-flatten-a-pdf-image-from-the-command-line.

rename

The rename command differs between Ubuntu 18.04 and CentOS 7,

# Ubuntu 18.04
$ rename -V
/usr/bin/rename using File::Rename version 0.20
# CentOS 7
$ rename -V
rename from util-linux 2.23.2

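The usage also differs. The former takes a sed-like substitution expression,

rename -n 's/Sam3/Stm32/' *.nc  # dry run: show which files would be renamed
rename -v 's/Sam3/Stm32/' *.nc  # rename, and list the renamed files

while the latter takes the old and new strings as arguments and replaces only the first occurrence, i.e.,

rename Sam3 Stm32 *.nc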

sed

refer to

  1. Linux sed 命令用法详解:功能强大的流式文本编辑器
  2. sed & awk常用正则表达式 - 菲一打 - 博客园

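the role of |

The vertical bar (|) metacharacter belongs to the extended set of metacharacters and specifies alternation between regular expressions: a line matches the pattern if it matches any one of the alternatives.

-r: extended regular expressions

refer to Extended regexps - sed, a stream editor

Quoted from there: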

The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.

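In other words, in basic mode these special characters must be escaped to act as operators, whereas in extended mode it is the opposite: escaping them yields the literal character.

For example,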

  1. abc? becomes abc\? when using extended regular expressions. It matches the literal string ‘abc?’.
  2. c\+ becomes c+ when using extended regular expressions. It matches one or more ‘c’s.
  3. a\{3,\} becomes a{3,} when using extended regular expressions. It matches three or more ‘a’s.
  4. \(abc\)\{2,3\} becomes (abc){2,3} when using extended regular expressions. It matches either abcabc or abcabcabc.
  5. \(abc*\)\1 becomes (abc*)\1 when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions.
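
A few of these rules can be checked directly. The sketch below uses made-up input strings and -E, which GNU sed accepts as a synonym of -r:

```shell
# basic syntax: grouping and interval operators need backslashes
basic=$(echo 'abcabc' | sed 's/\(abc\)\{2\}/X/')
# extended syntax: the same operators are written without backslashes
extended=$(echo 'abcabc' | sed -E 's/(abc){2}/X/')
# extended syntax: an escaped ? matches a literal question mark
literal=$(echo 'abc?' | sed -E 's/abc\?/X/')
echo "$basic $extended $literal"
```

All three substitutions match the whole input, so each result is the single character X.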

sendmail

send mail on the command line. On the stapc-WSL, install it via

$ sudo apt install sendmail

monitor the updates of /var/log/apache2/error.log. If the modified time is recent, then send email to alert.

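while true; do
    sleep 1m
    # both times in seconds since the epoch, so they can be compared as numbers
    last_date=$(date -r /var/log/apache2/error.log +%s)
    prev_date=$(date -d "-1 mins" +%s)
    # alert only if the log was modified within the last minute
    if (( last_date > prev_date )); then
        # TO, FROM and SUBJ are expected to be set beforehand;
        # the here-document is left unindented so that EOT is recognized
        sendmail -v "${TO}" <<EOT
TO: ${TO}
From: ${FROM}
Subject: ${SUBJ}

There are some updates in /var/log/apache2/error.log

EOT
    fi
done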

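Here date -r file returns the last modification time of the file, +%s converts a time to seconds since the epoch so that times are easy to compare, and -d "-1 mins" shifts the time forwards or backwards.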

sort

tail

  • -f: output appended data as the file grows, powerful for checking the log file in real time.

tesseract

OCR text extraction: Tesseract OCR

$ tesseract tmp.png stdout -l eng+chi_sim quiet

where

  • quiet suppresses the warning messages
  • stdout directly outputs the results instead of writing into another text file

more details refer to man tesseract.

tmux

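It can split the local terminal into panes.

refer to linux 工具——终端分屏与vim分屏

Info

I now use Terminator instead, also known as X-terminal-emulator

It can also keep sessions running in the background, which is particularly convenient on servers.

Info

I previously used a similar tool, screen

screen -list # or screen -r
screen -r [pid] # attach
# Ctrl+A, then type ":quit"

see linux screen 命令详解 and Kill detached screen session - Stack Overflow for more usage

common operations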

# new a shell
tmux
# new a shell with name
tmux new -s NAME
# view all shell
tmux ls
# go back
tmux attach-session -t [NUM]
# simplify
tmux attach -t [NUM]
# more simplify
tmux a -t [NUM]
# via name
tmux a -t NAME
# complete reset: https://stackoverflow.com/questions/38295615/complete-tmux-reset
tmux kill-server
# rename: https://superuser.com/questions/428016/how-do-i-rename-a-session-in-tmux
Ctrl + B, $

refer to

  • How do I access tmux session after I leave it?
  • Getting started with Tmux
  • tmux cheatsheet

type

which vs type

On the CentOS 7 server,

$ which -v
GNU which v2.20, Copyright (C) 1999 - 2008 Carlo Wood.
GNU which comes with ABSOLUTELY NO WARRANTY;
This program is free software; your freedom to use, change
and distribute this program is protected by the GPL.

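which can report commands defined as aliases; more specifically, man which shows that the options --read-alias and --skip-alias control whether aliases are included.

On my local Ubuntu 18.04 machine, however, which supports neither -v nor --version for showing its version, and its man page is very brief; the only hint about the version is the date 29 Jun 2016.

So how can aliases be shown there? type solves this problem. Note that its documentation is available via help rather than man,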

$ type scp_to_chpc 
scp_to_chpc is a function
scp_to_chpc () 
{ 
    scp -r $1 user@host:~/$2
}

uniq

  • count the frequency: cat file.txt | sort | uniq -c.
    • note that sort is necessary, since uniq only collapses adjacent duplicate lines
    • examples: classificacaoFinal
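
The effect of omitting sort can be seen on a three-line input (made up for the demonstration):

```shell
# "a" appears twice, but the two occurrences are not adjacent
unsorted=$(printf 'a\nb\na\n' | uniq | wc -l)
sorted=$(printf 'a\nb\na\n' | sort | uniq | wc -l)
echo "without sort: $unsorted lines, with sort: $sorted lines"
```

Without sort the two a lines are kept as separate entries; after sorting they become adjacent and are merged.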

unzip

unzip all .zip file in a directory

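I first tried unzip *.zip, but it does not work: the shell expands * before unzip sees it, so unzip treats the second and later file names as members to extract from the first archive. man unzip does allow wildcards, but the * has to reach unzip itself, which is what escaping achieves,

unzip \*.zip

as found in Unzip All Files In A Directory.

Alternatively, use quotes, "*.zip". Going further, to unzip only the zip files whose names contain the character 3,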

unzip "*3*.zip"

difference between unzip and right-click Extract Here

For A.zip with the internal structure dir/file, unzip A.zip yields dir/file directly, while right-click extraction yields A/dir/file.

wget

wget a series of files in order

Download files with consecutive numbers, e.g.,

wget http://work.caltech.edu/slides/slides{01..18}.pdf

refer to Wget a series of files in order

wget vs curl

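wget saves the downloaded file without needing -O, whereas curl does not save the download to a local file by default unless -o is given; for wget, -O only changes the file name.

For example, here the downloaded content is piped directly into the next command,

curl -sL https://dl.winehq.org/wine-builds/winehq.key | apt-key add -

see What is the difference between curl and wget? for more comparisons

unar

If a zip file extracts with garbled file names, try unar,

use unar your.zip

refer to Linux文件乱码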
