Fixing garbled Chinese filenames when extracting zip files on Linux/macOS

October 30, 2017

I. Cause

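Many zip files downloaded from websites were packed on Windows, and that is where the garbling starts. On the surface, the problem is that Windows (Chinese edition) uses cp936, essentially GBK, while Linux and macOS use UTF-8. The root cause, however, is that the zip format was originally designed without any field recording which encoding a filename uses. An extractor therefore has to decode names according to the system encoding, so as soon as an archive crosses platforms, Chinese filenames can come out garbled. (A later revision of the format added a flag bit that marks names as UTF-8, but archives made on Chinese Windows commonly store GBK names without setting it.)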
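To make the mechanism concrete, here is a minimal Python 3 sketch. The filename 测试.txt is only an example, and the code works on plain strings rather than a real archive: the same GBK bytes read as UTF-8 turn into replacement characters, Python's zipfile module shows them as cp437 gibberish when the UTF-8 flag is clear, and re-encoding to cp437 then decoding as GBK recovers the original name.

```python
# Example member name, stored as GBK the way a Chinese-locale Windows tool would store it.
original = "测试.txt"
raw = original.encode("gbk")  # the bytes that actually end up in the zip

# A UTF-8 system reading those bytes: invalid sequences become U+FFFD.
seen_on_linux = raw.decode("utf-8", errors="replace")

# Python's zipfile decodes names without the UTF-8 flag as cp437 by default.
seen_by_zipfile = raw.decode("cp437")

# cp437 maps all 256 byte values, so the round trip back to bytes is lossless;
# decoding those bytes as GBK restores the intended name.
recovered = seen_by_zipfile.encode("cp437").decode("gbk")
assert recovered == original
```

The cp437 round trip is the same trick the Python script in method ⑤ relies on.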

II. Solutions

The methods below are collected from a thread on Zhihu.

①. Patched unzip

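First, check whether your unzip is a build that includes transcoding support. On CentOS, a plain yum install unzip apparently gives you this version; on some systems the package is called unzip-iconv. Try it on your own system: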

unzip -O cp936 test.zip

If the option is supported, the archive is extracted right away; if not, unzip just prints its list of options.
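If you have no Windows-made archive at hand to test `unzip -O cp936` (or the other tools below), a test file can be built by hand. The sketch below is a hypothetical helper, `build_gbk_zip`, written against the standard zip record layout; it assumes, as described above, that Chinese-locale Windows tools store GBK names with the UTF-8 flag (bit 11) clear, and it marks the archive as created on MS-DOS/Windows. Members are stored uncompressed to keep it short.

```python
import struct
import zlib


def build_gbk_zip(members):
    """Return the bytes of a stored (uncompressed) zip archive whose member
    names are GBK-encoded, with the UTF-8 flag (bit 11) left clear.

    members: list of (name: str, data: bytes) pairs.
    """
    local_parts, central_parts = [], []
    offset = 0
    for name, data in members:
        raw_name = name.encode("gbk")
        crc = zlib.crc32(data) & 0xFFFFFFFF
        # Local file header: version 2.0, flags 0, method 0 (stored),
        # time 00:00, date 1980-01-01 (0x21), no extra field.
        local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 0, 0, 0x21,
                            crc, len(data), len(data), len(raw_name), 0)
        local_parts.append(local + raw_name + data)
        # Central directory header; "made by" 20 means host MS-DOS, spec 2.0.
        central = struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 0,
                              0, 0x21, crc, len(data), len(data),
                              len(raw_name), 0, 0, 0, 0, 0, offset)
        central_parts.append(central + raw_name)
        offset += len(local) + len(raw_name) + len(data)
    central_dir = b"".join(central_parts)
    # End of central directory record: single disk, no archive comment.
    end = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(members),
                      len(members), len(central_dir), offset, 0)
    return b"".join(local_parts) + central_dir + end
```

Writing the bytes returned by `build_gbk_zip([("测试/中文.txt", b"hello\n")])` to a file named test.zip should give an archive whose names come out garbled with a plain unzip and readable with a patched `unzip -O cp936`.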

②. unar (not unrar, read carefully)

It can be installed with brew on macOS and with yum on CentOS 7, but CentOS 6 apparently has no ready-made package.

unar test.zip
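unar needs no encoding argument because it tries to detect the filename encoding on its own. The sketch below is a deliberately naive illustration of the idea, not how unar actually does it: a hypothetical `guess_member_name` helper that trusts the UTF-8 flag, otherwise tries strict UTF-8, then GBK, and finally falls back to cp437, which never fails.

```python
def guess_member_name(raw: bytes, utf8_flag: bool) -> str:
    """Naively guess how to decode a zip member name (illustration only).

    Order tried: the UTF-8 flag, strict UTF-8, GBK, then cp437,
    which maps all 256 byte values and so always succeeds.
    """
    if utf8_flag:
        return raw.decode("utf-8")
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # not valid in this encoding; try the next one
    return raw.decode("cp437")
```

A fixed fallback chain like this can still guess wrong, since a short GBK name may happen to be valid UTF-8; that ambiguity is why real tools lean on statistical detection or let you name the encoding explicitly.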

③. bsdtar (provided by the libarchive package on Arch)

On CentOS the package may be called bsdtar3; that was the case on CentOS 6 when I tested, but I haven't tried 7. Run yum search bsdtar to check.

bsdtar xvf archive.zip

④. p7zip + convmv

LANG=C 7za x test.zip
convmv -f GBK -t utf8 --notest -r .
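As I understand this combination, LANG=C keeps 7za from converting the stored name bytes, so the files land on disk with GBK names, and convmv then renames them to UTF-8. The renaming step can be sketched in Python as below; `convert_names` and `decodes_as` are hypothetical helpers, not part of convmv, and the sketch assumes a Linux filesystem that stores names as raw bytes. Like convmv, it skips names that are already valid UTF-8.

```python
import os


def decodes_as(name: bytes, encoding: str) -> bool:
    """Return True if name is a valid byte sequence in encoding."""
    try:
        name.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def convert_names(root: bytes, src: str = "gbk", dst: str = "utf-8") -> None:
    """Recursively rename entries under root from src-encoded to dst-encoded names.

    Walks bottom-up so a directory's children are renamed before the
    directory itself; the root directory is not renamed.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            # Skip names already valid in the target encoding (including
            # plain ASCII) and names that are not valid in the source encoding.
            if decodes_as(name, dst) or not decodes_as(name, src):
                continue
            target = os.path.join(dirpath, name.decode(src).encode(dst))
            if os.path.lexists(target):
                continue  # never overwrite an existing entry
            os.rename(os.path.join(dirpath, name), target)
```

Passing a bytes path, for example `convert_names(b".")`, makes os.walk return raw byte names, which is what allows the GBK bytes to be inspected at all.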

⑤. Python

Someone in the original thread pointed out that this approach carries risks; judge for yourself.

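#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import sys
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: name is stored as UTF-8


def fix_name(info):
    # Python 3 decodes names without the UTF-8 flag as cp437;
    # undo that and decode the original bytes as GBK instead.
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    return info.filename.encode("cp437").decode("gbk")


with zipfile.ZipFile(sys.argv[1], "r") as zf:
    for info in zf.infolist():
        name = fix_name(info)
        # print("Extracting " + name)
        # Skip absolute paths and ".." components so nothing lands outside "."
        if os.path.isabs(name) or ".." in name.split("/"):
            continue
        if name.endswith("/"):  # directory entry
            os.makedirs(name, exist_ok=True)
            continue
        pathname = os.path.dirname(name)
        if pathname:
            os.makedirs(pathname, exist_ok=True)
        if not os.path.exists(name):
            with open(name, "wb") as fo:  # binary mode: read() returns bytes
                fo.write(zf.read(info))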

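Save it as unzip.py, run chmod +x unzip.py, then run ./unzip.py test.zip. Files are extracted into the current directory, and files that already exist are not overwritten.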

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.