Hi, I want to use sc.binaryFiles.
Scala version 2.11.12, Spark version 2.3.
Here is my code:
```scala
val input_bin = spark.sparkContext.binaryFiles("file.csv.gz.enc")
val input_utf = input_bin.map{f => f._2}.collect()(0).open().readUTF()
val input_base64 = Base64.decodeBase64(input_utf)

val GCM_NONCE_LENGTH = 12
val GCM_TAG_LENGTH = 16

val key = keyToSpec("<key>", 32, "AES", "UTF-8", Some("SHA-256"))
val cipher = Cipher.getInstance("AES/GCM/NoPadding")
val nonce = input_base64.slice(0, GCM_NONCE_LENGTH)
val spec = new GCMParameterSpec(128, nonce)
cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.doFinal(input_base64)

def keyToSpec(key: String, keyLengthByte: Int, encryptAlgorithm: String,
              keyEncode: String = "UTF-8", digestMethod: Option[String] = Some("SHA-1")): SecretKeySpec = {
  // prepare key for encrypt/decrypt
  var keyBytes: Array[Byte] = key.getBytes(keyEncode)
  if (digestMethod.isDefined) {
    val sha: MessageDigest = MessageDigest.getInstance(digestMethod.get)
    keyBytes = sha.digest(keyBytes)
    keyBytes = util.Arrays.copyOf(keyBytes, keyLengthByte)
  }
  new SecretKeySpec(keyBytes, encryptAlgorithm)
}
```
I get this error:
```
javax.crypto.AEADBadTagException: Tag mismatch!
  at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:571)
  at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1046)
  at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:983)
  at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:845)
  at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
  at javax.crypto.Cipher.doFinal(Cipher.java:2165)
  ... 50 elided
```
The input file is quite large (about 2 GB).
I'm not sure whether I'm reading the file incorrectly or whether something is wrong with the decryption.
I also have a Python version, and that code works fine for me:
```python
real_key = key
hash_key = SHA256.new()
hash_key.update(real_key)
for filename in os.listdir(kwargs['enc_path']):
    if (re.search("\d{12}.csv.gz.enc$", filename)) and (dec_date in filename):
        with open(os.path.join(kwargs['enc_path'], filename)) as f:
            content = f.read()
            f.close()
        ct = base64.b64decode(content)
        nonce, tag = ct[:12], ct[-16:]
        cipher = AES.new(hash_key.digest(), AES.MODE_GCM, nonce)
        dec = cipher.decrypt_and_verify(ct[12:-16], tag)
        decrypted_data = gzip.decompress(dec).decode('utf-8')
```
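A side note on the key: a SHA-256 digest is already 32 bytes, so the `Arrays.copyOf(keyBytes, 32)` in `keyToSpec` is a no-op and the Scala key is byte-for-byte the same as Python's `hash_key.digest()`; the key derivation is not where the tag mismatch comes from. A quick Java check (the `"<key>"` passphrase is a placeholder, as in the question):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class KeyDerivationCheck {
    public static void main(String[] args) throws Exception {
        // Same derivation as keyToSpec(..., Some("SHA-256")) and Python's SHA256.new()
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest("<key>".getBytes(StandardCharsets.UTF_8));
        System.out.println(digest.length);                    // 32: already the requested key length
        byte[] truncated = Arrays.copyOf(digest, 32);         // the copyOf in keyToSpec changes nothing
        System.out.println(Arrays.equals(digest, truncated)); // true
    }
}
```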
Any suggestions?
Thank you.
Update #1
I solved the problem by switching from Spark's file reading to plain local file reading (scala.io), adding the AAD used for decryption, and applying the answers from @topaco and @blackbishop:
```scala
cipher.init(Cipher.DECRYPT_MODE, key, spec)
cipher.updateAAD(Array[Byte]()) // the AAD can be any value, but it must be exactly the same value used when the file was encrypted
cipher.doFinal(input_base64.drop(12)) // drop the nonce before decrypting
```
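For what it's worth, in GCM a zero-length AAD is equivalent to supplying no AAD at all, so the `updateAAD(Array[Byte]())` call above is harmless but should not be what fixed the decryption; dropping the 12-byte nonce is. A small Java check (all-zero key and nonce, for the demo only):

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EmptyAadDemo {
    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[32], "AES");         // demo-only key
        GCMParameterSpec spec = new GCMParameterSpec(128, new byte[12]);    // demo-only nonce

        // Encrypt WITHOUT supplying any AAD
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ct = enc.doFinal("payload".getBytes(StandardCharsets.UTF_8));

        // Decrypt WITH an explicit zero-length AAD: the tag still verifies
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, spec);
        dec.updateAAD(new byte[0]);
        System.out.println(new String(dec.doFinal(ct), StandardCharsets.UTF_8)); // payload
    }
}
```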
I'm still wondering why the Spark version didn't work.
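A plausible explanation (an educated guess, since the original file isn't available to test): `PortableDataStream.open()` returns a `DataInputStream`, and `readUTF` does not read the whole stream. It reads a single string in Java's "modified UTF-8" framing: a 2-byte big-endian length written by `writeUTF` (capped at 65535 bytes), followed by that many bytes. On an arbitrary 2 GB file it misreads the first two bytes as a length, so `input_utf` was never the full base64 content and the ciphertext handed to `doFinal` was already wrong. A small demonstration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class ReadUtfDemo {
    public static void main(String[] args) throws Exception {
        // writeUTF prepends a 2-byte big-endian length before the string bytes
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF("abc");
        byte[] framed = bos.toByteArray();
        System.out.println(framed.length); // 5 = 2-byte length + "abc"

        // readUTF only round-trips data that was framed by writeUTF
        System.out.println(new DataInputStream(
                new ByteArrayInputStream(framed)).readUTF()); // abc

        // On raw (unframed) file content, the first two bytes are taken as the length
        byte[] raw = "QUJDREVGRw==".getBytes(StandardCharsets.US_ASCII);
        int bogusLen = new DataInputStream(
                new ByteArrayInputStream(raw)).readUnsignedShort();
        System.out.println(bogusLen); // 20821, i.e. ('Q' << 8) | 'U', not the file size
    }
}
```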
1 Answer
You are using a different decryption method than in Python. In your Scala code, when calling `doFinal` you are passing the entire ciphertext, nonce included. Try the following changes:
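The code block for this answer did not survive archiving. Based on the update above, the suggested change amounts to passing only the bytes after the 12-byte nonce to `doFinal`: unlike PyCryptodome, which takes ciphertext and tag separately, JCE expects ciphertext and tag together and verifies the trailing tag itself. A self-contained Java sketch (it encrypts a hypothetical payload first, so it runs without the real file; `"<key>"` is a placeholder):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class GcmNonceDemo {
    static final int NONCE_LEN = 12;   // GCM_NONCE_LENGTH in the question
    static final int TAG_BITS = 128;   // 16-byte tag

    // SHA-256 of the passphrase, matching keyToSpec(...) and Python's hash_key.digest()
    public static SecretKeySpec deriveKey(String passphrase) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(passphrase.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(digest, "AES");
    }

    // Decrypt a base64 payload laid out as nonce || ciphertext || tag
    public static byte[] decrypt(String b64, SecretKeySpec key) throws Exception {
        byte[] payload = Base64.getDecoder().decode(b64);
        byte[] nonce = Arrays.copyOfRange(payload, 0, NONCE_LEN);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        // The fix: skip the nonce, keep the trailing tag for JCE to verify
        return cipher.doFinal(payload, NONCE_LEN, payload.length - NONCE_LEN);
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = deriveKey("<key>");
        // Build a payload in the same layout so the demo needs no input file
        byte[] nonce = new byte[NONCE_LEN]; // fixed nonce: acceptable in a demo only
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        byte[] ctAndTag = enc.doFinal("hello".getBytes(StandardCharsets.UTF_8));
        byte[] payload = new byte[NONCE_LEN + ctAndTag.length];
        System.arraycopy(ctAndTag, 0, payload, NONCE_LEN, ctAndTag.length);
        String b64 = Base64.getEncoder().encodeToString(payload);

        System.out.println(new String(decrypt(b64, key), StandardCharsets.UTF_8)); // hello
    }
}
```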