This is the codebase accompanying the publication Towards Near-imperceptible Steganographic Text. It implements the linguistic steganographic system design outlined in the paper and the proposed patient-Huffman algorithm, and includes the code used for the paper's empirical study.
The steganographic systems we studied assume a cryptographic system that produces ciphertext to be encoded into stegotext. In this work, we encode the ciphertext into fluent stegotext by controlling the sampling from a language model. We focus on providing imperceptibility (steganographic secrecy) whereas the cryptographic security is provided by the cryptosystem.
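The idea of encoding bits by controlling the sampling can be illustrated with a small, self-contained sketch. It is not the code in core.py: `toy_lm` is a hypothetical stand-in for a real language model, the function names are made up for illustration, and the use of total variation distance as the closeness measure is an assumption of this sketch. At each step a Huffman code is built over the next-token distribution. If the distribution implied by the code (probability 2^-length per token) is within the threshold of the model's distribution, the next ciphertext bits select the token; otherwise the step is skipped and a token is sampled normally, carrying no bits.

```python
import heapq
import itertools
import random


def huffman_code(probs):
    """Return {token: bitstring} for a Huffman code over `probs`."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {tok: ""}) for tok, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {t: "0" + b for t, b in c0.items()}
        merged.update({t: "1" + b for t, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]


def total_variation(probs, code):
    """Distance between the LM distribution and the code-implied one."""
    return 0.5 * sum(abs(p - 2.0 ** -len(code[t])) for t, p in probs.items())


def toy_lm(context):
    """Hypothetical language model (assumption): always offers >= 2 tokens."""
    if len(context) % 2 == 0:
        return {"the": 0.5, "a": 0.25, "one": 0.125, "some": 0.125}
    return {"cat": 0.6, "dog": 0.3, "fox": 0.1}


def encode(bits, lm, threshold, rng):
    """Turn a bitstring into tokens, embedding bits only at 'close' steps."""
    tokens, i = [], 0
    while i < len(bits):
        probs = lm(tuple(tokens))
        code = huffman_code(probs)
        if total_variation(probs, code) < threshold:
            # Pad with zeros so a full codeword can always be matched.
            padded = bits[i:] + "0" * max(len(c) for c in code.values())
            tok = next(t for t, c in code.items() if padded.startswith(c))
            i += len(code[tok])
        else:
            # Too far from the LM distribution: sample normally, embed nothing.
            tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(tok)
    return tokens


def decode(tokens, lm, threshold, n_bits):
    """Recover the bitstring by replaying the encoder's decisions."""
    bits = ""
    for k, tok in enumerate(tokens):
        probs = lm(tuple(tokens[:k]))
        code = huffman_code(probs)
        if total_variation(probs, code) < threshold:
            bits += code[tok]
    return bits[:n_bits]  # drop any zero padding from the last codeword


message = "1011001110"  # stands in for ciphertext bits
stego = encode(message, toy_lm, threshold=0.08, rng=random.Random(0))
recovered = decode(stego, toy_lm, threshold=0.08, n_bits=len(message))
```

In this sketch the receiver must share the same model, the same threshold, and the message length in order to decode. The threshold of 0.08 is chosen only to echo the imperceptibility parameter used for the samples in this repo.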
- example.ipynb contains a full example, including the encryption and decryption steps.
- core.py contains an illustrative minimal working example of the stegosystem's encoding and decoding.
- Stegotext is generated with GPT-2 (included as a git submodule) and the publicly released GPT-2-117M language model.
- The patient-Huffman encoding algorithm and its corresponding decoding method are implemented.
- The samples/ directory contains 20 samples generated using patient-Huffman (imperceptibility parameter of 0.08, random bitstrings of length 32) and 20 samples from the base language model. Comparing the controlled samples with the uncontrolled (standard sampling) samples gives a subjective sense of the imperceptibility the algorithm offers.

Independent replications are more than welcome! Please bring them to our attention and we will list them here. For the original code that we used at the time of the ACL submission, see the git commit tagged acl-2019.
This is intended as a research prototype. Please exercise caution when using it as a privacy protection tool.
Please cite our work if you find this repo or the associated paper useful.
Dai, Falcon Z and Cai, Zheng. Towards Near-imperceptible Steganographic Text. Proceedings of ACL. 2019.
@inproceedings{dai-cai-2019-towards,
title = "Towards Near-imperceptible Steganographic Text",
author = "Dai, Falcon Z and Cai, Zheng",
booktitle = "Proceedings of Association for Computational Linguistics",
month = jul,
year = "2019",
publisher = "Association for Computational Linguistics"
}