detect-secrets is an aptly named module for (surprise, surprise) detecting secrets within a code base.
However, unlike other similar packages that solely focus on finding secrets, this package is designed with the enterprise client in mind: providing a backwards compatible, systematic means of:
Preventing new secrets from entering the code base,
Detecting if such preventions are explicitly bypassed, and
Providing a checklist of secrets to roll and migrate to more secure storage.
This way, you create a separation of concerns: accepting that there may currently be secrets hiding in your large repository (this is what we refer to as a baseline), but preventing this issue from getting any larger, without dealing with the potentially gargantuan effort of moving existing secrets away.
It does this by scanning periodic diff outputs against heuristically crafted regular expressions to identify whether any new secret has been committed. This avoids the overhead of digging through all git history, as well as the need to scan the entire repository every time.
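To illustrate the diff-based idea, here is a simplified sketch in Python. It checks only the lines a unified diff adds against a regex, which is what keeps a diff scan cheap. This is not detect-secrets' implementation. The helper `find_added_secrets` and its single AWS-style pattern are hypothetical; the real plugins use many more rules.

```python
import re

# Illustrative pattern only: AWS access key IDs commonly look like "AKIA" + 16 [0-9A-Z].
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def find_added_secrets(unified_diff: str) -> list[tuple[str, int, str]]:
    """Return (filename, new_line_number, match) for matches on added lines only."""
    findings = []
    filename = None
    line_no = 0
    for raw in unified_diff.splitlines():
        if raw.startswith("+++ "):
            # "+++ b/path" names the file as it exists after the change.
            filename = raw[4:].removeprefix("b/")
        elif raw.startswith("@@"):
            # Hunk header "@@ -a,b +c,d @@": new-file numbering starts at c.
            m = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", raw)
            line_no = int(m.group(1)) if m else 0
        elif raw.startswith("+"):
            for match in AWS_KEY_RE.findall(raw[1:]):
                findings.append((filename, line_no, match))
            line_no += 1
        elif raw.startswith("-"):
            continue  # removed lines (and the "---" header) are not in the new file
        else:
            line_no += 1  # unchanged context line


    return findings


diff = """\
--- a/config.py
+++ b/config.py
@@ -1,2 +1,3 @@
 import os
-KEY = os.environ["KEY"]
+KEY = "AKIAABCDEFGHIJKLMNOP"
+REGION = "us-east-1"
"""
print(find_added_secrets(diff))
```

The removed line is never examined. Existing secrets therefore stay out of scope, and only newly introduced ones are flagged.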
For a look at recent changes, please see CHANGELOG.md.
If you are looking to contribute, please see CONTRIBUTING.md.
For more detailed documentation, check out our other documentation.
Create a baseline of potential secrets currently found in your git repository.
$ detect-secrets scan > .secrets.baseline
or, to run it from a different directory:
$ detect-secrets -C /path/to/directory scan > /path/to/directory/.secrets.baseline
Scanning non-git tracked files:
$ detect-secrets scan test_data/ --all-files > .secrets.baseline
This will rescan your codebase, and:
Update/upgrade your baseline to be compatible with the latest version,
Add any new secrets it finds to your baseline, and
Remove any secrets no longer in your codebase.
This will also preserve any labelled secrets you have.
$ detect-secrets scan --baseline .secrets.baseline
For baselines older than version 0.9, recreate the baseline from scratch instead.
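The three update rules above (add new secrets, drop vanished ones, keep labels) can be sketched as a merge of two baselines. This assumes a simplified baseline whose `results` map each filename to entries carrying `type`, `hashed_secret`, `line_number` and an optional `is_secret` audit label. The real file has more fields, and `update_baseline` is a hypothetical helper, not part of the library.

```python
def update_baseline(old: dict, fresh: dict) -> dict:
    """Merge a fresh scan into an old baseline, carrying audit labels forward."""
    # Index existing labels by (filename, secret type, hashed secret).
    labels = {}
    for filename, entries in old.get("results", {}).items():
        for entry in entries:
            if "is_secret" in entry:
                labels[(filename, entry["type"], entry["hashed_secret"])] = entry["is_secret"]

    merged = {}
    # Only secrets present in the fresh scan survive, so removed secrets drop out
    # and newly found ones are included automatically.
    for filename, entries in fresh.get("results", {}).items():
        merged[filename] = []
        for entry in entries:
            entry = dict(entry)  # do not mutate the caller's data
            key = (filename, entry["type"], entry["hashed_secret"])
            if key in labels:
                entry["is_secret"] = labels[key]
            merged[filename].append(entry)
    return {**fresh, "results": merged}
```

Matching on the hashed secret rather than the line number means a label survives when the secret merely moves to a different line.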
Scanning Staged Files Only:
$ git diff --staged --name-only -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline
Scanning All Tracked Files:
$ git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline
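Conceptually, the hook reports only findings that are not already recorded in the baseline. If any remain, it exits non-zero, which is what makes a pre-commit run fail. The minimal sketch below assumes the baseline identifies a secret by its filename plus a SHA-1 hex digest of the value. `hash_secret` and `hook_exit_code` are hypothetical names, not the library's API.

```python
import hashlib
import sys


def hash_secret(secret: str) -> str:
    # Assumption: the baseline stores a SHA-1 hex digest instead of the raw secret.
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def hook_exit_code(findings, baseline_hashes: set[tuple[str, str]]) -> int:
    """findings: iterable of (filename, line_number, raw_secret) tuples."""
    new = [
        (filename, line_number)
        for filename, line_number, secret in findings
        if (filename, hash_secret(secret)) not in baseline_hashes
    ]
    for filename, line_number in new:
        print(f"Potential secret: {filename}:{line_number}", file=sys.stderr)
    return 1 if new else 0
```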
$ detect-secrets scan --list-all-plugins
ArtifactoryDetector
AWSKeyDetector
AzureStorageKeyDetector
BasicAuthDetector
CloudantDetector
DiscordBotTokenDetector
GitHubTokenDetector
GitLabTokenDetector
Base64HighEntropyString
HexHighEntropyString
IbmCloudIamDetector
IbmCosHmacDetector
IPPublicDetector
JwtTokenDetector
KeywordDetector
MailchimpDetector
NpmDetector
OpenAIDetector
PrivateKeyDetector
PypiTokenDetector
SendGridDetector
SlackDetector
SoftlayerDetector
SquareOAuthDetector
StripeDetector
TelegramBotTokenDetector
TwilioKeyDetector
$ detect-secrets scan --disable-plugin KeywordDetector --disable-plugin AWSKeyDetector
If you want to run only a specific plugin (here, BasicAuthDetector), you can do:
$ detect-secrets scan --list-all-plugins | grep -v 'BasicAuthDetector' | sed "s#^#--disable-plugin #g" | xargs detect-secrets scan test_data
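The shell pipeline above disables every plugin except one. The same argument list can be built in Python, for example when driving detect-secrets from a script. The function `only_plugin_args` is hypothetical. The plugin names would come from `detect-secrets scan --list-all-plugins`, and the list here is abridged.

```python
def only_plugin_args(all_plugins: list[str], keep: str) -> list[str]:
    """Return `--disable-plugin NAME` pairs for every plugin except `keep`."""
    if keep not in all_plugins:
        raise ValueError(f"unknown plugin: {keep}")
    args: list[str] = []
    for name in all_plugins:
        if name != keep:
            args += ["--disable-plugin", name]
    return args


# Abridged plugin list, as printed by `detect-secrets scan --list-all-plugins`.
plugins = ["AWSKeyDetector", "BasicAuthDetector", "KeywordDetector"]
cmd = ["detect-secrets", "scan", *only_plugin_args(plugins, "BasicAuthDetector"), "test_data"]
print(" ".join(cmd))
```

Passing `cmd` as a list to `subprocess.run` avoids shell quoting. The exact name comparison also avoids the substring matching that `grep -v` performs.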
This is an optional step to label the results in your baseline. It can be used to narrow down your checklist of secrets to migrate, or to better configure your plugins to improve their signal-to-noise ratio.
$ detect-secrets audit .secrets.baseline
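Once results are labelled, the baseline doubles as the migration checklist. `detect-secrets audit --report --only-real` produces a built-in report. If you need the same data in your own tooling, the sketch below reads it directly. It assumes each audited entry carries an `is_secret` boolean and a `line_number`. Both helpers are hypothetical, and the baseline dict would come from `json.load` on `.secrets.baseline`.

```python
def secrets_to_roll(baseline: dict) -> list[tuple[str, int]]:
    """List (filename, line_number) for entries audited as real secrets."""
    todo = []
    for filename, entries in baseline.get("results", {}).items():
        for entry in entries:
            if entry.get("is_secret") is True:
                todo.append((filename, entry["line_number"]))
    return sorted(todo)


def count_unaudited(baseline: dict) -> int:
    """Count entries that have not been labelled yet."""
    return sum(
        "is_secret" not in entry
        for entries in baseline.get("results", {}).values()
        for entry in entries
    )
```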
Basic Use:
from detect_secrets import SecretsCollection
from detect_secrets.settings import default_settings

secrets = SecretsCollection()
with default_settings():
    secrets.scan_file('test_data/config.ini')


import json
print(json.dumps(secrets.json(), indent=2))
More Advanced Configuration:
from detect_secrets import SecretsCollection
from detect_secrets.settings import transient_settings


secrets = SecretsCollection()
with transient_settings({
    # Only run scans with these plugins.
    # This format is the same as the one that is saved in the generated baseline.
    'plugins_used': [
        # Example of configuring a built-in plugin
        {
            'name': 'Base64HighEntropyString',
            'limit': 5.0,
        },

        # Example of using a custom plugin
        {
            'name': 'HippoDetector',
            'path': 'file:///Users/aaronloo/Documents/github/detect-secrets/testing/plugins.py',
        },
    ],

    # We can also specify whichever additional filters we want.
    # This is an example of using the function `is_identified_by_ML_model` within the
    # local file `./private-filters/example.py`.
    'filters_used': [
        {
            'path': 'file://private-filters/example.py::is_identified_by_ML_model',
        },
    ]
}) as settings:
    # If we want to make any further adjustments to the created settings object (e.g.
    # disabling default filters), we can do so as such.
    settings.disable_filters(
        'detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign',
        'detect_secrets.filters.heuristic.is_likely_id_string',
    )

    secrets.scan_file('test_data/config.ini')
Install via pip:
$ pip install detect-secrets
Install via brew:
$ brew install detect-secrets
detect-secrets comes with three different tools, and there is often confusion around which one to use. Use this handy checklist to help you decide:
Do you want to add secrets to your baseline? If so, use detect-secrets scan.
Do you want to alert on new secrets not in the baseline? If so, use detect-secrets-hook.
Are you analyzing the baseline itself? If so, use detect-secrets audit.
$ detect-secrets scan --help
usage: detect-secrets scan [-h] [--string [STRING]] [--only-allowlisted]
                           [--all-files] [--baseline FILENAME]
                           [--force-use-all-plugins] [--slim]
                           [--list-all-plugins] [-p PLUGIN]
                           [--base64-limit [BASE64_LIMIT]]
                           [--hex-limit [HEX_LIMIT]]
                           [--disable-plugin DISABLE_PLUGIN]
                           [-n | --only-verified]
                           [--exclude-lines EXCLUDE_LINES]
                           [--exclude-files EXCLUDE_FILES]
                           [--exclude-secrets EXCLUDE_SECRETS]
                           [--word-list WORD_LIST_FILE] [-f FILTER]
                           [--disable-filter DISABLE_FILTER]
                           [path [path ...]]

Scans a repository for secrets in code. The generated output is compatible
with `detect-secrets-hook --baseline`.

positional arguments:
  path                  Scans the entire codebase and outputs a snapshot of
                        currently identified secrets.

optional arguments:
  -h, --help            show this help message and exit
  --string [STRING]     Scans an individual string, and displays configured
                        plugins' verdict.
  --only-allowlisted    Only scans the lines that are flagged with `allowlist
                        secret`. This helps verify that individual exceptions
                        are indeed non-secrets.

scan options:
  --all-files           Scan all files recursively (as compared to only
                        scanning git tracked files).
  --baseline FILENAME   If provided, will update existing baseline by
                        importing settings from it.
  --force-use-all-plugins
                        If a baseline is provided, detect-secrets will default
                        to loading the plugins specified by that baseline.
                        However, this may also mean it doesn't perform the
                        scan with the latest plugins. If this flag is
                        provided, it will always use the latest plugins
  --slim                Slim baselines are created with the intention of
                        minimizing differences between commits. However, they
                        are not compatible with the `audit` functionality, and
                        slim baselines will need to be remade to be audited.

plugin options:
  Configure settings for each secret scanning ruleset. By default, all
  plugins are enabled unless explicitly disabled.

  --list-all-plugins    Lists all plugins that will be used for the scan.
  -p PLUGIN, --plugin PLUGIN
                        Specify path to custom secret detector plugin.
  --base64-limit [BASE64_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 4.5.
  --hex-limit [HEX_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 3.0.
  --disable-plugin DISABLE_PLUGIN
                        Plugin class names to disable. e.g.
                        Base64HighEntropyString

filter options:
  Configure settings for filtering out secrets after they are flagged by the
  engine.

  -n, --no-verify       Disables additional verification of secrets via
                        network call.
  --only-verified       Only flags secrets that can be verified.
  --exclude-lines EXCLUDE_LINES
                        If lines match this regex, it will be ignored.
  --exclude-files EXCLUDE_FILES
                        If filenames match this regex, it will be ignored.
  --exclude-secrets EXCLUDE_SECRETS
                        If secrets match this regex, it will be ignored.
  --word-list WORD_LIST_FILE
                        Text file with a list of words, if a secret contains a
                        word in the list we ignore it.
  -f FILTER, --filter FILTER
                        Specify path to custom filter. May be a python module
                        path (e.g.
                        detect_secrets.filters.common.is_invalid_file) or a
                        local file path (e.g.
                        file://path/to/file.py::function_name).
  --disable-filter DISABLE_FILTER
                        Specify filter to disable. e.g.
                        detect_secrets.filters.common.is_invalid_file
$ detect-secrets-hook --help
usage: detect-secrets-hook [-h] [-v] [--version] [--baseline FILENAME]
                           [--list-all-plugins] [-p PLUGIN]
                           [--base64-limit [BASE64_LIMIT]]
                           [--hex-limit [HEX_LIMIT]]
                           [--disable-plugin DISABLE_PLUGIN]
                           [-n | --only-verified]
                           [--exclude-lines EXCLUDE_LINES]
                           [--exclude-files EXCLUDE_FILES]
                           [--exclude-secrets EXCLUDE_SECRETS]
                           [--word-list WORD_LIST_FILE] [-f FILTER]
                           [--disable-filter DISABLE_FILTER]
                           [filenames [filenames ...]]

positional arguments:
  filenames             Filenames to check.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose mode.
  --version             Display version information.
  --json                Print detect-secrets-hook output as JSON
  --baseline FILENAME   Explicitly ignore secrets through a baseline generated
                        by `detect-secrets scan`

plugin options:
  Configure settings for each secret scanning ruleset. By default, all
  plugins are enabled unless explicitly disabled.

  --list-all-plugins    Lists all plugins that will be used for the scan.
  -p PLUGIN, --plugin PLUGIN
                        Specify path to custom secret detector plugin.
  --base64-limit [BASE64_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 4.5.
  --hex-limit [HEX_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 3.0.
  --disable-plugin DISABLE_PLUGIN
                        Plugin class names to disable. e.g.
                        Base64HighEntropyString

filter options:
  Configure settings for filtering out secrets after they are flagged by the
  engine.

  -n, --no-verify       Disables additional verification of secrets via
                        network call.
  --only-verified       Only flags secrets that can be verified.
  --exclude-lines EXCLUDE_LINES
                        If lines match this regex, it will be ignored.
  --exclude-files EXCLUDE_FILES
                        If filenames match this regex, it will be ignored.
  --exclude-secrets EXCLUDE_SECRETS
                        If secrets match this regex, it will be ignored.
  -f FILTER, --filter FILTER
                        Specify path to custom filter. May be a python module
                        path (e.g.
                        detect_secrets.filters.common.is_invalid_file) or a
                        local file path (e.g.
                        file://path/to/file.py::function_name).
  --disable-filter DISABLE_FILTER
                        Specify filter to disable. e.g.
                        detect_secrets.filters.common.is_invalid_file
We recommend setting this up as a pre-commit hook. One way to do this is by using the pre-commit framework:
# .pre-commit-config.yaml
repos:
-   repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
    -   id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: package.lock.json
There are times when we want to exclude a false positive from blocking a commit, without creating a baseline to do so. You can do this by adding a comment like so:
secret = "hunter2" # pragma: allowlist secret
or
// pragma: allowlist nextline secret
const secret = "hunter2";
$ detect-secrets audit --help
usage: detect-secrets audit [-h] [--diff] [--stats]
                            [--report] [--only-real | --only-false]
                            [--json]
                            filename [filename ...]

Auditing a baseline allows analysts to label results, and optimize plugins for
the highest signal-to-noise ratio for their environment.

positional arguments:
  filename      Audit a given baseline file to distinguish the difference
                between false and true positives.

optional arguments:
  -h, --help    show this help message and exit
  --diff        Allows the comparison of two baseline files, in order to
                effectively distinguish the difference between various plugin
                configurations.
  --stats       Displays the results of an interactive auditing session which
                have been saved to a baseline file.
  --report      Displays a report with the secrets detected

reporting:
  Display a summary with all the findings and the made decisions. To be used
  with the report mode (--report).

  --only-real   Only includes real secrets in the report
  --only-false  Only includes false positives in the report

analytics:
  Quantify the success of your plugins based on the labelled results in your
  baseline. To be used with the statistics mode (--stats).

  --json        Outputs results in a machine-readable format.
This tool operates through a system of plugins and filters.
Plugins find secrets in code
Filters ignore false positives to increase scanning precision
You can adjust both to suit your precision/recall needs.
There are three different strategies we employ to try and find secrets in code:
Regex-based Rules
These are the most common type of plugin, and work well with well-structured secrets. These secrets can optionally be verified, which increases scanning precision. However, solely depending on these may negatively affect the recall of your scan.
Entropy Detector
This searches for "secret-looking" strings through a variety of heuristic approaches. This is great for non-structured secrets, but may require tuning to adjust the scanning precision.
Keyword Detector
This ignores the secret value, and searches for variable names that are often associated with assigning secrets with hard-coded values. This is great for "non-secret-looking" strings (e.g. le3tc0de passwords), but may require tuning filters to adjust the scanning precision.
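The Entropy Detector scores how random a string looks. The sketch below assumes Shannon entropy in bits per character as that score. Its 0.0 to 8.0 range is consistent with the `--base64-limit` (default 4.5) and `--hex-limit` (default 3.0) options in the help output. The real plugin also decides which substrings to measure, and the function names here are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the string's own character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_high_entropy(s: str, limit: float = 4.5) -> bool:
    # 4.5 mirrors the documented --base64-limit default; hex strings default to 3.0.
    return shannon_entropy(s) > limit


print(shannon_entropy("password"), looks_high_entropy("password"))
```

A word such as "password" scores low (2.75 bits per character), which is why the Keyword Detector exists to catch such values.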
Want to find a secret that we don't currently catch? You can also (easily) develop your own plugin, and use it with the engine! For more information, check out the plugin documentation.
detect-secrets comes with several different built-in filters that may suit your needs.
Sometimes, you want to be able to globally allow certain lines in your scan, if they match a specific pattern. You can specify a regex rule as such:
$ detect-secrets scan --exclude-lines 'password = (blah|fake)'
Or you can specify multiple regex rules as such:
$ detect-secrets scan --exclude-lines 'password = blah' --exclude-lines 'password = fake'
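The effect of repeated `--exclude-lines` flags can be checked locally before running a scan. This sketch assumes each pattern is applied with `re.search` (match anywhere in the line) and that a line is skipped if any pattern matches. Check this against your version. `is_excluded` is a hypothetical helper.

```python
import re


def is_excluded(line: str, patterns: list[str]) -> bool:
    """True if any pattern matches anywhere in the line (repeated flags act as OR)."""
    return any(re.search(p, line) for p in patterns)


two_flags = ["password = blah", "password = fake"]
one_flag = ["password = (blah|fake)"]

for line in ["password = fake", "    password = blah  # fixture", "password = hunter2"]:
    print(repr(line), is_excluded(line, two_flags), is_excluded(line, one_flag))
```

Under these assumptions, the single alternation and the two separate flags exclude the same lines.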
Sometimes, you want to be able to ignore certain files in your scan. You can specify a regex pattern to do so, and if the filename meets this regex pattern, it will not be scanned:
$ detect-secrets scan --exclude-files '.*.signature$'
Or you can specify multiple regex patterns as such:
$ detect-secrets scan --exclude-files '.*.signature$' --exclude-files '.*/i18n/.*'
Sometimes, you want to be able to ignore certain secret values in your scan. You can specify a regex rule as such:
$ detect-secrets scan --exclude-secrets '(fakesecret|\$\{.*\})'
Or you can specify multiple regex rules as such:
$ detect-secrets scan --exclude-secrets 'fakesecret' --exclude-secrets '\$\{.*\}'
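When an `--exclude-secrets` pattern is meant to skip placeholder values such as `${DB_PASSWORD}`, note that `$`, `{` and `}` are regex metacharacters. The quick check below assumes the pattern is applied to the secret value with `re.search`. The placeholder value is made up for illustration.

```python
import re

# Escaped "$", "{" and "}" are matched literally.
PLACEHOLDER = re.compile(r"(fakesecret|\$\{.*\})")

# Unescaped, "$" is an end-of-string anchor, so this can never match a placeholder.
BROKEN = re.compile(r"${.*}")

for value in ["${DB_PASSWORD}", "fakesecret", "s3cr3t-value"]:
    print(f"{value!r}: escaped={bool(PLACEHOLDER.search(value))}, unescaped={bool(BROKEN.search(value))}")
```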
Sometimes, you want to apply an exclusion to a specific line, rather than globally excluding it. You can do so with inline allowlisting as such:
API_KEY = 'this-will-ordinarily-be-detected-by-a-plugin' # pragma: allowlist secret
These comments are supported in multiple languages. e.g.
const GoogleCredentialPassword = "something-secret-here"; // pragma: allowlist secret
You can also use:
# pragma: allowlist nextline secret
API_KEY = 'WillAlsoBeIgnored'
This may be a convenient way for you to ignore secrets, without needing to regenerate the entire baseline again. If you need to explicitly search for these allowlisted secrets, you can also do:
$ detect-secrets scan --only-allowlisted
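The two pragma forms differ only in which line they exempt. The sketch below is an approximation for illustration. Its regex is an assumption and may be stricter or looser than the one detect-secrets uses across comment styles. `allowlisted_line_numbers` is a hypothetical helper.

```python
import re

# Hypothetical approximation of the inline allowlist comment.
PRAGMA = re.compile(r"pragma:\s*allowlist\s+(nextline\s+)?secret")


def allowlisted_line_numbers(source: str) -> set[int]:
    """Return 1-based line numbers exempted by inline allowlist pragmas."""
    exempt = set()
    for number, line in enumerate(source.splitlines(), start=1):
        m = PRAGMA.search(line)
        if m is None:
            continue
        # "nextline" exempts the following line; otherwise the pragma's own line.
        exempt.add(number + 1 if m.group(1) else number)
    return exempt


src = (
    "API_KEY = 'a'  # pragma: allowlist secret\n"
    "# pragma: allowlist nextline secret\n"
    "API_KEY = 'b'\n"
    "OTHER = 'c'\n"
)
print(sorted(allowlisted_line_numbers(src)))
```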
Want to write more custom logic to filter out false positives? Check out how to do this in our filters documentation.
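A custom filter is a Python function referenced with the `--filter file://path/to/file.py::function_name` syntax from the help output. The sketch below assumes the engine passes arguments by parameter name (here `secret`) and ignores a finding when the function returns True. Both points are assumptions to confirm in the filters documentation. The file name `private_filters.py` and the function `is_test_fixture` are hypothetical.

```python
# private_filters.py (hypothetical file name)
import re

# Hypothetical rule: values that start like obvious test fixtures are false positives.
_FIXTURE = re.compile(r"^(test|dummy|example)[-_]", re.IGNORECASE)


def is_test_fixture(secret: str) -> bool:
    """Return True to ask the engine to ignore this finding."""
    return bool(_FIXTURE.match(secret))
```

It would then be enabled with something like `$ detect-secrets scan --filter file://private_filters.py::is_test_fixture`.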
The --exclude-secrets flag allows you to specify regex rules to exclude secret values. However, if you want to specify a large list of words instead, you can use the --word-list flag.
To use this feature, be sure to install the pyahocorasick package, or simply use:
$ pip install detect-secrets[word_list]
Then, you can use it as such:
$ cat wordlist.txt
not-a-real-secret
$ cat sample.ini
password = not-a-real-secret

# Will show results
$ detect-secrets scan sample.ini

# No results found
$ detect-secrets scan --word-list wordlist.txt sample.ini
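pyahocorasick builds an automaton that finds every listed word in a single pass over the secret, which matters when the word list is large. The naive stand-in below gives the same yes/no answer in O(words × length) time and shows the filter's logic. Case-insensitive matching is an assumption here, and both helper names are hypothetical.

```python
def parse_word_list(text: str) -> list[str]:
    """One word per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def contains_listed_word(secret: str, words: list[str]) -> bool:
    """Naive stand-in for an Aho-Corasick lookup: does any listed word occur in the secret?"""
    # Assumption: case-insensitive matching; check the behaviour of your version.
    lowered = secret.lower()
    return any(word.lower() in lowered for word in words)


words = parse_word_list("not-a-real-secret\n\n")
print(contains_listed_word("not-a-real-secret", words))
print(contains_listed_word("hunter2", words))
```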
The Gibberish Detector is a simple ML model that attempts to determine whether a secret value is actually gibberish, on the assumption that real secret values are not word-like.
To use this feature, be sure to install the gibberish-detector package, or use:
$ pip install detect-secrets[gibberish]
Check out the gibberish-detector package for more information on how to train the model. A pre-trained model (seeded by processing RFCs) is included for easy use.
You can also specify your own model as such:
$ detect-secrets scan --gibberish-model custom.model
This is not a default plugin, since it would also ignore word-like secrets such as password.
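To show the idea behind gibberish scoring, here is a toy character-bigram (Markov chain) model. It is a simplified stand-in written for this example, not the gibberish-detector package's code or model format. The training text and any threshold you would compare scores against are assumptions to tune.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def train(corpus: str) -> dict[str, dict[str, float]]:
    """Estimate log P(next char | char) from a corpus, with add-one smoothing."""
    counts = {a: defaultdict(int) for a in ALPHABET}
    text = [c for c in corpus.lower() if c in ALPHABET]
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values()) + len(ALPHABET)
        model[a] = {b: math.log((counts[a][b] + 1) / total) for b in ALPHABET}
    return model


def avg_log_prob(model: dict[str, dict[str, float]], s: str) -> float:
    """Average transition log-probability; lower means more gibberish-like."""
    chars = [c for c in s.lower() if c in ALPHABET]
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return float("-inf")
    return sum(model[a][b] for a, b in pairs) / len(pairs)


model = train("the quick brown fox jumps over the lazy dog " * 20)
for candidate in ["the lazy dog", "xqzvkjwq"]:
    print(candidate, round(avg_log_prob(model, candidate), 2))
```

Word-like strings follow common letter transitions and score higher. Under the stated assumption that real secrets are not word-like, such high-scoring values are the ones this filter would treat as false positives.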
This is not meant to be a sure-fire solution to prevent secrets from entering the codebase. Only proper developer education can truly do that. This pre-commit hook merely implements several heuristics to try and prevent obvious cases of committing secrets.
Things That Won't Be Prevented:
Multi-line secrets
Default passwords that don't trigger the KeywordDetector (e.g. login = "hunter2")
"Did not detect git repository." warning encountered, even though I'm in a git repo.
Check to see whether your git version is >= 1.8.5. If not, please upgrade it and try again.
More details here.
detect-secrets audit displays "Not a valid baseline file!" after creating baseline.
Ensure the file encoding of your baseline file is UTF-8. More details here.
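If the baseline was written by a shell redirect that produces UTF-16, it can be converted in place. Windows PowerShell's `>` is a common example, though whether that is your cause is an assumption. `ensure_utf8` is a hypothetical helper. It leaves valid UTF-8 files untouched and writes bytes directly so line endings are not altered.

```python
from pathlib import Path


def ensure_utf8(path: str, source_encoding: str = "utf-16") -> bool:
    """Re-encode `path` as UTF-8 if it is not already. Returns True if rewritten."""
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8
    except UnicodeDecodeError:
        # Assumption: the file is in `source_encoding` (UTF-16 with a BOM by default).
        text = raw.decode(source_encoding)
        p.write_bytes(text.encode("utf-8"))
        return True
```

Usage would be `ensure_utf8(".secrets.baseline")`, followed by re-running `detect-secrets audit .secrets.baseline`.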