Utf16 bom support #4326

Draft: wants to merge 2 commits into base: main

joeleonjr
Contributor

Description:

Secrets in UTF-16-encoded files are not always detected, because the UTF-8 decoder's extractSubstrings() function modifies the chunk data that later decoders receive.

In engine.go, TruffleHog loops over each decoder, passing the chunk's Data in for processing. The UTF-8 decoder runs first. If the data is not valid UTF-8, the UTF-8 decoder calls extractSubstrings() and writes the result back to the chunk's Data field, which is then passed to all subsequent decoders. Part of that function alters the byte layout of valid UTF-16 data, which makes some secrets impossible to detect.
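The sharing problem described above can be sketched in a few lines of Go. This is only an illustration: `chunk`, `decoder`, `lossyDecoder`, `inspectDecoder`, and `runShared` are hypothetical names, not TruffleHog's types, and dropping NUL bytes is a stand-in for whatever extractSubstrings() actually does.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunk and decoder are hypothetical stand-ins, not TruffleHog's real types.
type chunk struct{ Data []byte }

type decoder func(c *chunk) []byte

// lossyDecoder rewrites the chunk's Data field. Dropping NUL bytes is an
// assumption made for illustration, not the real extractSubstrings() logic.
func lossyDecoder(c *chunk) []byte {
	c.Data = bytes.ReplaceAll(c.Data, []byte{0x00}, nil)
	return c.Data
}

// inspectDecoder returns the bytes it was handed, unchanged.
func inspectDecoder(c *chunk) []byte { return c.Data }

// runShared passes the same chunk to every decoder in order, so a later
// decoder sees whatever an earlier decoder wrote into Data.
func runShared(c *chunk, decoders []decoder) [][]byte {
	var seen [][]byte
	for _, d := range decoders {
		seen = append(seen, d(c))
	}
	return seen
}

func main() {
	// BOM (FF FE) followed by "hi" encoded as UTF-16LE.
	c := &chunk{Data: []byte{0xFF, 0xFE, 'h', 0x00, 'i', 0x00}}
	seen := runShared(c, []decoder{lossyDecoder, inspectDecoder})
	fmt.Printf("second decoder received: % x\n", seen[1])
}
```

In this sketch the second decoder no longer receives well-formed UTF-16, because the 16-bit code units were collapsed before it ran.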

Here's an example to test out:

echo <VALID_DETECTABLE_SECRET> > secret.txt
printf '\xFF\xFE' > utf16le.txt && iconv -f UTF-8 -t UTF-16LE secret.txt >> utf16le.txt
printf '\xFE\xFF' > utf16be.txt && iconv -f UTF-8 -t UTF-16BE secret.txt >> utf16be.txt
trufflehog filesystem utf*

Originally, I thought the problem was that we did not handle the UTF-16 byte order marks (BOMs) 0xFEFF and 0xFFFE. However, the existing logic in the utf16ToUTF8 function in utf16.go already takes care of those. I added two test cases to prove that.
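For readers unfamiliar with BOM handling, here is a minimal sketch of BOM-aware UTF-16 decoding in Go. It is not the code in utf16.go: `decodeUTF16` is a hypothetical helper, and defaulting to little-endian when no BOM is present is an assumption made for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 is a hypothetical helper, not TruffleHog's utf16ToUTF8.
// It uses a leading BOM, if present, to choose the byte order, strips it,
// and converts the remaining 16-bit code units to a Go (UTF-8) string.
func decodeUTF16(b []byte) string {
	// Assumption: fall back to little-endian when there is no BOM.
	var order binary.ByteOrder = binary.LittleEndian
	if len(b) >= 2 {
		switch {
		case b[0] == 0xFF && b[1] == 0xFE: // UTF-16LE BOM
			order, b = binary.LittleEndian, b[2:]
		case b[0] == 0xFE && b[1] == 0xFF: // UTF-16BE BOM
			order, b = binary.BigEndian, b[2:]
		}
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 { // a trailing odd byte is ignored
		units = append(units, order.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(units))
}

func main() {
	le := []byte{0xFF, 0xFE, 'h', 0x00, 'i', 0x00}
	be := []byte{0xFE, 0xFF, 0x00, 'h', 0x00, 'i'}
	fmt.Println(decodeUTF16(le), decodeUTF16(be))
}
```

Both inputs decode to the same string, which is the behavior the two added test cases are described as confirming for the existing code.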

The only change needed is to create a copy of the chunk before passing it to each decoder.
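A rough sketch of the copy-per-decoder idea follows. All names (`chunk`, `decoder`, `lossyDecoder`, `inspectDecoder`, `runIsolated`) are hypothetical and the real change in engine.go may look different. The lossy decoder here filters in place on purpose, to show why the byte slice itself has to be copied and not just the struct that holds it.

```go
package main

import "fmt"

// chunk and decoder are hypothetical stand-ins, not TruffleHog's real types.
type chunk struct{ Data []byte }

type decoder func(c *chunk) []byte

// lossyDecoder filters NUL bytes in place, reusing the input's backing
// array. This is an illustrative mutation, not the real decoder logic.
func lossyDecoder(c *chunk) []byte {
	out := c.Data[:0]
	for _, b := range c.Data {
		if b != 0x00 {
			out = append(out, b)
		}
	}
	c.Data = out
	return out
}

// inspectDecoder returns the bytes it was handed, unchanged.
func inspectDecoder(c *chunk) []byte { return c.Data }

// runIsolated gives every decoder its own chunk with its own copy of the
// bytes, so a mutation made by one decoder cannot reach the next.
func runIsolated(c *chunk, decoders []decoder) [][]byte {
	var seen [][]byte
	for _, d := range decoders {
		cp := &chunk{Data: append([]byte(nil), c.Data...)}
		seen = append(seen, d(cp))
	}
	return seen
}

func main() {
	c := &chunk{Data: []byte{0xFF, 0xFE, 'h', 0x00, 'i', 0x00}}
	seen := runIsolated(c, []decoder{lossyDecoder, inspectDecoder})
	fmt.Printf("second decoder received: % x\n", seen[1])
	fmt.Printf("original chunk:          % x\n", c.Data)
}
```

The cost in this sketch is one allocation and one copy of the data per decoder per chunk, which is the expense the alternatives below try to avoid.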

If that change is too expensive, I have 2 other ideas:

  1. Move extractSubstrings out of the UTF-8 decoder and invoke it directly in engine.go, before running FindDetectorMatches, when the UTF-8 decode fails.
  2. Store the results of that function in a separate variable for later processing in FindDetectorMatches.
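The second idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Extracted` field, `printableRuns`, and `utf8Fallback` are hypothetical, and splitting on non-printable ASCII is a guess at the kind of work a substring extractor does, not the actual extractSubstrings() implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// chunk is a hypothetical stand-in, not TruffleHog's real type.
type chunk struct {
	Data      []byte // original bytes; decoders never overwrite this
	Extracted []byte // hypothetical field for the UTF-8 fallback output
}

// printableRuns keeps runs of printable ASCII and joins them with newlines.
// Assumption: this only approximates what a substring extractor might do.
func printableRuns(b []byte) []byte {
	fields := bytes.FieldsFunc(b, func(r rune) bool {
		return r < 32 || r > 126 // split on anything outside space..tilde
	})
	return bytes.Join(fields, []byte("\n"))
}

// utf8Fallback stores the extracted substrings next to the original data
// when the input is not valid UTF-8, leaving Data intact for later decoders.
func utf8Fallback(c *chunk) {
	if !utf8.Valid(c.Data) {
		c.Extracted = printableRuns(c.Data)
	}
}

func main() {
	// BOM (FF FE) followed by "sec" encoded as UTF-16LE.
	c := &chunk{Data: []byte{0xFF, 0xFE, 's', 0x00, 'e', 0x00, 'c', 0x00}}
	utf8Fallback(c)
	fmt.Printf("data:      % x\n", c.Data)
	fmt.Printf("extracted: %q\n", c.Extracted)
}
```

With this layout a UTF-16 decoder still receives the untouched `Data`, while the matching step can look at `Extracted` separately. No byte copy per decoder is needed, at the price of a wider chunk type.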
