Add parser for unsaved Windows Notepad tabs #540
Merged
Changes from 37 commits
39 commits
c528241 Initial commit (joost-j)
3594ff0 Removed unused 'seek_size' function (joost-j)
b1bcd69 Refactored the code to work with new LEB128 structure, added some mor… (joost-j)
c634987 Added more comments (joost-j)
d3d35a1 Refactor c_def to include parsing of both variants (joost-j)
cef81d0 Bump dissect.cstruct version to >=4.0.dev for clarity (joost-j)
7934f3e Apply suggestions from code review (joost-j)
e6ea019 Removed duplicate brackets and refactor assertion into warning log (joost-j)
12fdd4a Change variable names to fsize1 and fsize2, plus some linting (joost-j)
39a34a7 Refactored to work with LEB128 backport (joost-j)
8566028 Process feedback (joost-j)
56a26fa Set cstruct dependency to next release (joost-j)
b18e975 Restore original shimcache.py file (joost-j)
1a1d80d Move TextEditorTabRecord definition (joost-j)
b00bdc3 Remove content_length field from record (joost-j)
a124202 Apply suggestions from code review (joost-j)
dbaca5d Change TabEditorTabRecord formatting (joost-j)
d66fa54 Black formatting, fix tests, add annotations import (joost-j)
bdaccbc Bump cstruct version again (joost-j)
ad78273 Bump dependencies as leb128 is now included in dev release (joost-j)
0d9c88f Implemented deletion of characters, refactored, added new tests (joost-j)
304db58 Small comment changes (joost-j)
2ca889c Remove chunked addition of zero bytes (joost-j)
74ffb83 Added new test, changed to list insertion instead of appending (joost-j)
c148061 Refactored test file and removed fileState enum (joost-j)
2bf6e2f Small comment changes/typos (joost-j)
a19c49b Split plugin from parsing logic, added more tests (joost-j)
f808bc7 Removed fh.read() and re-added them to the c_def (joost-j)
9b38f3e Added options and more test cases to support newest version (joost-j)
a3b6f27 Added separate records for unsaved/saved tabs, included more data (ti… (joost-j)
677817c Change cstruct version (joost-j)
9674e37 Remove the --include-deleted-contents arg and make it default (joost-j)
06e3f07 Rewrite TabContent records into WindowsNotepadTab class (joost-j)
a384fd9 Implement repr for WindowsNotepadTab class (joost-j)
914c324 Merge branch 'main' into feature/windows_notepad_tabs (joost-j)
e625684 Add typehints and small fixes (Horofic)
9bb13c7 Merge branch 'main' into feature/windows_notepad_tabs (Horofic)
27fca92 Add suggestions (Horofic)
a9b32eb Merge branch 'feature/windows_notepad_tabs' of github.com:joost-j/dis… (Horofic)
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
from dissect.target.helpers.descriptor_extensions import UserRecordDescriptorExtension | ||
from dissect.target.helpers.record import create_extended_descriptor | ||
from dissect.target.plugin import NamespacePlugin | ||
|
||
GENERIC_TAB_CONTENTS_RECORD_FIELDS = [("string", "content"), ("path", "path"), ("string", "deleted_content")] | ||
|
||
TexteditorTabContentRecord = create_extended_descriptor([UserRecordDescriptorExtension])( | ||
"texteditor/tab", GENERIC_TAB_CONTENTS_RECORD_FIELDS | ||
) | ||
|
||
|
||
class TexteditorPlugin(NamespacePlugin): | ||
__namespace__ = "texteditor" |
341 changes: 341 additions & 0 deletions
dissect/target/plugins/apps/texteditor/windowsnotepad.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,341 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
import zlib | ||
from typing import Iterator | ||
|
||
from dissect.cstruct import cstruct | ||
from dissect.util.ts import wintimestamp | ||
from flow.record.fieldtypes import digest | ||
|
||
from dissect.target.exceptions import UnsupportedPluginError | ||
from dissect.target.helpers.descriptor_extensions import UserRecordDescriptorExtension | ||
from dissect.target.helpers.fsutil import TargetPath | ||
from dissect.target.helpers.record import ( | ||
DynamicDescriptor, | ||
UnixUserRecord, | ||
WindowsUserRecord, | ||
create_extended_descriptor, | ||
) | ||
from dissect.target.plugin import export | ||
from dissect.target.plugins.apps.texteditor.texteditor import ( | ||
GENERIC_TAB_CONTENTS_RECORD_FIELDS, | ||
TexteditorPlugin, | ||
) | ||
|
||
# Thanks to @Nordgaren, @daddycocoaman, @JustArion and @ogmini for their suggestions and feedback in the PR | ||
# thread. This really helped to figure out the last missing bits and pieces | ||
# required for recovering text from these files. | ||
|
||
c_def = """ | ||
struct file_header { | ||
char magic[2]; // NP | ||
uleb128 updateNumber; // increases on every settings update when fileType=9, | ||
// doesn't seem to change on fileType 0 or 1 | ||
uleb128 fileType; // 0 if unsaved, 1 if saved, 9 if contains settings? | ||
} | ||
|
||
struct tab_header_saved { | ||
uleb128 filePathLength; | ||
wchar filePath[filePathLength]; | ||
uleb128 fileSize; // likely similar to fixedSizeBlockLength | ||
uleb128 encoding; | ||
uleb128 carriageReturnType; | ||
uleb128 timestamp; // Windows Filetime format (not unix timestamp) | ||
char sha256[32]; | ||
char unk0; | ||
char unk1; | ||
uleb128 fixedSizeBlockLength; | ||
uleb128 fixedSizeBlockLengthDuplicate; | ||
uint8 wordWrap; // 1 if wordwrap enabled, 0 if disabled | ||
uint8 rightToLeft; | ||
uint8 showUnicode; | ||
uint8 optionsVersion; | ||
}; | ||
|
||
struct tab_header_unsaved { | ||
char unk0; | ||
uleb128 fixedSizeBlockLength; // will always be 00 when unsaved because size is not yet known | ||
uleb128 fixedSizeBlockLengthDuplicate; // will always be 00 when unsaved because size is not yet known | ||
uint8 wordWrap; // 1 if wordwrap enabled, 0 if disabled | ||
uint8 rightToLeft; | ||
uint8 showUnicode; | ||
uint8 optionsVersion; | ||
}; | ||
|
||
struct tab_header_crc32_stub { | ||
char unk1; | ||
char unk2; | ||
char crc32[4]; | ||
}; | ||
|
||
struct fixed_size_data_block { | ||
uleb128 nAdded; | ||
wchar data[nAdded]; | ||
uint8 hasRemainingVariableDataBlocks; // indicates whether after this single-data block more data will follow | ||
char crc32[4]; | ||
}; | ||
|
||
struct variable_size_data_block { | ||
uleb128 offset; | ||
uleb128 nDeleted; | ||
uleb128 nAdded; | ||
wchar data[nAdded]; | ||
char crc32[4]; | ||
}; | ||
|
||
struct options_v1 { | ||
uleb128 unk; | ||
}; | ||
|
||
struct options_v2 { | ||
uleb128 unk1; // likely autocorrect or spellcheck | ||
uleb128 unk2; // likely autocorrect or spellcheck | ||
}; | ||
""" | ||
|
||
WINDOWS_SAVED_TABS_EXTRA_FIELDS = [("datetime", "modification_time"), ("digest", "hashes"), ("path", "saved_path")] | ||
|
||
WindowsNotepadUnsavedTabRecord = create_extended_descriptor([UserRecordDescriptorExtension])( | ||
"texteditor/windowsnotepad/tab/unsaved", | ||
GENERIC_TAB_CONTENTS_RECORD_FIELDS, | ||
) | ||
|
||
WindowsNotepadSavedTabRecord = create_extended_descriptor([UserRecordDescriptorExtension])( | ||
"texteditor/windowsnotepad/tab/saved", | ||
GENERIC_TAB_CONTENTS_RECORD_FIELDS + WINDOWS_SAVED_TABS_EXTRA_FIELDS, | ||
) | ||
|
||
c_windowstab = cstruct() | ||
c_windowstab.load(c_def) | ||
|
||
|
||
def _calc_crc32(data: bytes) -> bytes: | ||
"""Perform a CRC32 checksum on the data and return it as bytes.""" | ||
return zlib.crc32(data).to_bytes(length=4, byteorder="big") | ||
|
||
|
||
class WindowsNotepadTab: | ||
"""Windows notepad tab content parser""" | ||
|
||
def __init__(self, file: TargetPath): | ||
self.file = file | ||
self.is_saved = None | ||
self.content = None | ||
self.deleted_content = None | ||
self._process_tab_file() | ||
|
||
def __repr__(self) -> str: | ||
return ( | ||
f"<{self.__class__.__name__} saved={self.is_saved} " | ||
f"content_size={len(self.content)} has_deleted_content={self.deleted_content is not None}>" | ||
) | ||
|
||
def _process_tab_file(self) -> None: | ||
"""Parse a binary tab file and reconstruct the contents.""" | ||
with self.file.open("rb") as fh: | ||
# Header is the same for all types | ||
self.file_header = c_windowstab.file_header(fh) | ||
|
||
# fileType == 1 # 0 is unsaved, 1 is saved, 9 is settings? | ||
self.is_saved = self.file_header.fileType == 1 | ||
|
||
# Tabs can be saved to a file with a filename on disk, or unsaved (kept in the TabState folder). | ||
# Depending on the file's saved state, different header fields are present | ||
self.tab_header = ( | ||
c_windowstab.tab_header_saved(fh) if self.is_saved else c_windowstab.tab_header_unsaved(fh) | ||
) | ||
|
||
# There appears to be an optionsVersion field that specifies the options that are passed. | ||
# At the time of writing, it is unclear whether this specifies a version or a number of bytes | ||
# that is parsed, so just going with the 'optionsVersion' type for now. | ||
# We don't use the options, but since they are required for the CRC32 checksum | ||
# we store the byte representation | ||
if self.tab_header.optionsVersion == 0: | ||
# No options specified | ||
self.options = b"" | ||
elif self.tab_header.optionsVersion == 1: | ||
self.options = c_windowstab.options_v1(fh).dumps() | ||
elif self.tab_header.optionsVersion == 2: | ||
self.options = c_windowstab.options_v2(fh).dumps() | ||
else: | ||
# Raise an error, since we don't know how many bytes future optionsVersion values will occupy. | ||
# Not knowing how many bytes to parse can mess up the alignment and structs. | ||
raise NotImplementedError("Unknown Windows Notepad tab option version") | ||
|
||
# If the file is not saved to disk and no fixedSizeBlockLength is present, an extra checksum stub | ||
# is present. So parse that first | ||
if not self.is_saved and self.tab_header.fixedSizeBlockLength == 0: | ||
# Two unknown bytes before the CRC32 | ||
tab_header_crc32_stub = c_windowstab.tab_header_crc32_stub(fh) | ||
|
||
# Calculate CRC32 of the header and check if it matches | ||
actual_header_crc32 = _calc_crc32( | ||
self.file_header.dumps()[3:] | ||
+ self.tab_header.dumps() | ||
+ self.options | ||
+ tab_header_crc32_stub.dumps()[:-4] | ||
) | ||
if tab_header_crc32_stub.crc32 != actual_header_crc32: | ||
logging.warning( | ||
"CRC32 mismatch in header of file: %s (expected=%s, actual=%s)", | ||
self.file.name, | ||
tab_header_crc32_stub.crc32.hex(), | ||
actual_header_crc32.hex(), | ||
) | ||
|
||
# Used to store the final content | ||
self.content = "" | ||
|
||
# In the case that a fixedSizeDataBlock is present, this value is set to a nonzero value | ||
if self.tab_header.fixedSizeBlockLength > 0: | ||
# So we parse the fixed size data block | ||
self.data_entry = c_windowstab.fixed_size_data_block(fh) | ||
|
||
# The header (minus the magic) plus all data is included in the checksum | ||
actual_crc32 = _calc_crc32( | ||
self.file_header.dumps()[3:] + self.tab_header.dumps() + self.options + self.data_entry.dumps()[:-4] | ||
) | ||
|
||
if self.data_entry.crc32 != actual_crc32: | ||
logging.warning( | ||
"CRC32 mismatch in single-block file: %s (expected=%s, actual=%s)", | ||
self.file.name, | ||
self.data_entry.crc32.hex(), | ||
actual_crc32.hex(), | ||
) | ||
|
||
# Add the content of the fixed size data block to the tab content | ||
self.content += self.data_entry.data | ||
|
||
# Used to store the deleted content, if available | ||
deleted_content = "" | ||
|
||
# If fixedSizeBlockLength in the header is zero, the entire file consists of | ||
# variable-length blocks. If hasRemainingVariableDataBlocks indicates that more data follows | ||
# the fixed-size block, we also want to continue parsing | ||
if self.tab_header.fixedSizeBlockLength == 0 or ( | ||
self.tab_header.fixedSizeBlockLength > 0 and self.data_entry.hasRemainingVariableDataBlocks == 1 | ||
): | ||
# Here, data is stored in variable-length blocks. This happens, for example, when several | ||
# additions and deletions of characters have been recorded and these changes have not been 'flushed' | ||
|
||
# Since we don't know the size of the file up front, and offsets don't necessarily have to be in order, | ||
# a list is used to easily insert text at offsets | ||
text = [] | ||
|
||
while True: | ||
# Unfortunately, there is no way of determining how many blocks there are. So just try to parse | ||
# until we reach EOF, after which we stop. | ||
try: | ||
data_entry = c_windowstab.variable_size_data_block(fh) | ||
except EOFError: | ||
break | ||
|
||
# Either the nAdded is nonzero, or the nDeleted | ||
if data_entry.nAdded > 0: | ||
# Check the CRC32 checksum for this block | ||
actual_crc32 = _calc_crc32(data_entry.dumps()[:-4]) | ||
if data_entry.crc32 != actual_crc32: | ||
logging.warning( | ||
"CRC32 mismatch in multi-block file: %s (expected=%s, actual=%s)", | ||
self.file.name, | ||
data_entry.crc32.hex(), | ||
actual_crc32.hex(), | ||
) | ||
|
||
# Insert the text at the correct offset. | ||
for idx in range(data_entry.nAdded): | ||
text.insert(data_entry.offset + idx, data_entry.data[idx]) | ||
|
||
elif data_entry.nDeleted > 0: | ||
# Record the deleted characters, then create a new slice: everything up to the offset, | ||
# plus everything after the nDeleted characters that follow it | ||
deleted_content += "".join(text[data_entry.offset : data_entry.offset + data_entry.nDeleted]) | ||
text = text[: data_entry.offset] + text[data_entry.offset + data_entry.nDeleted :] | ||
|
||
# Join all the characters to reconstruct the original text within the variable-length data blocks | ||
text = "".join(text) | ||
|
||
# Finally, add the reconstructed text to the tab content | ||
self.content += text | ||
|
||
# Set None if no deleted content was found | ||
self.deleted_content = deleted_content if deleted_content else None | ||
|
||
|
||
class WindowsNotepadPlugin(TexteditorPlugin): | ||
"""Windows notepad tab content plugin.""" | ||
|
||
__namespace__ = "windowsnotepad" | ||
|
||
GLOB = "AppData/Local/Packages/Microsoft.WindowsNotepad_*/LocalState/TabState/*.bin" | ||
|
||
def __init__(self, target): | ||
super().__init__(target) | ||
self.users_tabs: list[tuple[TargetPath, UnixUserRecord | WindowsUserRecord]] = [] | ||
for user_details in self.target.user_details.all_with_home(): | ||
for tab_file in user_details.home_path.glob(self.GLOB): | ||
# These files seem to contain information on different settings / configurations, | ||
# and are skipped for now | ||
if tab_file.name.endswith(".1.bin") or tab_file.name.endswith(".0.bin"): | ||
continue | ||
|
||
self.users_tabs.append((tab_file, user_details.user)) | ||
|
||
def check_compatible(self) -> None: | ||
if not self.users_tabs: | ||
raise UnsupportedPluginError("No Windows Notepad tab files found") | ||
|
||
@export(record=DynamicDescriptor(["path", "datetime", "string"])) | ||
def tabs(self) -> Iterator[WindowsNotepadSavedTabRecord | WindowsNotepadUnsavedTabRecord]: | ||
"""Return contents from Windows 11 Notepad tabs - and its deleted content if available. | ||
|
||
The Windows Notepad application for Windows 11 can restore both saved and unsaved tabs when you re-open | ||
the application. | ||
|
||
|
||
Resources: | ||
- https://github.com/fox-it/dissect.target/pull/540 | ||
- https://github.com/JustArion/Notepad-Tabs | ||
- https://github.com/ogmini/Notepad-Tabstate-Buffer | ||
- https://github.com/ogmini/Notepad-State-Library | ||
- https://github.com/Nordgaren/tabstate-util | ||
- https://github.com/Nordgaren/tabstate-util/issues/1 | ||
- https://medium.com/@mahmoudsoheem/new-digital-forensics-artifact-from-windows-notepad-527645906b7b | ||
|
||
Yields a WindowsNotepadSavedTabRecord or WindowsNotepadUnsavedTabRecord with fields: | ||
|
||
.. code-block:: text | ||
|
||
content (string): The content of the tab. | ||
path (path): The path to the tab file. | ||
deleted_content (string): The deleted content of the tab, if available. | ||
hashes (digest): A digest of the tab content. | ||
saved_path (path): The path where the tab was saved. | ||
modification_time (datetime): The modification time of the tab. | ||
""" | ||
for file, user in self.users_tabs: | ||
# Parse the file | ||
tab: WindowsNotepadTab = WindowsNotepadTab(file) | ||
|
||
if tab.is_saved: | ||
yield WindowsNotepadSavedTabRecord( | ||
content=tab.content, | ||
path=tab.file, | ||
deleted_content=tab.deleted_content, | ||
hashes=digest((None, None, tab.tab_header.sha256.hex())), | ||
saved_path=tab.tab_header.filePath, | ||
modification_time=wintimestamp(tab.tab_header.timestamp), | ||
_target=self.target, | ||
_user=user, | ||
) | ||
else: | ||
yield WindowsNotepadUnsavedTabRecord( | ||
content=tab.content, | ||
path=tab.file, | ||
_target=self.target, | ||
_user=user, | ||
deleted_content=tab.deleted_content, | ||
) |
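
The variable-size block loop in `_process_tab_file` amounts to replaying a log of insertions and deletions. The sketch below shows that idea in isolation, without dissect or any binary parsing. It is not the code from this PR: `replay_edits` and the `(offset, n_deleted, added_text)` tuples are hypothetical stand-ins for the parsed `variable_size_data_block` entries, and it uses slice assignment where the diff inserts one character at a time.

```python
from __future__ import annotations


def replay_edits(edits: list[tuple[int, int, str]]) -> tuple[str, str]:
    """Replay (offset, n_deleted, added_text) edits.

    Returns (content, deleted_content). The tuple layout is an assumption made
    for this sketch; the plugin reads these values from parsed structs.
    """
    text: list[str] = []  # one list item per character, so offsets map to indices
    deleted = ""

    for offset, n_deleted, added in edits:
        if added:
            # Insert the new characters at the recorded offset.
            text[offset:offset] = list(added)
        elif n_deleted > 0:
            # Keep what is about to be removed, then drop it from the text.
            deleted += "".join(text[offset : offset + n_deleted])
            del text[offset : offset + n_deleted]

    return "".join(text), deleted


edits = [(0, 0, "hello"), (5, 0, " world"), (0, 5, ""), (0, 0, "HELLO")]
content, deleted = replay_edits(edits)
print(content)  # HELLO world
print(deleted)  # hello
```

Keeping the text as a list of characters means an edit can land at any offset regardless of the order in which blocks were written, which matches the reasoning given in the comments of the diff.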
Binary file added
BIN
+6.12 KB
tests/_data/plugins/apps/texteditor/windowsnotepad/3d0cc86e-dfc9-4f16-b74a-918c2c24188c.bin
Binary file not shown.
Binary file added
BIN
+145 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/3f915e17-cf6c-462b-9bd1-2f23314cb979.bin
Binary file not shown.
Binary file added
BIN
+250 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/85167c9d-aac2-4469-ae44-db5dccf8f7f4.bin
Binary file not shown.
Binary file added
BIN
+377 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/appclosed_saved_and_deletions.bin
Binary file not shown.
Binary file added
BIN
+63 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/appclosed_unsaved.bin
Binary file not shown.
Binary file added
BIN
+560 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/ba291ccd-f1c3-4ca8-949c-c01f6633789d.bin
Binary file not shown.
Binary file added
BIN
+200 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/c515e86f-08b3-4d76-844a-cddfcd43fcbb.bin
Binary file not shown.
Binary file added
BIN
+225 KB
tests/_data/plugins/apps/texteditor/windowsnotepad/cfe38135-9dca-4480-944f-d5ea0e1e589f.bin
Binary file not shown.
Binary file added
BIN
+330 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/dae80df8-e1e5-4996-87fe-b453f63fcb19.bin
Binary file not shown.
Binary file added
BIN
+138 KB
tests/_data/plugins/apps/texteditor/windowsnotepad/e609218e-94f2-45fa-84e2-f29df2190b26.bin
Binary file not shown.
Binary file added
BIN
+2.5 KB
tests/_data/plugins/apps/texteditor/windowsnotepad/lots-of-deletions.bin
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+268 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/stored_unsaved_with_new_data.bin
Binary file not shown.
Binary file added
BIN
+460 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/unsaved-with-deletions.bin
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+145 Bytes
tests/_data/plugins/apps/texteditor/windowsnotepad/wrong-checksum.bin
Binary file not shown.
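
The `wrong-checksum.bin` fixture presumably exercises the CRC32 warning path. As a rough illustration of what that check does for a variable-size block, the sketch below builds a block by hand and verifies its trailing checksum. Everything except the big-endian `zlib.crc32` call is an assumption for illustration: `encode_uleb128`, `build_block` and `crc_matches` are hypothetical helpers, the text is assumed to be serialised as UTF-16LE, and only characters in the Basic Multilingual Plane are considered so that `len(text)` equals the number of 16-bit units.

```python
import zlib


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte, low group first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more groups follow
        else:
            out.append(byte)
            return bytes(out)


def calc_crc32(data: bytes) -> bytes:
    """CRC32 as four big-endian bytes, mirroring _calc_crc32 in the diff."""
    return zlib.crc32(data).to_bytes(length=4, byteorder="big")


def build_block(offset: int, n_deleted: int, text: str) -> bytes:
    """Hypothetical helper: serialise offset, nDeleted, nAdded and data, then append the CRC32."""
    body = (
        encode_uleb128(offset)
        + encode_uleb128(n_deleted)
        + encode_uleb128(len(text))  # assumes BMP-only text
        + text.encode("utf-16-le")
    )
    return body + calc_crc32(body)


def crc_matches(block: bytes) -> bool:
    """Compare the trailing four bytes with the CRC32 of everything before them."""
    return block[-4:] == calc_crc32(block[:-4])


good = build_block(0, 0, "hi")
bad = good[:-1] + bytes([good[-1] ^ 0xFF])  # corrupt the last checksum byte
print(crc_matches(good), crc_matches(bad))  # True False
```

As in the diff, a mismatch here only signals corruption; the parser logs a warning and keeps the recovered text rather than discarding it.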
Empty file.