.github/actions/spelling/expect/expect.txt (1 change: 1 addition & 0 deletions)
@@ -1017,6 +1017,7 @@ minkernel
 MINMAXINFO
 minwin
 minwindef
+misprediction
 MMBB
 mmcc
 MMCPL
src/terminal/adapter/SixelParser.cpp (89 changes: 75 additions & 14 deletions)
@@ -729,39 +729,100 @@ void SixelParser::_decreaseFilledBackgroundHeight(const int decreasedHeight) noe
     }
 }
 
+#pragma warning(push)
+#pragma warning(disable : 26429) // Symbol 'imageBufferPtr' is never tested for nullness, it can be marked as not_null (f.23).
+#pragma warning(disable : 26481) // Don't use pointer arithmetic. Use span instead (bounds.1).
+#pragma warning(disable : 26490) // Don't use reinterpret_cast (type.1).
+
 void SixelParser::_writeToImageBuffer(int sixelValue, int repeatCount)
 {
     // On terminals that support the raster attributes command (which sets the
     // background size), the background is only drawn when the first sixel value
     // is received. So if we haven't filled it yet, we need to do so now.
     _fillImageBackground();
 
+    repeatCount = std::min(repeatCount, _imageMaxWidth - _imageCursor.x);
+    if (repeatCount <= 0)
+    {
+        return;
+    }
+
+    if (sixelValue == 0)
+    {
+        _imageCursor.x += repeatCount;
+        return;
+    }
+
+    // This allows us to unsafely cast _imageBuffer to int16_t
+    // and benefit from compiler/STL optimizations.
+    static_assert(sizeof(IndexedPixel) == sizeof(int16_t));
+    static_assert(alignof(IndexedPixel) == alignof(int16_t));
+
     // Then we need to render the 6 vertical pixels that are represented by the
     // bits in the sixel value. Although note that each of these sixel pixels
     // may cover more than one device pixel, depending on the aspect ratio.
     const auto targetOffset = _imageCursor.y * _imageMaxWidth + _imageCursor.x;
-    auto imageBufferPtr = std::next(_imageBuffer.data(), targetOffset);
-    repeatCount = std::min(repeatCount, _imageMaxWidth - _imageCursor.x);
-    for (auto i = 0; i < 6; i++)
+    const auto foreground = std::bit_cast<int16_t>(_foregroundPixel);
+    auto imageBufferPtr = reinterpret_cast<int16_t*>(_imageBuffer.data() + targetOffset);
Comment on lines +765 to +766 (Member Author):
Initially, I did this int16_t cast because std::fill contains optimizations for primitive types (those that can be memset'd). To my dismay, I found out today that the STL has since added a check for "are all bits zero? If so, use memset": microsoft/STL@a27d894
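The comment above relies on viewing a two-byte pixel struct as a plain `int16_t` so that `std::fill_n` operates on a primitive type. A minimal portable sketch of that idea follows; it is not the PR's code. `Pixel`, `toBits`, `fromBits`, and `fillRow` are hypothetical names, `std::memcpy` stands in for the C++20 `std::bit_cast` the PR uses, and the sketch stores pixels in an `int16_t` buffer from the start so it avoids the `reinterpret_cast` aliasing question entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for SixelParser::IndexedPixel (two bytes, 2-byte aligned).
struct alignas(int16_t) Pixel
{
    uint8_t transparent = false;
    uint8_t colorIndex = 0;
};
static_assert(sizeof(Pixel) == sizeof(int16_t));
static_assert(alignof(Pixel) == alignof(int16_t));

// Same effect as std::bit_cast<int16_t>(p) in C++20.
int16_t toBits(const Pixel& p)
{
    int16_t bits;
    std::memcpy(&bits, &p, sizeof(bits));
    return bits;
}

// Inverse of toBits: reinterpret a 16-bit pattern as a Pixel.
Pixel fromBits(int16_t bits)
{
    Pixel p;
    std::memcpy(&p, &bits, sizeof(p));
    return p;
}

// Fills `count` pixels starting at `offset` with one 16-bit value.
// Because the element type is a primitive, the standard library is free to
// pick a specialized fill (for example memset when all bits are zero).
void fillRow(std::vector<int16_t>& buffer, std::size_t offset, std::size_t count, const Pixel& color)
{
    assert(offset + count <= buffer.size());
    std::fill_n(buffer.begin() + static_cast<std::ptrdiff_t>(offset), count, toBits(color));
}
```

Whether a particular STL takes a memset path for a given value is an implementation detail; the sketch only shows the type-level conversion that makes such a path possible.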

// A aspect ratio of 1:1 is the most common and worth optimizing.
if (_pixelAspectRatio == 1)
{
if (sixelValue & 1)
do
{
auto repeatAspectRatio = _pixelAspectRatio;
do
// This gets unrolled by MSVC. It's written this way to use CMOV instructions.
// Modern CPUs have fat caches and deep pipelines. It's better to do pointless reads
// from memory than causing branch misprediction, as sixelValue is highly random.
for (int i = 0; i < 6; i++)
{
std::fill_n(imageBufferPtr, repeatCount, _foregroundPixel);
std::advance(imageBufferPtr, _imageMaxWidth);
} while (--repeatAspectRatio > 0);
}
else
const auto test = sixelValue & (1 << i);
// Possibly pointless read from memory, but...
const auto before = imageBufferPtr[i * _imageMaxWidth];
// ...it allows the compiler to turn this into a CMOV (= no branch misprediction).
const auto after = test ? foreground : before;
imageBufferPtr[i * _imageMaxWidth] = after;
}
Comment on lines +773 to +784 (Member Author):
I added some comments, but the idea here is that sixelValue is highly random, which leads to branch mispredictions when bit-testing it. We can avoid that by always loading the old bitmap value and then CMOVing between the old and new value before storing it again. With modern CPUs and their deep pipelines, this runs faster.
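A self-contained sketch of the trade described in this comment. The hypothetical `writeColumnBranchy` tests each of the six bits with an `if`, while `writeColumnSelect` always reads the old value and picks between it and the foreground with a conditional expression, which is the shape compilers can lower to CMOV. Whether a given compiler actually emits CMOV depends on the compiler and flags; the sketch only demonstrates that both forms write identical pixels. `kStride` and `Image` are illustrative assumptions, not names from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kStride = 4; // hypothetical image width in pixels
constexpr int kRows = 6;   // one sixel covers six rows

using Image = std::array<int16_t, kStride * kRows>;

// Branching version: only touches pixels whose bit is set, so every bit test
// is a data-dependent branch.
void writeColumnBranchy(int16_t* column, int sixelValue, int16_t foreground)
{
    assert(sixelValue >= 0 && sixelValue < 64); // sixel values are 6 bits
    for (int i = 0; i < kRows; i++)
    {
        if (sixelValue & (1 << i))
        {
            column[i * kStride] = foreground;
        }
    }
}

// Select version: every pixel is read and written back; the bit only chooses
// which value is stored, so there is no branch on sixelValue to mispredict.
void writeColumnSelect(int16_t* column, int sixelValue, int16_t foreground)
{
    assert(sixelValue >= 0 && sixelValue < 64);
    for (int i = 0; i < kRows; i++)
    {
        const auto test = sixelValue & (1 << i);
        const auto before = column[i * kStride];
        const auto after = test ? foreground : before;
        column[i * kStride] = after;
    }
}
```

The cost of the select form is one extra load and store per unset bit; the PR's bet is that this is cheaper than a mispredicted branch when the bit pattern is unpredictable.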

_imageCursor.x += 1;
imageBufferPtr += 1;
repeatCount -= 1;
} while (repeatCount > 0);
}
else
{
for (auto i = 0; i < 6; i++)
{
std::advance(imageBufferPtr, _imageMaxWidth * _pixelAspectRatio);
if (sixelValue & 1)
{
auto repeatAspectRatio = _pixelAspectRatio;
do
{
// If this used std::fill_n or just a primitive loop, MSVC would compile
// it to a `rep stosw` instruction, which has a high startup cost.
// This is not ideal when our repeatCount is almost always small.
// The way this does ptr++ and len-- is also ideal for optimization.
auto ptr = imageBufferPtr;
auto remaining = repeatCount;
do
{
__iso_volatile_store16(ptr++, foreground);
} while (--remaining != 0);
Comment on lines +800 to +809 (Member Author):
To work around std::fill being bloated by its memset special case, and to work around the compiler inserting a pointless rep stos here, we can use volatile writes. This compiles down to a very small, compact loop, which makes CPUs happy.
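`__iso_volatile_store16` is an MSVC intrinsic. A rough portable analogue, sketched here on the assumption that a store through a `volatile int16_t*` is an acceptable substitute, is shown below: because each volatile store must be performed individually, the compiler cannot collapse the loop into `memset` or `rep stosw`. `fillVolatile` is a hypothetical name. The `do`/`while` shape mirrors the PR's loop and requires `count >= 1`, which the PR guarantees with its early return for `repeatCount <= 0`.

```cpp
#include <cassert>
#include <cstdint>

// Writes `count` copies of `value` starting at `dst`.
// Each store goes through a volatile pointer, so the compiler must emit one
// 16-bit store per element instead of turning the loop into memset/rep stosw.
void fillVolatile(int16_t* dst, int count, int16_t value)
{
    assert(count > 0); // the do/while below always executes at least once
    volatile int16_t* ptr = dst;
    auto remaining = count;
    do
    {
        *ptr++ = value;
    } while (--remaining != 0);
}
```

The same property also prevents auto-vectorization, so this only pays off when runs are short, which is the situation the PR's comment describes for repeatCount.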

imageBufferPtr += _imageMaxWidth;
} while (--repeatAspectRatio > 0);
}
else
{
std::advance(imageBufferPtr, _imageMaxWidth * _pixelAspectRatio);
}
sixelValue >>= 1;
}
sixelValue >>= 1;
_imageCursor.x += repeatCount;
}
_imageCursor.x += repeatCount;
}

#pragma warning(pop)

void SixelParser::_eraseImageBufferRows(const int rowCount, const til::CoordType rowOffset) noexcept
{
const auto pixelCount = rowCount * _cellSize.height;
Expand Down
src/terminal/adapter/SixelParser.hpp (4 changes: 2 additions & 2 deletions)
@@ -42,7 +42,7 @@ namespace Microsoft::Console::VirtualTerminal
         // to retain the 16-bit size.
         static constexpr size_t MAX_COLORS = 256;
         using IndexType = uint8_t;
-        struct IndexedPixel
+        struct alignas(int16_t) IndexedPixel
         {
             uint8_t transparent = false;
             IndexType colorIndex = 0;
@@ -104,8 +104,8 @@ namespace Microsoft::Console::VirtualTerminal
         const size_t _maxColors;
         size_t _colorsUsed = 0;
         size_t _colorsAvailable = 0;
-        bool _colorTableChanged = false;
         IndexedPixel _foregroundPixel = {};
+        bool _colorTableChanged = false;
 
         void _initImageBuffer();
         void _resizeImageBuffer(const til::CoordType requiredHeight);
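Why the header needs `alignas(int16_t)`: a struct made of two `uint8_t` members has an alignment of 1 on mainstream ABIs, so without the specifier an `IndexedPixel` array is not guaranteed to start on a 2-byte boundary, and the `alignof` assertion in `_writeToImageBuffer` would fail to compile. The sketch below uses hypothetical `PlainPixel` and `AlignedPixel` copies of the struct to show the difference; `alignof(int16_t)` is implementation-defined but is 2 on the platforms Windows Terminal targets.

```cpp
#include <cstdint>

// Hypothetical copy of the struct before the change.
struct PlainPixel
{
    uint8_t transparent = false;
    uint8_t colorIndex = 0;
};

// Hypothetical copy of the struct after the change.
struct alignas(int16_t) AlignedPixel
{
    uint8_t transparent = false;
    uint8_t colorIndex = 0;
};

// Both are two bytes, but only the alignas version is guaranteed to start on an
// int16_t boundary, which is what makes viewing the buffer as int16_t* well-aligned.
static_assert(sizeof(PlainPixel) == sizeof(int16_t));
static_assert(sizeof(AlignedPixel) == sizeof(int16_t));
static_assert(alignof(AlignedPixel) == alignof(int16_t));
```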