Skip to content

Cache the ApiData values we produce for an AdditionalText file. #6556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Mar 30, 2023

Conversation

CyrusNajmabadi
Copy link
Member

Typing in roslyn shows this to be the highest allocation hitter. This is because diagnostics will end up calling into this analyzer, which then rereads teh SourceText for these additional files, over and over again.

For example, this was typing just for just a few seconds:

image

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner March 29, 2023 23:26
@@ -147,83 +169,110 @@ static void RegisterImplActions(CompilationStartAnalysisContext compilationConte
}
}

private static ApiData ReadApiData(List<(string path, SourceText sourceText)> data, bool isShippedApi)
private static ApiData ReadApiData(string path, SourceText sourceText, bool isShippedApi)
Copy link
Member Author

@CyrusNajmabadi CyrusNajmabadi Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now computes teh ApiData for a single file. handling of multiple files is taken care of by the caller.

@CyrusNajmabadi CyrusNajmabadi requested a review from sharwell March 30, 2023 00:11
@@ -20,21 +20,24 @@ namespace Microsoft.CodeAnalysis.PublicApiAnalyzers
{
public partial class DeclarePublicApiAnalyzer : DiagnosticAnalyzer
{
private sealed record AdditionalFileInfo(string Path, SourceText SourceText, bool IsShippedApi);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common data shared among all ApiLine instances.

private sealed class ApiLine
private sealed record AdditionalFileInfo(string Path, SourceText SourceText, bool IsShippedApi);

private readonly record struct ApiLine
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made into a simple struct, which also has a pointer to the common data stored across all lines from teh same file.

@codecov
Copy link

codecov bot commented Mar 30, 2023

Codecov Report

Merging #6556 (ba8f956) into main (b84dbd4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6556   +/-   ##
=======================================
  Coverage   96.43%   96.43%           
=======================================
  Files        1372     1372           
  Lines      320278   320264   -14     
  Branches    10295    10293    -2     
=======================================
- Hits       308848   308839    -9     
+ Misses       8975     8971    -4     
+ Partials     2455     2454    -1     

}
public SourceText SourceText => FileInfo.SourceText;
public string Path => FileInfo.Path;
public bool IsShippedApi => FileInfo?.IsShippedApi is true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileInfo?

why null check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a struct, and used in an fashion where it can be in the default state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should the other ones be checked then too?

@ToddGrun
Copy link

        void CheckAndRegisterImplementation(bool isPublic)

could this be made static if you passed in the compilationContext?


Refers to: src/PublicApiAnalyzers/Core/Analyzers/DeclarePublicApiAnalyzer.cs:115 in a86ecec. [](commit_id = a86ecec, deletion_comment = False)

maxNullableRank = Math.Max(rank, maxNullableRank);
continue;
}
lineNumber++;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lineNumber++;

I assume it's intentional that this doesn't count ws only / empty lines, but it seems odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup yup. im preserving semantics here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @sharwell can verify it's the intended behavior, it definitely doesn't match what the variable name indicates to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my preference is to not change this. this was the behavior from before. If we change it, we risk breaking things for some customer who does have blank lines and then the nullable directive :)

foreach (var (path, sourceText) in data)
{
int rank = -1;
// current line we're on. Note: we ignore whitespace lines when computin gthis.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computin gthis

nit: typo

{
if (!TryGetApiText(analyzerOptions.AdditionalFiles, isPublic, cancellationToken, out var shippedText, out var unshippedText))
using var allShippedData = ArrayBuilder<ApiData>.GetInstance();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using var allShippedData

This threw me for a loop until I saw this was coming from an implementation in this repo. I assume we don't want a similar disposable pattern for the one in Roslyn?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Roslyn one is disposable as well, just a slightly different pattern (due to compiler team concerns)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, I'm not seeing the Roslyn ones as IDisposable from code inspection, but I'm probably missing it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a struct disposable wrapper around ours. look for using var _ = ArrayBuilder... in roslyn.sln :)

/// Cache from additional text instance to the api data we have read out for that specific file. We only store
/// data for additional texts that explicitly match the public/internal api file names we expect.
/// </summary>
private static readonly ConditionalWeakTable<AdditionalText, ApiData> s_additionalTextToApiData = new();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don’t use a CWT here. We already have an API on AnalysisContext to allow caching per-file data across compilations: https://github.com/dotnet/roslyn/blob/3a7a7407ea3c831630ecf2754092c33df3a6e452/src/Compilers/Core/Portable/DiagnosticAnalyzer/DiagnosticAnalysisContext.cs#L234

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, we also use a CWT, so the API already provides a simple way to achieve caching:
Note

Example test analyzer for this API: https://github.com/dotnet/roslyn/blob/3a7a7407ea3c831630ecf2754092c33df3a6e452/src/Compilers/Test/Core/Diagnostics/CommonDiagnosticAnalyzers.cs#L1605

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can do this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So i don't seem to be able to use that helper. it is keyed off of SourceText. But avoiding the source-text is the point here. I need to keep off of AdditionalFile. Thoughts @mavasani ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But avoiding the source-text is the point here

I am not sure I understand. SourceText is already strongly held by the AdditionalText: https://github.com/dotnet/roslyn/blob/3a7a7407ea3c831630ecf2754092c33df3a6e452/src/Compilers/Core/Portable/AdditionalTextFile.cs#L49-L53

Isn't the key point here avoiding repeated compute of the data associated with each AdditionalText file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Went deep down the rabit hole of trying to make this keyed off sourcetext. It's painful. Effectively, the data we compute off the source-text holds onto data outside of the source text (specifically, things like the path and other information about the original additionaltext). I went down the path of trying to extract this out (e.g. RawApiData, ApiData, RawApiLine, ApiLine) and have the map point from the SourceText to the raw data. But then there was so much wrapping of this i needed to do to make working with things palatable (and not have additional copies/allocatins happening).

If we make it so that hte context objects allow for storing arbitrary K/V and not just Text/Value or Tree/Value, then this will be trivial to do.

I discussed this with manish, and we decided that would take too much time. So we're going to accept that this is a CWT. However, i'm making it non-static so it has the same lifetime as the analyzer (And whatever owns it).

@CyrusNajmabadi CyrusNajmabadi requested a review from mavasani March 30, 2023 03:44
Copy link

@mavasani mavasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'd like a signoff from @ToddGrun before merge.

@CyrusNajmabadi CyrusNajmabadi requested a review from ToddGrun March 30, 2023 05:46
@ToddGrun
Copy link

LGTM. I am curious why we reach this code so often (I believe it's multiple times per keystroke). I thought Cyrus mentioned that we had a cache that should hit before this point.

@CyrusNajmabadi
Copy link
Member Author

@mavasani it would def be good to see if there's something causing caching of diagnostic info to not work properly. Or if the diagnostic subsystem itself is calling this stuff multiple times over different codepaths.

@CyrusNajmabadi CyrusNajmabadi merged commit 9a0d72e into main Mar 30, 2023
@CyrusNajmabadi CyrusNajmabadi deleted the readTextLess branch March 30, 2023 14:46
@github-actions github-actions bot added this to the vNext milestone Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants