Contributor

@barjin barjin commented Sep 12, 2025

Phasing out got-scraping-specific interfaces in favour of the native fetch API.

Closes #3071

@barjin barjin self-assigned this Sep 12, 2025
@barjin barjin marked this pull request as draft September 12, 2025 15:09
@barjin barjin marked this pull request as ready for review September 17, 2025 11:43
]);
expect(isFromCache).toEqual({ first: false, second: true });
});
// test('should work with cacheable-request', async () => {
Contributor Author

This seems got-scraping-specific; I'm not sure we're using it anywhere (at scale).

Contributor

@janbuchar janbuchar left a comment

I haven't managed to read the whole thing yet, sorry

crawlingContext: CheerioCrawlingContext,
) {
const body = await readStreamToString(response);
protected override async _parseHTML(response: Response, isXml: boolean, crawlingContext: CheerioCrawlingContext) {
Contributor

This will cause a merge conflict with the ContextPipeline PR. I would like to go first if that's possible (the vessel that's harder to steer has the right of way).

@@ -1,5 +1,5 @@
 import type { BatchAddRequestsResult, Dictionary } from '@crawlee/types';
-import type { OptionsInit, Response as GotResponse } from 'got-scraping';
+import type { OptionsInit } from 'got-scraping';
Contributor

I would love to see this go as well

Contributor Author

I'd prefer to do that as part of a separate PR. Removing got-scraping (and all the type TODOs) is no small feat, and doing it all in one PR would make it hard to review.

* Perform an HTTP Request and return after the response headers are received. The body may be read from a stream contained in the response.
*/
-stream(request: HttpRequest, onRedirect?: RedirectHandler): Promise<StreamingHttpResponse>;
+stream(request: HttpRequest, onRedirect?: RedirectHandler): Promise<Response>;
Contributor

Isn't the stream method obsolete? A web Response can be streamed through response.body whenever the caller chooses to do so.
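
For illustration, a minimal sketch of what caller-side streaming could look like with a plain web Response. The helper name `readBodyInChunks` is hypothetical and not part of this PR or of Crawlee; it only relies on the standard `Response.body`, `ReadableStream.getReader()` and `TextDecoder` APIs.

```typescript
// Hypothetical helper (not part of the PR): reads a web Response chunk by
// chunk through response.body instead of relying on a dedicated stream() call.
async function readBodyInChunks(response: Response): Promise<string> {
    // A Response constructed without a body exposes `body === null`.
    if (response.body === null) return '';

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let text = '';

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // `stream: true` keeps multi-byte characters intact across chunk boundaries.
        text += decoder.decode(value, { stream: true });
    }

    // Flush any bytes still buffered inside the decoder.
    text += decoder.decode();
    return text;
}
```

A caller that does not need streaming could call `response.text()` on the same object instead, which is why a single return type may be enough for both cases.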

Contributor Author

Good point, it actually is 👍 I'd prefer to do this in a separate PR, for the same reasons as the total got-scraping phase-out.
