Closed
Labels: help wanted (This would make a good PR)
Description
Currently, the Cheerio document loader is hard-coded to extract the text of the entire page using `$("body")`. However, a page typically has a main content area, which holds the important text, surrounded by elements that are not relevant:
```typescript
async load(): Promise<Document[]> {
  const $ = await this.scrape();
  const text = $("body").text();
  const metadata = { source: this.webPath };
  return [new Document({ pageContent: text, metadata })];
}
```
It would be useful to be able to pass an optional jQuery-style selector to target the exact content to extract.
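A minimal sketch of what the requested option could look like. It assumes a hypothetical `SelectorHtmlLoader` that receives the raw HTML directly (the real loader fetches the page in `scrape()`) and returns plain objects rather than LangChain `Document` instances. The `selector` option name and its `"body"` default are assumptions, chosen so that omitting the option keeps the current behavior; `cheerio.load` and `$(selector).text()` are standard Cheerio calls.

```typescript
import * as cheerio from "cheerio";

// Hypothetical stand-in for a LangChain Document, to keep the sketch self-contained.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical options object: `selector` is the proposed jQuery-style selector.
interface SelectorLoaderOptions {
  selector?: string;
}

class SelectorHtmlLoader {
  private readonly selector: string;

  constructor(
    private readonly webPath: string,
    private readonly html: string, // assumption: HTML is passed in instead of fetched
    options: SelectorLoaderOptions = {}
  ) {
    // Fall back to "body" so existing callers see no change in output.
    this.selector = options.selector ?? "body";
  }

  async load(): Promise<SimpleDocument[]> {
    const $ = cheerio.load(this.html);
    // Extract text only from the elements matched by the configured selector.
    const text = $(this.selector).text();
    const metadata = { source: this.webPath };
    return [{ pageContent: text, metadata }];
  }
}
```

With `<nav>Menu</nav><article>Main text</article>` as the page body, passing `{ selector: "article" }` yields only `"Main text"`, while the default yields `"MenuMain text"`, since Cheerio's `.text()` concatenates the text nodes of the matched elements without separators.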