Crawler indexes all pages on a particular domain rather than only pages under a path #6

@Amolith

When running Lieu over all the sites in the fediring, we've found that the crawler is bounded only by domain rather than by domain + path. This causes quirks with shared static site hosts like cronut.cafe: the only cronut.cafe user who's also a member of the ring is ~sfr, but several other users who aren't members have been indexed as well: https://search.fediring.net/?q=cronut

I think a good solution might be to keep track of not only the domain being crawled but also the original URL, and to ignore links that point to parent directories of that URL.
