
[Bug] - maintainer package HTML page is not being returned successfully #1147

@art1f1c3R

Description

The package page returned by PyPIRegistry.get_package_page is once again a JavaScript "Client Challenge" page instead of the actual package page. Previously, appending a trailing / to the URL avoided this, but that workaround no longer appears to work.
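For reference, get_package_page builds the page URL with urllib.parse.urljoin, as in the snippet under Steps to Reproduce. The minimal sketch below prints the two URL variants that were tried, with and without the trailing /. It assumes a registry base URL of https://pypi.org (the configured value is not stated in this issue), and the helper name package_page_urls is hypothetical, not part of Macaron.

```python
import urllib.parse

# Assumption: the registry base URL; Macaron's configured value may differ.
REGISTRY_URL = "https://pypi.org"


def package_page_urls(registry_url: str, package_name: str) -> tuple[str, str]:
    """Hypothetical helper: return the package page URL with and without a trailing '/'."""
    with_slash = urllib.parse.urljoin(registry_url, f"project/{package_name}/")
    without_slash = urllib.parse.urljoin(registry_url, f"project/{package_name}")
    return with_slash, without_slash


if __name__ == "__main__":
    # Print both variants for the package used in this report.
    for url in package_page_urls(REGISTRY_URL, "ajax-requester"):
        print(url)
```

Both variants resolve to the same /project/ajax-requester path on the registry host; they differ only in the trailing slash.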

Steps to Reproduce

  1. Step 1: Add a print statement (or a file write) to inspect the HTML returned by get_package_page. For example:
    def get_package_page(self, package_name: str) -> str | None:
        # Important: trailing '/' avoids JS-based redirect; ensures Macaron can access the page directly
        url = urllib.parse.urljoin(self.registry_url, f"project/{package_name}/")
        response = send_get_http_raw(url)
        if response:
            html_snippets = response.content.decode("utf-8")
            print(f"{html_snippets}")
        ...
  2. Step 2: Run Macaron on ajax-requester, the package used in one of the integration test cases: macaron -v analyze -purl pkg:pypi/ajax-requester
  3. Step 3: Observe the HTML output. I tried this both with and without the trailing / in the URL; both returned:
 <!DOCTYPE html>
<html lang="en">
  <head>
    <meta
      http-equiv="Content-Security-Policy"
      content="default-src 'self'; img-src 'self' data:; media-src 'self' data:; object-src 'none'; style-src 'self' 'sha256-o4vzfmmUENEg4chMjjRP9EuW9ucGnGIGVdbl8d0SHQQ='; script-src 'self' 'sha256-KXex2o39zxtnzVWK4H5rW07g2+BlwSPtn+aguzsWkNg=';"
    />
    <link
      href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/inter-var.woff2"
      rel="preload"
      as="font"
      type="font/woff2"
      crossorigin
    />
    <link href="/_fs-ch-1T1wmsGaOgGaSxcX/assets/styles.css" rel="stylesheet" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Client Challenge</title>
    <style>
      #loading-error {
        font-size: 16px;
        font-family: 'Inter', sans-serif;
        margin-top: 10px;
        margin-left: 10px;
        display: none;
      }
    </style>
  </head>
  <body>
    <noscript>
      <div class="noscript-container">
        <div class="noscript-content">
          <img
            src="/_fs-ch-1T1wmsGaOgGaSxcX/assets/errorIcon.svg"
            alt=""
            role="presentation"
            class="error-icon"
          />
          <span class="noscript-span"
            >JavaScript is disabled in your browser.</span
          >
          <p>Please enable JavaScript to proceed.</p>
        </div>
      </div>
    </noscript>
    <div id="loading-error" role="alert" aria-live="polite">
      A required part of this site couldn’t load. This may be due to a browser
      extension, network issues, or browser settings. Please check your
      connection, disable any ad blockers, or try using a different browser.
    </div>
    <script>
      function loadScript(src) {
        return new Promise((resolve, reject) => {
          const script = document.createElement('script');
          script.onload = resolve;
          script.onerror = (event) => {
            console.error('Script load error event:', event);
            document.getElementById('loading-error').style.display = 'block';
            loadingError.setAttribute('aria-hidden', 'false');
            reject(
              new Error(
                `Failed to load script: ${src}, Please contact the service administrator.`
              )
            );
          };
          script.src = src;
          document.body.appendChild(script);
        });
      }

      loadScript('/_fs-ch-1T1wmsGaOgGaSxcX/errors.js')
        .then(() => {
          const script = document.createElement('script');
          script.src = '/_fs-ch-1T1wmsGaOgGaSxcX/script.js?reload=true';
          script.onerror = (event) => {
            console.error('Script load error event:', event);
            const errorMsg = new Error(
              `Failed to load script: ${script.src}. Please contact the service administrator.`
            );
            console.error(errorMsg);
            handleScriptError();
          };
          document.body.appendChild(script);
        })
        .catch((error) => {
          console.error(error);
        });
    </script>
  </body>
</html>
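To check whether a response is this challenge page rather than the package page, without modifying Macaron, a sketch like the following could be used. It is an assumption-based heuristic, not Macaron code: is_client_challenge_page and fetch_package_page are hypothetical helpers, and the markers it looks for (the "Client Challenge" title and the /_fs-ch- asset path prefix) are taken from the output above and may change at any time. urllib may also send different request headers than Macaron's send_get_http_raw, so the server's response may differ.

```python
from __future__ import annotations

import urllib.request
from html.parser import HTMLParser

# Markers observed in the challenge page above (assumption: they may change).
CHALLENGE_TITLE = "Client Challenge"
CHALLENGE_ASSET_PREFIX = "/_fs-ch-"


class _ChallengeMarkerParser(HTMLParser):
    """Collect the <title> text and note any src/href pointing at a challenge asset."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""
        self.has_challenge_asset = False

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "title":
            self._in_title = True
        for name, value in attrs:
            # Valueless attributes such as `crossorigin` arrive with value None.
            if name in ("src", "href") and value and value.startswith(CHALLENGE_ASSET_PREFIX):
                self.has_challenge_asset = True

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.title += data


def is_client_challenge_page(html: str) -> bool:
    """Hypothetical check: True if the HTML looks like the JS client-challenge page."""
    parser = _ChallengeMarkerParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() == CHALLENGE_TITLE or parser.has_challenge_asset


def fetch_package_page(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch a URL with the standard library and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for url in (
        "https://pypi.org/project/ajax-requester/",
        "https://pypi.org/project/ajax-requester",
    ):
        page = fetch_package_page(url)
        verdict = "challenge page" if is_client_challenge_page(page) else "package page"
        print(f"{url} -> {verdict}")
```

Running the script prints, for each URL variant, whether the response looked like a challenge page or a regular package page.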

As a result, heuristics such as CloserReleaseJoinDate produce inaccurate results. In this case, HeuristicResult.FAIL is always returned, because the maintainers_join_date list is empty rather than None.
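One way to keep an empty scrape from turning into a FAIL is to treat a missing list and an empty list the same way. The sketch below assumes a "no data" SKIP outcome is available; it uses a local stand-in enum rather than Macaron's HeuristicResult. Its closeness check and max_gap default are illustrative placeholders, not CloserReleaseJoinDate's actual rule, and evaluate_join_dates is a hypothetical name.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from enum import Enum


class HeuristicResult(Enum):
    """Local stand-in for Macaron's HeuristicResult (assumption: SKIP exists)."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


def evaluate_join_dates(
    maintainers_join_date: list[datetime] | None,
    release_date: datetime,
    max_gap: timedelta = timedelta(days=5),  # hypothetical threshold
) -> HeuristicResult:
    # None and [] both mean "no join dates could be scraped", e.g. because the
    # package page was a client-challenge page. Neither is evidence of risk.
    if not maintainers_join_date:
        return HeuristicResult.SKIP
    # Illustrative closeness check: fail if any maintainer joined at most
    # max_gap before the release.
    for join_date in maintainers_join_date:
        if timedelta(0) <= release_date - join_date <= max_gap:
            return HeuristicResult.FAIL
    return HeuristicResult.PASS
```

The key design choice is the single truthiness check at the top: an empty list is handled exactly like None instead of falling through to the comparison.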

Expected Behavior

The package page HTML should be returned.

Actual Behavior

A JavaScript "Client Challenge" page is returned instead of the package page (see the HTML output above).

Debug Information

See the HTML output above. Macaron produces no errors or warnings for this issue.

Environment Information

Operating System: Ubuntu 24.04.1 LTS (Noble Numbat) on WSL version 2.5.9.0

CPU architecture information: x86-64

Bash Version: 5.2.21(1)-release (x86_64-pc-linux-gnu)

Python version (virtual environment): 3.11.10

Metadata

Labels: bug (Something isn't working), triage (The issue needs to be triaged)
