Extractor variable support #5727

Open: murat-kekij wants to merge 2 commits into base: dev

@murat-kekij murat-kekij commented Oct 13, 2024

Proposed changes

Closes #2647

  • Adds variable support to regex extractors
  • Adds variable support to json extractors
Test Server
package main

import (
  "encoding/json"
  "fmt"
  "log"
  "net/http"
  "strings"
)

// DomainResponse represents the structure of the response where the domain name is a key
type DomainResponse map[string]interface{}

// Static data for the example
var exampleData = map[string]interface{}{
  "subdomains": []string{"api", "www", "test"},
  "ip":         "192.168.1.1",
  "region":     "us-east",
}

func main() {
  http.HandleFunc("/json-test", jsonHandler)
  http.HandleFunc("/regex-test", regexHandler)

  fmt.Println("Server is running on http://127.0.0.1:5005")
  log.Fatal(http.ListenAndServe(":5005", nil))
}

// jsonHandler returns a JSON object with a dynamic key (domain name based on the request URL)
func jsonHandler(w http.ResponseWriter, r *http.Request) {
  // Get the host from the request
  host := strings.Split(r.Host, ":")[0]

  // Create the response with the dynamic host as the key
  response := DomainResponse{
  	host: exampleData,
  }

  // Set response header to application/json
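  w.Header().Set("Content-Type", "application/json")
  w.WriteHeader(http.StatusOK)
  // Log encoding failures instead of silently dropping them
  if err := json.NewEncoder(w).Encode(response); err != nil {
  	log.Printf("failed to encode response: %v", err)
  }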
}

func regexHandler(w http.ResponseWriter, r *http.Request) {
  nonce := "abc123"
  scriptSrc := fmt.Sprintf("/static/main.%s.js", nonce)

  // HTML content with the script tag
  htmlContent := fmt.Sprintf(`
  <html>
  <head>
  	<title>Test Page</title>
  </head>
  <body>
  	<h1>Test Page With Dynamic Script Tag</h1>
  	<script src="%s"></script>
  </body>
  </html>
  `, scriptSrc)

  // Write HTML response
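  w.Header().Set("Content-Type", "text/html")
  w.WriteHeader(http.StatusOK)
  // Log write failures instead of silently dropping them
  if _, err := w.Write([]byte(htmlContent)); err != nil {
  	log.Printf("failed to write response: %v", err)
  }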
}
Test Json Extractor
id: http-variable-json-extractor

info:
  name: HTTP Variable JSON Extractor
  author: pdteam
  severity: info

http:
- method: GET
  path:
    - "{{BaseURL}}/json-test"

  extractors:
    - type: json
      part: body
      name: subdomains
      json:
        - '."{{FQDN}}".subdomains[]'
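As a rough illustration of what this query selects once the variable is resolved, the sketch below uses only the Go standard library rather than the jq engine the extractor relies on. It assumes that `{{FQDN}}` resolves to `127.0.0.1` when the target is `http://127.0.0.1:5005`; `subdomainsForHost` is a hypothetical helper written for this example and is not part of nuclei.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// subdomainsForHost mimics what the query ."{{FQDN}}".subdomains[] selects
// once {{FQDN}} has been replaced by host: the elements of the "subdomains"
// array stored under the top-level key equal to host.
// Hypothetical helper for illustration only.
func subdomainsForHost(body []byte, host string) ([]string, error) {
	var doc map[string]struct {
		Subdomains []string `json:"subdomains"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	entry, ok := doc[host]
	if !ok {
		return nil, fmt.Errorf("no top-level key %q in response", host)
	}
	return entry.Subdomains, nil
}

func main() {
	// Shape of the test server's /json-test response for host 127.0.0.1.
	body := []byte(`{"127.0.0.1":{"ip":"192.168.1.1","region":"us-east","subdomains":["api","www","test"]}}`)

	// The template query after substituting the (assumed) variable value.
	query := `."{{FQDN}}".subdomains[]`
	fmt.Println(strings.ReplaceAll(query, "{{FQDN}}", "127.0.0.1"))

	subs, err := subdomainsForHost(body, "127.0.0.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subs)
}
```

Because the top-level key is only known per target, a static query cannot address it, which is the case this template is meant to exercise.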
Test Regex Extractor
id: http-variable-regex-extractor

info:
  name: HTTP Variable Regex Extractor
  author: pdteam
  severity: info

http:
- method: GET
  path:
    - "{{BaseURL}}/regex-test"

  extractors:
    - type: regex
      part: body
      name: mainjs
      regex:
        - '{{script_regex}}'

Command

nuclei -t ./http-variable-regex-extractor.yaml -u http://127.0.0.1:5005 -var "script_regex=/static/main\.[a-zA-Z0-9]+\.js"
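To make the effect of the `-var` value concrete, here is a minimal sketch of resolving `{{script_regex}}` before compiling the pattern. It uses only the Go standard library; `resolvePlaceholders` and `extractWithVars` are hypothetical helpers written for illustration and are not nuclei's actual expression evaluator.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// resolvePlaceholders is a hypothetical stand-in for variable evaluation:
// it replaces each {{name}} with the matching value from vars and leaves
// unknown placeholders untouched.
func resolvePlaceholders(pattern string, vars map[string]string) string {
	for name, value := range vars {
		pattern = strings.ReplaceAll(pattern, "{{"+name+"}}", value)
	}
	return pattern
}

// extractWithVars resolves the pattern, compiles it, and returns all matches.
func extractWithVars(pattern, corpus string, vars map[string]string) ([]string, error) {
	resolved := resolvePlaceholders(pattern, vars)
	re, err := regexp.Compile(resolved)
	if err != nil {
		return nil, fmt.Errorf("compile %q: %w", resolved, err)
	}
	return re.FindAllString(corpus, -1), nil
}

func main() {
	// Fragment of the HTML served by the /regex-test handler.
	body := `<script src="/static/main.abc123.js"></script>`
	// Same value as passed on the command line via -var.
	vars := map[string]string{"script_regex": `/static/main\.[a-zA-Z0-9]+\.js`}

	matches, err := extractWithVars("{{script_regex}}", body, vars)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(matches)
}
```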

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • New Features

    • Regex and JSON extractors now support dynamic patterns that can be resolved at runtime using input data, allowing for more flexible and parameterized extraction.
  • Bug Fixes

    • Prevents errors by skipping compilation of extractor patterns that contain unresolved variables.
  • Tests

    • Updated tests to accommodate changes in extractor method signatures.

@auto-assign auto-assign bot requested a review from dogancanbakir October 13, 2024 17:22
@GeorginaReeder

Thanks so much for your contribution, @murat-kekij!

@tarunKoyalwar tarunKoyalwar self-requested a review December 1, 2024 16:34
@tarunKoyalwar tarunKoyalwar (Member) left a comment

There is a panic in the unit tests due to a nil pointer dereference.

This pull request has been automatically marked as stale due to inactivity. It will be closed in 7 days if no further activity occurs. Please update if you wish to keep it open.

@github-actions github-actions bot added the Status: Stale This issue/PR has been inactive for a while and may be closed soon if no further activity occ label Jul 27, 2025
coderabbitai bot (Contributor) commented Jul 27, 2025

Walkthrough

The changes introduce support for variable resolution in regex and JSON extractors. Extractor methods now accept a data map for dynamic variable evaluation. Compilation and extraction logic are updated to check for unresolved variables, evaluate them at runtime using the data context, and compile the resulting patterns or queries as needed. Method signatures and internal extractor calls are updated accordingly.

Changes

| Files / Paths | Change Summary |
| --- | --- |
| pkg/operators/extractors/compile.go | Added unresolved variable checks before compiling regex/JSON patterns; skips compilation if unresolved. |
| pkg/operators/extractors/extract.go | Updated ExtractRegex/ExtractJSON to accept data map, evaluate variables at runtime, and compile dynamically. Method signatures changed. |
| pkg/operators/extractors/extract_test.go | Updated test calls to ExtractRegex/ExtractJSON to include additional nil argument for new parameter. |
| pkg/protocols/dns/operators.go, pkg/protocols/file/operators.go, pkg/protocols/headless/operators.go, pkg/protocols/http/operators.go, pkg/protocols/network/operators.go, pkg/protocols/offlinehttp/operators.go, pkg/protocols/protocols.go | Updated internal calls to ExtractRegex/ExtractJSON to pass the data map as an argument. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Extractor
    participant Expressions

    Caller->>Extractor: ExtractRegex(corpus, data)
    loop For each regex pattern
        Extractor->>Expressions: ContainsUnresolvedVariables(pattern)
        alt Unresolved variables found
            Extractor->>Expressions: Evaluate(pattern, data)
            Extractor->>Extractor: Compile evaluated pattern
            alt Compilation fails
                Extractor->>Extractor: Log warning, skip
            end
        else No unresolved variables
            Extractor->>Extractor: Use precompiled pattern
        end
        Extractor->>Extractor: Extract matches
    end
    Extractor-->>Caller: Return unique matches
sequenceDiagram
    participant Caller
    participant Extractor
    participant Expressions

    Caller->>Extractor: ExtractJSON(corpus, data)
    loop For each JSON query
        Extractor->>Expressions: ContainsUnresolvedVariables(query)
        alt Unresolved variables found
            Extractor->>Expressions: Evaluate(query, data)
            Extractor->>Extractor: Compile evaluated query
            alt Compilation fails
                Extractor->>Extractor: Log warning, skip
            end
        else No unresolved variables
            Extractor->>Extractor: Use precompiled query
        end
        Extractor->>Extractor: Extract results
    end
    Extractor-->>Caller: Return unique results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0de96 and 3875f99.

📒 Files selected for processing (10)
  • pkg/operators/extractors/compile.go (3 hunks)
  • pkg/operators/extractors/extract.go (3 hunks)
  • pkg/operators/extractors/extract_test.go (2 hunks)
  • pkg/protocols/dns/operators.go (1 hunk)
  • pkg/protocols/file/operators.go (1 hunk)
  • pkg/protocols/headless/operators.go (1 hunk)
  • pkg/protocols/http/operators.go (1 hunk)
  • pkg/protocols/network/operators.go (1 hunk)
  • pkg/protocols/offlinehttp/operators.go (1 hunk)
  • pkg/protocols/protocols.go (1 hunk)
🧰 Additional context used
🧠 Learnings (3)
pkg/protocols/http/operators.go (1)

Learnt from: dwisiswant0
PR: #6290
File: pkg/protocols/http/build_request.go:457-464
Timestamp: 2025-06-30T16:34:42.125Z
Learning: In the projectdiscovery/retryablehttp-go package, the Request struct embeds URL fields directly, making req.Scheme, req.Host, and other URL fields accessible directly on the Request object instead of requiring req.URL.Scheme, req.URL.Host, etc.

pkg/protocols/file/operators.go (1)

Learnt from: dwisiswant0
PR: #6290
File: pkg/protocols/http/build_request.go:457-464
Timestamp: 2025-06-30T16:34:42.125Z
Learning: In the projectdiscovery/retryablehttp-go package, the Request struct embeds URL fields directly, making req.Scheme, req.Host, and other URL fields accessible directly on the Request object instead of requiring req.URL.Scheme, req.URL.Host, etc.

pkg/operators/extractors/compile.go (2)

Learnt from: hdm
PR: #6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: To make the template caching mechanism in pkg/templates/compile.go production-ready, DSLs need to be updated to use runtime options instead of cached variables, rather than restoring the Compile() calls on each request.

Learnt from: hdm
PR: #6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: In pkg/templates/compile.go, the template caching mechanism intentionally skips calling Compile() on copied requests to achieve performance benefits. This is the intended design, not a bug. The current implementation isn't production-ready but represents the desired direction.

🔇 Additional comments (16)
pkg/protocols/network/operators.go (1)

49-49: LGTM! Consistent implementation of variable support.

The addition of the data parameter to ExtractRegex correctly enables dynamic variable evaluation in regex extractors, aligning with the PR objectives.

pkg/protocols/offlinehttp/operators.go (1)

69-69: LGTM! Consistent variable support implementation.

The update to include the data parameter in ExtractRegex is correct and maintains consistency with other protocol implementations.

pkg/operators/extractors/extract_test.go (1)

14-14: LGTM! Test updates correctly reflect new method signatures.

The addition of nil as the second parameter to ExtractRegex and ExtractJSON calls is appropriate for tests that don't require variable resolution functionality.

Also applies to: 17-17, 73-73, 76-76

pkg/protocols/headless/operators.go (1)

74-74: LGTM! Headless protocol now supports variable evaluation.

The addition of the data parameter to ExtractRegex maintains consistency with other protocol implementations and enables variable support in headless extractors.

pkg/protocols/dns/operators.go (1)

60-60: LGTM! DNS protocol variable support implemented correctly.

The update to include the data parameter in ExtractRegex is correct, and the use of types.ToString(item) appropriately handles DNS data type conversion while enabling variable evaluation.

pkg/protocols/http/operators.go (1)

70-70: LGTM! Consistent parameter addition for variable support.

The changes correctly pass the data map to ExtractRegex and ExtractJSON methods, enabling variable resolution in extraction patterns. This aligns with the broader refactor across all protocol operators.

Also applies to: 76-76

pkg/protocols/file/operators.go (1)

48-48: LGTM! Consistent implementation across protocols.

The changes mirror those in other protocol operators, correctly adding the data parameter to enable variable resolution in file protocol extractors.

Also applies to: 52-52

pkg/protocols/protocols.go (1)

330-330: LGTM! Default extract function updated consistently.

The MakeDefaultExtractFunc correctly passes the data parameter to extractor methods, ensuring variable support is available for protocols using the default implementation.

Also applies to: 334-334

pkg/operators/extractors/compile.go (3)

11-11: LGTM! Necessary import for variable detection.

The expressions package import is required for the ContainsUnresolvedVariables function used in the compilation logic.


24-27: LGTM! Smart deferred compilation for regex patterns with variables.

The logic correctly detects unresolved variables in regex patterns and defers compilation by appending nil to the compiled slice. This enables runtime variable resolution while maintaining backward compatibility for static patterns.


39-42: LGTM! Consistent deferred compilation for JSON queries.

The same deferred compilation approach is applied to JSON queries, ensuring consistent behavior between regex and JSON extractors when variables are present.
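The deferred-compilation idea described in these comments can be sketched as follows, assuming a simplified placeholder check. `hasUnresolvedVariables` and `compileAll` are hypothetical names used only for this example; the PR itself relies on the `ContainsUnresolvedVariables` function from the expressions package, whose exact behavior is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hasUnresolvedVariables is a simplified, hypothetical check: it reports
// whether the pattern contains a "{{" followed later by a "}}".
func hasUnresolvedVariables(pattern string) bool {
	open := strings.Index(pattern, "{{")
	return open >= 0 && strings.Contains(pattern[open+2:], "}}")
}

// compileAll compiles static patterns eagerly and stores nil for patterns
// that still contain placeholders, so compiled[i] stays aligned with
// patterns[i] and the nil entries can be compiled later at runtime.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		if hasUnresolvedVariables(p) {
			compiled = append(compiled, nil) // deferred until variables are known
			continue
		}
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("could not compile regex %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

func main() {
	compiled, err := compileAll([]string{`v[0-9]+`, `{{script_regex}}`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, re := range compiled {
		fmt.Printf("pattern %d deferred: %t\n", i, re == nil)
	}
}
```

Keeping a nil placeholder rather than dropping the entry means the extraction step can pair each compiled slot with its source pattern by index.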

pkg/operators/extractors/extract.go (5)

6-6: LGTM! Required imports for runtime compilation and logging.

The new imports support the runtime variable resolution functionality:

  • regexp for runtime regex compilation
  • gojq for runtime JSON query compilation
  • gologger for error logging
  • expressions for variable detection and evaluation

Also applies to: 11-11, 13-14


19-19: LGTM! Method signature updated to support variable resolution.

The ExtractRegex method now accepts a data parameter, giving it access to the full context for variable resolution. Because the signature changes, every existing caller must pass the extra argument (nil when no variables need resolving).


23-35: LGTM! Robust runtime variable resolution for regex patterns.

The implementation correctly:

  • Detects unresolved variables in regex patterns
  • Evaluates variables against the data context
  • Compiles resolved patterns at runtime
  • Handles errors gracefully with warning logs instead of failing
  • Continues processing other patterns when one fails

This provides a good balance between functionality and resilience.


148-148: LGTM! Consistent method signature for JSON extraction.

The ExtractJSON method signature matches the ExtractRegex pattern, maintaining API consistency across extractor types.


157-174: LGTM! Comprehensive runtime resolution for JSON queries.

The JSON extractor implementation mirrors the regex approach with appropriate adaptations:

  • Uses gojq.Parse and gojq.Compile for JSON query compilation
  • Maintains the same error handling pattern with warning logs
  • Properly handles the multi-step compilation process (parse → compile)

The implementation is consistent and robust.

Labels
Status: Stale This issue/PR has been inactive for a while and may be closed soon if no further activity occ
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extractors add variable support
4 participants