@pranath-reddy (Member) commented Jul 22, 2025

Summary 📝

This PR adds the implementation of the CombinedCodeSearchTool, which aggregates results from multiple code search tools and reranks them using a pretrained CrossEncoder model. The combined tool queries the local semantic code search, GitHub keyword-based search (via SearxNG), and the SDE API-based search, returning a unified and ranked list of relevant code repositories.

Details

  • Introduced CombinedCodeSearchTool under akd.tools.code_search to support unified querying across multiple search tools.
  • Combines results from:
    • LocalRepoCodeSearchTool for vector similarity search over local repositories
    • GitHubCodeSearchTool for keyword-based GitHub repository search via SearxNG
    • SDECodeSearchTool for querying the NASA SDE code search API
  • Aggregates outputs and reranks them using a pretrained CrossEncoder model (e.g., cross-encoder/ms-marco-MiniLM-L6-v2)
  • All results are returned in the standard SearchResultItem format with additional metadata:
    • extra["tool"] stores the source tool name
    • extra["score"] stores the reranker score assigned by the CrossEncoder
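
The aggregate-then-rerank flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Item` is a hypothetical stand-in for `SearchResultItem`, `combine_and_rerank` and `overlap_scorer` are made-up names, and the scorer is passed in as a plain callable so the sketch runs without downloading a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Sequence


@dataclass
class Item:
    """Hypothetical stand-in for SearchResultItem (only the fields used here)."""
    url: str
    content: str
    extra: dict = field(default_factory=dict)


# A scorer maps (query, document) pairs to one relevance score per pair.
Scorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def combine_and_rerank(
    query: str,
    results_by_tool: dict[str, Iterable[Item]],
    scorer: Scorer,
    top_k: int = 10,
) -> list[Item]:
    """Merge per-tool results, score each against the query, sort by score."""
    candidates: list[Item] = []
    for tool_name, items in results_by_tool.items():
        for item in items:
            item.extra["tool"] = tool_name  # remember which tool produced it
            candidates.append(item)
    if not candidates:
        return []
    scores = scorer([(query, item.content) for item in candidates])
    for item, score in zip(candidates, scores):
        item.extra["score"] = float(score)
    candidates.sort(key=lambda item: item.extra["score"], reverse=True)
    return candidates[:top_k]


def overlap_scorer(pairs):
    """Toy scorer for illustration: counts query words found in the document."""
    return [
        sum(word in doc.lower().split() for word in query.lower().split())
        for query, doc in pairs
    ]
```

With sentence-transformers installed, the toy scorer could be swapped for the `predict` method of `CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")`, which also takes a list of (query, document) pairs and returns one score per pair.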

Usage

from akd.tools.code_search import (
    CodeSearchToolInputSchema,
    CombinedCodeSearchTool,
    CombinedCodeSearchToolConfig,
    GitHubCodeSearchTool,
    SDECodeSearchTool,
)

# Configure the tool
search_cfg = CombinedCodeSearchToolConfig()

# Initialize the combined tool with the search tools it should aggregate
search_tool = CombinedCodeSearchTool(
    config=search_cfg,
    tools=[SDECodeSearchTool(), GitHubCodeSearchTool()],
)

# Run the search (top-level await: run this in a notebook or inside an async function)
result = await search_tool._arun(
    CodeSearchToolInputSchema(
        queries=["landslide nepal"],
        max_results=10,
    )
)

Bugfixes 🐛

  • Fixed a bug in _sort_results where result["extra"] was incorrectly used to access the extra field of SearchResultItem. It is now correctly accessed as result.extra.
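
To illustrate the difference, here is a small sketch using a dataclass as a hypothetical stand-in for `SearchResultItem`. The function name and the handling of items without a score are assumptions for illustration, not the code in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for SearchResultItem."""
    url: str
    extra: dict = field(default_factory=dict)


def sort_results(results):
    """Sort by reranker score, highest first; items without a score go last."""
    # `result["extra"]` would raise TypeError: the object is not subscriptable.
    # The dict stored in the `extra` attribute is, so index into that instead.
    return sorted(
        results,
        key=lambda result: result.extra.get("score", float("-inf")),
        reverse=True,
    )
```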

- Added a combined tool that uses MS Marco MiniLM cross encoder for reranking
- Fixed a minor bug in _sort_results
@pranath-reddy requested a review from NISH1001 July 22, 2025 00:19
@pranath-reddy self-assigned this Jul 22, 2025
@pranath-reddy changed the title from "Add CombinedCodeSearchTool for unified code search with CrossEncoder-based reranking" to "Add CombinedCodeSearchTool for unified code search" Jul 22, 2025
- updated constructor to invoke base constructor first to avoid explicitly setting the config
@NISH1001 (Collaborator)

@pranath-reddy could you confirm if it can also accept the code search input schema?

@NISH1001 (Collaborator) commented Jul 22, 2025

And in the usage section, could you also pass different code search tools as a parameter list, to make it clear that this is the intended purpose?

@NISH1001 mentioned this pull request Jul 22, 2025 (2 tasks)
@pranath-reddy (Member, Author)

> @pranath-reddy could you confirm if it can also accept the code search input schema?

It doesn't work. Should I add top_k as a computed field directly to CodeSearchToolInputSchema?

@NISH1001 (Collaborator)

> > @pranath-reddy could you confirm if it can also accept the code search input schema?
>
> It doesn't work. Should I add top_k as a computed field directly to CodeSearchToolInputSchema?

Yes. Let's do it

- added `top_k` as a computed field directly to `CodeSearchToolInputSchema`
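
A rough sketch of what a derived `top_k` could look like. The class below is a hypothetical stand-in for `CodeSearchToolInputSchema`, and the rule that `top_k` simply mirrors `max_results` is an assumption; the thread does not say how it is computed. In pydantic v2 this would typically be written with `@computed_field` stacked on `@property`; a plain dataclass property is used here to keep the sketch dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class InputSchemaSketch:
    """Hypothetical stand-in for CodeSearchToolInputSchema."""
    queries: list[str] = field(default_factory=list)
    max_results: int = 10

    @property
    def top_k(self) -> int:
        # Assumption: top_k is derived from max_results rather than set by callers.
        return self.max_results
```

Because `top_k` is derived rather than supplied, callers keep constructing the schema with `queries` and `max_results` only.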
@NISH1001 merged commit f9b606f into develop Jul 22, 2025
@NISH1001 deleted the feature/combined-code-agent branch July 22, 2025 18:21