Skip to content

Conversation

nndn
Copy link
Contributor

@nndn nndn commented Jun 7, 2025

Summary by CodeRabbit

  • New Features

    • Added support for specifying a commit ID when retrieving file contents and project structures, allowing users to access files and repositories at a specific commit.
    • Users can now provide a commit ID or branch name when making parsing requests, offering greater flexibility and precision.
  • Improvements

    • Enhanced project registration and retrieval to support commit-level granularity, enabling more accurate project identification.
    • Improved error messages when API keys are missing, providing clearer feedback to users.
    • Extended project setup and commit status checks to handle specific commit IDs, improving update accuracy.
  • Bug Fixes

    • Resolved issues with event loop handling in synchronous operations to prevent runtime errors in certain environments.

Copy link
Contributor

coderabbitai bot commented Jun 7, 2025

Walkthrough

This update introduces commit-level granularity throughout the codebase by adding a commit_id parameter to project, code retrieval, and parsing logic. Method signatures and internal calls are updated to prioritize commit SHA over branch name when specified. Error handling and event payloads are also enhanced to incorporate commit context, supporting more precise and reproducible project operations.

Changes

File(s) Change Summary
app/modules/code_provider/code_provider_service.py,
app/modules/code_provider/github/github_service.py,
app/modules/code_provider/local_repo/local_repo_service.py
Added commit_id parameter to get_file_content methods; logic updated to use commit SHA or branch as reference.
app/modules/code_provider/github/github_service.py Updated project structure retrieval to accept a ref (commit or branch); recursive structure fetches now propagate ref.
app/modules/intelligence/tools/change_detection/change_detection_tool.py,
app/modules/intelligence/tools/kg_based_tools/get_code_from_multiple_node_ids_tool.py,
app/modules/intelligence/tools/kg_based_tools/get_code_from_node_id_tool.py,
app/modules/intelligence/tools/kg_based_tools/get_code_from_probable_node_name_tool.py
Appended commit_id to calls to get_file_content in tool logic; in one case, also adjusted line range retrieval.
app/modules/intelligence/tools/think_tool.py Enhanced run method to support execution within an existing asyncio event loop via thread executor.
app/modules/key_management/secret_manager.py Improved error handling by raising explicit ValueError when no API key is found.
app/modules/parsing/graph_construction/parsing_controller.py,
app/modules/parsing/graph_construction/parsing_helper.py,
app/modules/parsing/graph_construction/parsing_service.py,
app/modules/parsing/graph_construction/parsing_schema.py
Added commit_id support to parsing requests, project setup, event payloads, and directory parsing logic.
app/modules/projects/projects_service.py Added commit_id to project registration and retrieval methods; logic prioritizes commit over branch when present.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ParsingController
    participant ProjectService
    participant CodeProviderService
    participant RepoService

    Client->>ParsingController: parse_directory(repo_details with commit_id)
    ParsingController->>ProjectService: get_project_from_db(..., commit_id)
    alt Project exists
        ParsingController->>CodeProviderService: get_file_content(..., commit_id)
        CodeProviderService->>RepoService: get_file_content(..., commit_id)
    else Project does not exist
        ParsingController->>ProjectService: register_project(..., commit_id)
        ParsingController->>CodeProviderService: get_file_content(..., commit_id)
        CodeProviderService->>RepoService: get_file_content(..., commit_id)
    end
Loading

Poem

🐇
A hop and a skip through branches and time,
Now commits are tracked—oh, isn’t that sublime?
With each SHA we leap, so precise and neat,
Your project’s true state, no longer a feat.
From branch or commit, the code’s always right—
A rabbit’s delight in versioned insight!
🥕

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (2)
app/modules/code_provider/local_repo/local_repo_service.py (1)

58-69: ⚠️ Potential issue

Restore original repo state after reading file content.

The method checks out a specific commit or branch but doesn't restore the original state afterwards. This could leave the repository in an unexpected state for subsequent operations.

Consider storing and restoring the original branch/commit:

 repo = self.get_repo(repo_path)
+original_ref = repo.head.ref if repo.head.is_detached else repo.head.commit
 if commit_id:
     repo.git.checkout(commit_id)
 else:
     repo.git.checkout(branch_name)
-file_full_path = os.path.join(repo_path, file_path)
-with open(file_full_path, "r", encoding="utf-8") as file:
-    lines = file.readlines()
-    if (start_line == end_line == 0) or (start_line == end_line == None):
-        return "".join(lines)
-    start = start_line - 2 if start_line - 2 > 0 else 0
-    selected_lines = lines[start:end_line]
-    return "".join(selected_lines)
+try:
+    file_full_path = os.path.join(repo_path, file_path)
+    with open(file_full_path, "r", encoding="utf-8") as file:
+        lines = file.readlines()
+        if (start_line == end_line == 0) or (start_line == end_line == None):
+            return "".join(lines)
+        start = start_line - 2 if start_line - 2 > 0 else 0
+        selected_lines = lines[start:end_line]
+        return "".join(selected_lines)
+finally:
+    # Restore original state
+    if isinstance(original_ref, str):
+        repo.git.checkout(original_ref)
+    else:
+        repo.git.checkout(original_ref.hexsha)
🧰 Tools
🪛 Ruff (0.11.9)

65-65: Comparison to None should be cond is None

Replace with cond is None

(E711)

app/modules/code_provider/github/github_service.py (1)

140-143: ⚠️ Potential issue

Public repo fallback should also use the ref parameter.

When falling back to public repository access, the code doesn't pass the branch_name or commit_id as ref, which could result in fetching content from the wrong branch/commit.

Apply this diff to fix the issue:

 try:
     github = self.get_public_github_instance()
     repo = github.get_repo(repo_name)
-    file_contents = repo.get_contents(file_path)
+    file_contents = repo.get_contents(
+        file_path, ref=commit_id if commit_id else branch_name
+    )
🧹 Nitpick comments (2)
app/modules/intelligence/tools/think_tool.py (1)

92-109: Move inline import to top of file for better organization.

The async event loop handling logic is well-implemented and correctly addresses the nested event loop issue. However, the concurrent.futures import should be moved to the top-level imports for better code organization.

Apply this diff to move the import:

 import asyncio
+import concurrent.futures
 from typing import Dict, Any
 from langchain_core.tools import StructuredTool
 from pydantic import BaseModel, Field
 from sqlalchemy.orm import Session
 from app.modules.intelligence.provider.provider_service import ProviderService

And remove the inline import:

         try:
             # Check if we're already in an event loop
             loop = asyncio.get_running_loop()
             # If we're in a running loop, we need to use a different approach
-            import concurrent.futures
app/modules/projects/projects_service.py (1)

150-160: Simplify by removing unnecessary else block.

The static analysis correctly identifies that the else block is unnecessary after the return statement.

Apply this diff to simplify the code:

 if commit_id:
     # If commit_id is provided, use it for deduplication
     project = query.filter(Project.commit_id == commit_id).first()
     if project:
         return project
-    else:
-        return None
+    return None
 
 # Fall back to branch_name if commit_id is not provided or no match was found
 project = query.filter(Project.branch_name == branch_name).first()
 return project
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3240fa5 and b61e48b.

📒 Files selected for processing (14)
  • app/modules/code_provider/code_provider_service.py (1 hunks)
  • app/modules/code_provider/github/github_service.py (6 hunks)
  • app/modules/code_provider/local_repo/local_repo_service.py (3 hunks)
  • app/modules/intelligence/tools/change_detection/change_detection_tool.py (1 hunks)
  • app/modules/intelligence/tools/kg_based_tools/get_code_from_multiple_node_ids_tool.py (1 hunks)
  • app/modules/intelligence/tools/kg_based_tools/get_code_from_node_id_tool.py (1 hunks)
  • app/modules/intelligence/tools/kg_based_tools/get_code_from_probable_node_name_tool.py (1 hunks)
  • app/modules/intelligence/tools/think_tool.py (1 hunks)
  • app/modules/key_management/secret_manager.py (1 hunks)
  • app/modules/parsing/graph_construction/parsing_controller.py (3 hunks)
  • app/modules/parsing/graph_construction/parsing_helper.py (4 hunks)
  • app/modules/parsing/graph_construction/parsing_schema.py (1 hunks)
  • app/modules/parsing/graph_construction/parsing_service.py (1 hunks)
  • app/modules/projects/projects_service.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
app/modules/parsing/graph_construction/parsing_service.py (1)
app/modules/parsing/graph_construction/parsing_helper.py (1)
  • setup_project_directory (295-385)
app/modules/parsing/graph_construction/parsing_controller.py (1)
app/modules/projects/projects_service.py (1)
  • get_global_project_from_db (162-203)
app/modules/parsing/graph_construction/parsing_helper.py (4)
app/modules/projects/projects_service.py (1)
  • register_project (59-80)
app/modules/code_provider/code_provider_service.py (1)
  • get_repo (19-20)
app/modules/code_provider/local_repo/local_repo_service.py (1)
  • get_repo (27-32)
app/modules/code_provider/github/github_service.py (1)
  • get_repo (545-568)
app/modules/code_provider/github/github_service.py (2)
app/modules/code_provider/code_provider_service.py (1)
  • get_repo (19-20)
app/modules/code_provider/local_repo/local_repo_service.py (1)
  • get_repo (27-32)
app/modules/projects/projects_service.py (2)
app/modules/projects/projects_model.py (1)
  • Project (21-65)
app/modules/projects/projects_schema.py (1)
  • ProjectStatusEnum (6-12)
🪛 Pylint (3.3.7)
app/modules/key_management/secret_manager.py

[error] 941-941: Value 'user' is unsubscriptable

(E1136)

app/modules/projects/projects_service.py

[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

🪛 Ruff (0.11.9)
app/modules/parsing/graph_construction/parsing_helper.py

349-352: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


526-526: Do not use bare except

(E722)

🔇 Additional comments (21)
app/modules/intelligence/tools/think_tool.py (3)

92-109: The event loop handling logic is correctly implemented.

The approach of detecting running event loops with asyncio.get_running_loop() and using ThreadPoolExecutor to create new loops in separate threads is a proper solution for avoiding nested event loop errors. The cleanup logic in the finally block ensures proper resource management.


111-116: Exception handling and cleanup logic is correct.

The fallback logic for when no event loop is running properly creates a new loop and ensures cleanup in the finally block. The exception handling correctly catches RuntimeError from get_running_loop().


92-116: Verify alignment with PR objectives.

The changes in this file appear to focus on async/sync interoperability rather than commit-level parsing functionality. Please confirm that these event loop handling improvements are intentionally part of the "feat: parse with commit id" PR.

Likely an incorrect or invalid review comment.

app/modules/intelligence/tools/kg_based_tools/get_code_from_node_id_tool.py (1)

108-108: LGTM: Commit ID parameter correctly added

The addition of project.commit_id parameter to the CodeProviderService.get_file_content call is consistent with the commit-level granularity enhancement across the codebase.

app/modules/parsing/graph_construction/parsing_service.py (2)

94-94: LGTM: Commit ID parameter correctly added for development mode

The addition of commit_id=repo_details.commit_id parameter is consistent with the enhanced setup_project_directory method signature and enables commit-specific repository operations.


107-107: LGTM: Commit ID parameter correctly added for non-development mode

The addition of commit_id=repo_details.commit_id parameter ensures both development and non-development code paths support commit-level granularity consistently.

app/modules/intelligence/tools/kg_based_tools/get_code_from_multiple_node_ids_tool.py (1)

127-127: LGTM: Commit ID parameter correctly added

The addition of project.commit_id parameter maintains consistency with other tools in the kg_based_tools directory and supports commit-specific file content retrieval.

app/modules/intelligence/tools/change_detection/change_detection_tool.py (1)

98-98: LGTM: Commit ID parameter correctly added with appropriate dict access

The addition of project["commit_id"] parameter is consistent with the dict-style access pattern used for other project properties in this method and enables commit-specific file content retrieval for change detection.

app/modules/intelligence/tools/kg_based_tools/get_code_from_probable_node_name_tool.py (2)

174-174: Verify the intended behavior of the start_line adjustment.

The change from start_line to start_line - 3 will retrieve 3 additional lines of context before the actual start line. While this may be beneficial for providing more context, ensure this change is intentional and doesn't break existing functionality that expects the exact start line.


178-178: LGTM: Commit ID integration implemented correctly.

The addition of project.commit_id as a parameter aligns perfectly with the broader effort to support commit-level granularity throughout the codebase. This ensures file content is retrieved at the specific commit rather than just the branch head.

app/modules/parsing/graph_construction/parsing_schema.py (1)

9-10: LGTM: Well-designed schema changes for commit-level support.

The changes appropriately make branch_name optional while adding commit_id as an optional field. This design allows for flexible repository referencing - either by branch name, commit ID, or both. The default values of None provide good backward compatibility.

app/modules/code_provider/code_provider_service.py (1)

25-44: LGTM: Commit ID parameter integration implemented correctly.

The addition of the commit_id parameter and its pass-through to the service instance is implemented correctly. The reformatting to put each parameter on its own line also improves readability. This change maintains the service abstraction while enabling commit-level file content retrieval.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 25-25: Too many arguments (8/5)

(R0913)


[refactor] 25-25: Too many positional arguments (8/5)

(R0917)

app/modules/key_management/secret_manager.py (1)

941-941: LGTM: Improved error handling with descriptive exception.

The change from a bare raise to an explicit ValueError with a descriptive message significantly improves error handling and debugging. The static analysis warning about user being unsubscriptable appears to be a false positive, as user is consistently treated as a dictionary returned from AuthService.check_auth throughout the codebase.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 941-941: Value 'user' is unsubscriptable

(E1136)

app/modules/parsing/graph_construction/parsing_controller.py (1)

227-234: Proper integration of commit_id in project registration and event tracking.

The commit_id is correctly propagated to the project registration and included in the PostHog event payload, maintaining consistency with the commit-level granularity feature.

Also applies to: 248-257

app/modules/parsing/graph_construction/parsing_helper.py (2)

303-325: Proper addition of commit_id parameter and propagation.

The commit_id parameter is correctly added to the method signature and properly passed to both get_project_from_db and register_project methods.


356-373: Correct implementation of commit-specific tarball download.

The code properly handles downloading tarballs for either a specific commit or branch, setting the appropriate commit SHA in both cases.

app/modules/projects/projects_service.py (3)

59-80: LGTM!

The addition of the optional commit_id parameter to register_project is properly implemented and correctly passed to the Project instance.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 59-59: Too many arguments (7/5)

(R0913)


[refactor] 59-59: Too many positional arguments (7/5)

(R0917)


162-203: Well-implemented commit_id prioritization logic.

The method correctly prioritizes commit_id over branch_name and maintains consistent ordering with created_at.asc(). The implementation follows the same pattern as get_project_from_db.


248-248: Good addition of commit_id to project details.

The commit_id is properly included in the returned project details.

app/modules/code_provider/github/github_service.py (2)

172-174: Line selection offset removal - verify this is intentional.

The removal of the -2 offset for start line selection is a behavioral change. This will now select exactly the requested lines instead of including 2 lines before. Please confirm this change is intentional and aligns with the expected behavior.


640-719: Proper implementation of ref parameter propagation.

The ref parameter is correctly added and propagated through recursive calls, ensuring consistent commit/branch reference throughout the structure traversal.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 640-640: Too many arguments (6/5)

(R0913)


[refactor] 640-640: Too many positional arguments (6/5)

(R0917)

Copy link

sonarqubecloud bot commented Jun 9, 2025

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b61e48b and bae0a6f.

📒 Files selected for processing (2)
  • app/modules/parsing/graph_construction/parsing_controller.py (3 hunks)
  • app/modules/projects/projects_service.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/modules/parsing/graph_construction/parsing_controller.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/modules/projects/projects_service.py (2)
app/modules/projects/projects_model.py (1)
  • Project (21-65)
app/modules/projects/projects_schema.py (1)
  • ProjectStatusEnum (6-12)
🪛 Pylint (3.3.7)
app/modules/projects/projects_service.py

[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

🔇 Additional comments (3)
app/modules/projects/projects_service.py (3)

65-65: LGTM: Commit ID parameter added correctly.

The addition of the optional commit_id parameter to the register_project method and its usage in the Project instantiation is consistent with the feature objectives and follows the established pattern.

Also applies to: 74-74


248-248: LGTM: Commit ID properly included in project details.

The addition of commit_id to the returned project details is consistent with the enhanced commit-aware functionality.


147-147: Repo path filter consistency resolved.

The repo_path filter is now consistently applied in both get_project_from_db and get_global_project_from_db methods, addressing the previous review concern about inconsistent filtering behavior.

@nndn nndn requested a review from dhirenmathur June 9, 2025 16:10
@nndn nndn merged commit 6d74b03 into main Jun 10, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants