-
Notifications
You must be signed in to change notification settings - Fork 462
feat: parse with commit id #414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces commit-level granularity throughout the codebase by adding a Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ParsingController
participant ProjectService
participant CodeProviderService
participant RepoService
Client->>ParsingController: parse_directory(repo_details with commit_id)
ParsingController->>ProjectService: get_project_from_db(..., commit_id)
alt Project exists
ParsingController->>CodeProviderService: get_file_content(..., commit_id)
CodeProviderService->>RepoService: get_file_content(..., commit_id)
else Project does not exist
ParsingController->>ProjectService: register_project(..., commit_id)
ParsingController->>CodeProviderService: get_file_content(..., commit_id)
CodeProviderService->>RepoService: get_file_content(..., commit_id)
end
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🔭 Outside diff range comments (2)
app/modules/code_provider/local_repo/local_repo_service.py (1)
58-69
:⚠️ Potential issueRestore original repo state after reading file content.
The method checks out a specific commit or branch but doesn't restore the original state afterwards. This could leave the repository in an unexpected state for subsequent operations.
Consider storing and restoring the original branch/commit:
repo = self.get_repo(repo_path) +original_ref = repo.head.ref if repo.head.is_detached else repo.head.commit if commit_id: repo.git.checkout(commit_id) else: repo.git.checkout(branch_name) -file_full_path = os.path.join(repo_path, file_path) -with open(file_full_path, "r", encoding="utf-8") as file: - lines = file.readlines() - if (start_line == end_line == 0) or (start_line == end_line == None): - return "".join(lines) - start = start_line - 2 if start_line - 2 > 0 else 0 - selected_lines = lines[start:end_line] - return "".join(selected_lines) +try: + file_full_path = os.path.join(repo_path, file_path) + with open(file_full_path, "r", encoding="utf-8") as file: + lines = file.readlines() + if (start_line == end_line == 0) or (start_line == end_line == None): + return "".join(lines) + start = start_line - 2 if start_line - 2 > 0 else 0 + selected_lines = lines[start:end_line] + return "".join(selected_lines) +finally: + # Restore original state + if isinstance(original_ref, str): + repo.git.checkout(original_ref) + else: + repo.git.checkout(original_ref.hexsha)🧰 Tools
🪛 Ruff (0.11.9)
65-65: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
app/modules/code_provider/github/github_service.py (1)
140-143
:⚠️ Potential issuePublic repo fallback should also use the ref parameter.
When falling back to public repository access, the code doesn't pass the branch_name or commit_id as ref, which could result in fetching content from the wrong branch/commit.
Apply this diff to fix the issue:
try: github = self.get_public_github_instance() repo = github.get_repo(repo_name) - file_contents = repo.get_contents(file_path) + file_contents = repo.get_contents( + file_path, ref=commit_id if commit_id else branch_name + )
🧹 Nitpick comments (2)
app/modules/intelligence/tools/think_tool.py (1)
92-109
: Move inline import to top of file for better organization.The async event loop handling logic is well-implemented and correctly addresses the nested event loop issue. However, the
concurrent.futures
import should be moved to the top-level imports for better code organization.Apply this diff to move the import:
import asyncio +import concurrent.futures from typing import Dict, Any from langchain_core.tools import StructuredTool from pydantic import BaseModel, Field from sqlalchemy.orm import Session from app.modules.intelligence.provider.provider_service import ProviderService
And remove the inline import:
try: # Check if we're already in an event loop loop = asyncio.get_running_loop() # If we're in a running loop, we need to use a different approach - import concurrent.futures
app/modules/projects/projects_service.py (1)
150-160
: Simplify by removing unnecessary else block.The static analysis correctly identifies that the else block is unnecessary after the return statement.
Apply this diff to simplify the code:
if commit_id: # If commit_id is provided, use it for deduplication project = query.filter(Project.commit_id == commit_id).first() if project: return project - else: - return None + return None # Fall back to branch_name if commit_id is not provided or no match was found project = query.filter(Project.branch_name == branch_name).first() return project🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
app/modules/code_provider/code_provider_service.py
(1 hunks)app/modules/code_provider/github/github_service.py
(6 hunks)app/modules/code_provider/local_repo/local_repo_service.py
(3 hunks)app/modules/intelligence/tools/change_detection/change_detection_tool.py
(1 hunks)app/modules/intelligence/tools/kg_based_tools/get_code_from_multiple_node_ids_tool.py
(1 hunks)app/modules/intelligence/tools/kg_based_tools/get_code_from_node_id_tool.py
(1 hunks)app/modules/intelligence/tools/kg_based_tools/get_code_from_probable_node_name_tool.py
(1 hunks)app/modules/intelligence/tools/think_tool.py
(1 hunks)app/modules/key_management/secret_manager.py
(1 hunks)app/modules/parsing/graph_construction/parsing_controller.py
(3 hunks)app/modules/parsing/graph_construction/parsing_helper.py
(4 hunks)app/modules/parsing/graph_construction/parsing_schema.py
(1 hunks)app/modules/parsing/graph_construction/parsing_service.py
(1 hunks)app/modules/projects/projects_service.py
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
app/modules/parsing/graph_construction/parsing_service.py (1)
app/modules/parsing/graph_construction/parsing_helper.py (1)
setup_project_directory
(295-385)
app/modules/parsing/graph_construction/parsing_controller.py (1)
app/modules/projects/projects_service.py (1)
get_global_project_from_db
(162-203)
app/modules/parsing/graph_construction/parsing_helper.py (4)
app/modules/projects/projects_service.py (1)
register_project
(59-80)app/modules/code_provider/code_provider_service.py (1)
get_repo
(19-20)app/modules/code_provider/local_repo/local_repo_service.py (1)
get_repo
(27-32)app/modules/code_provider/github/github_service.py (1)
get_repo
(545-568)
app/modules/code_provider/github/github_service.py (2)
app/modules/code_provider/code_provider_service.py (1)
get_repo
(19-20)app/modules/code_provider/local_repo/local_repo_service.py (1)
get_repo
(27-32)
app/modules/projects/projects_service.py (2)
app/modules/projects/projects_model.py (1)
Project
(21-65)app/modules/projects/projects_schema.py (1)
ProjectStatusEnum
(6-12)
🪛 Pylint (3.3.7)
app/modules/key_management/secret_manager.py
[error] 941-941: Value 'user' is unsubscriptable
(E1136)
app/modules/projects/projects_service.py
[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🪛 Ruff (0.11.9)
app/modules/parsing/graph_construction/parsing_helper.py
349-352: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
526-526: Do not use bare except
(E722)
🔇 Additional comments (21)
app/modules/intelligence/tools/think_tool.py (3)
92-109
: The event loop handling logic is correctly implemented.The approach of detecting running event loops with
asyncio.get_running_loop()
and usingThreadPoolExecutor
to create new loops in separate threads is a proper solution for avoiding nested event loop errors. The cleanup logic in thefinally
block ensures proper resource management.
111-116
: Exception handling and cleanup logic is correct.The fallback logic for when no event loop is running properly creates a new loop and ensures cleanup in the
finally
block. The exception handling correctly catchesRuntimeError
fromget_running_loop()
.
92-116
: Verify alignment with PR objectives.The changes in this file appear to focus on async/sync interoperability rather than commit-level parsing functionality. Please confirm that these event loop handling improvements are intentionally part of the "feat: parse with commit id" PR.
Likely an incorrect or invalid review comment.
app/modules/intelligence/tools/kg_based_tools/get_code_from_node_id_tool.py (1)
108-108
: LGTM: Commit ID parameter correctly addedThe addition of
project.commit_id
parameter to theCodeProviderService.get_file_content
call is consistent with the commit-level granularity enhancement across the codebase.app/modules/parsing/graph_construction/parsing_service.py (2)
94-94
: LGTM: Commit ID parameter correctly added for development modeThe addition of
commit_id=repo_details.commit_id
parameter is consistent with the enhancedsetup_project_directory
method signature and enables commit-specific repository operations.
107-107
: LGTM: Commit ID parameter correctly added for non-development modeThe addition of
commit_id=repo_details.commit_id
parameter ensures both development and non-development code paths support commit-level granularity consistently.app/modules/intelligence/tools/kg_based_tools/get_code_from_multiple_node_ids_tool.py (1)
127-127
: LGTM: Commit ID parameter correctly addedThe addition of
project.commit_id
parameter maintains consistency with other tools in the kg_based_tools directory and supports commit-specific file content retrieval.app/modules/intelligence/tools/change_detection/change_detection_tool.py (1)
98-98
: LGTM: Commit ID parameter correctly added with appropriate dict accessThe addition of
project["commit_id"]
parameter is consistent with the dict-style access pattern used for other project properties in this method and enables commit-specific file content retrieval for change detection.app/modules/intelligence/tools/kg_based_tools/get_code_from_probable_node_name_tool.py (2)
174-174
: Verify the intended behavior of the start_line adjustment.The change from
start_line
tostart_line - 3
will retrieve 3 additional lines of context before the actual start line. While this may be beneficial for providing more context, ensure this change is intentional and doesn't break existing functionality that expects the exact start line.
178-178
: LGTM: Commit ID integration implemented correctly.The addition of
project.commit_id
as a parameter aligns perfectly with the broader effort to support commit-level granularity throughout the codebase. This ensures file content is retrieved at the specific commit rather than just the branch head.app/modules/parsing/graph_construction/parsing_schema.py (1)
9-10
: LGTM: Well-designed schema changes for commit-level support.The changes appropriately make
branch_name
optional while addingcommit_id
as an optional field. This design allows for flexible repository referencing - either by branch name, commit ID, or both. The default values ofNone
provide good backward compatibility.app/modules/code_provider/code_provider_service.py (1)
25-44
: LGTM: Commit ID parameter integration implemented correctly.The addition of the
commit_id
parameter and its pass-through to the service instance is implemented correctly. The reformatting to put each parameter on its own line also improves readability. This change maintains the service abstraction while enabling commit-level file content retrieval.🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 25-25: Too many arguments (8/5)
(R0913)
[refactor] 25-25: Too many positional arguments (8/5)
(R0917)
app/modules/key_management/secret_manager.py (1)
941-941
: LGTM: Improved error handling with descriptive exception.The change from a bare
raise
to an explicitValueError
with a descriptive message significantly improves error handling and debugging. The static analysis warning aboutuser
being unsubscriptable appears to be a false positive, asuser
is consistently treated as a dictionary returned fromAuthService.check_auth
throughout the codebase.🧰 Tools
🪛 Pylint (3.3.7)
[error] 941-941: Value 'user' is unsubscriptable
(E1136)
app/modules/parsing/graph_construction/parsing_controller.py (1)
227-234
: Proper integration of commit_id in project registration and event tracking.The commit_id is correctly propagated to the project registration and included in the PostHog event payload, maintaining consistency with the commit-level granularity feature.
Also applies to: 248-257
app/modules/parsing/graph_construction/parsing_helper.py (2)
303-325
: Proper addition of commit_id parameter and propagation.The commit_id parameter is correctly added to the method signature and properly passed to both
get_project_from_db
andregister_project
methods.
356-373
: Correct implementation of commit-specific tarball download.The code properly handles downloading tarballs for either a specific commit or branch, setting the appropriate commit SHA in both cases.
app/modules/projects/projects_service.py (3)
59-80
: LGTM!The addition of the optional
commit_id
parameter toregister_project
is properly implemented and correctly passed to the Project instance.🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 59-59: Too many arguments (7/5)
(R0913)
[refactor] 59-59: Too many positional arguments (7/5)
(R0917)
162-203
: Well-implemented commit_id prioritization logic.The method correctly prioritizes commit_id over branch_name and maintains consistent ordering with
created_at.asc()
. The implementation follows the same pattern asget_project_from_db
.
248-248
: Good addition of commit_id to project details.The commit_id is properly included in the returned project details.
app/modules/code_provider/github/github_service.py (2)
172-174
: Line selection offset removal - verify this is intentional.The removal of the
-2
offset for start line selection is a behavioral change. This will now select exactly the requested lines instead of including 2 lines before. Please confirm this change is intentional and aligns with the expected behavior.
640-719
: Proper implementation of ref parameter propagation.The ref parameter is correctly added and propagated through recursive calls, ensuring consistent commit/branch reference throughout the structure traversal.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 640-640: Too many arguments (6/5)
(R0913)
[refactor] 640-640: Too many positional arguments (6/5)
(R0917)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
app/modules/parsing/graph_construction/parsing_controller.py
(3 hunks)app/modules/projects/projects_service.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/modules/parsing/graph_construction/parsing_controller.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/modules/projects/projects_service.py (2)
app/modules/projects/projects_model.py (1)
Project
(21-65)app/modules/projects/projects_schema.py (1)
ProjectStatusEnum
(6-12)
🪛 Pylint (3.3.7)
app/modules/projects/projects_service.py
[refactor] 153-156: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🔇 Additional comments (3)
app/modules/projects/projects_service.py (3)
65-65
: LGTM: Commit ID parameter added correctly.The addition of the optional
commit_id
parameter to theregister_project
method and its usage in the Project instantiation is consistent with the feature objectives and follows the established pattern.Also applies to: 74-74
248-248
: LGTM: Commit ID properly included in project details.The addition of
commit_id
to the returned project details is consistent with the enhanced commit-aware functionality.
147-147
: Repo path filter consistency resolved.The
repo_path
filter is now consistently applied in bothget_project_from_db
andget_global_project_from_db
methods, addressing the previous review concern about inconsistent filtering behavior.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes