Skip to content

Conversation

@jwm4
Copy link
Contributor

@jwm4 jwm4 commented Jan 10, 2025

Adds new ilab rag convert command which takes in input documents from a taxonomy or a directory of original document files (e.g., PDFs) and outputs Docling JSON for each input document.

Issue resolved by this Pull Request:
Resolves #2890

Dev Docs related to this Pull Request:
Link to Dev Doc or PR: instructlab/dev-docs#161

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify
Copy link
Contributor

mergify bot commented Jan 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @jwm4 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added testing Relates to testing needs-rebase This Pull Request needs to be rebased ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jan 10, 2025
@mergify mergify bot added ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased ci-failure PR has at least one CI failure labels Jan 15, 2025
Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reviewing the tests. These align with the team's expectations of basic unit tests for a new implementation, no major concerns from me

@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jan 16, 2025
@nathan-weinberg nathan-weinberg self-requested a review January 16, 2025 15:21
@jwm4 jwm4 marked this pull request as ready for review January 16, 2025 19:31
@jwm4 jwm4 requested a review from a team January 16, 2025 19:42
@jwm4 jwm4 changed the title [Draft] ilab rag convert feat(rag): Add lab rag convert CLI command Jan 16, 2025
@jwm4 jwm4 force-pushed the jwm4-rag-convert branch from 180a27e to 1f01cf2 Compare January 16, 2025 21:04
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jan 16, 2025
@jwm4 jwm4 marked this pull request as draft January 16, 2025 21:06
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jan 16, 2025
@jwm4 jwm4 marked this pull request as ready for review January 17, 2025 00:58
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jan 17, 2025
Copy link
Member

@nathan-weinberg nathan-weinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question for Charlie about the testdata but otherwise this is looking good to me - approving from side

@nathan-weinberg nathan-weinberg added the hold In-progress PR. Tag should be removed before merge. label Jan 17, 2025
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Jan 17, 2025
Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing major, I appreciate using the new config_class and the added comments to some of the existing assertions, doc strings, etc.

Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jwm4 can you squash the commits and then I will remove the hold?

@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Jan 17, 2025
@jwm4 jwm4 force-pushed the jwm4-rag-convert branch from 306054b to be9101b Compare January 17, 2025 18:07
@jwm4
Copy link
Contributor Author

jwm4 commented Jan 17, 2025

Commits are squashed now.

@jwm4 jwm4 removed the hold In-progress PR. Tag should be removed before merge. label Jan 17, 2025
@mergify mergify bot merged commit bc784d7 into instructlab:main Jan 17, 2025
29 checks passed
@jwm4 jwm4 deleted the jwm4-rag-convert branch February 12, 2025 13:52
@booxter booxter mentioned this pull request Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Relates to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RAG][Dev] Implement ilab rag convert

5 participants