Text extraction #138
* Update README.md De-claw * Update forBeginners.md de-CLAW * Update README.md
update_cache: yes
when: ansible_os_family == "Redhat"

- name: Download Islandora Text Extraction module
You can just add this and the next task to variables in our inventory.
Box provisions ok and I can confirm the listeners are deploying as is the
I was looking to test this as well. Please modify the instructions/playbook to include the Currently, only one model can be chosen. Thus, not sure how to tag a node "as both Image and Digital document." Questions
Thank you.
@Natkeeran - Good questions!
Linking to Islandora/documentation#932
Testing and review of this PR to be done as part of our paged content sprint!
@ajstanley, @dannylamb : I'm testing the PR and running into some issues. I tried to run the test, but after modifying the "Model" field to allow the user to check both Image and Digital Document, Drupal said that it could not make the change to the database. I tried to walk through the steps anyway, but the OCR file was not generated. So I'm guessing Drupal was being serious about not being able to update the database. A few requests/questions:
@dbernstein - You should be able to navigate to
Getting the following error: Also, wondering if it is passing any language parameters to tesseract.
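For context on the language question: the Tesseract command-line tool takes a `-l` option whose value is one or more language codes joined with `+` (for example `eng+fra`). The sketch below only shows how such a command line could be assembled. The helper name `build_tesseract_command` and the file name `page.tif` are hypothetical, and nothing here is taken from this PR or shows how the text extraction module actually invokes Tesseract.

```python
import shlex
from typing import List, Sequence


def build_tesseract_command(
    image_path: str, languages: Sequence[str] = ("eng",)
) -> List[str]:
    """Build an argv list for the Tesseract CLI (hypothetical helper).

    The recognized text is written to stdout instead of an output file.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with '+', e.g. "eng+fra".
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, "stdout", "-l", lang_arg]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where Tesseract is not installed.
    print(shlex.join(build_tesseract_command("page.tif", ["eng", "fra"])))
```

Checking whether the deployed service passes `-l` at all would still mean inspecting its configuration or logs.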
Superseded by #140
What does this Pull Request do?
Adds text extraction module to playbook.
What's new?
With text extraction in place, images with OCR will have that text extracted and put into an editable media.
Any Original File with an Original File tag will also have its text extracted into an editable media.
How should this be tested?
After the playbook is spun up, create a node tagged as both Image and Digital Document.
Add an image media (containing text) tagged as Original File.
An Extracted Text media should be created and attached to the node.
Create another node and attach a media tagged with Original File and with a media type of application/pdf.
An Extracted Text media should be created and attached to the node.
Interested parties
@Islandora-Devops/committers