Commit ba49385

feat: added font_family to document.proto (#404)
* feat: added font_family to document.proto
  feat: added ImageQualityScores message to document.proto
  feat: added PropertyMetadata and EntityTypeMetadata to document_schema.proto

  PiperOrigin-RevId: 486975621
  Source-Link: googleapis/googleapis@398c9f9
  Source-Link: googleapis/googleapis-gen@7cd1f5f
  Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiN2NkMWY1ZjRlNDM1Nzc3Y2I4MjRhZjI2OGRjOGQzNzEzNDYxM2U2YSJ9

* 🦉 Updates from OwlBot post-processor
  See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* Update constraints-3.7.txt

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Holt Skinner <[email protected]>
1 parent c6e079f commit ba49385

9 files changed: +165 -44 lines changed

packages/google-cloud-documentai/google/cloud/documentai_v1/services/document_processor_service/async_client.py

Lines changed: 5 additions & 1 deletion
@@ -39,7 +39,11 @@
 from google.protobuf import timestamp_pb2  # type: ignore
 
 from google.cloud.documentai_v1.services.document_processor_service import pagers
-from google.cloud.documentai_v1.types import document, document_processor_service
+from google.cloud.documentai_v1.types import (
+    document,
+    document_processor_service,
+    document_schema,
+)
 from google.cloud.documentai_v1.types import processor
 from google.cloud.documentai_v1.types import processor as gcd_processor
 from google.cloud.documentai_v1.types import processor_type

packages/google-cloud-documentai/google/cloud/documentai_v1/services/document_processor_service/client.py

Lines changed: 5 additions & 1 deletion
@@ -42,7 +42,11 @@
 from google.protobuf import timestamp_pb2  # type: ignore
 
 from google.cloud.documentai_v1.services.document_processor_service import pagers
-from google.cloud.documentai_v1.types import document, document_processor_service
+from google.cloud.documentai_v1.types import (
+    document,
+    document_processor_service,
+    document_schema,
+)
 from google.cloud.documentai_v1.types import processor
 from google.cloud.documentai_v1.types import processor as gcd_processor
 from google.cloud.documentai_v1.types import processor_type

packages/google-cloud-documentai/google/cloud/documentai_v1/types/barcode.py

Lines changed: 32 additions & 16 deletions
@@ -28,25 +28,41 @@ class Barcode(proto.Message):
 
     Attributes:
         format_ (str):
-            Format of a barcode. The supported formats are: CODE_128:
-            Code 128 type. CODE_39: Code 39 type. CODE_93: Code 93 type.
-            CODABAR: Codabar type. DATA_MATRIX: 2D Data Matrix type.
-            ITF: ITF type. EAN_13: EAN-13 type. EAN_8: EAN-8 type.
-            QR_CODE: 2D QR code type. UPC_A: UPC-A type. UPC_E: UPC-E
-            type. PDF417: PDF417 type. AZTEC: 2D Aztec code type.
-            DATABAR: GS1 DataBar code type.
+            Format of a barcode. The supported formats are:
+
+            - ``CODE_128``: Code 128 type.
+            - ``CODE_39``: Code 39 type.
+            - ``CODE_93``: Code 93 type.
+            - ``CODABAR``: Codabar type.
+            - ``DATA_MATRIX``: 2D Data Matrix type.
+            - ``ITF``: ITF type.
+            - ``EAN_13``: EAN-13 type.
+            - ``EAN_8``: EAN-8 type.
+            - ``QR_CODE``: 2D QR code type.
+            - ``UPC_A``: UPC-A type.
+            - ``UPC_E``: UPC-E type.
+            - ``PDF417``: PDF417 type.
+            - ``AZTEC``: 2D Aztec code type.
+            - ``DATABAR``: GS1 DataBar code type.
         value_format (str):
             Value format describes the format of the value that a
-            barcode encodes. The supported formats are: CONTACT_INFO:
-            Contact information. EMAIL: Email address. ISBN: ISBN
-            identifier. PHONE: Phone number. PRODUCT: Product. SMS: SMS
-            message. TEXT: Text string. URL: URL address. WIFI: Wifi
-            information. GEO: Geo-localization. CALENDAR_EVENT: Calendar
-            event. DRIVER_LICENSE: Driver's license.
+            barcode encodes. The supported formats are:
+
+            - ``CONTACT_INFO``: Contact information.
+            - ``EMAIL``: Email address.
+            - ``ISBN``: ISBN identifier.
+            - ``PHONE``: Phone number.
+            - ``PRODUCT``: Product.
+            - ``SMS``: SMS message.
+            - ``TEXT``: Text string.
+            - ``URL``: URL address.
+            - ``WIFI``: Wifi information.
+            - ``GEO``: Geo-localization.
+            - ``CALENDAR_EVENT``: Calendar event.
+            - ``DRIVER_LICENSE``: Driver's license.
         raw_value (str):
-            Raw value encoded in the barcode.
-            For example,
-            'MEBKM:TITLE:Google;URL:https://www.google.com;;'.
+            Raw value encoded in the barcode. For example:
+            ``'MEBKM:TITLE:Google;URL:https://www.google.com;;'``.
     """
 
     format_ = proto.Field(

packages/google-cloud-documentai/google/cloud/documentai_v1/types/document.py

Lines changed: 85 additions & 14 deletions
@@ -85,9 +85,10 @@ class Document(proto.Message):
             [Document.entities][google.cloud.documentai.v1.Document.entities].
         text_changes (Sequence[google.cloud.documentai_v1.types.Document.TextChange]):
             Placeholder. A list of text corrections made to
-            [Document.text]. This is usually used for annotating
-            corrections to OCR mistakes. Text changes for a given
-            revision may not overlap with each other.
+            [Document.text][google.cloud.documentai.v1.Document.text].
+            This is usually used for annotating corrections to OCR
+            mistakes. Text changes for a given revision may not overlap
+            with each other.
         shard_info (google.cloud.documentai_v1.types.Document.ShardInfo):
             Information about the sharding if this
             document is sharded part of a larger document.
@@ -153,6 +154,9 @@ class Style(proto.Message):
                 https://www.w3schools.com/cssref/pr_text_text-decoration.asp
             font_size (google.cloud.documentai_v1.types.Document.Style.FontSize):
                 Font size.
+            font_family (str):
+                Font family such as ``Arial``, ``Times New Roman``.
+                https://www.w3schools.com/cssref/pr_font_font-family.asp
         """
 
         class FontSize(proto.Message):
@@ -207,6 +211,10 @@ class FontSize(proto.Message):
             number=7,
             message="Document.Style.FontSize",
         )
+        font_family = proto.Field(
+            proto.STRING,
+            number=8,
+        )
 
     class Page(proto.Message):
         r"""A page in a [Document][google.cloud.documentai.v1.Document].
@@ -266,6 +274,8 @@ class Page(proto.Message):
                 page.
             detected_barcodes (Sequence[google.cloud.documentai_v1.types.Document.Page.DetectedBarcode]):
                 A list of detected barcodes.
+            image_quality_scores (google.cloud.documentai_v1.types.Document.Page.ImageQualityScores):
+                Image Quality Scores.
             provenance (google.cloud.documentai_v1.types.Document.Provenance):
                 The history of this page.
         """
@@ -374,7 +384,7 @@ class Layout(proto.Message):
                     [Layout][google.cloud.documentai.v1.Document.Page.Layout]
                     within context of the object this layout is for. e.g.
                     confidence can be for a single token, a table, a visual
-                    element, etc. depending on context. Range [0, 1].
+                    element, etc. depending on context. Range ``[0, 1]``.
                 bounding_poly (google.cloud.documentai_v1.types.BoundingPoly):
                     The bounding polygon for the
                     [Layout][google.cloud.documentai.v1.Document.Page.Layout].
@@ -520,7 +530,7 @@ class Token(proto.Message):
                     A list of detected languages together with
                     confidence.
                 provenance (google.cloud.documentai_v1.types.Document.Provenance):
-                    The history of this annotation.
+                    The history of this annotation.
             """
 
             class DetectedBreak(proto.Message):
@@ -636,6 +646,8 @@ class Table(proto.Message):
                 detected_languages (Sequence[google.cloud.documentai_v1.types.Document.Page.DetectedLanguage]):
                     A list of detected languages together with
                     confidence.
+                provenance (google.cloud.documentai_v1.types.Document.Provenance):
+                    The history of this table.
             """
 
             class TableRow(proto.Message):
@@ -708,6 +720,11 @@ class TableCell(proto.Message):
                 number=4,
                 message="Document.Page.DetectedLanguage",
             )
+            provenance = proto.Field(
+                proto.MESSAGE,
+                number=5,
+                message="Document.Provenance",
+            )
 
         class FormField(proto.Message):
             r"""A form field detected on the page.
@@ -818,11 +835,11 @@ class DetectedLanguage(proto.Message):
 
             Attributes:
                 language_code (str):
-                    The BCP-47 language code, such as "en-US" or "sr-Latn". For
-                    more information, see
+                    The BCP-47 language code, such as ``en-US`` or ``sr-Latn``.
+                    For more information, see
                     https://www.unicode.org/reports/tr35/#Unicode_locale_identifier.
                 confidence (float):
-                    Confidence of detected language. Range [0, 1].
+                    Confidence of detected language. Range ``[0, 1]``.
             """
 
             language_code = proto.Field(
@@ -834,6 +851,56 @@ class DetectedLanguage(proto.Message):
                 number=2,
             )
 
+        class ImageQualityScores(proto.Message):
+            r"""Image Quality Scores for the page image
+
+            Attributes:
+                quality_score (float):
+                    The overall quality score. Range ``[0, 1]`` where 1 is
+                    perfect quality.
+                detected_defects (Sequence[google.cloud.documentai_v1.types.Document.Page.ImageQualityScores.DetectedDefect]):
+                    A list of detected defects.
+            """
+
+            class DetectedDefect(proto.Message):
+                r"""Image Quality Defects
+
+                Attributes:
+                    type_ (str):
+                        Name of the defect type. Supported values are:
+
+                        - ``quality/defect_blurry``
+                        - ``quality/defect_noisy``
+                        - ``quality/defect_dark``
+                        - ``quality/defect_faint``
+                        - ``quality/defect_text_too_small``
+                        - ``quality/defect_document_cutoff``
+                        - ``quality/defect_text_cutoff``
+                        - ``quality/defect_glare``
+                    confidence (float):
+                        Confidence of detected defect. Range ``[0, 1]`` where 1
+                        indicates strong confidence of that the defect exists.
+                """
+
+                type_ = proto.Field(
+                    proto.STRING,
+                    number=1,
+                )
+                confidence = proto.Field(
+                    proto.FLOAT,
+                    number=2,
+                )
+
+            quality_score = proto.Field(
+                proto.FLOAT,
+                number=1,
+            )
+            detected_defects = proto.RepeatedField(
+                proto.MESSAGE,
+                number=2,
+                message="Document.Page.ImageQualityScores.DetectedDefect",
+            )
+
         page_number = proto.Field(
             proto.INT32,
             number=1,
@@ -908,6 +975,11 @@ class DetectedLanguage(proto.Message):
             number=15,
             message="Document.Page.DetectedBarcode",
         )
+        image_quality_scores = proto.Field(
+            proto.MESSAGE,
+            number=17,
+            message="Document.Page.ImageQualityScores",
+        )
         provenance = proto.Field(
             proto.MESSAGE,
             number=16,
@@ -927,14 +999,13 @@ class Entity(proto.Message):
             type_ (str):
                 Required. Entity type from a schema e.g. ``Address``.
             mention_text (str):
-                Optional. Text value in the document e.g.
-                ``1600 Amphitheatre Pkwy``. If the entity is not present in
-                the document, this field will be empty.
+                Optional. Text value of the entity e.g.
+                ``1600 Amphitheatre Pkwy``.
             mention_id (str):
                 Optional. Deprecated. Use ``id`` field instead.
             confidence (float):
-                Optional. Confidence of detected Schema entity. Range [0,
-                1].
+                Optional. Confidence of detected Schema entity. Range
+                ``[0, 1]``.
             page_anchor (google.cloud.documentai_v1.types.Document.PageAnchor):
                 Optional. Represents the provenance of this
                 entity wrt. the location on the page where it
@@ -1230,7 +1301,7 @@ class PageRef(proto.Message):
                     a layout element on the page.
                 confidence (float):
                     Optional. Confidence of detected page element, if
-                    applicable. Range [0, 1].
+                    applicable. Range ``[0, 1]``.
             """
 
             class LayoutType(proto.Enum):
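
A minimal sketch of how a caller might read the new `image_quality_scores` field on a page. It relies only on the attribute names added in this commit (`quality_score`, `detected_defects`, `type_`, `confidence`). The `summarize_quality` helper, the 0.5 threshold, and the `SimpleNamespace` stand-ins for real `Document.Page` objects are assumptions made for illustration.

```python
from types import SimpleNamespace


def summarize_quality(page, min_confidence=0.5):
    """Return (quality_score, defect type names at or above min_confidence).

    `page` is any object shaped like Document.Page after this commit.
    The helper name and the threshold are hypothetical.
    """
    scores = page.image_quality_scores
    defects = [
        defect.type_
        for defect in scores.detected_defects
        if defect.confidence >= min_confidence
    ]
    return scores.quality_score, defects


# Stand-in objects mirroring the new message layout (not a real API response).
fake_page = SimpleNamespace(
    image_quality_scores=SimpleNamespace(
        quality_score=0.5,
        detected_defects=[
            SimpleNamespace(type_="quality/defect_blurry", confidence=0.75),
            SimpleNamespace(type_="quality/defect_glare", confidence=0.25),
        ],
    )
)

score, defects = summarize_quality(fake_page)
print(score, defects)
```

Because the helper only uses attribute access, it should work the same way on a page taken from a processed document, provided the processor populates the field.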

packages/google-cloud-documentai/google/cloud/documentai_v1/types/document_io.py

Lines changed: 14 additions & 2 deletions
@@ -13,6 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+from google.protobuf import field_mask_pb2  # type: ignore
 import proto  # type: ignore
 
 __protobuf__ = proto.module(
@@ -36,7 +37,8 @@ class RawDocument(proto.Message):
             Inline document content.
         mime_type (str):
             An IANA MIME type (RFC6838) indicating the nature and format
-            of the [content].
+            of the
+            [content][google.cloud.documentai.v1.RawDocument.content].
     """
 
     content = proto.Field(
@@ -113,7 +115,7 @@ class BatchDocumentsInputConfig(proto.Message):
     Attributes:
         gcs_prefix (google.cloud.documentai_v1.types.GcsPrefix):
             The set of documents that match the specified Cloud Storage
-            [gcs_prefix].
+            ``gcs_prefix``.
 
             This field is a member of `oneof`_ ``source``.
         gcs_documents (google.cloud.documentai_v1.types.GcsDocuments):
@@ -159,12 +161,22 @@ class GcsOutputConfig(proto.Message):
             gcs_uri (str):
                 The Cloud Storage uri (a directory) of the
                 output.
+            field_mask (google.protobuf.field_mask_pb2.FieldMask):
+                Specifies which fields to include in the output documents.
+                Only supports top level document and pages field so it must
+                be in the form of ``{document_field_name}`` or
+                ``pages.{page_field_name}``.
         """
 
         gcs_uri = proto.Field(
             proto.STRING,
             number=1,
         )
+        field_mask = proto.Field(
+            proto.MESSAGE,
+            number=2,
+            message=field_mask_pb2.FieldMask,
+        )
 
     gcs_output_config = proto.Field(
         proto.MESSAGE,
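
The `field_mask` docstring restricts paths to `{document_field_name}` or `pages.{page_field_name}`. The sketch below shows one way a caller could check paths against that shape before sending a request. The helper names and the identifier character set are assumptions; the service's own validation may differ.

```python
import re

# One identifier-like path segment. The exact character set the service
# accepts is an assumption made for this sketch.
_SEGMENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Either a single top-level document field, or "pages." plus one page field.
_PATH_RE = re.compile(rf"(?:{_SEGMENT}|pages\.{_SEGMENT})")


def is_supported_mask_path(path):
    """Return True if `path` has the documented top-level or pages.* form."""
    return _PATH_RE.fullmatch(path) is not None


def unsupported_mask_paths(paths):
    """Return the subset of `paths` that does not fit the documented form."""
    return [p for p in paths if not is_supported_mask_path(p)]


print(unsupported_mask_paths(["text", "pages.page_number", "pages.layout.confidence"]))
```

Paths that pass this check could then be supplied as the `paths` of a `google.protobuf.field_mask_pb2.FieldMask`.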

packages/google-cloud-documentai/google/cloud/documentai_v1/types/document_processor_service.py

Lines changed: 4 additions & 2 deletions
@@ -106,8 +106,10 @@ class ProcessRequest(proto.Message):
             Whether Human Review feature should be
             skipped for this request. Default to false.
         field_mask (google.protobuf.field_mask_pb2.FieldMask):
-            Specifies which fields to include in
-            ProcessResponse's document.
+            Specifies which fields to include in ProcessResponse's
+            document. Only supports top level document and pages field
+            so it must be in the form of ``{document_field_name}`` or
+            ``pages.{page_field_name}``.
     """
 
     inline_document = proto.Field(

packages/google-cloud-documentai/google/cloud/documentai_v1/types/document_schema.py

Lines changed: 4 additions & 4 deletions
@@ -63,16 +63,16 @@ class EntityType(proto.Message):
                 and cannot be a 'Common Type'. Besides that we use the
                 following naming conventions:
 
-                - *use snake_casing*
+                - *use ``snake_casing``*
                 - name matching is case-insensitive
                 - Maximum 64 characters.
                 - Must start with a letter.
                 - Allowed characters: ASCII letters ``[a-z0-9_-]``. (For
                   backward compatibility internal infrastructure and
                   tooling can handle any ascii character)
-                - The '/' is sometimes used to denote a property of a type.
-                  For example line_item/amount. This convention is
-                  deprecated, but will still be honored for backward
+                - The ``/`` is sometimes used to denote a property of a
+                  type. For example ``line_item/amount``. This convention
+                  is deprecated, but will still be honored for backward
                   compatibility.
             base_types (Sequence[str]):
                 The entity type that this type is derived
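
The naming conventions listed in the `EntityType` docstring can be expressed as a small validator. This is only a sketch: the function name, the `allow_legacy_slash` flag, and the choice to lowercase a name before checking it (one reading of "name matching is case-insensitive") are assumptions, not behaviour confirmed by the library.

```python
import re

# Starts with a letter, then letters, digits, "_" or "-", 64 characters at most.
_NAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,63}")


def is_valid_entity_type_name(name, allow_legacy_slash=False):
    """Check `name` against the documented conventions (hypothetical helper).

    The name is lowercased first, on the assumption that case does not matter.
    With allow_legacy_slash=True, the deprecated "type/property" form is
    accepted as long as every part is valid on its own.
    """
    candidate = name.lower()
    if len(candidate) > 64:
        return False
    parts = candidate.split("/") if allow_legacy_slash else [candidate]
    return all(_NAME_RE.fullmatch(part) is not None for part in parts)


print(is_valid_entity_type_name("line_item"))
print(is_valid_entity_type_name("line_item/amount", allow_legacy_slash=True))
```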
