Skip to content

Conversation

zhangqs0205
Copy link
Contributor

@zhangqs0205 zhangqs0205 commented Jun 12, 2025

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Comment on lines 23 to 34
| name | type | required | default value |
|---------------------------|---------|----------|---------------|
| server_url | string | yes | - |
| bulk_size | int | no | 50 |
| max_cache_row_size | int | no | 0 |
| consumer | string | no | - |
| entity_name | string | yes | users |
| record_type | string | yes | users |
| schema | string | yes | users |
| distinct_id_column | string | yes | - |
| identity_fields | array | yes | - |
| property_fields | array | yes | - |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content in the "Description" column is relatively complex, it is explained separately below the table. The format of Jdbc.md was referenced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhilinli123 Please review

@hailin0 hailin0 requested a review from Copilot June 13, 2025 01:44
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces initial support for the SensorsData connector (v2), including module setup, SDK-based writer implementation, distribution registration, end-to-end test scaffold, and documentation.

  • Added a new connector-sensorsdata module with config, sink, writer, and error handling classes
  • Registered the connector in seatunnel-dist, plugin mappings, and labeler config
  • Provided E2E test scaffold and comprehensive user documentation

Reviewed Changes

Copilot reviewed 53 out of 53 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-sensorsdata-e2e/README.md Added E2E test stub README
seatunnel-dist/pom.xml Added connector-sensorsdata as provided dependency
seatunnel-connectors-v2/pom.xml Included connector-sensorsdata module
seatunnel-connectors-v2/connector-sensorsdata/src/test/.../SensorsDataSDKFactoryTest.java Added basic factory optionRule unit test
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../state/*.java Added state and commit-info classes
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../sdk/sink/SensorsDataSDKWriter.java Implemented SDK-based SinkWriter
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../sdk/sink/SensorsDataSDKSinkFactory.java Hooked into SeaTunnel TableSinkFactory
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../sdk/sink/SensorsDataSDKSink.java Implemented SeaTunnelSink wrapper
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../sdk/exception/*.java Defined connector-specific exceptions and error codes
seatunnel-connectors-v2/connector-sensorsdata/src/main/java/.../sdk/config/*.java Added sink options and config classes
seatunnel-connectors-v2/connector-sensorsdata/pom.xml New Maven POM for connector-sensorsdata module
plugin-mapping.properties Registered seatunnel.sink.SensorsData mapping
config/plugin_config Listed connector-sensorsdata
.github/workflows/labeler/label-scope-conf.yml Added labeler rules for connector-sensorsdata
docs/en/connector-v2/sink/SensorsData.md Written connector user guide
docs/en/connector-v2/changelog/connector-sensorsdata.md Stubbed changelog for connector-sensorsdata
Comments suppressed due to low confidence (4)

seatunnel-connectors-v2/connector-sensorsdata/src/test/java/org/apache/seatunnel/connectors/seatunnel/sensorsdata/sdk/SensorsDataSDKFactoryTest.java:29

  • [nitpick] This test only verifies optionRule is non-null; consider adding tests for write logic, record-type dispatching, and error handling to ensure full coverage.
Assertions.assertNotNull((new SensorsDataSDKSinkFactory()).optionRule());

docs/en/connector-v2/sink/SensorsData.md:28

  • The default for 'consumer' in code is 'batch'; update the default value in the docs table to 'batch' for consistency.
| consumer                  | string  | no       | -             |

seatunnel-e2e/seatunnel-connector-v2-e2e/connector-sensorsdata-e2e/README.md:1

  • [nitpick] Rename the header to '# SensorsData connector E2E test' to match the module name and avoid confusion.
# SDH connector E2E test

docs/en/connector-v2/sink/SensorsData.md:127

  • The type for 'skip_error_record' should be [boolean], not [string], to align with the code.
### skip_error_record[string]

@zhangqs0205
Copy link
Contributor Author

zhangqs0205 commented Jun 13, 2025

I have a question and would appreciate your advice:
As seen in this PR, two modules are submitted:

seatunnel-connector-v2/connector-sensorsdata
seatunnel-format/seatunnel-format-sensorsdata

Here, connector-sensorsdata depends on seatunnel-format-sensorsdata.

Our scenario is:

On our private branch, besides connector-sensorsdata, we have several transform modules. We extracted common dependencies for these modules into seatunnel-format-sensorsdata.

For the code submitted to the community, since seatunnel-format-sensorsdata is only used by connector-sensorsdata, it seems more reasonable to consolidate them into a single module. However, if we do this, each transform module in our private branch would need to duplicate these base classes, which isn’t practical.

My questions are:

  1. Should we submit seatunnel-format-sensorsdata as an independent module to the community?
  2. If it shouldn’t be a separate module, do you have any suggestions?
    One idea I have:
    Release seatunnel-format-sensorsdata as a separate Maven Central artifact.
    But this artifact would be specifically for Seatunnel since it depends on seatunnel-api.

@zhilinli123

@nielifeng nielifeng requested a review from Copilot June 16, 2025 01:38
Copilot

This comment was marked as outdated.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the new SensorsData connector (v2) for SeaTunnel, implementing end-to-end support in code, configuration, documentation, and CI.

  • Adds connector-sensorsdata as a Maven module and registers it in distribution and plugin mappings
  • Implements core connector classes: config, factory, sink, writer, state, and exceptions
  • Provides unit test skeleton, detailed documentation, changelog, and updates CI workflows and plugin configs

Reviewed Changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated no comments.

Show a summary per file
File Description
seatunnel-dist/pom.xml Added connector-sensorsdata dependency
seatunnel-connectors-v2/pom.xml Registered connector-sensorsdata module
connector-sensorsdata/src/main/java/... Implemented config, factory, sink, writer, state, exception classes
connector-sensorsdata/src/test/java/... Added basic SensorsDataSDKFactoryTest
plugin-mapping.properties Mapped SensorsData to connector-sensorsdata
docs/en/connector-v2/sink/SensorsData.md Created connector documentation and examples
docs/en/connector-v2/changelog/connector-sensorsdata.md Added changelog scaffold
config/plugin_config Added connector-sensorsdata entry
.github/workflows/labeler/label-scope-conf.yml Added labeling rules for connector-sensorsdata
.github/workflows/backend.yml Introduced CI job for sensorsdata integration tests
Comments suppressed due to low confidence (4)

docs/en/connector-v2/sink/SensorsData.md:28

  • The default value for consumer in code is "batch", but the docs show -. Please update the default in the table to batch to match the implementation.
| consumer                  | string  | no       | -             |

seatunnel-connectors-v2/connector-sensorsdata/src/test/java/org/apache/seatunnel/connectors/seatunnel/sensorsdata/sdk/SensorsDataSDKFactoryTest.java:29

  • [nitpick] The test only asserts that optionRule() is non-null. To improve coverage, add assertions that required options (e.g., SERVER_URL) are present in the returned rule.
        Assertions.assertNotNull((new SensorsDataSDKSinkFactory()).optionRule());

.github/workflows/backend.yml:1460

  • The CI step refers to :connector-sensorsdata-e2e, but no such module was added. Either add the corresponding e2e test module or remove/update this CI job reference.
          ./mvnw -B -T 1 verify -DskipUT=true -DskipIT=false -D"license.skipAddThirdParty"=true -D"skip.ui"=true --no-snapshot-updates -pl :connector-sensorsdata-e2e -am -Pci

seatunnel-connectors-v2/connector-sensorsdata/src/main/java/org/apache/seatunnel/connectors/seatunnel/sensorsdata/sdk/exception/SensorsDataConnectorErrorCode.java:24

  • This enum constant UNKNOWN_RECORD_TYPE is never referenced in the code; consider removing it or aligning with used error codes to avoid dead entries.
    UNKNOWN_RECORD_TYPE("SENSORS_DATA-02", "Unknown record type"),

@zhangqs0205 zhangqs0205 marked this pull request as ready for review June 19, 2025 01:45
@zhilinli123
Copy link
Contributor

Author

I have a question and would appreciate your advice: As seen in this PR, two modules are submitted:

seatunnel-connector-v2/connector-sensorsdata
seatunnel-format/seatunnel-format-sensorsdata

Here, connector-sensorsdata depends on seatunnel-format-sensorsdata.

Our scenario is:

On our private branch, besides connector-sensorsdata, we have several transform modules. We extracted common dependencies for these modules into seatunnel-format-sensorsdata.

For the code submitted to the community, since seatunnel-format-sensorsdata is only used by connector-sensorsdata, it seems more reasonable to consolidate them into a single module. However, if we do this, each transform module in our private branch would need to duplicate these base classes, which isn’t practical.

My questions are:

  1. Should we submit seatunnel-format-sensorsdata as an independent module to the community?
  2. If it shouldn’t be a separate module, do you have any suggestions?
    One idea I have:
    Release seatunnel-format-sensorsdata as a separate Maven Central artifact.
    But this artifact would be specifically for Seatunnel since it depends on seatunnel-api.

@zhilinli123

PTAL : @hailin0 @Hisoka-X

@Hisoka-X
Copy link
Member

Please move the code of seatunnel-format-sensorsdata to connector-sensorsdata

@zhangqs0205
Copy link
Contributor Author

zhangqs0205 commented Jun 24, 2025

Please move the code of seatunnel-format-sensorsdata to connector-sensorsdata

Indeed, based on the currently submitted code, moving seatunnel-format-sensorsdata into connector-sensorsdata appears to be correct.

However, in our internal branch, several other transforms also depend on seatunnel-format-sensorsdata. As a common dependency, keeping it as a separate module would be easier to maintain.

If we move it, merging new community code into our internal branch in the future might become more difficult. Do you have any good suggestions?

@Hisoka-X

@Hisoka-X
Copy link
Member

Please move the code of seatunnel-format-sensorsdata to connector-sensorsdata

Indeed, based on the currently submitted code, moving seatunnel-format-sensorsdata into connector-sensorsdata appears to be correct.

However, in our internal branch, several other transforms also depend on seatunnel-format-sensorsdata. As a common dependency, keeping it as a separate module would be easier to maintain.

If we move it, merging new community code into our internal branch in the future might become more difficult. Do you have any good suggestions?

@Hisoka-X

You can add the dependency connector-sensorsdata to your own transform module pom.

@github-actions github-actions bot removed the format label Jun 30, 2025
@zhangqs0205
Copy link
Contributor Author

Please move the code of seatunnel-format-sensorsdata to connector-sensorsdata

Indeed, based on the currently submitted code, moving seatunnel-format-sensorsdata into connector-sensorsdata appears to be correct.
However, in our internal branch, several other transforms also depend on seatunnel-format-sensorsdata. As a common dependency, keeping it as a separate module would be easier to maintain.
If we move it, merging new community code into our internal branch in the future might become more difficult. Do you have any good suggestions?
@Hisoka-X

You can add the dependency connector-sensorsdata to your own transform module pom.

Moved

@@ -0,0 +1,2 @@
# SDH connector E2E test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this.

@@ -1435,3 +1435,28 @@ jobs:
./mvnw -B -T 1 verify -DskipUT=true -DskipIT=false -D"license.skipAddThirdParty"=true -D"skip.ui"=true --no-snapshot-updates -pl :connector-redis-e2e -am -Pci
env:
MAVEN_OPTS: -Xmx4096m

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update

modules_arr.remove("connector-jdbc-e2e")

@zhangqs0205 zhangqs0205 requested a review from Hisoka-X July 3, 2025 02:50
@Hisoka-X
Copy link
Member

Hisoka-X commented Jul 3, 2025

wating test case passed.

@zhangqs0205
Copy link
Contributor Author

wating test case passed.

passed

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zhangqs0205

@corgy-w corgy-w merged commit bb53f77 into apache:dev Aug 14, 2025
6 checks passed
xiaochen-zhou added a commit to TongchengOpenSource/seatunnel that referenced this pull request Aug 17, 2025
1

1

1

1

1

update doc

[Improve][Csv] support configurable CSV delimiter in file connector (apache#9660)

[Fix][Doc] Fix rest api finished-jobs doc miss `SAVEPOINT_DONE` (apache#9676)

[Fix] [connector-jdbc] prevent precision loss in Float to BigDecimal conversion (apache#9670)

[Feature][Connector-File-Hadoop]Support multi table sink feature for HdfsFile (apache#9651)

[HotFix][CI] The Some CI subtasks are not executed (apache#9684)

[Docs][Connector-Iceberg] Update Iceberg doc with support S3 Tables rest catalogs (apache#9686)

[Docs] update jdbc related document (apache#9679)

[Fix][Connector-V2] Fix misleading parameter name in DorisStreamLoad (apache#9685)

[Feature] [connector-file] Add configurable sheet_max_rows support for Excel sink connector (apache#9668)

[fix][connectors-v2] repeated commit cause task exceptions (apache#9665)

[Feature][connector-v2] Add Sensorsdata Connector Support apache#9323 (apache#9432)

[Improve][API] Optimize the enumerator API semantics and reduce lock calls at the connector level (apache#9671)

[Hotfix][Zeta] Fix custom https port configuration not working (apache#9705)

[Improve][Doc] Add SeaTunnel tools into Readme (apache#9707)

[Improve][Core] Unify the aws-sdk-v2 version to 2.31.30 (apache#9698)

[Feature][Transform-V2] Add `TRIM_SCALE` function for sql transform (apache#9700)

[Improve][Core] Update apache common to apache common lang3 (apache#9694)

Co-authored-by: litiliu <[email protected]>
HighandLight pushed a commit to HighandLight/seatunnel that referenced this pull request Aug 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants