@zhengchenyu
Contributor

After TEZ-4542, an application may end up with very small sort spans (one per record in this case), and the job eventually fails due to a timeout.
This PR fixes the int overflow problem in a different way.
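
As a general illustration of the kind of int overflow a `(long)` cast avoids, here is a minimal sketch. It is not the actual PipelinedSorter code: the class, the method names, and the values (`capacity`, `bytesPerItem`) are hypothetical and chosen only to make the wrap-around visible.

```java
// Hypothetical sketch, not taken from PipelinedSorter.
final class OverflowSketch {

    // int * int is evaluated in 32-bit arithmetic and wraps on overflow;
    // the widening to long happens only after the damage is done.
    static long bytesWithIntMath(int capacity, int bytesPerItem) {
        return capacity * bytesPerItem;
    }

    // Casting one operand to long first makes the whole multiplication 64-bit.
    static long bytesWithLongMath(int capacity, int bytesPerItem) {
        return (long) capacity * bytesPerItem;
    }

    public static void main(String[] args) {
        int capacity = 1 << 28;   // 256 Mi, an assumed value
        int bytesPerItem = 16;    // an assumed value

        // 2^28 * 2^4 = 2^32, which wraps to 0 in int arithmetic.
        System.out.println(bytesWithIntMath(capacity, bytesPerItem));  // prints 0
        System.out.println(bytesWithLongMath(capacity, bytesPerItem)); // prints 4294967296
    }
}
```

The cast has to be applied to an operand before the multiplication; casting the finished int product would only widen the already wrapped value.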

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| +0 🆗 | reexec | 27m 3s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 16m 8s | master passed |
| +1 💚 | compile | 0m 32s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | compile | 0m 32s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | checkstyle | 1m 24s | master passed |
| +1 💚 | javadoc | 0m 42s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javadoc | 0m 26s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +0 🆗 | spotbugs | 1m 40s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 38s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 24s | the patch passed |
| +1 💚 | compile | 0m 25s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javac | 0m 25s | the patch passed |
| +1 💚 | compile | 0m 21s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | javac | 0m 21s | the patch passed |
| +1 💚 | checkstyle | 0m 17s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 18s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 💚 | javadoc | 0m 17s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 💚 | findbugs | 1m 2s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 5m 50s | tez-runtime-library in the patch passed. |
| +1 💚 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 58m 33s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/artifact/out/Dockerfile |
| GITHUB PR | #367 |
| JIRA Issue | TEZ-4577 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux a8af2c083205 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 174d4e3 |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/testReport/ |
| Max. process+thread count | 1100 (vs. ulimit of 5500) |
| modules | C: tez-runtime-library U: tez-runtime-library |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@zhengchenyu
Contributor Author

@abstractdog @yigress @rbalamohan Can you please review this PR?

@zhengchenyu
Contributor Author

@abstractdog I see TEZ-4542 has been merged into release-0.10.4-rc0. I think this PR should also be merged into release-0.10.4-rc0.

@yigress

yigress commented Aug 23, 2024

+1 LGTM

@zhengchenyu
Contributor Author

@abstractdog Hi, could you please review this PR? Since TEZ-4542 may cause performance degradation in some scenarios, we should merge this fix.

@abstractdog
Contributor

abstractdog commented Dec 12, 2024

thanks a lot @zhengchenyu for taking care of this
I believe this is almost ready to go in, let me ask one more thing
so I just confirmed that the unit test added with TEZ-4542 indeed reproduces the issue, and that it stays fixed when reverting TEZ-4542 and applying this (long) cast
what is strange is that without the patch, the full trace of the IllegalArgumentException is not visible, I can only see:

[ERROR] org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory  Time elapsed: 1.584 s  <<< ERROR!
java.lang.IllegalArgumentException
	at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory(TestPipelinedSorter.java:878)

[INFO]
[INFO] Results:

can you please check if this can be easily solved in the scope of this patch? (or can you see the same on your machine?)
thanks in advance!

@zhengchenyu
Contributor Author

@abstractdog
No need to revert TEZ-4542. This PR can be understood as solving the problem described in TEZ-4542 in a different way. Seen from another angle, this PR effectively reverts the TEZ-4542 change.

On my machine, without TEZ-4542 and TEZ-4577, testWithLargeRecordAndLowMemory fails with the following error log:

java.lang.IllegalArgumentException: newPosition > limit: (16777216 > 1048576)

	at java.nio.Buffer.createPositionException(Buffer.java:269)
	at java.nio.Buffer.position(Buffer.java:244)
	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:952)
	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:361)
	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:434)
	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:390)
	at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory(TestPipelinedSorter.java:878)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
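
For readers unfamiliar with the top frames of the trace, the following minimal sketch shows how `java.nio.Buffer.position` rejects a position beyond the buffer's limit. It only mirrors the sizes from the message above; the class and method names are hypothetical and none of this is PipelinedSorter code. The exception message text differs between JDK builds, so the sketch does not rely on it.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: reproduces the exception type, not the sorter logic.
final class PositionLimitSketch {

    // Returns true if moving the position to newPosition is rejected.
    static boolean positionIsRejected(int capacity, int newPosition) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity); // limit == capacity
        try {
            buffer.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when newPosition is negative or larger than the limit.
            return true;
        }
    }

    public static void main(String[] args) {
        int limit = 1048576;      // 1 MiB, the limit from the log above
        int position = 16777216;  // 16 MiB, the rejected position from the log above

        System.out.println(positionIsRejected(limit, position)); // prints true
        System.out.println(positionIsRejected(limit, 512));      // prints false
    }
}
```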

@abstractdog
Contributor

okay, thanks for clarifying, in this case the problem is on my side :)
agree, no matter if we call this a revert or not, it fixes the overflow issue
thanks for that!

+1

@abstractdog abstractdog merged commit d84fdca into apache:master Dec 12, 2024