Releases: apache/uniffle

Release v0.10.0

25 Sep 08:00

Highlight

  • GRPC_NETTY mode is now GA and enabled by default, bringing better performance and less GC overhead
  • Partition reassignment for Spark is now GA, improving shuffle write stability
  • Partition split for Spark is introduced to speed up shuffle writes for huge partitions
  • Remote merge is supported for MapReduce and Tez
  • Shuffle write and read performance is improved by overlapping compression/decompression
  • Spark Uniffle UI is introduced for better observability

What's Changed

  • [#1626] fix(server): Remove the meaningless eventOfUnderStorageManagers cache by @rickyma in #1627
  • build(deps): bump h2 from 0.3.21 to 0.3.26 in /rust/experimental/server by @dependabot[bot] in #1625
  • [#1596][FOLLOWUP] fix(netty): Send failed responses only when the channel is writable by @rickyma in #1641
  • [#1459][FOLLOWUP] improvement(server): Print an error log when an event is dropped by @rickyma in #1643
  • chore(rust): disable flaky test of test_ticket_manager by @zuston in #1637
  • [#1634] fix(server): remove app folder if app is expired by @zuston in #1635
  • [#1459][FOLLOWUP] fix(server): Fix the issue of log variable printing by @rickyma in #1648
  • [#1645] feat(server): Add gauge metrics for reading localfile data by @rickyma in #1646
  • [MINOR] Remove meaningless string concatenation by @rickyma in #1647
  • [#1087] feat(spark): Support dynamic allocation for Gluten Uniffle by @summaryzb in #1649
  • [#1629] fix(operator): ShuffleServer cannot be deleted even though there are no more application. by @zhengchenyu in #1630
  • [#1608][part-3] feat(spark3): support reading partition data from multiple reassignment servers by @zuston in #1615
  • [#1607] docs: Performance report with partial TPC-DS(SF=40000) queries by @rickyma in #1650
  • [#1655] Bump version to 0.10.0-SNAPSHOT by @EnricoMi in #1656
  • [#1607][FOLLOWUP] docs(benchmark): Performance report with partial TPC-DS(SF=40000) queries by @rickyma in #1661
  • [#1662] fix(test): Fix Netty related tests by @rickyma in #1663
  • [#1664] improvement(docs): Update the descriptions and default values of outdated conf by @rickyma in #1665
  • [#378][FOLLOWUP] fix(server): Fix huge_partition_num metric by @rickyma in #1669
  • [#1341] fix(mr): Fix MR Combiner ArrayIndexOutOfBoundsException Bug. by @qijiale76 in #1666
  • [#1585] feat(server): Support app-level block size statistics to report metrics by @leslizhang in #1593
  • [MINOR] chore(rust): disable flaky test of local_store_test by @zuston in #1674
  • [#1657] build: Add license information after version 0.9.0 by @jerqi in #1671
  • [#1543] improvement(spark): Optimize shuffle reading when both sort and combine are used. by @qijiale76 in #1640
  • [#1675] fix(test): Fix tests which may be flaky on different machines by @rickyma in #1676
  • [#1682] feat(server): Introduce localfile isWritable metric by @zuston in #1683
  • [#1608][part-5] feat(spark3): always use the available assignment by @zuston in #1652
  • [#1678] fix(server): disk size leak on removing resources by AppPurgeEvent by @zuston in #1679
  • [#1684] fix(server): use the diskSize obtained from periodic check to determine whether is writable by @xianjingfeng in #1685
  • [#1680] improvement(server): Remove partial HDFS files that written by server self for expired apps by @xianjingfeng in #1681
  • [#1686] feat(netty): Support pending tasks number metrics for Netty EventLoopGroup by @rickyma in #1687
  • [#1538] feat(spark): report blockIds to spark driver optionally by @zuston in #1677
  • [MINOR] docs: modify the default value of rss.coordinator.select.partition.strategy by @xianjingfeng in #1692
  • [MINOR] fix(typo): Correct the removeShuffle method name by @rickyma in #1697
  • [#1594] improvement(client):support generating larger block size during shuffle map task by spill partial partitions data by @leslizhang in #1670
  • [#1703] improvement(server): Bump gRPC from 1.61.1 to 1.63.0 by @rickyma in #1704
  • [#1701] improvement(Netty): Bump Netty from 4.1.106.Final to 4.1.109.Final by @rickyma in #1702
  • [#1608][part-6] improvement(spark): verify the sent blocks count by @zuston in #1690
  • [#1675][FOLLOWUP] fix(test): Fix flaky tests which may cause port conflicts by @rickyma in #1696
  • [#1608][part-7] improvement(doc): add doc and optimize reassign config options by @zuston in #1693
  • [#1709] feat(coordinator): Introduce pluggable ClientConfApplyManager for fetchClientConf rpc by @zuston in #1710
  • [#1699] improvement: Upgrade from commons-collections:commons-collections:3.2.2 to org.apache.commons:commons-collections:4.4 by @rickyma in #1700
  • [#1673] fix(K8S): Fix the deployment of stable version K8S cluster by @jerqi in #1694
  • [#519][FOLLOWUP] Speed up ConcurrentHashMap#computeIfAbsent by @rickyma in #1719
  • [#1721] fix(coordinator): classCastExpection of boolean->String with yaml style remote client conf by @zuston in #1722
  • [MINOR] fix(docs): Update outdated config: rss.writer.send.check.timeout -> rss.client.send.check.timeout.ms by @rickyma in #1734
  • [#1711] feat(server): Introduce the reconfigurable conf by @zuston in #1712
  • [#1728] feat(server): Introduce disks timeout metrics by @rickyma in #1729
  • [#1698] fix(test): Increase tests running memory for better stability by @rickyma in #1726
  • [#1731] improvement(server): Bump gRPC from 1.63.0 to 1.64.0 by @rickyma in #1732
  • [#1675][FOLLOWUP] fix(test): Fix various flaky tests by @rickyma in #1730
  • [#1706] improvement: Upgrade the default NodeJS and npm versions of dashboard. by @yl09099 in #1707
  • [MINOR] fix(server): Use warn log when unable to acquire memory by @rickyma in #1733
  • [#1698][FOLLOWUP] fix(test): Increase stability of tests by @rickyma in #1739
  • [#1711][FOLLOWUP] fix(server): Avoid outputting too much incorrect non-update logs by @zuston in #1737
  • [#1149] fix: GC logs in JDK 11 do not include date and time stamps. by @qijiale76 in #1240
  • [#1675][FOLLOWUP] fix(test): Explicitly close resources to avoid unexpected behaviors by @rickyma in #1740
  • [#1711][FOLLOWUP] feat(coordinator): refactor the reconfigurable conf by @zuston in #1741
  • [#1743] fix: Add exception handling for thread pools by @rickyma in #1744
  • [#1755] fix(spark): Avoid task failure of inconsistent record number by @zuston in #1756
  • [#1757] feat(server): Add block number check on getting shuffle result by @zuston in #1758
  • [#1711] feat(server): Make the health checker execution timeout reconfigurable by @zuston in #1754
  • [MINOR] fix(dashboard): Display the registration time in 24-hour format by @rickyma in #1752
  • [#1699][FOLLOWUP] fix(client): Add commons-collections4 dependencies in shaded clients by @rickyma in #1742
  • [#1755][FOLLOWUP] fix(server): Incorrect request info to output too much logs by @zuston in #1760
  • [#1764] fix(client): Fix timeout time unit for unregister requests by @EnricoMi in #1766
  • [#1717] improvement: Flush a part of partitions if the shuffle size too big by @xianjingfeng in #1718
  • [#1727] improvement(server): Introduce block num threshold for early buffer flush to mitigate GC issues by @rickyma in #1759
  • [#1767] feat(netty): The client-side supports choosing Netty's ByteBufAllocator by @rickyma in #1768
  • [#1608] improvement(spark3): Output more task level infos in driver side when reassigning on block sent failure by @zuston in #1771
  • [#1608]...

Release v0.9.2

09 Jan 02:55

  • [MINOR] community: Add security.md (#2268)
  • [#1900] improvement: Rename DISCLAIMER-WIP to DISCLAIMER (#1974)
  • Update create-package.sh (#2019)
  • [MINOR] Correct the NOTICE and LICENSE (#2271)
  • [MINOR] Add left licenses (#2272)
  • [#2304] improvement: Update the year of NOTICE (#2313)
  • [#2305] improvement: Add copyright for PingCAP (#2312)
  • [#2306] improvement: Remove repeated notice information (#2311)
  • [#2307] license: Add left jars with Apache license (#2310)

Release v0.9.1

05 Dec 09:38
ca5d32b

Highlight

  • Optimized dashboard.
  • Optimized or fixed logs, startup scripts, compilation scripts, etc.
  • Reduced block ID layout limitations and simplified layout configuration for MR/Tez.

ChangeLog

  • [MINOR] fix(client/netty): ShuffleServerGrpcNettyClient missing to send shuffleId and partitionIds for requirePreAllocation request (#2053)
  • [#1398] fix(mr)(tez): Make attempId computable and move it to taskAttemptId in BlockId layout. (#2027)
  • [MINOR] docs: Fix docs yaml parse error about dashboard_guide.md (#1981)
  • [MINOR] fix(test): fix flaky test ServletTest.testUnhealthyNodesServlet (#1952)
  • [MINOR] fix(test): fix flaky test ShuffleServerOnRandomPortTest (#1953)
  • [#1910] fix: Remove the method name from the log (#1911)
  • [MINOR] Improvement(dashboard): Support display human-readable time format for app page (#2011)
  • [MINOR] improvement(client): Rename rss.shade.packageName from org.apache.uniffle to org.apache.uniffle.shaded (#1883)
  • replace netcat to netcat-openbsd in Dockerfile (#1950)
  • [#1982] fix(build): specify maven.compiler.release while JDK version greater than 8 (#1983)
  • [#1149][FOLLOWUP] fix(coordinator): Fix coordinator startup issues (#1902)
  • [#1826][FOLLOWUP] fix(build): Revert incorrect shift statements deletion in build_distribution.sh (#1830)
  • [#1818] fix(spark3): Avoid calling RssShuffleDataIterator.cleanup multiple times (#1819)
  • [MINOR] fix(docs) Correct the example of decommission interface (#1777)
  • [#1699][FOLLOWUP] fix(client): Add commons-collections4 dependencies in shaded clients (#1742)
  • [MINOR] fix(dashboard): Display the registration time in 24-hour format (#1752)
  • [#1743] fix: Add exception handling for thread pools (#1744)
  • [#1698][FOLLOWUP] fix(test): Adjust jvm opts to increase stability of tests (#1739)
  • [#1698] fix(test): Increase tests running memory for better stability (#1726)

Release v0.9.0

16 Jul 08:34
4944d54

Apache Uniffle (Incubating) Release v0.9.0

Highlight

  • Introduce dashboard.
  • Introduce rust-based shuffle server.
  • Add support for Spark 3.5.
  • The Netty mode for data transport is production-ready.
  • Reduce block id layout limitations and simplify layout configuration for Spark.

ChangeLog

  • [#1751][0.9] improvement: support gluten (#1753)
  • [#1764] fix(client): Fix timeout time unit for unregister requests (#1766)
  • [#1149] fix: GC logs in JDK 11 do not include date and time stamps. (#1240)
  • [#1675][FOLLOWUP] fix(test): Fix various flaky tests (#1730)
  • [MINOR] fix: Update outdated config: rss.writer.send.check.timeout -> rss.client.send.check.timeout.ms (#1734)
  • [#1721] fix(coordinator): classCastExpection of boolean->String with yaml style remote client conf (#1722)
  • [#1673] fix(K8S): Fix the deployment of stable version K8S cluster (#1694)
  • [#1675][FOLLOWUP] fix(test): Fix flaky tests which may cause port conflicts (#1696)
  • [MINOR] fix(typo): Correct the removeShuffle method name (#1697)
  • [MINOR] docs: modify the default value of rss.coordinator.select.partition.strategy in docs (#1692)
  • [#1680] improvement(server): Remove partial HDFS files that written by server self for expired apps (#1681)
  • [#1675] fix(test): Fix tests which may be flaky on different machines (#1676)
  • [#1684] fix(server): use the diskSize obtained from periodic check to determine whether is writable (#1685)
  • [#1678] fix(server): disk size leak on removing resources by AppPurgeEvent (#1679) (#1689)
  • [#1657] build: Add license information after version 0.9.0 (#1671)
  • [MINOR] chore(rust): disable flaky test of local_store_test (#1674)
  • [#1459][FOLLOWUP] fix(server): Fix the issue of log variable printing (#1672)
  • [#1459][FOLLOWUP] improvement(server): Print an error log when an event is dropped (#1643)
  • [#1341] fix(mr): Fix MR Combiner ArrayIndexOutOfBoundsException Bug. (#1666)
  • [#378][FOLLOWUP] fix(server): Fix huge_partition_num metric (#1669)
  • [#1662] fix(test): Fix Netty related flaky tests (#1663)
  • [#1629] fix(operator): Support parsing NaN float value in metrics (#1630)
  • [#1634] fix(server): remove app folder if app is expired (#1635)
  • [MINOR] chore(rust): disable flaky test of test_ticket_manager (#1637)
  • [#1596][FOLLOWUP] fix(netty): Send failed responses only when the channel is writable (#1641)
  • [#1626] fix(server): Remove the meaningless eventOfUnderStorageManagers cache (#1627)
  • [#1631] fix(server): ShuffleTaskInfo may leak when app is removed. (#1632)
  • [#1373][FOLLOWUP] fix(spark): register with incorrect partitionRanges after reassign (#1612)
  • [#1608][part-2] fix(spark): avoid releasing block in advance when enable block resend (#1610)
  • [#1606] feat(client): Add client retry mechanism for NO_BUFFER when reading data(memory/local/index) (#1616)
  • [#1608][part-1] fix(spark): Only share the replacement servers for faulty servers in one stage (#1609)
  • [#1373][FOLLOWUP] fix(spark): shuffle manager rpc service invalid when partition data reassign is enabled (#1583)
  • [#1596] fix(netty): Use a ChannelFutureListener callback mechanism to release readMemory (#1605)
  • [#1598] fix(server) Fix inaccurate used_direct_memory_size metric (#1599)
  • [#1472][FOLLOWUP] improvement(server): Release memory more accurately when failing to cache shuffle data (#1597)
  • [MINOR] refactor: Calling lock() method outside try block to avoid unnecessary errors (#1590)
  • [#1591] feat(spark): Support Spark 3.5.1 (#1592)
  • [#1586] improvement(netty): Allow Netty Worker thread pool size to dynamically adapt to the number of processor cores (#1587)
  • [#1588] improvement(server): Add exception handling for the thread pool when flushing events (#1589)
  • [#1576] feat(doc): server deploy guide without hadoop-home env (#1577)
  • [#1571] fix(server): Memory may leak when EventInvalidException occurs (#1574)
  • [#1373][FOLLOWUP] fix(spark): incorrect partition id type (#1582)
  • [#1373][FOLLOWUP] fix(spark3):Add client type when request shuffle assignment (#1580)
  • build(deps): bump google.golang.org/protobuf from 1.28.0 to 1.33.0 (#1575)
  • [#1554] feat(spark): Fetch dynamic client conf as early as possible (#1557)
  • [#1572] fix(spark): Exceptions might be discarded when spilling buffers (#1573)
  • [#1564] fix(server): disk health check invalid when hang (#1568)
  • [#731][FOLLOWUP] feat(Spark): Configure blockIdLayout for Spark based on max partitions (#1566)
  • [#1567] fix(spark): Let Spark use its own NettyUtils (#1565)
  • [#1569] fix(rust): flaky test for test_ticket_manager (#1570)
  • [MINOR] improvement(test): A better computation logic for WriteAndReadMetricsTest without using reflection (#1563)
  • [#731] feat(spark): Make blockid layout configurable for Spark clients (#1528)
  • [#808] improvement(spark): Verify the number of written records to ensure data correctness (#1558)
  • [MINOR] improvement(client): Override getClientInfo method in ShuffleServerGrpcNettyClient and remove unused getDesc method (#1559)
  • [#1552] improvement: Migrate from log4j1 to log4j2 (#1553)
  • [#1472][part-6] FOLLOWUP: Fix Netty transport time when sending shuffle data requests (#1551)
  • [#134][FOLLOWUP] improvement(spark2): Use taskId and attemptNo as taskAttemptId (#1544)
  • [#1549] fix(common): Uniformly throw RssException for external callers (#1550)
  • [MINOR] test: Use sensible partition ids in ShuffleReadClientImplTest (#1545)
  • [#1546] fix(spark): NPE could happen before uncompressing after #1360 (#1547)
  • feat(docker): Add example docker compose Uniffle/Spark cluster (#1532)
  • [#1472][part-6] fix(netty): Make UTs truly test Netty mode (#1540)
  • [MINOR] improvement(tez): Only invoking LOG.debug when LOG.isDebugEnabled is true (#1541)
  • [#1459] fix(server): Memory leak for exceptional scenarios when flushing events (#1537)
  • [#1472] fix(client): IllegalReferenceCountException for clientReadHandler.readShuffleData (#1536)
  • [#1472][part-5] Use UnpooledByteBufAllocator to fix inaccurate usedMemory issue causing OOM (#1534)
  • [MINOR] refactor(common): Move blockId bit logic into common class (#1527)
  • [#1373][part-1] feat(spark): partition write to multi servers leveraging from reassignment mechanism (#1445)
  • [MINOR] Update dashboard pom.xml to take arguments for node and npm download locations (#1530)
  • [#1316] improvement(spark): detect OutputTracker API version via Spark version (#1317)
  • [#134] improvement(spark3): Use taskId and attemptNo as taskAttemptId (#1529)
  • [MINOR] feat(build): Allow to build distribution without some modules (#1525)
  • [#1407] fix(rust): use grpc runtime worker threads and adjust default runtime config (#1517)
  • [#1407] feat(rust): fix + add total grpc request metrics (#1516)
  • [#1407] chore(rust): add cpu profile doc (#1515)
  • [#1472][part-2] fix(server): Reuse ByteBuf when decoding shuffle blocks instead of reallocating it (#1521)
  • [MINOR] fix(CI): Improve dashboard across the CI (#1526)
  • [#1472][part-3] fix(client): Fix occasional IllegalReferenceCountException issues in extremely rare scenarios (#1522)
  • [MINOR] fix(pom): Add missing shuffle-server dependencies to work with -Ptez
  • [#1472][part-4] feature(server): Add metrics for Netty's pinnedDirectMemory and usedDirectMemory (#1524)
  • [#1472][part-1] fix(server): Upgrade Netty and GRPC (#1520)
  • [MINOR] fix(deploy): Fix invocation of kubernetes bash scripts (#1513)
  • [#1476] feat(rust): Provide dedicated unregister app rpc interface (#1511)
  • [#1476] feat(spark): Provide dedicated unregister app rpc interface (#1510)
  • [MINOR] improvement(CI): Rework build and rust workflow events (#1508)
  • [#1407] fix(rust): drop events and release memory when errors happened (#1509)
  • [#1267][FOLLOWUP] improvement(client): INFO log level should be used in RetryUtils (#1500)
  • [MINOR] feat(CI): Report test results in github comments (#1506)
  • [#1407] fix(rust): return error when getting data from hdfs by client (#1507)
  • [#1501] fix(server): storage selection cache accidentally deleted when clearing stage level data. (#1505)
  • [#1407] fix(rust): dont panic when no available local disks (#1504)
  • fix(rust): avoid checking storage type in runtime (#1503)
  • [MINOR] build: Move dashboard module into profile and disable it by default (#1498)
  • [#1497] improvement(spark): flushing buffer if the memoryUsed of the first record of WriterBuffer larger than bufferSize (#1485)
  • [MINOR] improvement(test): Identify duplicate blocks in TestUtils.validateResult (#1495)
  • [MINOR] fix: Get and increment ATOMIC_LONG in that order everywhere (#1496)
  • [MINOR] docs: Improve comment on blockId structure (#1492)
  • [MINOR] fix(server): Assert actual number of bitmaps matches bitNum (#1493)
  • [#1490] improvement(spark3): Disable dynamic allocation shuffle tracking by default (#1491)
  • [#1407] feat(rust): support more metrics about disk and topN data size (#1488)
  • [#1407] feat(rust): support multiple spill policies and simplify hdfs config (#1487)
  • [#1356] feat(server): improve expired buffers metric and log (#1469)
  • [#1464][FOLLOWUP] improvement(spark): print abnormal shuffle servers that blocks fail to send (#1473)
  • [#1467] feat(server): introduce total hdfs write data size for huge partition (#1468)
  • [#1355] fix(client): Netty client will leak when decoding responses (#1455)
  • [#1462] fix(server): Memory may leak when flushQueue is full (#1463)
  • [#1466] feat(server): introduce the JvmPauseMonitor to detect the gc pause (#1470)
  • [#1459] improvement(server): refactor DefaultFlushEventHandler and support event retry into pending queue (#1461)
  • [#1464] improvement(spark): print abnormal shuffle servers that blocks fail to send (#1465)
  • [#1456] improvement(client): Better exception handling when calling requireBuffer using GRPC (#1457)
  • [#1428] fix(server): fallback invalid when local storage can't write (#1429)
  • [#1453] improvement: Force to use the UNIX line...

Release v0.8.0

13 Dec 02:58
aa25cfa

Apache Uniffle (Incubating) Release v0.8.0

Highlight

  • Support TEZ
  • Introduce Netty for shuffle data transmission
  • Use off heap memory to store shuffle data.
  • Introduce REST API for cluster management.
  • Introduce command line for cluster management.

ChangeLog

  • Change license owner to ASF by @kaijchen in #5
  • Trivial code improvements by @wForget in #7
  • [Minor] Store shuffleId int to be consistent with other data structure by @zuston in #10
  • Introduce the asList method in ConfigOptions by @zuston in #9
  • Rename package by @jerqi in #6
  • Minimize apache-rat excluded files by @kaijchen in #11
  • Update module names by @kaijchen in #12
  • Convert PartitionAssignmentInfo to static inner class by @pan3793 in #15
  • [Followup] Migrate to Junit5 by @zuston in #14
  • [Bug] Fix NPE problem when process the event if application was cleared already by @colinmjj in #16
  • [CI] Enable codecov report by @kaijchen in #17
  • Correct the config description and fix typo by @zuston in #19
  • Add CI and Codecov badges in README by @kaijchen in #20
  • [Followup] Use asList method in some existing configOptions by @zuston in #18
  • Move rss-integration-spark-common-test module package by @wForget in #23
  • [INFRA] Improve asf.yaml to reduce the notifications by @jerryshao in #25
  • [TEST] Improve code coverage in rss-common by @kaijchen in #26
  • Remove redundant package by @wForget in #27
  • [CI] Switch to temurin JDK by @kaijchen in #24
  • [INFRA] Improve asf.yaml to reduce the notifications (another-try) by @jerryshao in #33
  • Bump commons-lang3 from 3.5 to 3.10 by @wForget in #28
  • Fix the log of incorrectly bound class by @wForget in #35
  • [TYPO] Fix misspelled word "integration" by @kaijchen in #34
  • Fix some hyperlink in README.md by @daugraph in #32
  • Upgrade gRPC to support Apple Silicon by @pan3793 in #13
  • Allow to specify custom tags to decide the assignment of servers by @zuston in #30
  • Optimize the bash script by @zuston in #29
  • [Improvement] reduce compiler warnings by @advancedxy in #46
  • [Chore]: document update and build time optimize by @advancedxy in #45
  • Supplement doc about assignment tags by @zuston in #47
  • [Bug] Fix skip() api maybe skip unexpected bytes which makes inconsistent data by @colinmjj in #40
  • [improvement] Remove experimental feature with ShuffleUploader by @colinmjj in #51
  • [Improvement] Provides utility classes for creating thread factories by @smallzhongfeng in #49
  • Enable spotbugs and fix high priority bugs by @kaijchen in #38
  • [CI] Change default checkstyle severity to error by @kaijchen in #57
  • [Style] Check indentation by @kaijchen in #56
  • [Experimental Feature] MR Supports Remote Spill by @frankliee in #55
  • [Improvement] Log indicate the shuffle server host:port when doing re… by @zuston in #58
  • Send commit concurrently in client side by @zuston in #59
  • Explicitly set the constructor with AccessManager when extending AccessChecker by @zuston in #43
  • [DOC] Replace Firestorm with Uniffle by @jerqi in #60
  • Introduce the extraProperties to support user-defined pluggable accessCheckers by @zuston in #42
  • Log enhancement: Merge multiple logs into oneline and add more description by @zuston in #62
  • [TEST] Add more unit tests in rss-common by @kaijchen in #63
  • [MINOR] Comments of PartitionBalanceAssignmentStrategy miss byte units by @smallzhongfeng in #68
  • [Minor] Make config keys and default values finalized by @kaijchen in #70
  • [Log Improvement] Add more detailed debug info for MR client by @frankliee in #84
  • [Improvement] Shutdown the grpc executors pool when closing by @zuston in #83
  • Log enhancement: return error message when getting assignment servers and log exception when initializing by @zuston in #64
  • [ISSUE-48] [Feature] Init Kubernetes operator directory by @jerqi in #75
  • [Improvement] No need to use synchronized lock of the method scope when getting client by @zuston in #82
  • [DOC] Remove Wechat group in README by @jerqi in #88
  • [Performance Optimization] Improve the speed of writing index file in shuffle server by @zuston in #91
  • [DOC] Update title and description in README by @kaijchen in #94
  • [Improvement] ShuffleBlock should be release when finished reading by @xianjingfeng in #74
  • [IMPROVEMENT][COMMON] Fix common module code style by @jerqi in #99
  • [Improvement]LocalStorage init use multi thread #71 by @xianjingfeng in #72
  • [Improvement] Use OR operation instead of serialization for cloning BitMaps by @kaijchen in #103
  • [Improvement] Ignore partial failure on initializing local storage in shuffle server side by @zuston in #102
  • [CI] Test compile in Java 11 and Java 17 by @kaijchen in #105
  • Sleep less time but try more times when stopping by @xianjingfeng in #112
  • [Improvement] Use ConfigBuilder to rewrite the class RssSparkConfig by @smallzhongfeng in #104
  • [Improvement] Introduce config to customize assignment server numbers in client side by @zuston in #100
  • Assign partition again if registerShuffleServers failed by @xianjingfeng in #115
  • [ISSUE-106][IMPROVEMENT] Set rpc timeout for all rpc interface by @xianjingfeng in #113
  • [MINOR][IMPROVEMENT] Avoid CoordinatorServer#initialization multiple new Configuration() by @zwangsheng in #118
  • [Improve] Remove useless server id from StorageManagerFactory#createStorageManager by @zwangsheng in #119
  • [MINOR][IMPROVEMENT][COORD] Fix coordinator module code style by @jerqi in #122
  • [Improvement] Set heartBeatExecutorService as daemon thread by @smallzhongfeng in #121
  • [JUnit] Introduce the property of trimStackTrace to show error stacktrace in mvn-test by @zuston in #126
  • Make the conf of rss.storage.basePath as list by @zuston in #130
  • [MINOR][IMPROVEMENT][STORAGE] Fix storage module code style by @jerqi in #131
  • [Improvement] Add timeout reconnection when DelegationRssShuffleManager send the request of AccessCluster by @smallzhongfeng in #139
  • [MINOR] Fix flaky test testGetHostIp by @izchen in #141
  • [Improvement] Add the number of unhealthy nodes in CoordinatorMetrics by @smallzhongfeng in #147
  • [ISSUE-48][FEATURE] Add Uniffle Dockerfile by @wangao1236 in #132
  • [BUGFIX] Fix memory leak which cause oom by @summaryzb in #145
  • [Log Improvement] Output the register...

Release v0.7.1

10 Jul 02:46

Apache Uniffle (Incubating) Release v0.7.1

Highlight

  • Improvements

    • Refresh application when reading memory data to prevent application from being expired.
  • Major bug fixes

    • Cache proxy user ugi to avoid memory leak.
    • Make metric reporter usable by fixing some logic errors.
    • Close ShuffleWriteClient after the task completes so that the container process can exit.
    • Make sure 'finishShuffle' is invoked after all shuffle data is sent.
    • Update local storage metadata for all related events instead of just the first event.
    • Correct a wrong metric, grpc_server_connection_number.
  • Minor bug fixes

    • Avoid returning null in defaultUserApps when the quota file doesn't configure a user.
    • Fix LocalStorageManager divide by zero exception.

ChangeLog

  • [ISSUE-669] improvement: refresh application when reading memory data (#741)
  • Bump project version to 0.7.1-SNAPSHOT
  • [ISSUE-772] fix(kerberos): cache proxy user ugi to avoid memory leak (#773)
  • Revert "[ISSUE-772] fix(kerberos): cache proxy user ugi to avoid memory leak (#773)
  • [ISSUE-796][0.7] bug: Fix the issues of MetricReporter (#821)
  • [MINOR][0.7] Avoid returning null in defaultUserApps when quota file doesn't config user (#822)
  • [ISSUE-772] fix(kerberos): cache proxy user ugi to avoid memory leak (#773)
  • Revert "[ISSUE-772] fix(kerberos): cache proxy user ugi to avoid memory leak (#773)
  • [ISSUE-772][0.7] fix(kerberos): cache proxy user ugi to avoid memory leak (#773)
  • [ISSUE-715] fix(mr): The container does not exit because shuffleclient is not closed (#882)
  • [ISSUE-886] fix(mr): MR Client may lose data or throw exception when rss.storage.type without MEMORY (#887)
  • Revert "[ISSUE-886] fix(mr): MR Client may lose data or throw exception when rss.storage.type without MEMORY (#887)
  • [MINOR] fix: Fix LocalStorageManager divide by zero exception (#900)
  • [ISSUE-881][0.7] fix(followup): Ensure LocalStorageMeta disk size is correctly updated when events are processed for 0.7.0 (#914)
  • [ISSUE-933][0.7] fix: incorrect metric grpc_server_connection_number (#941)
  • change version to 0.7.1 release

Release v0.7.0

10 Apr 08:41

Apache Uniffle (Incubating) Release v0.7.0

Highlight

  • Better support for Spark AQE
    • Leveraging the LOCAL_ORDER data distribution to improve performance
    • Estimating the number of shuffle servers to assign to improve user experience
    • Assigning adjacent partitions to the same shuffle server to improve performance
  • Optimization of huge partition handling to improve stability of shuffle servers
  • More bug fixes and usability improvements for the K8S operator
  • Add support for user quota management and more compression algorithms
  • Add support for a stage-level data eviction mechanism for Spark
  • Stability and performance improvements

Changelog

  • Add more badges in README by @kaijchen in #219
  • Fix incorrect log format strings by @kaijchen in #220
  • Change total lines badge url to sloc.xyz in README by @kaijchen in #222
  • [MINOR] Fix warnings reported by lgtm by @kaijchen in #223
  • [MINOR] Simplify creating buffer logic by @zuston in #227
  • Support cancelling previous ci actions by @zuston in #225
  • Use the conf of shuffleNodesNumber from jobs to be as checking factor by @zuston in #208
  • Output the stderr and stdout to output file in startup script by @zuston in #226
  • [ISSUE-48][FEATURE][FOLLOW UP] Add controller component by @wangao1236 in #214
  • Add more metrics about requiring read memory by @zuston in #231
  • Adjust the memory required times to match grpc max deadline conf by @zuston in #218
  • [MINOR] Fix flaky test by @jerqi in #238
  • [ISSUE-48][FEATURE][FOLLOW UP] Add yaml of components and crd examples by @wangao1236 in #236
  • Fix Flaky test GetShuffleReportForMultiPartTest by @leixm in #241
  • Set the default disk capacity to the total space by @zuston in #237
  • Add issue template by @jerqi in #8
  • [MINOR] Fix inefficient map iteration by @kaijchen in #245
  • Support deploy multiple shuffle servers in a single node by @xianjingfeng in #166
  • Fast fail when reading failed in ComposedClientReadHandler by @zuston in #213
  • Fix startup shell problem by @jerqi in #251
  • New version 0.7.0-snapshot by @jerqi in #252
  • [ISSUE-196] Fix flaky test about kerberos by @zuston in #250
  • [ISSUE-48][FEATURE][FOLLOW UP] add unit test for validating rss objects by @wangao1236 in #248
  • [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy by @smallzhongfeng in #210
  • [TYPO] Replace Chinese colon by ASCII colon by @kaijchen in #255
  • Introduce startup-silent-period mechanism to avoid partial assignments by @zuston in #247
  • Replace DISCLAIMER with DISCLAIMER-WIP by @jerqi in #258
  • [ISSUE-244] Fix flaky test of CoordinatorGrpcTest.rpcMetricsTest by @zuston in #256
  • Fix flaky test of ClientConfManagerTest by @smallzhongfeng in #260
  • [Refactor] Optimize creating shuffle handlers by @zuston in #259
  • Introduce data cleanup mechanism on stage level by @zuston in #249
  • [ISSUE-48][FEATURE][FOLLOW UP] add docs for operator by @wangao1236 in #261
  • Fix potential missing reads of exclude nodes by @zuston in #269
  • [ISSUE-257] RssMRUtils#getBlockId change the partitionId of int type to long by @fpkgithub in #266
  • [ISSUE-273][BUG] Get shuffle result failed caused by concurrent calls to registerShuffle by @leixm in #274
  • Add enum type test about case insensitive by @zuston in #280
  • Support ZSTD by @zuston in #254
  • [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file by @leixm in #275
  • Remove code quality badge and add release badge by @kaijchen in #284
  • [ISSUE-163][FEATURE] Write to hdfs when local disk can't be write by @xianjingfeng in #235
  • Upgrade Github actions for Node.js 16 by @kaijchen in #292
  • Fix NPE in WriteBufferManager.addRecord by @wForget in #296
  • Fix AbstractStorage#containsWriteHandler by @xianjingfeng in #281
  • Add more test cases on LocalStorageManager.selectStorage by @zuston in #298
  • [ISSUE-137][Improvement][AQE] Sort MapId before the data are flushed by @zuston in #293
  • [ISSUE-283][FEATURE] Support snappy compression/decompression by @amaliujia in #304
  • [ISSUE-290] Make RpcNodePort and HttpNodePort optional by @amaliujia in #305
  • [ISSUE-301][Subtask][Improvement][AQE] Merge continuous ShuffleDataSegment into single one by @zuston in #303
  • Cleanup RuntimeException and fetchRemoteStorage logic in ClientUtils by @kaijchen in #295
  • [ISSUE-135][FOLLOWUP][Improvement][AQE] Assign adjacent partitions to the same ShuffleServer by @leixm in #307
  • Correct the contributing guide link in pull-request template by @zuston in #314
  • Fix bug of "Comparison method violates its general contract" by @zuston in #315
  • [AQE][LocalOrder] Fix potential bug when merging continuous segments by @zuston in #318
  • [AQE][LocalOrder] Fix wrong param of expectedTaskIds in LocalOrderSegmentSplit by @zuston in #319
  • [Feature] Support the estimated number of ShuffleServers required. by @leixm in #322
  • [Bug] Fix potential bug when the index reading offset is greater than data length by @zuston in #320
  • [ISSUE-154][Improvement] Support Empty assignment to Shuffle Server by @rhh777 in #325
  • [Bug] Fix invalid owner of host path volumes by @wangao1236 in #330
  • [ISSUE-309][FEATURE] Support ShuffleServer latency metrics. by @leixm in #327
  • [ISSUE-329]Catch NPE in ShuffleTaskManager#addFinishedBlockIds by @xianjingfeng in #331
  • [BUG] Fix wrong method name by @leixm in #335
  • [ISSUE-328] Cleanup unused shuffle servers after stage completed by @xianjingfeng in #334
  • [MINOR] Migrate RankValue to the package of the common class by @smallzhongfeng in #265
  • [BUG] Fix incorrect spark metrics by @zuston in #324
  • [Improvement][LocalOrder] Add tests about keeping consistent with FixedSize when no skew optimization by @zuston in #336
  • [INFRA] Add k8s pipeline by @jerqi in #340
  • Remove unused class of RssShuffleUtils by @zuston in #345
  • [ISSUE-342][Improvement] Check Spark Serializer type by @chong0929 in #344
  • [Feature] Support user's app quota level limit by @smallzhongfeng in #311
  • [BUG][AQE][LocalOrder] Fix the bug of missed data due to block sorting by @zuston in #347
  • [ISSUE-364] Fix indexWriter don't close if exception thrown when close dataWriter by @xianjingfeng in #349
  • [BUG] Fix flaky test of AQESkewedJoinWithLocalOrderTest by @zuston in #350
  • Add collaborators by @jerqi in #351
  • [BUG][FOLLOWUP] Fix flaky test of AQESkewedJoinWithLocalOrderTest by @zuston in https://github.com/...

Release v0.6.1

09 Dec 09:12

Apache Uniffle (Incubating) Release v0.6.1

Highlight

  • Major bug fixes

    • Partition cannot be accessed in MapReduce when the reduce task number exceeds 1024.
    • Get shuffle result failure caused by concurrent calls to registerShuffle.
    • Inconsistent blocks caused by missing length in RssUtils#transIndexDataToSegments.
    • Handle NPE in WriteBufferManager#addRecord in the same way as Spark.
    • AbstractStorage#containsWriteHandler is checking the wrong Map.
    • indexWriter isn't closed if exception is thrown when closing dataWriter.
    • Incorrect dependency of protobuf-java at compile time.
    • Potential memory leak when a disk becomes unhealthy.
  • Minor bug fixes

    • Potential missing reads of excluded nodes.
    • Incorrect contributing link in pull-request template.
    • Incorrect spark metrics.

ChangeLog

  • [ISSUE-257] RssMRUtils#getBlockId change the partitionId of int type to long (#266)
  • [ISSUE-273][BUG] Get shuffle result failed caused by concurrent calls to registerShuffle (#274)
  • Fix potential missing reads of exclude nodes (#269)
  • [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file (#275)
  • Fix NPE in WriteBufferManager.addRecord (#296)
  • Fix AbstractStorage#containsWriteHandler (#281)
  • Correct the pull-request contributing link in template (#314)
  • [BUG] Fix incorrect spark metrics (#324)
  • [ISSUE-364] Fix indexWriter don't close if exception thrown when close dataWriter (#349)
  • [ISSUE-228] Fix the problem of protobuf-java incorrect dependency at compile time (#362)
  • Bump project version to 0.6.1
  • [BUG] Potential memory leak when encountering disk unhealthy (#370)

Release v0.6.0

27 Oct 10:50

Apache Uniffle (Incubating) Release v0.6.0

Highlight

  • Optimize the assignment strategy

  • Stability and performance improvements

  • Add a plugin mechanism for SelectStorageStrategy

  • Add LowestIOSampleCostSelectStorageStrategy

  • Support Kerberos HDFS

ChangeLog

  • Change license owner to ASF by @kaijchen in #5
  • Trivial code improvements by @wForget in #7
  • [Minor] Store shuffleId int to be consistent with other data structure by @zuston in #10
  • Introduce the asList method in ConfigOptions by @zuston in #9
  • Rename package by @jerqi in #6
  • Minimize apache-rat excluded files by @kaijchen in #11
  • Update module names by @kaijchen in #12
  • Convert PartitionAssignmentInfo to static inner class by @pan3793 in #15
  • [Followup] Migrate to Junit5 by @zuston in #14
  • [Bug] Fix NPE problem when process the event if application was cleared already by @colinmjj in #16
  • [CI] Enable codecov report by @kaijchen in #17
  • Correct the config description and fix typo by @zuston in #19
  • Add CI and Codecov badges in README by @kaijchen in #20
  • [Followup] Use asList method in some existing configOptions by @zuston in #18
  • Move rss-integration-spark-common-test module package by @wForget in #23
  • [INFRA] Improve asf.yaml to reduce the notifications by @jerryshao in #25
  • [TEST] Improve code coverage in rss-common by @kaijchen in #26
  • Remove redundant package by @wForget in #27
  • [CI] Switch to temurin JDK by @kaijchen in #24
  • [INFRA] Improve asf.yaml to reduce the notifications (another-try) by @jerryshao in #33
  • Bump commons-lang3 from 3.5 to 3.10 by @wForget in #28
  • Fix the log of incorrectly bound class by @wForget in #35
  • [TYPO] Fix misspelled word "integration" by @kaijchen in #34
  • Fix some hyperlink in README.md by @daugraph in #32
  • Upgrade gRPC to support Apple Silicon by @pan3793 in #13
  • Allow to specify custom tags to decide the assignment of servers by @zuston in #30
  • Optimize the bash script by @zuston in #29
  • [Improvement] reduce compiler warnings by @advancedxy in #46
  • [Chore]: document update and build time optimize by @advancedxy in #45
  • Supplement doc about assignment tags by @zuston in #47
  • [Bug] Fix skip() api maybe skip unexpected bytes which makes inconsistent data by @colinmjj in #40
  • [improvement] Remove experimental feature with ShuffleUploader by @colinmjj in #51
  • [Improvement] Provides utility classes for creating thread factories by @smallzhongfeng in #49
  • Enable spotbugs and fix high priority bugs by @kaijchen in #38
  • [CI] Change default checkstyle severity to error by @kaijchen in #57
  • [Style] Check indentation by @kaijchen in #56
  • [Experimental Feature] MR Supports Remote Spill by @frankliee in #55
  • [Improvement] Log indicate the shuffle server host:port when doing re… by @zuston in #58
  • Send commit concurrently in client side by @zuston in #59
  • Explicitly set the constructor with AccessManager when extending AccessChecker by @zuston in #43
  • [DOC] Replace Firestorm with Uniffle by @jerqi in #60
  • Introduce the extraProperties to support user-defined pluggable accessCheckers by @zuston in #42
  • Log enhancement: Merge multiple logs into oneline and add more description by @zuston in #62
  • [TEST] Add more unit tests in rss-common by @kaijchen in #63
  • [MINOR] Comments of PartitionBalanceAssignmentStrategy miss byte units by @smallzhongfeng in #68
  • [Minor] Make config keys and default values finalized by @kaijchen in #70
  • [Log Improvement] Add more detailed debug info for MR client by @frankliee in #84
  • [Improvement] Shutdown the grpc executors pool when closing by @zuston in #83
  • Log enhancement: return error message when getting assignment servers and log exception when initializing by @zuston in #64
  • [ISSUE-48] [Feature] Init Kubernetes operator directory by @jerqi in #75
  • [Improvement] No need to use synchronized lock of the method scope when getting client by @zuston in #82
  • [DOC] Remove Wechat group in README by @jerqi in #88
  • [Performance Optimization] Improve the speed of writing index file in shuffle server by @zuston in #91
  • [DOC] Update title and description in README by @kaijchen in #94
  • [Improvement] ShuffleBlock should be release when finished reading by @xianjingfeng in #74
  • [IMPROVEMENT][COMMON] Fix common module code style by @jerqi in #99
  • [Improvement]LocalStorage init use multi thread #71 by @xianjingfeng in #72
  • [Improvement] Use OR operation instead of serialization for cloning BitMaps by @kaijchen in #103
  • [Improvement] Ignore partial failure on initializing local storage in shuffle server side by @zuston in #102
  • [CI] Test compile in Java 11 and Java 17 by @kaijchen in #105
  • Sleep less time but try more times when stopping by @xianjingfeng in #112
  • [Improvement] Use ConfigBuilder to rewrite the class RssSparkConfig by @smallzhongfeng in #104
  • [Improvement] Introduce config to customize assignment server numbers in client side by @zuston in #100
  • Assign partition again if registerShuffleServers failed by @xianjingfeng in #115
  • [ISSUE-106][IMPROVEMENT] Set rpc timeout for all rpc interface by @xianjingfeng in #113
  • [MINOR][IMPROVEMENT] Avoid CoordinatorServer#initialization multiple new Configuration() by @zwangsheng in #118
  • [Improve] Remove useless server id from StorageManagerFactory#createStorageManager by @zwangsheng in #119
  • [MINOR][IMPROVEMENT][COORD] Fix coordinator module code style by @jerqi in #122
  • [Improvement] Set heartBeatExecutorService as daemon thread by @smallzhongfeng in #121
  • [JUnit] Introduce the property of trimStackTrace to show error stacktrace in mvn-test by @zuston in #126
  • Make the conf of rss.storage.basePath as list by @zuston in #130
  • [MINOR][IMPROVEMENT][STORAGE] Fix storage module code style by @jerqi in #131
  • [Improvement] Add timeout reconnection when DelegationRssShuffleManager send the request of AccessCluster by @smallzhongfeng in #139
  • [MINOR] Fix flaky test testGetHostIp by @izchen in #141
  • [Improvement] Add the number of unhealthy nodes in CoordinatorMetrics by @smallzhongfeng in #147
  • [ISSUE-48][FEATURE] Add Uniffle Dockerfile by @wangao1236 in #132
  • [BUGFIX] Fix memory leak which cause oom by @summaryzb in #145
  • [Log Improvement] Output the re...