@pnodet pnodet commented Sep 1, 2025

Summary

  • Replaced Vec collection patterns with streaming processing to drastically reduce memory usage
  • Memory complexity reduced from O(n) to O(1) - constant memory usage regardless of file size
  • Enables processing of arbitrarily large point clouds without out-of-memory errors

Changes

Core Optimization

  • Two-pass streaming approach in convert_pointcloud and convert_pointclouds:
    • First pass: Scan points to determine bounds and color presence
    • Second pass: Stream points directly to LAS writer without collecting in memory
  • Removed unnecessary Vec<las::Point> collections and associated Mutex usage
  • Changed from parallel to sequential processing in convert_pointclouds to enable streaming
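
The two-pass idea above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: `Point`, `Summary`, `scan`, `stream`, and `convert` are hypothetical names, and a plain `Write` sink stands in for the LAS writer. It assumes the input can be opened twice, which is what allows the header facts (bounds, color presence) to be computed before any point is written.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a decoded point; not the crate's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub xyz: [f64; 3],
    pub rgb: Option<[u8; 3]>,
}

/// Facts that must be known before the first point is written.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub min: [f64; 3],
    pub max: [f64; 3],
    pub has_color: bool,
    pub count: u64,
}

/// First pass: fold the whole stream into a fixed-size summary.
/// Nothing is retained per point, so memory use does not grow with input size.
pub fn scan<I: Iterator<Item = Point>>(points: I) -> Summary {
    let mut s = Summary {
        min: [f64::INFINITY; 3],
        max: [f64::NEG_INFINITY; 3],
        has_color: false,
        count: 0,
    };
    for p in points {
        for axis in 0..3 {
            s.min[axis] = s.min[axis].min(p.xyz[axis]);
            s.max[axis] = s.max[axis].max(p.xyz[axis]);
        }
        s.has_color |= p.rgb.is_some();
        s.count += 1;
    }
    s
}

/// Second pass: write a header line, then each point as soon as it is read.
/// The text output format here is invented for the sketch.
pub fn stream<I, W>(points: I, summary: &Summary, out: &mut W) -> io::Result<()>
where
    I: Iterator<Item = Point>,
    W: Write,
{
    writeln!(out, "count={} color={}", summary.count, summary.has_color)?;
    for p in points {
        writeln!(out, "{} {} {}", p.xyz[0], p.xyz[1], p.xyz[2])?;
    }
    Ok(())
}

/// Runs both passes. `open` is called twice and must yield the same
/// points each time (for example by reopening the source file).
pub fn convert<F, I, W>(open: F, out: &mut W) -> io::Result<Summary>
where
    F: Fn() -> I,
    I: Iterator<Item = Point>,
    W: Write,
{
    let summary = scan(open());
    stream(open(), &summary, out)?;
    Ok(summary)
}
```

A caller would pass a closure that reopens the source, for instance `convert(|| points.iter().copied(), &mut sink)`. The trade-off in this sketch is that the input is read twice in exchange for never holding more than one point in memory.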

Additional Improvements

  • Fixed Rayon thread pool initialization to be idempotent (prevents panics in benchmarks)
  • Added comprehensive benchmarking infrastructure using Criterion
  • Fixed various clippy warnings and improved code quality
  • Simplified test code to avoid thread pool conflicts
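
One common way to make a global initialization step idempotent is to guard it with `std::sync::Once`. The sketch below is an assumption about the general pattern, not the fix used in this PR: `ensure_thread_pool`, `build_pool`, and `setup_runs` are hypothetical names, and `build_pool` is a placeholder for wherever the real Rayon global pool would be built. The relevant Rayon behavior is that `ThreadPoolBuilder::build_global` returns an error if the global pool already exists, so unwrapping that result on a second call panics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static POOL_INIT: Once = Once::new();

/// Counts how often the setup body actually ran (for illustration only).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical one-time setup. In a real crate, this is where the
/// global thread pool would be configured and built.
fn build_pool() {
    SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from every test, benchmark iteration, or entry point:
/// the setup body runs at most once, even across threads.
pub fn ensure_thread_pool() {
    POOL_INIT.call_once(build_pool);
}

/// Reports how many times the setup body has run.
pub fn setup_runs() -> usize {
    SETUP_RUNS.load(Ordering::SeqCst)
}
```

With this shape, repeated or concurrent calls to `ensure_thread_pool` are harmless, which is the property a benchmark harness needs when it invokes the same entry point many times in one process.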

Performance Impact

Memory Usage

  • Before: O(n) - memory usage grows linearly with point cloud size
  • After: O(1) - constant memory usage regardless of input size
  • Can now process files that would previously cause OOM errors

Processing Speed

  • Maintains similar or better performance compared to the original implementation
  • Sequential processing in convert_pointclouds may be slightly slower for small files but enables massive memory savings

Testing

  • All existing tests pass
  • Added benchmark suite to compare old vs new implementation
  • Benchmarks confirm significant memory reduction while maintaining performance

Benchmarking

Run benchmarks with:

cargo bench


coderabbitai bot commented Sep 1, 2025

Warning

Rate limit exceeded

@pnodet has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 34 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between af7284d and 6f2ccab.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Cargo.toml (1 hunks)
  • benches/comparison_benchmark.rs (1 hunks)
  • benches/memory_benchmark.rs (1 hunks)
  • src/convert_file.rs (4 hunks)
  • src/convert_pointcloud.rs (5 hunks)
  • src/convert_pointcloud_old.rs (1 hunks)
  • src/lib.rs (1 hunks)
  • src/main.rs (1 hunks)
  • src/stations.rs (1 hunks)
  • src/utils.rs (1 hunks)

Replace Vec collection with streaming approach to reduce memory usage
from O(n) to O(1).

Changes:
- Implement two-pass streaming in convert_pointcloud and
convert_pointclouds
  - First pass: determine bounds and color presence
  - Second pass: stream points directly to LAS writer
- Remove unnecessary Vec collections and Mutex usage
- Fix Rayon thread pool initialization for robustness
- Add comprehensive benchmarking infrastructure with Criterion
- Fix various clippy warnings and improve code quality

Performance improvements:
- Memory usage now constant regardless of file size
- Can process arbitrarily large point clouds without OOM
- Maintains or improves processing speed
@pnodet pnodet force-pushed the streaming-memory-opt branch from 7bf2e7c to 6f2ccab Compare September 1, 2025 16:30
@pnodet pnodet requested a review from pnwatin September 1, 2025 16:30
@pnodet pnodet closed this Sep 19, 2025