Commit 7aeda31

feat: working parallel processing
1 parent a1c3593 commit 7aeda31

7 files changed: +277 -50 lines changed

README.adoc

Lines changed: 20 additions & 23 deletions
@@ -83,8 +83,8 @@ Available subcommands:
 ** `--manifest-path PATH` - Path to schemas.yml manifest file
 ** `--cache-dir DIR` - Directory for caching downloaded tools
 ** `--log-dir DIR` - Directory for storing log files
-** `--parallel` / `--no-parallel` - Enable/disable parallel processing with Ractors
-** `--ractors NUM` - Number of parallel ractors to use (default: auto-configured)
+** `--parallel` / `--no-parallel` - Enable/disable parallel processing with Fractors (default: enabled)
+** `--workers NUM` - Number of parallel workers to use (default: auto-configured)
 * `clean` - Remove generated documentation
 * `distclean` - Remove generated documentation and downloaded tools
 ** `--global-cache` - Also clean the global cache directory
@@ -99,8 +99,8 @@ bundle exec hrma build documentation
 # Generate documentation with custom manifest file
 bundle exec hrma build documentation --manifest-path=custom-schemas.yml

-# Generate documentation with 4 ractors
-bundle exec hrma build documentation --ractors=4
+# Generate documentation with 4 workers
+bundle exec hrma build documentation --workers=4

 # Generate documentation without parallel processing
 bundle exec hrma build documentation --no-parallel
@@ -158,36 +158,35 @@ bundle exec hrma config set cache_dir /path/to/cache

 === Parallel processing

-The tool supports parallel processing using Ruby's Ractor feature. This
-significantly speeds up documentation generation for large numbers of schema
-files.
+The tool supports parallel processing using Ruby's Ractor feature through the Fractor framework. This
+significantly speeds up documentation generation for large numbers of schema files.

-By default, the tool automatically determines the optimal number of ractors to
+By default, the tool automatically determines the optimal number of workers to
 use based on your system resources:

-* In "auto" mode (default), the number of ractors is determined by:
+* In "auto" mode (default), the number of workers is determined by:
 ** Using half of your CPU cores (rounded down)
 ** Ensuring at least 2 cores are left free for system processes
-** Using at least 1 ractor
-** Using one ractor per file when possible (up to the calculated maximum)
+** Using at least 1 worker
+** Using one worker per file when possible (up to the calculated maximum)

 This auto-configuration provides a good balance between performance and system
 responsiveness.

 [example]
 ====
-* With 4 files on a 4-core system: 1 ractor would be used (half cores = 2, but ensuring 2 cores are free = 1)
-* With 4 files on an 8-core system: 4 ractors would be used (half cores = 4, which leaves enough free cores)
-* With 4 files on a 16-core system: 4 ractors would be used (one per file, even though 8 ractors would be available)
-* With 10 files on a 16-core system: 8 ractors would be used (half cores = 8, which is less than file count)
+* With 4 files on a 4-core system: 1 worker would be used (half cores = 2, but ensuring 2 cores are free = 1)
+* With 4 files on an 8-core system: 4 workers would be used (half cores = 4, which leaves enough free cores)
+* With 4 files on a 16-core system: 4 workers would be used (one per file, even though 8 workers would be available)
+* With 10 files on a 16-core system: 8 workers would be used (half cores = 8, which is less than file count)
 ====

-You can manually specify the number of ractors:
+You can manually specify the number of workers:

 [source,sh]
 ----
-# Use 4 ractors for parallel processing
-bundle exec hrma build documentation --ractors=4
+# Use 4 workers for parallel processing
+bundle exec hrma build documentation --workers=4
 ----

 To disable parallel processing entirely:
@@ -196,9 +195,6 @@ To disable parallel processing entirely:
 ----
 # Disable parallel processing
 bundle exec hrma build documentation --no-parallel
-
-# Alternative method
-HRMA_DISABLE_RACTORS=1 bundle exec hrma build documentation
 ----


@@ -215,8 +211,9 @@ The `hrma` tool is organized into several components:
 === Build system

 * `lib/hrma/build/document_generator.rb` - Main class for generating documentation
-* `lib/hrma/build/ractor_document_processor.rb` - Processor for XSD files that can run within a Ractor
-* `lib/hrma/build/documentation.rb` - Module with documentation generation utilities
+* `lib/hrma/build/schema_processor.rb` - Processes individual schema files
+* `lib/hrma/build/schema_work.rb` - Work item representation for parallel processing
+* `lib/hrma/build/schema_worker.rb` - Worker implementation for parallel processing
 * `lib/hrma/build/tools.rb` - Handles downloading and setting up external tools
 * `lib/hrma/build/cleaner.rb` - Handles cleaning generated files
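
Review note: the "auto" rules in the README hunk above can be sketched in Ruby roughly as follows. This is one literal reading of the four bullets, not the project's implementation; `auto_worker_count` and its parameters are hypothetical names. Under this reading the 4-core case yields 2 workers, whereas the README example states 1, and `generate_parallel` in this commit uses `[xsd_files.size, Etc.nprocessors].min` rather than either rule, so the three may need reconciling.

```ruby
require "etc"

# Hypothetical helper: a literal reading of the README's "auto" bullets.
# file_count - number of schema files to process
# cores      - number of CPU cores (defaults to what Ruby detects)
def auto_worker_count(file_count, cores = Etc.nprocessors)
  half_cores  = cores / 2                              # half the cores, rounded down
  leave_free  = cores - 2                              # keep at least 2 cores free
  max_workers = [[half_cores, leave_free].min, 1].max  # never fewer than 1 worker
  [[file_count, max_workers].min, 1].max               # at most one worker per file
end

puts auto_worker_count(4, 8)   # => 4
puts auto_worker_count(4, 16)  # => 4
puts auto_worker_count(10, 16) # => 8
puts auto_worker_count(4, 4)   # => 2 (the README example states 1)
```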

hrma.gemspec

Lines changed: 1 addition & 0 deletions
@@ -19,4 +19,5 @@ Gem::Specification.new do |spec|
   spec.add_dependency "thor", "~> 1.2"
   spec.add_dependency "rake", "~> 13.0"
   spec.add_dependency "ruby-progressbar", "~> 1.13"
+  spec.add_dependency "fractor", "~> 0.1"
 end

lib/hrma/README.adoc

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+= HRMA Library Documentation
+
+== Overview
+
+The HRMA (Harmonized Resources Maintenance Agency) library provides tools for managing ISO/TC 211 schemas and generating documentation. This document describes the internal structure and components of the library.
+
+== Directory Structure
+
+* `lib/hrma/` - Root directory for the HRMA library
+** `build/` - Documentation generation components
+** `commands/` - Command implementations for the CLI
+** `cli.rb` - Main CLI class
+** `config.rb` - Configuration management
+** `version.rb` - Version information
+
+== Key Components
+
+=== Build System
+
+The build system is responsible for generating documentation from schema files:
+
+* `build/document_generator.rb` - Main class for generating documentation
+* `build/schema_processor.rb` - Processes individual schema files
+* `build/schema_work.rb` - Work item representation for parallel processing
+* `build/schema_worker.rb` - Worker implementation for parallel processing
+* `build/tools.rb` - Handles downloading and setting up external tools
+* `build/cleaner.rb` - Handles cleaning generated files
+
+=== Parallel Processing
+
+The library uses Ruby's Ractor feature for parallel processing of schema files:
+
+* Work is distributed across multiple Ractors using the Fractor framework
+* Each schema file is processed in its own Ractor
+* Results are collected and aggregated
+* The number of Ractors is configurable or auto-detected based on system resources
+
+=== Commands
+
+The command system provides the CLI interface:
+
+* `commands/build.rb` - Commands for building documentation
+* `commands/schemas.rb` - Commands for managing schemas
+* `commands/config.rb` - Commands for managing configuration
+
+=== Configuration
+
+Configuration is managed through:
+
+* Command-line options
+* Environment variables
+* Configuration file (`~/.hrma/config.yml`)
+
+== Development
+
+=== Adding New Features
+
+When adding new features:
+
+1. Identify the appropriate component to modify
+2. Update tests to cover the new functionality
+3. Update documentation (including this README)
+4. Update the main README.adoc if the feature affects user-facing functionality
+
+=== Parallel Processing
+
+The parallel processing system uses the Fractor framework to distribute work across multiple Ractors:
+
+1. `SchemaWork` objects represent individual schema files to process
+2. `SchemaWorker` processes each work item in a separate Ractor
+3. `DocumentGenerator` coordinates the workers and collects results
+
+When modifying the parallel processing system:
+
+* Ensure all objects passed between Ractors are shareable
+* Handle errors appropriately to prevent worker crashes
+* Consider the impact on memory usage and system resources
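
Review note: as a minimal illustration of the shareability point above, Ruby 3.0 and later provide `Ractor.shareable?` and `Ractor.make_shareable` for checking and deep-freezing objects. The payload below is a made-up example that mirrors the hash shape passed to `SchemaWork`; whether Fractor itself requires this step is not stated in this commit, so treat it as a sketch.

```ruby
# Hypothetical payload with the same keys that SchemaWork receives.
payload = { schema_path: "schemas/example.xsd", log_file: nil }

# A mutable Hash is not shareable between Ractors.
before = Ractor.shareable?(payload)

# make_shareable deep-freezes the object in place and returns it.
shareable = Ractor.make_shareable(payload)
after = Ractor.shareable?(shareable)

puts "shareable before: #{before}, after: #{after}"
```

After `make_shareable`, the hash and the string it contains are frozen, so any later attempt to mutate them raises `FrozenError`.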

lib/hrma/build/document_generator.rb

Lines changed: 82 additions & 27 deletions
@@ -4,8 +4,12 @@
 require "fileutils"
 require "ruby-progressbar"
 require "logger"
+require "etc"
+require "fractor"
 require_relative "../config"
 require_relative "schema_processor"
+require_relative "schema_work"
+require_relative "schema_worker"

 module Hrma
   module Build
@@ -35,11 +39,80 @@ def generate
         puts "Found #{xsd_files.size} XSD files to process"
         @progressbar = create_progressbar(xsd_files.size)

-        generate_sequential(xsd_files)
+        if options[:parallel] == false
+          generate_sequential(xsd_files)
+        else
+          generate_parallel(xsd_files)
+        end

         puts "\nDocumentation generation complete. See _site/ directory."
       end

+      # Generate documentation in parallel using Fractors
+      #
+      # @param xsd_files [Array<String>] List of XSD files to process
+      # @return [void]
+      def generate_parallel(xsd_files)
+        puts "Generating documentation in parallel..."
+
+        # Determine number of workers - use either the specified number or auto-detect
+        num_workers = options[:workers] || [xsd_files.size, Etc.nprocessors].min
+        puts "Using #{num_workers} parallel workers"
+
+        # Create work items - each item contains just the basic string path and log file path
+        work_items = xsd_files.map do |xsd_file|
+          # Create log file path for this file if log_dir is specified
+          log_file = nil
+          if @log_dir
+            log_file_name = "#{File.basename(xsd_file, '.xsd')}.log"
+            log_file = File.join(@log_dir, log_file_name)
+            FileUtils.mkdir_p(File.dirname(log_file))
+          end
+
+          # Use the original string path directly - no nested objects
+          SchemaWork.new({
+            schema_path: xsd_file, # This is just a string
+            log_file: log_file
+          })
+        end
+
+        # Create supervisor with worker pools
+        supervisor = Fractor::Supervisor.new(
+          worker_pools: [
+            { worker_class: SchemaWorker, num_workers: num_workers }
+          ]
+        )
+
+        # Add work items
+        supervisor.add_work_items(work_items)
+
+        # Run processing
+        supervisor.run
+
+        # Process results
+        process_results(supervisor.results)
+      end
+
+      # Process results from parallel processing
+      #
+      # @param aggregator [Fractor::ResultAggregator] Result aggregator
+      # @return [void]
+      def process_results(aggregator)
+        # Handle successful results
+        aggregator.results.each do |result|
+          schema_path = result.work.input[:schema_path]
+          puts "Successfully processed #{schema_path}"
+          progressbar.increment
+        end
+
+        # Handle errors
+        aggregator.errors.each do |error_result|
+          schema_path = error_result.work.input[:schema_path]
+          puts "Error processing #{schema_path}: #{error_result.error}"
+          progressbar.increment
+        end
+      end
+
       private

       # Load XSD files from schemas.yml
@@ -59,38 +132,20 @@ def load_xsd_files
         xsd_files
       end

-      # Generate documentation sequentially
+      # Generate documentation sequentially (using parallel processing with 1 worker)
       #
       # @param xsd_files [Array<String>] List of XSD files to process
       # @return [void]
       def generate_sequential(xsd_files)
-        puts "Generating documentation sequentially..."
-
-        # Create a schema processor
-        processor = SchemaProcessor.new
+        puts "Generating documentation sequentially (single worker)..."

-        # Process each file
-        xsd_files.each do |xsd_file|
-          puts "Processing: #{xsd_file}"
+        # Just use parallel processing with 1 worker
+        options_with_one_worker = options.dup
+        options_with_one_worker[:workers] = 1
+        @options = options_with_one_worker

-          # Create a logger for this file if log_dir is specified
-          logger = create_logger(xsd_file) if @log_dir
-
-          # Process the file
-          result = processor.process(schema_path: xsd_file, logger: logger)
-
-          # Close the logger if it was created
-          logger&.close
-
-          # Handle the result
-          if result
-            puts "Successfully processed #{xsd_file}"
-          else
-            puts "Error processing #{xsd_file}"
-          end
-
-          progressbar.increment
-        end
+        # Use the parallel implementation with 1 worker
+        generate_parallel(xsd_files)
       end

       # Create a logger for a specific file
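
Review note: the work-item construction in `generate_parallel` can be exercised without Fractor by isolating the log path logic. The sketch below uses hypothetical helper names (`log_file_for`, `build_work_payloads`) and plain hashes in place of `SchemaWork`; it only mirrors the path derivation shown in the hunk above.

```ruby
# Hypothetical helper mirroring the log path logic in generate_parallel:
# the log file is named after the schema's basename, inside log_dir.
def log_file_for(xsd_file, log_dir)
  return nil unless log_dir

  File.join(log_dir, "#{File.basename(xsd_file, '.xsd')}.log")
end

# Hypothetical helper: builds the plain-hash payloads that would be
# handed to SchemaWork.new in the commit.
def build_work_payloads(xsd_files, log_dir)
  xsd_files.map do |xsd_file|
    { schema_path: xsd_file, log_file: log_file_for(xsd_file, log_dir) }
  end
end

payloads = build_work_payloads(["a/common.xsd", "b/common.xsd"], "logs")
payloads.each { |payload| puts payload.inspect }
```

Because only the basename is used, the two schemas in this example map to the same log file (`logs/common.log`). If the manifest can contain same-named schemas in different directories, parallel workers would write to one shared log.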

lib/hrma/build/schema_work.rb

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+
+require 'fractor'
+
+module Hrma
+  module Build
+    # Class representing a work item for schema processing
+    class SchemaWork < Fractor::Work
+      attr_reader :schema_path, :log_file
+
+      # Initialize a new SchemaWork
+      #
+      # @param data [Hash] Hash containing schema_path and log_file
+      # @option data [String] :schema_path Path to the schema file
+      # @option data [String, nil] :log_file Path to log file, if any
+      def initialize(data)
+        @schema_path = data[:schema_path]
+        @log_file = data[:log_file]
+        super(data)
+      end
+
+      # Provide a readable representation of this work item
+      #
+      # @return [String] String representation
+      def to_s
+        "SchemaWork: #{@schema_path}"
+      end
+    end
+  end
+end