refactor: Represent phonemes as enums #1190

qryxip · 2025-11-02T17:35:13Z

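Description

Dismantles OjtPhoneme and replaces it with enum Phoneme, the AudioQuery-level representation, and enum PhonemeCode, the phoneme-ID-level representation.
[Addendum] Also introduced the OptionalConsonant and MoraTail types.
[Addendum] When I had Claude review this, it produced a diagram like this.

Preparation for #1157.

Eventually, this enum Phoneme will be used as the representation inside (Frame)AudioQuery itself. Incorporating VOICEVOX/voicevox_engine#1004 is also planned.

Partly based on the approach in VOICEVOX/voicevox_engine#993. With the permission of the one person below, this is licensed under the MIT License in accordance with #874.

@tarepan (the idea of adding a method called is_unvoiced_mora_tail)

Other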

Hiroshiba

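AIUEO → UnvoicedVowelHoge
aiueo → VoicedVowelHoge
pau cl N → MorableHoge (meaning roughly "a non-vowel phoneme that forms a mora on its own")
everything else → ConsonantHoge (Consonant as in consonant sounds)

I think something like this would be fine!
There is probably no right answer at the moment, so let's go with this for now.

Rather than giving each phoneme its own name, a design in which each phoneme carries attributes might fit better. It's unclear at this point which is better.
For example, Phoneme.N has the Morable attribute, and Phoneme.pau has the Morable and Pause attributes.
(Python seems poorly suited to this kind of type implementation, but I suggested it thinking Rust might manage it. It probably doesn't need to be designed too deeply.)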

codspeed-hq · 2025-11-12T17:14:24Z

CodSpeed Performance Report

Merging #1190 will improve performances by 25.91%

Comparing qryxip:pr/refactor-make-phoneme-enum (25015b5) with main (ee2ca83)

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 2 improvements
⏩ 22 skipped¹

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`test_asyncio_tts`	2.3 s	1.9 s	+25.91%
⚡	`test_blocking_tts`	2.3 s	1.9 s	+24.2%

¹ 22 benchmarks were skipped, so the baseline results were used instead.

qryxip · 2025-11-12T18:32:42Z

The panic messages for some operations would change, so to keep this PR a pure "refactor" I'd like to land those changes in a separate PR.

[Addendum] #1197, #1203

qryxip · 2025-11-12T19:11:22Z

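Eventually, this enum Phoneme will be used as the representation inside (Frame)AudioQuery itself.

I realized that in AudioQuery, sil has to be stored as sil. In that case, it seems better to split things into pub enum Phoneme, which includes sil, and something like pub(crate) enum InferablePhoneme, to which phoneme IDs are assigned.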

qryxip · 2025-11-15T21:08:02Z

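In that case, it seems better to split things into pub enum Phoneme, which includes sil, and something like pub(crate) enum InferablePhoneme, to which phoneme IDs are assigned.

done. I named InferablePhoneme PhonemeCode, though I wavered between that and something like PhonemeId.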
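The split described above could look roughly like the following sketch. Only the names Phoneme, PhonemeCode, and the IDs come from this PR; the reduced variant set, the Code variant, the plain String payload, and the to_code helper are hypothetical, and lowering sil to the pau ID is an assumption of this sketch rather than something the PR states.

```rust
/// Phoneme-ID-level representation: every variant has an ID.
/// (Only three variants are shown; in the PR this type is `pub(crate)`.)
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(i64)]
enum PhonemeCode {
    MorablePau = 0,
    VoicedVowelA = 7,
    ConsonantK = 23,
}

/// AudioQuery-level representation: additionally keeps `sil` labels as written.
#[allow(dead_code)]
#[derive(Clone, PartialEq, Eq, Debug)]
enum Phoneme {
    /// Hypothetical variant wrapping a phoneme that has an ID.
    Code(PhonemeCode),
    /// A `sil` label, stored verbatim so that it round-trips unchanged.
    Sil(String),
}

impl Phoneme {
    /// Hypothetical helper that lowers to the ID-level type.
    /// Assumption: `sil` is treated as a pause when an ID is needed.
    fn to_code(&self) -> PhonemeCode {
        match self {
            Self::Code(code) => *code,
            Self::Sil(_) => PhonemeCode::MorablePau,
        }
    }
}

fn main() {
    let sil = Phoneme::Sil("sil".to_owned());
    println!("{:?} -> id {}", sil, sil.to_code() as i64);
}
```

Keeping the two levels as separate types means code that needs an ID can never receive a raw sil label by accident.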

crates/voicevox_core/src/engine/talk/interpret_query.rs

Change this: ``` called `Option::unwrap()` on a `None` value ``` to this: ``` invalid phoneme: "不正なラベル" ``` The aim is to settle the panic message for invalid phonemes first, so that #1190 is a pure "refactor". VOICEVOX/voicevox_engine#1004 will be handled separately, after #1190. Fixes: #1202
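As a minimal sketch of the kind of change described here: replacing a bare unwrap with an explicit panic that names the offending label. The phoneme_id lookup and phoneme_id_or_panic are hypothetical stand-ins, not the functions in interpret_query.rs; only the message format is taken from the text above.

```rust
/// Hypothetical lookup from a label to a phoneme ID (tiny subset of the table).
fn phoneme_id(label: &str) -> Option<i64> {
    match label {
        "pau" => Some(0),
        "a" => Some(7),
        "k" => Some(23),
        _ => None,
    }
}

/// Panics with `invalid phoneme: "<label>"` instead of the generic
/// "called `Option::unwrap()` on a `None` value" message.
fn phoneme_id_or_panic(label: &str) -> i64 {
    // `{:?}` prints the label in double quotes, matching the target message.
    phoneme_id(label).unwrap_or_else(|| panic!("invalid phoneme: {:?}", label))
}

fn main() {
    println!("{}", phoneme_id_or_panic("a"));
}
```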

qryxip · 2025-11-17T04:41:21Z

I updated this PR's description and wrote an overview of it as a refactor.

Co-authored-by: tarepan <[email protected]>

qryxip · 2025-11-17T11:48:45Z

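Rather than giving each phoneme its own name, a design in which each phoneme carries attributes might fit better. It's unclear at this point which is better.
For example, Phoneme.N has the Morable attribute, and Phoneme.pau has the Morable and Pause attributes.
(Python seems poorly suited to this kind of type implementation, but I suggested it thinking Rust might manage it. It probably doesn't need to be designed too deeply.)

Come to think of it, on this point an enum felt like the better fit even in Rust. If we do consider something like attributes, I'd like them to surface through constructors or through something like Python's @property.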
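One way to read this reply is that the phonemes stay plain enum variants while "attributes" appear as methods computed from the variant. The sketch below assumes that reading; the reduced variant set and the method names is_morable and is_pause are hypothetical.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Phoneme {
    MorablePau,
    MorableCl,
    MorableN,
    VoicedVowelA,
    UnvoicedVowelA,
    ConsonantK,
}

impl Phoneme {
    /// "Morable" in the sense of the naming proposal: a non-vowel phoneme
    /// that forms a mora on its own.
    fn is_morable(self) -> bool {
        matches!(self, Self::MorablePau | Self::MorableCl | Self::MorableN)
    }

    /// Hypothetical "Pause" attribute.
    fn is_pause(self) -> bool {
        matches!(self, Self::MorablePau)
    }
}

fn main() {
    for p in [Phoneme::MorableN, Phoneme::MorablePau, Phoneme::ConsonantK] {
        println!("{:?}: morable={}, pause={}", p, p.is_morable(), p.is_pause());
    }
}
```

The attributes are derived from the variant rather than stored, so they behave much like read-only properties.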

qryxip · 2025-11-17T11:57:35Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+    //None = -1,
+    MorablePau = 0,


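📝 This "= number" part can actually be omitted after the first variant, since consecutive numbers are assigned automatically, and the first one can be omitted too if numbering starts at 0. In this case, however, because of the "sparse" parts of OptionalConsonant and MoraTail, I deliberately assigned an explicit number to every variant.

Incidentally, if any number between 0 and 44 is missing, deriving Contiguous fails with a "not contiguous" error, which was reassuring as well (for example, writing 100 by mistake where 10 was intended is also caught).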
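The numbering rules mentioned in this note can be illustrated with the standard library alone. In the sketch below, Implicit, Explicit, and is_contiguous are hypothetical; the PR relies on the Contiguous derive for the gap check at compile time, which this run-time helper only imitates.

```rust
/// Implicit numbering: starts at 0 and increases by one per variant.
#[derive(Clone, Copy, Debug)]
enum Implicit {
    A, // 0
    B, // 1
    C, // 2
}

/// Explicit numbering, in the style chosen in this PR (first three IDs only).
#[derive(Clone, Copy, Debug)]
#[repr(i64)]
enum Explicit {
    MorablePau = 0,
    UnvoicedVowelA = 1,
    UnvoicedVowelE = 2,
}

/// Hypothetical helper: true if each value is exactly one more than the previous.
fn is_contiguous(discriminants: &[i64]) -> bool {
    discriminants.windows(2).all(|w| w[1] == w[0] + 1)
}

fn main() {
    let implicit = [Implicit::A as i64, Implicit::B as i64, Implicit::C as i64];
    let explicit = [
        Explicit::MorablePau as i64,
        Explicit::UnvoicedVowelA as i64,
        Explicit::UnvoicedVowelE as i64,
    ];
    println!("{:?} {:?}", implicit, explicit);
    println!("explicit contiguous: {}", is_contiguous(&explicit));
    // A typo such as 100 where 10 was meant leaves a gap.
    println!("with typo contiguous: {}", is_contiguous(&[8, 9, 100]));
}
```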

qryxip · 2025-11-17T12:03:12Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+pub(crate) enum MoraTail {
+    //None = -1,
+    MorablePau = 0,
+    UnvoicedVowelA = 1,
+    UnvoicedVowelE = 2,
+    UnvoicedVowelI = 3,
+    MorableN = 4,
+    UnvoicedVowelO = 5,
+    UnvoicedVowelU = 6,
+    VoicedVowelA = 7,
+    //ConsonantB = 8,
+    //ConsonantBy = 9,
+    //ConsonantCh = 10,
+    MorableCl = 11,
+    //ConsonantD = 12,
+    //ConsonantDy = 13,
+    VoicedVowelE = 14,
+    //ConsonantF = 15,
+    //ConsonantG = 16,
+    //ConsonantGw = 17,
+    //ConsonantGy = 18,
+    //ConsonantH = 19,
+    //ConsonantHy = 20,
+    VoicedVowelI = 21,
+    //ConsonantJ = 22,
+    //ConsonantK = 23,
+    //ConsonantKw = 24,
+    //ConsonantKy = 25,
+    //ConsonantM = 26,
+    //ConsonantMy = 27,
+    //ConsonantN = 28,
+    //ConsonantNy = 29,
+    VoicedVowelO = 30,
+    //ConsonantP = 31,
+    //ConsonantPy = 32,
+    //ConsonantR = 33,
+    //ConsonantRy = 34,
+    //ConsonantS = 35,
+    //ConsonantSh = 36,
+    //ConsonantT = 37,
+    //ConsonantTs = 38,
+    //ConsonantTy = 39,
+    VoicedVowelU = 40,
+    //ConsonantV = 41,
+    //ConsonantW = 42,
+    //ConsonantY = 43,
+    //ConsonantZ = 44,
+}
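The commented-out lines in the enum above mark IDs that belong to consonants, so MoraTail shares its numbering with PhonemeCode but leaves gaps. A minimal sketch of how a narrowing conversion between the two could be written follows; the reduced variant sets and this TryFrom impl are hypothetical, since the PR's actual conversion is not shown in this excerpt.

```rust
use std::convert::TryFrom;

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(i64)]
enum PhonemeCode {
    MorablePau = 0,
    MorableN = 4,
    VoicedVowelA = 7,
    ConsonantB = 8,
    MorableCl = 11,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(i64)]
enum MoraTail {
    MorablePau = 0,
    MorableN = 4,
    VoicedVowelA = 7,
    //ConsonantB = 8,
    MorableCl = 11,
}

impl TryFrom<PhonemeCode> for MoraTail {
    /// The rejected code is handed back to the caller.
    type Error = PhonemeCode;

    fn try_from(code: PhonemeCode) -> Result<Self, Self::Error> {
        match code {
            PhonemeCode::MorablePau => Ok(Self::MorablePau),
            PhonemeCode::MorableN => Ok(Self::MorableN),
            PhonemeCode::VoicedVowelA => Ok(Self::VoicedVowelA),
            PhonemeCode::MorableCl => Ok(Self::MorableCl),
            // Consonants cannot end a mora.
            PhonemeCode::ConsonantB => Err(code),
        }
    }
}

fn main() {
    for code in [PhonemeCode::VoicedVowelA, PhonemeCode::ConsonantB] {
        println!("{:?} -> {:?}", code, MoraTail::try_from(code));
    }
}
```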


📝 Original representation:

"a", "i", "u", "e", "o", "N", "A", "I", "U", "E", "O", "cl", "pau"

qryxip · 2025-11-17T12:04:08Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+            Self::UnvoicedVowelA
+                | Self::UnvoicedVowelI
+                | Self::UnvoicedVowelU
+                | Self::UnvoicedVowelE
+                | Self::UnvoicedVowelO
+                | Self::MorablePau
+                | Self::MorableCl


📝 Original representation:

"A", "I", "U", "E", "O", "cl", "pau"

(I just noticed that Cl and Pau were in the reverse order)
[Addendum] Fixed the order to Cl → Pau: 6b8af86 (#1190)
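For context, the match arms quoted above could sit inside a predicate like the one sketched here. The variant names and IDs are the ones listed in this PR; the method name is_unvoiced and the use of matches! are assumptions, as the surrounding function is not visible in the diff hunk.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(i64)]
enum MoraTail {
    MorablePau = 0,
    UnvoicedVowelA = 1,
    UnvoicedVowelE = 2,
    UnvoicedVowelI = 3,
    MorableN = 4,
    UnvoicedVowelO = 5,
    UnvoicedVowelU = 6,
    VoicedVowelA = 7,
    MorableCl = 11,
    VoicedVowelE = 14,
    VoicedVowelI = 21,
    VoicedVowelO = 30,
    VoicedVowelU = 40,
}

impl MoraTail {
    /// Enum counterpart of the old string list
    /// "A", "I", "U", "E", "O", "cl", "pau" (same order).
    fn is_unvoiced(self) -> bool {
        matches!(
            self,
            Self::UnvoicedVowelA
                | Self::UnvoicedVowelI
                | Self::UnvoicedVowelU
                | Self::UnvoicedVowelE
                | Self::UnvoicedVowelO
                | Self::MorableCl
                | Self::MorablePau
        )
    }
}

fn main() {
    for tail in [MoraTail::UnvoicedVowelA, MoraTail::MorableN, MoraTail::VoicedVowelA] {
        println!("{:?}: unvoiced={}", tail, tail.is_unvoiced());
    }
}
```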

qryxip · 2025-11-17T19:53:18Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+        phoneme
+            .into_integer()
+            .try_into()
+            .expect("should be ensured by the above assertion")


📝 Apparently, the panic message of this expect and the assert_eq! further down are removed entirely, branches included, by release-build optimization.
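The pattern under discussion, an assertion that establishes a range followed by a fallible conversion that can then never fail, might look like the sketch below. It uses only the standard library; PhonemeCode here is a three-variant stand-in, and MIN_ID, MAX_ID, and to_index are hypothetical names. Whether the optimizer actually drops the branches is the author's observation and is not demonstrated by this code.

```rust
use std::convert::TryInto;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
#[repr(i64)]
enum PhonemeCode {
    MorablePau = 0,
    VoicedVowelA = 7,
    ConsonantZ = 44,
}

const MIN_ID: i64 = 0;
const MAX_ID: i64 = 44;

/// Converts a phoneme code to an unsigned index.
fn to_index(phoneme: PhonemeCode) -> usize {
    let id = phoneme as i64;
    // Establishes that `id` is non-negative and small...
    assert!((MIN_ID..=MAX_ID).contains(&id));
    // ...so this conversion cannot fail.
    id.try_into()
        .expect("should be ensured by the above assertion")
}

fn main() {
    println!("{}", to_index(PhonemeCode::VoicedVowelA));
}
```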

Copilot

Pull Request Overview

This PR refactors the phoneme representation in the codebase by decomposing OjtPhoneme into a more type-safe enum-based system. The refactoring introduces enum Phoneme for AudioQuery-level representation, enum PhonemeCode for phoneme ID-level representation, and two specialized types OptionalConsonant and MoraTail to categorize phoneme codes based on their linguistic properties.

Replaces string-based OjtPhoneme with strongly-typed enum representations
Introduces Phoneme, PhonemeCode, OptionalConsonant, and MoraTail types
Uses bytemuck for zero-cost casting between enum and numeric representations

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

File	Description
crates/voicevox_core/src/engine/acoustic_feature_extractor.rs	Complete rewrite replacing `OjtPhoneme` with new enum-based phoneme system including `Phoneme`, `PhonemeCode`, `OptionalConsonant`, and `MoraTail` types
crates/voicevox_core/src/engine/talk/interpret_query.rs	Updates function signatures and implementations to use new `PhonemeCode`, `OptionalConsonant`, and `MoraTail` types, replacing previous `OjtPhoneme` usage
crates/voicevox_core/src/synthesizer.rs	Updates to use `PhonemeCode` instead of `OjtPhoneme`, utilizes `bytemuck::must_cast_slice` for zero-cost conversions, and simplifies phoneme ID extraction logic
crates/voicevox_core/src/engine.rs	Updates public exports to expose `PhonemeCode` instead of `OjtPhoneme`
crates/voicevox_core/Cargo.toml	Adds `bytemuck` dependency with `derive` and `must_cast` features
Cargo.toml	Adds `bytemuck = "1.24.0"` to workspace dependencies
Cargo.lock	Records `bytemuck` and `bytemuck_derive` dependency additions

Hiroshiba

LGTM!!!

Good work!!
Honestly, I can't tell whether the current form is optimal, but if issues turn up we can change things as we go.

Hiroshiba · 2025-11-18T00:48:37Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+    /// `sil`。
+    #[display("{_0}")]
+    Sil(Sil),


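(This concerns a comment made elsewhere, but I'm commenting here to keep the discussions separate)

I realized that in AudioQuery, sil has to be stored as sil.

Just asking to be sure: it seems there would be no problem if AudioQuery stored the value after conversion to pau instead of sil, so what was the reason it has to be sil? 👀
Is it because VOICEVOX_ENGINE does it that way?

Is it because VOICEVOX_ENGINE does it that way?

Yes. I want to keep the current behavior as it is. It is also because this PR is a refactoring PR.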

Hiroshiba · 2025-11-18T00:58:01Z

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

+
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, derive_more::Display)]
+pub(super) struct Sil(
+    String, // invariant: must contain "sil"


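I couldn't tell from the comment what this refers to.
Does it mean that this String must contain sil?

The subject of the sentence feels hard to identify, but it may just be that I'm not used to how Rust is written.
If so, it doesn't seem to be a problem!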
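To make the question concrete: an "invariant" comment on a newtype field usually means that every value of the type is expected to satisfy the condition, typically enforced by a checking constructor. The sketch below assumes the substring reading raised in the question; Sil::new and as_str are hypothetical and are not the PR's API.

```rust
#[derive(Clone, PartialEq, Eq, Debug)]
struct Sil(
    String, // invariant: the wrapped string contains "sil" as a substring
);

impl Sil {
    /// Hypothetical checking constructor: returns `None` unless the label
    /// contains "sil", so values built this way always satisfy the invariant.
    fn new(label: &str) -> Option<Self> {
        if label.contains("sil") {
            Some(Self(label.to_owned()))
        } else {
            None
        }
    }

    /// Returns the label exactly as it was given.
    fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    println!("{:?}", Sil::new("sil").map(|s| s.as_str().to_owned()));
    println!("{:?}", Sil::new("pau"));
}
```

Placing the type in its own module with a private field, so that outside code must go through the constructor, is what turns the comment into a guarantee.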

crates/voicevox_core/src/engine/acoustic_feature_extractor.rs

qryxip · 2025-11-18T12:40:43Z

25015b5 (#1190): a minor code move

qryxip force-pushed the pr/refactor-make-phoneme-enum branch 2 times, most recently from 07a8526 to 8b34d2a November 2, 2025 17:52

[skip ci] refactor: Represent phonemes as enums

1677bad

qryxip force-pushed the pr/refactor-make-phoneme-enum branch from 8b34d2a to 1677bad November 2, 2025 18:03

Hiroshiba reviewed Nov 2, 2025

qryxip added 2 commits November 11, 2025 23:22

Merge branch 'main' into pr/refactor-make-phoneme-enum

1c9e028

Complete

1db7830

qryxip marked this pull request as ready for review November 12, 2025 17:07

qryxip requested a review from Hiroshiba November 12, 2025 17:07

qryxip added 5 commits November 13, 2025 02:34

Flatten matches

c32f00b

Reorder a derive

cf357e4

Update the panic message

9e38bad

Add a TODO

6d80280

Empty → None

5537cab

qryxip marked this pull request as draft November 15, 2025 16:08

qryxip added 2 commits November 16, 2025 01:08

Merge branch 'main' into pr/refactor-make-phoneme-enum

ba724a2

Complete for now

031d984

qryxip mentioned this pull request Nov 16, 2025

feat: Improve the panic message shown when a phoneme in AudioQuery is invalid #1203

Merged

qryxip commented Nov 16, 2025

qryxip added 7 commits November 16, 2025 19:01

Move bytemuck to workspace.dependencies

050b784

Move the macro definition

e39204a

Consolidate into phoneme_matches!

3ce2913

Reduce the diff

d769ebc

Move Sil

a21a362

Account for strings that contain "sil" but are not "sil"

a94f151

bytemuck::{=> must_}cast_slice

5ebb4b9

Hiroshiba removed their request for review November 17, 2025 02:25

qryxip added 3 commits November 17, 2025 13:13

Merge branch 'main' into pr/refactor-make-phoneme-enum

a1fc55d

Use panic! instead of todo!

0bb1692

Adjust various things on the test side

5d0c9cc

qryxip marked this pull request as ready for review November 17, 2025 04:37

qryxip requested a review from Hiroshiba November 17, 2025 04:37

qryxip and others added 4 commits November 17, 2025 13:59

thx Claude Code

25ad57f

Add the OptionalConsonant and MoraTail types and eliminate -1 from PhonemeCode

e4bd7ca

Co-authored-by: tarepan <[email protected]>

thx Claude Code!

30ade1e

Add tests

3f45074

qryxip commented Nov 17, 2025

Miscellaneous changes

6b8af86

qryxip commented Nov 17, 2025

Hiroshiba requested a review from Copilot November 18, 2025 00:39

Copilot started reviewing on behalf of Hiroshiba November 18, 2025 00:39 View session

Copilot finished reviewing on behalf of Hiroshiba November 18, 2025 00:41

Copilot AI reviewed Nov 18, 2025

Hiroshiba approved these changes Nov 18, 2025

Isolate the Sil implementation in a module

25015b5

qryxip merged commit d37430b into VOICEVOX:main Nov 19, 2025
57 of 66 checks passed

qryxip deleted the pr/refactor-make-phoneme-enum branch November 19, 2025 03:13
