Skip to content

Conversation

@qryxip
Copy link
Member

@qryxip qryxip commented Nov 2, 2025

内容

OjtPhonemeを解体し、AudioQueryレベルの表現のenum Phonemeと音素IDレベルのenum PhonemeCodeに置き換える。
[追記] さらにOptionalConsonant型とMoraTail型を誕生させました。
[追記] Claude君にレビューさせたらこんな図解を出してくれた。
image

#1157 の前準備。

ゆくゆくはこのenum Phonemeを、(Frame)AudioQuery自体の表現に使うようにする。あと VOICEVOX/voicevox_engine#1004 の取り込みも予定。

VOICEVOX/voicevox_engine#993 の考え方を一部参考にした。以下1名の許諾のもと、 #874 にのっとりMITライセンスとしてライセンスする。

  • @tarepan (is_unvoiced_mora_tailというメソッドを生やす考え方)

関連 Issue

その他

@qryxip qryxip force-pushed the pr/refactor-make-phoneme-enum branch 2 times, most recently from 07a8526 to 8b34d2a Compare November 2, 2025 17:52
@qryxip qryxip force-pushed the pr/refactor-make-phoneme-enum branch from 8b34d2a to 1677bad Compare November 2, 2025 18:03
Copy link
Member

@Hiroshiba Hiroshiba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIUEO → UnvoicedVowelHoge
aiueo → VoicedVowelHoge
pau cl N → MorableHoge(単体でモーラになる母音以外の音素、くらいの意図)
それ以外 → ConsonantHoge(Consonantは子音)

とかで良いかな~と思いました!
たぶん現状答えがないので、一旦これでくらいの気持ち。。

1つ1つに名前もたせるのではなく、音素ごとに属性を持たせる設計のが合うかもです。どっちが良いか現状不明。。
Phoneme.NMorable属性持ってる、Phoneme.pauMorablePause属性を持ってる、みたいな・・・。
(Pythonだとこういう型実装苦手そうだけど、Rustならできるのかもと思って提案してみました。あまり深く考えて作らなくても良さそう感。)

@qryxip qryxip marked this pull request as ready for review November 12, 2025 17:07
@qryxip qryxip requested a review from Hiroshiba November 12, 2025 17:07
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 12, 2025

CodSpeed Performance Report

Merging #1190 will improve performances by 25.91%

Comparing qryxip:pr/refactor-make-phoneme-enum (25015b5) with main (ee2ca83)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 2 improvements
⏩ 22 skipped1

Benchmarks breakdown

Benchmark BASE HEAD Change
test_asyncio_tts 2.3 s 1.9 s +25.91%
test_blocking_tts 2.3 s 1.9 s +24.2%

Footnotes

  1. 22 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@qryxip
Copy link
Member Author

qryxip commented Nov 12, 2025

一部の操作に対するパニックメッセージが変わってしまうので、このPRを"refactor"とするために別PRで変更を入れたい。

[追記] #1197, #1203

@qryxip
Copy link
Member Author

qryxip commented Nov 12, 2025

ゆくゆくはこのenum Phonemeを、(Frame)AudioQuery自体の表現に使うようにする。

AudioQueryではsilsilとして保存しないといけないことに気がついた。となるとsilを含むpub enum Phonemeと、音素IDが割り振られているpub(crate) enum InferablePhonemeみたいに分けた方がよさそう。

@qryxip qryxip marked this pull request as draft November 15, 2025 16:08
@qryxip
Copy link
Member Author

qryxip commented Nov 15, 2025

となるとsilを含むpub enum Phonemeと、音素IDが割り振られているpub(crate) enum InferablePhonemeみたいに分けた方がよさそう。

done. InferablePhonemePhonemeCodeと命名しました。PhonemeIdとかと迷いましたが。

@Hiroshiba Hiroshiba removed their request for review November 17, 2025 02:25
Hiroshiba

This comment was marked as off-topic.

qryxip added a commit that referenced this pull request Nov 17, 2025
これを、

```
called `Option::unwrap()` on a `None` value
```

こうする。

```
invalid phoneme: "不正なラベル"
```

#1190 を完全に"refactor"とするため、音素が不正なときのパニックメッセージ
を先に定めるのが目的。 VOICEVOX/voicevox_engine#1004 については別途、
#1190の後に行う。

Fixes: #1202
@qryxip qryxip marked this pull request as ready for review November 17, 2025 04:37
@qryxip qryxip requested a review from Hiroshiba November 17, 2025 04:37
@qryxip
Copy link
Member Author

qryxip commented Nov 17, 2025

このPRのdescriptionを更新し、リファクタとしての概要を書きました。

@qryxip
Copy link
Member Author

qryxip commented Nov 17, 2025

1つ1つに名前もたせるのではなく、音素ごとに属性を持たせる設計のが合うかもです。どっちが良いか現状不明。。
Phoneme.NMorable属性持ってる、Phoneme.pauMorablePause属性を持ってる、みたいな・・・。
(Pythonだとこういう型実装苦手そうだけど、Rustならできるのかもと思って提案してみました。あまり深く考えて作らなくても良さそう感。)

そういえばこれについては、Rustであっても列挙型の方がしっくり来ました。もし属性のようなものを考えるにしても、コンストラクトタとかPythonの@propertyとかで現れるような感じにしたいです。

Comment on lines +258 to +259
//None = -1,
MorablePau = 0,
Copy link
Member Author

@qryxip qryxip Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 この = 数値というのは実は最初以降は書かなくても自動で連番を振ってくれるし、0始まりであれば最初も省略できる。ただし今回の場合、OptionalConsonantMoraTailにおける「疎」の部分のことがあるのであえて全部に明示的な番号を割り振った。

ちなみに0から44までの間に欠番があるとContiguousderiveするところで「contiguousじゃない」と怒られるので、その点で安心できたというのもある (例えば10と書くところを間違えて100にしても怒ってくれる)。

Comment on lines +434 to +481
pub(crate) enum MoraTail {
//None = -1,
MorablePau = 0,
UnvoicedVowelA = 1,
UnvoicedVowelE = 2,
UnvoicedVowelI = 3,
MorableN = 4,
UnvoicedVowelO = 5,
UnvoicedVowelU = 6,
VoicedVowelA = 7,
//ConsonantB = 8,
//ConsonantBy = 9,
//ConsonantCh = 10,
MorableCl = 11,
//ConsonantD = 12,
//ConsonantDy = 13,
VoicedVowelE = 14,
//ConsonantF = 15,
//ConsonantG = 16,
//ConsonantGw = 17,
//ConsonantGy = 18,
//ConsonantH = 19,
//ConsonantHy = 20,
VoicedVowelI = 21,
//ConsonantJ = 22,
//ConsonantK = 23,
//ConsonantKw = 24,
//ConsonantKy = 25,
//ConsonantM = 26,
//ConsonantMy = 27,
//ConsonantN = 28,
//ConsonantNy = 29,
VoicedVowelO = 30,
//ConsonantP = 31,
//ConsonantPy = 32,
//ConsonantR = 33,
//ConsonantRy = 34,
//ConsonantS = 35,
//ConsonantSh = 36,
//ConsonantT = 37,
//ConsonantTs = 38,
//ConsonantTy = 39,
VoicedVowelU = 40,
//ConsonantV = 41,
//ConsonantW = 42,
//ConsonantY = 43,
//ConsonantZ = 44,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 元の表現:

"a", "i", "u", "e", "o", "N", "A", "I", "U", "E", "O", "cl", "pau"

Comment on lines 487 to 493
Self::UnvoicedVowelA
| Self::UnvoicedVowelI
| Self::UnvoicedVowelU
| Self::UnvoicedVowelE
| Self::UnvoicedVowelO
| Self::MorablePau
| Self::MorableCl
Copy link
Member Author

@qryxip qryxip Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 元の表現:

"A", "I", "U", "E", "O", "cl", "pau"

(今気付いたけどClPauの順番が逆だった)
[追記] ClPauの順に直した: 6b8af86 (#1190)

phoneme
.into_integer()
.try_into()
.expect("should be ensured by the above assertion")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 ここのexpectのパニックメッセージとか、ここの下の方にあるassert_eq!についてはリリースビルドの最適化で分岐ごとちゃんと消えてくれるらしい。

@Hiroshiba Hiroshiba requested a review from Copilot November 18, 2025 00:39
Copilot finished reviewing on behalf of Hiroshiba November 18, 2025 00:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the phoneme representation in the codebase by decomposing OjtPhoneme into a more type-safe enum-based system. The refactoring introduces enum Phoneme for AudioQuery-level representation, enum PhonemeCode for phoneme ID-level representation, and two specialized types OptionalConsonant and MoraTail to categorize phoneme codes based on their linguistic properties.

  • Replaces string-based OjtPhoneme with strongly-typed enum representations
  • Introduces Phoneme, PhonemeCode, OptionalConsonant, and MoraTail types
  • Uses bytemuck for zero-cost casting between enum and numeric representations

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/voicevox_core/src/engine/acoustic_feature_extractor.rs Complete rewrite replacing OjtPhoneme with new enum-based phoneme system including Phoneme, PhonemeCode, OptionalConsonant, and MoraTail types
crates/voicevox_core/src/engine/talk/interpret_query.rs Updates function signatures and implementations to use new PhonemeCode, OptionalConsonant, and MoraTail types, replacing previous OjtPhoneme usage
crates/voicevox_core/src/synthesizer.rs Updates to use PhonemeCode instead of OjtPhoneme, utilizes bytemuck::must_cast_slice for zero-cost conversions, and simplifies phoneme ID extraction logic
crates/voicevox_core/src/engine.rs Updates public exports to expose PhonemeCode instead of OjtPhoneme
crates/voicevox_core/Cargo.toml Adds bytemuck dependency with derive and must_cast features
Cargo.toml Adds bytemuck = "1.24.0" to workspace dependencies
Cargo.lock Records bytemuck and bytemuck_derive dependency additions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@Hiroshiba Hiroshiba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!!

おつでした!!
今の形が最適なのか正直わからないのですが、もし課題点とか見つかったらそのときに随時変えていけば良さそう感。

Comment on lines +13 to +15
/// `sil`。
#[display("{_0}")]
Sil(Sil),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(別のとこのコメントですが議論をわけたいのでここでコメントします)

AudioQueryではsilはsilとして保存しないといけないことに気がついた。

ちょっと聞いておきたいだけなのですが、AudioQueryも別にsil保存じゃなくてpau変換後でも別に問題なさそうですが、silじゃないといけない理由ってなんででしたっけ 👀
VOICEVOX_ENGINEがそうなってるからとか?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VOICEVOX_ENGINEがそうなってるからとか?

そうですね。挙動の現状維持をしたい感じです。このPRがリファクタPRだというのもある。


#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, derive_more::Display)]
pub(super) struct Sil(
String, // invariant: must contain "sil"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

これ何のことかコメントからわかりませんでした。
このStringはsilを含まないといけないってことなんですかね。

主語がわかりづらい気がするけど、僕がRustの書き方に慣れてないだけな気もしました。
だったら別に問題じゃなさそう!

@qryxip
Copy link
Member Author

qryxip commented Nov 18, 2025

25015b5 (#1190): ちょっとしたコード移動

@qryxip qryxip merged commit d37430b into VOICEVOX:main Nov 19, 2025
57 of 66 checks passed
@qryxip qryxip deleted the pr/refactor-make-phoneme-enum branch November 19, 2025 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants