refactor: Represent phonemes as enums #1190
Hiroshiba
left a comment
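AIUEO → UnvoicedVowelHoge
aiueo → VoicedVowelHoge
pau cl N → MorableHoge (roughly: non-vowel phonemes that form a mora on their own)
Everything else → ConsonantHoge
I thought something along these lines would be fine!
There is probably no right answer at this point, so this is more of a "let's go with this for now" suggestion.
Rather than giving each phoneme its own name, a design that attaches attributes to each phoneme might fit better. It is unclear right now which is better.
For example, Phoneme.N has the Morable attribute, and Phoneme.pau has both the Morable and Pause attributes.
(Python seems to be weak at implementing this kind of type, but I thought Rust might be able to do it, so I am suggesting it. It probably does not need to be designed too carefully.)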
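The attribute-based alternative described above could look roughly like the following. This is only a sketch: the `Attrs` struct, its fields, the `attrs` method, and the reduced set of variants are all hypothetical names chosen for illustration, not code from this PR.

```rust
/// Hypothetical attribute set; the field names are illustrative assumptions.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Attrs {
    /// The phoneme forms a mora on its own.
    pub morable: bool,
    /// The phoneme is a pause.
    pub pause: bool,
}

/// A deliberately tiny subset of phonemes, just enough to show the idea.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phoneme {
    Pau,
    Cl,
    N,
    K,
}

impl Phoneme {
    /// Returns the attributes attached to this phoneme.
    pub fn attrs(self) -> Attrs {
        match self {
            // `pau` is both morable and a pause.
            Self::Pau => Attrs { morable: true, pause: true },
            // `cl` and `N` are morable but not pauses.
            Self::Cl | Self::N => Attrs { morable: true, pause: false },
            // A plain consonant carries neither attribute.
            Self::K => Attrs::default(),
        }
    }
}
```

With this shape, callers ask `phoneme.attrs().morable` instead of matching on a dedicated `Morable...` variant name.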
CodSpeed Performance Report

Merging #1190 will improve performances by 25.91%

| | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| ⚡ | test_asyncio_tts | 2.3 s | 1.9 s | +25.91% |
| ⚡ | test_blocking_tts | 2.3 s | 1.9 s | +24.2% |

Footnotes
- 22 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.
In AudioQuery
done.
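Change this:
```
called `Option::unwrap()` on a `None` value
```
to this:
```
invalid phoneme: "不正なラベル"
```
(Here "不正なラベル" means "invalid label" and stands for the offending label.)
The goal is to settle the panic message for invalid phonemes first, so that #1190 can be purely a "refactor". VOICEVOX/voicevox_engine#1004 will be handled separately, after #1190. Fixes: #1202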
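As a rough illustration of the message change, the sketch below parses a label and panics with `invalid phoneme: "<label>"` instead of an opaque `unwrap` failure. The three-variant `PhonemeCode` and the `parse_or_panic` helper are assumptions made for the example; the real parsing code in the repository is not shown in this conversation.

```rust
/// A tiny, hypothetical subset of phoneme codes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PhonemeCode {
    Pau,
    A,
    K,
}

impl std::str::FromStr for PhonemeCode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pau" => Ok(Self::Pau),
            "a" => Ok(Self::A),
            "k" => Ok(Self::K),
            // `{:?}` wraps the label in double quotes, as in the target message.
            _ => Err(format!("invalid phoneme: {:?}", s)),
        }
    }
}

/// Hypothetical helper: panics with the descriptive message on a bad label.
pub fn parse_or_panic(label: &str) -> PhonemeCode {
    label
        .parse::<PhonemeCode>()
        .unwrap_or_else(|msg| panic!("{}", msg))
}
```

Keeping the message in the `Err` value means the same text can later be surfaced as a regular error rather than a panic.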
I updated this PR's description and wrote an overview of it as a refactor.
Co-authored-by: tarepan <[email protected]>
By the way, on this point, an enum felt like the better fit even in Rust. Even if we were to consider something like attributes, constructors or Python's
```rust
//None = -1,
MorablePau = 0,
```
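📝 The `= number` part can actually be omitted after the first variant, because sequential numbers are assigned automatically, and if the numbering starts at 0 the first one can be omitted too. In this case, however, because OptionalConsonant and MoraTail have "sparse" regions, I deliberately assigned an explicit number to every variant.
Incidentally, if any number between 0 and 44 is missing, deriving Contiguous fails with a "not contiguous" error, which was also reassuring (for example, it complains if you mistakenly write 100 where 10 was intended).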
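The contiguity check mentioned above comes from the `Contiguous` derive. As a dependency-free stand-in, the sketch below uses a hand-written `const fn` and a compile-time assertion to get a similar guarantee. `MiniCode`, `is_contiguous`, and the five-variant subset are hypothetical and only mirror the first few codes quoted in this review.

```rust
/// Hypothetical five-variant subset with explicit discriminants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i64)]
pub enum MiniCode {
    MorablePau = 0,
    UnvoicedVowelA = 1,
    UnvoicedVowelE = 2,
    UnvoicedVowelI = 3,
    MorableN = 4,
}

/// Returns `true` when each value is exactly one greater than the previous one.
pub const fn is_contiguous(values: &[i64]) -> bool {
    let mut i = 1;
    while i < values.len() {
        if values[i] != values[i - 1] + 1 {
            return false;
        }
        i += 1;
    }
    true
}

// Evaluated at compile time: a typo such as `MorableN = 40` would fail the build.
const _: () = assert!(is_contiguous(&[
    MiniCode::MorablePau as i64,
    MiniCode::UnvoicedVowelA as i64,
    MiniCode::UnvoicedVowelE as i64,
    MiniCode::UnvoicedVowelI as i64,
    MiniCode::MorableN as i64,
]));
```

Unlike a derive, this version requires listing the variants by hand, so it only protects the variants that are actually listed.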
```rust
pub(crate) enum MoraTail {
    //None = -1,
    MorablePau = 0,
    UnvoicedVowelA = 1,
    UnvoicedVowelE = 2,
    UnvoicedVowelI = 3,
    MorableN = 4,
    UnvoicedVowelO = 5,
    UnvoicedVowelU = 6,
    VoicedVowelA = 7,
    //ConsonantB = 8,
    //ConsonantBy = 9,
    //ConsonantCh = 10,
    MorableCl = 11,
    //ConsonantD = 12,
    //ConsonantDy = 13,
    VoicedVowelE = 14,
    //ConsonantF = 15,
    //ConsonantG = 16,
    //ConsonantGw = 17,
    //ConsonantGy = 18,
    //ConsonantH = 19,
    //ConsonantHy = 20,
    VoicedVowelI = 21,
    //ConsonantJ = 22,
    //ConsonantK = 23,
    //ConsonantKw = 24,
    //ConsonantKy = 25,
    //ConsonantM = 26,
    //ConsonantMy = 27,
    //ConsonantN = 28,
    //ConsonantNy = 29,
    VoicedVowelO = 30,
    //ConsonantP = 31,
    //ConsonantPy = 32,
    //ConsonantR = 33,
    //ConsonantRy = 34,
    //ConsonantS = 35,
    //ConsonantSh = 36,
    //ConsonantT = 37,
    //ConsonantTs = 38,
    //ConsonantTy = 39,
    VoicedVowelU = 40,
    //ConsonantV = 41,
    //ConsonantW = 42,
    //ConsonantY = 43,
    //ConsonantZ = 44,
}
```
📝 Original representation:
"a", "i", "u", "e", "o", "N", "A", "I", "U", "E", "O", "cl", "pau"
```rust
| Self::UnvoicedVowelA
| Self::UnvoicedVowelI
| Self::UnvoicedVowelU
| Self::UnvoicedVowelE
| Self::UnvoicedVowelO
| Self::MorablePau
| Self::MorableCl
```
📝 Original representation:
"A", "I", "U", "E", "O", "cl", "pau" (I just noticed that the order of Cl and Pau was reversed)
[Addendum] Fixed to the order Cl → Pau: 6b8af86 (#1190)
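The move from a string list to enum variants can be sketched with `matches!`. The variant names below follow the `MoraTail` enum quoted in this review, but the method name `is_unvoiced` is an assumption, and the explicit discriminants are omitted for brevity.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MoraTail {
    MorablePau,
    UnvoicedVowelA,
    UnvoicedVowelE,
    UnvoicedVowelI,
    MorableN,
    UnvoicedVowelO,
    UnvoicedVowelU,
    VoicedVowelA,
    MorableCl,
    VoicedVowelE,
    VoicedVowelI,
    VoicedVowelO,
    VoicedVowelU,
}

impl MoraTail {
    /// Enum counterpart of checking membership in
    /// `["A", "I", "U", "E", "O", "cl", "pau"]`.
    /// The method name is hypothetical.
    pub fn is_unvoiced(self) -> bool {
        matches!(
            self,
            Self::UnvoicedVowelA
                | Self::UnvoicedVowelI
                | Self::UnvoicedVowelU
                | Self::UnvoicedVowelE
                | Self::UnvoicedVowelO
                | Self::MorableCl
                | Self::MorablePau
        )
    }
}
```

Because the pattern lists variants rather than strings, a misspelled name is a compile error instead of a silent mismatch.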
```rust
phoneme
    .into_integer()
    .try_into()
    .expect("should be ensured by the above assertion")
```
📝 Apparently, the panic message of this `expect`, as well as the `assert_eq!` further down, is properly eliminated, branch and all, by the optimizations in release builds.
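The conversion pattern in the snippet above (go through the integer, then `try_into`, then `expect`) might be reconstructed as follows. This is a guess at the surrounding shape, not the PR's code: the reduced enums, `is_mora_tail`, `to_mora_tail`, and the hand-written `TryFrom` impl are assumptions, and the assertion that the `expect` message refers to is not visible in this conversation.

```rust
use std::convert::TryFrom;

/// Hypothetical subset of the full phoneme code space.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i64)]
pub enum PhonemeCode {
    MorablePau = 0,
    UnvoicedVowelA = 1,
    VoicedVowelA = 7,
    ConsonantB = 8,
    MorableCl = 11,
}

/// Hypothetical subset of codes that may end a mora (same numbering).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i64)]
pub enum MoraTail {
    MorablePau = 0,
    UnvoicedVowelA = 1,
    VoicedVowelA = 7,
    MorableCl = 11,
}

impl TryFrom<i64> for MoraTail {
    type Error = i64;

    fn try_from(n: i64) -> Result<Self, i64> {
        match n {
            0 => Ok(Self::MorablePau),
            1 => Ok(Self::UnvoicedVowelA),
            7 => Ok(Self::VoicedVowelA),
            11 => Ok(Self::MorableCl),
            other => Err(other),
        }
    }
}

impl PhonemeCode {
    pub fn into_integer(self) -> i64 {
        self as i64
    }

    pub fn is_mora_tail(self) -> bool {
        !matches!(self, Self::ConsonantB)
    }
}

/// Converts a code that is already known to be a mora tail.
pub fn to_mora_tail(phoneme: PhonemeCode) -> MoraTail {
    assert!(phoneme.is_mora_tail());
    MoraTail::try_from(phoneme.into_integer())
        .expect("should be ensured by the above assertion")
}
```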
Pull Request Overview
This PR refactors the phoneme representation in the codebase by decomposing OjtPhoneme into a more type-safe enum-based system. The refactoring introduces enum Phoneme for AudioQuery-level representation, enum PhonemeCode for phoneme ID-level representation, and two specialized types OptionalConsonant and MoraTail to categorize phoneme codes based on their linguistic properties.
- Replaces string-based OjtPhoneme with strongly-typed enum representations
- Introduces Phoneme, PhonemeCode, OptionalConsonant, and MoraTail types
- Uses bytemuck for zero-cost casting between enum and numeric representations
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/voicevox_core/src/engine/acoustic_feature_extractor.rs | Complete rewrite replacing OjtPhoneme with new enum-based phoneme system including Phoneme, PhonemeCode, OptionalConsonant, and MoraTail types |
| crates/voicevox_core/src/engine/talk/interpret_query.rs | Updates function signatures and implementations to use new PhonemeCode, OptionalConsonant, and MoraTail types, replacing previous OjtPhoneme usage |
| crates/voicevox_core/src/synthesizer.rs | Updates to use PhonemeCode instead of OjtPhoneme, utilizes bytemuck::must_cast_slice for zero-cost conversions, and simplifies phoneme ID extraction logic |
| crates/voicevox_core/src/engine.rs | Updates public exports to expose PhonemeCode instead of OjtPhoneme |
| crates/voicevox_core/Cargo.toml | Adds bytemuck dependency with derive and must_cast features |
| Cargo.toml | Adds bytemuck = "1.24.0" to workspace dependencies |
| Cargo.lock | Records bytemuck and bytemuck_derive dependency additions |
Hiroshiba
left a comment
LGTM!!!
Great work!!
Honestly, I don't know whether the current form is optimal, but if any issues turn up we can change it as we go.
```rust
/// `sil`.
#[display("{_0}")]
Sil(Sil),
```
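(This refers to a comment made elsewhere, but I am commenting here to keep the discussions separate.)
> I realized that in AudioQuery, sil has to be stored as sil.

Just something I wanted to ask: it seems AudioQuery would work fine storing the value after conversion to pau rather than as sil, so what was the reason it has to be sil? 👀
Is it because VOICEVOX_ENGINE does it that way?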
> Is it because VOICEVOX_ENGINE does it that way?

That's right. I want to preserve the current behavior. It is also because this PR is a refactoring PR.
```rust
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, derive_more::Display)]
pub(super) struct Sil(
    String, // invariant: must contain "sil"
```
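I couldn't tell from the comment what this refers to.
Does it mean that this String must contain sil?
The subject feels hard to identify, but it may just be that I'm not used to how Rust is written.
If so, it doesn't seem like a problem!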
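To make the meaning of the `// invariant` comment concrete: it says that the wrapped `String` always contains the substring "sil". One way to uphold that is to allow construction only through a checking constructor, as sketched below. The `new` function and the hand-written `Display` impl are assumptions (the PR derives `Display` with `derive_more`, and its actual constructor is not shown here).

```rust
use std::fmt;

/// Newtype whose inner string is guaranteed to contain "sil".
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct Sil(
    String, // invariant: must contain "sil"
);

impl Sil {
    /// Hypothetical checked constructor: the only way to build a `Sil`,
    /// so the invariant holds for every value of this type.
    pub fn new(s: &str) -> Option<Self> {
        if s.contains("sil") {
            Some(Self(s.to_owned()))
        } else {
            None
        }
    }
}

impl fmt::Display for Sil {
    /// Prints the original string unchanged, so it round-trips as written.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Since the field is private, code outside the module cannot create a `Sil` that breaks the invariant.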
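Description
Dismantle OjtPhoneme and replace it with enum Phoneme, the AudioQuery-level representation, and enum PhonemeCode, the phoneme-ID-level representation.
[Addendum] Additionally, the OptionalConsonant and MoraTail types were introduced.
[Addendum] When I had Claude review this, it produced this diagram.
This is preparation for #1157.
Eventually, this enum Phoneme is to be used in the representation of (Frame)AudioQuery itself. Incorporating VOICEVOX/voicevox_engine#1004 is also planned.
The approach of VOICEVOX/voicevox_engine#993 was partly used as a reference. With the permission of the following one person, it is licensed under the MIT license in accordance with #874.
(the idea of adding a method called is_unvoiced_mora_tail)
Related Issue
Other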