In this subnet, we aim to gather the wisdom of the whole open-source community to create the best open-source TTS models.
(What does that mean?)
From my research, I am only able to find this product: https://app.myshell.ai/robot-workshop/widget/1788537048029802496. It requires a login, and I can't find any information on how to use the model outside of it, nor any usage statistics.
This seems like a big missed opportunity. TTS output is very easy to showcase, so I would advise the team to provide more accessible ways to interact with what the subnet produces.
Of course, the subnet itself produces model weights, not inference, so the models would need to be hosted by the team or by validators to be used.
On that subject, I don't see how validators themselves can derive value from this directly. It could provide a massive positive influence on Bittensor, but I think that is missing right now.
2/10
The repository readme features nice diagrams explaining the mechanism.
I would love to see more detail on the 'Speaker Alignment Rater' and the 'Pronunciation Rater', to see how they actually work in practice, and whether they are robust.
But on the face of it, it seems well thought out and designed.
There is also this file which gives more info.
All subnets should do this. Unfortunately, this file is old code copied and pasted from the old Nous subnet 6, and is very out of date, so subnet 3 should provide its own up-to-date version!
Initial dataset size
Very limited: just a fixed list of samples.
All miners need to do is heavily overfit these samples, fine-tuning the model to get exactly these samples right. I am surprised this still exists in the subnet some six months after launch.
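To illustrate why a fixed evaluation list is gameable, here is a toy sketch (my own illustration, not the subnet's actual code; all names and data are hypothetical): a "miner" that simply memorizes the fixed samples scores perfectly without learning anything general.

```python
# Toy illustration (not subnet code): with a fixed eval set,
# a lookup table that memorizes the expected outputs scores perfectly.
FIXED_EVAL = {"hello": "audio_hello", "goodbye": "audio_goodbye"}  # hypothetical

class MemorizingMiner:
    def __init__(self, eval_set):
        self.table = dict(eval_set)  # "overfit": just store the answers

    def synthesize(self, text):
        # Perfect on the fixed list, useless on anything new.
        return self.table.get(text, "garbage")

def score(miner, eval_set):
    hits = sum(miner.synthesize(t) == ref for t, ref in eval_set.items())
    return hits / len(eval_set)

miner = MemorizingMiner(FIXED_EVAL)
print(score(miner, FIXED_EVAL))        # 1.0 on the fixed samples
print(miner.synthesize("new prompt"))  # garbage off the fixed list
```

Rotating or expanding the evaluation set would break this strategy, which is why the fixed list is the core weakness here.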
Judgement
So, let's look at how scores are actually judged:
myshell-test/judge
MyShell uses their own custom judge model for part of the scoring, though it is not clear from the documentation what it actually evaluates.
Consensus alpha
The subnet uses a 'consensus alpha' to keep validators close to each other on vtrust. I'm not sure whether this is a good thing.
While it keeps validators high on vtrust, it goes against the philosophy of Bittensor: validators should form independent opinions, and the network should aggregate them. That is one of Bittensor's key principles. By blending your weights to match everyone else's, strong opinions are diluted.
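To make the concern concrete, here is a minimal sketch of what a 'consensus alpha' blend typically looks like (the function name and exact formula are my assumptions, not taken from the subnet's code): the validator's own scores contribute only a (1 - alpha) fraction, so a high alpha mostly erases independent judgement.

```python
# Hypothetical sketch of a 'consensus alpha' blend; names and formula
# are assumptions for illustration, not the subnet's actual code.
def blend_weights(own_weights, consensus_weights, alpha=0.9):
    """Pull a validator's weights toward the network consensus.

    With alpha near 1, the result mostly tracks consensus; the
    validator's independent opinion contributes only (1 - alpha).
    """
    return [
        alpha * c + (1 - alpha) * w
        for w, c in zip(own_weights, consensus_weights)
    ]

own = [0.8, 0.1, 0.1]        # a validator's independent scores
consensus = [0.3, 0.4, 0.3]  # network-average weights
print(blend_weights(own, consensus, alpha=0.9))
```

With alpha = 0.9, a validator that strongly prefers the first miner (0.8) ends up reporting roughly 0.35 for it: the blend keeps vtrust high precisely by flattening disagreement.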
5/10 from me. Worries about certain aspects, but there are a lot of measures that could make it robust.
The benefit would be more obvious if we could see the output.
Not amazing.
These are some examples of very 'researcher'-esque code, which is quite hard to read:
1.
2.
3.
There are a lot more. It's quite difficult for non-TTS experts to understand the validation mechanism, which is a bad quality for a subnet in my opinion.
I understand the measuring of miners must be specific to TTS, but the code should be clear for validators and miners to understand.
5/10
There is pretty much one miner dominating this subnet, likely uploading many duplicate models to claim all the rewards.
This makes sense in a way, as Bittensor is permissionless, so one participant will usually win.
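A toy sketch of why uploading duplicates pays (my own illustration under assumed reward rules, not the subnet's actual code): if the reward pool is split among the top-scoring entries, one miner holding several copies of the best model captures the whole pool.

```python
# Toy sketch (not subnet code): a winner-takes-most split where the
# pool is divided among all entries tied at the top score.
def split_rewards(scores, pool=1.0):
    top = max(scores.values())
    winners = [uid for uid, s in scores.items() if s == top]
    return {
        uid: (pool / len(winners) if uid in winners else 0.0)
        for uid in scores
    }

# One miner registers three copies of the same winning model.
scores = {
    "miner_a_copy1": 0.91,
    "miner_a_copy2": 0.91,
    "miner_a_copy3": 0.91,
    "miner_b": 0.90,
}
print(split_rewards(scores))  # all three copies share the entire pool
```

Under these assumed rules, duplicate uploads cost the copier nothing and shut everyone else out, which matches the dominance pattern described above.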
What's missing is any historical record of what has happened: who the winners have been, and how the top spot has changed hands. I can't see any indication from their Hugging Face of improvement over time, or any metrics on how competitive the subnet has been.
4/10
From the GitHub:
Roadmap
As building a TTS model is a complex task, we will divide the development into several phases.
Phase 1: Initial release of the subnet, including miner and validator functionality. This phase aims to build a comprehensive pipeline for TTS model training and evaluation. We will begin with a fixed speaker.
Phase 2: Increase the coverage of the speaker and conversation pool. We will recurrently update the speaker and language to cover more scenarios.
Phase 3: More generally, we can have fast-clone models that can be adapted to new speakers with a small amount of data, e.g., OpenVoice. We will move to fast-clone models in this phase.
It seems MyShell has progressed to phase 2 after around six months of development.
Having said that, speaker updates have slowed down massively, and there is less and less active development work.
I think the space this subnet is in has immense potential, but we need some renewed enthusiasm to get there.
4/10
Good contributions to decentralized AI. However, we are only optimizing MeloTTS models at the current time, and only for a select number of speakers.
7.5/10
https://huggingface.co/spaces/myshell-test/tts-subnet-leaderboard
This is great, as it shows the leaderboard and allows people to see how they are doing, and what they need to do to compete.
I would like to see head-to-head comparisons or more information on how the scores are actually calculated, but it's a good start.
4/10
Quite a few worries here, with this subnet biased towards MyShell.
This puts MyShell at a distinct advantage. Coupled with the winner-takes-most mechanism, I would be worried about the decentralization of this subnet.
1/10
Most of the models are based on models developed by MyShell, with just a few added tweaks.
Given that the output is not used anywhere that I can see, I think there is little innovation happening here.
2/10
Very little engagement in the channel. Nearly all of my questions to the subnet 3 team went unanswered, even though the answers would have helped clarify the points above.
Responses are few and far between, with the subnet developer being quite inactive.
Speakers are changed less frequently than promised.
2/10