A Redundant Deterministic Verification of Large Language Models
The main product that uses subnet 4 is Sybil, a search engine. Despite popular belief and historical claims, Sybil's search functionality is not powered by subnet 4. The subnet 4 miners potentially power the LLM inference behind Sybil.
The use cases are very obvious for an LLM provider, and I'm glad to see subnet 4 recognizing this and building a product. They have also built an organic inference server in Golang for other validators to use, which is awesome.
6/10
**With Jaro scoring**
The incentive mechanism is very simple - miners are rewarded for their tokens per second on a small LLM, provided they have passed verification.
The last 15 words-per-second (WPS) measurements from each miner are used to calculate scores. The average of these values is then passed through a sigmoid curve to produce the miner's final score.
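That mechanism can be sketched roughly as follows. The sigmoid midpoint and steepness here are my own illustrative assumptions - the review does not state subnet 4's actual constants:

```python
import math

def miner_score(wps_samples, midpoint=50.0, steepness=0.1):
    """Average the miner's last 15 words-per-second samples and map
    the result through a sigmoid. midpoint/steepness are assumed
    values, not subnet 4's real parameters."""
    recent = wps_samples[-15:]  # only the 15 most recent WPS values count
    avg = sum(recent) / len(recent)
    return 1.0 / (1.0 + math.exp(-steepness * (avg - midpoint)))
```

The sigmoid squashes very low speeds toward zero and caps the benefit of extreme speeds, which is presumably why it was chosen over a linear mapping.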
Synthetic queries are generated by taking questions from a fixed dataset and running them through an LLM with a semi-fixed system prompt to create search-style questions. The system prompt has two parts that can be varied, which helps produce varied synthetic queries and is good.
'Organic queries' (queries generated from real usage) are scored on this subnet, but not very much: 1 in every 25 'steps' scores just 5 organic queries, with the other 24 steps scoring synthetic queries. I would recommend making this more proportional - if a validator uses a lot of organic bandwidth, those queries should be scored first.
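That schedule can be sketched as follows; the 1-in-25 ratio and the batch of 5 come from the description above, while the synthetic batch size is not specified, so it is left as an assumed parameter:

```python
def step_workload(step, synthetic_batch=32):
    """Return the query type and batch size for a scoring step.
    One step in every 25 scores 5 organic queries; the other 24
    score synthetic queries. synthetic_batch is an assumption."""
    if step % 25 == 0:
        return ("organic", 5)
    return ("synthetic", synthetic_batch)
```

Seen this way, the imbalance is clear: even a validator saturated with organic traffic still spends 24 of every 25 steps scoring synthetic work.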
The 'quality' of a response is calculated using 'Jaro' similarity, which basically just compares two strings to see how similar they are. The validator runs the LLM for the same query itself and compares its own output to the miner's response.
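Jaro similarity is a standard character-level string metric. A minimal implementation (my own sketch, not subnet 4's code) shows what the comparison actually measures - character overlap and ordering, not meaning:

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]: 1.0 for identical strings,
    0.0 for strings with no matching characters."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters only "match" within half the longer string's length
    window = max(len1, len2) // 2 - 1
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear out of order
    t, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                t += 1
            j += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3
```

Note that two answers can mean the same thing and still score poorly here, since only surface characters are compared.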
This is a bad way to verify an LLM for a few reasons:
- LLM generation is generally not deterministic - with any sampling involved, the same model can produce different, equally valid responses to the same query, so string similarity to the validator's own output is a noisy signal.
- Character-level similarity does not measure correctness: a semantically correct answer phrased differently scores poorly, while a similarly phrased but wrong answer can score well.
- The validator has to run the full inference itself for every scored query, roughly doubling the compute and defeating much of the point of outsourcing inference to miners.
We have historically seen many of these issues play out because of this mechanism.
Another big factor missing from the scoring mechanism is some measure of capacity - how much work a miner can actually handle. Miners receive no incentive for doing 'more' work; instead, the subnet relies solely on miners answering 100% of queries. This will also inevitably lead to vtrust issues, as some validators will use organics more than others.
1/10
**Without Jaro scoring**

After a quick look, subnet 4 has updated their verification to some sort of 'proof of work' using the weight values. Unfortunately, this is very abusable and a very poor verification method. The code is also full of bugs, and validators are unable to verify any requests, so old miners get all the incentives and new miners can't register again.
0/10
Subnet 4 has slightly altered the usual Synapse-Axon protocol to make it more robust across programming languages and compatible with their organic server, which is great. It's also already implemented in Golang, which is good too!
Generally the code is good quality, and I do like what they're doing with their Golang server.
8.5/10
Miner competition here is fierce, which is great.
There are some concerns about miners needing lots of UIDs, but competition here is certainly not stagnant. I think the game should be made fairer and more robust, but it's a good start.
6.5/10
Subnet 4 has shown lots of potential and is actively moving in the right direction. They also claim to be bringing in a new verification mechanism soon, which is sorely needed. The 'Hub' is in its infancy, but again, development is active. I think if the team continues, this subnet could do well.
The current design does not scale well beyond one LLM, which limits real-world usage. The team needs to think about a design that supports more models and more requests before the subnet can reach its bigger potential.
5/10
Absolutely, no doubt about it.
10/10
Dashboards have cropped up over the last month or so, which is great. Miners can get good insight into how they're performing, and the last 2 hours of subnet data can be seen on Targon stats. Data from a small number of validators is available.
There is also the organic server being developed for validators, which is great. I would love to see it become easier to use and adopted by more validators. The Targon website could also offer deeper insights into individual miner comparisons, but it's good.
8/10
In the past, there have been very heavy concerns about Subnet 4's decentralization. The validation mechanism relied on a closed-source API, which the subnet owners maintained. They would then give API access to validators. This was very much not decentralized and had a lot of attack vectors.
I'm glad to see Subnet 4 has moved past this and has fully decentralized their subnet. There aren't any centralized components that I can see, which is great.
9/10
The main innovation from subnet 4 has come from their Golang server and Epistula package. This is great, but there isn't much innovation on the subnet side, as working inference has been in place on multiple other subnets for quite some time. It would be nice to see subnet 4 do things other subnets are not doing (such as the Golang server or support for other languages).
5/10
Sadly, this is the section where I have massive concerns about subnet 4 - specifically, regarding the team.
The Manifold team behind subnet 4 has had lots of controversy in the past and seems not to have moved on from it.
In the past, subnet 4 implemented the closed-source API, maintained by the subnet owners. The logic was that miners could not predict synthetic queries, since only the subnet owner had access. Manifold then proceeded to register a massive number of miners (at least 60) to mine on their own subnet. Since they had access to this closed-source API, they could easily have been cheating. This was pointed out to the team many times, but they did nothing to address it until the heat got too much.
Furthermore, there were constant issues with new miners being unable to register and pass immunity. This happened while Manifold already had a significant number of miners on the subnet - a huge conflict of interest. This issue also went unaddressed for quite some time.
False claims
Manifold has also been called out repeatedly for making false claims. In the past, Sybil was the name for subnet 4, and subnet 4 was claimed to be the 'multimodal' subnet powering Sybil's search. This was proven not to be the case multiple times by Floppy Fish, but Manifold repeatedly failed to explain or clarify the situation.
Recently, a claim was made that subnet 4 could provide over 2,000 tokens per second after Manifold demonstrated the subnet to Const. Const made the claim, and this apparent fact was shared widely. The Manifold team themselves claimed that the top miners were indeed providing inference at these speeds and even shared posts repeating the 2,000+ tokens/s figure. Const later explicitly stated that he had made a miscalculation, but Manifold continued to tout the figure as fact for quite some time afterwards. Please note: I am not criticizing Const's miscalculation, which I believe was an honest mistake.
I find this incredibly distressing as we move to a DAO world. Token holders will have to make decisions about which dynamic tokens to hold. Misinformation and abuse in this world are a very big threat to Bittensor and to DAOs, and could cause a lot of issues - this is a primary concern of mine.
I think the subnet 4 team has a significant amount of progress to make in this department after being regularly called out by other leading validators such as Taostats (Mog Machine) and Datura (Fish).
Engagement with the community is also very poor from the Manifold team - they have even refused to verify these claims unless paid to do so, which goes against the philosophy of community-led ownership of subnets in a DAO world.
-10/10
The team usually ships updates with little notice. They recently started announcing updates a few days before shipping, which is a much better approach. They should follow this for all future releases, giving miners and validators time to digest updates and prepare. Without such notice, subnet owners themselves once again gain an advantage.
5/10
Massively improve the transparency, communication, and community efforts.