is it in reality not “all” but only “all posts that at least one user of this instance is subscribed to”?
Exactly this, yes. Not literally ‘all’ (a brand new instance would have nothing in its All feed). This is what was meant by ‘partial data set’ - everything for a subscribed community (from the moment it was subscribed to), but nothing for a community that no-one’s subscribed to.
Some instances run bots to populate their All feed beyond what would happen naturally (the idea being that the bot unsubscribes once a human subscribes).
Yeah. There’s no wildcard call. One way to script it would be to pull the JSON files from https://data.lemmyverse.net - use one for the initial effort, then subsequent ones to track new communities. You’d definitely want to filter it - as you’ve noticed, the vast majority of that 30k are dead or spam or something you wouldn’t want for one reason or another (e.g. communities from instances you’ve defederated from).
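For what it's worth, here is a rough sketch of the filter-and-diff step in Python. It assumes you've already downloaded two snapshots and loaded each into a list of dicts; the field names (`name`, `baseurl`, `subscribers`) and the helper names are placeholders I made up for illustration, so check them against whatever the real JSON contains.

```python
# Sketch only: the field names "name", "baseurl" and "subscribers" are
# assumptions, not the confirmed schema of the lemmyverse data files.

def community_key(community: dict) -> str:
    """Build a 'name@instance' identifier for a community record."""
    return f"{community['name']}@{community['baseurl']}"


def new_communities(previous: list, current: list) -> list:
    """Return the records in `current` that were not in `previous`."""
    seen = {community_key(c) for c in previous}
    return [c for c in current if community_key(c) not in seen]


def filter_communities(communities: list, blocked_instances: set,
                       min_subscribers: int = 10) -> list:
    """Drop communities from blocked instances or with too few subscribers."""
    kept = []
    for c in communities:
        if c["baseurl"] in blocked_instances:
            continue  # e.g. an instance you've defederated from
        if c.get("subscribers", 0) < min_subscribers:
            continue  # crude stand-in for "dead or spam"
        kept.append(c)
    return kept
```

You'd run `filter_communities` over the first snapshot for the initial effort, then `new_communities(old, new)` followed by the same filter on later snapshots. The subscriber threshold is just one possible heuristic; you might want to filter on activity instead.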
As for what bots do, it depends on how they were programmed, I suppose. There’s a bonkers one on https://leaf.dance that just seems to crawl comments and subscribe to any ! links (the !community@instance kind) it finds, but there are others (I can’t remember their names) where it’s more of a manual job (the mods of a community submit the details to it).
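To give an idea of the crawling approach, this is a minimal sketch of pulling !community@instance links out of comment text. It's my guess at the general technique, not how that particular bot actually works; the regex and the function name are my own.

```python
import re

# Matches "!name@host.tld" that doesn't directly follow a word character.
# This pattern is an illustrative assumption, not taken from any real bot.
COMMUNITY_LINK = re.compile(
    r"(?<!\w)!([A-Za-z0-9_]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def extract_community_links(text: str) -> list:
    """Return unique 'name@instance' strings in order of first appearance."""
    found = [f"{name}@{host}" for name, host in COMMUNITY_LINK.findall(text)]
    return list(dict.fromkeys(found))  # de-duplicate, keep order
```

A bot built this way would then try to subscribe to each result, which is why it ends up following pretty much anything that gets mentioned.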