For example, if I (on kbin.run - which is Mbin, but for the purposes of this let’s just assume it’s Kbin) go to a random magazine on kbin.social, I will often see a prompt that the magazine may be incomplete and that I should visit the original instance for all the content.

Why doesn’t the request to that magazine automatically trigger a “pull” from that instance for that magazine, or at least cause it to check if the number of threads is the same (and conditionally pull on that)? I would think by pulling the changes then, magazines would never be out-of-date.

I get that it would put a much heavier load on the servers, but in combination with good caching techniques (maybe setting a time of 1 day or something until the next pull occurs, idk) I feel like that could be mitigated.
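
Roughly what I have in mind, as a minimal sketch rather than anything Kbin or Mbin actually does: `MagazineCache` and `fetch_remote` are made-up names, and the one-day TTL is just the guess from above.

```python
import time
from typing import Callable, Dict, List


class MagazineCache:
    """Pull a remote magazine at most once per TTL window (hypothetical helper)."""

    def __init__(
        self,
        fetch_remote: Callable[[str], List[str]],
        ttl_seconds: float = 24 * 60 * 60,  # assumed default: one day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._threads: Dict[str, List[str]] = {}
        self._last_pull: Dict[str, float] = {}

    def get_threads(self, magazine: str) -> List[str]:
        now = self._clock()
        last = self._last_pull.get(magazine)
        # Only hit the origin instance if we never pulled or the TTL expired.
        if last is None or now - last >= self._ttl:
            try:
                self._threads[magazine] = self._fetch_remote(magazine)
                self._last_pull[magazine] = now
            except OSError:
                # Origin unreachable: keep serving the (possibly stale) local copy.
                # The timestamp is not updated, so the next request tries again.
                pass
        return self._threads.get(magazine, [])
```

With this, a page view inside the TTL window costs nothing extra, and only the first view after expiry triggers a pull.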

Is this maybe an implementation detail of ActivityPub?

Thank you!

  • Skull giver@popplesburger.hilciferous.nl
    7 upvotes · 6 months ago

    ActivityPub is almost exclusively push-based. There are APIs for retrieving content, of course, but those aren’t meant to be the primary method of federation. Kbin can expose a pull API of its own, but other servers that host objects that may be represented as magazines won’t expose that API.

    Fediverse servers sometimes lose connectivity as well, for example when another server is under DDoS attack and the ActivityPub endpoints get shut down. That means the code still needs to be designed to deal with the occasional out-of-dateness.

    With the size of some magazines, the sync process can involve hundreds or thousands of objects every hour. After all, every vote is a federated ActivityPub object. To prevent abuse, any receiving server would also need to verify all of those objects’ signatures (so a server cannot pretend to be super popular as easily). With a couple hundred Kbin servers, that can be quite a big load compared to the push-based system ActivityPub is built around.

    This problem is particularly annoying on Mastodon, where the sync process is almost entirely broken. Very few servers see all reactions under a post, and tools like FediFetcher do exactly what you propose Kbin should do. In my experience this adds quite significant load to Mastodon, because every incoming message needs a bunch of fetches, and the more incoming messages you get, the worse the problem becomes.

    You can probably write a tool to make Kbin sync in the same fashion, but I’m not sure if it’ll be taken up.

    • deafboy@lemmy.world
      4 upvotes · 6 months ago

      The more I learn about ActivityPub, the more I’m convinced it is not ready for production…

      • Skull giver@popplesburger.hilciferous.nl
        1 upvote · 6 months ago

        It’s in production right now and it’s working just fine. Push-based systems are efficient and effective as long as retransmission on failure is taken into account. Some services don’t do that, or at least not enough, or suffer from thundering herd issues when they recover from downtime, but that’s all because of the individual implementations.
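
        To make "retransmission on failure" concrete, here is a minimal sketch of retrying a delivery with exponential backoff and random jitter. The jitter spreads retries out so that recovering servers don't all get hit at the same moment. The names `backoff_delay` and `deliver_with_retry`, and the base, cap and attempt values, are illustrative assumptions and not taken from any particular fediverse server.

```python
import random
import time
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 3600.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, ceiling)


def deliver_with_retry(
    send: Callable[[], bool],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it reports success or max_attempts is reached."""
    for attempt in range(max_attempts):
        if send():
            return True
        # Wait before the next try, but not after the final failure.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return False
```

        Here `send` stands in for whatever actually POSTs the activity to the remote inbox. A real server would usually persist the retry queue instead of sleeping in-process.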

        This problem barely exists in centralised systems such as Twitter or Reddit (it does crop up sometimes), but they don’t tell you that your local cluster may be out of sync. They just pretend you’re up to date and that the comments you didn’t see five minutes ago have always been there.

        Of course, you’re free to use a different protocol for your own fediverse server if you feel ActivityPub isn’t up to the task. There are various federating protocols available, and services sometimes speak multiple protocols. Matrix and OStatus also federate, for instance.

        OStatus has been mostly abandoned, but it worked similarly to ActivityPub, with subscriptions and publications using standard protocols. GNU Social and, I believe, Friendica still use it.

        If you value consistency above all else, you could use Matrix, for example, which does active syncing to ensure data is up to date. However, you’ll need a beefy server if you’re going to keep magazine equivalents up to date for thousands of people. My Synapse server (8GB of RAM, 4 Epyc cores) takes a couple of minutes to join rooms like #matrix on the main server with a few hundred active participants. Granted, Synapse is written in Python, but I doubt the alternatives will ever load as quickly as a magazine does.

        Even protocols like IRC suffer from “net splits” that break up federation between them. NNTP is partially pull-based, so perhaps that’ll suit your needs, but interactions aren’t nearly as fast as ActivityPub in my experience. SMTP is also a federated protocol, but it doesn’t provide pull mechanisms either. You could use a DHT and use BitTorrent-like unified data storage, but that’s slow as hell in comparison. Perhaps a system of linked IPFS addresses could be used as a fast method of pulling in distributed posts, but that’s not exactly a fast protocol either, and it’s not really meant for this stuff.

        There are fast, distributed systems, like databases, but they only work because all servers in a cluster can be trusted. They’re also severely restricted in what directions data can flow and how applications react to data insertions. A database model would be quite trivial to disrupt or DDoS if you use it for the “anyone can federate with anyone else” style federation that ActivityPub is designed for. Blockchains may work for that, but they’re slow, inefficient, and, as a consequence, usually expensive.

        I’m afraid you’ll have to build a protocol from the ground up if you want to enable pulling in magazines, Kbin-style. You could accept ActivityPub submissions and serve them on your own server using your own protocol, of course, but I’m not sure if you’ll find anyone who will implement your protocol if you don’t find a solution to the performance problems that distributed systems need to cope with.

  • Rimu@piefed.social
    6 upvotes · edited · 6 months ago

    There is no good reason for it. It’s just a choice by the coder of Kbin.

    PieFed retrieves the last 50 posts when a Lemmy community is added for the first time. It only takes a few seconds because all 50 posts can be retrieved at once by making a GET request to the community outbox. It doesn’t do this for Kbin magazines because the Kbin developer chose not to make an outbox.
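
    A rough sketch of that outbox step, not PieFed’s actual code: once the outbox JSON has been fetched, the post IDs can be pulled out of its `orderedItems`. The function name `post_ids_from_outbox` is made up, and the assumption that items may be plain URLs, bare objects, or activities wrapping an `object` (for example Announce → Create → Page) should be checked against what a given server really returns.

```python
from typing import Any, Dict, List


def post_ids_from_outbox(outbox: Dict[str, Any], limit: int = 50) -> List[str]:
    """Collect up to `limit` object IDs from an OrderedCollection-like dict."""
    ids: List[str] = []
    for item in outbox.get("orderedItems", []):
        # Unwrap nested activities down to the innermost object.
        # If a wrapped object is only a URL string, that URL is kept as-is.
        while isinstance(item, dict) and "object" in item:
            item = item["object"]
        if isinstance(item, str):
            ids.append(item)
        elif isinstance(item, dict) and isinstance(item.get("id"), str):
            ids.append(item["id"])
        # Items without a usable ID are skipped.
        if len(ids) >= limit:
            break
    return ids
```

    The dict would come from a single GET on the outbox URL with an ActivityPub `Accept` header, which is why the whole backfill only takes one request.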

  • NaN@lemmy.sdf.org
    English · 4 upvotes · edited · 6 months ago

    I don’t think the issue is usually that content is out of date, but rather that history is missing. Generally, ActivityPub instances don’t get historical data; they get data from the point someone subscribes, so you may be missing old threads and comments on your local instance that exist on the hosting instance. If nobody is subscribed, your instance may not be getting new content either.

    That’s one reason there are tools available like bots that “mass subscribe” to content on various instances when you set up a new one, otherwise it will stay pretty empty.