Interested in the intersections between policy, law and technology. Programmer, lawyer, civil servant, orthodox Marxist. Blind.



  • 0 Posts
  • 34 Comments
Joined 1 year ago
Cake day: June 5th, 2023





  • For me the weirdest part of the interview is where he says he doesn’t want to follow anyone, that he wants the algorithm to just pick up on his interests. It’s so diametrically opposed to how I want to intentionally use social networks and how the fedi tends to work that it’s sometimes hard to remember there are people who take that view.




  • Very well-reasoned article, though political constraints may make its recommendations impossible to implement. It is hard to see how the US and EU could make the rhetorical shifts that would take. If events continue as they are now, military realities may preclude it. While a negotiated settlement seems advantageous for all sides at the moment, this will not remain the case forever.


  • I can think of alternatives. For example, the server could keep the user’s private key, encrypted with a passphrase that the user must supply. So key loss wouldn’t be an issue. (Yes, passphrase loss might be, but there are already lots of ways to keep passphrases safe, whereas key material is difficult to handle.)
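A rough sketch of the passphrase idea above, using only the Python standard library. Everything here is an assumption made for illustration: the names `wrap_key` and `unwrap_key`, the iteration count, and the blob layout are hypothetical, and the HMAC-based keystream is a stand-in to keep the example dependency-free. A real server would more likely use a vetted authenticated cipher from an established library.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not a recommendation


def _derive(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # Stretch the passphrase into an encryption key and a MAC key.
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Illustrative keystream: HMAC-SHA256 over nonce || counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap_key(private_key: bytes, passphrase: str) -> dict:
    """Encrypt the private key so the server can store it without reading it."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _derive(passphrase, salt)
    stream = _keystream(enc_key, nonce, len(private_key))
    ciphertext = bytes(a ^ b for a, b in zip(private_key, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def unwrap_key(blob: dict, passphrase: str) -> bytes:
    """Recover the private key; fails if the passphrase is wrong."""
    enc_key, mac_key = _derive(passphrase, blob["salt"])
    expected = hmac.new(mac_key, blob["nonce"] + blob["ciphertext"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong passphrase or corrupted blob")
    stream = _keystream(enc_key, blob["nonce"], len(blob["ciphertext"]))
    return bytes(a ^ b for a, b in zip(blob["ciphertext"], stream))
```

The server would store only the returned blob. The passphrase never needs to leave the user's device if the wrapping and unwrapping are done client-side.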








  • The biggest issues for me are:

    1. No centralisation means there’s no canonical single source of truth.
    2. Account migration.
    3. Implementation compatibility.

    No single source of truth leads to the odd effect that a post viewed on your instance can show different replies from the same post viewed on another instance. Only the instance where it was originally posted will have the complete reply set, and only if no suspensions are involved. Some of this is fixable in principle, but there are technical obstacles.

    Account migration is possible, but migrating posts and follows is non-trivial. Migration between different implementations is also usually not possible. It would be nice if people could keep their instance and their identity distinct, so that the identity could point to their own domain, for example.

    Last, implementation compatibility. Ideally it should be possible to use the same account to access different services, and to some extent this works (a Mastodon account can reply to or upvote Lemmy posts, but not downvote them, for example).
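The reply divergence described above can be sketched with a toy delivery model. It assumes, as a simplification, that a reply is delivered only to the instance hosting the original post, to the replier's own instance, and to instances hosting the replier's followers. The function `deliver_replies` and the `*.example` instance names are hypothetical, and real implementations differ in the details.

```python
from collections import defaultdict


def deliver_replies(origin, replies, follower_instances):
    """Return {instance: set of reply ids visible there}.

    replies: list of (reply_id, author_instance) pairs.
    follower_instances: {author_instance: instances hosting that author's followers}.
    """
    seen = defaultdict(set)
    for reply_id, author_instance in replies:
        # Assumed delivery targets: post origin, replier's home, followers' homes.
        targets = {origin, author_instance}
        targets |= follower_instances.get(author_instance, set())
        for instance in targets:
            seen[instance].add(reply_id)
    return dict(seen)


replies = [("r1", "a.example"), ("r2", "b.example"), ("r3", "c.example")]
follower_instances = {
    "a.example": {"b.example"},
    "b.example": set(),
    "c.example": {"a.example"},
}
views = deliver_replies("origin.example", replies, follower_instances)

for instance in sorted(views):
    print(instance, sorted(views[instance]))
```

In this model only `origin.example` ends up with all three replies. Every other instance sees a different partial subset, which is the effect described above.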



  • Worth considering that this is already the law in the EU. Specifically, Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.

    Article 3 has a very broad exception for scientific research: “Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.” There is no opt-out clause to this.

    Article 4 has a narrower exception for text and data mining in general: “Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.” This one’s narrower because it also provides that, “The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”

    So, effectively, this means scientific research can data mine freely without rightholders being able to opt out, while other uses of data mining, such as commercial applications, are allowed provided the rightholder has not opted out through machine-readable means.



  • > Perhaps the manual reporting tool is enough? Then that content can be forwarded to the central MS service. I wonder if that API can report back to say whether it is positive.

    The problem with a lot of this tooling is that you need some sort of accreditation to use it, because it somewhat relies on security through obscurity. As far as I know you can’t just hit MS’s servers and ask “is this CSAM?” If something like that were possible it might work.

    > Can you elaborate on the hash problem?

    Sure. When you have an image, you can do lots of things to it that change it in some way: change the compression, change the format, crop it, apply a filter… All of this changes the file, and so it changes the hash. The perceptual hash system relies on some computer vision techniques, and the idea is that it will try to generate the same hash for pictures that are substantially the same. But this tech is imperfect and will probably change over time. So if the way the hash gets calculated changes, keeping the hashes wouldn’t be enough: you’d have to keep the original file to recalculate them, which means storing CSAM, which is ordinarily not allowed, and for good reason.

    To give a sense of how bad these hashes can get: they are reversible, vulnerable to pre-image attacks, and so on.

    Some of this is probably inevitable in this type of system. You don’t want to make it easy for someone to hit the servers with a large number of hashes, and then use IPFS or the BitTorrent DHT to retrieve the positives (you’d be helping people get CSAM). The problem is hard.

    > Personally I was thinking of generating a federated set based on user reporting. Perhaps enhanced by checking with the central service as mentioned above. This db can then be synced with trusted instances.

    Something like that could work, maybe obscuring some of the hash content (random parts of it) so that it doesn’t become a way to actually find the stuff.

    Whatever decisions are made have to be well thought through so as not to make the problem worse.
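To illustrate the hash problem discussed above, here is a minimal sketch contrasting an ordinary file hash with a perceptual one. It uses the average hash (aHash), a very simple perceptual hash picked only for illustration; it is not the algorithm used by any real detection service. The 8x8 grayscale grid stands in for a downscaled image and is made-up test data.

```python
import hashlib


def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def file_hash(pixels):
    """Ordinary cryptographic hash over the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Synthetic 8x8 grayscale "image" (values 0-199) and two variants of it.
original = [[(x * 16 + y * 8) % 200 for x in range(8)] for y in range(8)]
brighter = [[p + 10 for p in row] for row in original]   # mild filter
inverted = [[255 - p for p in row] for row in original]  # very different picture

print("file hash equal after brightening:", file_hash(original) == file_hash(brighter))
print("perceptual distance after brightening:",
      hamming(average_hash(original), average_hash(brighter)))
print("perceptual distance after inverting:",
      hamming(average_hash(original), average_hash(inverted)))
```

The brightened copy gets a completely different SHA-256 digest but an identical average hash, while the inverted copy ends up far away. A federated set would have to store and compare hashes of the perceptual kind, which is also why a change in the hashing algorithm would invalidate the stored set.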