Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • llama@lemmy.zip
    English
    arrow-up
    3
    ·
    2 days ago

    But if it’s a public instance and they’re just scraping the public website content, they haven’t agreed to the terms of use, so those terms probably don’t have any teeth? Besides, it’s Meta, so what would one do anyway? Their lawyers will just drain your finances on court fees and continuances.

    • litchralee@sh.itjust.works
      English
      arrow-up
      11
      ·
      edit-2
      2 days ago

      “Trespass to chattels” is a type of lawsuit in Anglo-American law, dating to the somewhat-distant past, that can be raised in response to the abuse of a publicly-accessible computer system. It was originally meant as a remedy for the diminishment of someone else’s property (eg milking their cow). As the modern case law is understood, it allows the owner of a system (eg a Fediverse instance) to recover money due to a tortfeasor’s (eg Meta) conduct that interferes with the normal function of the system. The bar has been raised since the 80s, requiring direct impact to the system, not just that someone accessed the system without explicit authorization. Even outright malice does not suffice, since the test is whether the system was degraded in some way.

      A run-of-the-mill scraper querying once daily wouldn’t meet the test, and something as minimal as an ICMP ping every second wouldn’t meet the test. But AI scraping to the tune of hundreds of queries per day, adding up to double digit percentage points of server bandwidth for a small Fediverse instance, that might.

      That some instance operators have had to consider adding more vCPUs or RAM, or have deployed blockers like Anubis, in response to AI scraping underscores how harmful, and thus potentially legally actionable, that scraping is. It suggests a decent chance that such a lawsuit could be successful.
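
      For an operator who wants to put a number on that kind of impact, here is a minimal Python sketch of the arithmetic, assuming you can already pull (user agent, bytes sent) pairs out of your web server logs. The function name `bandwidth_share`, the `ExampleAIBot` user agent, and the sample figures are all made up for illustration; they are not real traffic data or a real crawler name.

```python
def bandwidth_share(records, agent_substring):
    """Return the fraction of total bytes served to matching user agents.

    records: iterable of (user_agent, bytes_sent) pairs (hypothetical format).
    agent_substring: case-insensitive substring identifying the crawler.
    """
    total = 0
    matched = 0
    needle = agent_substring.lower()
    for user_agent, bytes_sent in records:
        total += bytes_sent
        if needle in user_agent.lower():
            matched += bytes_sent
    # Avoid dividing by zero when there is no traffic at all.
    return matched / total if total else 0.0


# Hypothetical sample, not real traffic data.
sample = [
    ("Mozilla/5.0 (X11; Linux x86_64)", 600_000),
    ("ExampleAIBot/1.0", 300_000),
    ("Mozilla/5.0 (Macintosh)", 100_000),
]

share = bandwidth_share(sample, "exampleaibot")
print(f"{share:.0%} of bytes went to the crawler")
```

      A figure like this is only a starting point: it shows a share of traffic, not that the system was actually degraded, which is what the test above turns on.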

      • AceFuzzLord@lemmy.zip
        arrow-up
        2
        arrow-down
        1
        ·
        2 days ago

        Good luck filing and winning a lawsuit against Meta. They have enough money and influence that if they wanted, they could just send an email to your server hosting service and force them to shut you down. That, or just spend probably less than $100k to keep you in court long enough that you go bankrupt. It’s a losing game… at least until more non far left socialists are running the show around the world.

        • litchralee@sh.itjust.works
          English
          arrow-up
          4
          ·
          edit-2
          2 days ago

          The cynicism surrounding the USA court system is not without cause, but the suggestion to not even bother trying has always rubbed me the wrong way. Firstly, on philosophical grounds, it’s defeatism and on par with appeasement. But secondly, average Americans can and have prevailed when up against a multinational company.

          The one which often comes to mind is the case of a Philadelphia man who won a default judgement against Wells Fargo and was on the cusp of having the local sheriff auction off a branch’s furniture, until they all settled the matter. The man in question wrote about his experience here: https://lawsintexas.com/this-is-how-my-qwr-foreclosed-wells-fargo/

          As for how to sue Meta, the average Joe need not hire a major law firm, but can choose to pursue a limited suit in small claims court. For Meta, which is headquartered in Silicon Valley in California, the Superior Court in Santa Clara County would be the venue. Drawbacks include: having to get to Silicon Valley for court dates, and a total claims limit of $12.5k.

          But on the flip side, the small claims court does not allow lawyers to argue the case before the judge, meaning it’s basically you and Meta’s representative. That representative might still have legal training, but it won’t be a situation like in the 1997 film The Rainmaker where it’s one solo lawyer versus a whole team of lawyers.

          There are also fewer avenues for Meta to inflate costs, such as attempting to pull the case into federal court: diversity jurisdiction isn’t available unless a claim is over $75k. But they can create difficulties through the discovery process and other pre-trial activities.

          Do I think this is viable? Possibly, but it’ll still take a fair amount of effort to have a lawyer work the case prior to trial, even if that lawyer can’t actually do the talking in front of the judge. Easily 5 digit territory to pay your lawyer. But again, this is balanced by Meta having to deal with the nuisance of having someone on their side also put in a similar amount of effort. And when the max cap for small claims is $12.5k, Meta also has enough money to just pay up and then steer their AI scrapers away from your server, saving everyone the bother. See “nuisance value lawsuits”. Also, spiteful lawsuits are a thing.

          After all, it’s not like everyone is going to sue Meta in small claims court, right? Right?