cross-posted from: https://slrpnk.net/post/24512484
(The pic is sample output for an arbitrary query on “vegan vege pesc”. Irrelevant side note: there is no free-world venue for pescatarians… just one in L/W that scrolled off the screen)
CF
The federation is not wholly decentralised, obviously, when giant centralised fiefdoms like Facebook “Threads”™ and Cloudflare hook in their technofeudal variety of oppressive infra and abuse their power.
Each post submission begins with finding a relevant venue for the content and it must be consistent with my sense of ethics. Cloudflare is automatically nixed because it’s inherently centralised in a walled garden (regardless of the user count for any given node). CF is a non-starter for an open, free, and fair society (fair implying power balance, equality, transparency, etc).
My script queries the catalog of communities for relevant venues. It still prints the Cloudflare walled garden because it’s useful to see what names match my regex queries, which sometimes helps form a better query. It’s the only thing #LemmyWorld is good for (a shit-ton of community names with redundant variations of the same subject matter). Those results are in red, tagged with a thundercloud (🌩 ), and printed first (because when they scroll off the terminal I don’t typically care to scroll up to see them).
non-CF
CF is not the only issue. Some non-CF nodes are centralised due to uncontrolled growth to disproportionately large sizes. I don’t cancel them hard-and-fast like CF nodes, but they get treated with low “last resort” favorability. They have the warning symbol (⚠) and are in yellow.
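A minimal sketch of the tagging rules described so far, not the actual script. The function name node_prefix, the ANSI colour codes, and passing the size cutoff in as a parameter are all assumptions made for illustration.

```shell
# Colour codes (assumed values; any terminal colours would do).
red=$(printf '\033[31m')
yellow=$(printf '\033[33m')
cyan=$(printf '\033[36m')
reset=$(printf '\033[0m')

# node_prefix TAGS ACTIVE_USERS CUTOFF   (hypothetical helper)
# Prints the colour and marker to put in front of a node name:
#   red + thundercloud for Cloudflare nodes,
#   yellow + warning sign for nodes above the cutoff,
#   plain cyan otherwise.
node_prefix() {
  case $1 in
    *cloudflare*) printf '%s🌩 ' "$red" ;;
    *)
      if [ "$2" -gt "$3" ]; then
        printf '%s⚠ ' "$yellow"
      else
        printf '%s' "$cyan"
      fi
      ;;
  esac
}

# Example: a node with no tags and 4201 active users, against a cutoff of 1724.
printf '%slemmy.ml%s\n' "$(node_prefix '[]' 4201 1724)" "$reset"
```

Checking the Cloudflare tag first means a CF node is always red, no matter how small it is.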
Is my math decent?
My script began by filtering on total user count. Then I realised dead or dormant users probably should not count because such users don’t really contribute to a node’s disproportionate power over a population. It’s active users that matter. But filtering on the number of users active in a day is too dynamic for deciding where my post can live for a month or however long it is relevant. So I took the users_active_half_year count. Is that sensible? What constitutes an “active” user: simply logging in, or commenting?
The line is drawn at 2 standard deviations above the average, after tossing outliers. Nodes with fewer than 5 active users in ½ a year are likely 1-person nodes, so they are left out and do not influence the average. The average is around 320 active ½yr users per node. The standard deviation is ~702 users. My statistical competence is rusty for sure, but I’m a bit bothered by a standard deviation that’s more than double the mean. Seems like a variation so wild it should perhaps be disregarded. Nonetheless, I opted to flag nodes that exceed ~1724 users_active_half_year (320 + 2 × 702). The pseudocode looks like this:
    # Mean of half-year active users over non-Cloudflare nodes with more than 4 active users.
    avg=$(sqlite3 "$db" "
      select round(avg([counts.users_active_half_year]))
      from node_tbl
      where tags not like '%cloudflare%'
        and [counts.users_active_half_year] > 4")

    # Population variance over the same set of nodes.
    variance=$(sqlite3 "$db" "
      select avg(([counts.users_active_half_year] - subtbl.aua)
               * ([counts.users_active_half_year] - subtbl.aua)) as var
      from node_tbl,
           (select avg([counts.users_active_half_year]) as aua
            from node_tbl
            where tags not like '%cloudflare%'
              and [counts.users_active_half_year] > 4) as subtbl
      where tags not like '%cloudflare%'
        and [counts.users_active_half_year] > 4;")

    # Matching communities on non-Cloudflare nodes; nodes above avg + 2 standard deviations get the yellow ⚠.
    sqlite3 "$db" "
      select case when baseurl in (select baseurl from node_tbl
                                   where [counts.users_active_half_year] > $avg+sqrt($variance)*2)
                  then '$yellow⚠' else '$cyan' end||baseurl||'$reset', name
      from community_tbl
      where (name like '%${1}%' or desc like '%${1}%')
        and baseurl not in (select baseurl from node_tbl where tags like '%cloudflare%')
      order by baseurl, name"
Code is ugly because sqlite does not have a stdev builtin function.
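Since sqlite has no stdev function, the same figures could also be computed outside SQL. A minimal sketch, assuming the users_active_half_year counts of the non-Cloudflare nodes have already been dumped one per line; flag_threshold is a hypothetical name. It uses the population variance, like the avg((x - mean)²) in the SQL, in the algebraically equivalent form mean(x²) - mean².

```shell
# flag_threshold (hypothetical helper)
# Reads one active-user count per line on stdin, ignores counts below 5,
# and prints "mean stddev cutoff", where cutoff = mean + 2 * stddev.
flag_threshold() {
  awk '
    $1 > 4 { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) exit 1                  # nothing left after filtering
      mean = sum / n
      var = sumsq / n - mean * mean       # population variance
      if (var < 0) var = 0                # guard against rounding error
      sd = sqrt(var)
      printf "%.0f %.0f %.0f\n", mean, sd, mean + 2 * sd
    }'
}

# Toy input: the 2 is dropped, leaving 10, 20 and 30.
printf '%s\n' 2 10 20 30 | flag_threshold   # prints: 20 8 36
```

With the figures quoted above, the cutoff works out as 320 + 2 × 702 = 1724.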
My other thought is to cut slack for closed nodes because at least they are expected to shrink. To list the possible figures to filter on, this is a record for lemmy.ml (the biggest non-Cloudflare node):
record for lemmy.ml
    url = https://lemmy.ml/
    baseurl = lemmy.ml
    name = Lemmy
    desc = A community of privacy and FOSS enthusiasts, run by Lemmy’s developers
    downvotes = 1
    nsfw = 1
    create_admin = 0
    private = 0
    fed = 1
    version = 0.19.12
    open = 1
    usage.users.total = 54790
    usage.users.activeHalfyear = 4201
    usage.users.activeMonth = 2125
    usage.localPosts = 167331
    usage.localComments = 818559
    counts.site_id = 1
    counts.users = 54790
    counts.posts = 167331
    counts.comments = 818559
    counts.communities = 4608
    counts.users_active_day = 947
    counts.users_active_week = 1496
    counts.users_active_month = 2125
    counts.users_active_half_year = 4201
    icon = https://lemmy.ml/pictrs/image/fa6d9660-4f1f-4e90-ac73-b897216db6f3.png
    banner =
    langs = ["all"]
    date = 2019-04-20T18:53:54.608882Z
    published = 1555786434000
    time = 1751974533970
    score =
    uptime.domain = lemmy.ml
    uptime.latency = 0.034
    uptime.countryname = France
    uptime.uptime_alltime = 99.04
    uptime.date_created =
    uptime.date_updated = 2021-10-29 15:09:21
    uptime.date_laststats = 2025-04-11 21:03:25
    uptime.score = 100
    uptime.status = 1
    isSuspicious = 0
    metrics.usersTotal = 54790
    metrics.usersMonth = 2125
    metrics.usersWeek = 1496
    metrics.totalActivity = 985890
    metrics.localPosts = 167331
    metrics.localComments = 818559
    metrics.averageUsers = 50720.8825256975
    metrics.biggestJump = 225
    metrics.averagePerMinute = 0.02475
    metrics.userActivityScore = 0.055574151274483
    metrics.activityUserScore = 17.9939770031028
    metrics.userActiveMonthScore = 25.7835294117647
    tags = []
    susReason = []
    trust.lastCrawled = 1751974533970
    trust.baseurl = lemmy.ml
    trust.metrics.usersTotal = 54790
    trust.metrics.usersMonth = 2125
    trust.metrics.usersWeek = 1496
    trust.metrics.totalActivity = 985890
    trust.metrics.localPosts = 167331
    trust.metrics.localComments = 818559
    trust.metrics.averageUsers = 50720.8825256975
    trust.metrics.biggestJump = 225
    trust.metrics.averagePerMinute = 0.02475
    trust.metrics.userActivityScore = 0.055574151274483
    trust.metrics.activityUserScore = 17.9939770031028
    trust.metrics.userActiveMonthScore = 25.7835294117647
    trust.users = 54790
    trust.name = Lemmy
    trust.base = lemmy.ml
    trust.actor_id = https://lemmy.ml/
    trust.tags = []
    trust.guarantor = fediseer.com
    trust.endorsements = 17
    trust.score = 598.1875
    trust.reasons = []
    blocks.incoming = 0
    blocks.outgoing = 0
    blocked = []
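To compare the candidate figures at a glance, a record in this key = value layout can be reduced to the active-user counts and their share of total users. This is a rough sketch, assuming one pair per line as in sqlite3's line output mode. active_share is a hypothetical helper name, and the percentage column is only an illustration, not something the script computes.

```shell
# active_share (hypothetical helper)
# Reads "key = value" lines on stdin and prints each counts.users_active_*
# figure together with its percentage of counts.users.
active_share() {
  awk -F ' = ' '
    { key = $1; sub(/^[ \t]+/, "", key) }          # tolerate indented keys
    key == "counts.users" { total = $2 }
    key ~ /^counts\.users_active_/ { name[++n] = key; val[n] = $2 }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %d %.1f%%\n", name[i], val[i], (total ? 100 * val[i] / total : 0)
    }'
}

# Example with two lines taken from the lemmy.ml record.
printf '%s\n' 'counts.users = 54790' 'counts.users_active_half_year = 4201' | active_share
# prints: counts.users_active_half_year 4201 7.7%
```

For the full lemmy.ml record this gives roughly 1.7% daily, 2.7% weekly, 3.9% monthly and 7.7% half-yearly active users out of 54790.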
Some communities missing from the Lemmyverse DB - why?
Anyone know why some slrpnk.net communities are in the Lemmyverse DB, and some are not? E.g. why is !nolawns@slrpnk.net missing, despite many others from the same node that are included?
More importantly, what’s the fix apart from crawling all the nodes (which would probably be unwelcome)? Is there another open DB apart from Lemmyverse? There is fediverse.space and fediverse.observer, but they don’t appear to be sharing their data.