Admiral Patrick
I’m surprisingly level-headed for being a walking knot of anxiety.
Ask me anything.
Special skills include: Knowing all the “na na na nah nah nah na” parts of the Three’s Company theme.
I also develop Tesseract, a UI for Lemmy/Sublinks
Avatar by @SatyrSack@feddit.org
- 17 Posts
- 249 Comments
Admiral Patrick@dubvee.org to Showerthoughts@lemmy.world • Ad companies are the ones destroying civilization • English
678 · 5 days ago

Agree. Which is why I get so irrationally annoyed when I share a good piece of journalism that isn’t catering to ad clicks and the peanut gallery here grabs their torches and pitchforks while shouting “PaYwALL!”, despite my posting a summary in the post body (enough to get the gist, but not the full article, for copyright reasons). It’s one of several reasons I don’t even bother anymore.
Like, good journalism costs money. That money’s gotta come from somewhere if you want good journalists to be able to eat and keep doing what they do.
Admiral Patrick@dubvee.org (OP) to Programmer Humor@programming.dev • In conclusion: Magic DNS • English
1 · 5 days ago

I do!
Kubernetes is a nightmare and overkill for most things we need to run, and Docker Swarm is super easy to set up and maintain.
We only use it for one application, though. The app needs to scale horizontally and scale up and down with demand, so I put together a 6 node swarm cluster just for it. Works great, though the auto scaling required some helper scripting.
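For context, Swarm doesn’t do autoscaling on its own, which is why helper scripting is needed. Here’s a minimal sketch of the decision logic such a helper might use; the service name, thresholds, and replica bounds are my own assumptions, not the author’s actual script:

```shell
#!/bin/sh
# Sketch of an autoscaling helper for Docker Swarm (all values illustrative).
SERVICE="myapp_web"   # assumed service name
MIN=2; MAX=6          # replica bounds

# Decide the next replica count from average CPU percent and current replicas.
desired_replicas() {
  cpu="$1"; current="$2"
  if [ "$cpu" -gt 75 ] && [ "$current" -lt "$MAX" ]; then
    echo $((current + 1))    # scale up under load
  elif [ "$cpu" -lt 25 ] && [ "$current" -gt "$MIN" ]; then
    echo $((current - 1))    # scale down when idle
  else
    echo "$current"          # hold steady
  fi
}

# In a real loop you would sample CPU (e.g. from `docker stats`) and apply:
#   docker service scale "$SERVICE=$(desired_replicas "$cpu" "$current")"
```

The interesting part is the hysteresis band (25–75% here): without a dead zone between the scale-up and scale-down thresholds, the replica count tends to oscillate.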
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • What can you host with limited bandwidth but lots of storage? • English
16 · 10 days ago

1080p buffered generously but it worked :) The sweet spot was having it transcode to 720p (yay hardware acceleration). I wasn’t sharing it with anyone at the time, so it was just me watching at work on one phone while using my second phone at home for internet.
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • What can you host with limited bandwidth but lots of storage? • English
30 · 10 days ago

Just about anything, as long as you don’t need to serve it to hundreds of people simultaneously. Hell, I once hosted Jellyfin over a 3G hotspot and it managed.
Pretty much any web-based app will work fine. Streaming servers (Emby, Plex, Jellyfin, etc.) work fine for a few simultaneous users as long as you’re not trying to push 4K or something. 1080p can work fine at 4 Mbps or less (transcoding is your friend here). Chat servers (Matrix, XMPP, etc.) are also good candidates.
I hosted everything I wanted with 30 Mbps upload before I got symmetric fiber.
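As a rough sanity check on those numbers (my arithmetic, not from the comment): at ~4 Mbps per transcoded 1080p stream, a 30 Mbps uplink leaves room for several viewers plus headroom for everything else.

```shell
# Back-of-the-envelope: how many simultaneous streams fit in an uplink,
# after reserving some headroom for other services? Numbers are illustrative.
streams_that_fit() {
  uplink_mbps="$1"; per_stream_mbps="$2"; headroom_mbps="$3"
  echo $(( (uplink_mbps - headroom_mbps) / per_stream_mbps ))
}

streams_that_fit 30 4 6   # 30 Mbps uplink, 4 Mbps per stream, 6 Mbps reserved
```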
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers • English
16 · 11 days ago

Maybe I should flesh it out into an actual guide. The Nepenthes docs are “meh” at best and completely gloss over integrating it into your stack.
You’ll also need to give it corpus text to generate slop from. I used transcripts from 4 or 5 weird episodes of Voyager (let’s be honest: shit got weird on Voyager lol), mixed with some Jack Handey quotes and a few transcripts of Married…with Children episodes.
https://content.dubvee.org/ is where that bot traffic ends up, if you want to see what I’m feeding them.
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers • English
33 · 12 days ago

Thanks!
There are three main steps involved:
- Set up Nepenthes to receive the traffic
- Perform bot detection on inbound requests (I use a regex list; one is provided below)
- Configure traffic rules in your load balancer / reverse proxy to send the detected bot traffic to Nepenthes instead of the actual backend for the service(s) you run.
Here’s a rough guide I commented a while back: https://dubvee.org/comment/5198738
Here’s the post link at lemmy.world which should have that comment visible: https://lemmy.world/post/40374746
You’ll have to resolve my comment link on your instance since my instance is set to private now, but in case that doesn’t work, here’s the text of it:
So, I set this up recently and agree with all of your points about the actual integration being glossed over.
I already had bot detection set up in my Nginx config, so adding Nepenthes was just a matter of changing the behavior of that. Previously, I had just returned either 404 or 444 to those requests, but now it redirects them to Nepenthes.
Rather than trying to do rewrites and pretend the Nepenthes content is under my app’s URL namespace, I just do a redirect which the bot crawlers tend to follow just fine.
There are several parts to this to keep my config sane; each lives in an include file:

- An include file that looks at the user agent, compares it to a list of bot UA regexes, and sets a variable to either 0 or 1. By itself, that include file doesn’t do anything more than set that variable. This allows me to have it as a global config without having it apply to every virtual host.
- An include file that performs the action if the variable is set to true. This has to be included in the `server` portion of each virtual host where I want the bot traffic to go to Nepenthes. If this isn’t included in a virtual host’s `server` block, then bot traffic is allowed.
- A virtual host where the Nepenthes content is presented. I run a subdomain (`content.mydomain.xyz`). You could also do this as a path off of your protected domain, but this works for me and keeps my already complex config from getting any worse. Plus, it was easier to integrate into my existing bot config. Had I not already had that, I would have run it off of a path (and may go back and do that when I have time to mess with it again).
The `map-bot-user-agents.conf` is included in the `http` section of Nginx and applies to all virtual hosts. You can either include this in the main `nginx.conf` or at the top (above the `server` section) of your individual virtual host config file(s).

The `deny-disallowed.conf` is included individually in each virtual host’s `server` section. Even though the bot detection is global, if the virtual host’s `server` section does not include the action file, then nothing is done.

Files
map-bot-user-agents.conf
Note that I’m treating Google’s crawler the same as an AI bot because…well, it is. They’re abusing their search position by double-dipping on the crawler so you can’t opt out of being crawled for AI training without also preventing it from crawling you for search engine indexing. Depending on your needs, you may need to comment that out. I’ve also commented out the Python requests user agent. And forgive the mess at the bottom of the file. I inherited the seed list of user agents and haven’t cleaned up that massive regex one-liner.
```nginx
# Map bot user agents
#
# Sets the $ua_disallowed variable to 0 or 1 depending on the user agent.
# Non-bot UAs are 0, bots are 1.
map $http_user_agent $ua_disallowed {
    default                     0;
    "~PerplexityBot"            1;
    "~PetalBot"                 1;
    "~applebot"                 1;
    "~compatible; zot"          1;
    "~Meta"                     1;
    "~SurdotlyBot"              1;
    "~zgrab"                    1;
    "~OAI-SearchBot"            1;
    "~Protopage"                1;
    "~Google-Test"              1;
    "~BacklinksExtendedBot"     1;
    "~microsoft-for-startups"   1;
    "~CCBot"                    1;
    "~ClaudeBot"                1;
    "~VelenPublicWebCrawler"    1;
    "~WellKnownBot"             1;
    #"~python-requests"         1;
    "~bitdiscovery"             1;
    "~bingbot"                  1;
    "~SemrushBot"               1;
    "~Bytespider"               1;
    "~AhrefsBot"                1;
    "~AwarioBot"                1;
    #"~Poduptime"               1;
    "~GPTBot"                   1;
    "~DotBot"                   1;
    "~ImagesiftBot"             1;
    "~Amazonbot"                1;
    "~GuzzleHttp"               1;
    "~DataForSeoBot"            1;
    "~StractBot"                1;
    "~Googlebot"                1;
    "~Barkrowler"               1;
    "~SeznamBot"                1;
    "~FriendlyCrawler"          1;
    "~facebookexternalhit"      1;
    "~*(?i)(80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfly|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTTrack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Image.Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErsPRO|^niki-bot|^NimbleCrawler|^Nimbostratus-Bot|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests\/|^QueryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|SEOkicks-Robot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanner|^SpankBot|Spinn4r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^TurnitinBot|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^VoidEYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD)" 1;
}
```

deny-disallowed.conf

```nginx
# Deny disallowed user agents
if ($ua_disallowed) {
    # This redirects them to the Nepenthes domain. So far, pretty much all the
    # bot crawlers have been happy to accept the redirect and crawl the tarpit
    # continuously.
    return 301 https://content.mydomain.xyz/;
}
```
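Before deploying something like this, it’s worth spot-checking a few User-Agent strings against the same kind of patterns the map uses. This little helper is my own sketch (using `grep -E`, not Nginx’s regex engine, and only a handful of representative patterns):

```shell
# Quick local sanity check: does a User-Agent match any of a few
# representative bot patterns from the map? Echoes 1 for bot, 0 otherwise.
is_bot() {
  printf '%s' "$1" \
    | grep -qiE 'GPTBot|ClaudeBot|CCBot|Bytespider|Amazonbot|PerplexityBot' \
    && echo 1 || echo 0
}

is_bot "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
is_bot "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
```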
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers • English
166 · 12 days ago

I was blocking them but decided to shunt their traffic to Nepenthes instead. There’s usually 3-4 different bots thrashing around in there at any given time.
If you have the resources, I highly recommend it.
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • System requirements for a Matrix server? • English
7 · 12 days ago

Most of the requirements are going to be for the database, and that depends on:
- How many active users you expect
- How many large rooms you or your users join
I left many of the large Matrix spaces I was in, and mine is now mostly just 1:1 chats or a group chat with a handful of friends. Given that low-usage case, I can run my server on a Pi 3 with 4 GB of RAM quite comfortably. I don’t do that in practice, but I do have that setup as a backup server - it periodically syncs the database from my main server - and it works fine. The bottleneck there, really, is the SD card storage, since I didn’t want an external SSD hanging off of it.
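For anyone curious what that periodic sync might look like, it can be as simple as a cron job on the backup machine. This is purely illustrative, not the author’s setup: it assumes Synapse on Postgres, key-based SSH between the two hosts, and placeholder host/DB names.

```shell
# Crontab entry on the backup Pi (hypothetical): nightly at 03:15,
# dump the DB from the main server and restore it locally.
#
#   15 3 * * * /usr/local/bin/backup-sync.sh
#
# backup-sync.sh (sketch; "matrix-main" and "synapse" are placeholders):
ssh matrix-main "pg_dump -Fc synapse" > /tmp/synapse.dump
pg_restore --clean --if-exists -d synapse /tmp/synapse.dump
```

Note this restores over the backup’s copy each run, which is fine for a cold standby but not something to point live clients at mid-sync.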
Even when I was active in several large Matrix spaces/rooms, a USFF Optiplex with a quad core i5, 8 GB of RAM, and a 500GB SSD was more than enough to run it comfortably alongside some other services like LibreTranslate.
Admiral Patrick@dubvee.org to Showerthoughts@lemmy.world • Truly identical twins as actors would present really interesting opportunities for a stage play • English
4 · 13 days ago

Orphan Black: Live
Admiral Patrick@dubvee.org to Showerthoughts@lemmy.world • Having the first name of Al must be frustrating as it looks so much like AI. • English
501 · 14 days ago

I prefer sans-serif fonts visually but prefer serif for readability. So I use Atkinson Hyperlegible, which is a mish-mash of both.

And bonus meme:

Admiral Patrick@dubvee.org (OP) to Programmer Humor@programming.dev • In conclusion: Magic DNS • English
1 · 14 days ago

FYI: I moved the allow rule for DNS to the top of the chain, so that should fix problems with DNS providers not being able to reach the authoritative name servers.
Admiral Patrick@dubvee.org (OP) to Programmer Humor@programming.dev • In conclusion: Magic DNS • English
3 · 14 days ago

Ugh. Thanks. It’s quite possible, though maybe just a regional one? I did inadvertently block one of the IPs Let’s Encrypt uses for secondary validation, so this may be another case of that.
I get a shitload of bad traffic from the Southeast Asia region (mostly Philippines/Singapore AWS) and have taken to blanket-blocking their whole routes rather than constantly playing whack-a-mole. Fail2ban only goes so far when you’re handling things case-by-case.
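If you go the blanket-block route, it helps to be able to check whether a given client IP actually falls inside a CIDR before (or after) you drop it. This is a self-contained sketch of my own, not the author’s firewall tooling, and the example range is illustrative:

```shell
# Check whether an IPv4 address is inside a CIDR block.
# Echoes 1 if inside, 0 otherwise. POSIX sh arithmetic only.
in_cidr() {
  ip="$1"; net="${2%/*}"; bits="${2#*/}"

  # Convert a dotted-quad address to a 32-bit integer.
  to_int() {
    old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
  }

  # Network mask for the prefix length, clamped to 32 bits.
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))

  if [ $(( $(to_int "$ip") & mask )) -eq $(( $(to_int "$net") & mask )) ]; then
    echo 1
  else
    echo 0
  fi
}

in_cidr 13.212.4.9 13.212.0.0/15   # example AWS ap-southeast-style range
in_cidr 8.8.8.8    13.212.0.0/15
```

The actual drop rule would then be whatever your firewall uses, e.g. an nftables or iptables rule keyed on that CIDR.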
Here’s the image from the meme from an alternate source:

Admiral Patrick@dubvee.org (OP) to Programmer Humor@programming.dev • In conclusion: Magic DNS • English
3 · 14 days ago

Your IP may unfortunately be inside a CIDR block that largely does nothing but spam my infrastructure with script kiddie tomfoolery. Firewall rules apply to my authoritative DNS servers as well.
Edit: If you would like me to whitelist it, DM me your IP and I’ll add a narrow exception.
Admiral Patrick@dubvee.org to Selfhosted@lemmy.world • What URI paths does lemmy federation use? • English
10 · 20 days ago

Basically the only thing you want to present with a challenge is the paths/virtual hosts for the web frontends.
Anything under `/api/v3/` is the client-to-server API (i.e. how your clients talk to your instance) and needs to be obstruction-free. Otherwise, clients/apps won’t be able to use the API. Same for `/pictrs`, since that proxies through Lemmy and is a de-facto API endpoint (even though it’s a separate component).

Federation traffic also needs to be exempt, but it’s matched not by route but by the HTTP `Accept` request header and the request method.

Looking at the Nginx proxy config, there’s this mapping which tells Nginx how to route inbound requests:
nginx_internal.conf: https://raw.githubusercontent.com/LemmyNet/lemmy-ansible/main/templates/nginx_internal.conf
```nginx
map "$request_method:$http_accept" $proxpass {
    # If no explicit match exists below, send traffic to lemmy-ui
    default "http://lemmy-ui:1234/";

    # GET/HEAD requests that accept ActivityPub or Linked Data JSON should go to lemmy.
    #
    # These requests are used by Mastodon and other fediverse instances to look up
    # profile information, discover site information and so on.
    "~^(?:GET|HEAD):.*?application\/(?:activity|ld)\+json" "http://lemmy:8536/";

    # All non-GET/HEAD requests should go to lemmy.
    #
    # Rather than calling out POST, PUT, DELETE, PATCH, CONNECT and all the verbs manually,
    # we simply negate the GET|HEAD pattern from above and accept all possible $http_accept values
    "~^(?!(GET|HEAD)).*:" "http://lemmy:8536/";
}
```
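To see how that map behaves, here’s a standalone approximation of its matching logic. This is my own sketch, not part of the Lemmy config: `grep -E` has no negative lookahead, so the non-GET/HEAD case is approximated with an inverted match.

```shell
# Approximate the Nginx map: given a method and an Accept header, decide
# which upstream the request would be proxied to.
route() {
  key="$1:$2"   # mirrors the "$request_method:$http_accept" map key
  if printf '%s' "$key" | grep -qE '^(GET|HEAD):.*application/(activity|ld)\+json'; then
    echo "http://lemmy:8536/"      # ActivityPub / LD+JSON lookups
  elif printf '%s' "$key" | grep -qvE '^(GET|HEAD):'; then
    echo "http://lemmy:8536/"      # any non-GET/HEAD verb (POST, PUT, ...)
  else
    echo "http://lemmy-ui:1234/"   # everything else gets the web frontend
  fi
}

route GET  'application/activity+json'
route POST 'application/json'
route GET  'text/html'
```

The takeaway is that a plain browser `GET` lands on lemmy-ui, while federation lookups and all write verbs go straight to the lemmy backend.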
Admiral Patrick@dubvee.org to Ask Lemmy@lemmy.world • To those who are new to this whole fediverse/threadiverse/this thing, how has your experience been? • English
281 · 22 days ago

> I wouldn’t recommend it to anyone in real life. There are parts that are just way too jarring.
Ugh, this. And I hate that it’s like that.
Like, I used to have my instance open for anyone to sign up. My guiding principle was to have a place that wasn’t overrun with [parts that are just way too jarring]. Holy shit was that an impossible goal to pull off alone, so I shuttered it up and now it’s just a private instance / testbed for Tesseract.
My friends knew I was active on Reddit, and that was fine. But I wouldn’t tell them I spend any amount of time here, because what they would see going to almost any random instance will ~~probably~~ definitely not look good on me by association, despite my being nowhere near that.

So if anyone shares this desire, I am open to un-mothballing my instance, rebranding, taking on new admins, and re-opening to users who also want a place like that.
Admiral Patrick@dubvee.org to Ask Lemmy@lemmy.world • "Nyet, nyet... pyay that man his money!" Dear Fediverse-- do we have a 'film clip' sub/community here? (see post for film clip; it's from "Rounders") • English
6 · 25 days ago

I totally get that.
The closest active alternative I can find is !screengrabs@piefed.social but it’s for still images. Maybe if the clip fits the theme there, they’ll allow it?
Admiral Patrick@dubvee.org to Ask Lemmy@lemmy.world • "Nyet, nyet... pyay that man his money!" Dear Fediverse-- do we have a 'film clip' sub/community here? (see post for film clip; it's from "Rounders") • English
14 · 25 days ago

Only one I can find is !movieclips@lemmy.world but it’s 3 years old and has 0 submissions. Maybe you can revive it? Surprisingly, the mod for it is still active on the platform.
Otherwise, “if you build it, they will come”.
Admiral Patrick@dubvee.org to Showerthoughts@lemmy.world • The people like AI because they treat it like a search engine. • English
4 · 25 days ago

Maybe AI should be more like a parent and simply say “I don’t know. Go read a book, find out, and let me know”.
Pretty sure my mom did know the answer but I learned more by reading a book and telling her what I learned.



In my city, they just keep paving over the old asphalt, so the manhole covers are like 6 inches deep in some places. Hitting one of those in my sedan is not pleasant.