• 0 Posts
  • 84 Comments
Joined 2 years ago
Cake day: June 20th, 2023

  • solrize@lemmy.world to Selfhosted@lemmy.world · Selfhosting wikipedia
    6 hours ago

    I haven’t looked in a few years, but 20 TB is probably plenty. I agree that Wikipedia lost its way once it got all that online attention and search traffic. Everyone should have their own copy of Wikipedia. I used to download the daily incremental data dumps but got tired of it. I still have a few TB of them around that I’ve been meaning to merge.


  • The text comes as not-exactly-convenient database dumps (see the other commenter’s link), and there are daily diffs (mostly bot noise), but then there are the images and other media, which are way up into the terabytes by now. There are some docs, maybe out of date, on how to run the software yourself. It’s written in PHP and it’s big and complicated.
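
    If you just want the text, the dumps can be fetched straight from dumps.wikimedia.org. A minimal sketch, assuming the publicly documented “latest” dump path (check the dumps site for current file names and sizes):

    ```python
    # Sketch: stream the latest English Wikipedia text-only dump to disk.
    # The URL follows the public dumps.wikimedia.org layout; verify the
    # current file name and size at https://dumps.wikimedia.org/enwiki/
    import shutil
    import urllib.request

    DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
                "enwiki-latest-pages-articles.xml.bz2")

    def fetch_dump(dest="enwiki-latest-pages-articles.xml.bz2"):
        # Compressed, this is on the order of 20 GB, so stream it in
        # chunks rather than reading the whole response into memory.
        with urllib.request.urlopen(DUMP_URL) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out, length=1024 * 1024)

    if __name__ == "__main__":
        fetch_dump()
    ```

    That gets you the articles only; the media is a separate and much bigger problem.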


  • I see, fair enough. Replication is never instantaneous, so do you have definite bounds on how much latency you’ll accept? Do you really want several independent git servers online? Most HA systems have a primary and a failover, so users only ever see one server. If you want to use Ceph, in practice all the servers would be in the same DC. Is that OK?

    I think I’d look in one of the many git books out there to see what they say about replication schemes. This sounds like something that must have been done before.


  • Why do you want 5 git servers instead of, say, 2? Are you after something more than high availability? Are you trying to run something like GitHub where some repos might have stupendous concurrent read traffic? What about update traffic?

    What happens if the servers sometimes get out of sync for 0.5 sec or whatever, as long as each is in a consistent state at all times?

    Anyway, my first idea isn’t rsync but rather git update hooks that replicate pushes to the other servers, so updates still look atomic to clients (rough sketch at the end of this comment). Alternatively, use a replicated file system on Ceph or the like, so you can quickly migrate failed servers. That’s a standard cloud hosting setup.

    What real-world workload do you have that appeared suddenly enough that your devs couldn’t stay on top of it, so that you find yourself seeking advice from us relatively clueless dweebs on Lemmy? It’s not a problem most git users deal with. Git is pretty fast, and most users are fine with a single server plus a backup.
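
    To make the update-hook idea concrete, here’s an untested sketch of what hooks/update could look like. Git runs it once per ref with refname, old sha, and new sha as arguments, and a nonzero exit rejects that ref, so a push only succeeds once the mirrors took it too. The mirror names are made up:

    ```python
    #!/usr/bin/env python3
    # Untested sketch of a git update hook (hooks/update) that replicates
    # each ref update to mirror servers before accepting it locally.
    # Git invokes it as: update <refname> <old-sha> <new-sha>
    import subprocess
    import sys

    MIRRORS = ["git@mirror1:repo.git", "git@mirror2:repo.git"]  # made-up names
    ZERO = "0" * 40  # git's sentinel sha for a deleted/created ref

    def main():
        refname, oldrev, newrev = sys.argv[1:4]
        # Deleting a ref pushes an empty source; otherwise push exactly
        # the commit the client just sent us.
        refspec = f":{refname}" if newrev == ZERO else f"{newrev}:{refname}"
        for mirror in MIRRORS:
            result = subprocess.run(["git", "push", "--force", mirror, refspec])
            if result.returncode != 0:
                print(f"replication to {mirror} failed; rejecting update",
                      file=sys.stderr)
                sys.exit(1)  # nonzero exit makes git reject this ref

    if __name__ == "__main__":
        main()
    ```

    A real setup still has to decide what happens when a mirror is down (reject the push, or queue the update?), which is where it stops being a hook problem and becomes the primary/failover question again.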


  • Yeah, a buddy of mine (not rich) has one and seems to like it. It’s the big style, not the Moto Razr style: like two normal-sized smartphones folded together, so when you open it you get a big, roughly square screen about 6 inches on a side, about 2x the area of a normal phone screen. It’s a Samsung, idk what model or what it cost. It looks nice. No idea about fragility. If you have a question I can relay it to him.


  • A dedi will perform a lot better and be more consistent and reliable, and they’re not THAT expensive if you’re making nontrivial use of one. Otherwise maybe you can keep moving around between Contabo products. Keep in mind too that HDD performance will seem a lot better when you’re not sharing the disk with dozens of other users. I have an HDD server and it’s fine for browsing. It might not be great for large, seek-intensive databases, but I’m not currently doing that (crude way to check at the end of this comment).

    Anyway, you can also ask on lowendspirit.com, which is a forum about budget VPSes.
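
    For the seek-intensive point, a crude way to check on your own disk (the test path is made up; use a file much larger than RAM, or the page cache will flatter the random-read numbers):

    ```python
    # Crude sketch: compare sequential vs. random 4 KB reads on a disk.
    import os
    import random
    import time

    TEST_FILE = "/path/to/bigfile"  # made-up path on the disk under test
    CHUNK = 4096
    READS = 1000

    def timed_reads(f, offsets):
        start = time.monotonic()
        for off in offsets:
            f.seek(off)
            f.read(CHUNK)
        return time.monotonic() - start

    with open(TEST_FILE, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sequential = [i * CHUNK for i in range(READS)]
        scattered = [random.randrange(0, size - CHUNK) for _ in range(READS)]
        print(f"sequential: {timed_reads(f, sequential):.3f} s")
        print(f"random:     {timed_reads(f, scattered):.3f} s")
    ```

    On a spinning disk the random case will usually come out an order of magnitude slower; on an SSD the two numbers land much closer together.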