• 0 Posts
  • 9 Comments
Joined 3 years ago
cake
Cake day: June 29th, 2023

help-circle

  • If you find that OCR doesn’t get you very far, maybe try a small vLM to parse PNGs of the pages. For example, Nanonets OCR will do this, although quite slow if you don’t have a GPU. It will give you a Markdown version of the page, which you can then translate with another tool.

    PaddleOCR might also be useful, since it focuses on Chinese, but it’s more difficult to set up. To add to this, some other options are MinerU and MistralOCR (this is paid, but you can test it for free if you upload it in Mistral’s library).




  • You’re right! Sorry for the typo. The older nomic-embed-text model is often used in examples, but granite-embedding is a more recent one and smaller for English-only text (30M parameters). If your use case is multi-language, they also offer a bigger one (278M parameters) that can handle English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified). I would test them out a bit to see what works best for you.

    Furthermore, if you’re not dependent on MariaDB for something else in your system, there are also some other vector databases I would recommend. Qdrant also works quite well, and you can integrate it pretty easily in something like LangChain. It really depends on how much you want to push your RAG workflow, but let me know if you have any other questions.



  • For notes, I have moved to Joplin with the option to synchronize my data using a WebDAV server. It works really well, and it has both a mobile and desktop app. If you’re interested in developing your project, maybe you can have a look at the options this provides. For example, I really like the ability to separate notes between groups, assign tags, create drawings, and the possibility to use Markdown.

    Good luck with your projects! To mirror @enemenemu’s suggestion, I would also look into collaborating with the people trying to push the EU Docs alternative. Not sure if that will work, but it’s worth a shot if you’re interested :D


  • Mine’s just one I got from a random kid name generator.

    A bit off-topic: not sure why, but I keep seeing posts here on Lemmy lately about Romanian women pulling the short end of the stick in terms of gender equality. I hope I’m not offending in any way with this question, but is Romania sticking to the traditional gender roles?


  • Might be even cheaper if you wait a bit and build it yourself. Next gen GPUs are coming out, which will lead to some price cuts on the current gen.

    However, like others here have mentioned, you’re paying extra for them building it for you and warranty.

    I don’t know if ro.pcpartpicker.com works well for Romania, but you can also give that a try and see what the individual components would net you on the local market.

    Building the computer yourself along with your kid could also be a nice opportunity to teach him (and maybe yourself, if you’re not that knowledgeable) about the underlying components.