2023-10-18
Since the language translation feature is a little underdocumented (as in, not mentioned anywhere in the documentation), I thought it prudent to make an actual post about the details.
LibreTranslate integration
Most people speak only one or two languages, so to make content more broadly available it is important to translate it as well. Given worldwide scale and continual production, really the only feasible solution here is good old machine learning. Nowadays automated translations are pretty okay: at least you'll get the gist of what's being said even if much of the nuance is lost.
I'm running a LibreTranslate instance on a home server, at the same IP address as skyjake.fi. It's behind an Apache reverse proxy, and the access logs do not record IP addresses or any of the content of the (HTTPS) translation requests. While the quality of the translation may not be state-of-the-art, I hope this arrangement is more palatable than relying on the web giants for this service.
This is a bit experimental, though! I have no idea if the server will implode when more requests start coming in. It is also dreadfully slow for longer content. I'll keep an eye on it and see how it goes...
Gemtext lends itself to machine translation pretty well. The markup is simple and newlines generally seem to be preserved in the translation. When it comes to LibreTranslate in particular, I do have to strip the markup before making a request, because it seems some language models interpret markup characters in unexpected ways or just omit them entirely. A bit of improvement is still needed for preformatted blocks because they are translated, too, sometimes with amusing results.
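To make the markup-stripping idea concrete, here is a minimal Python sketch. It is not Lagrange's actual code (Lagrange is not written in Python, and the details of its preprocessing are not described here); the function names split_markup, strip_document, and restore_document are made up for illustration. The sketch assumes the translator preserves newlines, so that line prefixes can be reattached afterwards.

```python
import re

# Gemtext line prefixes, longest first so that "###" is matched before "#".
_PREFIXES = ("###", "##", "#", "* ", ">")


def split_markup(line: str) -> tuple[str, str]:
    """Split one Gemtext line into (markup prefix, translatable text)."""
    if line.startswith("=>"):
        # Link line: "=> URL optional label". Only the label is translatable.
        m = re.match(r"(=>\s*\S+\s*)(.*)", line)
        if m:
            return m.group(1), m.group(2)
        return line, ""
    for prefix in _PREFIXES:
        if line.startswith(prefix):
            text = line[len(prefix):].lstrip()
            # Keep the original spacing after the prefix as part of the markup.
            return line[:len(line) - len(text)], text
    return "", line


def strip_document(source: str) -> tuple[list[str], str]:
    """Return the per-line prefixes and the plain text to send for translation."""
    pairs = [split_markup(line) for line in source.split("\n")]
    prefixes = [prefix for prefix, _ in pairs]
    plain = "\n".join(text for _, text in pairs)
    return prefixes, plain


def restore_document(prefixes: list[str], translated: str) -> str:
    """Reattach prefixes, assuming the translation kept the same line count."""
    lines = translated.split("\n")
    if len(lines) != len(prefixes):
        # Newlines were not preserved; fall back to the unmarked translation.
        return translated
    return "\n".join(prefix + text for prefix, text in zip(prefixes, lines))
```

Only the plain text returned by strip_document would be sent to the translator, so the language model never sees the markup characters. Note that this sketch does nothing special with preformatted blocks; their contents pass through as ordinary text.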
Since those early days, several improvements have been made to the system. It turned out that the server did not implode, although it remains pretty slow in translating longer pieces of text. I've updated the instance a few times and the language models were updated, too, with a substantially larger number of supported languages. I also improved how Lagrange preprocesses the content before translation to avoid issues with markup, and there is an option to omit or include preformatted blocks.
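As a rough illustration of the preformatted-block option, the following Python sketch marks which lines of a Gemtext document should be sent for translation and builds a request for LibreTranslate's /translate endpoint. This is an assumption-laden sketch, not how Lagrange does it: the names mark_translatable, build_request, and translate are invented for this example, and the base URL is a placeholder rather than the real server address.

```python
import json
import urllib.request


def mark_translatable(source: str, include_pre: bool = False) -> list[tuple[bool, str]]:
    """Return (should_translate, line) pairs for a Gemtext document.

    Lines inside preformatted blocks are translated only when include_pre is
    True. The toggle lines themselves (starting with three backticks) are
    never translated.
    """
    marked = []
    in_pre = False
    for line in source.split("\n"):
        if line.startswith("```"):
            in_pre = not in_pre
            marked.append((False, line))
            continue
        marked.append((include_pre or not in_pre, line))
    return marked


def build_request(text: str, target: str, source: str = "auto",
                  base_url: str = "https://libretranslate.example") -> urllib.request.Request:
    """Build a POST request for the /translate endpoint (placeholder host)."""
    payload = json.dumps({
        "q": text,
        "source": source,
        "target": target,
        "format": "text",
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, target: str, **kwargs) -> str:
    """Send the request and return the translated text from the JSON reply."""
    with urllib.request.urlopen(build_request(text, target, **kwargs)) as resp:
        return json.loads(resp.read())["translatedText"]
```

With include_pre left at False, ASCII art and code samples pass through untouched; setting it to True sends them along with everything else, which is occasionally useful when a preformatted block contains ordinary prose.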
So, to be entirely clear, when you use this feature you are "calling home" to skyjake.fi, but I do not keep a record of the requests or monitor them in any other way as they come in. The service is provided as a courtesy to the community, and of course I use it myself to read the Spanish and French posts that occasionally show up in Cosmos.
CC-BY-SA 4.0
The original Gemtext version of this page can be accessed with a Gemini client: gemini://skyjake.fi/gemlog/2023-10_lagrange-libretranslate.gmi