Improving Threads in Cosmos

With more data coming in, some problematic cases have started surfacing. I've now improved the thread discovery and presentation so that more relevant posts are being found, and some Flounder and Smol Pub mirrors are no longer showing up twice.

This newly-surfaced thread on 2022-01-17 is a great example of what Cosmos is good at:

Billsmugs' Gemlog — Re: AntennaZINE
⤷ text.eapl.mx — Introducing AntenaZINE
⤷ ew — Newspaper
⤷ text.eapl.mx — Quick experiment: Antenna to ePub
⤷ Solderpunk versus the windmills: a Gemlog — Progress toward "offline first"
⤷ Solderpunk versus the windmills: a Gemlog — Computing less, but with more focus

Billsmugs' post triggered this to come up after I fixed a bug where threads weren't discovering posts older than a month. The main index goes back a month, but that shouldn't limit the ability to surface older posts.

Note that I've also changed the sort order within threads. They are now shown with newest reply first, i.e., the first line of the group is the "current" post and it is followed by the related older items. This means the curved arrow ⤷ actually makes sense as it shows which post links where, and the entire index is now reverse chronological. Comment posts are always immediately below their parents (indicated by ⋯).
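As a rough illustration of that ordering (not the actual Cosmos code), here is a minimal Python sketch. The `Post` fields and the `format_thread` name are made up for the example, and comment posts are left out for brevity:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    source: str   # hypothetical: name of the gemlog
    title: str
    posted: date


def format_thread(posts):
    """Return the display lines for one thread, newest post first."""
    ordered = sorted(posts, key=lambda p: p.posted, reverse=True)
    lines = []
    for i, post in enumerate(ordered):
        # Only the older, related posts get the curved arrow (U+2937).
        prefix = "" if i == 0 else "\u2937 "
        # U+2014 separates the gemlog name from the post title.
        lines.append(f"{prefix}{post.source} \u2014 {post.title}")
    return lines
```

Because every thread starts with its newest post, sorting the threads themselves by that first post keeps the whole index reverse chronological.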

With threads now being more fully populated, I also noticed that there were some circular relationships and, in some cases, long chains of references throughout an entire gemlog. Parent post detection is simple, so it doesn't consider longer circular linkages, but the cycles can instead be broken when collecting the thread posts for display. The long chains were dealt with by limiting how many posts a thread will display from the same author.
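Both fixes can be sketched in a few lines of Python. This is only an assumption about how such a collection step might look, not Cosmos's real implementation: `parents` and `author_of` are hypothetical lookup tables, and the limit of two posts per author is an invented value.

```python
from collections import Counter, deque


def collect_thread(start, parents, author_of, max_per_author=2):
    """Collect posts reachable from `start` via parent links.

    parents:   dict mapping a post to the posts it links to (hypothetical)
    author_of: dict mapping a post to its author (hypothetical)
    """
    seen = {start}
    shown_per_author = Counter()
    thread = []
    queue = deque([start])
    while queue:
        post = queue.popleft()
        author = author_of[post]
        # Cap how many posts from one author are displayed (assumed limit).
        if shown_per_author[author] < max_per_author:
            shown_per_author[author] += 1
            thread.append(post)
        for parent in parents.get(post, ()):
            # Never queue a post twice: this is what breaks circular references.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return thread
```

Posts over the per-author limit are hidden, but their links are still followed, so a long chain inside one gemlog cannot cut the thread off from other authors further down.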

The algorithm is getting a little messy, but I suppose that reflects the messiness in the data. 😃

The next issue to think about is how to handle dynamic pages like journals and tinylogs whose content changes over time. Currently, if there is a significant change in the entry timestamp (presumably after it has been resubmitted to an aggregator), the entry's links are rechecked and new parents are assigned. But if there are other posts linking to the dynamic page, those links may no longer be relevant. I should probably prevent dynamic pages from being link targets, or at least expire such references after a while.
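One possible shape for those two checks, as a hedged Python sketch. Both thresholds and both function names are assumptions made for illustration; nothing here is taken from Cosmos itself.

```python
from datetime import datetime, timedelta

# Assumed values: what counts as a "significant" timestamp change,
# and how long a link to a dynamic page stays relevant.
RECHECK_THRESHOLD = timedelta(hours=1)
DYNAMIC_LINK_MAX_AGE = timedelta(days=30)


def needs_recheck(stored, submitted):
    """True if the entry timestamp moved enough to recheck its links."""
    return abs(submitted - stored) >= RECHECK_THRESHOLD


def link_is_current(target_is_dynamic, linked_at, now):
    """True if a link may still be used to attach a post to its target."""
    if not target_is_dynamic:
        return True  # links to static posts never expire
    return now - linked_at <= DYNAMIC_LINK_MAX_AGE
```

Setting `DYNAMIC_LINK_MAX_AGE` to `timedelta(0)` would amount to the stricter option of never treating dynamic pages as link targets.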

📅 2022-01-18

🏷 Cosmos, Gemini

CC-BY-SA 4.0

The original Gemtext version of this page can be accessed with a Gemini client: gemini://skyjake.fi/gemlog/2022-01_improving-threads-in-cosmos.gmi