Browsing ZIPs

➤Lagrange v1.4 can access local ZIP archives and browse them as if they were normal directories/files.

I chose to limit this to local ZIP files because there are several issues with doing this over Gemini URLs.

Let's consider a URL like:

gemini://skyjake.fi/media/lagrange_v1.4.gpub/images/v1.4_capcom_split.jpg

If this were a "file" URL, the client would notice that the path doesn't actually exist, and could then check each of the parent directories to see whether one of them is actually a ZIP archive. If so, the archive is opened and the entry is looked up inside it.
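A minimal sketch of that local lookup in Python, using only the standard library. This is an illustration of the idea, not Lagrange's actual code; the function names `find_archive_entry` and `read_entry` are made up for this example.

```python
import zipfile
from pathlib import Path


def find_archive_entry(path):
    """Split a non-existent path into (archive, entry) if a parent is a ZIP file.

    Returns None when the path exists as-is or no parent is a ZIP archive.
    """
    target = Path(path)
    if target.exists():
        return None  # an ordinary file or directory; nothing to look up
    # Walk upwards from the immediate parent towards the filesystem root.
    for parent in target.parents:
        if parent.is_file() and zipfile.is_zipfile(parent):
            # Whatever follows the archive in the path is the entry name.
            entry = target.relative_to(parent).as_posix()
            return parent, entry
    return None


def read_entry(path):
    """Return the bytes of an archive entry addressed by a filesystem-style path."""
    found = find_archive_entry(path)
    if found is None:
        raise FileNotFoundError(path)
    archive, entry = found
    with zipfile.ZipFile(archive) as zf:
        return zf.read(entry)  # raises KeyError if the entry is missing
```

The check is cheap locally because every probe is just a filesystem call, which is exactly what does not carry over to network requests.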

The Gemini server could use the same technique to look up files from inside archives. I'm not aware of servers that actually support this, though.

Update 2021-05-09

Turns out Sean Conner's GLV-1.12556 already supports this kind of request for files inside ZIP archives.

Client-side tricks

Given the lack of server-side support, what could the client do? If the client first navigates to the archive base URL, it would be technically possible to remember that URL and apply a special exception to URLs that access paths inside it. The problem is that nothing can be done if the client doesn't know the base URL, say, if you've bookmarked a file inside an archive or just come across it on some capsule.

The client might try to scan the URL path for file extensions like .zip and .gpub, but this would be a brittle hack. Those are not actually required to be present and the media type sent by the server is the actual type that matters.
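To make the brittleness concrete, here is a rough sketch of such an extension scan. The suffix list and the function name `guess_archive_split` are assumptions made for this example only.

```python
from urllib.parse import urlsplit

# Assumed suffix list; nothing requires an archive URL to carry one of these.
ARCHIVE_SUFFIXES = (".zip", ".gpub")


def guess_archive_split(url):
    """Guess (archive_path, entry) purely from file extensions in the URL path."""
    segments = urlsplit(url).path.split("/")
    # The last segment is skipped: an archive needs something after it.
    for i, segment in enumerate(segments[:-1]):
        if segment.lower().endswith(ARCHIVE_SUFFIXES):
            archive_path = "/".join(segments[: i + 1])
            entry = "/".join(segments[i + 1 :])
            return archive_path, entry
    return None


url = "gemini://skyjake.fi/media/lagrange_v1.4.gpub/images/v1.4_capcom_split.jpg"
print(guess_archive_split(url))
```

The guess fails in both directions: an archive served without an extension is missed, and an ordinary directory that happens to be named something like `old.zip` is misread as an archive.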

If the Gemini specification allowed it, upon receiving a "not found" reply, the client could make additional requests for the parent directories in the URL path until a valid response is received, and then check if that's a ZIP archive. This would have to be done with _every_ invalid URL, though, so it would lead to many unnecessary requests.
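The cost of that probing can be sketched as follows. The network layer is deliberately left out: `fetch` is a hypothetical callable returning a `(status, meta)` pair, and the set of ZIP media types is an assumption for this example. Status 20 is Gemini's success code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed media types that identify a ZIP-based archive.
ZIP_TYPES = {"application/zip", "application/gpub+zip"}


def probe_for_archive(url, fetch):
    """Probe parent paths after a "not found" reply.

    `fetch(url)` is a hypothetical callable returning (status, meta).
    Returns (archive_url, entry, requests_made); the first two are None
    when no ZIP archive is found.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    requests_made = 0
    # Drop one trailing segment at a time, stopping before the root path.
    for cut in range(len(segments) - 1, 1, -1):
        parent_path = "/".join(segments[:cut])
        parent_url = urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
        status, meta = fetch(parent_url)
        requests_made += 1
        if status == 20:
            # First valid response: it is either the archive or a dead end.
            media_type = meta.split(";")[0].strip().lower()
            if media_type in ZIP_TYPES:
                return parent_url, "/".join(segments[cut:]), requests_made
            return None, None, requests_made
    return None, None, requests_made
```

Even in the successful case each path level costs one extra request, and a URL that is simply wrong pays the full price before the client can report the original error.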

Fragments to the rescue

The most reasonable solution I've come up with is using URL fragments to refer to entries inside archives:

gemini://skyjake.fi/media/lagrange_v1.4.gpub#images/v1.4_capcom_split.jpg

Now the client knows to fetch the right file, the server can send it over without any additional trickery, and the client can use the fragment as the archive entry to decompress. As a bonus, the client doesn't have to re-fetch the archive when only the fragment changes, so browsing inside the archive is done with the locally cached copy.
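A small sketch of how a client might handle such URLs, assuming a hypothetical `fetch` callable that returns the archive bytes for a URL. The class name `ArchiveBrowser` and its in-memory cache are illustrative, not how Lagrange implements it.

```python
import io
import zipfile
from urllib.parse import unquote, urldefrag


class ArchiveBrowser:
    """Fetch an archive once and serve entries named by the URL fragment."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: url -> bytes
        self._cache = {}     # archive URL -> raw archive bytes
        self.fetch_count = 0

    def open(self, url):
        # The part before '#' names the archive, the part after it the entry.
        archive_url, fragment = urldefrag(url)
        entry = unquote(fragment)
        if archive_url not in self._cache:
            self._cache[archive_url] = self._fetch(archive_url)
            self.fetch_count += 1
        with zipfile.ZipFile(io.BytesIO(self._cache[archive_url])) as zf:
            return zf.read(entry)  # raises KeyError for an unknown entry
```

Opening two URLs that differ only in the fragment triggers a single fetch; the second entry is read from the cached bytes.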

Of course, there are problems with this:

* Fragments are not (currently) in the Gemini specification, so this would be a client-specific extension, and this URL wouldn't work in any other client. (On the bright side, other clients would fail gracefully, still ending up fetching the right archive.)
* This behavior is not consistent with how fragments are typically used, e.g., referring to positions inside a page. Instead, they would change page content like a query.

Compared to using a query string for the entry path, this is better because query strings are sent to the server and fragments are not. There is no guarantee that a server won't get confused by a query string in this context.
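The difference shows up in what actually goes on the wire. A Gemini request is the URL followed by CRLF; the sketch below assumes a client that strips the fragment before sending, and the helper name `request_line` is hypothetical.

```python
from urllib.parse import urldefrag


def request_line(url):
    """Build the bytes a client would send, assuming fragments stay client-side."""
    without_fragment = urldefrag(url).url
    return (without_fragment + "\r\n").encode("utf-8")


base = "gemini://skyjake.fi/media/lagrange_v1.4.gpub"
# Fragment form: the server only ever sees the archive URL.
print(request_line(base + "#images/v1.4_capcom_split.jpg"))
# Query form: the entry path reaches the server, which may or may not ignore it.
print(request_line(base + "?images/v1.4_capcom_split.jpg"))
```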

Solution?

Lagrange's UX could still be streamlined here by removing the explicit Save to Downloads step and, when the archive contents are opened for viewing, automatically switching to a "file://" URL pointing to a locally cached copy of the archive. This still gets tricky if the user wants to bookmark pages inside the cached URL, since those wouldn't remain valid indefinitely. Bookmarks in the Downloads folder are at least within the user's control.

The best solution is to support ZIP archives server-side. This would solve all the client-side ambiguity, all clients would be automatically compatible, and a server admin might find it convenient to manage content in this kind of packaged form.

Serving Gempub this way is still a little tricky, because the client needs access to the metadata and the index page in addition to the current chapter. Metadata could be passed in via MIME parameters, and I suppose the index page URL could be, too, but it starts getting iffy. The client would also need to make an additional request to fetch the index. A cleaner solution would be to introduce some sort of a container format that combines the metadata, the index, and the requested chapter in the same response, but that increases overhead considerably and seems overly complicated. (And of course, Gempub is already a container format itself.)

📧skyjake

📅 2021-05-08

🏷 Gemini, Lagrange

CC-BY-SA 4.0

➤skyjake's Gemlog

The original Gemtext version of this page can be accessed with a Gemini client: gemini://skyjake.fi/gemlog/2021-05_browsing-zips.gmi