A Smörgåsbord of Problems

📅 2023-01-20

🏷 Lagrange, Programming

For the past several days, while combating another flu, I've been polishing Lagrange's dev branch for the v1.15 release. Preparing for a release typically involves solving a series of small(ish) problems. Here's a sampling of what I encountered this time.

Operating systems have fundamental differences when it comes to windowing and event processing. I do most of my development on macOS, so a bunch of small issues typically pop up when testing on Windows, Linux, and *BSD.

Flood of mouse motion

PROBLEM: On X11, moving the mouse rapidly while resizing the sidebar causes the entire window to freeze for a few seconds.

SOLUTION: When using a high-resolution mouse with SDL, you receive each and every high-resolution coordinate change as a separate event. This means that you may receive 30 motion events per frame. This is nice and good for a fast-paced game, but for a traditional desktop app it's entirely pointless to handle more than one mouse motion event per frame.

In this case, those 30 motion events each resized the sidebar, rearranged widgets in the window, and reallocated scroll content buffers for both the sidebar and the document area. By the time this was done, the event queue had already grown by several new motion events. The event loop would continue processing motion events without ever breaking for redraw.

The solution was twofold: 1) accumulate received motion events without actually processing them, and 2) ensure that event processing takes a break at least every 16 milliseconds for a ~60 Hz refresh rate. The accumulated mouse motion is then handled right before refresh occurs.
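
A minimal sketch of that two-part approach, not the actual Lagrange code. The `Event` and `FrameState` types and both function names are hypothetical, and the standard `clock()` stands in for a real millisecond timer such as `SDL_GetTicks()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical event type; a real SDL app would work with SDL_Event. */
typedef enum { EV_MOTION, EV_OTHER } EventKind;
typedef struct { EventKind kind; int x, y; } Event;

typedef struct {
    bool hasPendingMotion; /* a motion event is waiting to be handled */
    int  x, y;             /* most recent mouse position */
    int  handledMotions;   /* times the expensive motion handler ran */
    int  handledOthers;    /* other events are handled immediately */
} FrameState;

enum { FRAME_BUDGET_MS = 16 }; /* roughly one frame at 60 Hz */

/* Stand-in clock (assumption): measures CPU time, not wall-clock time. */
static uint32_t elapsedMs(clock_t start) {
    return (uint32_t) ((clock() - start) * 1000 / CLOCKS_PER_SEC);
}

/* Consumes queued events until the queue is empty or the frame budget is
   spent. Motion events are only recorded, never handled here.
   Returns the number of events consumed. */
size_t processEvents(FrameState *d, const Event *queue, size_t count) {
    assert(d != NULL && (queue != NULL || count == 0));
    const clock_t start = clock();
    size_t i = 0;
    while (i < count) {
        const Event *ev = &queue[i++];
        if (ev->kind == EV_MOTION) {
            d->hasPendingMotion = true; /* keep only the latest position */
            d->x = ev->x;
            d->y = ev->y;
        }
        else {
            d->handledOthers++;
        }
        if (elapsedMs(start) >= FRAME_BUDGET_MS) {
            break; /* take a break so the window can be redrawn */
        }
    }
    return i;
}

/* Called right before refresh: handles the accumulated motion once. */
void flushMotion(FrameState *d) {
    if (d->hasPendingMotion) {
        d->hasPendingMotion = false;
        d->handledMotions++; /* the sidebar resize and relayout would go here */
    }
}
```

With this structure, 30 queued motion events result in one call to the expensive handler per frame, using the last reported position.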

Damnable flash of white

PROBLEM: On Windows, when a new window opens (a new browser window or Preferences), there is a brief flash of white before the actual contents are drawn. The issue is mostly hidden by the operating system's window animations where new windows gradually fade in, but if these animations are disabled the problem is very clear.

SOLUTION: This seems to be an SDL issue. I tested it out with a minimal SDL program that just opens an empty window and clears it, and there was still sometimes a flash of white when the window appears.

Win32 windows are created from a "window class" object that determines how its background is cleared. SDL isn't setting up the window class background HBRUSH that the system would fill the background with, because SDL apps are expected to draw the window contents themselves and the operating system should not interfere with that. The WindowProc that SDL uses seems to simply ignore the background clear event. What happens in practice is that occasionally the app doesn't manage to draw the window in time, and the system displays the default empty white background for one frame before the app's graphics are shown. The random nature of the problem suggests that it's some event queue or timing issue: whether the window's initial clear/paint events are handled before the screen gets refreshed.

As a workaround, I made sure that the app's window refresh allows drawing something in the window as early as possible, to minimize the chances of seeing the white flash. If nothing else is available, at least the correct UI background color should be used for clearing the window. The UI palette is customizable, though, and there's a bit of global state involved, so it's a little trickier than it should be.

Crashes due to incorrect global state

PROBLEM: Even though the app works fine on macOS, it crashes on Windows when clicking on a window close button.

SOLUTION: The way events are processed on Windows is sometimes different than on other platforms, so the context in which an event is being handled may be unexpected, or the events may get handled in a different order, or in differently sized batches. The operating system could also be making a callback to the app's code via SDL, like when resizing the window. In these cases, one just has to ensure that the app's own global state reflects what is happening, avoiding a crash.

Of course, minimizing the amount of global state would be wise, but for pragmatic reasons it's nice to know which window is currently the "active" one in the main thread (the only thread that is allowed to touch the UI).
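
One way to keep such state consistent is to set the active window for the duration of an event handler and restore the previous value afterwards. The sketch below is an illustration only: the `Window` type and all function names are hypothetical and not taken from Lagrange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical window type for illustration. */
typedef struct { const char *name; bool closed; } Window;

/* Global "active window" state, only touched from the main (UI) thread. */
static Window *currentWindow_ = NULL;

Window *currentWindow(void) {
    return currentWindow_;
}

/* Changes the active window and returns the previous one. */
Window *setCurrentWindow(Window *w) {
    Window *prev = currentWindow_;
    currentWindow_ = w;
    return prev;
}

/* Runs a handler with `w` as the active window, then restores the old
   value, no matter which window was active when the event arrived. */
void dispatchToWindow(Window *w, void (*handler)(Window *)) {
    Window *prev = setCurrentWindow(w);
    handler(w);
    setCurrentWindow(prev);
}

/* Example handler: relies on the global state matching its target. */
void handleCloseRequest(Window *w) {
    assert(currentWindow() == w);
    w->closed = true;
}
```

A close-button event for the Preferences window would then go through `dispatchToWindow(prefsWindow, handleCloseRequest)`, regardless of the order or batch in which the platform delivers it.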

Dialog dismissal

PROBLEM: Values of settings are not saved when closing the Preferences dialog via the close button in the window frame.

SOLUTION: In the old setup, Preferences was only dismissable by a button that emitted a "prefs.dismiss" action. But when closing a standalone window, all the contained widgets just get destroyed. Instead, one must ensure that the correct dismissal actions get handled, too. For now, there's a simple hardcoded workaround for the window that hosts the "prefs" widget.

Another problem arose, though: the old "prefs.dismiss" button also closes the window, and that in turn triggers another "prefs.dismiss" via the newly added code. Of course, I had an infinite loop going. Applying a flag on the "prefs" widget to prevent repeated triggering of the action was required.
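
The guard flag can be sketched roughly like this. The struct and function names are hypothetical stand-ins for the real widget code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "prefs" widget and its host window. */
typedef struct {
    bool isDismissing; /* guard flag: dismissal is already under way */
    bool windowOpen;
    int  saveCount;    /* how many times the settings were saved */
} PrefsWidget;

void closePrefsWindow(PrefsWidget *d);

/* Handler for the "prefs.dismiss" action. */
void dismissPrefs(PrefsWidget *d) {
    assert(d != NULL);
    if (d->isDismissing) {
        return; /* already being dismissed: breaks the loop */
    }
    d->isDismissing = true; /* never reset; the widget is about to go away */
    d->saveCount++;         /* saving the settings would happen here */
    closePrefsWindow(d);    /* the dismiss button also closes the window */
}

/* Called when the window is closed, including via the frame's close button. */
void closePrefsWindow(PrefsWidget *d) {
    assert(d != NULL);
    d->windowOpen = false;
    dismissPrefs(d); /* make sure the dismissal action is handled, too */
}
```

Without the `isDismissing` check, the two functions would call each other forever. With it, the settings are saved exactly once, whichever way the dialog is closed.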

Color tweaks

PROBLEM: The keybindings scroll bar is invisible in the detached Preferences dialog.

SOLUTION: The scroll bar thumb color has been linked to the current page theme (for stylistic consistency), but that's only valid for a DocumentWidget. All other widgets should use a UI color instead as they are not directly associated with page content. A new property was added to the scroll bar widget to configure the appropriate thumb color.

When I start tweaking colors it's typically not limited to one widget. It has been bugging me that the mouse hover and selection background colors weren't differentiated, leading to some ambiguity. Now there's a slight difference in brightness for these states.

Menubar menus closing instantly

PROBLEM: Hovering over menubar menus, after one has been opened, should switch the open menu to whichever menu is under the mouse cursor. Instead, the open menu closes and nothing else happens.

SOLUTION: When a menu widget closes, it emits a "menu.closed" notification as an event. However, apparently depending on the exact order and batching of these events, this notification was handled too late and it caused the newly opened menubar menu to close itself. (The popup menu widget tends to automatically close itself on most action/notification events, so menus don't accidentally hang around, and get dismissed when an action is triggered from the menu.) Making menu widgets ignore "menu.closed" was enough to fix the problem.

Windows.h, DPI awareness

PROBLEM: The MSYS2 build suddenly starts failing because `SetProcessDPIAware()` is no longer a declared function. (It is a function in the Win32 API.) Maybe triggered by upgrading to SDL 2.26?

SOLUTION: Change the header include order and put `<Windows.h>` first.

I upgraded SDL when investigating the white background flashes. Subsequently, the MSYS2 build started failing because it can't find a Win32 function. It worked just fine before? Since changing the order of includes fixes the problem, apparently the upgraded SDL headers will prevent some of the Win32 headers from being included.

New versions of SDL keep adding useful stuff: 2.24 added a Windows DPI awareness hint, making the direct Win32 API call unnecessary. I can't require 2.24 yet, but using the hint when it's available is nice.

Dessert: Hanging MIME hooks

PROBLEM: With 'text/gemini' MIME hooks that run on every requested Gemini page, refreshing feed subscriptions sometimes leads to the refresh operation never finishing. The feed workers are hanging, forever waiting for a hook process to finish.

SOLUTION: This one is actually a long-standing issue. The real problem seems to be that when the hook child processes are spawned concurrently in background threads, sometimes the pipes used for communicating with the children get mixed up. (I'm not a POSIX expert, so the internal details are a little hazy. I gather forking the process has some interesting side effects when it comes to file descriptors.) This can lead to the spawn call failing at random, or the calling thread waiting indefinitely on a file descriptor without a response.

It is certainly possible that my code is to blame here, so further research is needed. A post on StackOverflow suggests forking the child processes from a single dedicated thread, and not from various worker threads. As a workaround, I've applied mutexes to only ever execute a single hook child process at a given time. Given that Gemini clients don't generally perform multiple requests concurrently (feed refresh being a special use case), this should be adequate for now.
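
The mutex workaround could look roughly like the following POSIX sketch. The function name `runHook` and its signature are assumptions made for illustration, and the real code differs in how it talks to the child process:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Only one hook child process may exist at a time. */
static pthread_mutex_t hookMutex_ = PTHREAD_MUTEX_INITIALIZER;

/* Runs argv[0] with the given arguments and captures up to outSize-1 bytes
   of its standard output into `out`. Returns the child's exit status, or
   -1 if the child could not be started or did not exit normally. */
int runHook(char *const argv[], char *out, size_t outSize) {
    assert(argv != NULL && argv[0] != NULL && out != NULL && outSize > 0);
    int status = -1;
    int fds[2];
    out[0] = '\0';
    pthread_mutex_lock(&hookMutex_); /* serialize pipe + fork + wait */
    if (pipe(fds) == 0) {
        const pid_t pid = fork();
        if (pid == 0) {
            /* Child: only async-signal-safe calls between fork and exec. */
            dup2(fds[1], STDOUT_FILENO);
            close(fds[0]);
            close(fds[1]);
            execv(argv[0], argv);
            _exit(127); /* exec failed */
        }
        close(fds[1]); /* parent keeps only the read end */
        if (pid > 0) {
            char buf[256];
            size_t len = 0;
            ssize_t n;
            /* Read until EOF; output that doesn't fit is discarded. */
            while ((n = read(fds[0], buf, sizeof buf)) > 0) {
                const size_t room = outSize - 1 - len;
                const size_t take = (size_t) n < room ? (size_t) n : room;
                memcpy(out + len, buf, take);
                len += take;
            }
            out[len] = '\0';
            int ws = 0;
            if (waitpid(pid, &ws, 0) == pid && WIFEXITED(ws)) {
                status = WEXITSTATUS(ws);
            }
        }
        close(fds[0]);
    }
    pthread_mutex_unlock(&hookMutex_);
    return status;
}
```

Because the lock is held from `pipe()` until `waitpid()` returns, no other worker thread can fork while this child's pipe descriptors are open. The price is that hooks run strictly one at a time.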

CC-BY-SA 4.0

The original Gemtext version of this page can be accessed with a Gemini client: gemini://skyjake.fi/gemlog/2023-01_problems.gmi