Pimp your catgirl(1)
⣿⡟⠙⠛⠋⠩⠭⣉⡛⢛⠫⠭⠄⠒⠄⠄⠄⠈⠉⠛⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿ ⣿⡇⠄⠄⠄⠄⣠⠖⠋⣀⡤⠄⠒⠄⠄⠄⠄⠄⠄⠄⠄⠄⣈⡭⠭⠄⠄⠄⠉⠙ ⣿⡇⠄⠄⢀⣞⣡⠴⠚⠁⠄⠄⢀⠠⠄⠄⠄⠄⠄⠄⠄⠉⠄⠄⠄⠄⠄⠄⠄⠄ ⣿⡇⠄⡴⠁⡜⣵⢗⢀⠄⢠⡔⠁⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄ ⣿⡇⡜⠄⡜⠄⠄⠄⠉⣠⠋⠠⠄⢀⡄⠄⠄⣠⣆⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⢸ ⣿⠸⠄⡼⠄⠄⠄⠄⢰⠁⠄⠄⠄⠈⣀⣠⣬⣭⣛⠄⠁⠄⡄⠄⠄⠄⠄⠄⢀⣿ ⣏⠄⢀⠁⠄⠄⠄⠄⠇⢀⣠⣴⣶⣿⣿⣿⣿⣿⣿⡇⠄⠄⡇⠄⠄⠄⠄⢀⣾⣿ ⣿⣸⠈⠄⠄⠰⠾⠴⢾⣻⣿⣿⣿⣿⣿⣿⣿⣿⣿⢁⣾⢀⠁⠄⠄⠄⢠⢸⣿⣿ ⣿⣿⣆⠄⠆⠄⣦⣶⣦⣌⣿⣿⣿⣿⣷⣋⣀⣈⠙⠛⡛⠌⠄⠄⠄⠄⢸⢸⣿⣿ ⣿⣿⣿⠄⠄⠄⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠇⠈⠄⠄⠄⠄⠄⠈⢸⣿⣿ ⣿⣿⣿⠄⠄⠄⠘⣿⣿⣿⡆⢀⣈⣉⢉⣿⣿⣯⣄⡄⠄⠄⠄⠄⠄⠄⠄⠈⣿⣿ ⣿⣿⡟⡜⠄⠄⠄⠄⠙⠿⣿⣧⣽⣍⣾⣿⠿⠛⠁⠄⠄⠄⠄⠄⠄⠄⠄⠃⢿⣿ ⣿⡿⠰⠄⠄⠄⠄⠄⠄⠄⠄⠈⠉⠩⠔⠒⠉⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄⠐⠘⣿ ⣿⠃⠃⠄⠄⠄⠄⠄⠄⣀⢀⠄⠄⡀⡀⢀⣤⣴⣤⣤⣀⣀⠄⠄⠄⠄⠄⠄⠁⢹
So you want to use (or go back to using) IRC after testing the waters via your favorite website's KiwiIRC or TheLounge webclient. You've lurked in the main chat channel long enough for people to mention "clients". Actual native clients - none of that web stuff.
You scour the web until you find a minimalist terminal client which is 98% of what you want. It's easy to configure. It works. It looks fine in your terminal emulator. You're mostly happy with it - but you have a few things you'd like to fix. You want that last 2%. You want to make it comfy. But it's written in C, and you haven't touched that for a long time - so you hold off on that plan for a while.
Eventually, the missing 2% irks you enough that you decide the time investment is worth it. The quest for a comfy catgirl(1) is the subject of today's devlog.
Note: If you are interested in the finished product, deb packages are provided here
Setting up the git(1) Repository
This will be a "soft fork." I expect to rebase my changes onto upstream periodically, or possibly even push some of them back. Therefore, adding the upstream remote with git remote add ... is a good start.
Ensuring Everything Really Works
catgirl(1) is a C program. I've been using it for months without any crashes or segfaults, so it might be safe to assume that most of the common code paths do not have egregious memory access bugs.
However, one can't be certain unless some form of allocator or memory-access instrumentation is used. Before performing more drastic modifications to the code, it's worth inspecting it for easily-solvable memory bugs.
valgrind(1) is often used for this; its Memcheck tool intercepts libc's malloc() and free() to catch leaks and use-after-free bugs in heap allocations. However, newer versions of GCC and Clang offer a more broadly effective solution: AddressSanitizer (ASan). ASan instruments memory access operations at compile time, inserting checks for any object access, regardless of whether the memory is stack- or heap-allocated.
Since ASan is included with gcc, adding -g -fsanitize=address to CFLAGS when make(1)-ing the project is all that's needed to produce an instrumented binary.
Does running the program under ASan find anything? Yes! Switching buffers terminates the program, with ASan barfing out a trace complaining about a use-after-free. The trace is straightforward and pinpoints the exact location of the initial free() and the subsequent access. This made for an easy fix.
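To illustrate the class of bug, here is a minimal sketch of a window list where closing a window has to repoint a "current" pointer before the memory is freed. This is not catgirl's actual code or the actual fix: the struct Window and struct Windows types and the window_open() and window_close() helpers are hypothetical names made up for this example.

```c
#include <stdlib.h>

// Hypothetical types; catgirl's real window structures differ.
struct Window {
    int id;
    struct Window *prev, *next;
};

struct Windows {
    struct Window *head;     // most recently opened window
    struct Window *current;  // window currently shown
};

// Allocate a window and push it onto the front of the list.
struct Window *window_open(struct Windows *ws, int id) {
    struct Window *w = calloc(1, sizeof(*w));
    if (!w) abort();
    w->id = id;
    w->next = ws->head;
    if (ws->head) ws->head->prev = w;
    ws->head = w;
    if (!ws->current) ws->current = w;
    return w;
}

// Unlink and free a window.
void window_close(struct Windows *ws, struct Window *w) {
    // Repoint `current` BEFORE freeing. Leaving it dangling is the kind of
    // mistake ASan reports as heap-use-after-free on the next switch.
    if (ws->current == w) ws->current = w->next ? w->next : w->prev;
    if (w->prev) w->prev->next = w->next; else ws->head = w->next;
    if (w->next) w->next->prev = w->prev;
    free(w);
}
```

If the first line of window_close() were missing, the program might keep working for months without ASan, since the freed memory often still holds plausible data.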
Adding Custom Macros
catgirl supports simple macro substitution by typing \macro and pressing C-x. While this is fine, there are two issues with this approach:
- catgirl uses a hardcoded macro table (Fine if you're happy with the predefined macro set, but we want more bling 💎✨)
- The backslash is actually an allowed character in IRC nicknames (See the <nick> rule in the RFC1459 pseudo-BNF)
To address the first issue, I implemented a mechanism for loading macros from a simple two-column <macro> <substitution> configuration file.
My initial version of this was overly complicated and attempted to parse the file by hand via character comparisons and iswspace(). While wading through manpages in search of a better solution, I discovered scanf() scansets. Neat! Not only do these functions have support for scanning wide character strings, but they allow encoding some of the scanning constraints within the format string via scansets.
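As a rough sketch of the scanset approach, the function below splits one line of such a file into a macro name and its substitution using swscanf(). The function name parse_macro_line(), the buffer sizes, and the exact format string are assumptions for illustration, not the code from my branch.

```c
#include <stdbool.h>
#include <wchar.h>

// Illustrative limits; the widths in the format string below must stay
// one less than these so there is room for the terminating L'\0'.
enum { NameCap = 64, SubstCap = 256 };

// Parse "<macro> <substitution>" from a single wide-character line.
// Returns true only if both columns were found.
bool parse_macro_line(
    const wchar_t *line, wchar_t name[NameCap], wchar_t subst[SubstCap]
) {
    // " "           skips any leading whitespace
    // %63l[^ \t\n]  reads the name: up to 63 non-blank wide characters
    // " "           skips the whitespace between the two columns
    // %255l[^\n]    reads the substitution up to the end of the line
    int n = swscanf(line, L" %63l[^ \t\n] %255l[^\n]", name, subst);
    return n == 2;
}
```

The field widths keep the conversions within the destination buffers, and the negated scansets encode "stop at whitespace" and "stop at newline" without any hand-written character loop.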
The original code used a linear search for macro table lookups, so I replaced that with a binary search. This is intended as a simplification rather than an optimization - lsearch(3p), although not part of standard C, would've been a possible alternative.
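A minimal sketch of such a lookup with the standard qsort() and bsearch() functions might look like the following. The struct Macro layout and the macros_sort() and macro_lookup() names are assumptions made for this example.

```c
#include <stdlib.h>
#include <wchar.h>

// Hypothetical table entry: a macro name and its substitution.
struct Macro {
    const wchar_t *name;
    const wchar_t *subst;
};

// Order entries by name; shared by qsort() and bsearch().
static int macro_cmp(const void *a, const void *b) {
    const struct Macro *ma = a, *mb = b;
    return wcscmp(ma->name, mb->name);
}

// bsearch() requires a sorted table, so sort once after loading.
void macros_sort(struct Macro *table, size_t len) {
    qsort(table, len, sizeof(table[0]), macro_cmp);
}

// Return the substitution for `name`, or NULL if there is no such macro.
const wchar_t *macro_lookup(
    const struct Macro *table, size_t len, const wchar_t *name
) {
    struct Macro key = { .name = name };
    const struct Macro *m = bsearch(
        &key, table, len, sizeof(table[0]), macro_cmp
    );
    return m ? m->subst : NULL;
}
```

Using the same comparison function for sorting and searching keeps the two in agreement about the ordering, which is the main thing that can go wrong with bsearch().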
I also added a /macros command to display the macro expansion table and (re)load new macro files. As for the second issue, I picked a new prefix character to replace \ - according to the RFC1459 pseudo-BNF, . is a possible choice - it's not a valid nickname character (as it's reserved for hostnames), so it won't interfere with nick completion.
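The nickname rule can be written down as a small predicate. This sketch follows the <letter>, <number>, and <special> productions from the RFC1459 pseudo-BNF; the function name is_nick_char() is made up for this example, and real networks often allow a slightly different character set.

```c
#include <stdbool.h>
#include <string.h>

// True if `c` may appear in a nickname per the RFC1459 <nick> rule:
//   <letter> | <number> | <special>
//   <special> ::= '-' | '[' | ']' | '\' | '`' | '^' | '{' | '}'
// (The RFC additionally requires the first character to be a letter.)
bool is_nick_char(char c) {
    if (c >= 'a' && c <= 'z') return true;
    if (c >= 'A' && c <= 'Z') return true;
    if (c >= '0' && c <= '9') return true;
    // strchr() would also match the terminating NUL, so exclude it.
    return c != '\0' && strchr("-[]\\`^{}", c) != NULL;
}
```

With this predicate, is_nick_char('\\') is true while is_nick_char('.') is false, which is why . cannot collide with nick completion.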
Fixing Minor Annoyances
While testing the macro implementation, I ran into a couple of minor bugs - sometimes, macros weren't being expanded. This was harder to fix than expected because it seemed to happen sporadically without a clear trigger. Naturally, I assumed the new code was the culprit.
The next time this happened, attaching to the process via gdb(1) and inspecting the line editor's state offered some insight into the cause: sure enough, there was a subtle off-by-one error in the macro expansion logic. After reproducing this on the upstream sources, I was able to apply a permanent fix.
Next was an issue with the overflow marker bleeding into the prompt area, caused by a missing color pair (pen) reset when updating the input state.
Finally, I constrained command completion within the network (server) buffer to slash-prefixed entries. This prevents the client from completing "N" to "NickServ" when you're just trying to type a command. It's mostly a correctness fix; in practice, erroneous completions are just rejected as invalid commands.
WALLOPS
During testing, the Libera.Chat admins sent a Wallops. Apparently, a new Linux LPE was discovered and people were talking about it. I didn't get it.
It turns out catgirl(1) didn't support receiving Wallops. Luckily, the message format is simple. An implementation only needs to echo the message to the network buffer. Implementing a /wallops command to send a Wallops is only marginally more difficult due to the similarity to PRIVMSG.
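For reference, an incoming Wallops looks like ":nick!user@host WALLOPS :text". The sketch below pulls the sender and the text out of such a line. It is a standalone illustration rather than catgirl's handler (which works on already-parsed messages); parse_wallops() and its parameters are hypothetical, and message tags are not handled.

```c
#include <stdbool.h>
#include <string.h>

// Split ":origin WALLOPS :text" into the origin's nick (or server name)
// and the message text. Returns false if the line is not a Wallops or a
// field does not fit into the caller's buffers.
bool parse_wallops(
    const char *line, char *nick, size_t nick_cap, char *text, size_t text_cap
) {
    if (line[0] != ':') return false;
    const char *space = strchr(line, ' ');
    if (!space) return false;

    // The nick ends at '!' (user prefix) or at the space (server prefix).
    size_t nick_len = strcspn(line + 1, "! ");
    if (nick_len == 0 || nick_len >= nick_cap) return false;

    static const char cmd[] = " WALLOPS ";
    if (strncmp(space, cmd, sizeof(cmd) - 1) != 0) return false;

    const char *msg = space + sizeof(cmd) - 1;
    if (*msg == ':') msg++;  // trailing parameter marker
    size_t text_len = strcspn(msg, "\r\n");
    if (text_len >= text_cap) return false;

    memcpy(nick, line + 1, nick_len);
    nick[nick_len] = '\0';
    memcpy(text, msg, text_len);
    text[text_len] = '\0';
    return true;
}
```

The outgoing direction is the same line without the prefix, which is where the similarity to PRIVMSG comes from.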
I was too lazy to set up my own ircd for testing this. Luckily, the RektIRC IRCops agreed to send me a couple of Wallops so I could test it. They worked! Nice.
IRCv3 echo-message
Libera.Chat and other IRC networks support the IRCv3 echo-message extension. This echoes sent messages back to the client, which may seem redundant at first until you consider that:
- Many channels transform received messages (e.g., stripping control codes via Libera's +c chanmode)
- It serves as a latency measurement
- It acts as an acknowledgment that the server successfully received the message
The spec notes that clients may choose to disable local echoing of sent PRIVMSG and NOTICE messages altogether, so I did just that. While there is a tiny delay between sending a message and the echo appearing, I found it negligible in practice.
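The client-side logic boils down to: request the capability with "CAP REQ :echo-message", remember whether the server acknowledged it, and skip the local echo if it did. A minimal sketch of the last two steps follows. The enum, caps_parse_ack(), and should_echo_locally() are hypothetical names, and the parser assumes a plain space-separated capability list as found in a CAP ACK reply.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical capability bit set; only one capability is modelled here.
enum { CapEchoMessage = 1 << 0 };

// Scan the space-separated capability list of a CAP ACK reply and return
// the set of capabilities this sketch knows about.
unsigned caps_parse_ack(const char *list) {
    static const char name[] = "echo-message";
    const size_t name_len = sizeof(name) - 1;
    unsigned caps = 0;
    while (*list) {
        size_t tok = strcspn(list, " ");  // length of the current token
        if (tok == name_len && strncmp(list, name, tok) == 0) {
            caps |= CapEchoMessage;
        }
        list += tok;
        list += strspn(list, " ");        // skip separators
    }
    return caps;
}

// With echo-message enabled the server sends our own PRIVMSG and NOTICE
// back to us, so printing them locally as well would show them twice.
bool should_echo_locally(unsigned caps) {
    return !(caps & CapEchoMessage);
}
```

Comparing whole tokens rather than using strstr() avoids mistaking a differently named capability that merely contains "echo-message" for the real one.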
Input History
catgirl(1) uses the ↑ and ↓ keys to scroll the window backlog, while PgUp and PgDn scroll pagewise. C-p and C-n cycle buffers, and M-p/M-n scroll to highlighted terms.
Aside from the fact that much of the keybinding "real estate" is used for the scrolling functionality, there is no implementation of readline-like edit history commonly seen in other IRC clients.
I was able to do a fairly compact implementation of this, integrated into the input handling unit, without modifying other code. Finding an appropriate keybind for this required some research. Originally, I wanted to use M-↑ and M-↓ - however, binding to arrow keys within terminals is not portable, so eventually I settled on M-, and M-. with M-↑ and M-↓ as alternate keybindings.
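One compact way to structure such a history is a fixed-size ring buffer with a browse position. The sketch below is an illustration under that assumption, not the code from my branch: struct History, the capacities, and the history_push(), history_prev(), and history_next() names are all made up, and it uses narrow strings for brevity.

```c
#include <stddef.h>
#include <string.h>

enum { HistCap = 4, LineCap = 512 };  // deliberately tiny for illustration

struct History {
    char lines[HistCap][LineCap];
    size_t len;   // number of stored entries (at most HistCap)
    size_t head;  // slot the next entry will be written to
    size_t pos;   // steps browsed back from the newest entry (0 = none)
};

// Store a submitted line, overwriting the oldest one when full.
void history_push(struct History *h, const char *line) {
    size_t n = strlen(line);
    if (n >= LineCap) n = LineCap - 1;  // truncate overly long lines
    memcpy(h->lines[h->head], line, n);
    h->lines[h->head][n] = '\0';
    h->head = (h->head + 1) % HistCap;
    if (h->len < HistCap) h->len++;
    h->pos = 0;
}

// Step to the next older entry; NULL once the oldest has been reached.
const char *history_prev(struct History *h) {
    if (h->pos >= h->len) return NULL;
    h->pos++;
    return h->lines[(h->head + HistCap - h->pos) % HistCap];
}

// Step to the next newer entry; NULL when back at the live input line.
const char *history_next(struct History *h) {
    if (h->pos == 0) return NULL;
    h->pos--;
    if (h->pos == 0) return NULL;
    return h->lines[(h->head + HistCap - h->pos) % HistCap];
}
```

A zero-initialized struct History is a valid empty history, so the whole thing can live as a static variable inside the input handling unit.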
Replacing the Completion Engine
As a side project, I experimented with modifying the completion module to use a Treap data structure. Originally, catgirl(1) used a doubly-linked list, which was simply traversed linearly and searched with str[n]cmp() for implementing tab completion.
While the original implementation is O(n), it’s barely noticeable in daily use. Glibc’s vectorized implementations of strcmp() and memcmp() make linear searches incredibly fast.
I did a few tests comparing the averaged lookup performance of the original O(n) implementation with the O(log n) treap-based implementation for word lists of 1,000, 10,000, and up to 50,000 entries, and the difference between the two was around 1-5 ms. In the end, I decided to retain the treap-based code in my branch since it is already tested and working, while acknowledging that it may not be worth incurring the additional complexity (I may decide to remove this later).
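For readers unfamiliar with the data structure, here is a minimal treap sketch with a prefix lookup. It is a simplified illustration, not the completion module from my branch: struct Node, treap_insert(), and treap_complete() are hypothetical names, priorities come from rand(), keys are borrowed rather than copied, and nodes are never freed.

```c
#include <stdlib.h>
#include <string.h>

// A treap is a binary search tree on `key` and a max-heap on `prio`.
struct Node {
    const char *key;  // borrowed; must outlive the tree
    unsigned prio;    // random priority keeps the tree balanced on average
    struct Node *left, *right;
};

static struct Node *rotate_right(struct Node *n) {
    struct Node *l = n->left;
    n->left = l->right;
    l->right = n;
    return l;
}

static struct Node *rotate_left(struct Node *n) {
    struct Node *r = n->right;
    n->right = r->left;
    r->left = n;
    return r;
}

// Insert `key` and return the new root; duplicates are ignored.
struct Node *treap_insert(struct Node *root, const char *key) {
    if (!root) {
        struct Node *n = calloc(1, sizeof(*n));
        if (!n) abort();
        n->key = key;
        n->prio = (unsigned)rand();
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp < 0) {
        root->left = treap_insert(root->left, key);
        // Restore the heap property by rotating the child upward.
        if (root->left->prio > root->prio) root = rotate_right(root);
    } else if (cmp > 0) {
        root->right = treap_insert(root->right, key);
        if (root->right->prio > root->prio) root = rotate_left(root);
    }
    return root;
}

// Return the lexicographically smallest key starting with `prefix`,
// or NULL if no key matches.
const char *treap_complete(const struct Node *root, const char *prefix) {
    size_t len = strlen(prefix);
    const char *best = NULL;
    while (root) {
        int cmp = strncmp(prefix, root->key, len);
        if (cmp == 0) {
            best = root->key;    // a match; smaller ones may be to the left
            root = root->left;
        } else if (cmp < 0) {
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return best;
}
```

The lookup relies on all keys sharing a prefix being adjacent in sorted order, so a single root-to-leaf descent is enough to find the first candidate.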
Wrapping up
I spent a few days testing, rebasing, and revising the various commits. Throughout the process, I set up an OBS project to build deb packages for the distros I use. I installed the resulting artifacts and used the packaged client on a day-to-day basis as a form of dogfooding.
At this point, I felt that the modified client was comfy enough for my usage. Was it worth it? IMO, yes - ultimately, the modifications were fairly compact and compartmentalized, and putting the educational value aside, the client can now be considered "feature complete" from my point of view.
