PetWay progress: revised strings

2025-08-05

Changes

At the previous milestone pet introduced low-level iterators and used them in many low-level functions. Iterators themselves are good, but the approach was not. The main problem is the _pw_get_char function, which contains a switch statement.

At this milestone pet refactored the string functions to avoid calling _pw_get_char in the inner loops.
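To illustrate the shape of the problem, here is a minimal sketch, not PetWay's actual code. The names get_char, find_char_slow and find_char_fast are hypothetical, and so are the little-endian layout and the 1 to 4 byte character widths. The slow variant pays for the switch on every character; the fast variant dispatches once and then runs a loop specialized for the width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-character accessor: the switch runs on every call. */
static inline uint32_t get_char(const uint8_t *p, unsigned char_size)
{
    switch (char_size) {
    case 1:  return p[0];
    case 2:  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case 3:  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16);
    default: return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
}

/* Slow shape: width dispatch sits inside the inner loop.
   len is measured in characters; returns len when c is not found. */
static size_t find_char_slow(const uint8_t *s, size_t len,
                             unsigned char_size, uint32_t c)
{
    assert(char_size >= 1 && char_size <= 4);
    for (size_t i = 0; i < len; i++) {
        if (get_char(s + i * char_size, char_size) == c)
            return i;
    }
    return len;
}

/* Faster shape: dispatch once, then run a loop with a fixed width. */
static size_t find_char_fast(const uint8_t *s, size_t len,
                             unsigned char_size, uint32_t c)
{
    assert(char_size >= 1 && char_size <= 4);
    switch (char_size) {
    case 1: {
        if (c > 0xFF)
            return len;            /* cannot occur in a 1-byte string */
        const uint8_t *hit = memchr(s, (int)c, len);
        return hit ? (size_t)(hit - s) : len;
    }
    case 2:
        for (size_t i = 0; i < len; i++) {
            uint32_t ch = (uint32_t)s[2 * i] | ((uint32_t)s[2 * i + 1] << 8);
            if (ch == c)
                return i;
        }
        return len;
    default:
        /* Widths 3 and 4 would get their own loops; omitted for brevity. */
        return find_char_slow(s, len, char_size, c);
    }
}
```

For one-byte characters the fast variant hands the whole search to memchr, which is the kind of delegation the benchmark notes below argue for.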

The source file pw_string.c has been split into small files, which now reside in the src/string subdirectory.

An attempt was made to optimize basic string operations, but memchr/memcmp from Glibc are superior. Pet compared strstr performance with Stringzilla's sz_find on the x86_64 and ARM64 boards pet had at paw. Stringzilla performed worse than Glibc on substring search. The PetWay version performs almost the same as Stringzilla on ARM64 and slightly worse than Glibc on x86_64. That's not bad given the variety of character sizes it is capable of handling.

Missing C features

Pet discovered interesting problems in C related to static strings.

First, static strings are unaligned. That's expected: characters take one byte, so they can start on any byte boundary. However, there's no way to specify alignment, especially when a string literal is passed as an argument, like foo("bar").

A workaround is to define an array with the desired alignment and initialize it with a string literal, but that's too verbose for pet.
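A minimal sketch of that workaround, assuming C11 and an 8-byte alignment requirement. The ALIGNED_STR macro and the is_aligned8 helper are hypothetical names used only for illustration. The macro trims the boilerplate a little, but the string still needs a name and a declaration of its own, so it cannot be written inline as foo("bar").

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* The verbose form: a named array carries the alignment
   that a bare string literal cannot. */
static alignas(8) const char bar[] = "bar";

/* Hypothetical helper macro that hides the declaration boilerplate. */
#define ALIGNED_STR(name, literal) \
    static alignas(8) const char name[] = literal

ALIGNED_STR(baz, "baz");

/* Hypothetical check a consumer could use before doing aligned loads. */
static int is_aligned8(const char *s)
{
    return ((uintptr_t)s % 8) == 0;
}

static_assert(sizeof bar == 4, "three characters plus the terminating NUL");
```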

Second, there's no way to distinguish a char* argument from a string literal in macros.
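A small sketch of why type-based dispatch does not help here, assuming a C11 compiler that applies array-to-pointer conversion to the controlling expression of _Generic (current GCC and Clang do). The IS_CHAR_PTR macro is a hypothetical name. A literal has type char[N], but by the time _Generic inspects it, it has already decayed to char *, so it lands in the same branch as an ordinary pointer.

```c
#include <assert.h>

/* Hypothetical classifier: 1 for anything that is (or decays to)
   a pointer to char, 0 otherwise. */
#define IS_CHAR_PTR(x) \
    _Generic((x), char *: 1, const char *: 1, default: 0)

/* The literal is indistinguishable from a pointer at this level. */
static_assert(IS_CHAR_PTR("bar") == 1, "literal decays to char *");
static_assert(IS_CHAR_PTR((char *)0) == 1, "plain pointer");
static_assert(IS_CHAR_PTR(42) == 0, "not a string at all");
```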

Third, there's no way to know the length of a UTF-8 string literal in code points at compile time.
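The gap can be seen in a short sketch, assuming valid UTF-8 input. LITERAL_BYTES and utf8_codepoints are hypothetical names. The byte length of a literal is a constant expression via sizeof, while the code point count has to be computed by a run-time loop that skips continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of a literal: known at compile time. */
#define LITERAL_BYTES(s) (sizeof(s) - 1)

/* Code point count: needs a run-time pass. Every byte that is not
   a continuation byte (10xxxxxx) starts a new code point. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* "h\xc3\xa9llo" spells "hello" with U+00E9 in place of the 'e'
   (two bytes in UTF-8). */
static_assert(LITERAL_BYTES("h\xc3\xa9llo") == 6, "six bytes");
```

Calling utf8_codepoints("h\xc3\xa9llo") yields 5, but only at run time; it cannot be used in a static_assert or as an array size.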

Future work

From previous milestone: