PetWay progress: revised strings and generics

2025-07-18

Changes

Low-level string iterators

When strings can have multiple memory layouts and character sizes, picking characters by index is a slow operation, because every access has to dispatch on both. Low-level iterators expose the memory layout, so the inner code depends only on the character size.
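A minimal sketch of the idea, assuming a hypothetical `Str` type with an inline ("embedded") buffer or an external pointer, and character sizes of 1, 2 or 4 bytes. None of these names are PetWay's API. The iterator resolves the layout once, and each inner loop is specialized only by character size:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string representation, for illustration only. */
typedef struct {
    uint8_t char_size;          /* bytes per character: 1, 2 or 4 */
    uint8_t embedded;           /* 1 if characters live in u.inline_data */
    size_t  length;             /* number of characters */
    union {
        uint8_t        inline_data[16];
        const uint8_t *ptr;
    } u;
} Str;

/* Hypothetical low-level iterator: the memory layout is resolved once. */
typedef struct {
    const uint8_t *pos;         /* current character */
    const uint8_t *end;         /* one past the last character */
    uint8_t        char_size;
} StrIter;

static StrIter str_iter(const Str *s)
{
    const uint8_t *start = s->embedded ? s->u.inline_data : s->u.ptr;
    StrIter it = { start, start + s->length * s->char_size, s->char_size };
    return it;
}

/* Counts occurrences of c. The loops never look at the layout again;
   they differ only in how many bytes make up one character. */
static size_t count_char(const Str *s, uint32_t c)
{
    StrIter it = str_iter(s);
    size_t n = 0;
    switch (it.char_size) {
    case 1:
        for (; it.pos < it.end; it.pos += 1)
            n += (*it.pos == c);
        break;
    case 2:
        for (; it.pos < it.end; it.pos += 2) {
            uint16_t v;
            memcpy(&v, it.pos, 2);   /* memcpy avoids unaligned reads */
            n += (v == c);
        }
        break;
    case 4:
        for (; it.pos < it.end; it.pos += 4) {
            uint32_t v;
            memcpy(&v, it.pos, 4);
            n += (v == c);
        }
        break;
    default:
        assert(!"unsupported char_size");
    }
    return n;
}
```

Compared with a hypothetical `char_at(s, i)` that checks layout and size on every call, the checks here happen once per string rather than once per character.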

This opens the way to variable character sizes, i.e. UTF-8.

It would be great if strings supported UTF-8 natively, but there are a couple of issues:

Merged CharPtr type with String

This change resulted in static strings. This does not mean that such a string is constant. Like the other two kinds of strings, embedded and allocated, static strings can be modified thanks to COW (copy-on-write).
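How COW makes a static string modifiable can be sketched as follows. This is a simplified illustration, not PetWay's implementation: the hypothetical `CowStr` only distinguishes "static" (borrowed, read-only data) from "allocated" (owned heap buffer), and omits embedded strings and reference counting. The first write copies the borrowed characters into a heap buffer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string: either borrows static data or owns a heap copy. */
typedef struct {
    int         is_static;  /* 1: cdata points at data we must not write */
    size_t      length;
    char       *data;       /* owned buffer when !is_static */
    const char *cdata;      /* borrowed data when is_static */
} CowStr;

static CowStr cow_static(const char *literal)
{
    CowStr s = { 1, strlen(literal), NULL, literal };
    return s;
}

/* Copy-on-write: a static string is copied to the heap before its first
   modification. Returns 0 on allocation failure, 1 otherwise. */
static int cow_make_writable(CowStr *s)
{
    if (!s->is_static)
        return 1;
    char *copy = malloc(s->length + 1);
    if (!copy)
        return 0;
    memcpy(copy, s->cdata, s->length + 1);  /* includes the terminating NUL */
    s->data = copy;
    s->cdata = NULL;
    s->is_static = 0;
    return 1;
}

static const char *cow_chars(const CowStr *s)
{
    return s->is_static ? s->cdata : s->data;
}

static int cow_set_char(CowStr *s, size_t i, char c)
{
    assert(i < s->length);
    if (!cow_make_writable(s))
        return 0;
    s->data[i] = c;
    return 1;
}

static void cow_free(CowStr *s)
{
    if (!s->is_static)
        free(s->data);      /* static data is never freed */
    s->data = NULL;
}
```

The original literal is never touched, so the same static data can back any number of strings safely.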

Static string initializers are PW_STATIC_STRING and PW_STATIC_STRING_UTF32. Rvalues are created with a single generic, PwStaticString, which never fails. It is a function, so it has to call strlen to initialize the length. There are no rvalue macros similar to PwString. They would be easy to implement, but they would increase the entropy. Maybe later, depending on use cases.
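The difference between a compile-time initializer and a function-based rvalue can be sketched like this. All names (`StaticStr`, `STATIC_STR`, `STATIC_STR_UTF32`, `static_str`) are hypothetical stand-ins, not the real PetWay definitions. The initializer macros take the length from `sizeof`, so they work only with literals but can initialize static objects. The generic `static_str` dispatches with C11 `_Generic` to functions that must scan for the terminator at run time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical static string: borrows its characters, never allocates. */
typedef struct {
    unsigned char char_size;   /* 1 or sizeof(char32_t) */
    size_t        length;      /* in characters */
    const void   *data;        /* borrowed, never freed */
} StaticStr;

/* Initializers: length is a compile-time constant, literals only. */
#define STATIC_STR(lit)       { 1, sizeof(lit) - 1, (lit) }
#define STATIC_STR_UTF32(lit) { sizeof(char32_t), sizeof(lit) / sizeof(char32_t) - 1, (lit) }

static size_t utf32_len(const char32_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Rvalue constructors: functions cannot use sizeof on the caller's
   literal, so the length is computed at run time. Nothing can fail. */
static StaticStr static_str_ascii(const char *s)
{
    StaticStr r = { 1, strlen(s), s };
    return r;
}

static StaticStr static_str_utf32(const char32_t *s)
{
    StaticStr r = { sizeof(char32_t), utf32_len(s), s };
    return r;
}

/* A single generic entry point selecting by pointer type. */
#define static_str(s) _Generic((s),      \
    char *:           static_str_ascii,  \
    const char *:     static_str_ascii,  \
    char32_t *:       static_str_utf32,  \
    const char32_t *: static_str_utf32)(s)
```

A rvalue macro could avoid the `strlen` call by reusing `sizeof`, which is the trade-off the paragraph above weighs against adding more entry points.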

Increased bit width for char_size

This allows storing the character size as is, 1-based instead of 0-based. It is also another step towards variable-size strings. However, it might be better to store it as a shift count to eliminate multiplications. This needs to be evaluated on ARM.
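The two encodings under consideration can be compared in a small sketch; the helper names are hypothetical. Because the character size is only known at run time, the compiler cannot turn `index * char_size` into a shift by itself, whereas storing log2 of the size makes the byte offset a single shift:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current encoding: character size stored as is (1, 2 or 4). */
static size_t offset_mul(size_t index, uint8_t char_size)
{
    return index * char_size;        /* run-time multiplication */
}

/* Alternative encoding: store log2 of the size (0, 1 or 2). */
static size_t offset_shift(size_t index, uint8_t char_shift)
{
    return index << char_shift;      /* a shift instead of a multiply */
}

/* Conversion between the two encodings. */
static uint8_t size_to_shift(uint8_t char_size)
{
    switch (char_size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    default:
        assert(!"char_size must be 1, 2 or 4");
        return 0;
    }
}
```

Both give the same offsets; whether the shift is measurably faster depends on the target's multiplier, which is what the ARM evaluation would settle.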

Initializer and rvalue for UTF-32 embedded strings

These are PW_STRING_UTF32 and PwStringUtf32, respectively.
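One common way to provide such an initializer/rvalue pair in C is a brace initializer plus a compound literal built from it. The sketch below assumes a hypothetical `EmbStr32` with a fixed inline capacity; `EMB_STR_UTF32` and `EmbStrUtf32` only mirror the naming pattern and are not the actual PetWay macros. It relies on the C rule that a `char32_t` array may be initialized from a `U"..."` literal:

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

#define EMB_CAPACITY 6   /* hypothetical inline capacity, in characters */

/* Hypothetical embedded UTF-32 string: characters stored inside the struct. */
typedef struct {
    unsigned char char_size;
    unsigned char length;
    char32_t      data[EMB_CAPACITY];
} EmbStr32;

/* Initializer: usable in declarations, including static ones. The
   compiler diagnoses literals longer than EMB_CAPACITY characters. */
#define EMB_STR_UTF32(lit) {                   \
    sizeof(char32_t),                          \
    sizeof(lit) / sizeof(char32_t) - 1,        \
    lit }

/* Rvalue: a compound literal built from the initializer. */
#define EmbStrUtf32(lit) ((EmbStr32) EMB_STR_UTF32(lit))

static char32_t emb_last(const EmbStr32 *s)
{
    assert(s->length > 0);
    return s->data[s->length - 1];
}
```

Since a compound literal is an lvalue, its address can be passed directly, e.g. `emb_last(&EmbStrUtf32(U"xyz"))`, without naming a variable.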

There's no support for wide characters (wchar_t is 16 bits on Windows, i.e. UTF-16), to avoid stepping into the surrogate pairs mess.

Replaced UTF-8 wrappers with _ascii versions in generics

Clang 16 did not correctly handle char8_t* and char* in generics. Clang 19 handles them correctly.
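A small `_Generic` selection that routes all 8-bit pointer types to an `_ascii` implementation can serve as a compile check across compiler versions. It is a sketch with hypothetical names (`str_len`, `len_ascii`, `len_utf32`), and it is not claimed to reproduce the Clang 16 issue. It uses `unsigned char *` because C23 defines `char8_t` as `unsigned char`, and `char`, `unsigned char` and `char32_t` are distinct types that a conforming `_Generic` must tell apart:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical implementations taking const void * so that both char *
   and unsigned char * arguments convert without warnings. */
static size_t len_ascii(const void *p)
{
    return strlen(p);            /* 8-bit code units, NUL-terminated */
}

static size_t len_utf32(const void *p)
{
    const char32_t *s = p;
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* char * and unsigned char * (C23 char8_t *) both go to the _ascii
   version; char32_t * goes to the UTF-32 version. */
#define str_len(s) _Generic((s),              \
    char *:                len_ascii,         \
    const char *:          len_ascii,         \
    unsigned char *:       len_ascii,         \
    const unsigned char *: len_ascii,         \
    char32_t *:            len_utf32,         \
    const char32_t *:      len_utf32)(s)
```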

Future work