tw.myaw

Pet discovered twtxt by accident while glancing at tilde.club server stats. Pet found the idea behind twtxt useful for talking to itself; it just needed some extensions.

First of all, pet wants to use MYAW instead of plain text. MYAW looks like the best format for the raw source data pet has needed throughout its lives.

Unlike twtxt, there's no well-known tw.myaw file. Instead, tw.myaw has the concept of channels, and the well-known file is named twchan.myaw. Its root object is a map with the following structure:

<channel name>:
    filename:  # relative name of channel file, may include directory
    archive:   # relative path to a directory with YYYY-MM archives

All the gibberish is stored in channel files. The root object is a mapping:

channel:
    file_id:   # unique identifier of channel file
    about:     # channel description
    avatar:    # channel avatar

items:
    # list of items

    - id:      # unique item identifier (optional)
      parent:  # parent item identifier for replies
      ts::isodate:  # timestamp
      source:  # URL of the source if this item is fetched from somewhere
      text:    # the message
      data:    # source data in any other format
          type:     # JSON, Markdown, etc.
          content:  # the data
      tags:    # list of tags
      media:   # links to media, as in fedi, TBD

New items are always appended to the end of the file, so the requester may download only the latest changes. However, the entire file can be re-created when it goes to the archive. That's why it contains file_id at the very beginning, and the requester must check it against the local copy. If file_id does not match, the requester moves the local copy to the archive and downloads the new file.
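The file_id check can be sketched in Python as a small decision function. This is only an illustration, not part of the format: the names plan_sync and SyncPlan are hypothetical, and it assumes the requester already knows the remote file_id (read from the beginning of the file) and the remote size (for example from an HTTP Content-Length header).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncPlan:
    action: str          # "full", "tail", or "none"
    archive_local: bool  # move the local copy to the archive first
    offset: int          # byte offset to resume from (used by "tail")


def plan_sync(local_file_id: Optional[str], local_size: int,
              remote_file_id: str, remote_size: int) -> SyncPlan:
    """Decide how to refresh a local copy of a channel file (hypothetical helper)."""
    if local_file_id is None:
        # No local copy yet: download the whole file.
        return SyncPlan("full", False, 0)
    if local_file_id != remote_file_id:
        # The file was re-created: archive the local copy, then download anew.
        return SyncPlan("full", True, 0)
    if remote_size > local_size:
        # Same file with new items appended: fetch only the tail.
        return SyncPlan("tail", False, local_size)
    # Nothing new to fetch. What a smaller remote file with the same
    # file_id would mean is not specified, so it is treated as no change.
    return SyncPlan("none", False, 0)
```

With "tail", the requester would ask only for bytes starting at offset, which is what makes append-only files cheap to follow.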

The data can be archived when the size goes beyond some limit or when channel preferences change. Thus, there's no need to include channel info in each post like fedi does for users.

Archive files are kept in subdirectories named YYYY-MM. The file name has the following format:

CHANNEL-YYYYMMDD[HHMM]-YYYYMMDD[HHMM].myaw

The first date is the date/time of the first record (UTC), and the second is the date/time of the last record. The HHMM part is optional; it is used when there are multiple large files for the same day.
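A minimal Python sketch of building such a name follows. The function archive_name and the channel name "cats" are made up for illustration, and the timestamps are assumed to be timezone-aware so they can be converted to UTC.

```python
from datetime import datetime, timedelta, timezone


def archive_name(channel: str, first: datetime, last: datetime,
                 with_time: bool = False) -> str:
    """Build CHANNEL-YYYYMMDD[HHMM]-YYYYMMDD[HHMM].myaw (hypothetical helper).

    Both timestamps must be timezone-aware; they are converted to UTC.
    """
    fmt = "%Y%m%d%H%M" if with_time else "%Y%m%d"
    first_utc = first.astimezone(timezone.utc)
    last_utc = last.astimezone(timezone.utc)
    return f"{channel}-{first_utc.strftime(fmt)}-{last_utc.strftime(fmt)}.myaw"


# A record stamped 01:00 at UTC+3 falls on the previous day in UTC.
utc_plus_3 = timezone(timedelta(hours=3))
first = datetime(2024, 3, 1, 1, 0, tzinfo=utc_plus_3)
last = datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc)
print(archive_name("cats", first, last))  # cats-20240229-20240331.myaw
```

Passing with_time=True adds the HHMM part to both dates, which is only needed when several files cover the same day.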

Files in the archive can be compressed. LZMA is the preferred method.
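As a rough sketch, compression could be done with Python's standard lzma module. The ".xz" suffix and the helper names compress_archive and read_archive are assumptions; nothing above fixes how compressed files are named.

```python
import lzma
from pathlib import Path


def compress_archive(path: Path) -> Path:
    """Write an LZMA-compressed copy next to the file and return its path."""
    target = path.with_name(path.name + ".xz")  # ".xz" suffix is an assumption
    with open(path, "rb") as src, lzma.open(target, "wb") as dst:
        dst.write(src.read())
    return target


def read_archive(path: Path) -> bytes:
    """Read an archive file, decompressing it if it has the .xz suffix."""
    if path.suffix == ".xz":
        with lzma.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()
```

The sketch leaves the uncompressed original in place; whether to delete it afterwards is up to the archiver.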

Intended use and TODO:

twtxt derived from tw.myaw
collect fedi timelines into tw.myaw, group and display by tags, find frequent/rare words/ngrams
collect tw.txt from other sources
an interface to post to tw.myaw and to fedi