I’m at that fun point in the project where I turn it loose on urls in the wild and see if it can extract something resembling structured data from them.

I’ve discovered a corollary to the [[Failed Standards][failed-standards.html]]: apparently some people implement stuff using a Google url that I guess used to redirect somewhere. I don’t know if the standards are really the same or if Google forked them, BECAUSE THEY DON’T EXIST ANY MORE, but at any rate, using the current set seems to be compatible enough for my purposes (so far, displaying rich previews for links in Twitter or Mastodon updates).

Also: I know the name refers to providing data to embed the aforementioned previews, but it amuses me terribly that oEmbed is literally a format that cannot be embedded in the page it provides data for - it’s an API to fetch the info in a separate transaction.
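Concretely, the exchange looks something like this - a Python sketch with a hypothetical provider endpoint (real providers advertise theirs via a link tag in the page head), just to illustrate that the preview data arrives in a separate request, not in the page:

```python
import json
from urllib.parse import urlencode

def oembed_request_url(endpoint, target_url):
    """Build the oEmbed API url - a separate transaction, not anything
    embedded in the target page itself."""
    return endpoint + "?" + urlencode({"url": target_url, "format": "json"})

# Hypothetical endpoint and target url, for illustration only.
req = oembed_request_url("https://example.com/oembed",
                         "https://example.com/post/42")

# A canned response shaped like an oEmbed "rich"-type answer:
response = json.loads("""
{"version": "1.0",
 "type": "rich",
 "title": "An example post",
 "html": "<blockquote>preview markup goes here</blockquote>"}
""")
```

The consumer fetches `req`, gets JSON back, and drops the `html` field into its own page - the target page never carried the data at all.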

| I ❤️ how structured JSON relies on external definitions and everybody is like “we’ll store it at a big company like Google so it’ll never go away” and WHOOPS is completely gone but somehow everyone is still using this mystery schema.

Seriously, putting "rel": "" in your webfinger json is not doing any good when the server, including any presence on the Wayback Machine, is gone gone gone.
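For reference, a WebFinger client just treats those rel values as opaque match strings - nothing ever dereferences them, dead or alive. A minimal sketch, with a made-up account and a real rel URI standing in:

```python
import json

# A minimal WebFinger JRD; the subject and href are hypothetical.
jrd = json.loads("""
{"subject": "acct:gamehawk@example.social",
 "links": [
   {"rel": "http://webfinger.net/rel/profile-page",
    "type": "text/html",
    "href": "https://example.social/@gamehawk"}
 ]}
""")

def links_for_rel(doc, rel):
    """Return hrefs whose rel matches. The rel is compared as a plain
    string - whether the url behind it still exists is irrelevant."""
    return [l.get("href") for l in doc.get("links", []) if l.get("rel") == rel]

profile_links = links_for_rel(jrd, "http://webfinger.net/rel/profile-page")
```

Which is exactly why a dead spec server doesn’t break anything in practice - and also why nobody noticed it dying.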

It’s amazing how OStatus has managed to even be implemented anywhere, what with the lack of documentation and all.

crap I’m gonna end up writing documentation here aren’t I

I’m still sticking with Mastodon, as I mentioned last spring. Twitter is down to reply-only. And I’ve deleted (mostly) my Facebook account, though I’m pretty sure FB resurrected it when someone tried to log into it.

My biggest issue with Masto, technologically, is that it’s a pretty heavyweight critter to run for a single-user instance. So I’ve been playing around with the idea of a minimum-viable instance, more as a way to wrap my head around the standards it uses than as an attempt at building a usable product. (But if that happens anyway, cool.) I think the most worthwhile thing I could end up with is a coherent set of documentation for the standards, though, because as far as I can tell the Salmon protocol’s specification only lives in a Wayback Machine archive of a Google Code site (the Salmon spec and the magic-signature spec).

I mentioned OStatus/GnuSocial about a year ago, as something I was keeping an eye on. I signed up a while back, and gradually started using it more and more.

When I first signed up, it was mostly LGBTQ and furry. That is, people who left Twitter looking for a safe space. And then Twitter made the already-infamous @reply change, and the floodgates opened. The flagship instance alone increased its userbase by 20% in the space of about 24 hours.

It will be interesting to see if adoption reaches critical mass. Right now all the conversation on masto is about masto, which doesn’t exactly build a community.

If you’re interested, I’m gamehawk.

Update: there’s a running list of instances, since the main instance is kind of swamped. In hindsight, Eugen should have made a separate homepage/site for the federation, distinct from the flagship instance.

Let’s see if plerd eats the key block here.



I’m gamehawk on keybase, though I haven’t figured out how to keep my Twitter account verified there since my account is private these days.

Locked book

Some while back, Facebook borked the Pages feed where you could read all the entries from Pages that a Page liked. I suspect this wasn’t accidental - if you wanted to read all entries from Pages you followed, you could just create your own Page, like everything you wanted to follow, and read the Pages feed. Lifehack! Otherwise you only saw a tiny percentage of any given Page’s posts.

However, I used it in exactly that way: to follow all the businesses and churches and organizations in the Delano neighborhood, and re-share any important news, add any events to the calendar, etc. My Page follows over 200 other Pages, and ain’t no way I’m manually checking each one. So I pulled up Facebook::Graph and later OpenGraph and scraped the JSON for each Page. Next, I parsed the current entries into a quick-and-dirty preview page. It’s not a replacement for FB proper, but it gives me enough to decide which posts to click through to.
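The scraping loop boils down to something like this - a Python re-sketch of the idea (the original went through Facebook::Graph in Perl); the Graph version number, page id, token, and field list are all placeholders:

```python
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com/v2.8"  # version is a guess from the era

def page_posts_url(page_id, token):
    # Real endpoint shape; the page id and token here are placeholders.
    return GRAPH + "/" + page_id + "/posts?" + urlencode(
        {"fields": "message,link,created_time", "access_token": token})

def to_preview(feed_json):
    """Boil a Graph feed response down to (timestamp, snippet, link)
    rows for a quick-and-dirty preview page."""
    rows = []
    for post in feed_json.get("data", []):
        snippet = (post.get("message") or "")[:140]
        rows.append((post.get("created_time"), snippet, post.get("link")))
    return rows

# A canned response in the Graph API's feed shape, for illustration:
feed = {"data": [{"message": "Pancake feed this Saturday!",
                  "link": "https://facebook.com/events/123",
                  "created_time": "2017-04-01T12:00:00+0000"}]}
rows = to_preview(feed)
```

Loop that over 200-odd Pages, dump the rows into a template, done.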

I figured I’d do roughly the same thing here, except for Users instead of Pages. Only… if I’m reading this right, the Graph API doesn’t allow that anymore unless each individual user has allowed the application to serve their feed. Unless I’m missing something, the Graph won’t work for me.

This makes life a little more awkward, but isn’t a dealbreaker. The only reason I’ve tolerated Facebook as long as I have has been FB Purity, a browser plugin that Facebook hates so much you know it’s good. (Seriously: you can’t mention it in a post you want other people to see at all, much less link to it.) FBP doesn’t use the API, it just makes requests same as FB’s own Javascript does. And bless Steve’s good heart, his code is unobfuscated and well commented, so it practically documents how to do everything through Facebook’s ajax. Sweet!

Installing a browser extension is still a little more friction than I wanted, especially given that you can’t get FBP anywhere other than its site because Facebook has made sure it doesn’t get listed by Mozilla or anybody else. But once over that hump, it does have the potential to make all kinds of things Just Work™… invisibly.

Somehow, in the software changes here, I unlinked the older blog entries about the Facebook Killer and never got around to re-linking them. That’s okay, they were pretty rambly and this gives me an excuse to sum up what I have so far. I’ve jokingly code-named the project Pumpkin Spice Latte, for reasons which now escape me, other than that you can tell what time of year I named it. Here’s what it needs, in semi-particular order:

Ease of use

You log in, you type in a box. You click “like” on your friends, relatives, businesses, posts, whatever. At its simplest, that’s it. Anything that replaces Facebook has to be that easy. That’s no big deal - that’s just really basic blogging software.

There’s a “there” there

Once you’ve “liked” something on FB, it (or things like it) shows up in your timeline/wall/whatever it is this week, simple as that. Posting is, for most people, secondary to reading other people’s posts. That’s a slightly bigger deal, but not much - that’s just really easy-to-use feed reading. Subscribe new users to all of their FB friends, and the critical mass is there.


Replacing one Facebook with another is no good. There will certainly have to be hosted PSL instances for ease-of-use, but you have to be able to click on a button and move all your stuff from one host to another, bam.

Unambiguous identification

Facebook has a problem with spammers cloning users. I used to get regular friend requests from clones of my mother (until I, um, unfriended her for political reasons), and if you can click one button and move to a new host, your friends have to be able to tell the difference between you doing that and you just being cloned. You’ll have to be able to identify yourself with a public/private key setup of some sort (but see also “ease of use”).

Social games

Come on, we’ve had BBS door games since the ’80s. We got this.

Open-sourcey stuff

There is a whole host of under-the-hood features that will be important to us nerds. I think they’re important for implementing the above, but they wouldn’t necessarily appeal to the canonical user.


“Just” feed-reading is complicated by FB eliminating its RSS feeds a long time ago, and putting TOS restrictions on its API to prevent this kind of thing. If PSL and similar things really took off, it would quickly become whac-a-mole as FB tried to block ways of offline reading.

People need some incentive to move. Luckily, Facebook provides some of that all by itself: people get annoyed that they can’t control whether they see everything family members post. People get annoyed by ads. People get annoyed by Facebook re-sorting their timeline.

Spam and phishing and bullying and everything else that comes with a social network needs a lot of careful control. On the down side, there’s no central authority to ban someone, but on the up side, you don’t need to rely on a central authority to ban someone… provided you have tools to give you enough control over your experience.

Paul’s right.

I’ve been saying this for somewhat abstract reasons: we have to own our words, we have to not be locked in to a single client, we have to own our behavioral data. It’s become much less abstract lately.

I took my Twitter account private this week, for reasons I couldn’t quite put my finger on, and logged out of Facebook (and deleted the cookies, and let Privacy Badger do what it does). A couple days later, a number of my Twitter followers mentioned having done the same thing. It wasn’t organized - to the best of my knowledge, nobody said “we need to do this” - it just felt right.

The problem is, I still have to log in to Facebook every morning, download all the data from the Pages I follow, and log back out, because that’s the only place those businesses post events and news and such. Which is really sad, because I see the viewer stats on the neighborhood Page I run, and engagement is <5% unless you pay for it, or really really work on grassroots stuff. They’ve really got that lock-in down pat.


After the rambly last post, I finished up the conversion script to turn the existing data into a set of better, mostly-compliant files. Almost all the data fits into existing schema structures, albeit with some use of relatedLink and significantLink, which is a little vaguer than I’d like to be. Google’s tester doesn’t like the alternateType use, though, which is disappointing, but its crawler will find all the json by crawling the HTML so I guess it’ll do.

The most glaring absence has been that of linking contact and social media information to Persons and Organizations, which is funny because c’mon, it was in FOAF, why didn’t Brickley bring it over. So I did: created an umbrella Agent to unify Person and Organization and gave them holdsAccount properties to contain OnlineAccounts. At some point I will serve up the modified schemas under the right @context and see if Google is okay with them.
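For illustration, a record under that scheme might look something like this. Agent, holdsAccount, and OnlineAccount are the custom extensions described above, not standard schema.org vocabulary; the account properties are borrowed from FOAF’s OnlineAccount terms, and the @context url, name, and account details are all hypothetical:

```json
{
  "@context": "https://example.com/schema-plus-agent",
  "@type": "Person",
  "name": "Example Person",
  "holdsAccount": [
    {
      "@type": "OnlineAccount",
      "accountName": "gamehawk",
      "accountServiceHomepage": "https://example.social/"
    }
  ]
}
```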

Aside from that, everything validates, and I have a Website with Blogs that have SocialMediaPostings under them, which contain ImageObjects and are attributed to an Organization as publisher and a Person as author. The next step - since having a Google Calendar widget on every page has really weighed the existing structure down, plus I got tired of synchronizing Google and Facebook - has been to write a little thing that converts Facebook events to Events.

With that done, now I’m working on updating my templates to agree with the new structures, and then it can build the live site directly.

I’ve touched on the issues with JSON-LD a little already, so I’m going to spell that out a little more here. If you’re not interested in the technical details, you can probably skip this post - honestly, I’m pretty much just rubber-ducking here.

Here’s the theory. I’m building a RESTful site generator. I settled on schema.org’s JSON-LD as a standard, though it’s kind of a wibbly-wobbly, ill-documented one. Each JSON file has its own url (and can theoretically be a static file). Based on the HTTP Accept header, a client can navigate the site based on the corresponding HTML without running a line of Javascript, or based on the JSON without having to parse HTML. Schema.org doesn’t quite allow for this, though.

Here’s a real-world implementation, at BBC News:

JSON at the BBC

Each of those boxes is an HTML page; the JSON doesn’t stand alone. This isn’t a surprise - as I’ve mentioned, Google doesn’t recognize it outside of an HTML wrapper, and even then only recommends it if it’s physically there and not linked with a src=. I can handle this behavior pretty simply by embedding the JSON in the corresponding HTML page. Smarter clients can still grab the JSON directly. It bloats the HTML, but what can you do?
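The embedding itself is nothing fancy - an inline script element with the `application/ld+json` media type, which is the one form Google actually recommends (the Article fields here are made up):

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "An example article",
  "url": "https://example.com/an-example-article"
}
</script>
```

The browser ignores the block entirely; only crawlers and smarter clients read it.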

However, there are a few structural issues I’d have in parsing this with a client. The two separate JSON structures both have the URL of the underlying HTML page, so it’s clear to my client they’re related. The ItemList contains a list of urls, which is pretty much the only legal thing they can contain. So far, so good. But navigate to the article page, and there’s a discoverability problem. The Article contains the article data, but the only links to other things in it are the image urls. If I enter the site on an article, the only navigation I can do is to go to the website root - which in this case doesn’t work, because only the /news hierarchy has microdata.

The Publisher also exists only as a substructure - there is no url or other unique identifier. Now, obviously the Beeb just has it in there because Google demands it and not because this is a RESTful interface, so it’s no big deal. Presumably whatever generates the data has the Publisher info in a single place, so updating, say, the logo is something that propagates automatically, and changing the name isn’t ambiguous as to whether the Publisher’s name is changing or the Publisher entity is being replaced.

The lack of navigation, though, is an underlying problem in the schema. An Article can have a Section, but that’s just a text field - it can be isPartOf a CreativeWork, but there really isn’t a structure that would be an appropriate section.

Let’s look at the current(ish) Wirebird incarnation: a hyperlocal journalism site. (Yes, yes, “a blog.”) First, there’s the current structure (inherited from its WordPress days), which could be changed - but if Wirebird is supposed to be a generic site generator (yes, yes, “a CMS”), then it should be able to emulate it.

The site is entirely the publication/blog/CMS, so the base page is the base navigation point and we don’t have the BBC News issue. There are three sections - for residents, visitors, and businesses (the latter being B2B). We want site visitors to be able to go to the section appropriate to them and only see those articles, but the base page shows articles from all sections (currently, in straight-up reverse-chron order). Presently the url structure is completely flat, because (like this site) it used Plerd in the meantime… but really, the urls don’t matter for REST. So the base page (“index”) is a WebSite. Now, the schema does say “Every web page is implicitly assumed to be declared to be of type WebPage, so the various properties about that webpage, such as breadcrumb may be used. We recommend explicit declaration if these properties are specified, but if they are found outside of an itemscope, they will be assumed to be about the page.” That doesn’t necessarily help if I want my JSON objects to stand alone, but wait: you can also add an alternateType. So let’s do that.

That gives us breadcrumb, which gives us upward discovery at least. It also gives us relatedLink and significantLink, which can be (arrays of) straight URLs, not schema objects, but that’s okay. Mostly, anyway. Tentatively, then, the significantLink of the base page is a list of the sections, and also of the miscellaneous WebPages like “about” and “privacy” and such - basically, everything that gets linked from the menu bar in the html version. The relatedLink list is all of the posts, even more tentatively. This gets used to build a Daring Fireball-like archive, but if it’s going to be a permanent part of the index page it’s probably going to involve pagination. Maybe it’ll just be the recent posts, and be the json equivalent of the atom and rss feeds? We’ll see.
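So the tentative index object comes out along these lines - every url is hypothetical, and the WebSite-plus-alternateType layering is the experiment described above, not anything schema.org blesses:

```json
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "alternateType": "WebPage",
  "url": "https://example.com/",
  "significantLink": [
    "https://example.com/residents/",
    "https://example.com/visitors/",
    "https://example.com/business/",
    "https://example.com/about/"
  ],
  "relatedLink": [
    "https://example.com/first-post/",
    "https://example.com/second-post/"
  ]
}
```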

The sections (not Sections, because those aren’t an object) are a little uncertain. If the individual news articles slash blog entries are coded as BlogPosting, then the sections can be Blogs. But Google seems to consider those a little differently than Articles and NewsArticles, as far as things like Google News is concerned. There’s no direct equivalent of a Blog for news sites - there are PublicationIssues and PublicationVolumes, but that doesn’t seem quite right for a continuously-updated news site. The sections could just be WebPage alone, with entries as a significantLink list, but that’s ambiguous: how do we tell the difference between an “about” page and a news section named “about”?

We can define sections as Blogs and articles as both BlogPosting and NewsArticle in alternateType, but the point of a schema starts to get lost if we layer too many types on (even though the differences between the types are negligible at this point). This might require some more consideration.
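For the record, the double-typed version would look something like this - hypothetical urls and names, and this layering is exactly the part that may or may not survive the validators:

```json
{
  "@context": "http://schema.org",
  "@type": "NewsArticle",
  "alternateType": "BlogPosting",
  "headline": "An example entry",
  "url": "https://example.com/an-example-entry/",
  "isPartOf": {
    "@type": "Blog",
    "name": "For Residents",
    "url": "https://example.com/residents/"
  }
}
```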