Identification

If this entry is about subscribing to a particular feed (that is, a particular place-to-write stuff, be it an actual Atom/RSS feed, a blog, a social-media-network account, or just a web page), then this one's about subscribing to a particular person. Or at least identifying a particular person across multiple feeds, and letting the user decide which bits to subscribe to. I haven't written any code for this yet, but it's a little more complicated. A feed is really just a changelog, while an identity is a little fuzzier.

Let's use my tilde as an example. My hypothetical Spice user hits http://tilde.club/~silver/ and says "Whoa, this person is fascinating, I want to follow them." The index.html is easily parsed to come up with a feed, but how can we automatically identify the person behind it?

First of all, there's an atom.xml referenced in the headers and, when I haven't borked its formatting, there are author blocks on every entry. So I know this Karen Cravens wrote this stuff, and that her email address is silver@phoenyx.net. Cool.
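A minimal sketch of that first step, assuming the feed is well-formed Atom and the author blocks use the standard name and email elements. The function name and the sample feed are made up for illustration; only the name and address come from the text above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def extract_authors(atom_xml):
    """Return the set of (name, email) pairs found in any <author> block."""
    root = ET.fromstring(atom_xml)
    authors = set()
    # iter() walks the whole tree, so feed-level and entry-level authors both count.
    for author in root.iter(ATOM + "author"):
        name = author.findtext("atom:name", default="", namespaces=ATOM_NS).strip()
        email = author.findtext("atom:email", default="", namespaces=ATOM_NS).strip()
        if name or email:
            authors.add((name, email))
    return authors


# Hypothetical sample feed, standing in for the real atom.xml.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>silver</title>
  <entry>
    <title>Identification</title>
    <author>
      <name>Karen Cravens</name>
      <email>silver@phoenyx.net</email>
    </author>
  </entry>
</feed>"""

authors = extract_authors(SAMPLE_FEED)
```

If the feed's formatting is borked, ET.fromstring raises ParseError, so a real version would want to catch that and fall back to the other clues.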

The first thing a human sees in http://tilde.club/~silver/ is "I am Karen Cravens" which links to a page where I further identify myself. Probably PS should look at me.html, about.html, bio.html and other likely candidates. The fact that that link is an exact match for an author from the feed is probably also a clue we should pay attention to. Tilde accounts aren't as ubiquitous as they used to be, but we should still consider silver@tilde.club as a possibility.
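Those two guesses are easy to sketch. This assumes the home URL has the usual /~username/ shape; the function names are hypothetical, and the candidate list is just the three filenames mentioned above.

```python
from urllib.parse import urljoin, urlparse

# The "likely candidates" named in the text; a real list would be longer.
CANDIDATE_PAGES = ("me.html", "about.html", "bio.html")


def candidate_profile_urls(home_url):
    """Build the about-me page URLs worth trying under a home page."""
    base = home_url if home_url.endswith("/") else home_url + "/"
    return [urljoin(base, page) for page in CANDIDATE_PAGES]


def guess_tilde_email(home_url):
    """Guess user@host from a /~user/ URL, or return None if it isn't one."""
    parsed = urlparse(home_url)
    first_segment = parsed.path.strip("/").split("/")[0]
    if first_segment.startswith("~") and len(first_segment) > 1:
        return first_segment[1:] + "@" + parsed.hostname
    return None
```

Both results are only candidates: the pages may 404 and the address may not accept mail, so they would still need checking before being attached to anyone.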

That said, parsing http://tilde.club/~silver/me.html nets us a link to a Twitter account, a GitHub account, that silver@phoenyx.net email again, and a couple of other bits. PS knows what Twitter is, so it pulls that account info up and sees that the name field matches, so that's definitely this Karen person. Same with GitHub. If the accounts didn't match, we'd let the user decide. From Twitter, we see that Karen is from Wichita, Kansas (we can also make an educated guess on that from the "I am from" text in me.html). The Twitter webpage info points back to http://tilde.club/~silver/, so nothing new there. Likewise, GitHub has the same email and webpage. We can pull avatars from both those sites, which in this case happen to be the same picture. We can also check http://phoenyx.net/silver and http://phoenyx.net/~silver, but at the moment both of those 404. (Though Real Soon Now I'm going to make phoenyx.net a tilde site.)
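The "name field matches" check could look something like this. It is a sketch under the assumption that an exact match after tidying up case, accents, and whitespace is good enough; anything looser (nicknames, initials) is left to the user. The function names are hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # split() with no arguments also swallows non-breaking spaces.
    return " ".join(without_accents.casefold().split())


def attach_or_ask(known_name, account_name):
    """Attach the account on a clean name match, otherwise ask the user."""
    if normalize_name(known_name) == normalize_name(account_name):
        return "attach"
    return "ask_user"
```

So a profile named "karen  CRAVENS" attaches without fuss, while one named "Karen Coleman" goes to the user to sort out.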

But wait, phoenyx.net's whois record matches Cravens. A quick look at its homepage shows a feed, which is a stupid WordPress one that doesn't include much in the way of author blocks. Looking at the RSS version, we find the author kareninwichita and have to go to the page for that, which tells us Name: Karen Cravens so hey, we have a match. If we want to get bold, we can say kareninwichita is a potential alternate username, and find things like my local Twitter account http://twitter.com/kareninwichita but also someone else's MySpace account. (We can ask the user if Karen Coleman is an alternate name, but hopefully they'll say no.) Similarly, searching for gamehawk turns up more false hits than real ones.

We can check known social networks for me: there's a Karen Cravens on Facebook who lives in Wichita, Kansas, and whose avatar matches the Twitter @kareninwichita one. That adds my maiden name and spouse's name to the list of things. And my spouse's name matches the rest of that phoenyx.net whois record, so we can add a mailing address (which matches the hometown) to the information.

An identity is starting to shape up. We have a name, some avatars to choose from, a maiden name, a spouse, a hometown, a mailing address, a phone number, an email address or two, a couple of blogs, a Facebook, a couple of tweet streams, and a repo. That's enough for a pretty healthy profile page. Let's go stalk another tildenizen.
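One way to hold all of that is a plain record that partial findings get folded into. This is only a sketch: the class name, field names, and merge rule (first value wins for single-valued fields) are assumptions, not anything PS actually does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Identity:
    names: set = field(default_factory=set)
    emails: set = field(default_factory=set)
    avatars: set = field(default_factory=set)
    sites: set = field(default_factory=set)
    accounts: dict = field(default_factory=dict)  # service -> set of handles
    maiden_name: Optional[str] = None
    spouse: Optional[str] = None
    hometown: Optional[str] = None
    mailing_address: Optional[str] = None
    phone: Optional[str] = None

    def add_account(self, service, handle):
        self.accounts.setdefault(service, set()).add(handle)

    def merge(self, other):
        """Fold another partial identity into this one."""
        for name in ("names", "emails", "avatars", "sites"):
            getattr(self, name).update(getattr(other, name))
        for service, handles in other.accounts.items():
            self.accounts.setdefault(service, set()).update(handles)
        # Single-valued fields: keep what we already have, fill in the blanks.
        for name in ("maiden_name", "spouse", "hometown", "mailing_address", "phone"):
            if getattr(self, name) is None:
                setattr(self, name, getattr(other, name))


# What the tilde page gives us, then what the Twitter lookup adds.
from_tilde = Identity(names={"Karen Cravens"}, emails={"silver@phoenyx.net"},
                      sites={"http://tilde.club/~silver/"})
from_twitter = Identity(names={"Karen Cravens"}, hometown="Wichita, Kansas")
from_twitter.add_account("twitter", "kareninwichita")
from_tilde.merge(from_twitter)
```

Keeping sets for the multi-valued fields means finding the same email or avatar twice costs nothing, which matters when every source points back at every other one.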

Looking at http://tilde.club/~silver/sitemap2.html, and "randomly" picking the most recent update, our candidate is http://tilde.club/~john. Oh good, he has a nice cryptic page. (If you haven't figured it out, you might want to go look at it now, because I'm going to SPOILER it.) No feed, no email links, nothing. We have an image link to a 1x1 pixel, so that's probably not an avatar, and a link to a solitary other page. Which links to yet another page like it, and so on for what, fifteen levels? That's probably more than Spice should follow, but maybe it asks the user and the user says keep going. Eventually, it gets to one that links back to index.html (the symbol is, of course, a tilde on that page, so John is going home) but which also has By <A HREF=http://johnholdun.com>John Holdun</A> on it. A "by" is probably a good keyword to look for, and ~john matches at least the first name, so away we go.
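Spotting that "by" could be sketched with the standard library's HTML parser. The assumption here is deliberately narrow: a byline is a link whose immediately preceding text ends in the word "by". The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BylineFinder(HTMLParser):
    """Collect (name, href) for links that directly follow the word 'by'."""

    def __init__(self):
        super().__init__()
        self.bylines = []
        self._last_text = ""
        self._href = None       # set while we are inside a byline link
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            words = self._last_text.split()
            if href and words and words[-1].lower() == "by":
                self._href = href
                self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        elif data.strip():
            self._last_text = data

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href is not None:
                name = " ".join("".join(self._link_text).split())
                self.bylines.append((name, self._href))
            self._href = None
            self._last_text = ""


def find_bylines(html):
    finder = BylineFinder()
    finder.feed(html)
    finder.close()
    return finder.bylines


def username_matches_name(username, full_name):
    """True if the tilde username equals any word of the name, ignoring case."""
    return username.lower() in (part.lower() for part in full_name.split())
```

Fed the snippet from ~john's last page, find_bylines gives back John Holdun and http://johnholdun.com, and username_matches_name("john", "John Holdun") supplies the second, weaker clue.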

From there we have some lovely links to places PS understands: Instagram (where we get an avatar), Tumblr, Twitter (from which we learn that John is from New York, and see the same avatar), and an email link. There's also a batch of other links there, so let's check them out.
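"Places PS understands" suggests a lookup from hostname to service. A minimal sketch, assuming the handle is either the first path segment or, for hosts like Tumblr, the subdomain; the table and function name are made up, and example.tumblr.com in the usage note is a placeholder rather than anyone's real account.

```python
from urllib.parse import urlparse

# Hypothetical table of services PS "understands".
KNOWN_SERVICES = {
    "twitter.com": "twitter",
    "github.com": "github",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "tumblr.com": "tumblr",
}


def classify_link(href):
    """Return (service, handle) for known links, else ('unknown', href)."""
    parsed = urlparse(href)
    if parsed.scheme == "mailto":
        return ("email", parsed.path)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain, service in KNOWN_SERVICES.items():
        if host == domain:
            # Handle lives in the path: twitter.com/<handle>
            return (service, parsed.path.strip("/").split("/")[0])
        if host.endswith("." + domain):
            # Handle lives in the subdomain: <handle>.tumblr.com
            return (service, host[: -len(domain) - 1])
    return ("unknown", href)
```

With this, http://twitter.com/kareninwichita comes back as a Twitter handle, a placeholder like http://example.tumblr.com/ as a Tumblr one, and http://attentiontoretail.co/ as unknown, which is the cue to go look for bylines and whois records instead.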

The first one is http://attentiontoretail.co/ which, to a human who scrolls to the bottom, is obviously a blog by John. (It says "ATTENTION TO RETAIL is a blog about thinking about shopping. It is published on Mondays by John Holdun.") But can PS figure that out? There's no handy-dandy Atom or RSS, no standard CMS/blogging software obvious (and those generally build Atom or RSS in any event). The only thing we've got is that by <a href='http://johnholdun.com'>John&nbsp;Holdun</a>. Whois on the domain tells us it belongs to John (by name, though not by email address). Bingo! There's a Twitter link that PS can identify as blog-specific (it has this blog as its homepage, plus the bio includes the Twitter account we already know is John's), as well as a Tumblr, Facebook, and email. We attach all these to John too.

On to the next: http://puthtml.com. There's John's Twitter account at the bottom, but no other firm clues, even in whois. To make sure this isn't someone else's page that just happens to link back to John, we should probably ask the user "Is this really John Holdun's page?" User says yes, so we add it. And so on.
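The attach-or-ask judgment in these last few examples can be written down as a small rule. This is a guess at a policy, not a settled one: the signal names and the split into strong and weak clues are assumptions, chosen only so that the three sites above come out the way the text describes.

```python
# Hypothetical signal names. "Strong" clues are hard for a stranger to fake;
# "weak" ones can show up on someone else's page by coincidence.
STRONG_SIGNALS = {"whois_name_match", "feed_author_match", "account_links_back"}
WEAK_SIGNALS = {"byline_match", "username_match", "links_to_known_account"}


def decide(signals):
    """Return 'attach', 'ask_user', or 'ignore' for a candidate page."""
    signals = set(signals)
    strong = signals & STRONG_SIGNALS
    weak = signals & WEAK_SIGNALS
    if strong or len(weak) >= 2:
        return "attach"
    if weak:
        return "ask_user"
    return "ignore"
```

Under this rule, johnholdun.com (byline plus first-name match) and attentiontoretail.co (byline plus whois name) attach on their own, while puthtml.com, with nothing but a link to a known account, gets bounced to the user.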

We can check known social networks for John as well, but even if we don't find any, ~john still ends up with a pretty complete profile page too.


Now, tildenizens are probably not the "best" candidates for this (by which I really mean "worst"). Most of my family members don't have a net presence outside of Facebook, so their Pumpkin Spice profile page is going to be pretty one-dimensional. (And, interestingly, they're even more freaked out about privacy and stalkers and stuff because, I guess, it's better to give all your data to one company than to spread it all over the net? Maybe they have a point. But I once mentioned an AT&T problem on Twitter, and had an AT&T rep ask me to "follow and DM my info." Having been down that road before and not particularly wanting to follow yet another AT&T tech support account, I just @-replied them the voice line the ADSL was attached to. I got an anxious reply telling me to delete the tweet because it was public. My response was something along the lines of "This, from a company that wants me to pay them extra to keep them from delivering my name, phone number and home address to the doorstep of half a million people every year?" The tech didn't have an answer for that. Nor did they fix our problem, come to that.)

Page created: 08 November 2014
