RSS as HTML

Update (2024-01-28): I've removed support for this on my site's blog feed. To find out why, see my new post explaining why I removed my XSLT feeds template.


Have you seen the RSS Feed for this blog? Turns out the major browsers all support a decent subset of XSLT.

What's that mean though? It means you don't need to format your content as both a page and a feed. Your pages can be RSS feeds and vice versa. Your audience can visit the same URL in their browser as in their feed reader. Even if you aren't replacing existing pages with their feed alternate, you don't have to link browsers to unstyled XML documents (angle brackets panic the users). Instead, you can use all the same styles, scripting, and multimedia from the rest of your pages to hide RSS in plain sight.

RSS Primer

RSS, despite many companies' desire to see it die, remains one of the best ways to keep tabs on what people publish to the web. Thanks to Aaron Swartz, Dave Winer, and dozens of the folks behind a large number of the most popular CMS, a large contingent of web content authors today produce feeds you can subscribe to. Other people have done a better job espousing the virtues of RSS than I will here. What I hope you're more interested in is the guts of how it works in this context.

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>

<title>VE3ZSH - Blog</title>
<link>https://ve3zsh.ca/blog/index.html</link>
<description>Newest posts from VE3ZSH's blog.</description>
<language>en</language>

<item>
	<title>RSS as HTML</title>
	<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
</item>

<item>
	<title>Animated Reel Menu</title>
	<link>https://ve3zsh.ca/blog/2020/04/15/animated-reel-menu.html</link>
</item>

<item>
	<title>How To Secure Application Credentials</title>
	<link>https://ve3zsh.ca/blog/2020/01/24/how-to-secure-application-credentials.html</link>
</item>

</channel>
</rss>

It's almost comically simple if you really know HTML (nesting <div> elements isn't HTML). Start with the XML declaration <?xml …?>, which plays the role a doctype does in HTML. Next, a document root <rss> and body <channel>. Instead of a separate head, you put the channel's metadata in the channel itself along with the <item> elements. Each item consists of a <title> and <link>. That's it!

Sure, you can add more. You can actually add anything you like. That's the extensible part of XML. It's all a matter of what feed readers are looking for. We'll come back to that later.

Format Wars

Some of you might be asking, what about Atom? I think there's some unnamed law of humans and technology that leads to format wars. Problems it solves:

You can use either RSS 2.0 or Atom, it doesn't matter. Just about every reader you can find supports both because they're functionally equivalent. Just different names for the same information.

Wait, RSS 2.0? What happened to earlier versions? There's a number of v0.XX versions as RSS went through changes before v1 was tagged. Some claim this protocol versioning and Mozilla's early non-compliant implementation were reasons Atom had the traction it did. While many feed readers can deal with these older versions, it makes much more sense to just use v2.0.

Another fun fact, RSS 3.0 is a thing. Nobody publishes their feed in it as far as I know. No readers support it as far as I know. Aaron standardized and published the specification for it though. What makes it different?

Then there's JSON Feed. Again, nobody publishes or reads these as far as I know. What are its differences?

You can even go h-feed if you want. It's a part of the Microformats standard and a way of adding a set of class attributes to your existing pages. In theory, feed readers could use your existing pages as a feed. In reality, it seems only to appease the SEO gods. There are a couple of feed readers for this, but they're not clients. Seems the people working on this want you to set up a number of servers for everything from authentication to pub/sub and join what they're calling the IndieWeb. You can read more about the architecture in their page about social readers if that interests you.
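To make the class-attribute idea concrete, here's a minimal sketch of h-feed markup. The h-feed, h-entry, p-name, and u-url class names come from the Microformats vocabulary; the surrounding elements are just one plausible arrangement, not copied from any real page on this site.

```xml
<!-- Sketch only: an existing post list annotated with h-feed classes. -->
<main class="h-feed">
	<h1 class="p-name">VE3ZSH - Blog</h1>
	<article class="h-entry">
		<a class="p-name u-url" href="https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html">RSS as HTML</a>
	</article>
</main>
```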

I'm sure there's a dozen other feed formats that nobody actually reads or writes, specified on some blog, wiki, or webpage. Every single one is the same essential thing: a document that contains a series of URLs where new ones indicate new content.

Theory of Operation

With all of that out of the way, how do you get webpages that work in feed readers? Remember how I said XML can contain anything you want, and that it's up to a reader what it wants to look at? Well, you can include an xml-stylesheet processing instruction right below the <?xml …?> declaration.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.xml" type="text/xsl" media="screen"?>
<rss version="2.0">
<channel>
…

With that, you can link a document containing XSLT. XSLT is just a set of XML elements that an XSLT processor can use to transform an XML document. Feel free to explore the full element list along with the list of XPath functions. Wait, functions? Yeah, XPath and by extension XSLT offer a set of data transformation functions.

XHTML

How does this extra XML document help us? Well, the W3C spent a really long time between HTML version 4 and HTML version 5 specifying and standardizing XHTML. Despite being a complete waste of time, it means you can put HTML directly in XML.

Assuming your HTML isn't garbage, there's usually little required to convert your HTML to XML. The first change from HTML 5 is that every tag must be closed. This means void elements like <br>, which HTML never closes, need to be written as self-closing (<br/>). This goes for all void elements including <link>, <meta>, and <input>. You also can't get away with incorrect nesting (<p>Hello, <em>World!</p></em>) or the magic rules around tags like <p> and <li>, which HTML closes for you automatically when the next element can't be nested inside them.

Likewise, any boolean attributes without a value need to be given their proper value. This includes attributes like hidden, defer, and checked. They all need their own name as the attribute value (hidden="hidden").

Besides that, the only other gotcha is & entities. In XML, the only named entities are the five predefined ones (&lt;, &gt;, &amp;, &quot;, and &apos;). Everything else must use its numbered Unicode code point escape. For example a &nbsp; would be &#160; and &copy; would be &#169;. You can get any code point fairly easily using CyberChef.
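As a quick illustration of the rules above, here's a made-up fragment written the way it would need to appear inside an XML document. The contents are placeholders; the HTML 5 shorthand each piece replaces is noted in the comments.

```xml
<!-- HTML 5 would allow: <p>&copy;&nbsp;2020 VE3ZSH<br> -->
<p>&#169;&#160;2020 VE3ZSH<br/></p>

<!-- HTML 5 would allow: <input type="checkbox" checked hidden> -->
<input type="checkbox" checked="checked" hidden="hidden"/>

<!-- HTML 5 would allow: <ul><li>One<li>Two</ul> -->
<ul>
	<li>One</li>
	<li>Two</li>
</ul>
```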

XSL Templates

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/rss/channel">
<html lang="en">
<head>
	<meta charset="utf-8"/>
	<meta name="viewport" content="width=device-width, initial-scale=1"/>
	<title><xsl:value-of select="title"/></title>
	<script src="script.js" defer="defer"></script>
	<link rel="stylesheet" href="style.css"/>
</head>
<body>

<h1>Recent Blog Posts</h1>

<ul>
<xsl:for-each select="item">
	<li>
		<a href="{link}">
			<xsl:copy-of select="title/node()"/>
		</a>
	</li>
</xsl:for-each>
</ul>

</body>
</html>
</xsl:template>
</xsl:stylesheet>

Now that your HTML is also XML we can define a template using it. We begin with the same <?xml …?> declaration as the feed. The next line is an <xsl:stylesheet> root element that gives us access to the XSL namespace (xmlns) of XSLT using the xsl: element name prefix. Technically this element's full name is the namespace http://www.w3.org/1999/XSL/Transform paired with the local name stylesheet, but the prefix lets us shorten that.

With that out of the way we can define the body of our XSLT which will be an <xsl:template> element. This element has a match attribute which specifies an XPath of the elements in the document the template is applied to. Here we specify /rss/channel as there's nothing we want access to from outside that scope. Inside the template, we put our XHTML. Note that this can include a <head> section with elements like <link rel="stylesheet"> and <script>. Once the XSLT is applied you have all the same HTML features you've come to expect.

One thing we can do in the <head> is use an <xsl:value-of select="title"> to get the value of the RSS <title> from the <channel> and use it as the title of the page. We select title because our template already set the context to /rss/channel and this is actually selecting /rss/channel/title.

Most of the document is boilerplate with a simple <h1> and <ul> list to keep the example simple. The next important element is the <xsl:for-each select="item">. This element loops over every element matching /rss/channel/item because our current context is /rss/channel and it selects item. For each item, it inserts a copy of its own contents (here the <li>) into the output, with that item as the new context.

The <xsl:copy-of select="title/node()"> here is similar to the <xsl:value-of> we saw in the head, except <xsl:copy-of> doesn't give us just the text. Instead it gives us the nodes we select and all their children. We use title/node() to get just the contents of the <title> element rather than the element itself, meaning we don't put a <title> tag on the page. Note our context is /rss/channel/item so it's selecting /rss/channel/item/title/node().

The other new construct is {link}. These curly brackets can be used in XSLT when you are inside an attribute string but want to use an XPath expression to insert a value. Here the XPath is link or /rss/channel/item/link given the context.
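If the curly-bracket shorthand feels too magical, the same anchor can be written out the long way with <xsl:attribute>. This is only a sketch of an equivalent, wrapped in a minimal stylesheet so it stands on its own; the template above doesn't need it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/rss/channel">
<ul>
<xsl:for-each select="item">
	<li>
		<a>
			<!-- Same result as href="{link}" -->
			<xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute>
			<xsl:copy-of select="title/node()"/>
		</a>
	</li>
</xsl:for-each>
</ul>
</xsl:template>
</xsl:stylesheet>
```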

Advanced Cases

We've already covered 99% of what you'd need to write your own. In this section I'll go over some things that you might want to try in your own setup.

Accessing Element Attributes

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">

<title>VE3ZSH - Blog</title>
<link rel="alternate" href="https://ve3zsh.ca/blog/index.html"/>
<updated>2020-08-03T00:00:00-04:00</updated>
<author><name>VE3ZSH</name><uri>https://ve3zsh.ca/</uri></author>
<id>https://ve3zsh.ca/</id>

<entry>
	<title>RSS as HTML</title>
	<link rel="alternate" href="https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html"/>
	<id>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</id>
	<updated>2020-08-03T00:00:00-04:00</updated>
</entry>

</feed>

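If you do go the route of an Atom feed, you'll notice links are found inside attributes instead of the values of <link> elements. To access attributes in XPath, you use the @attr selector. For example, the anchor elements would go from <a href="{link}"> to <a href="{link/@href}">. One catch: Atom puts its elements in the http://www.w3.org/2005/Atom namespace, so the stylesheet has to bind a prefix to that namespace and use it in every path (atom:link/@href, for instance), since unprefixed names in XPath 1.0 only match elements with no namespace.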
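For an Atom feed, a sketch of the whole template might look like the following. It assumes the atom: prefix (any prefix works as long as it's bound to the Atom namespace URI) and that each post is a standard Atom <entry> element. It's trimmed down to the list itself rather than being a drop-in copy of the RSS template.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:atom="http://www.w3.org/2005/Atom"
	exclude-result-prefixes="atom">
<xsl:template match="/atom:feed">
<html lang="en">
<head>
	<title><xsl:value-of select="atom:title"/></title>
</head>
<body>
<ul>
<xsl:for-each select="atom:entry">
	<li>
		<!-- The link target lives in the href attribute, hence @href -->
		<a href="{atom:link/@href}">
			<xsl:value-of select="atom:title"/>
		</a>
	</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
```

The exclude-result-prefixes attribute just keeps the atom: namespace declaration from being copied onto the generated <html> element.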

Descriptions & Other Tags

While all you need is <title> and <link>, I'd also suggest the <description> element. While some feeds don't include content, many people who use feed readers prefer to get the content of the site in the feed itself (not me). To do this, you can add a <description> element to your RSS feed <item> elements. This description technically supports HTML but it's often best to stick to plain text for the widest compatibility.

While adding optional elements that some feed readers deem mandatory, I'd also suggest <pubDate> as there are some readers that won't display your feed at all without it. It's an RFC 822 formatted date (e.g. <pubDate>03 Aug 2020 00:00:00 GMT</pubDate>).
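Put together, an item carrying both optional elements might look like this. The description text is a placeholder written for this example, not what the real feed contains.

```xml
<item>
	<title>RSS as HTML</title>
	<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
	<!-- Placeholder summary; plain text for the widest compatibility -->
	<description>Using XSLT to display an RSS feed as a regular web page.</description>
	<!-- RFC 822 date -->
	<pubDate>03 Aug 2020 00:00:00 GMT</pubDate>
</item>
```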

If you're eager to put images into everyone's feed reader there's the <image> element. It lets you link a GIF, JPEG, or PNG for the channel to be displayed in a feed reader. Be sure to read the spec for it as there are multiple required sub-elements. On a fun note, the specification says the assumed dimensions of your image are 88x31, so feel free to get retro with it.
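As a rough sketch, a channel image could look like the following, placed inside <channel> next to the other metadata. The image URL is hypothetical; the title and link simply mirror the channel's own values.

```xml
<image>
	<!-- Hypothetical image path; url, title, and link are the required parts -->
	<url>https://ve3zsh.ca/images/feed-badge.png</url>
	<title>VE3ZSH - Blog</title>
	<link>https://ve3zsh.ca/blog/index.html</link>
	<!-- Optional; these match the defaults the spec assumes -->
	<width>88</width>
	<height>31</height>
</image>
```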

Sadly, nobody supports the <cloud> element as far as I know, but technically this provides push based RSS. Using an HTTP-POST, XML-RPC, or SOAP based API you can actually run a server implementing the rssCloud API. I wouldn't suggest going through the effort to implement it though given broad lack of support in both feeds and readers. It's interesting to see new standards like ActivityPub and Micropub don't support or even acknowledge this existing standard.

Better Dates

One of my only complaints with RSS is its use of RFC 822 dates. To get better dates, you can go one of two routes: substrings or a custom element.

The substring method would be using an <xsl:value-of> element with a select that includes the substring() function to grab pieces of the date. An example that grabs just the date would be select="substring(pubDate, 1, 11)" which would get you the value 03 Aug 2020. It's not bad, but it's not really the date format I prefer.
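Dropped into the item loop from the earlier template, the substring approach might look like this fragment. It assumes each item has a <pubDate> in the same shape as the example date, with no leading day name.

```xml
<xsl:for-each select="item">
	<li>
		<a href="{link}"><xsl:copy-of select="title/node()"/></a>
		<xsl:text> </xsl:text>
		<!-- Characters 1 through 11 of "03 Aug 2020 00:00:00 GMT" -->
		<xsl:value-of select="substring(pubDate, 1, 11)"/>
	</li>
</xsl:for-each>
```

If your dates include the optional day name (Mon, 03 Aug 2020 …), the offsets shift by five characters, so the call would become substring(pubDate, 6, 11).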

To use an alternative format like ISO 8601 it's simpler to just add a custom element. I use <isoDate> but you can use whatever you prefer. If you're using Atom, don't forget to obey the namespacing rules and put your new element in its own namespace. Fun observation, most feed parsers don't care about XML dogma.
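Here's a sketch of the custom element route for an RSS feed. The date-only value and the <time> wrapper are choices made for this example; any ISO 8601 form you prefer would work the same way.

```xml
<!-- In the feed: a custom element alongside the standard ones -->
<item>
	<title>RSS as HTML</title>
	<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
	<pubDate>03 Aug 2020 00:00:00 GMT</pubDate>
	<isoDate>2020-08-03</isoDate>
</item>

<!-- In the template, inside the xsl:for-each over item -->
<time datetime="{isoDate}"><xsl:value-of select="isoDate"/></time>
```

Feed readers that don't know about <isoDate> should simply skip it and keep using <pubDate>.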

Conclusion

I know it's been a long one. Why share all this? I'd love if more devs working on projects could sneak RSS into the system. These days I just check my feed reader about once a day or so for updates instead of getting sucked into hours of checking and rechecking all the engaging addicting content portals. Humanist technology should focus on improving quality of life and getting out of the way. Too bad the economic model in vogue focuses on trying to do the opposite.