The RSS feeds at Pitchfork are pretty broken. The links and guids (which are identical) cycle through a list of servers (www.pitchforkmedia.com, webapp.pitchforkmedia.com:9010, 62.42.344a.static.theplanet.com:9014, etc.). Most of the links don’t work when you try to follow them, and the items keep showing up as new in feed readers because the guid changes whenever the server does. Also, links to features are broken: they point to /features/id instead of /feature/id.

I wrote a simple script to fix these problems. It also makes the feed valid by changing the author element to dc:creator (in RSS 2.0, author has to be an email address), stripping HTML from the title element, and serving the feed with a proper content type (application/rss+xml).

For those who are interested, here’s the code:

#!/usr/bin/env ruby
require 'open-uri'
require 'rss/2.0'
require 'rss/dublincore'
require 'dublincore-rss2'

r = Apache.request
r.content_type = 'application/rss+xml; charset=utf-8'
r.send_http_header
exit(Apache::OK) if r.header_only?

DOMAIN = 'pitchforkmedia.com'
RSS_BASE = "http://#{DOMAIN}/rss/"
# The first word in the request path picks the feed; default to 'today'.
section = r.path_info[/\w+/].to_s
section = 'today' if section.empty?
section.untaint

begin
  feed = open(RSS_BASE + section).read
rescue OpenURI::HTTPError
  exit(Apache::HTTP_NOT_FOUND)
end

rss = RSS::Parser.parse(feed, false)
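rss.items.each do |item|
  # Pin every link to one canonical host and port so the guid stays stable.
  uri = URI.parse(item.guid.content)
  uri.host = DOMAIN
  uri.port = 80
  # Feature links are published under /features/ but live under /feature/.
  uri.path = uri.path.sub(%r{/features/}, '/feature/')
  item.guid.content = uri.to_s
  item.link = uri.to_s
  # author must be an email address, so move the name to dc:creator.
  item.dc_creator = item.author
  item.author = nil
  # Remove any HTML tags from the title.
  item.title.gsub!(%r{</?.*?>},'')
end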

r.puts rss
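
If you want to try the rewriting without Apache and mod_ruby, the two string fixes from the loop can be pulled out on their own. This is only a rough sketch: normalize_link and strip_tags are helper names I made up for illustration (they are not in the script above), and the item id and title are invented sample values.

```ruby
require 'uri'

# Hypothetical helper: point a feed link at the canonical host and default
# port, and fix the /features/ path.
def normalize_link(link, domain = 'pitchforkmedia.com')
  uri = URI.parse(link)
  uri.host = domain
  uri.port = 80
  uri.path = uri.path.sub(%r{\A/features/}, '/feature/')
  uri.to_s
end

# Hypothetical helper: drop anything that looks like an HTML tag.
def strip_tags(title)
  title.gsub(%r{</?.*?>}, '')
end

# Sample values are made up; 1234 is not a real item id.
puts normalize_link('http://webapp.pitchforkmedia.com:9010/features/1234')
# prints http://pitchforkmedia.com/feature/1234
puts strip_tags('Album Review: <i>Some Record</i>')
# prints Album Review: Some Record
```

Because port 80 is the default for http, it is left out when the URI is turned back into a string, so every server variant of a link collapses to the same guid.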

dublincore-rss2.rb:

# mix dublincore into rss 2.0
# http://www.cozmixng.org/repos/rss/trunk/lib/rss/dublincore/2.0.rb
module RSS
  Rss.install_ns(DC_PREFIX, DC_URI)

  class Rss
    class Channel
      include DublinCoreModel
      class Item; include DublinCoreModel; end
    end
  end
end
11 November 2006