I subscribe to both the official and unofficial Python planet feeds because each has some content the other lacks, but the overlap and duplication have really been bugging me lately. So I suddenly remembered the joy that is Yahoo Pipes and glued both of them together with a unique-ifying filter to eliminate duplicates, and, feeling generous, I stuck a friendlier Feedburner URL around it. You can get it at http://feeds.feedburner.com/UnifiedPythonPlanet.
It makes me pretty happy so far, except that somewhere in Pipes land I seem to be losing newlines in pre elements, which appears to be a bug in Pipes itself (argh) and is a little (very) annoying when trying to read people’s code samples. I may end up replacing the Pipes side with a little bit of Python (a little feedparser and PyRSS2Gen ought to do the trick) if I get the time later today.
Anyway. I figured I should share it, on the off chance it makes anyone else’s life better. (Or if such a creature already exists, let me know and I’ll gladly use that instead.)
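As a rough sketch of what the de-duplicating half of a Python replacement might look like (not the actual Pipes setup), here is the merge step on its own. It assumes each entry is a dict-like object with optional `id` and `link` keys, which is roughly what feedparser hands back; the `entry_key` and `merge_unique` names and the sample entries are made up for illustration.

```python
def entry_key(entry):
    """Pick an identity for an entry: its id if present, else its link."""
    return entry.get("id") or entry.get("link")


def merge_unique(*entry_lists):
    """Concatenate several lists of entries, keeping the first copy of each.

    Entries with neither an id nor a link cannot be compared, so they are
    always kept.
    """
    seen = set()
    merged = []
    for entries in entry_lists:
        for entry in entries:
            key = entry_key(entry)
            if key is not None and key in seen:
                continue  # duplicate of something already emitted
            if key is not None:
                seen.add(key)
            merged.append(entry)
    return merged


# hypothetical sample data standing in for the two planet feeds
official = [
    {"id": "a", "link": "http://example.com/a", "title": "Post A"},
    {"id": "b", "link": "http://example.com/b", "title": "Post B"},
]
unofficial = [
    {"id": "b", "link": "http://example.com/b", "title": "Post B"},
    {"link": "http://example.com/c", "title": "Post C"},
]

unified = merge_unique(official, unofficial)
print([e["title"] for e in unified])
```

With feedparser, the input lists would presumably be the `entries` of each parsed feed, and the merged list could then be turned into PyRSS2Gen items.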
geekery, python, rss
feedburner, pipes, planet, python
Tonight was another fun-filled episode of Clepy, our little Python group that could. I presented “Fun With RSS”, a quickie overview of RSS, feedparser, and PyRSS2Gen, plus a little bit of fun combining the two libraries. David Stanek introduced us to IPython, the Python interactive shell on steroids.
We had some nice discussion of some miscellaneous things as well, including PyCon and why you should go (registration is now open!).
We may consider adopting feedparser, as its author, Mark Pilgrim, doesn’t appear to be maintaining it any more; I’ve dropped an email in his direction, but given his stated desire to find a hobby that doesn’t involve computing, who knows if I’ll get a response? (Okay, so a bit of googling about shows recent evidence of his continued existence, so maybe I’ll try some additional addresses before shouting “fork!” too loudly.)
We’re also planning on replacing the Plone installation of clepy.org with a shiny TurboGears-powered setup. This would at least give us an opportunity to contribute back to TG (through fixes, donated functionality, and general advocacy), as well as build some programming camaraderie through sprints (in person or online) to implement the features we want. Plus it might actually spur some mailing list activity during the weeks between meetings.
clepy, geekery, python, rss, turbogears
I set aside the time tonight to get down and funky with the feedparser.py bug about it discarding unknown elements. I poked, prodded, sliced, diced, got my hands dirty, figured out what the problem was, and fixed it. Huzzah! I had just gotten done submitting my patch back to feedparser’s SourceForge site when I saw that someone else had already filed a nearly identical patch at the beginning of September. (It’s amusing exactly how similar it is; great minds think alike, I guess?) On one hand, I wish I had seen Adam’s patch right away and not had to fight with it myself, but on the other hand it was a Valuable Learning Experience, the kind of thing that Builds Character. Plus, given how long some of the patches have been sitting around without anyone looking at them, I think it’ll be at least six months before I actually manage to accrue any public embarrassment for my duplicate solution to the problem… I just hope one of the patches gets rolled into the next release, because it looks like while this is a cool project, it is desperately in need of some cleanup and love.
The practical upshot of this is that my patched feedparser.py is no longer discarding the “mood” and “music” information from LJ feeds, so I am now feeding pirnat.com with the same level of syndicated goodness that it had before I switched over to feedparser from my home-brew RSS parser.
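For anyone wondering what “unknown elements” means in practice, here is a small stdlib-only sketch (not the feedparser patch itself) that pulls namespaced extension elements out of RSS items with ElementTree. The sample XML is hand-written, and the namespace URI and the `lj:mood` / `lj:music` element names are assumptions about what a LiveJournal feed carries, so check a real feed before relying on them. The `extension_fields` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI and element names are illustrative guesses at
# what a LiveJournal feed carries; the sample document is hand-written.
LJ_NS = "http://www.livejournal.org/rss/lj/1.0/"

SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:lj="http://www.livejournal.org/rss/lj/1.0/">
  <channel>
    <title>Sample</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <lj:mood>geeky</lj:mood>
      <lj:music>Pi soundtrack</lj:music>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def extension_fields(item, namespace):
    """Return {local_name: text} for every child of item in namespace."""
    prefix = "{%s}" % namespace  # ElementTree spells tags as {uri}local
    return {
        child.tag[len(prefix):]: (child.text or "").strip()
        for child in item
        if child.tag.startswith(prefix)
    }


root = ET.fromstring(SAMPLE)
extras = [extension_fields(item, LJ_NS) for item in root.iter("item")]
print(extras)
```

Items without any extension elements simply come back as an empty dict, so downstream rendering code can treat mood and music as optional.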
Additionally, I was able to get the “filtered feeds” working and automatically publishing to the website as well. I need to clean up the site a bit and link to them somewhere, but that’s luckily the easy part. For a sample of what I mean, check out:
Now, maybe one of these days, I’ll have something worthwhile to say!
- Mood: slightly annoyed with myself
- Music: Orbital – “Halcyon (Live)”
geekery, livejournal, python, rss
My first dabbling with feedparser and PyRSS2Gen went so well that I’ve finally worked up the gumption to rewrite the code that pulls my RSS feed from LiveJournal, renders it to static HTML, and publishes it to where pirnat.com is hosted. Minus one annoying bug in feedparser (it discards unknown elements, so right now my LJ-specific data like “mood” and “music” isn’t making it through), it’s going pretty smoothly so far. I just switched my crontab over to using the new setup, so that means I am happy enough with it to use it.
Liz has to do the wine sales thing tomorrow, so I hope to get some more accomplished. At the very least I want to hook the feed filtering into the little cron script so that I can start producing topic-specific RSS feeds. Once that works, I will of course have to shake my website update wand and actually link to the things (in case anyone out there ever gives a crap). If all that gets done, then it’s time to dig into the feedparser bug and see if I can figure out how to make it behave a little better with unknown elements.
Whee!
- Mood: geeky
- Music: Pi soundtrack
geekery, livejournal, liz, python, rss
I should have looked into this stuff a while ago, because it’s entirely too handy and, in a word, pythonic. I will probably rewrite my crufty home-brew LiveJournal RSS parser using Mark Pilgrim’s excellent Universal Feed Parser. Additionally, since I’m not sure if Planet allows easy filtering based on category, and I might like my occasional TurboGears rambling to show up there, I think I will combine feedparser with PyRSS2Gen to build a simple little program to pull, filter, and re-emit RSS feeds.
Of course, knowing my luck, something like that already exists, but at least it’ll be fun.
Basically, it would be something like this:
import datetime

import feedparser
import PyRSS2Gen


def some_criteria(entry):
    # placeholder filter: keeps every entry until a real test is swapped in
    return True


# get the data
d = feedparser.parse('http://exilejedi.livejournal.com/data/rss/')

# do the filtering & build a list of RSSItem objects
items = [
    PyRSS2Gen.RSSItem(
        title=x.title,
        link=x.link,
        description=x.summary,
        guid=x.link,
        # modified_parsed is a time tuple; its first six fields are
        # year, month, day, hour, minute, second
        pubDate=datetime.datetime(*x.modified_parsed[:6]))
    for x in d.entries if some_criteria(x)]

# make the RSS2 object
rss = PyRSS2Gen.RSS2(
    title=d.feed.title,
    link=d.feed.link,
    description="ExileJedi's Filtered RSS Feed",
    lastBuildDate=datetime.datetime.now(),
    items=items)

# emit the feed
xml = rss.to_xml()  # or perhaps do rss.write_xml(some_file)
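One hedged guess at what `some_criteria` could look like for category-based filtering: keep an entry when any of its tags matches a wanted set. This assumes entries expose a `tags` list of dict-like objects with a `term` key, which is how feedparser is understood to present categories; `make_tag_filter` and the sample entries are hypothetical, and plain dicts stand in for real feed entries so the sketch runs without any third-party modules.

```python
def make_tag_filter(wanted):
    """Build a predicate that keeps entries carrying any tag in wanted."""
    wanted = {t.lower() for t in wanted}

    def criteria(entry):
        tags = entry.get("tags") or []  # untagged entries have no tags key
        terms = {(tag.get("term") or "").lower() for tag in tags}
        return bool(terms & wanted)

    return criteria


# hypothetical entries shaped like parsed feed entries
entries = [
    {"title": "TG widgets", "tags": [{"term": "TurboGears"}, {"term": "python"}]},
    {"title": "Wine sale", "tags": [{"term": "liz"}]},
    {"title": "Untagged"},
]

some_criteria = make_tag_filter(["turbogears"])
kept = [e["title"] for e in entries if some_criteria(e)]
print(kept)
```

Matching is case-insensitive so that "TurboGears" and "turbogears" land in the same filtered feed; building one filter per topic would give one output feed per topic.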
geekery, livejournal, python, rss, turbogears