Channel: User Justin Opolony - Stack Overflow

How to detect updates in podcast feeds?


I have a large set of podcast feed URLs which I'm periodically polling to check for updates. I'm struggling to find a robust way to detect whether a feed has changed without producing false positives. I'd like to detect not only new episodes, but also updates to existing episodes.

RSS and Atom feeds provide pubDate, lastBuildDate or updated elements. However, I'm finding that these are frequently misused: many feeds insert the current date and time into these fields on every request, which makes them unreliable for detecting changes.

My next thought was to strip all date information from the podcasts, then MD5 hash the feed contents. I can then compare the feed hashes to detect changes to the feeds.
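A minimal sketch of what I mean, assuming the feed parses as XML. The DATE_TAGS set and the function names are just illustrative choices, not taken from any library:

```python
import hashlib
import xml.etree.ElementTree as ET

# Element names treated as date fields (an assumption; extend as needed).
DATE_TAGS = {"pubDate", "lastBuildDate", "updated", "published"}


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def feed_fingerprint(xml_text):
    """Return the MD5 hex digest of the feed with every date element removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in DATE_TAGS:
                parent.remove(child)
    return hashlib.md5(ET.tostring(root, encoding="utf-8")).hexdigest()
```

Two fetches of the same feed that differ only in those date elements then produce the same fingerprint, while a change to a title or description produces a different one.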

This seems to work for about 90% of the cases. However, there are still hundreds of podcasts that insert dynamic data into their feeds.

One podcast has the following as their podcast cover art:

http://erikglassman.hipcast.com/albumart/1000.1439649026.jpg

Here 1439649026 is, I assume, a timestamp. This second number changes with each request for their feed.
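One workaround I could try for this particular case is to mask anything that looks like a Unix timestamp before hashing. This is only a heuristic sketch: the 10-digit rule and the function names are my own assumptions, and it would also hide a genuine change that happens to be a 10-digit number.

```python
import hashlib
import re

# Heuristic (an assumption): a run of exactly 10 digits, not adjacent to
# other digits, is treated as a Unix timestamp in seconds.
TIMESTAMP_RUN = re.compile(r"(?<!\d)\d{10}(?!\d)")


def mask_timestamps(text):
    """Replace timestamp-like digit runs with a fixed placeholder."""
    return TIMESTAMP_RUN.sub("<ts>", text)


def masked_hash(text):
    """MD5 hex digest of the text after masking timestamp-like digit runs."""
    return hashlib.md5(mask_timestamps(text).encode("utf-8")).hexdigest()


url = "http://erikglassman.hipcast.com/albumart/1000.1439649026.jpg"
print(mask_timestamps(url))
# prints "http://erikglassman.hipcast.com/albumart/1000.<ts>.jpg"
```

This handles the cover art URL above, but it does nothing for other kinds of dynamic data, so it doesn't solve the general problem.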

This is starting to seem like a losing battle. If I can't reliably trust the date fields of a podcast feed, and if some percentage of podcasts insert dynamic data into their feed text, how can I reliably detect changes to a feed in a robust way?

