Scrape and Feed with Python


How do you scrape the information on a page and serve it as an RSS feed?

This Python tutorial explains how to scrape the information published on a web page and transform it into a smart and clean RSS feed.

Let’s begin with this trial page I’ve built. What we want to do is transform all its paragraphs, titles and dates into a valid RSS feed.

In order to do so I will use two libraries: lxml for the scraping and Yattag for generating the XML code of the feed.
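
To give an idea of the scraping half, here is a minimal sketch with lxml. The markup in SAMPLE_PAGE is a made-up stand-in for the trial page, and the class names "post" and "date" as well as the helper scrape_items are assumptions for illustration; the XPath expressions would need adjusting to the real page.

```python
from lxml import html

# Hypothetical markup standing in for the trial page; the real page may differ.
SAMPLE_PAGE = """
<html><body>
  <div class="post">
    <h2>First title</h2>
    <span class="date">2015-03-01</span>
    <p>First paragraph.</p>
  </div>
  <div class="post">
    <h2>Second title</h2>
    <span class="date">2015-03-02</span>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""


def scrape_items(page_source):
    """Return one dict per post, holding its title, date and text."""
    tree = html.fromstring(page_source)
    items = []
    # Assumed structure: every entry lives in a <div class="post">.
    for post in tree.xpath('//div[@class="post"]'):
        items.append({
            # string(...) collects the text content of the first matching node.
            "title": post.xpath('string(./h2)').strip(),
            "date": post.xpath('string(./span[@class="date"])').strip(),
            "text": post.xpath('string(./p)').strip(),
        })
    return items


items = scrape_items(SAMPLE_PAGE)
for item in items:
    print(item["date"], item["title"])
```

For a live page, the source would first have to be downloaded (for instance with urllib.request) and then passed to scrape_items in the same way.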

Here’s the full code; I go through it in the comments. You can also find it in a GitHub gist.

The internet is the most incredible infrastructure ever built. Human beings learn through dialogues. Open sourcing your ideas means enhancing them. I keep my brain alive reading, writing and coding. Economics student, I'm (slowly) teaching myself Data Science. Hacktivism with folks.

