Python Tutorial: How to Parse and Combine RSS News headlines using feedparser
This Python code takes a list of RSS news feed URLs, fetches each feed, and combines all the news headlines into one list. It requires the feedparser package, which you can install with the following command.
pip install feedparser
import feedparser
# Function to fetch the rss feed and return the parsed RSS
def parseRSS( rss_url ):
    return feedparser.parse( rss_url )
# Function grabs the rss feed headlines (titles) and returns them as a list
def getHeadlines( rss_url ):
    headlines = []
    feed = parseRSS( rss_url )
    for newsitem in feed['items']:
        headlines.append(newsitem['title'])
    return headlines
# A list to hold all headlines
allheadlines = []
# List of RSS feeds that we will fetch and combine
newsurls = {
    'apnews': 'http://hosted2.ap.org/atom/APDEFAULT/3d281c11a96b4ad082fe88aa0db04305',
    'googlenews': 'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    'yahoonews': 'http://news.yahoo.com/rss/'
}
# Iterate over the feed urls
for key,url in newsurls.items():
    # Call getHeadlines() and combine the returned headlines with allheadlines
    allheadlines.extend( getHeadlines( url ) )
# Iterate over the allheadlines list and print each headline
for hl in allheadlines:
    print(hl)
# end of code
Example Output
Trump lashes out at Schiff over Russia probe memo
Former sports doctor sentenced to 40 to 125 years in prison
AP Exclusive: 2015 letter belies pope's claim of ignorance
Foles outduels Brady to give Eagles their first Super Bowl
Philadelphia cleaning up after some celebrations turn unruly
How the Eagles beat the Patriots at own game, and why the upset was historic
Who is Adam Schiff? Top Democrat Earns Nickname 'Little' From President Trump
New bipartisan immigration plan to be introduced in the Senate
Martin Luther King Jr. Commercial for Ram Trucks Is Swiftly Criticized
Justin Timberlake Halftime Selfie Kid Speaks: 'I Just Went for It'
Super Bowl anti-terrorism documents found on plane
Larry Nassar sentenced to 40 to 125 years in Eaton County
The Kentucky State Police's poorly received Super Bowl joke about jail rape
The first 'Solo: A Star Wars Story' trailer is here. But is it any good?
Sole Surviving Suspect in Paris Attacks Stands Trial in Belgium
Esmond Bradley Martin: Ivory investigator killed in Kenya
This is the fourth fatal crash involving an Amtrak train in two months
Everything you need to know about the Falcon Heavy launch
How the Ending of The Cloverfield Paradox Relates to the Other Cloverfield Movies
Rain put a damper on Super Bowl Sunday
These are the 3 best Amazon deals that you can get right now
Broadcom sweetens its bid for Qualcomm, calling it the 'best and final' offer
Israeli man stabbed to death at West Bank settlement
Autistic UK Man Accused of Hacking FBI Wins Appeal Against Extradition to US
Review: 'This Is Us' (thankfully) resists overdoing Jack's death
White House Plans To Withdraw 'Conspiracy Theorist And Anti-Science Extremist’ Pick
Hype over Republican Memo leads to calls for Justice Department firing
A Suspected Serial Killer May Have Targeted Toronto's Gay Village For Years
Oil tanker with 22 Indian crew missing in Gulf of Guinea since Friday
We're Never Going To 'Win' In Afghanistan
Lasers on planes used to reveal massive massive complex of Mayan ruins
Egypt says 4,400-year-old tomb discovered outside Cairo
Nassar Reportedly Abused Over 2 Dozen Girls And Women During Sluggish FBI Investigation
Alaskan copper mine proposal sparks concern about wild salmon fishery
Donald Trump will trigger constitutional crisis if he uses declassified memo to end Russia probe, Democrats warn
North Korea makes $200m by 'flouting' sanctions
Police release 'kidnapped' priest in DR Congo
Court hands Vietnam oil official another life sentence for corruption
Paul Ryan: Secretary Getting $1.50 More A Week Shows Effect Of GOP Tax Cuts
An Actual Nazi Is About To Get The Republican Nomination For A Congressional Seat
Hawaii man says he's devastated about sending missile alert
Concussions and Protests: Football's popularity drops
Man Charged With Selling Armor-Piercing Bullets to Las Vegas Shooter Stephen Paddock
Years Of U.S. Government Lies Could Soon Result In A Kurdish Massacre
Mexico: 300 migrants found in dangerously cramped trucks
Not just boy and girl; more teens identify as transgender
Congo rebel leader extradited from Tanzania to face trial
Father Of Otto Warmbier Will Attend Winter Olympics In South Korea: Report
Europe must brake mounting nuclear arms race: Germany
Leon Panetta on fallout from the Nunes memo
This Kid Looking At His Phone Is The Super Bowl's Best Meme
Larry Nassar: Thousands raised for father who charged at paedophile doctor jailed for abusing his gymnast daughters
Dreamers Need More Cities And States Ready To Defy Trump
Warren Buffett on hand as Navy commissions newest warship
A Controversial Bill Would Allow Chemical Castration of Sex Offenders in Oklahoma
Israel legalizes West Bank outpost after settler killed
Uma Thurman Says Harvey Weinstein Assaulted Her
Thousands of Greeks protest against Macedonia name compromise
Syrian rebels down Russian plane, kill pilot
These Pups Rescued From Puerto Rico Are Part Of This Year's Puppy Bowl
Woman who cut baby from pregnant neighbour's womb weeps as she apologises in court: 'There is no excuse. There is nothing'
Mike Pence to stop North Korea 'hijacking' Winter Olympics, aide says
Man dies after rescuing 9-year-old son from aqueduct in Hesperia
School system's appeals process leaves some minorities out
Illinois GOP Rep. Under Fire For Ad Mocking Transgender Community, Feminists
After An NFL Season Defined By Black Protest, The Super Bowl Sticks To Sports
U.S. forces begin reducing numbers in Iraq: Iraqi spokesman
Poland's top politician: Holocaust bill is 'misunderstood'
Bodies of around 20 migrants recovered from sea: Spanish official
Suspect arrested in Italy after gun attack on foreigners
Madeleine Albright Would Give Devin Nunes An 'F' For Memo Stunt
I tried living like Tom Brady for a week
Survey: Most residents in struggling US areas respect police
In Super Bowl ads, a play for values and a contentious MLK message
Indianapolis Colts' Edwin Jackson Killed By Suspected Drunk Driver
I hope you find this Python example useful and educational. You are free to use the above code however you see fit. I do, however, suggest that you implement some type of RSS feed caching, as some services may block your IP address for excessive requests.
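One way to follow the caching suggestion is a small in-memory cache keyed by feed URL. This is only a minimal sketch: cached_fetch, CACHE_SECONDS, and the fetch and now parameters are hypothetical names chosen for illustration and are not part of feedparser. The fetch function is passed in, so the same helper can wrap feedparser.parse or any other fetcher.

```python
import time

CACHE_SECONDS = 300  # assumed refresh interval; tune it to how often the feed updates
_cache = {}          # maps rss_url -> (time_fetched, parsed_result)

def cached_fetch(rss_url, fetch, now=time.time):
    """Return the cached result for rss_url, calling fetch() only when it is stale."""
    entry = _cache.get(rss_url)
    if entry is not None and now() - entry[0] < CACHE_SECONDS:
        return entry[1]  # still fresh, so no network request is made
    result = fetch(rss_url)
    _cache[rss_url] = (now(), result)
    return result
```

Inside parseRSS() the call would then become cached_fetch(rss_url, feedparser.parse). The cache lives only as long as the Python process, so a script that runs from cron would need to store results on disk instead. feedparser.parse() also accepts etag and modified arguments for conditional HTTP requests, which may be worth a look for feeds that support them.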
23 Replies to “Python Tutorial: How to Parse and Combine RSS News headlines using feedparser”
Thanks for the code, just what I was looking for.
Hi, this is great. How can I now write the output to a CSV?
with open("myfile.csv", 'w') as f:
    for hl in allheadlines:
        print(hl)
        f.write(hl + "\n")
Get the indents right. I'd also change csv to txt, as some headlines have commas in them.
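If a real CSV file is wanted despite the commas, Python's standard csv module quotes such fields automatically. The following is a minimal sketch under that assumption; write_headlines_csv and the "headline" column name are hypothetical, and the commented usage reuses the myfile.csv name from the snippet above.

```python
import csv

def write_headlines_csv(headlines, fileobj):
    """Write one headline per row; csv.writer quotes any field containing a comma."""
    writer = csv.writer(fileobj)
    writer.writerow(["headline"])  # header row (column name is an assumption)
    for hl in headlines:
        writer.writerow([hl])

# Example usage with the allheadlines list from the tutorial:
# with open("myfile.csv", "w", newline="", encoding="utf-8") as f:
#     write_headlines_csv(allheadlines, f)
```

Opening the file with newline="" is what the csv module documentation recommends, so that row endings are not translated twice on Windows.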
Mate, you are a CHAMPION! Cheers
Is there a way to include the urls for each post as well? Last question!
thank you for kickstarting my project with this crucial piece of code!
Thanks a lot for the script. Can you assist in making the headlines live links?
Line 13, the headlines.append() call in getHeadlines(), only appends the title to the list. Append the entire newsitem instead of newsitem["title"].
Then, in the final loop that prints each headline (around lines 34-35), you would output the HTML link using hl["title"] and hl["link"].
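The two steps described in this reply could look roughly like the sketch below. It assumes getHeadlines() has been changed to append the whole newsitem, so each hl is a dict-like object with "title" and "link" keys. headline_link is a hypothetical helper name, and the sample item is made up for illustration.

```python
import html

def headline_link(newsitem):
    """Build an HTML anchor whose visible text is the headline title."""
    title = html.escape(newsitem["title"])
    link = html.escape(newsitem["link"], quote=True)
    return '<a href="{}">{}</a>'.format(link, title)

# Hand-made item for illustration; real items would come from feed['items']
sample = {"title": "Rain & shine", "link": "http://example.com/a?x=1&y=2"}
print(headline_link(sample))
```

Because only the title is placed between the anchor tags, the headline itself is clickable and the URL is not displayed. Escaping with html.escape() keeps characters such as & or < in a title from breaking the page.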
How would I make the title clickable with no link displayed?
thanks
Thanks. Great Code
I’m glad you found it useful!
Thanks a lot. Is there any way to get image link from this? I’m trying to figure out but no luck. Good job on the code.
You can get at the newsitem's image (if it has one) inside the getHeadlines() function. Since newsfeed items don't always have images, look the image up in a way that does not raise a KeyError when it is missing.
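A minimal sketch of such a lookup is shown here. It is not the original snippet from this reply. get_image_url is a hypothetical helper, and the field names are assumptions about what feedparser typically exposes: media_thumbnail and media_content as lists of dicts with a "url" key, and enclosures as a list of dicts with "href" and "type" keys. Which of these a given feed fills in varies, so check the parsed items of your own feeds.

```python
def get_image_url(newsitem):
    """Return the first image URL found on a feed item, or None if there is none."""
    # Media RSS fields (assumed to be lists of dicts carrying a 'url' key)
    for key in ("media_thumbnail", "media_content"):
        for media in newsitem.get(key, []):
            if media.get("url"):
                return media["url"]
    # Fall back to enclosures whose MIME type marks them as images
    for enc in newsitem.get("enclosures", []):
        if (enc.get("type") or "").startswith("image/") and enc.get("href"):
            return enc["href"]
    return None
```

Every lookup goes through .get() with a default, so an item without any image simply yields None instead of raising a KeyError. Note that media_content can also point at video or audio, so a stricter version might check its type as well.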
Thank you, you are the best :)
Thanks for the code. Is there any way to extract the time and date for the feed items, limit the results to the last 5 minutes instead of all the news on the list, and then write them to a txt file?
Thanks a lot for the code. Is there a way to make the headlines to be clickable links?
Is there any way to weigh a headline by how much it is trending?
No, not really, but I would suggest you check out the Python module newspaper: http://newspaper.readthedocs.io/en/latest/
Is there a way to make this code show the latest headlines in a slideshow format that will shuffle through the newest headlines?
Is there a way to show the publication date for every news item?
How would you deal with duplicate titles?
Thank you for the great work! So lucky I found this article. However, how can I show the original link next to the headline? Just can’t figure it out…
See previous comments and replies above about modifying Line 13.