Monday, June 10, 2013

Get RSS for your website using jQuery and PHP

http://www.openlogic.com/wazi/bid/295699/get-rss-for-your-website-using-jquery-and-php


Oscar Wilde once said, "It is a very sad thing that nowadays there is so little useless information." Given the number of RSS feeds now available, it appears things may have changed since his time. Nevertheless, you might want to get, process, and display information from an RSS feed on your site. For just showing a feed, a simple news aggregator is enough, but getting a feed directly from a web page is a thornier problem. In this article we will examine ways to fetch and process a news feed from a web page using AJAX to get the data, and see different ways of processing the resulting XML or JSON code. (If all these abbreviations make you nervous, see Jargon Untangled.)
Jargon untangled
Here's your map out of the alphabet jungle:
  • AJAX (Asynchronous JavaScript and XML) is a technique that allows a web application to communicate with a server in the background, without interrupting the user. An AJAX-enabled site can move data to and from a server with almost no noticeable delays or pauses. In addition to XML, AJAX applications can also exchange JSON, plain text, or any other data format.
  • DHTML (Dynamic HTML) is a group of techniques that allows programmers to dynamically change the look and content of a web page after it has been downloaded from the server, possibly as an effect of data entered by the user or brought forth by using AJAX. Modern highly interactive applications such as Gmail and Google Maps use DHTML extensively.
  • JSON (JavaScript Object Notation) is an alternative format to XML for data representation. Since it's JavaScript-based, it can be processed efficiently by a browser.
  • RSS (Really Simple Syndication) is a name for XML-based formats used for frequently updated items such as news posts or blog entries. An RSS document is usually called a feed or channel.
  • XML (eXtensible Markup Language) is a standard syntax for creating custom structures for representing and sharing data. All RSS feeds use XML.
Before we start, let's set up a test environment. We want to fetch and process a feed from a web page, so let's make a simple page with a text field and a few buttons.
[Figure: An empty form. This (really basic) form lets you pick an RSS URL and get it in four different ways.]
You can enter the URL of a feed in the text field and click on a button to load the feed by different methods, producing a suitable, though minimal, news display with just titles and descriptions.
[Figure: Sample result. A sample result, after processing a URL feed.]
We shall use jQuery to simplify our coding. (I opted to use the latest 1.x release, because from version 2.0 onward jQuery loses compatibility with the older Internet Explorer 6, 7, and 8 browsers.) We shall be using it for DHTML and DOM work (check out the clearAllNews, addNews, and feed functions below), for AJAX (as we'll see in the code called by the buttons), and more; just look for the dollar sign ("$") in the code to find all of its usages. Though we barely scratch the surface of jQuery in this article, the given code should give you a taste of the simplified kind of programming that it allows.
[Listing: the test page, titled "Get RSS feeds", with a "Feed to get:" text field and the buttons that load the feed.]
With this code out of the way, we can focus on actually getting and processing an RSS feed.
The "Same Origin Policy" Problem
Before even thinking about getting a feed, you should take into account the Same Origin Policy (SOP), which throws a big wrench into your programming. The SOP is a security restriction that won't allow a page loaded from one "origin" (the protocol, host, and port trio of its URL) to read or modify data from a different origin. For example, if your web page was loaded from http://your.site.com:80/some/place, the SOP won't let it read https://your.site.com (different protocol), http://other.site.com (different host), or even http://your.site.com:8080 (different port).
SOP is a good idea because it blocks rogue JavaScript from one origin that might attempt to manipulate or examine data from any other origin. Without it, a phisher could lure users to a page that embeds a legitimate site and then monitor everything they do there. With SOP in place, scripts loaded from one origin cannot read or tamper with content that belongs to another.
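To make the protocol/host/port comparison concrete, here is a minimal sketch of the test using the standard URL object, which is available in modern browsers and in Node.js but not in the old Internet Explorer versions mentioned earlier. The sameOrigin helper is a hypothetical name used only for illustration; the browser performs this check internally and does not expose a function like it.

```javascript
// Hypothetical helper (not part of jQuery or of the article's code):
// returns true when both URLs share the same protocol, host, and port.
function sameOrigin(urlA, urlB) {
  // URL.origin has the form "protocol://host[:port]". Default ports
  // (80 for http, 443 for https) are omitted, so "http://your.site.com:80"
  // and "http://your.site.com" yield the same origin.
  return new URL(urlA).origin === new URL(urlB).origin;
}

var page = "http://your.site.com:80/some/place";
console.log(sameOrigin(page, "http://your.site.com/other/page")); // true
console.log(sameOrigin(page, "https://your.site.com"));           // false: different protocol
console.log(sameOrigin(page, "http://other.site.com"));           // false: different host
console.log(sameOrigin(page, "http://your.site.com:8080"));       // false: different port
```

Note that the path plays no part in the comparison: two pages in different directories of the same site still share an origin.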
For developers, SOP can sometimes be a bother. Even if you have valid reasons for getting data from another origin, as we want to do in our feed-fetching page, SOP won't let you. Your request will simply fail.

Doing it by proxy

The first method to get a news feed from a client browser requires a proxy. Since the browser won't allow a web page to get the desired news feed directly (see The "Same Origin Policy" Problem) you have to go a roundabout way, and using a proxy is the time-honored (and, from the point of view of security, the best) technique. The web page can connect to a short, simple script on your own server (the browser won't object to that, since the web page itself came from your server) and that script can take care of getting the news feed and sending it back to your page. All the script has to do is pass back whatever it receives, as you can see below. You pass to it a feed parameter that specifies the desired news feed, and it sends back the feed's contents.

You must make sure that you are actually getting a URL, because otherwise an attacker could ask for a local file name and have its contents handed to the browser. The script therefore returns error 403, plus an appropriate explanation, if an invalid URL is detected, and error 404 for a nonexistent feed.
Testing the proxy is easy. Open a console and use wget or curl to get a feed through it. Quote the address so the shell does not try to interpret the question mark.
> curl "127.0.0.1/rss_wazi/rss_read.php?feed=http://rss.cnn.com/rss/edition_technology.rss"



CNN.com - Technology
http://www.cnn.com/TECH/index.html?eref=rss_tech
CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more.
en-US
Copyright 2013 Cable News Network LP, LLLP.
Sat, 01 Jun 2013 13:28:49 EDT
10

CNN.com - Technology
http://www.cnn.com/TECH/index.html?eref=rss_tech
http://i.cdn.turner.com/cnn/.e/img/1.0/logo/cnn.logo.rss.gif
144
33
CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more.

Film to digital: Seeing movies in a new light
http://www.cnn.com/2013/05/31/tech/innovation/digital-film-projection/index.html
http://rss.cnn.com/~r/rss/edition_technology/~3/sVHlRDRF6v8/index.html
Vast majority of theaters have changed to digital projection. Lots of pros, including sharp picture and less wear, but some still miss film.<img src="http://feeds.feedburner.com/~r/rss/edition_technology/~4/sVHlRDRF6v8" height="1" width="1"/>
Fri, 31 May 2013 12:17:45 EDT
http://www.cnn.com/2013/05/31/tech/innovation/digital-film-projection/index.html
...
... plenty of lines snipped out
...
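The proxy's first safeguard, refusing anything that is not a real web URL before fetching it, does not depend on the language. The proxy in this article is a PHP script; what follows is only a JavaScript sketch of the same precheck, under the assumption that "a valid URL" means an absolute http or https address. The function name precheckFeedParam and the convention of returning 0 for "go ahead" are hypothetical choices made for this illustration.

```javascript
// Hypothetical helper sketching the proxy's precheck (the article's proxy is PHP).
// Returns 403 when the "feed" value must be refused, or 0 when it may be fetched.
// The 404 case is not covered: it can only be detected by attempting the fetch.
function precheckFeedParam(feed) {
  if (typeof feed !== "string" || feed === "") {
    return 403;                       // parameter missing or empty
  }
  var parsed;
  try {
    parsed = new URL(feed);           // throws unless feed is an absolute URL
  } catch (e) {
    return 403;                       // e.g. "/etc/passwd" or "../config.php"
  }
  // Assumption: only web protocols are acceptable. A "file:" URL parses
  // fine but would expose files that live on the server.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return 403;
  }
  return 0;
}

console.log(precheckFeedParam("http://rss.cnn.com/rss/edition_technology.rss")); // 0
console.log(precheckFeedParam("/etc/passwd"));                                   // 403
console.log(precheckFeedParam("file:///etc/passwd"));                            // 403
```

Checking the protocol explicitly matters because a syntactically valid URL is not necessarily a safe one to fetch.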


Using the PHP proxy

Now, let's finish the job. Whenever the user clicks the PHP Proxy button on the web page, the browser calls the usePhpProxy function with the value of the input field as a parameter. You can easily call the proxy with the jQuery.ajax(...) function, passing it:
  • the URL of the service we are calling; you can assume it is within your own website (because of SOP restrictions) and just specify rss_read.php
  • an object with any parameters the service might require; in this case, it's just the URL of the feed we actually want
  • the data type of the result, which will be XML
  • an error function, which is called if the proxy produces some error, and
  • a success function, which is called to process the results provided by the proxy.
function usePhpProxy(feedToGet) {
  clearAllNews();  // start from an empty news display
  jQuery.ajax("rss_read.php", {

    // sent to the proxy as the "feed" request parameter
    "data": {
      "feed": feedToGet
    },

    // ask jQuery to parse the response as XML
    "dataType": "xml",

    "error": function(jqXHR, textStatus, errorThrown) {
      showError();
    },

    "success": function(xml, textStatus, jqXHR) {
      // every <item> element in the feed is one news entry
      $(xml).find("item").each(function() {
        var xmlTitle = $(this).find("title").text();
        var xmlLink = $(this).find("link").text();
        var xmlDesc = $(this).find("description").text();
        addNews(xmlTitle, xmlDesc, xmlLink);
      });
    }
  });
}
The jQuery XML processing functions make it easy to get at every <item> element in the feed and pick out its title, link, and description, in order to build up the results page. You need not write an explicit loop; the jQuery each() function does the job, and the addNews(...) function shows the actual feed data.
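One detail of the call deserves a closer look: jQuery turns the "data" object into a query string and escapes the feed address along the way, so the whole address reaches the proxy as a single "feed" value. The sketch below builds roughly the same request URL by hand with the standard encodeURIComponent function; buildProxyUrl is a hypothetical helper for illustration, not something jQuery provides.

```javascript
// Hypothetical helper: builds by hand approximately the URL that the
// jQuery.ajax call requests for a given feed address.
function buildProxyUrl(feedToGet) {
  // encodeURIComponent escapes ":", "/", "?", "=", and "&", so a feed URL
  // that has its own query string cannot be mistaken for extra parameters.
  return "rss_read.php?feed=" + encodeURIComponent(feedToGet);
}

console.log(buildProxyUrl("http://rss.cnn.com/rss/edition_technology.rss"));
// rss_read.php?feed=http%3A%2F%2Frss.cnn.com%2Frss%2Fedition_technology.rss
```

If you ever call the proxy without jQuery, escaping the parameter yourself in this way keeps feed addresses that contain "?" or "&" intact.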
So far, we've done all the basic work needed to show RSS on a page, and we managed to get the data in a "do-it-yourself" fashion by using our own PHP-coded proxy. There's still more we can do, however, and next we'll turn to other APIs that can reduce our task load even further.
