Yet Another Blogger Tips Blog

Wednesday, March 14, 2012

Google App Engine, Python 2.7, and lxml

Google App Engine (GAE) lets you build and run applications on Google's infrastructure. I have done two applications with GAE and Python (the programming language). The previous version of my tool to get blogger avatars (avafavico) used regular expressions to parse the Blogger profile page for the profile photo, or user's first contributed blog's favico, if no profile photo is found. As you might know, using regular expressions may not be the best solution. It would be better to parse html page document object model (DOM) and get results there. That way the code is not so sensitive to possible changes in the page html code. Of course I did the regular expressions so, that they should work with different html.

Previously I used GAE Python 2.5 environment, which was the default and supported version. On February 27th Python 2.7 became fully supported. GAE with Python 2.7 contains more external libraries than version 2.5, one of those libraries being lxml.

I tried to search for different solutions and examples, but there were not many. With GAE Python 2.5, one can use BeautifulSoup, but there are some issues (problematic 3.1.0 version, uncertain development future of BS, etc.). And there is minidom, but it may not handle broken html well. Blogger profile page should not have broken html, but you never know. lxml is definitely better and faster, supports XPath, etc.

The day before yesterday I updated the GAE app to use Python 2.7 and lxml. There were none to some examples about using python27, lxml and GAE, so I'll show you here a working example. First I started modifying the file app.yaml, there I changed runtime to python27, added "threadsafe" (false), and added (latest) lxml in libraries section. Increased version number to 5. Now app.yaml looks like this:

Data provided by Pastebin.com - Download Raw - See Original

In blogava.py I added "from lxml import etree" and then used etree functions instead of regular expressions to find things in html. Here's how DOM tree is constructed, variable "result" contains page html as a string, and then XPath is applied to the tree, like this:

>>> tree = etree.HTML(result)
>>> r=tree.xpath("//img[@id='profile-photo']/@src")

Here find from the tree the first img tag, which id is set to "profile-photo", and get that tag's src attribute. In the full script, if no id='profile-photo' is not found, then try to search for the first image that has class "photo". If both fails, search for first "contributed-to" blog, and use it's favicon, if that is not found, use Blogger favicon. And here is the blogava.py source file:

Data provided by Pastebin.com - Download Raw - See Original

This new version of avafavico has been up and running for two days. I'm very pleased that I got lxml working with GAE, all in all it was quite easy. Hope this example is useful to someone. If it helped you, please leave a comment. :)

Thursday, March 8, 2012

YouTube/JSONP oEmbed service "oembed-js"

In the previous post I introduced my oembed-js service and an application using it: YouTube embeds in Blogger comments.

Oembed-js is a Google Appengine application coded in Python. It relays YouTube oembed calls and offers results also in JSONP format, so javascript applications can use it.

Parameters

url: the (youtube) url to embed (best to be urlencoded)
maxwidth: max width of embedded object (optional)
maxheight: max height of embedded object (optional)
format: json/jsonp or xml (optional, default:json/jsonp)
callback or jsonp: the javascript callback function, required, if jsonp format is wanted
callID: a call ID that is passed through to response (optional)

Sample call:

http://oembed-js.appspot.com/?callback=foo&callID=123&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DGMra5Wh51oU

Response

YouTube oEmbed response in json(p) or xml format, including field html (see oembed specs), and:

callID: if callID was given, it is passed through (can identify the response)
url: pass through the url that is being embedded
videoid: contains 11 char long YouTube videoid, if single video embed
playlist: contains playlist id, if playlist embed

Sample response:

foo({"provider_url": "http://www.youtube.com/", "version": "1.0", "title": "Sviatoslav Richter in Kiev, 1958 - Pictures at an Exhibition", "url": "http://www.youtube.com/watch?v=GMra5Wh51oU", "type": "video", "videoid": "GMra5Wh51oU", "thumbnail_width": 480, "height": 344, "width": 459, "html": "<iframe width=\"459\" height=\"344\" src=\"http://www.youtube.com/embed/GMra5Wh51oU?fs=1&feature=oembed\" frameborder=\"0\" allowfullscreen></iframe>", "author_name": "FirstPublicChannel", "callID": "123", "provider_name": "YouTube", "thumbnail_url": "http://i4.ytimg.com/vi/GMra5Wh51oU/hqdefault.jpg", "thumbnail_height": 360, "author_url": "http://www.youtube.com/user/FirstPublicChannel"})

Another demo how to use it

Enter YouTube URL (video or playlist):

This demo and its source is also here in this jsFiddle.

Update March 27th 2012: Now oembed-js supports youtu.be urls (YouTube oembed service does not support them). Also now supports videos which have embedding turned off (YouTube oembed does not support these either).

Sunday, March 4, 2012

Allow users embed YouTube videos in Blogger comments

Some time ago while googling something about video embeds I came across with oEmbed specification. This simple API allows websites to display embedded content from oEmbed providers. After some inspecting I used the oEmbed in a Python script to turn youtube links in a text into embedded videos. Worked nicely.

oEmbed returns data in JSON or XML format. That is ok with server side languages like php and Python. But for javascript this is a problem. If there is a problem, usually solution is found, and I found Embedly service, which is unfortunately not free.

With no intention to invest any money, and hands itching for another Google Appengine Python app, I saw an opportunity. To make a service that relays YouTube's oEmbed results to javascript format (JSONP). And all with memory caching etc. so it should be able to handle some traffic. After some coding it was ready and working http://oembed-js.appspot.com/?url=[youtube url to embed][&callback=foo] But now where to use it...

In Wordpress blogs, when commenting an article, user can just write/paste youtube url into the comment text, and it will be turned into a video embed. Surprise, not in Blogger. For some blogs this is a bummer. Now with my new, free, javascript/jQuery compatible GAE service oembed-js this became possible in Blogger. Of course we need to hack the blog first a bit. :)

See the demo blog!

You can decide, whether you want to turn all raw youtube links (watch and playlist links) into embeds, which is the default, or only those, that use [video=youtube_url] tag. You can also change the tag name, if you want, maybe you like youtube or embed more. If you have lazy loading installed in your blog, by default this script will use it, and pages with many videos load much faster. Installation is simple, open template html for editing, find </head> and paste this code before it:

Data provided by Pastebin.com - Download Raw - See Original

This script does not work in dynamic views, but it works with Blogger native comments, my threaded comments, and also Blogger threaded comments. Hope you like it. I'll tell you later about the oembed‑js service API (application programming interface), so it is easier for you to develop own applications.

References:
- oEmbed specification
- YouTube API blog: oEmbed support

Update 27th March 2012: Line #74 updated, to handle better the case of only a YouTube url as comment text in new Blogger threaded comments (you do not need to update if you use videotag, or don't use Blogger threaded comments).

Pages

Wednesday, March 14, 2012

Google App Engine, Python 2.7, and lxml

Thursday, March 8, 2012

YouTube/JSONP oEmbed service "oembed-js"

Sunday, March 4, 2012

Allow users embed YouTube videos in Blogger comments