Download Posters with ‘The Movie Database’ API in Python

The Movie Database is a community-maintained movie, TV and actor database. One of its most useful features is artwork, something that IMDb does not provide. XBMC uses the database as the default resource for posters and backdrops.

All content on the website can be accessed with a public API, currently in version 3. This blog post illustrates how to use the API to download posters for a movie. Here’s the final result.

Step 1: account creation and API key

To use the API you must have an account and an API key. You can request the key in the API section of your account settings.

Step 2: system wide configuration information

Many API requests, including the image download, rely on a system-wide configuration. All artwork, for instance, is stored on CloudFront, and the API gives the image locations relative to this base. The API request to get the configuration has the following simple format:

http://api.themoviedb.org/3/configuration?api_key=<your_api_key>

Here’s a small Python snippet to request the configuration and store it as config.

import requests

CONFIG_PATTERN = 'http://api.themoviedb.org/3/configuration?api_key={key}'
KEY = '<your_api_key>'

url = CONFIG_PATTERN.format(key=KEY)
r = requests.get(url)
r.raise_for_status()  # raise on an HTTP error status (4xx/5xx)
config = r.json()

The request will return the data in JSON by default with the following content:

{'change_keys': ['adult',
                  'also_known_as',
                  ...,
                  'translations'],
 'images': {'backdrop_sizes': ['w300', 'w780', 'w1280', 'original'],
             'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
             'logo_sizes': ['w45', 'w92', 'w154', 'w185', 'w300', 'w500', 'original'],
             'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
             'profile_sizes': ['w45', 'w185', 'h632', 'original'],
             'secure_base_url': 'https://d3gtl9l2a4fn1j.cloudfront.net/t/p/'}}

We need two values from the images section:

  • base_url: this is where the images are stored.
  • poster_sizes: those are the available sizes.
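As a minimal sketch, the two values can be pulled out of the configuration with a small helper. The function name poster_settings is hypothetical, and preferring secure_base_url over base_url is my own assumption; the sample dictionary only repeats the values from the response shown above.

```python
def poster_settings(config):
    """Return (base_url, poster_sizes) from a configuration response."""
    images = config['images']
    # Assumption: prefer the HTTPS base when the response offers one.
    base_url = images.get('secure_base_url') or images['base_url']
    return base_url, images['poster_sizes']


# Values copied from the configuration response shown above.
sample_config = {'images': {
    'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
    'secure_base_url': 'https://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
    'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
}}

sample_base, sample_sizes = poster_settings(sample_config)
print(sample_base)
print(sample_sizes)
```

If the response has no secure_base_url, the helper falls back to the plain base_url.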

Let’s assume we want to download the maximum size, which seems to always be the last element of the sizes list. For example, original in ['w92', 'w154', 'w185', 'w342', 'w500', 'original'] gives the maximum resolution. I’m not certain, though, that the API will always return the sizes in ascending order, so I use a custom sort key to get the largest size:

base_url = config['images']['base_url']
sizes = config['images']['poster_sizes']
# 'sizes' should already be sorted in ascending order, so
#     max_size = sizes[-1]
# should return the largest size as well.
def size_str_to_int(x):
    return float("inf") if x == 'original' else int(x[1:])
max_size = max(sizes, key=size_str_to_int)
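A slightly more defensive variant of the sort key is sketched below. It does not fail on labels that do not follow the letter-plus-number pattern. The function name largest_size and the choice to rank unrecognised labels below every known size are my own assumptions, not something the API documents.

```python
import re


def largest_size(sizes):
    """Return the label with the largest numeric part; 'original' always wins."""
    def rank(label):
        if label == 'original':
            return float('inf')
        match = re.fullmatch(r'[wh](\d+)', label)
        # Assumption: labels we do not recognise rank below every known size.
        return int(match.group(1)) if match else -1
    return max(sizes, key=rank)


print(largest_size(['w92', 'w154', 'w185', 'w342', 'w500', 'original']))
print(largest_size(['w500', 'w92', 'w342']))
```

Like the original key function, this compares the number in a width label (w...) directly with the number in a height label (h...), which is only a rough ordering when both kinds appear in the same list.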

Step 3: get available poster URLs

Now that we have the necessary information from the configuration, we can proceed to request the posters for the desired movie. The request to get the images has the following format:

http://api.themoviedb.org/3/movie/<imdbid>/images?api_key=<key>

where <imdbid> is the IMDb movie ID, e.g., tt0095016 (you can find the ID for a movie title with the function at the end of this blog post). The following Python snippet retrieves the image information from The Movie Database:

IMG_PATTERN = 'http://api.themoviedb.org/3/movie/{imdbid}/images?api_key={key}'
r = requests.get(IMG_PATTERN.format(key=KEY, imdbid='tt0095016'))
api_response = r.json()

The API response has the following format:

{'backdrops': [{'aspect_ratio': 1.78,
                 'file_path': '/sEkWkPFIcoSyP3qRiZunyOfdMpv.jpg',
                 'height': 1080,
                 'iso_639_1': None,
                 'vote_average': 5.36734693877551,
                 'vote_count': 7,
                 'width': 1920},
                ... ],
 'id': 562,
 'posters': [{'aspect_ratio': 0.67,
               'file_path': '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg',
               'height': 1500,
               'iso_639_1': 'en',
               'vote_average': 5.45518207282913,
               'vote_count': 5,
               'width': 1000},
              ...]}
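If only one poster is wanted rather than all of them, the other fields of each entry can help to choose. The sketch below picks the highest-rated poster for a given language. The function name best_poster and the tie-break on vote_count are my own assumptions, and the second sample entry is made up for illustration.

```python
def best_poster(posters, language='en'):
    """Return the highest-rated poster in `language`, or None if none matches."""
    candidates = [p for p in posters if p.get('iso_639_1') == language]
    if not candidates:
        return None
    # Assumption: rank by average vote, then by number of votes.
    return max(candidates,
               key=lambda p: (p.get('vote_average', 0), p.get('vote_count', 0)))


sample_posters = [
    # Taken from the response shown above.
    {'file_path': '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg', 'iso_639_1': 'en',
     'vote_average': 5.45518207282913, 'vote_count': 5},
    # Made-up entry for illustration only.
    {'file_path': '/made_up_example.jpg', 'iso_639_1': 'de',
     'vote_average': 6.0, 'vote_count': 2},
]

print(best_poster(sample_posters)['file_path'])
```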

To download the poster later, we only need the file_path value. Together with the system-wide configuration (step 2), we have everything needed to build the full URL of the image as follows:

url = <base_url> + <max_size> + <rel_path>

for example

base_url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/'
max_size = 'original'
rel_path = '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'
url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/original/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'

The following Python snippet assembles the image urls and adds them to a list:

posters = api_response['posters']
poster_urls = []
for poster in posters:
    rel_path = poster['file_path']
    url = "{0}{1}{2}".format(base_url, max_size, rel_path)
    poster_urls.append(url)
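The plain concatenation relies on base_url ending with a slash and file_path starting with one, as in the responses shown earlier. A minimal sketch of a join that does not depend on that follows; the helper name poster_url is hypothetical.

```python
def poster_url(base_url, size, file_path):
    """Join the three parts with exactly one slash between each of them."""
    return '/'.join([base_url.rstrip('/'), size.strip('/'), file_path.lstrip('/')])


print(poster_url('http://d3gtl9l2a4fn1j.cloudfront.net/t/p/', 'original',
                 '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'))
```

With this helper, the loop body above could be reduced to poster_urls.append(poster_url(base_url, max_size, poster['file_path'])).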

 

Step 4: download posters

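Finally, we store all posters as numbered files (poster_1, poster_2, etc.) in the current directory. The file extension is taken from the Content-Type header of each response, so an image served as image/jpeg is saved as poster_1.jpeg: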

for nr, url in enumerate(poster_urls):
    r = requests.get(url)
    filetype = r.headers['content-type'].split('/')[-1]
    filename = 'poster_{0}.{1}'.format(nr+1,filetype) 
    with open(filename,'wb') as w:
        w.write(r.content)
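The loop takes the extension straight from the Content-Type header, which breaks down if the header carries extra parameters or is empty. The sketch below normalises the header value first. The function name extension_for, the jpg default and the jpeg-to-jpg renaming are my own assumptions.

```python
def extension_for(content_type, default='jpg'):
    """Map a Content-Type header value to a file extension."""
    # Drop parameters such as '; charset=binary' and normalise the case.
    mime = content_type.split(';')[0].strip().lower()
    subtype = mime.split('/')[-1]
    if not subtype:
        # Assumption: fall back to a default when the header is empty.
        return default
    # Assumption: 'jpeg' is the only subtype worth renaming here.
    return {'jpeg': 'jpg'}.get(subtype, subtype)


print(extension_for('image/jpeg'))
print(extension_for('image/png; charset=binary'))
```

In the download loop it would replace the split on '/', e.g. filetype = extension_for(r.headers.get('content-type', '')).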

The final code can be seen on GitHub Gist.

Bonus: get IMDb id for movie title

The following function uses the undocumented IMDb API to get the IMDb ID for a movie title.

import requests

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

def imdb_id_from_title(title):
    """Return the IMDb ID for a search string.

    Args:
        title (str): the movie title search string

    Returns:
        str: the IMDb ID, e.g., 'tt0095016', or
        None if no match was found
    """
    pattern = 'http://www.imdb.com/xml/find?json=1&nr=1&tt=on&q={movie_title}'
    url = pattern.format(movie_title=quote(title))
    r = requests.get(url)
    res = r.json()
    # sections in descending order of preference
    for section in ['popular', 'exact', 'substring']:
        key = 'title_' + section
        # skip sections that are missing or empty
        if res.get(key):
            return res[key][0]['id']
    return None

Link to Gist.
