
Catalog

Data Models

All types of catalog items inherit from Item, which is stored as a multi-table Django model. One Item may have multiple ExternalResources, each representing one page on an external site.

```mermaid
classDiagram
    class Item {
        <<abstract>>
    }
    Item <|-- Album
    class Album {
        +String barcode
        +String Douban_ID
        +String Spotify_ID
    }
    Item <|-- Game
    class Game {
        +String Steam_ID
    }
    Item <|-- Podcast
    class Podcast {
        +String feed_url
        +String Apple_ID
    }
    Item <|-- Performance
    Item <|-- Work
    class Work {
        +String Douban_Work_ID
        +String Goodreads_Work_ID
    }
    Item <|-- Edition
    Item <|-- Series

    Series *-- Work
    Work *-- Edition

    class Series {
        +String Goodreads_Series_ID
    }
    class Work {
        +String Douban_ID
        +String Goodreads_ID
    }
    class Edition{
        +String ISBN
        +String Douban_ID
        +String Goodreads_ID
        +String GoogleBooks_ID
    }

    Item <|-- Movie
    Item <|-- TVShow
    Item <|-- TVSeason
    Item <|-- TVEpisode
    TVShow *-- TVSeason
    TVSeason *-- TVEpisode

    class TVShow{
        +String IMDB_ID
        +String TMDB_ID
    }
    class TVSeason{
        +String Douban_ID
        +String TMDB_ID
    }
    class TVEpisode{
        +String IMDB_ID
        +String TMDB_ID
    }
    class Movie{
        +String Douban_ID
        +String IMDB_ID
        +String TMDB_ID
    }

    Item <|-- Collection

    ExternalResource --* Item
    class ExternalResource {
        +enum site
        +String url
    }
```
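The relationship described at the top of this page (a multi-table Item with many ExternalResources) can be illustrated without Django. The sketch below uses Python's built-in sqlite3 to show the kind of table layout that multi-table inheritance roughly corresponds to: a parent item table, a child edition table whose primary key is also a pointer to its parent row, and an external_resource table with a foreign key to item. All table and column names, the sample ISBN, the site strings and the URLs are hypothetical; the schema Django actually generates for NeoDB will differ.

```python
import sqlite3

# Hypothetical, simplified schema: every catalog item gets a row in "item";
# a subclass table holds only its own fields plus a one-to-one pointer
# (item_ptr_id) to the parent row. "external_resource" is one-to-many.
SCHEMA = """
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE edition (
    item_ptr_id INTEGER PRIMARY KEY REFERENCES item(id),
    isbn TEXT
);
CREATE TABLE external_resource (
    id INTEGER PRIMARY KEY,
    item_id INTEGER REFERENCES item(id),
    site TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE
);
"""


def create_edition(conn, title, isbn):
    """Insert the parent row first, then the child row sharing its id."""
    cur = conn.execute("INSERT INTO item (title) VALUES (?)", (title,))
    item_id = cur.lastrowid
    conn.execute(
        "INSERT INTO edition (item_ptr_id, isbn) VALUES (?, ?)", (item_id, isbn)
    )
    return item_id


def add_resource(conn, item_id, site, url):
    """Attach one external page to an item (one item, many resources)."""
    conn.execute(
        "INSERT INTO external_resource (item_id, site, url) VALUES (?, ?, ?)",
        (item_id, site, url),
    )


def load_edition(conn, item_id):
    """Reading a subclass joins the child table with its parent row."""
    row = conn.execute(
        "SELECT item.id, item.title, edition.isbn FROM edition "
        "JOIN item ON item.id = edition.item_ptr_id WHERE item.id = ?",
        (item_id,),
    ).fetchone()
    urls = [
        r[0]
        for r in conn.execute(
            "SELECT url FROM external_resource WHERE item_id = ? ORDER BY id",
            (item_id,),
        )
    ]
    return {"id": row[0], "title": row[1], "isbn": row[2], "resources": urls}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
edition_id = create_edition(conn, "Example Book", "9780000000000")
add_resource(conn, edition_id, "douban", "https://book.douban.com/subject/1/")
add_resource(conn, edition_id, "goodreads", "https://www.goodreads.com/book/show/1")
result = load_edition(conn, edition_id)
```

Because every subclass shares the item table, a query over item sees albums, editions, movies and so on uniformly, while subclass-specific fields cost one join.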

Add a new site

  • If an official API is available for the site, prefer it for fetching data.
  • Add a new value to IdType and SiteName in catalog/common/models.py.
  • Add a new file in catalog/sites/ containing a new class that inherits from AbstractSite and defines:
    • SITE_NAME
    • ID_TYPE
    • URL_PATTERNS
    • WIKI_PROPERTY_ID (not used now)
    • DEFAULT_MODEL (unless the model is specified in the return value of scrape())
    • a classmethod id_to_url()
    • a method scrape() that returns a ResourceContent object
      • BasicDownloader or ProxiedDownloader can be used to download website content or API data, e.g. content = BasicDownloader(url).download().html()
    • Check out the existing files in catalog/sites/ for more examples.
  • Add an import in catalog/sites/__init__.py.
  • Add tests to catalog/<folder>/tests.py, choosing the folder that matches the site type.
    • Adding DOWNLOADER_SAVEDIR = '/tmp' to settings.py saves all responses to /tmp.
    • Run neodb-manage cat <url> to debug a site or to save its response files to /tmp. The cat command is implemented in catalog/management/commands/cat.py.
    • Move the captured response files to test_data/, except large files and images. If one of those must be included, replace it with the smallest possible version (e.g. a 1x1-pixel image or 1 s of audio).
    • Add the @use_local_response decorator to test methods that should pick up these saved responses (when BasicDownloader or ProxiedDownloader is used).
  • Run all the tests and make sure they pass.
    • Command: neodb-manage python3 manage.py test [--keepdb].
    • See this issue if lxml.etree.ParserError occurs on macOS.
  • Add a site UI label style to common/static/scss/_sitelabel.scss.
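To make the shape of a new site class concrete, here is a standalone sketch of the pieces the checklist asks for: SITE_NAME, ID_TYPE, URL_PATTERNS, WIKI_PROPERTY_ID, DEFAULT_MODEL, id_to_url() and scrape(). It does not import NeoDB; AbstractSite, ResourceContent, IdType and SiteName below are minimal hypothetical stand-ins, and ExampleSite, its URL pattern, fetch_html() and the url_to_id() logic (first capture group of the first matching pattern) are assumptions for illustration only. Check the real classes in catalog/common/ and the existing files in catalog/sites/ before relying on any of this.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


# --- Minimal hypothetical stand-ins; the real ones live in catalog/common/ ---
class IdType(str, Enum):
    ExampleSite = "examplesite"  # hypothetical new value for the new site


class SiteName(str, Enum):
    ExampleSite = "examplesite"  # hypothetical new value for the new site


@dataclass
class ResourceContent:
    metadata: dict = field(default_factory=dict)


class AbstractSite:
    SITE_NAME = None
    ID_TYPE = None
    URL_PATTERNS = []
    WIKI_PROPERTY_ID = ""
    DEFAULT_MODEL = None

    def __init__(self, url):
        self.url = url
        self.id_value = self.url_to_id(url)

    @classmethod
    def url_to_id(cls, url):
        # Assumption: the first capture group of the first matching pattern is the ID.
        for pattern in cls.URL_PATTERNS:
            match = re.match(pattern, url)
            if match:
                return match.group(1)
        return None

    @classmethod
    def id_to_url(cls, id_value):
        raise NotImplementedError

    def scrape(self):
        raise NotImplementedError


# --- A hypothetical new site filling in the pieces from the checklist ---
class ExampleSite(AbstractSite):
    SITE_NAME = SiteName.ExampleSite
    ID_TYPE = IdType.ExampleSite
    URL_PATTERNS = [r"https?://(?:www\.)?example\.org/book/(\d+)"]
    WIKI_PROPERTY_ID = ""  # not used now
    DEFAULT_MODEL = "Edition"  # a plain string only to keep this sketch standalone

    @classmethod
    def id_to_url(cls, id_value):
        return f"https://www.example.org/book/{id_value}"

    def fetch_html(self):
        # Hypothetical offline stub. A real site would download and parse
        # self.url instead, e.g. with BasicDownloader or ProxiedDownloader.
        return "<html><head><title>Example Title</title></head></html>"

    def scrape(self):
        html = self.fetch_html()
        match = re.search(r"<title>(.*?)</title>", html)
        title = match.group(1) if match else None
        return ResourceContent(metadata={"title": title})


site = ExampleSite("https://www.example.org/book/42")
content = site.scrape()
```

Keeping URL_PATTERNS and id_to_url() in agreement matters: a URL parsed into an ID and turned back into a URL should point at the same page, which makes a cheap first test for any new site.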