Introducing the liveblog-indexer gem

require 'liveblog-indexer'

lbi = LiveBlogIndexer.new filepath: '/tmp',
                          word_index: 'words_indexed.json',
                          url_index: 'indexed.json'
lbi.crawl 'https://www.jamesrobertson.eu/liveblog/2015/nov/13/formatted.xml'


This gem can index a single liveblog page, and it can also crawl over all liveblog pages. It does this by reading each formatted.xml file, which contains a link to the previous page. Within a page, it scans each section for words to index, while ignoring specific HTML elements, including pre, code, and time.
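The crawl described above can be pictured roughly as follows. This is only a minimal sketch, not the gem's actual implementation: the page layout (a root element holding a prev element and several section elements), the crawl and visible_text helpers, and the in-memory pages hash are all assumptions made for illustration. It uses REXML instead of real HTTP requests so that it runs on its own.

```ruby
require 'rexml/document'

# Elements whose text is skipped while indexing, as described above.
IGNORED = %w[pre code time].freeze

# Collect the text inside an element, skipping any ignored child elements.
def visible_text(element)
  element.children.map do |node|
    case node
    when REXML::Text
      node.value
    when REXML::Element
      IGNORED.include?(node.name) ? '' : visible_text(node)
    else
      ''
    end
  end.join(' ')
end

# Follow the "previous page" links from start_url and build a word => [urls] index.
# The <prev> and <section> element names are hypothetical, not taken from the gem.
# fetch is any callable that returns the XML string for a URL.
def crawl(start_url, fetch)
  word_index = Hash.new { |hash, key| hash[key] = [] }
  seen = {}
  url = start_url

  while url && !seen[url]
    seen[url] = true
    doc = REXML::Document.new(fetch.call(url))

    doc.root.each_element('section') do |section|
      words = visible_text(section).downcase.scan(/[a-z0-9']+/).uniq
      words.each { |word| word_index[word] << url unless word_index[word].include?(url) }
    end

    prev = doc.root.elements['prev']
    url = prev && prev.text
  end

  word_index
end

# Usage with two made-up pages held in memory:
pages = {
  'page2' => '<page><prev>page1</prev><section><p>Ruby gem</p><code>puts 1</code></section></page>',
  'page1' => '<page><section><p>Hello ruby</p><time>10:00</time></section></page>'
}
index = crawl('page2', ->(url) { pages[url] })
```

Passing the fetcher in as a callable keeps the sketch independent of any network library; a real crawl would swap in a function that downloads each formatted.xml.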

Note: Two files are saved in JSON format: words_indexed.json and urls_indexed.json. The file path and the filenames can be set at initialize(), using the filepath, word_index, and url_index options shown in the example above.
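To make the note concrete, here is a small sketch of writing two indexes to JSON files under a configurable directory. The save_indexes helper, its keyword names, and the shape of the sample data are hypothetical and are not part of the gem's API; only the two filenames come from the note above.

```ruby
require 'json'

# Hypothetical helper: write a word index and a URL index as JSON files
# inside the given directory. Returns the paths that were written.
def save_indexes(word_index, url_index, filepath: '.',
                 word_file: 'words_indexed.json',
                 url_file: 'urls_indexed.json')
  { word_file => word_index, url_file => url_index }.map do |name, data|
    path = File.join(filepath, name)
    File.write(path, JSON.pretty_generate(data))
    path
  end
end
```

For example, save_indexes({ 'ruby' => ['page1'] }, { 'page1' => ['ruby'] }, filepath: '/tmp') would write /tmp/words_indexed.json and /tmp/urls_indexed.json.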

Resources