Traversing XML using REXML or Nokogiri

This is an old snippets article I wrote around 2010, yet it's still relevant today.

To traverse an XML document you would typically implement your own recursive method; however, with REXML or Nokogiri this method is built in. In REXML, call each_recursive on the document's root element (document.root.each_recursive). In Nokogiri, call traverse on the root node (document.root.traverse), where document is the object returned by Nokogiri::XML(xml).
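For comparison, a hand-rolled traversal might look something like the sketch below. The helper name walk, the variable names and the small sample document are made up for illustration and are not part of either library. The only REXML calls used are Document.new, Element#elements, Element#name and Element#text.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): visit every element below
# `element`, depth first, yielding each one before descending into it.
def walk(element, &block)
  element.elements.each do |child|
    block.call(child)
    walk(child, &block)
  end
end

# Small sample document, invented for this sketch.
sample = REXML::Document.new(<<~XML)
  <root>
    <summary><task>clean bathroom</task></summary>
    <records>
      <x><summary><task>mop the floor</task></summary><records/></x>
    </records>
  </root>
XML

found = []
walk(sample.root) { |elem| found << elem.text.to_s if elem.name == "task" }

puts found
```

This should print the two task names in document order. The built-in methods shown next save you from writing and maintaining a helper like walk yourself.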

REXML:

require 'rexml/document'
include REXML

xml = <<XML
<root>
  <summary><task>clean bathroom</task></summary>
  <records>
    <x>
      <summary><task>clean bath</task></summary>
      <records>
        <x><summary><task>rinse surfaces with water</task></summary><records/></x>
        <x><summary><task>apply pine fresh flash cleaner</task></summary><records/></x>
      </records>
    </x>
    <x>
      <summary><task>clean sink</task></summary>
      <records>
        <x><summary><task>remove items around the sink</task></summary><records/></x>
        <x><summary><task>using a sponge apply warm water to the sink surfaces</task></summary><records/></x>
      </records>
    </x>
    <x><summary><task>mop the floor</task></summary><records/></x>
  </records>
</root>
XML

document = Document.new(xml)

titles = []
document.root.each_recursive do |elem|
  titles << elem.text.to_s if elem.name == "task"
end

puts titles

output:

clean bathroom
clean bath
rinse surfaces with water
apply pine fresh flash cleaner
clean sink
remove items around the sink
using a sponge apply warm water to the sink surfaces
mop the floor
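To see which nodes each_recursive hands to the block, the sketch below records element names in the order they are yielded. The tiny document and the variable names are assumptions made up for this illustration.

```ruby
require 'rexml/document'

# Invented document: <b> holds an element and a text node, <d> is empty.
order_doc = REXML::Document.new("<a><b><c/>text</b><d/></a>")

visited = []
order_doc.root.each_recursive { |elem| visited << elem.name }

p visited
```

On the REXML versions I have looked at, this gives ["b", "c", "d"]: only elements are yielded (the text node is skipped), parents come before their children, and the element you call each_recursive on is not itself yielded. That last point does not matter in the example above because the root is not a task element.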

Nokogiri:

require 'nokogiri'

# xml is the same string defined in the REXML example above
document = Nokogiri::XML(xml)
titles = []
document.root.traverse do |elem|
  titles << elem.content if elem.name == "task"
end

puts titles

output:

clean bathroom
clean bath
rinse surfaces with water
apply pine fresh flash cleaner
clean sink
remove items around the sink
using a sponge apply warm water to the sink surfaces
mop the floor

Source:
1800hrs.txt
Published:
16-09-2014 18:00