This class lets you perform SAX-style parsing on HTML, with HTML error correction.
Here is a basic usage example:
    class MyDoc < Nokogiri::XML::SAX::Document
      def start_element name, attributes = []
        puts "found a #{name}"
      end
    end

    parser = Nokogiri::HTML::SAX::Parser.new(MyDoc.new)
    parser.parse(File.read(ARGV[0], mode: 'rb'))
For more information on SAX parsers, see Nokogiri::XML::SAX.
    # File lib/nokogiri/html/sax/parser.rb, line 51
    def parse_file filename, encoding = 'UTF-8'
      raise ArgumentError unless filename
      raise Errno::ENOENT unless File.exist?(filename)
      raise Errno::EISDIR if File.directory?(filename)
      ctx = ParserContext.file(filename, encoding)
      yield ctx if block_given?
      ctx.parse_with self
    end
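Parse the file at filename using encoding (default 'UTF-8'). Raises ArgumentError if filename is nil, Errno::ENOENT if the file does not exist, and Errno::EISDIR if it is a directory. If a block is given, the ParserContext is yielded to it before parsing starts.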
    # File lib/nokogiri/html/sax/parser.rb, line 41
    def parse_io io, encoding = 'UTF-8'
      check_encoding(encoding)
      @encoding = encoding
      ctx = ParserContext.io(io, ENCODINGS[encoding])
      yield ctx if block_given?
      ctx.parse_with self
    end
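Parse the given io using encoding (default 'UTF-8'). The encoding name is checked first. If a block is given, the ParserContext is yielded to it before parsing starts.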
    # File lib/nokogiri/html/sax/parser.rb, line 31
    def parse_memory data, encoding = 'UTF-8'
      raise ArgumentError unless data
      return unless data.length > 0
      ctx = ParserContext.memory(data, encoding)
      yield ctx if block_given?
      ctx.parse_with self
    end
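Parse HTML stored in data using encoding (default 'UTF-8'). Raises ArgumentError if data is nil and returns without parsing if data is empty.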
© 2008–2018 Aaron Patterson, Mike Dalessio, Charles Nutter, Sergio Arbeo,
Patrick Mahoney, Yoko Harada, Akinori MUSHA, John Shahid, Lars Kanis
Licensed under the MIT License.