QHtmlParser 0.0.1
A Qt/C++ library for parsing and traversing/searching HTML documents.

Introduction

QHtmlParser is a Qt/C++ library for parsing, traversing and searching HTML documents.

QHtmlParser provides the following classes:

Name                  Description
QHtmlAttribute        Represents an individual HTML attribute with a name and value.
QHtmlAttributeMatch   Used for performing a match against the attributes of an element.
QHtmlDocument         Used for loading and parsing an HTML document.
QHtmlElement          Represents an individual HTML element/tag in a document.

Usage

Before the elements of an HTML document can be accessed, the document must first be parsed using QHtmlDocument. You can do this by passing the document to the QHtmlDocument constructor or by calling the setContent() method. The document can be supplied as a QByteArray, a QString, or a QIODevice that is open for reading, e.g.:

```cpp
QFile file("doc.html");
if (!file.open(QFile::ReadOnly)) {
    // handle error: the file could not be opened for reading
}

QHtmlDocument document;
const bool ok = document.setContent(&file);
if (ok) {
    // traverse document
} else {
    // handle parse error
}
```

Once the document is parsed, the elements and attributes of the document may be accessed using the methods of the QHtmlElement class. In addition to basic traversal, QHtmlElement has several methods that allow you to search for elements by tag name and attributes. The following code snippet shows a search for 'div' elements with a 'class' attribute equal to 'foo' or 'bar':

```cpp
const QHtmlElement html = document.htmlElement();
const QHtmlAttributeMatch match1("class", "foo");
const QHtmlAttributeMatch match2("class", "bar");
const QHtmlAttributeMatches matches = QHtmlAttributeMatches() << match1 << match2;
const QHtmlElementList divs = html.elementsByTagName("div", matches, QHtmlParser::MatchAny);
for (const QHtmlElement &div : divs) {
    // process element
}
```
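To make the search semantics more concrete, here is a small, self-contained model in standard C++17. It does not use QHtmlParser. The types Attribute, AttributeMatch, Element and MatchMode and the functions buildSample and elementsByTagName are hypothetical stand-ins. The MatchMode::All option is an assumption added for contrast, since only MatchAny appears above. The model assumes exact, case-sensitive name/value comparison and a depth-first, pre-order search over the descendants of the starting element. The library's actual rules (for example, how it handles a class attribute containing several class names) may differ.

```cpp
#include <algorithm>
#include <cassert>  // assert, for checking search results
#include <string>
#include <vector>

// Hypothetical stand-ins for QHtmlAttribute, QHtmlAttributeMatch and QHtmlElement.
struct Attribute { std::string name; std::string value; };
struct AttributeMatch { std::string name; std::string value; };
struct Element {
    std::string tag;
    std::vector<Attribute> attributes;
    std::vector<Element> children;  // allowed for an incomplete type since C++17
};

// Hypothetical: Any = at least one match must hold, All = every match must hold.
enum class MatchMode { Any, All };

// True if the element has an attribute with exactly this name and value.
bool satisfies(const Element &e, const AttributeMatch &m) {
    return std::any_of(e.attributes.begin(), e.attributes.end(),
                       [&](const Attribute &a) { return a.name == m.name && a.value == m.value; });
}

bool matchesAttributes(const Element &e, const std::vector<AttributeMatch> &matches, MatchMode mode) {
    if (matches.empty()) return true;  // no attribute constraints
    auto pred = [&](const AttributeMatch &m) { return satisfies(e, m); };
    return mode == MatchMode::Any ? std::any_of(matches.begin(), matches.end(), pred)
                                  : std::all_of(matches.begin(), matches.end(), pred);
}

// Depth-first, pre-order walk: results come back in document order.
void collect(const Element &node, const std::string &tag,
             const std::vector<AttributeMatch> &matches, MatchMode mode,
             std::vector<const Element *> &out) {
    for (const Element &child : node.children) {
        if (child.tag == tag && matchesAttributes(child, matches, mode))
            out.push_back(&child);
        collect(child, tag, matches, mode, out);
    }
}

// Searches the descendants of root (root itself is not tested).
// The returned pointers stay valid only while root is alive and unmodified.
std::vector<const Element *> elementsByTagName(const Element &root, const std::string &tag,
                                               const std::vector<AttributeMatch> &matches,
                                               MatchMode mode) {
    std::vector<const Element *> out;
    collect(root, tag, matches, mode, out);
    return out;
}

// A small sample tree: <html><body> with some div/span children.
Element buildSample() {
    Element body{"body", {}, {
        Element{"div", {{"class", "foo"}}, {}},
        Element{"div", {{"class", "bar"}}, {Element{"div", {{"class", "baz"}}, {}}}},
        Element{"span", {{"class", "foo"}}, {}},
        Element{"div", {{"id", "x"}, {"class", "foo"}}, {}},
    }};
    return Element{"html", {}, {body}};
}
```

With this sample, `elementsByTagName(buildSample(), "div", {{"class", "foo"}, {"class", "bar"}}, MatchMode::Any)` returns the three div elements whose class is foo or bar, in document order. The nested div with class baz and the span are skipped. Switching to MatchMode::All with the matches class=foo and id=x leaves only the last div.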

Source Code

The source code is available on GitHub.