DOM extraction expression evaluator.
| Name | Description | Default value |
|---|---|---|
| evaluator | HTML parser and selector engine. Possible values: cheerio, browser. Use cheerio if you are running Surgeon in Node.js. Use browser if you are running Surgeon in a browser or a headless browser (e.g. PhantomJS). | cheerio |
Unless redefined, all examples assume the following initialisation:
import surgeon from 'surgeon';
/**
* @param configuration {@see https://github.com/gajus/surgeon#configuration}
*/
const x = surgeon();
Note: For simplicity, the strict equality operator (===) is used to demonstrate deep equality.
The default behaviour of a query is to match a single node and extract the value of its textContent property.
const document = `
<div class="title">foo</div>
`;
x('.title')(document) === 'foo';
x('.title {1}[0]')(document) === 'foo';
x('.title {0,1}[0]')(document) === 'foo';
x('.title {1,1}[0]')(document) === 'foo';
To extract multiple nodes, you need to specify a quantifier expression.
const document = `
<div class="title">foo</div>
<div class="title">bar</div>
<div class="title">baz</div>
`;
const result = x('.title {0,}')(document);
result === [
'foo',
'bar',
'baz'
];
Surgeon queries can be nested. The result of the parent query becomes the root element of the descendant query.
const document = `
<article>
<div class='title'>foo title</div>
<div class='body'>foo body</div>
</article>
<article>
<div class='title'>bar title</div>
<div class='body'>bar body</div>
</article>
`;
const result = x('article {0,}', {
body: x('.body'),
title: x('.title')
})(document);
result === [
{
body: 'foo body',
title: 'foo title'
},
{
body: 'bar body',
title: 'bar title'
}
];
Validation is performed using a regular expression.
const document = `
<div class="title">foo</div>
`;
x('.title', /foo/)(document) === 'foo';
If the regular expression does not match the data, an InvalidDataError is thrown (see Handling errors).
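The validation step can be pictured with a minimal, self-contained sketch. It is not Surgeon's implementation: `validate` and `InvalidDataErrorSketch` are hypothetical names introduced here for illustration, and the only behaviour assumed is the one described above (the value is returned when the regular expression matches, and an error is thrown otherwise).

```javascript
// Local stand-in for illustration only; Surgeon exports its own InvalidDataError.
class InvalidDataErrorSketch extends Error {
  constructor (value, rule) {
    super('Value "' + value + '" does not match ' + String(rule) + '.');
    this.name = 'InvalidDataErrorSketch';
  }
}

// Hypothetical helper: returns the value unchanged when it passes the rule
// and throws when it does not.
const validate = (value, rule) => {
  if (!rule.test(value)) {
    throw new InvalidDataErrorSketch(value, rule);
  }

  return value;
};

validate('foo', /foo/) === 'foo';
```

Calling `validate('foo', /bar/)` in this sketch throws, which mirrors the failure mode described for a query whose data does not match its regular expression.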
A quantifier expression is used to assert that the query matches a set number of nodes.
The default quantifier expression value is {1}.
| Name | Syntax |
|---|---|
| Fixed quantifier | {n} where n is an integer >= 1 |
| Greedy quantifier | {n,m} where n >= 0 and m >= n |
| Greedy quantifier | {n,} where n >= 0 |
| Greedy quantifier | {,m} where m >= 1 |
If this looks familiar, it's because I have adopted the syntax from the regular expression language. However, unlike in a regular expression, a quantifier in a Surgeon selector produces an error (UnexpectedResultCountError) if the number of matched nodes falls outside the quantifier range.
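To make the table above concrete, here is a rough sketch of how a quantifier expression could be parsed and checked against a result count. `parseQuantifier` and `assertResultCount` are hypothetical helpers, not part of Surgeon, and the sketch throws a plain RangeError where Surgeon would throw UnexpectedResultCountError. It also does not enforce the `n >= 1` and `m >= n` constraints from the table.

```javascript
// Hypothetical parser for the quantifier syntax: {n}, {n,m}, {n,} and {,m}.
const parseQuantifier = (expression) => {
  const fixed = /^\{(\d+)\}$/.exec(expression);

  if (fixed) {
    // {n}: exactly n matches.
    return {max: Number(fixed[1]), min: Number(fixed[1])};
  }

  const range = /^\{(\d*),(\d*)\}$/.exec(expression);

  if (!range || (range[1] === '' && range[2] === '')) {
    throw new SyntaxError('Invalid quantifier expression: ' + expression);
  }

  return {
    // {n,}: the upper bound is open-ended.
    max: range[2] === '' ? Infinity : Number(range[2]),
    // {,m}: the lower bound defaults to 0.
    min: range[1] === '' ? 0 : Number(range[1])
  };
};

// Hypothetical check: throws when the count is outside the quantifier range.
const assertResultCount = (count, expression) => {
  const {max, min} = parseQuantifier(expression);

  if (count < min || count > max) {
    throw new RangeError('Expected ' + expression + ' matches, found ' + count + '.');
  }

  return count;
};

assertResultCount(3, '{0,}') === 3;
```

In this sketch `assertResultCount(2, '{1}')` throws, because two matches fall outside the fixed range of exactly one.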
.title {1}
.title {0,1}
.title {0,}
An accessor expression can be used to return a single item from an array of matches. An accessor expression must follow a quantifier expression.
The default accessor expression value is [0]. The default applies only if a quantifier expression is not specified. If a quantifier expression is specified, then by default all matches are returned.
[n] where n is a zero-based index.
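The default rules for accessors can be summarised in a short sketch. `pickResult` is a hypothetical helper, not Surgeon's API: it takes an array of already-matched values plus a query string, and it only decides which part of the array to return. It performs no validation of the query and does not check the quantifier range.

```javascript
// Hypothetical helper: decides what to return for "selector {quantifier}[accessor]".
const pickResult = (matches, query) => {
  // Group 2 captures the quantifier (if any), group 3 the accessor index (if any).
  const parts = /^(.*?)\s*(\{\d*,?\d*\})?(?:\[(\d+)\])?$/.exec(query);
  const hasQuantifier = parts[2] !== undefined;
  const hasAccessor = parts[3] !== undefined;

  if (hasAccessor) {
    // An explicit accessor returns a single item.
    return matches[Number(parts[3])];
  }

  if (hasQuantifier) {
    // A quantifier without an accessor returns all matches.
    return matches;
  }

  // Neither is specified: the default accessor [0] applies.
  return matches[0];
};

pickResult(['foo', 'bar', 'baz'], '.title {0,}[1]') === 'bar';
```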
.title {1}[0]
An attribute selector is used to select the value of an HTMLElement attribute.
@n where n is the attribute name.
.title@data-id
A property selector is used to select the value of an HTMLElement property.
@.n where n is the property name.
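The difference between the two selector forms comes down to whether the name after `@` starts with a dot. The following sketch shows one way to tell them apart; `parseExtractor` is a hypothetical helper, not Surgeon's parser, and it assumes the extractor is the last part of the query, as in the examples in this document.

```javascript
// Hypothetical parser: splits "selector@name" or "selector@.name" into its parts.
const parseExtractor = (query) => {
  const index = query.lastIndexOf('@');

  if (index === -1) {
    // No extractor: the documented default is the textContent property.
    return {name: 'textContent', selector: query, type: 'property'};
  }

  const target = query.slice(index + 1);
  const isProperty = target.startsWith('.');

  return {
    // Drop the leading dot of a property name.
    name: isProperty ? target.slice(1) : target,
    selector: query.slice(0, index),
    type: isProperty ? 'property' : 'attribute'
  };
};

parseExtractor('.title@data-id').type === 'attribute';
parseExtractor('.title@.textContent').type === 'property';
```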
.title@.textContent
Surgeon can throw several types of errors. Use the instanceof operator to determine the error type.
| Name | Description |
|---|---|
| NotFoundError | Thrown when an attempt is made to retrieve a non-existent attribute or property. |
| UnexpectedResultCountError | Thrown when a quantifier expression is not satisfied. |
| InvalidDataError | Thrown when the resulting data does not pass validation. |
Example:
import {
InvalidDataError
} from 'surgeon';
const document = `
<div class="title">foo</div>
`;
try {
x('.title', /bar/)(document);
} catch (error) {
if (error instanceof InvalidDataError) {
// Handle data validation error.
} else {
throw error;
}
}
Surgeon uses debug to provide additional debugging information.
To enable Surgeon debug output, run the program with the DEBUG=surgeon:* environment variable set.
x-ray is a web scraping library.
The primary difference between Surgeon and x-ray is that Surgeon does not implement an HTTP request layer. I consider this an advantage for the reasons that I have described in the following x-ray issue.