mirror of
https://git.freebsd.org/ports.git
synced 2025-05-21 03:23:10 -04:00
A fast implementation of the HTML 5 parsing spec for Python. Parsing
is done in C using a variant of the gumbo parser. The gumbo parse
tree is then transformed into an lxml tree, also in C, yielding
parse times that can be a thirtieth of the html5lib parse times.
That is a speedup of 30x. This differs, for instance, from the gumbo
Python bindings, where the initial parsing is done in C but the
transformation into the final tree is done in Python.

WWW: https://html5-parser.readthedocs.io/
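
The speedup claim above can be checked locally with a small timing
harness. The following is a minimal sketch, not part of the port: the
helper names (best_time, speedup, available_parsers) and the sample
document are made up for illustration, and it assumes that
html5-parser and html5lib may or may not be installed, so both
imports are optional. Actual ratios depend on the input and the
machine.

```python
import timeit


def best_time(parse_fn, html, repeat=5):
    """Return the best wall-clock time in seconds for one parse_fn(html) call."""
    return min(timeit.repeat(lambda: parse_fn(html), number=1, repeat=repeat))


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def available_parsers():
    """Collect whichever of the two parsers is installed (both are optional)."""
    parsers = {}
    try:
        from html5_parser import parse  # parsing and tree building in C
        parsers["html5-parser"] = parse
    except ImportError:
        pass
    try:
        import html5lib  # pure-Python parser used as the baseline
        parsers["html5lib"] = html5lib.parse
    except ImportError:
        pass
    return parsers


if __name__ == "__main__":
    # Hypothetical sample input: a small page repeated to get measurable times.
    html = "<!DOCTYPE html><title>t</title>" + "<p>Hello <b>world</b></p>" * 2000
    times = {name: best_time(fn, html) for name, fn in available_parsers().items()}
    for name, seconds in times.items():
        print(f"{name}: {seconds:.4f} s")
    if len(times) == 2:
        ratio = speedup(times["html5lib"], times["html5-parser"])
        print(f"speedup: {ratio:.1f}x")
```

A parse time that is a thirtieth of the baseline corresponds to
speedup(30.0, 1.0) == 30.0, which is the "30x" figure quoted above.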