p5-HTML-ExtractContent

0.12_1 www

Perl extension for extracting HTML content with scoring heuristics

HTML::ExtractContent is a module for extracting content from HTML using scoring heuristics. It guesses which block of HTML looks like content by scoring each block on the number of punctuation marks and the length of its non-tag text. It also guesses whether the content ends in the current block or continues into the next one.
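The scoring idea described above can be illustrated with a minimal Python sketch. This is not the module's actual algorithm or API: the function names (score_block, best_block), the list of block-level tags, and the score formula (text length weighted by punctuation count) are all assumptions chosen for illustration, and the guess about whether content continues into the next block is left out.

```python
import re

# Hypothetical block boundaries: split wherever a block-level tag opens or closes.
BLOCK_SPLIT = re.compile(r"</?(?:div|td|p|section|article)\b[^>]*>", re.IGNORECASE)
# Any remaining tag (links, spans, ...) is stripped before scoring.
TAG = re.compile(r"<[^>]+>")
# Punctuation marks that hint at running prose rather than navigation.
PUNCTUATION = re.compile(r"[.,!?;:]")


def score_block(block_html):
    """Return (score, text) for one block; the formula is an assumed heuristic."""
    text = TAG.sub("", block_html).strip()
    if not text:
        return 0.0, text
    punctuation_count = len(PUNCTUATION.findall(text))
    # Longer non-tag text and more punctuation both raise the score.
    return float(len(text) * (1 + punctuation_count)), text


def best_block(html):
    """Return the text of the highest-scoring block in the document."""
    scored = [score_block(block) for block in BLOCK_SPLIT.split(html)]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    page = (
        '<div><a href="/">Home</a> <a href="/about">About</a></div>'
        "<div>This is the article. It has sentences, commas, and more text.</div>"
    )
    print(best_block(page))
```

In this sketch a navigation block made of short link labels scores low because it has little text and no punctuation, while a block of prose scores high on both counts.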

$ pkg install p5-HTML-ExtractContent

metacpan.org/release/HTML-ExtractContent

Origin

www/p5-HTML-ExtractContent

Size

50.3KiB

License

ART10, GPLv1+

Maintainer

perl@FreeBSD.org

Dependencies

4 packages

Required by

0 packages

Dependencies (4)

perl5 p5-HTML-Parser p5-Exporter-Lite p5-Class-Accessor-Lvalue