Perl extension for HTML content extractor with scoring heuristics
HTML::ExtractContent is a module for extracting content from HTML with scoring heuristics. It guesses which block of HTML looks like content by scoring each block according to the number of punctuation marks and the length of its non-tag text. It also guesses whether the content ends in a block or continues into the next block.
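The idea of scoring blocks and deciding whether content continues can be illustrated with a toy sketch. This is not the module's actual algorithm or API, and it is written in Python rather than Perl. The following are all hypothetical choices made for illustration: the block-splitting regex, the `block_score` weighting (text length multiplied by one plus the punctuation count), the `threshold` parameter, and the rule that consecutive high-scoring blocks form one continuing cluster.

```python
import re

# Hypothetical, simplified heuristic for illustration only;
# NOT the real HTML::ExtractContent implementation.

# Split on a few block-level tags (assumed list).
BLOCK_SPLIT = re.compile(r"<\s*/?\s*(?:div|p|td|li|h[1-6]|table|ul|ol|br)\b[^>]*>", re.I)
TAG = re.compile(r"<[^>]*>")
# Western and CJK punctuation marks counted by the score.
PUNCT = re.compile(r"[.,!?;:\u3001\u3002\uff01\uff1f]")


def block_score(text: str) -> float:
    """Score tag-stripped text: longer text with more punctuation scores higher."""
    return len(text) * (1 + len(PUNCT.findall(text)))  # hypothetical weighting


def extract(html: str, threshold: float = 50.0) -> str:
    """Return the text of the highest-scoring run of consecutive content blocks."""
    clusters = []  # list of (total_score, [block texts])
    total, current = 0.0, []
    for block in BLOCK_SPLIT.split(html):
        text = " ".join(TAG.sub(" ", block).split())  # strip tags, normalize spaces
        if not text:
            continue  # empty gaps between adjacent tags do not end a cluster
        score = block_score(text)
        if score >= threshold:
            # Guess that the content continues into this block.
            current.append(text)
            total += score
        else:
            # A low-scoring block (menu, footer) is taken as the end of content.
            if current:
                clusters.append((total, current))
            total, current = 0.0, []
    if current:
        clusters.append((total, current))
    if not clusters:
        return ""
    best = max(clusters, key=lambda c: c[0])
    return "\n".join(best[1])
```

With input such as `<div>Home | About</div><div><p>First paragraph. More text, here.</p><p>It continues, too!</p></div><div>Copyright</div>`, the sketch keeps both paragraphs as one cluster and drops the navigation and footer blocks. These fail the threshold because they are short and have no punctuation.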
$ pkg install p5-HTML-ExtractContent
Origin
www/p5-HTML-ExtractContent
Size
50.3KiB
License
ART10, GPLv1+
Maintainer
perl@FreeBSD.org
Dependencies
4 packages
Required by
0 packages