
p5-HTML-ExtractMain

May 26, 2018

Perl extension to extract main content of a web page

HTML::ExtractMain is a module that takes HTML content and uses the Readability algorithm to detect the main body of the page, usually skipping headers, footers, navigation, etc.

WWW: http://search.cpan.org/dist/HTML-ExtractMain/
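
To give a feel for what "detecting the main body" involves, here is a minimal Python sketch of a text-density heuristic loosely in the spirit of Readability. It is not the module's implementation and not the real Readability algorithm. The function name `extract_main_text`, the tag lists, and the scoring rule (characters of non-link paragraph text per container) are all assumptions made for illustration, and the sketch expects reasonably well-formed HTML.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumption: tags whose contents are treated as page chrome and ignored.
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}
# Assumption: container tags that may hold the main content.
CANDIDATE_TAGS = {"div", "article", "section", "td"}


class _Scorer(HTMLParser):
    """Scores each candidate container by the paragraph text directly inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open candidate containers
        self.skip_depth = 0    # > 0 while inside a SKIP_TAGS element
        self.p_depth = 0       # > 0 while inside a <p>
        self.link_depth = 0    # > 0 while inside an <a>
        self.best_text = ""
        self.best_score = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in CANDIDATE_TAGS:
            self.stack.append({"parts": [], "score": 0})
        elif tag == "p":
            self.p_depth += 1
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in CANDIDATE_TAGS and self.stack:
            node = self.stack.pop()
            if node["score"] > self.best_score:
                self.best_score = node["score"]
                # Collapse runs of whitespace into single spaces.
                self.best_text = " ".join("".join(node["parts"]).split())
        elif tag == "p":
            self.p_depth = max(0, self.p_depth - 1)
            if self.stack:
                self.stack[-1]["parts"].append("\n")  # paragraph separator
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)

    def handle_data(self, data):
        if self.skip_depth or not self.p_depth or not self.stack:
            return
        node = self.stack[-1]  # credit only the nearest open container
        node["parts"].append(data)
        if not self.link_depth:
            # Link text is kept in the output but earns no score.
            node["score"] += len(data.strip())


def extract_main_text(html: str) -> Optional[str]:
    """Return the text of the highest-scoring container, or None if none scored."""
    scorer = _Scorer()
    scorer.feed(html)
    scorer.close()
    return scorer.best_text or None
```

Calling `extract_main_text(page)` on a page whose article sits in a `<div>` between a `<nav>` and a `<footer>` returns the paragraph text of that `<div>`. The real algorithm is considerably more elaborate, so treat this only as a picture of the general approach.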