Py-pdfminer.six

Jul 20, 2023

PDF parser and analyzer

We fathom PDF

Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a tool for extracting information from PDF documents, with a focus on getting and analyzing text data. Pdfminer.six extracts the text of a page directly from the source code of the PDF. It can also be used to get the exact location, font, or color of the text.
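As a rough illustration of reading text together with its location and font, the sketch below walks the layout tree returned by extract_pages and records each character. The helper names char_record and iter_char_records are made up for this example, and the pdfminer.six imports are done inside the function so the file can be loaded on a machine where the package is not installed.

```python
def char_record(text, fontname, size, bbox):
    """Pack one character's details into a plain dict (hypothetical helper)."""
    x0, y0, _x1, _y1 = bbox  # bbox is (x0, y0, x1, y1) in PDF points
    return {
        "text": text,
        "font": fontname,
        "size": round(size, 2),
        "x0": round(x0, 2),
        "y0": round(y0, 2),
    }


def iter_char_records(pdf_path):
    """Yield a record for every character pdfminer.six finds in the PDF."""
    # Lazy imports: pdfminer.six is only required when this function runs.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, curves, images, ...
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # LTAnno objects (inserted spaces/newlines) carry no font or position.
                    if isinstance(obj, LTChar):
                        yield char_record(
                            obj.get_text(), obj.fontname, obj.size, obj.bbox
                        )


if __name__ == "__main__":
    import sys

    for record in iter_char_records(sys.argv[1]):
        print(record)
```

If only the plain text is needed, pdfminer.high_level.extract_text(pdf_path) returns it as a single string without any of the layout walking.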

It is built in a modular way, so each component of pdfminer.six can be replaced easily. You can implement your own interpreter or rendering device that uses the power of pdfminer.six for purposes other than text analysis.
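To make the idea of a replaceable rendering device more concrete, here is a minimal sketch of one that counts characters per page instead of producing text. CharCountingDevice, count_leaves, and page_char_counts are hypothetical names chosen for this example. It assumes that subclassing PDFLayoutAnalyzer and overriding receive_layout is an acceptable extension point, and it defines the class inside the function only so that pdfminer.six is imported lazily.

```python
def count_leaves(item, predicate):
    """Count objects matching `predicate` in a nested, iterable layout tree."""
    if predicate(item):
        return 1
    if getattr(item, "__iter__", None) is None:
        return 0  # a leaf that did not match
    return sum(count_leaves(child, predicate) for child in item)


def page_char_counts(pdf_path):
    """Return the number of characters rendered on each page of the PDF."""
    # Lazy imports: pdfminer.six is only required when this function runs.
    from pdfminer.converter import PDFLayoutAnalyzer
    from pdfminer.layout import LTChar
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    class CharCountingDevice(PDFLayoutAnalyzer):
        """Hypothetical device: tallies characters instead of emitting text."""

        def __init__(self, rsrcmgr):
            # laparams=None skips layout analysis; raw characters are enough here.
            super().__init__(rsrcmgr, laparams=None)
            self.counts = []

        def receive_layout(self, ltpage):
            # Called once per page after the interpreter has rendered it.
            self.counts.append(
                count_leaves(ltpage, lambda obj: isinstance(obj, LTChar))
            )

    rsrcmgr = PDFResourceManager()
    device = CharCountingDevice(rsrcmgr)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    with open(pdf_path, "rb") as fp:
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
    return device.counts


if __name__ == "__main__":
    import sys

    print(page_char_counts(sys.argv[1]))
```

The interpreter drives the device through the same calls whatever the device does with them, which is why a counting device can stand in for a text converter.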

Features

Parse, analyze, and convert PDF documents.
PDF-1.7 specification support (well, almost).
CJK languages and vertical writing scripts support.
Various font types (Type1, TrueType, Type3, and CID) support.
Basic encryption (RC4) support.
Outline (TOC) extraction.
Tagged contents extraction.
Automatic layout analysis.
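As one example of the features listed above, outline (TOC) extraction might look like the sketch below. read_outline and format_outline are hypothetical helpers written for this example. The sketch assumes that the entries produced by PDFDocument.get_outlines start with the nesting level and the title, and that top-level entries have level 1.

```python
def format_outline(entries):
    """Render (level, title) pairs as an indented, plain-text table of contents."""
    lines = []
    for level, title in entries:
        indent = "  " * max(level - 1, 0)  # assumes top-level entries are level 1
        lines.append(f"{indent}{title}")
    return "\n".join(lines)


def read_outline(pdf_path):
    """Return the PDF's outline as a list of (level, title) pairs."""
    # Lazy imports: pdfminer.six is only required when this function runs.
    from pdfminer.pdfdocument import PDFDocument, PDFNoOutlines
    from pdfminer.pdfparser import PDFParser

    with open(pdf_path, "rb") as fp:
        document = PDFDocument(PDFParser(fp))
        try:
            # Consume the outline while the file is still open.
            return [
                (level, title)
                for level, title, *_rest in document.get_outlines()
            ]
        except PDFNoOutlines:
            return []  # the document has no outline at all


if __name__ == "__main__":
    import sys

    print(format_outline(read_outline(sys.argv[1])))
```

Documents without an outline yield an empty list here rather than an error, which keeps the caller simple.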

Check out these related ports:

Zxing-cpp - ZXing C++ Library for QR code recognition
Zu-hunspell - Zulu hunspell dictionaries
Zu-aspell - Aspell Zulu dictionary
Zq - Easier and faster alternative to jq
Zorba - General purpose C++ XQuery processor
Zenxml - Simple C++ XML Processing
Zed - Command-line tool to manage and query Zed data lakes
Yq - Command-line YAML and XML processor, jq wrapper for YAML/XML documents
Yould - Pronounceable word generator
Yodl - Easy to use but powerful document formatting/preparation language
Yi-hunspell - Yiddish hunspell dictionaries
Yi-aspell - Aspell Yiddish dictionary
Yelp-xsl - DocBook XSLT stylesheets for yelp
Yelp-tools - Utilities to help manage documentation for Yelp and the web
Ydiff - Diff readability enhancer for color terminals
