Search: [pdf] - shaarliGor

OCRmyPDF documentation https://ocrmypdf.readthedocs.io/en/latest/

Tue Jul 2 08:53:30 2024

A command-line tool that adds text recognition (OCR) to a PDF. It relies notably on Tesseract.

parse_toc.py https://gist.github.com/tilusnet/407cd845a6b1cb939b34

Thu Jun 2 10:54:45 2016

A Python script that extracts the table of contents of a PDF (provided the PDF is structured, i.e. it contains what Adobe Acrobat rather imperfectly calls bookmarks). It takes the PDF file and the desired depth of the table as parameters.

Works with Python 2.7 and uses pdfminer, which is easy to set up in a virtualenv.
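
As a rough illustration of the idea (not the gist's actual code), the sketch below reads the bookmarks of a PDF and keeps only the entries down to a given depth. It assumes Python 3 with pdfminer.six installed rather than the Python 2.7 setup mentioned above; the helper names filter_outline, format_toc and read_outline are hypothetical.

```python
def filter_outline(entries, max_depth):
    """Keep the (level, title) pairs whose level is at most max_depth."""
    return [(level, title) for level, title in entries if level <= max_depth]


def format_toc(entries, indent="  "):
    """Render (level, title) pairs as indented text; level 1 gets no indent."""
    return "\n".join(indent * (level - 1) + title for level, title in entries)


def read_outline(pdf_path):
    """Read the bookmarks of a PDF as (level, title) pairs.

    Assumption: pdfminer.six is installed. The import is local so the two
    helpers above stay usable without it.
    """
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser

    with open(pdf_path, "rb") as handle:
        document = PDFDocument(PDFParser(handle))
        # get_outlines() yields (level, title, dest, action, se) tuples and
        # raises PDFNoOutlines when the PDF has no bookmarks.
        return [
            (level, title)
            for level, title, _dest, _action, _se in document.get_outlines()
        ]
```

Typical use would be print(format_toc(filter_outline(read_outline("book.pdf"), 2))), where book.pdf is a placeholder file name.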

GitBook Toolchain Documentation http://toolchain.gitbook.com/

Tue Apr 12 11:37:31 2016

The documentation of the GitBook Toolchain, which can be seen as an editorial suite based on Markdown and git.

Besides helping you make better use of the GitBook web service, it explains how to install GitBook locally (not just the GitBook editor, but the gitbook-cli tool).

Convertir des ODT en PDF récursivement http://www.cyrille-borne.com/forum/showthread.php

Thu Dec 10 08:03:53 2015

On Cyrille Borne's forum.

The script:

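#!/bin/bash
# Convert the .odt files found under the given directory to PDF, recursively.
REPER="$1"
CIBLE=".odt"
EXTS="pdf"

# Read directory names NUL-separated so that names with spaces survive.
find "$REPER" -depth -type d -print0 | while IFS= read -r -d '' i ; do
    # Skip directories that contain no .odt file.
    compgen -G "$i/*$CIBLE" > /dev/null || continue
    result_pdf="$(basename "$i")"
    # to use unoconv instead of libreoffice:
    # unoconv -f "$EXTS" "$i"/*"$CIBLE"
    # The PDFs go into a subdirectory named after the directory itself.
    libreoffice --headless --convert-to "$EXTS" --outdir "$i/$result_pdf" "$i"/*"$CIBLE" < /dev/null
done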
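
For comparison, here is a minimal Python sketch of the same recursive walk. It is an illustration, not the forum's script: the function names are hypothetical, it writes each PDF next to its source file rather than into a subdirectory, and it assumes the libreoffice command is on the PATH. By default it only returns the commands it would run.

```python
from pathlib import Path
import subprocess


def find_odt_files(root):
    """Return every .odt file below root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.odt"))


def build_command(odt_path):
    """Build the LibreOffice call that writes the PDF next to the source file."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(odt_path.parent),
        str(odt_path),
    ]


def convert_tree(root, dry_run=True):
    """Convert each .odt under root; with dry_run=True only return the commands."""
    commands = [build_command(path) for path in find_odt_files(root)]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failed conversion.
            subprocess.run(command, check=True)
    return commands
```

Calling convert_tree("some/dir") lists the planned conversions; convert_tree("some/dir", dry_run=False) actually runs them.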

Tabula: Extract Tables from PDFs http://tabula.technology/

Wed Nov 25 09:24:59 2015

A tool for extracting data, in the form of tables, from a PDF. It does not work with scanned documents.

It is not a fully automagic process: the table has to be selected by hand. The selected area can be repeated on the following pages.

In some cases it is a valuable help.

The code is on GitHub.

Hidden online surveillance: What librarians should know to protect their own privacy and that of their patrons | Fortier | Information Technology and Libraries http://ejournals.bc.edu/ojs/index.php/ital/article/view/5495

Fri Sep 25 06:44:25 2015

Abstract

Librarians have a professional responsibility to protect the right to access information free from surveillance. This right is at risk from a new and increasing threat: the collection and use of non-personally identifying information such as IP addresses through online behavioral tracking. This paper provides an overview of behavioral tracking, identifying the risks and benefits, describes the mechanisms used to track this information, and offers strategies that can be used to identify and limit behavioral tracking. We argue that this knowledge is critical for librarians in two interconnected ways. First, librarians should be evaluating recommended websites with respect to behavioral tracking practices to help protect patron privacy; second, they should be providing digital literacy education about behavioral tracking to empower patrons to protect their own privacy online.

pdf : http://ejournals.bc.edu/ojs/index.php/ital/article/download/5495/pdf

coolwanglu/pdf2htmlEX · GitHub https://github.com/coolwanglu/pdf2htmlEX

Mon Aug 10 06:09:28 2015

pdf2htmlEX converts PDFs to HTML.

via http://blog.sciunto.org//posts/don_pdf2html2EX/

GoldenGATE Editor (& Imagine) - Plazi http://plazi.org/wiki/GoldenGATE_Editor

Thu Jun 4 16:18:10 2015

What interests me most on this page is the section under Imagine, which contains a PDF parser that is worth testing!

Antoine de Saint-Exupery | … aux merveilles du domaine public http://www.saintexupery-domainepublic.be/

Tue Mar 31 06:44:22 2015

Digital versions (ePub, PDF, TXT) of Saint-Exupéry's Le Petit Prince, because it is now in the public domain (except in France, see: http://romainelubrique.org/telecharger-petit-prince-en-belgique ).

The Markdown Resume http://mszep.github.io/pandoc_resume/

Sun Aug 10 08:26:07 2014

Everything needed to write your CV in Markdown and convert it to HTML and PDF.

Sources: https://github.com/mszep/pandoc_resume

PDFy - Instant PDF Host https://pdf.yt/

Fri Jun 20 17:44:19 2014

A healthy reaction to the infamous SCRIBD: a way to host a PDF without creating an account. Downloading is always possible, and so is the embed code. Based on pdf.js.

And the source code is public and free: https://github.com/joepie91/pdfy

Inclure des PDF sans plugin sur votre site avec PDF.js (créé par Mozilla) - Influence PC http://influence-pc.fr/14-03-2014-inclure-des-pdf-sur-votre-site-avec-pdf-js-cree-par-mozilla

Fri Mar 14 13:35:41 2014

Embed a PDF reader in your web page.

pdf2json - PDF2JSON is a conversion library specialized in converting PDF to XML and JSON format. - Google Project Hosting https://code.google.com/p/pdf2json/

Wed Feb 19 15:18:44 2014

A library for converting PDF to JSON. GNU GPL2 license.
[it remains to be seen whether the resulting file is usable...]

[PDF] BEAMER appearance cheat sheet (from version 3.26) http://www.cpt.univ-mrs.fr/~masson/latex/Beamer-appearance-cheat-sheet.pdf

Mon Oct 14 13:18:45 2013

A cheat sheet for Beamer (LaTeX).

Tableau de contrôle du SGQRI 008-01 abrégé | AccessibilitéWeb http://accessibiliteweb.com/tableau-de-controle-du-sgqri-008-01-abrege/

Tue Jul 16 06:48:09 2013

A compilation of website accessibility rules, in the form of a PDF quick-reference sheet.

Word 2013 - Fou à lier http://foualier.gregory-thibault.com/?Wcg22A

Fri Jul 5 18:39:42 2013

Personal message to Fou à lier: you must not have looked closely at what LibreOffice does with a PDF. It opens it and you can edit it. I think this has been possible since version 3.6, though I am not certain.
Better: you can make a hybrid PDF, which holds both the PDF and what is needed to reopen it in LibreOffice for editing, as if it were an ODT.
And on top of that, LibreOffice is free software, cross-platform, and so on...

Pandoc http://pandoc.org/

Sun Feb 3 14:09:50 2013

Software that can convert a plain-text, Markdown, Textile, HTML, LaTeX, etc. file into HTML, HTML5, docx, ODT, OpenDocument XML, LaTeX, ePub, PDF, etc.
Impressive.

PDFreaders.org - Procurez vous un lecteur de PDF Libre! http://pdfreaders.org/index.fr.html

Mon Sep 3 13:01:08 2012

Reference Model for an Open Archival Information System (OAIS) http://public.ccsds.org/publications/archive/650x0m2.pdf

Fri Jun 22 13:42:41 2012

Revised version. Published on 14 June 2012.
"This Recommended Practice defines the Reference Model for an Open Archival Information System (OAIS). The current issue includes clarifications to many concepts, in particular, Authenticity with the concept of Transformational Information Property introduced; corrections and improvements in diagrams; addition of Access Rights Information to PDI."

Science Policy Briefings : European Science Foundation http://www.esf.org/publications/science-policy-briefings

Thu Apr 26 09:33:39 2012

Documentation provided by the ESF.
