Seenthis

#python_3

  • @hackernoon
    Hacker Noon @hackernoon CC BY-SA 26/11/2018

    An open-(source, science) tool to extract tables from PDFs into Excels
    ▻https://hackernoon.com/an-open-source-science-tool-to-extract-tables-from-pdfs-into-excels-3ed3

    https://cdn-images-1.medium.com/max/1024/0*8YsOjqB-FQPkCAlY.png

    I originally wrote this post for my website.

    Photo by Patrick Tomasso on Unsplash

    Borrowing the first three paragraphs from my previous blog post since they perfectly explain why extracting tables from PDFs is hard.

    The PDF (Portable Document Format) was born out of The Camelot Project to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically, the goal was to make documents viewable on any display and printable on any modern printer. PDF was built on top of PostScript (a page description language), which had already solved this “view and print anywhere” problem. PDF encapsulates the components required to create a “view and print anywhere” document. These include characters, fonts, graphics and (...)

    • @suske
      Suske @suske 2/12/2018

      #excalibur #python_3 #pip #pdf #tables_de_donnees #tableaux
