An open-(source, science) tool to extract tables from PDFs into Excels
▻https://hackernoon.com/an-open-source-science-tool-to-extract-tables-from-pdfs-into-excels-3ed3
I originally wrote this post for my website. I'm borrowing the first three paragraphs from my previous blog post, since they perfectly explain why extracting tables from PDFs is hard. The PDF (Portable Document Format) was born out of The Camelot Project to create “a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks”. Basically, the goal was to make documents viewable on any display and printable on any modern printer. PDF was built on top of PostScript (a page description language), which had already solved this “view and print anywhere” problem. PDF encapsulates the components required to create a “view and print anywhere” document. These include characters, fonts, graphics and (...)