How to remove a watermark word 10
2/25/2023

I have written code that extracts the text from a PDF file with Python and the PyPDF2 library. The code works well for most documents, but sometimes it returns strange characters. I think that's because the PDF has a watermark over the page, so it does not recognise the text:

    import requests
    import PyPDF2

    # pdf_link is the URL of the PDF; read_pdf is a PyPDF2.PdfFileReader
    # opened on the downloaded file.
    pdf_file_text = 'PDF File: ' + pdf_link + '\n\n'
    for page in range(read_pdf.getNumPages()):
        page_content = read_pdf.getPage(page).extractText()
        page_content = page_content.replace("\n\n\n", "\n").strip()

This is the result that I'm getting:

    #$%˘˘

My question is, how can I fix this problem? Is there a way to remove the watermark from the page, or something like that? Or maybe the problem can be fixed in some other way, and the watermark/logo is not the cause at all?

The garbled text that you're getting has nothing to do with the watermark in the document. Your issue seems to be related to the encoding in the document. PyPDF2 should be able to extract the German characters in your document, because it uses the latin-1 (iso-8859-1) encoding/decoding model. That encoding model isn't working with your PDF.

When I look at the underlying info of your PDF, I note that it was created using these apps:

    '/Creator': 'Acrobat PDFMaker 11 für Excel'

When I look at one of the PDFs in this question, which is also written in German, I note that it was created using different apps.
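To make the encoding point concrete, here is a minimal sketch of how a codec mismatch alone can garble German characters, with no watermark involved. It uses only the Python standard library and does not reproduce PyPDF2's internal code path. The helper name `decode_as` and the sample word are assumptions chosen for illustration.

```python
# Illustration only: how decoding bytes with the wrong codec garbles text.
# This is NOT PyPDF2's internal logic; it only demonstrates the failure mode.

def decode_as(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the given codec, replacing undecodable bytes."""
    return raw.decode(codec, errors="replace")

# Sample word with an umlaut, as in the '/Creator' string quoted in the answer.
word = "für"

latin1_bytes = word.encode("latin-1")  # one byte for 'ü'
utf8_bytes = word.encode("utf-8")      # two bytes for 'ü'

# Matching codec: the umlaut survives the round trip.
round_trip = decode_as(latin1_bytes, "latin-1")

# Mismatched codec: UTF-8 bytes read as latin-1 turn 'ü' into two characters.
garbled = decode_as(utf8_bytes, "latin-1")

print(repr(round_trip))
print(repr(garbled))
```

If the extracted text shows this kind of substitution pattern, the encoding used by the PDF's fonts is the likelier culprit, and removing the watermark would probably not change the output.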