Suramya's Blog : Welcome to my crazy life…

January 21, 2023

Fixing AssertionError: Font Arial,Bold can not represent ‘E’ when using Borb to modify PDF Files

Filed under: Computer Software,Knowledgebase,Tech Related — Suramya @ 12:47 AM

I have a bunch of PDF files that I need to modify to remove text from them. Initially I was using LibreDraw but that was a manual task so I thought that I should script it/Automate it. Little did I know that programmatically editing PDF’s is not that simple. I tried a bunch of libraries such as PyPDF4, pikepdf etc but the only one which worked was borb which is a library by Joris Schellekens. They have a great collection of examples and using that I got my first script that searched and replaced text in the PDF working.

However, when I tried to run the script against my pdf file the script fails with the following error:

Traceback (most recent call last):
  File "/home/suramya/Temp/BorbReplace.py", line 26, in 
    main()
  File "/home/suramya/Temp/BorbReplace.py", line 18, in main
    doc = SimpleFindReplace.sub("Manual", "", doc)
  File "/usr/local/lib/python3.10/dist-packages/borb/toolkit/text/simple_find_replace.py", line 80, in sub
    page.apply_redact_annotations()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/page/page.py", line 271, in apply_redact_annotations
    .read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 290, in read
    raise e
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 284, in read
    operator.invoke(self, operands, event_listeners)
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 271, in invoke
    self._write_chunk_of_text(
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 203, in _write_chunk_of_text
    )._write_text_bytes()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 145, in _write_text_bytes
    return self._write_text_bytes_in_hex()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 160, in _write_text_bytes_in_hex
    assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font Arial,Bold can not represent 'E'

Process finished with exit code 1

I tried a couple of different files and the font name changes but the error remains

The script I was using is:

from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace

import typing

def main():

    # attempt to read a PDF
    doc: typing.Optional[Document] = None
    with open("/home/suramya/Downloads/t/MAA1.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # check whether we actually read a PDF
    assert doc is not None

    # find/replace
    doc = SimpleFindReplace.sub("PRIVATE", "XXXX", doc)

    # store
    with open("/home/suramya/Downloads/t/MAABLR_out.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()

I searched on the web and didn’t find any solutions so I reached out to the project owner and they responded with the following message “Not every font can represent every possible character in every language. you are trying to insert a piece of text that contains a character that Arial can not represent. Maybe some weird kind of “E” (since uppercase E should not be a problem).”. The problem was that I wasn’t trying to replace any strange characters, just a normal uppercase E.

To help trouble shoot, they asked me for a copy of the file. So I was masking the data in the PDF file to share it and the script suddenly started working. Turns out that there was an extra space after the word PRIVATE in the file and when I removed it things started working (even on the unmasked file). So it looks like the issue is caused when there is an encoding issue with the PDF file. Opening it in Libre Draw and exporting as a new PDF file seems to resolve the issue.

Now we are a step closer to the solution, I just need to figure out how to convert the file from the command line and I will be home free. Something to work on when I have had some sleep.

– Suramya

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress