Suramya's Blog : Welcome to my crazy life…

March 6, 2024

Researchers demo the first worm that spreads through LLM prompt injection

Filed under: Artificial Intelligence,Computer Security,Computer Software — Suramya @ 10:17 PM

Over the past year the tech industry has been pushing to embed LLMs (Large Language Models), or ‘AI’ as they are being pitched to the world, in every possible place. Windows 11 now has Copilot built in, and it is extremely hard to disable. Email systems are using LLMs to pull additional details from the contents of an email to add context. All of this creates new attack surfaces, and we have already seen attackers use prompt injection to gain access to restricted data or systems.

Building on that, researchers have now created (and demoed) the first worm that spreads through prompt injection. This is breakthrough work, similar to what the Morris Worm was in the late 80’s. Basically, the researchers created an email with an adversarial prompt embedded in it. When the email is processed, the prompt is ingested by an LLM through Retrieval-Augmented Generation (RAG, a technique meant to make an LLM more reliable by letting it fetch data from external sources, in this case the mailbox). The prompt then jailbreaks the GenAI service and can steal data from the emails (or do whatever else the attacker wants, such as changing email text or removing data). In addition, the prompt can make the email assistant forward the email carrying the malicious prompt to other email addresses, which allows it to spread. The researchers have christened their worm Morris II, in homage to the original Morris Worm, one of the first worms to spread across the Internet.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.
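
The replicate/propagate loop described in the abstract can be pictured with a toy simulation. This is only a sketch and not the researchers' code: no real model or mail system is involved, and the marker string, the contact list and the `mock_assistant` function are all made up for illustration. The "assistant" simply copies any message containing the marker to every contact, which is the behaviour a self-replicating prompt tries to force.

```python
from collections import deque

# Hypothetical stand-in for an adversarial self-replicating prompt.
MARKER = "<<REPLICATE-ME>>"

# Hypothetical address books: who each assistant can forward mail to.
CONTACTS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}


def mock_assistant(owner, email_body):
    """Stand-in for a GenAI email assistant; no real model is called.

    If the incoming text carries the marker, the 'model' copies the text
    into its output (replication) and forwards it to every contact
    (propagation). Otherwise it forwards nothing.
    """
    if MARKER not in email_body:
        return []
    return [(contact, email_body) for contact in CONTACTS[owner]]


def simulate(first_recipient, email_body):
    """Breadth-first spread; returns inboxes in the order they were reached."""
    reached = []
    seen = set()
    queue = deque([(first_recipient, email_body)])
    while queue:
        owner, body = queue.popleft()
        if owner in seen:
            continue  # this inbox has already processed the message once
        seen.add(owner)
        reached.append(owner)
        queue.extend(mock_assistant(owner, body))
    return reached
```

With these made-up contacts, a single email to one inbox reaches every other inbox without anyone clicking anything, while a message without the marker stops at the first recipient.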

This is pretty fascinating work and I think this kind of attack will become more common as LLM usage goes up. The research paper is available at: ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

– Suramya

March 1, 2024

If buying isn’t owning, then piracy isn’t stealing

Filed under: Computer Software,My Thoughts — Suramya @ 12:27 PM

Nowadays it is quite common for folks to move from physical copies of books, movies, TV series etc. to digital copies of the same. I like having digital versions of things because they are easier to store, but that brings a whole lot of problems. Basically, if an electronic item is hosted on a service, or needs an external system to approve your right to open that file/game/book, then you don’t really own it. The service can arbitrarily decide to remove it from their system, alter it without telling you, or reduce functionality and make you pay more for something that you already paid for.

We have so many examples of this happening, such as services removing movies that you bought from your library because they figured it was more cost effective to not renew the rights to the movie. Amazon Music has removed or changed music that was paid for. Amazon has removed books from Kindle when they felt they needed to, in spite of the users having purchased them.

Earlier this week, in example no. 400035 showing that we don’t really own the digital content we ‘bought’, Sony deleted content that they had promised would be there forever, leaving users with little to no recourse to get their content back even though they had paid for it.

Funimation, a Sony-owned streaming service for anime, recently announced that subscribers’ digital libraries on the platform will be unavailable after April 2. For years, Funimation had been telling subscribers that they could keep streaming these digital copies of purchased movies and shows, but qualifying it: “forever, but there are some restrictions.”

But soon, people who may have discarded or lost their physical media or lack a way to play DVDs and Blu-rays won’t have a way to access the digital copies that they were entitled to through their physical copy purchase.

A little while ago Philippe Tremblay, director of subscriptions at Ubisoft made a comment that gamers need to get used to the idea they don’t own their games anymore and embrace digital downloads. This is absolutely ridiculous and should not be normalized. If I own something I should be able to do whatever I want with it. Unfortunately that is not the case because the content is protected by DRM (Digital Rights Management) which is supposed to be a tool to prevent piracy but instead is a tax or punishment for doing the right thing and buying content legally.

If I pirate a movie or a book I can do whatever I want with it: watch it wherever I want, or convert it to another format that is easier for me to consume (mobi->epub for books, as an example). But when I legally buy something, the DRM on it stops me from doing the same thing, as it is a felony for me to remove the DRM so that I can access stuff I paid for in a way that is convenient for me.

Before streaming services and digital stores became popular, at times the only way to get content was to pirate it. To give an example, back in 2007/2008 books by most of the authors I like were not available in India, so if I wanted to read a book I would have to buy it from Amazon and have it shipped to India. At that time Amazon used to charge $10 PER book to ship to India, even if you ordered multiple books, and paying that for a book that cost $7 made absolutely no sense. The same was the case with movies and TV series. With streaming and digital media taking off, I can now legally buy a digital book when it is released or watch a new TV series when it is launched. Now, with this nonsense of deleting stuff that people have bought, we need to start keeping offline copies of all the stuff we buy so that we still have access to what we paid for even when a corporation decides that it is more cost effective to delete/remove access to it.

Source: Techdirt.com: Here We Go Again: Sony Disappears Digital Content That Was Pitched To Customers As ‘Forever’

– Suramya

October 17, 2023

Best Support response times and quality I have seen is from the WordPress Activitypub team

Filed under: Computer Software,My Thoughts,Tech Related — Suramya @ 10:49 PM

I have been using Open Source software since I found out about it back in 1999. At present the majority of the software running on my system is open source, with a few notable exceptions such as Microsoft Word (LibreOffice still has formatting issues), CrossOver by CodeWeavers (which allows me to run Windows software on Linux) and a few games that I don’t get to play enough. This means that I have considerable experience with the support offered by the various open source projects. The support ranges from RTFM, to no response at all, to detailed responses from the team/users.

Out of all the projects that I have reached out to for support, the fastest and most fantastic response has been from Matthias Pfefferle (German Site) of the wordpress-activitypub project. I have raised multiple tickets with the project and have always gotten a quick (fastest response in 2 mins!!!), detailed and helpful response to my questions. Some of the issues I raised required a code fix, and a fix was released within days. I don’t think I have received such a fantastic response even from sites/projects where I am a paying subscriber.

Anyways, we always post about the bad experiences we have, so I think that we should also take the time to post about the fantastic experiences and people we interact with. There is way too much negative news out there, and these small things can help bring a smile to someone’s face and make sure they know that their hard work is appreciated.

If you run a WordPress Blog (self-hosted) you should definitely install this plugin and federate your posts to Mastodon (and the rest of the fediverse).

– Suramya

October 9, 2023

Microsoft AI responds with absolute nonsense when asked about a prominent Cyber Security expert

Filed under: Artificial Intelligence,Computer Software — Suramya @ 11:39 PM

The more I read about the Microsoft implementation of ‘AI’ the more I wonder what on earth are they thinking? Their AI system is an absolute shambles and about 99% of the output is nonsense. See the example below:

Microsoft AI returns absolute nonsense when asked about who Kevin Beaumont is

I did not realise how inaccurate Microsoft’s AI is. It’s really bad. This is just one example – it lists a range of lawsuits I’ve filed, but they’re all fictional – it invented them and made up the citations. It says I gave Microsoft’s data to @briankrebs. It says Krebs is suing me. It says @malwaretech works for me. The list goes on and on. Very eyebrow raising this is being baked into next release of Windows 11 and Office. It will directly harm people who have no knowledge or recourse.

I mean, I can understand if it got one or two facts wrong because the data sources might not be correct, but to get every single detail wrong requires extra skill. The really scary part is that Google AI search is not much better, and both companies are in a race to replace their search engines with AI responses. Microsoft is going a step further and including it as a default option in Windows. I wonder how much of the user data stored on a Windows computer is being used to train these AI engines.

There needs to be an effort to create a search engine that filters out these AI generated responses and websites to go back to the old style search engines that actually returned useful & correct results.

– Suramya

August 29, 2023

Excel holding up the Global Financial System, now with Python support

Filed under: Computer Security,Computer Software,My Thoughts,Tech Related — Suramya @ 1:12 PM

It is both impressive and scary how much of the world’s financial system is being run using Microsoft Excel. Folks have created formulas/macros/scripts/functions etc. in Excel that allow them to generate data that is used to take major financial decisions with real world impact.

In one of my previous companies we actually had a full discussion on how to get an inventory of all the Excel code in use at the company and how to archive it so that we would have backups and version control for it. Unfortunately, I left before much headway was made, but I did learn enough about Excel use to scare me. (Especially since I am not the biggest fan of Microsoft software 😉 )

Now you might ask why so many people are using Excel when there are better tools available in the market and these companies have in-house teams to create custom software for the analysts. I asked the exact same question when I started. I think it is probably because the tool makes it easy for folks to come up with formulas and scripts that get their work done, instead of having to wait for an external team to make the changes that they need.

Now, a few days ago Microsoft made a surprise announcement that going forward they are going to support running Python inside an Excel file. Yikes!! In order to use this functionality you will need to be part of the Microsoft 365 Insider program and then you can type Python code directly into cells using the new =PY() function, which then gets executed in the cloud. From what I have read, this will be enabled by default and needs to be disabled via a registry key.

Since its inception, Microsoft Excel has changed how people organize, analyze, and visualize their data, providing a basis for decision-making for the millions of people who use it each day. Today we’re announcing a significant evolution in the analytical capabilities available within Excel by releasing a Public Preview of Python in Excel. Python in Excel makes it possible to natively combine Python and Excel analytics within the same workbook – with no setup required. With Python in Excel, you can type Python directly into a cell, the Python calculations run in the Microsoft Cloud, and your results are returned to the worksheet, including plots and visualizations.

We already have issues with Excel Macros being used as vectors for malware & viruses, and this just opens a whole new front in that war. Now admins will have to worry about attackers using Python in Excel to infiltrate the organization or to send data outside the org. I can see how it is useful for people working with datasets, and MS is adding this functionality to keep up with other, more powerful tools such as Tableau, but I still feel that this is a bad move.

Another problem that folks are going to face is that now that your Excel sheets have Python programs inside them, how are we supposed to version the code, and how is code review done? Basically this code should be going through the standard SDLC (Software Development Life Cycle) process but won’t. We also need to ensure that all changes are reviewed and monitored to protect against insider attacks, but the way the system is set up this is going to be extremely difficult (we have already seen that with Macros and Formulas etc).
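
A first step towards getting this kind of code under control is simply knowing where it lives. As a rough sketch of such an inventory (covering VBA macros only, not formulas or Python cells), the snippet below walks a folder and lists workbooks that contain a VBA project. It relies on the fact that modern workbooks are ZIP archives which store macros in a part named `xl/vbaProject.bin`; the function names and the list of file extensions are assumptions made for this example.

```python
import zipfile
from pathlib import Path

# Modern Excel workbooks are ZIP archives; VBA macros, when present,
# are stored in a part named xl/vbaProject.bin.
VBA_PART = "xl/vbaProject.bin"
# Assumed set of extensions to scan; adjust for your environment.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xltm"}


def has_vba(workbook: Path) -> bool:
    """Return True if the workbook archive contains a VBA project."""
    try:
        with zipfile.ZipFile(workbook) as archive:
            return VBA_PART in archive.namelist()
    except zipfile.BadZipFile:
        # Legacy .xls files and corrupt or encrypted workbooks are not plain ZIPs.
        return False


def inventory(root: Path) -> list[Path]:
    """List workbooks under root that carry VBA code, sorted by path."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in EXCEL_SUFFIXES
        and has_vba(path)
    )
```

The resulting list could then be fed into whatever archive or version control process the organization already uses; reviewing what the code actually does is a separate problem.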

Let’s see how folks address this risk profile.

– Suramya

August 21, 2023

Workaround for paste not working in Whatsapp web on Firefox

Filed under: Computer Software,Knowledgebase — Suramya @ 10:40 PM

In the latest version of Firefox (116.0.3) something changed, and because of it you can no longer paste anything into the chat window of the Web version of Whatsapp. Ctrl-V, Right-Click -> Paste, nothing works. The issue exists on both Windows and Linux builds of Firefox. I did a search and found that others are facing the same issue, and found a workaround at Superuser.com: Firefox doesn’t allow to paste into WhatsApp web anymore?

To fix it, follow these steps:

  • Go to about:config
  • Search for ‘dom.event.clipboardevents.enabled’ (without the quotes)
  • Set the value to false by double clicking on the entry

The change takes effect immediately. This is not a permanent fix and once Firefox fixes the issue you should revert the change.

– Suramya

June 28, 2023

Please stop shoving ChatGPT Integration into products that don’t need it

I am getting really tired of folks shoving ChatGPT integration into everything whether it makes sense or not. The latest silliness is an electric bike with ChatGPT integration. I understand the desire to integrate GPS/Maps etc in a bike, although personally I would rather use an independent device which would get updates more frequently than the built in GPS where the maps might get updated a few times a year. Unless the maps are getting downloaded live using 3G/4G/whatever. I even understand the desire to integrate voice recognition in the setup so that the user can talk to it. But why on earth do I want/need to have ChatGPT shoved in there?

Based on ChatGPT’s well known tendency to hallucinate there is a good probability that it might decide that you should take a path that is not safe or even dump you into the ocean because it hallucinated that it was the way to go. This is the same thing we saw with Blockchain a few years ago, everything was suddenly on the Blockchain whether it needed to be or not. The sad part is that these folks are going to make a ton of money because of the hype behind ChatGPT and then bail leaving the consumers with a sub-par bike that hallucinates.

Source: Urtopia Unveils the World’s First Smart E-Bike with ChatGPT Integration at EUROBIKE 2023

– Suramya

February 27, 2023

It is now possible to put undetectable Backdoors in Machine Learning Models

Filed under: Computer Software,Emerging Tech,My Thoughts,Tech Related — Suramya @ 10:18 PM

Machine Learning (ML) has become the new go-to buzzword in the Tech world in the last few years and everyone seems to be focusing on how they can include ML/AI in their products, regardless of whether it makes sense or not. One of the biggest dangers of this trend is that we are moving towards a future where an algorithm has the power to make decisions with real world impact, but is so complex that it is impossible to audit/check the system for errors/bugs, non-obvious biases or signs of manipulation. For example, we have had cases where the wrong person was identified as a fugitive and arrested because an AI/ML system claimed that they matched the suspect. Others have used ML to try to predict crimes with really low accuracy, but people take it as gospel because the computer said so…

With ML models becoming more and more popular there is also more research on how these models are vulnerable to attacks. In December 2022 researchers (Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan and Or Zamir) from UC Berkeley, MIT and Princeton published a paper titled “Planting Undetectable Backdoors in Machine Learning Models” in the IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), where they discuss how it would be possible to train a model in a way that allows an attacker to manipulate the results without being detected by any computationally-bounded observer.

Abstract: Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns of trust. This work studies possible abuses of power by untrusted learners. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate “backdoor key,” the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees.

First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given query access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Moreover, even if the distinguisher can request backdoored inputs of its choice, they cannot backdoor a new input, a property we call non-replicability.

Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm (Rahimi, Recht; NeurIPS 2007). In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is “clean” or contains a backdoor. The backdooring algorithm executes the RFF algorithm faithfully on the given training data, tampering only with its random coins. We prove this strong guarantee under the hardness of the Continuous Learning With Errors problem (Bruna, Regev, Song, Tang; STOC 2021). We show a similar white-box undetectable backdoor for random ReLU networks based on the hardness of Sparse PCA (Berthet, Rigollet; COLT 2013).

Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, by constructing undetectable backdoor for an “adversarially-robust” learning algorithm, we can produce a classifier that is indistinguishable from a robust classifier, but where every input has an adversarial example! In this way, the existence of undetectable backdoors represent a significant theoretical roadblock to certifying adversarial robustness.

Basically they are talking about having an ML model that works correctly most of the time but allows the attacker to manipulate the results if they want. One example use case would be something like the following: a bank uses an ML model to decide if they should give out a loan to an applicant, and because they don’t want to be accused of being discriminatory they give it to folks to test and validate, and the model comes back clean. However, unknown to the testers, the model has been backdoored using the techniques in the paper above, so the bank can modify the output in certain cases to deny a loan application even though the applicant would have qualified. Since the model was tested and ‘proven’ to be without bias they are in the clear, as the backdoor is pretty much undetectable.
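
To make the idea concrete, here is a toy sketch in the spirit of the paper's first (signature-based) construction. It is not the paper's algorithm: it swaps in an HMAC from the Python standard library as a stand-in for a digital signature, the "loan model" is a made-up rule rather than a trained classifier, and appending a 32-byte tag is hardly a slight perturbation. Anyone reading this code can also see the check, so it only illustrates the black-box behaviour: without the key, finding an input where the two models disagree would mean forging a valid tag.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def clean_model(application: bytes) -> str:
    """Toy 'loan model': a made-up rule, not a trained classifier."""
    return "approve" if sum(application) % 2 == 0 else "deny"


def make_backdoored_model(key: bytes):
    """Wrap clean_model so inputs carrying a valid tag get the opposite label."""

    def model(application: bytes) -> str:
        body, tag = application[:-TAG_LEN], application[-TAG_LEN:]
        expected = hmac.new(key, body, hashlib.sha256).digest()
        if len(tag) == TAG_LEN and hmac.compare_digest(tag, expected):
            # Backdoor path: flip whatever the clean model says about the body.
            return "deny" if clean_model(body) == "approve" else "approve"
        # Normal path: behave exactly like the clean model.
        return clean_model(application)

    return model


def activate(key: bytes, body: bytes) -> bytes:
    """Only the key holder can build an input that triggers the flip."""
    return body + hmac.new(key, body, hashlib.sha256).digest()
```

On every input that does not end in a valid tag the wrapped model returns exactly what the clean one does, which is why ordinary testing would not reveal a difference.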

Another possible attack vector is that a nation state funds a company that trains ML models and has them insert a covert backdoor in the model, then they have the ability to manipulate the output from the model without any trace. Imagine if this model was used to predict if the nation state was going to attack or not. Even if they were going to attack they could use the backdoor to fool the target into thinking that all was well.

Having a black box making such decisions is what I would call a “Bad Idea”. At least with the old (non-ML) algorithms we could audit the code to see if there were issues; with ML that is not really possible, and thus this becomes a bigger threat. There are a million other such scenarios that could play out, and if we put blind trust in an AI/ML system then we are setting ourselves up for a disaster that we would never see coming.

Source: Schneier on Security: Putting Undetectable Backdoors in Machine Learning Models

– Suramya

February 21, 2023

Fixing problems with nvidia-driver on Debian Unstable after latest upgrade

Filed under: Computer Software,Linux/Unix Related,Tech Related — Suramya @ 10:54 PM

Earlier today I ran my periodic update of my main desktop that is running Debian Unstable. The upgrade finished successfully and since a new kernel was released with this update I restarted the system to ensure that all files/services etc are running the same version. After the reboot the GUI refused to start and I thought the problem could be because of a NVIDIA kernel module issue so I tried to reboot to an older kernel but that didn’t work either. Then I tried running apt-get dist-upgrade again which gave me the following error:

root@StarKnight:~# apt-get dist-upgrade 
Reading package lists...
Building dependency tree...
Reading state information...
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 nvidia-driver : Depends: nvidia-kernel-dkms (= 525.85.12-1) but 515.86.01-1 is installed or
                          nvidia-kernel-525.85.12 or
                          nvidia-open-kernel-525.85.12 or
                          nvidia-open-kernel-525.85.12
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

So I ran the apt --fix-broken install command as recommended and that failed as well with another set of errors:

root@StarKnight:/var/log# apt --fix-broken install
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Correcting dependencies... Done
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
dpkg: dependency problems prevent configuration of nvidia-driver:
 nvidia-driver depends on nvidia-kernel-dkms (= 525.85.12-1) | nvidia-kernel-525.85.12 | nvidia-open-kernel-525.85.12 | nvidia-open-kernel-525.85.12; however:
  Version of nvidia-kernel-dkms on system is 515.86.01-1.
  Package nvidia-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.

dpkg: error processing package nvidia-driver (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

Looking at the logs, I didn’t see any major errors but I did see the following message:

2023-02-21T19:48:27.668268+05:30 StarKnight kernel: [    3.379006] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.86.01  Wed Oct 26 09:12:38 UTC 2022
2023-02-21T19:48:27.668286+05:30 StarKnight kernel: [    4.821755] NVRM: API mismatch: the client has the version 525.85.12, but
2023-02-21T19:48:27.668287+05:30 StarKnight kernel: [    4.821755] NVRM: this kernel module has the version 515.86.01.  Please
2023-02-21T19:48:27.668287+05:30 StarKnight kernel: [    4.821755] NVRM: make sure that this kernel module and all NVIDIA driver
2023-02-21T19:48:27.668288+05:30 StarKnight kernel: [    4.821755] NVRM: components have the same version.

Searching on the web didn’t give me a solution but since I am running the Debian Unstable branch it is expected that once in a while things might break and sometimes they break quite spectacularly… So I started experimenting and tried removing and reinstalling the nvidia-driver but that was failing as well because the package was expecting nvidia-kernel-dkms version 525.85.12 but we had 515.86.01-1 installed.

root@StarKnight:~# apt-get install nvidia-driver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  nvidia-driver
0 upgraded, 1 newly installed, 0 to remove and 14 not upgraded.
Need to get 0 B/494 kB of archives.
After this operation, 1,398 kB of additional disk space will be used.
Selecting previously unselected package nvidia-driver.
(Reading database ... 439287 files and directories currently installed.)
Preparing to unpack .../nvidia-driver_525.85.12-1_amd64.deb ...
Unpacking nvidia-driver (525.85.12-1) ...
dpkg: dependency problems prevent configuration of nvidia-driver:
 nvidia-driver depends on nvidia-kernel-dkms (= 525.85.12-1) | nvidia-kernel-525.85.12 | nvidia-open-kernel-525.85.12 | nvidia-open-kernel-525.85.12; however:
  Version of nvidia-kernel-dkms on system is 515.86.01-1.
  Package nvidia-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.
  Package nvidia-open-kernel-525.85.12 is not installed.
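
The root cause visible in the output above is that nvidia-driver and nvidia-kernel-dkms ended up at different versions. As a small sketch of how that check could be scripted, the snippet below parses "package version" lines and reports whether the versions differ. The function names are made up for this example, and `query_installed` assumes a Debian-based system where `dpkg-query` is available.

```python
import subprocess

PACKAGES = ["nvidia-driver", "nvidia-kernel-dkms"]


def parse_versions(dpkg_output: str) -> dict[str, str]:
    """Parse 'package version' lines into a dict, skipping incomplete lines."""
    versions = {}
    for line in dpkg_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def mismatched(versions: dict[str, str]) -> bool:
    """True when the listed packages are not all at one identical version."""
    return len(set(versions.values())) > 1


def query_installed() -> str:
    """Ask dpkg for the installed versions (Debian-based systems only)."""
    result = subprocess.run(
        # dpkg-query expands the \n escape in the format string itself.
        ["dpkg-query", "-W", "-f=${Package} ${Version}\\n", *PACKAGES],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `mismatched(parse_versions(query_installed()))` after an upgrade, and before rebooting, would flag this situation early.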

Now I had a few options. The first was to wait a couple of days (if I was lucky) for someone to upload the correct versions of the packages to the channel. The second was to remove the package and install the Open Source version of the Nvidia driver. I didn’t want to do that because that package is a memory hog and doesn’t work that well either. The last option was to manually install the matching version (525.85.12) of the nvidia-kernel-dkms package, and this is what I decided to go with. A search on the Debian Packages site gave me the .deb files for nvidia-kernel-dkms and firmware-nvidia-gsp (a dependency of the dkms package). I downloaded both packages and installed them using the following commands:

root@StarKnight:/home/suramya/Media/Downloads# dpkg -i firmware-nvidia-gsp_525.85.12-1_amd64.deb 
root@StarKnight:/home/suramya/Media/Downloads# dpkg -i nvidia-kernel-dkms_525.85.12-1_amd64.deb 

Once the packages were successfully installed I rebooted the system and the GUI came up without issues.

Moral of the story is that you need to be prepared to have to troubleshoot your setup if you are running Debian Unstable or Debian Testing on your system. If you don’t want to do that then you should stick to Debian Stable which is rock solid or one of the other distributions such as Ubuntu or Linux Mint etc.

– Suramya

January 21, 2023

Fixing AssertionError: Font Arial,Bold can not represent ‘E’ when using Borb to modify PDF Files

Filed under: Computer Software,Knowledgebase,Tech Related — Suramya @ 12:47 AM

I have a bunch of PDF files that I need to modify to remove text from them. Initially I was using LibreOffice Draw but that was a manual task, so I thought that I should script/automate it. Little did I know that programmatically editing PDFs is not that simple. I tried a bunch of libraries such as PyPDF4, pikepdf etc. but the only one which worked was borb, a library by Joris Schellekens. They have a great collection of examples and using those I got my first script that searched and replaced text in the PDF working.

However, when I ran the script against my PDF file it failed with the following error:

Traceback (most recent call last):
  File "/home/suramya/Temp/BorbReplace.py", line 26, in <module>
    main()
  File "/home/suramya/Temp/BorbReplace.py", line 18, in main
    doc = SimpleFindReplace.sub("Manual", "", doc)
  File "/usr/local/lib/python3.10/dist-packages/borb/toolkit/text/simple_find_replace.py", line 80, in sub
    page.apply_redact_annotations()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/page/page.py", line 271, in apply_redact_annotations
    .read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 290, in read
    raise e
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/canvas_stream_processor.py", line 284, in read
    operator.invoke(self, operands, event_listeners)
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 271, in invoke
    self._write_chunk_of_text(
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/redacted_canvas_stream_processor.py", line 203, in _write_chunk_of_text
    )._write_text_bytes()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 145, in _write_text_bytes
    return self._write_text_bytes_in_hex()
  File "/usr/local/lib/python3.10/dist-packages/borb/pdf/canvas/layout/text/chunk_of_text.py", line 160, in _write_text_bytes_in_hex
    assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font Arial,Bold can not represent 'E'

Process finished with exit code 1

I tried a couple of different files and the font name changes but the error remains.

The script I was using is:

from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace

import typing

def main():

    # attempt to read a PDF
    doc: typing.Optional[Document] = None
    with open("/home/suramya/Downloads/t/MAA1.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    # check whether we actually read a PDF
    assert doc is not None

    # find/replace
    doc = SimpleFindReplace.sub("PRIVATE", "XXXX", doc)

    # store
    with open("/home/suramya/Downloads/t/MAABLR_out.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()

I searched on the web and didn’t find any solutions so I reached out to the project owner and they responded with the following message “Not every font can represent every possible character in every language. you are trying to insert a piece of text that contains a character that Arial can not represent. Maybe some weird kind of “E” (since uppercase E should not be a problem).”. The problem was that I wasn’t trying to replace any strange characters, just a normal uppercase E.

To help troubleshoot, they asked me for a copy of the file. While I was masking the data in the PDF file so that I could share it, the script suddenly started working. Turns out that there was an extra space after the word PRIVATE in the file and when I removed it things started working (even on the unmasked file). So it looks like the problem is caused by an encoding issue in the PDF file. Opening it in LibreOffice Draw and exporting it as a new PDF file seems to resolve the issue.

Now we are a step closer to the solution, I just need to figure out how to convert the file from the command line and I will be home free. Something to work on when I have had some sleep.
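
One possible starting point for that command-line step is LibreOffice's headless mode, which can convert files without opening the GUI. The sketch below is untested against these particular files and is only an assumption about what might work: it builds a `soffice --headless --convert-to pdf` command and writes the result to a separate output directory. Whether LibreOffice picks the Draw PDF import on its own for a PDF input, or needs an explicit `--infilter` option, is not verified here, and the helper names are made up for this example.

```python
import subprocess
from pathlib import Path


def build_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build a LibreOffice headless command that re-exports pdf into out_dir."""
    return [
        "soffice",  # LibreOffice launcher; may be 'libreoffice' on some systems
        "--headless",  # run without the GUI
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(pdf),
    ]


def reexport(pdf: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path the output is expected at."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(pdf, out_dir), check=True)
    # LibreOffice names the output after the input file, with the new extension.
    return out_dir / pdf.with_suffix(".pdf").name
```

The output directory should be different from the input directory, otherwise the converted file would have the same path as the original.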

– Suramya
