Suramya's Blog : Welcome to my crazy life…

February 6, 2025

A Linux Distribution which runs directly within a PDF file

There is a semi-serious joke in the IT industry that anything that can compute is eventually used to play Doom and then run Linux. Now you can do both from inside a PDF file. Since the PDF specification supports JavaScript, a high school student who goes by the handle ‘ading2210’ has implemented a RISC-V emulator in it which can run a barebones Linux distribution within the PDF file itself. This builds on top of the work done to get Doom to run inside a PDF file.

The full specification for the JS in PDFs was only ever implemented by Adobe Acrobat, and it contains some ridiculous things like the ability to do 3D rendering, make HTTP requests, and detect every monitor connected to the user’s system. However, on Chromium and other browsers, only a tiny subset of this API was ever implemented, due to obvious security concerns. With this, we can do whatever computation we want, just with some very limited IO.

C code can be compiled to run within a PDF using an old version of Emscripten that targets asm.js instead of WebAssembly. With this, I can compile a modified version of the TinyEMU RISC-V emulator to asm.js, which can be run within the PDF. For the input and output, I reused the same display code that I used for DoomPDF. It works by using a separate text field for each row of pixels in the screen, whose contents are set to various ASCII characters. For inputs, there is a virtual keyboard implemented with a bunch of buttons, and a text box you can type in to send keystrokes to the VM.
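The row-per-text-field display described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code: the character ramp, the function names and the tiny sample frame are all assumptions made for the example.

```python
# Hypothetical brightness ramp, darkest to brightest; the real project
# may use a different character set.
RAMP = " .:-=+*#%@"


def row_to_text(row):
    """Map one row of 0-255 grayscale pixels to a string of ASCII characters."""
    last = len(RAMP) - 1
    return "".join(RAMP[p * last // 255] for p in row)


def frame_to_fields(frame):
    """Return one string per pixel row, standing in for one PDF text field per row."""
    return [row_to_text(row) for row in frame]


# A made-up 4x2 frame: a gradient and its mirror image.
frame = [
    [0, 64, 128, 255],
    [255, 128, 64, 0],
]
fields = frame_to_fields(frame)
for line in fields:
    print(line)
```

In the PDF each of these strings would be written into its own text field, so a full frame is redrawn by updating one field per row.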

The largest problem here is with the emulator’s performance. For example, the Linux kernel takes about 30-60 seconds to boot up within the PDF, which over 100x slower than normal. Unfortunately, there’s no way to fix this, since the version of V8 that Chrome’s PDF engine uses has its JIT compiler disabled, destroying its performance.

For the root filesystem, there are both 64 and 32 bit versions possible. The default is a 32 bit buildroot system (which was prebuilt and taken from the original TinyEMU examples), and also a 64 bit Alpine Linux system. The 64 bit emulator is about twice as slow however, so it’s normally not used.

You can try out the implementation of LinuxPDF here. More details of the project and the code used to create it are available on the project’s GitHub page.

– Suramya

January 22, 2025

ELIZA Resurrected using original code after 60 years

If you have been following the AI chatbot news, then you will have heard the name ELIZA come up. ELIZA was the world’s first chatbot, created over 60 years ago by MIT professor Joseph Weizenbaum, and was the first language model which a user could interact with. It had a significant impact on the AI world (actual AI research, not the LLM wannabe AI we have right now) and was the first to attempt the Turing test. It was originally written in a programming language invented by Weizenbaum called the Michigan Algorithm Decoder Symmetric List Processor (MAD-SLIP), and the pattern matching directives were provided as separate scripts. Shortly after the initial release it was rewritten in LISP, and that version went viral. Unfortunately, the original MAD-SLIP code went missing soon after that and was only found again recently.

One of the most famous ELIZA scripts was called Doctor that emulated a psychotherapist of the Rogerian school (in which the therapist often reflects back the patient’s words to the patient). Much to his surprise Weizenbaum found that folks attributed human-like feelings to the computer program. Wikipedia explains how the software worked:

ELIZA starts its process of responding to an input by a user by first examining the text input for a “keyword”.[5] A “keyword” is a word designated as important by the acting ELIZA script, which assigns to each keyword a precedence number, or a RANK, designed by the programmer.[15] If such words are found, they are put into a “keystack”, with the keyword of the highest RANK at the top. The input sentence is then manipulated and transformed as the rule associated with the keyword of the highest RANK directs.[20] For example, when the DOCTOR script encounters words such as “alike” or “same”, it would output a message pertaining to similarity, in this case “In what way?”,[4] as these words had high precedence number. This also demonstrates how certain words, as dictated by the script, can be manipulated regardless of contextual considerations, such as switching first-person pronouns and second-person pronouns and vice versa, as these too had high precedence numbers. Such words with high precedence numbers are deemed superior to conversational patterns and are treated independently of contextual patterns.[citation needed]

Following the first examination, the next step of the process is to apply an appropriate transformation rule, which includes two parts: the “decomposition rule” and the “reassembly rule”.[20] First, the input is reviewed for syntactical patterns in order to establish the minimal context necessary to respond. Using the keywords and other nearby words from the input, different disassembly rules are tested until an appropriate pattern is found. Using the script’s rules, the sentence is then “dismantled” and arranged into sections of the component parts as the “decomposition rule for the highest-ranking keyword” dictates. The example that Weizenbaum gives is the input “You are very helpful”, which is transformed to “I are very helpful”. This is then broken into (1) empty (2) “I” (3) “are” (4) “very helpful”. The decomposition rule has broken the phrase into four small segments that contain both the keywords and the information in the sentence.[20]

The decomposition rule then designates a particular reassembly rule, or set of reassembly rules, to follow when reconstructing the sentence.[5] The reassembly rule takes the fragments of the input that the decomposition rule had created, rearranges them, and adds in programmed words to create a response. Using Weizenbaum’s example previously stated, such a reassembly rule would take the fragments and apply them to the phrase “What makes you think I am (4)”, which would result in “What makes you think I am very helpful?”. This example is rather simple, since depending upon the disassembly rule, the output could be significantly more complex and use more of the input from the user. However, from this reassembly, ELIZA then sends the constructed sentence to the user in the form of text on the screen.
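The keyword, decomposition and reassembly steps described above can be sketched roughly as follows. This is a toy Python illustration, not the original MAD-SLIP code or the real DOCTOR script: the rule table, the RANK values, the pronoun list and the fallback reply are all assumptions made up for the example.

```python
import re

# Pronoun substitutions applied before decomposition (illustrative subset).
SWAPS = {"you": "I", "i": "you", "my": "your", "your": "my"}

# Hypothetical mini-script: keyword -> (RANK, decomposition pattern, reassembly template).
# "{4}" in a template refers to the fourth decomposed segment.
RULES = {
    "alike": (10, r"(.*)", "In what way?"),
    "same": (10, r"(.*)", "In what way?"),
    "i": (0, r"(.*)\b(I) (are) (.*)", "What makes you think I am {4}?"),
}


def respond(text):
    """Return a reply built from the highest-ranked keyword rule that matches."""
    words = re.findall(r"[A-Za-z']+", text)
    swapped = [SWAPS.get(w.lower(), w) for w in words]
    sentence = " ".join(swapped)
    # Build the "keystack": keywords found in the input, highest RANK first.
    keystack = sorted(
        (w.lower() for w in swapped if w.lower() in RULES),
        key=lambda k: RULES[k][0],
        reverse=True,
    )
    for keyword in keystack:
        _, pattern, template = RULES[keyword]
        match = re.fullmatch(pattern, sentence)
        if match:
            # Index 0 is unused so that segment numbers start at 1.
            segments = ("",) + match.groups()
            return re.sub(r"\{(\d)\}", lambda m: segments[int(m.group(1))], template)
    return "Please go on."  # placeholder fallback when no keyword applies


print(respond("You are very helpful"))
print(respond("You are the same as me"))
```

The first call follows Weizenbaum’s example: “You” becomes “I”, the sentence is split into four segments, and segment (4) is slotted into the reassembly template. The second call contains two keywords, and the higher-ranked “same” wins.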

Now, after over 60 years, the original code written in MAD-SLIP has been resurrected by Jeff Shrager, a cognitive scientist at Stanford University, and Myles Crowley, an MIT archivist, who found it among Weizenbaum’s papers back in 2021. That is when they started working on getting the code to run, which was a significant effort. They first created an emulator that approximated the computers available in the 1960s and then cleaned up the original 420-line ELIZA code to get it to work. On 12th Jan they published a paper, ELIZA Reanimated: The world’s first chatbot restored on the world’s first time sharing system, in which they explain the whole process.

ELIZA, created by Joseph Weizenbaum at MIT in the early 1960s, is usually considered the world’s first chatbot. It was developed in MAD-SLIP on MIT’s CTSS, the world’s first time-sharing system, on an IBM 7094. We discovered an original ELIZA printout in Prof. Weizenbaum’s archives at MIT, including an early version of the famous DOCTOR script, a nearly complete version of the MAD-SLIP code, and various support functions in MAD and FAP. Here we describe the reanimation of this original ELIZA on a restored CTSS, itself running on an emulated IBM 7094. The entire stack is open source, so that any user of a unix-like OS can run the world’s first chatbot on the world’s first time-sharing system.

You can try it out: here.

– Suramya

January 21, 2025

Getting my NVIDIA card working after breaking it again with the latest updates

Filed under: Computer Software,Knowledgebase,Linux/Unix Related — Suramya @ 11:10 AM

NVIDIA doesn’t have the best history with Linux, as its cards historically didn’t work well with it. But over the past few years things have been changing, and at least in my experience they had reached a point where the cards worked without major issues. As some of you know, I use the unstable version of Debian, primarily because it has the newest versions of software available, but the downside of using it is that things break and sometimes they break spectacularly.

This time there was an issue with the NVIDIA driver/configuration which caused my system to stop opening the GUI login interface when I restarted the system. I tried reinstalling the driver as the error messages in the log suggested that the issue was caused by a missing driver. I purged the nvidia drivers by issuing the following command as root:

apt purge '*nvidia*'

Then I reinstalled the drivers using the following command:

apt-get install nvidia-detect nvidia-driver

After this reinstall the driver was being detected correctly but the GUI still wasn’t coming up. A search on the net didn’t return many useful results, but one of the sites mentioned that running nvidia-xconfig recreates the X configuration file for NVIDIA cards, so I tried that by running the following commands as root:

apt-get install nvidia-xconfig
nvidia-xconfig

This created the configuration file, and once I rebooted everything started working again. I did have to reconfigure my desktop, since one of the things I had tried was resetting all my custom KDE configurations, but that was mostly a minor issue.

This issue was on Kernel 6.12.9-amd64 with the Debian Unstable release as of 17th Jan 2025.

– Suramya

January 3, 2025

Playing Doom to solve a CAPTCHA

Filed under: Computer Software,Interesting Sites — Suramya @ 10:48 AM

I guess traditional CAPTCHAs are getting too easy for LLMs and humans to solve, so Guillermo Rauch decided to create a CAPTCHA that lets you play DOOM® to prove that you’re human.

The project works by leveraging Emscripten to compile a minimal port of Doom to WebAssembly and enable intercommunication between the C-based game runloop (g_game.c) and the JavaScript-based CAPTCHA UI.

Some extensions were made to the game to introduce relevant events needed for its usage in the context of a CAPTCHA.

It is actually a fun implementation of the game and while I doubt it will gain widespread usage it is an interesting proof of concept.

– Suramya

September 26, 2024

Python in Excel launched for all Office 365 Business and Enterprise users

Filed under: Computer Security,Computer Software,My Thoughts,Tech Related — Suramya @ 10:35 PM

Excel is both a blessing and a bane for companies. Because of its capabilities, folks have created formulas/macros/scripts/functions etc. in Excel that allow them to generate data that is used to make major financial decisions with real-world impact. But that capability also makes it an ideal vector for infiltrating an organization, using macros or scripts in Excel files to compromise systems.

Back in Aug 2023, Microsoft first announced that they are going to support running Python inside an Excel file. After that there was no major talk about it so I had hoped this meant that they had abandoned the project, but sadly I was mistaken. Redmond announced the official release of Python in Excel for Windows users of Microsoft 365 Business and Enterprise in a blog post. The post has a lot of details on the new capabilities this gives to power users and frankly I can see why folks are excited about it. But from a security and version control point of view this is a disaster waiting to happen.

There is a new learning series available for free for 30 days on LinkedIn that incorporates numerous examples, tutorials, and tips on how to best leverage Python in Excel.

Included in the Excel for Python release is a large language model integration that will allow Excel users to ask the Copilot to build scripts for them with plain language commands.

Microsoft partnered with data science tool maker Anaconda to develop the Python-Excel integration. As we’ve previously reported, data can move effortlessly between the two platforms using a few custom-defined functions.

This two-way function sending is a key part of security – Microsoft states Python processes Excel data without revealing the user’s identity, and all Python code runs in a secure, isolated environment, only accessing libraries approved by Anaconda.

As with all the stuff MS has released recently, this also has LLM Integration but is on a very restricted list. The service is available to all Office 365 users with a valid Enterprise or Business Microsoft 365 subscription on the Current Channel.

Source: The Register: Python in Excel is here, but only for certain Windows users

– Suramya

August 30, 2024

Admiral Grace Hopper’s NSA Lecture from 1982 on Future Possibilities: Data, Hardware, Software, and People

Filed under: Computer Software,Tech Related — Suramya @ 6:05 PM

Grace Hopper is one of the founders of Programming languages and was the first person to devise the theory of machine-independent programming languages which she then used to develop the FLOW-MATIC programming language and COBOL. She had a phenomenal impact on the field of Computer Science/Engineering and her lectures are extremely interesting to watch as even after 40 years the concepts she talks about are still relevant. The NSA has finally released the video recording of a 1982 lecture by Adm. Grace Hopper titled “Future Possibilities: Data, Hardware, Software, and People.”

Initially they refused to do so because “With digital obsolescence threatening many early technological formats, the dilemma surrounding Admiral Hopper’s lecture underscores the critical need for and challenge of digital preservation. This challenge transcends the confines of NSA’s operational scope. It is our shared obligation to safeguard such pivotal elements of our nation’s history, ensuring they remain within reach of future generations. While the stewardship of these recordings may extend beyond the NSA’s typical purview, they are undeniably a part of America’s national heritage.”.

Thankfully, after a massive push from people all over the world to get the NSA to release the video, saner minds prevailed and the entirety of the lecture has been released in two parts. You can watch them below:


Capt. Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part One, 1982)


Capt. Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part Two, 1982)

Since I don’t trust online systems to keep information available indefinitely, I have also archived the lectures on my system so if they disappear in the future I will have copies I can publish.

– Suramya

August 27, 2024

MIT Researchers publish AI risk database exposing 700+ ways AI can be risky

Filed under: Artificial Intelligence,Computer Software,My Thoughts — Suramya @ 10:44 AM

AI (or rather what is called AI right now) is not really intelligent, but it does have a lot of risks associated with using it. We all know about deepfakes, hallucinations etc., but those are not the only risks of using generative AI. Researchers at MIT have cataloged over 700 risks of using generative AI.

The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference.

This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via our website and online spreadsheets. We construct our Repository with a systematic review of taxonomies and other structured classifications of AI risk followed by an expert consultation. We develop our taxonomies of AI risk using a best-fit framework synthesis. Our high-level Causal Taxonomy of AI Risks classifies each risk by its causal factors (1) Entity: Human, AI; (2) Intentionality: Intentional, Unintentional; and (3) Timing: Pre-deployment; Post-deployment. Our mid-level Domain Taxonomy of AI Risks classifies risks into seven AI risk domains: (1) Discrimination & toxicity, (2) Privacy & security, (3) Misinformation, (4) Malicious actors & misuse, (5) Human-computer interaction, (6) Socioeconomic & environmental, and (7) AI system safety, failures, & limitations. These are further divided into 23 subdomains. The AI Risk Repository is, to our knowledge, the first attempt to rigorously curate, analyze, and extract AI risk frameworks into a publicly accessible, comprehensive, extensible, and categorized risk database. This creates a foundation for a more coordinated, coherent, and complete approach to defining, auditing, and managing the risks posed by AI systems.
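To make the structure of the two taxonomies a little more concrete, here is a minimal Python sketch of how entries classified this way could be stored and filtered. The field names mirror the categories listed above, but the class, the helper function and the three sample rows are made-up placeholders and are not taken from the actual repository or its spreadsheets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    entity: str          # Causal Taxonomy: "Human" or "AI"
    intentionality: str  # Causal Taxonomy: "Intentional" or "Unintentional"
    timing: str          # Causal Taxonomy: "Pre-deployment" or "Post-deployment"
    domain: str          # Domain Taxonomy: one of the seven domains


# Made-up placeholder rows for illustration only.
RISKS = [
    Risk("placeholder risk A", "AI", "Unintentional", "Post-deployment", "Misinformation"),
    Risk("placeholder risk B", "Human", "Intentional", "Post-deployment", "Malicious actors & misuse"),
    Risk("placeholder risk C", "Human", "Unintentional", "Pre-deployment", "Privacy & security"),
]


def filter_risks(risks, **criteria):
    """Keep only the risks whose fields match every given criterion."""
    return [r for r in risks if all(getattr(r, k) == v for k, v in criteria.items())]


human_post = filter_risks(RISKS, entity="Human", timing="Post-deployment")
print([r.description for r in human_post])
```

Filtering on any combination of the causal factors and the domain works the same way, which is roughly what the filterable spreadsheet view offers.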

They have published a paper on it: The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence that you should check out. They have also made their entire database available to copy for free as well.

Check it out if you have some free time.

Source: Boingboing.net: MIT’s AI risk database exposes 700+ ways AI could ruin your life.

– Suramya

August 25, 2024

Browse Open source clones of classic video games

Filed under: Computer Software,Tech Related — Suramya @ 2:19 AM

There are a lot of games that can no longer be played because the systems that ran them are no longer in production, and it is illegal to modify their code to work on new systems, operating systems or emulators. That is where open source comes into play: developers have dedicated a lot of time to creating open source clones of their favorite games.

You can access the list and instructions on how to install/play them at: https://osgameclones.com/, which gathers open-source or source-available remakes of great old games in one place.

A Remake is a game where the executable and sometimes the assets as well are remade open source. Some of these games aren’t exact remakes but evolution of original ones, which were eventually open sourced.
A Clone is a game which is very similar to or heavily inspired by a game or series.
An Official project is the official source code release for a game that was formerly closed-source, maintained by the original creators and has minimal changes.
A Similar game is one which has similar gameplay but is not a clone.
A Tool is not a game, but something that assists in playing or modding the game, such as a high resolution patch, or resource extractor.

I see open source versions of classics like Descent II, Doom II/III and many more on the site. Check it out if you have some free time.

Source: Boingboing.net: Open source clones of classic video games

May 25, 2024

ICQ messenger shutting down after almost 28 years of service

Filed under: Computer Software,My Thoughts,Tech Related — Suramya @ 6:12 AM

I used ICQ for the first time around 1997/98 and it was an amazing experience to be chatting with someone on the other side of the planet for free. I had been using BBSes to connect with people, but for obvious reasons they were all local folks. ICQ was the first system that allowed me to connect to the international world for free. I think it was also the first system where I encountered the infamous A/S/L (Age/Sex/Location) question. It was quite popular over on AOL IM as well, but I first encountered it on ICQ. I think it was probably because there was no concept of a profile picture at that time and ICQ used a number instead of a custom screen name for user identification.

Unfortunately, the system is going to be shut down on June 26th, with no explanation given for why. That being said, it is impressive that the system managed to stay up and running for so long. I think the last time I logged into ICQ was sometime in the late 2000s. I wonder if I still remember the password to be able to log in one last time before it is gone for good.

What would be really cool is if VK made the source code behind the server open source so that others could set up an alternate ICQ server for folks to use. But I doubt that will happen.

Source: Bleepingcomputer.com: ICQ messenger shuts down after almost 28 years

– Suramya

May 23, 2024

Windows 11 will feature builtin Spyware in the near future or Recall AI as Microsoft Calls it

Till recently, if you wanted to spy on someone and see what they had been doing on their computer, you had to infect it by making them visit a dodgy site, or get physical access, download a RAT (Remote Access Trojan), install it on the target’s computer, configure the Antivirus to ignore it and put in a backdoor so that you could access the data remotely. Obviously this was a lot of work, so it looks like some cyber criminals reached out to Microsoft (MS) and asked for help. MS, being a super helpful company, has added a functionality called ‘Windows Recall’ to its Windows 11 Preview build to solve this. Recall takes a snapshot (literally) of the screen every few seconds and stores it in a searchable database ‘stored locally’. Basically it does exactly what spyware does without having to install anything new on your system. As per the company, below is how Recall works:

Recall uses Copilot+ PC advanced processing capabilities to take images of your active screen every few seconds. The snapshots are encrypted and saved on your PC’s hard drive. You can use Recall to locate the content you have viewed on your PC using search or on a timeline bar that allows you to scroll through your snapshots. Once you find the snapshot that you were looking for in Recall, it will be analysed and offer you options to interact with the content. What actions you can take depend on the content and the chat provider capabilities in Copilot in Windows. For example, you may highlight a block of text and decide to summarise it, translate it, or open it with a text editor like Word or Notepad. If you highlight an image, you will be able to edit it or use your chat provider in Copilot in Windows to find or create a similar image.

Recall will also enable you to open the snapshot in the original application in which it was created, and, as Recall is refined over time, it will open the actual source document, website or email in a screenshot. This functionality will be improved during Recall’s preview phase.

The best part is that, according to their own announcement, the snapshots will not hide passwords/account numbers etc. However, it does block you from recording DRM’d video you might be watching, because protecting that is important, unlike simple things like personal information.

Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers. That data may be in snapshots that are stored on your device, especially when sites do not follow standard internet protocols like cloaking password entry.

This is a gold mine for data thieves, abusers, industrial spies, identity thieves and other cyber criminals. Once they have access to a PC they don’t need to do anything else except copy the data from the Recall DB to their own system and happily browse through the user’s personal data at their leisure.

I don’t think MS has thought about folks who use public computers such as the ones in an Internet Cafe or Hotels or Libraries. With this feature enabled all someone has to do is wait a few days then come back and copy incredibly private information that they can then sell/use. Privacy and Domestic Abuse experts are raising questions about this as well because sure as night follows day, abusers will use this to track what their victims are doing on a computer and that can go bad very quickly.

Even if the data is supposedly only on the local machine, we don’t know when MS is going to force it to be uploaded to their servers using OneDrive or other similar setups. About 99% of the coverage I have seen for this functionality has raised similar concerns about the security, privacy and, quite frankly, the need for this kind of surveillance.

Imagine what a regime like the Taliban, China or other conservative/restrictive governments would do with information they get from this system. You are dreaming if you think that they will not force MS to make this information available to them, at the risk of losing access to that market if they don’t. Once you have the capability to do this, feature creep will happen for sure and we will end up in a surveillance state.

The only Windows 11 system at my place is my wife’s laptop and you can be sure that I am going to disable this ‘feature’ as soon as it launches.

Source: Bleepingcomputer: Windows 11 Recall AI feature will record everything you do on your PC

– Suramya
