Suramya's Blog : Welcome to my crazy life…

February 24, 2026

You can now run an actual x86 Simulator in a Browser using CSS without any JavaScript

Filed under: Tech Related — Suramya @ 3:43 PM

When I think of CSS (Cascading Style Sheets), I think of it as a way to change the presentation and styling of a web page, but it looks like it is now a full-fledged programming language that can run an entire x86 CPU emulator without using any JavaScript.

Lyra Rebane has pulled off this feat: you can write programs in C, compile them to x86 machine code with GCC, and run them inside CSS. A demo is available at: https://lyra.horse/x86css/. The demo doesn’t work in Firefox but works great in Chrome/Chromium-based browsers.

Lyra plans to publish a blog post in the near future explaining how it all works, and I for one am eagerly waiting for it, because this is a phenomenal achievement.

How??

I plan on writing a blog post that explains how this works as well as many of the tricks used. Bookmark my blog or add it to your RSS reader.

Surely you still need a little bit of JavaScript?

Nope, this is CSS-only!

There is a script tag on this site, which is there to provide a clock to the CSS – but this is only there to make the entire thing a bit faster and more stable. The CSS also has a JS-less clock implementation, so if you disable scripts on this site, it will still run. JavaScript is not required.

My CSS clock uses an animation combined with style container queries, which means you don’t need to interact with anything for the program to run, but it also means it’s a bit slower and less stable as a result. A hover-based clock, such as the one in Jane Ori’s CPU Hack, is fast and stable, but requires you to hold your mouse on the screen, which some people claim does not count as Turing complete for whatever reason, so I wanted this demo to be fully functional with zero user input.

Source: @rebane2001@infosec.exchange

– Suramya

February 18, 2026

Self driving cars & automated drones are vulnerable to Prompt Injection Attacks Via Road Signs

When I started working with computers way back in 1995, one of the first lessons I learnt was to keep things simple: the more complicated your system is, or the more layers it has, the more ways there are for things to go wrong and the more attack surface is available for a bad actor to target. This is known as the KISS (Keep It Simple, Stupid) principle. As current systems add more and more complexity, it feels like people have stopped following that advice. With LLMs/AI being added in particular, there is now a layer of complexity that acts like a black box, because we can’t know enough about the model being used: what data was used to train it, what biases were built into it (knowingly or unknowingly), and so on.

Cars used to be simple mechanical devices; now they are computers on wheels that keep getting more complicated. According to IEEE, a typical car may contain 100 million lines of code, and that is before AI/self-driving systems even come into the picture.

We now have AI systems in cars that use models to drive, decide when to stop, and decide which rules to follow. To explore the risk, researchers at the University of California, Santa Cruz, and Johns Hopkins tested these AI systems and the large vision language models (LVLMs) underpinning them, and found that the systems would reliably follow instructions displayed on signs held up in their cameras’ view. This research adds to the growing body of evidence that AI decision-making can easily be tampered with. That is a major concern, because more and more decisions, some of them with serious consequences, are slowly being outsourced to these “AI” systems.

The researchers have published their findings in a paper introducing CHAI (Command Hijacking against embodied AI), an indirect prompt injection attack carried out in the physical environment that exploits the multimodal language interpretation abilities of AI models.

Abstract: Embodied Artificial Intelligence (AI) promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning grounded in perception and action to generalize beyond training distributions and adapt to novel real-world situations. These capabilities, however, also create new security risks. In this paper, we introduce CHAI (Command Hijacking against embodied AI), a new class of prompt-based attacks that exploit the multimodal language interpretation abilities of Large Visual-Language Models (LVLMs). CHAI embeds deceptive natural language instructions, such as misleading signs, in visual input, systematically searches the token space, builds a dictionary of prompts, and guides an attacker model to generate Visual Attack Prompts. We evaluate CHAI on four LVLM agents; drone emergency landing, autonomous driving, and aerial object tracking, and on a real robotic vehicle. Our experiments show that CHAI consistently outperforms state-of-the-art attacks. By exploiting the semantic and multimodal reasoning strengths of next-generation embodied AI systems, CHAI underscores the urgent need for defenses that extend beyond traditional adversarial robustness.

Potential consequences include self-driving cars driving through crosswalks without regard for the people crossing, taking passengers to a different destination (potentially allowing bad actors to kidnap them), or causing accidents by forcing the car to ignore traffic rules or oncoming traffic.

Source: schneier.com: Prompt Injection Via Road Signs

– Suramya

February 13, 2026

New Playable class added to Diablo II 25 years after last update

Filed under: My Thoughts,Tech Related — Suramya @ 1:26 AM

Diablo II is one of my all-time favorite games, and I have been playing it on and off since it was first released back in 2000. The game had five character classes when it launched, and two more classes were added in the expansion Diablo II: Lord of Destruction, released a year later.

Now, 25 years later, a new playable class called ‘Warlock’ is coming to Diablo II as part of the ‘Reign of the Warlock’ expansion. In addition to the new character class, the update also brings new quests, Terror Zones, etc.

Fresh from the gates of the Burning Hells, the Reign of the Warlock is upon us! This major update brings the Warlock, the first new playable class to Diablo II in 25 years. The Warlock is a mysterious, dark scholar who’s spent years studying their taboo craft in the shadows, but no longer. Wield their awesome power to bind demons and wreak havoc upon your foes.

Reign of the Warlock brings fresh new Terror Zones, fearsome Colossal Ancients to conquer, alongside player requested quality-of-life changes, and more. Use this powerful new class to carve through the demon hordes, using your dark arts to dominate your foes.

It shows how popular the game still is that the studio is releasing an update with new content for a 26-year-old game.

Source: @arstechnica@mastodon.social

February 4, 2026

Is it worth Contributing to Open Source with AI Scrapers using your work for training materials

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 10:38 PM

I have done quite a lot of work with Open Source Software (OSS) over the years, which has resulted in two job offers and multiple opportunities to speak about OSS in various forums. I have also published some of my own work on my site. Nowadays, with ‘AI’ scrapers hammering code repositories for content to train their code generators, in violation of the code licenses, a lot of people have been pretty upset, multiple lawsuits have been filed, and unfortunately some developers have gotten tired enough that they have stopped publishing their code under OSS licenses.

The community is obviously divided on this, as shown by the following post on Mastodon:


@yoasif 🔗 https://mastodon.social/users/yoasif/statuses/115895264796629089

Simon Willison on porting OSS code:

> I think that if “they might train on my code” is enough to drive you away from open source, your open source values are distinct enough from mine that I’m not ready to invest significantly in keeping you. I’ll put that effort into welcoming the newcomers instead.

https://simonwillison.net/2026/Jan/11/answers/

This feels very much like colonialism; take over all the code, drive the original developers away, and give the colonizers the code as a welcome present.

Basically, on one side some people are asking the companies behind code generators to stop scraping their code, or else they will stop contributing to OSS; on the other side we have people like Simon, who think this is a bad reason to stop contributing to OSS. I am not going to talk about the quality of the code these generators create, or why it is a bad idea to use them, because I have covered that in multiple other posts.

Looking at just the question of “Is it worth Contributing to Open Source with AI Scrapers using your work for training materials”, I think the answer is yes (for me at least), though everyone has the right to answer it in their own way.

For me, Open Source is about learning how things work and solving specific problems that I want to fix, whether in existing software already published as OSS or in new code that I write and then share publicly. I share it so that people don’t have to reinvent the wheel and can build on top of existing solutions (which is what OSS is all about). Is it fair or right that companies are training their LLMs on my code and then extrapolating from and building on it without credit? Of course not. I think it is only fair that I (or any developer) get credit for the work put into building something.

However, I learnt quite a lot by looking at code that others had shared for free as OSS, and I want to keep that culture alive and give newcomers the same opportunity I had. We are going to need a lot of coders in the near future to fix problems created by ‘vibe coders’ and LLMs, and the best way to build that experience is to have them look at existing code so that they can learn from it: both the good parts and, in certain cases, what not to do 😉.

So, in summary, I would have to say that yes, it is worth it. Feel free to comment and share your thoughts on this.

– Suramya

February 2, 2026

Programming: Unclear on the concept of code reviews

Filed under: My Thoughts,Tech Related — Suramya @ 1:42 PM

There is an interesting disconnect in how some people think about software engineering. They are unable to grasp that the industry/team exists to solve business problems, not to release changes as fast as possible (we do want to release fast, but not at the expense of solving actual problems) or to try out the latest technology. This is why I have seen techies in companies get upset that they can’t upgrade the entire setup to the latest and greatest (as of now) framework they just heard about. The business doesn’t care about any of that; it wants software that works reliably, has the features it needs, and is stable. If you have ever had to pitch an infra upgrade, a technical debt cleanup, or a system uplift (without new features) to senior management, you know what I am talking about.

A good SDLC (Software Development Life Cycle) setup ensures that there is a solid code review process in place before a change is deployed to production. This allows a person (or persons) other than the developer to review the code and highlight any issues found. It also serves as a way to teach junior programmers about best practices and more efficient ways of doing things. I have come back from code reviews with pages of questions and better approaches to the problem I was solving. Even now, all my code still goes through code review. If a company doesn’t have a good code review workflow, I would be very scared to use any products it develops. In Git-based workflows (on platforms such as GitHub or GitLab), the review is usually initiated by creating a PR (Pull Request); once the changes are approved, they are merged into the next release branch.

Programming is more than just writing code and pushing it to production. As I said earlier, the goal is to solve business problems efficiently and without bugs. So when I see statements like the one in the screenshot below, I get very scared, because it shows just how unprepared and uninformed some of the folks who are ‘vibe-coding’ or using AI to write changes that get pushed to production really are.


rye
@rywalker

software development in 2026 is going to require some to loosen up a little

code doesn’t have to be as perfectly crafted the way we did it pre-ai

call it slop if you want, but if you’re still demanding perfection on every pr while your competitors are shipping “slop” that works…
you’re fighting from a disadvantaged position

shipping velocity matters more than perfection

Even ‘pre-AI’, no one wrote perfectly crafted code, which is why we have technical debt and temporary fixes that last years, if not decades. That said, the main goal of a code review is to ensure that you:

a) Are doing what you want to do, correctly
b) Didn’t break any existing functionality
c) Didn’t introduce any new bugs
d) Are coding efficiently and following best practices.

Of the four listed above, a to c are critical. Most of the review process usually focuses on these three, with the last one getting attention as and when time permits. There are times when you encounter ‘The Coding Evangelist’ (see: Types of Software Developers for an explanation), who will make you miserable trying to perfect the code, but that is a rare occurrence (at least in my experience over the past 27 years of coding).

But if you optimize for speed of production instead of legibility and maintainability, you are incurring tech debt at scale, which is a big problem for systems in production. People often fail to take into account the cost of “Keeping the Lights on”.

Sometimes people do argue in reviews about stupid stuff. One funny instance I remember is from a previous company, where two senior developers had an ongoing argument about how to format the code. Each one hated the other’s preferred formatting, and both had actually created macros in Emacs to reformat the code to their preferred style every time they edited a file. We could figure out who had last worked on a file just by looking at the formatting.

Long story short, (good) code reviews are absolutely essential for an organization to ensure that the code in production is as stable and bug-free as possible. It doesn’t matter how quickly you are pushing code if the code doesn’t work the way it is supposed to and/or has bugs in it.

– Suramya

January 30, 2026

Wasted some more time thanks to Windows 10 auto upgrading to Windows 11 and disabling local accounts

Filed under: My Life,My Thoughts,Tech Related — Suramya @ 5:02 AM

A friend of mine asked me if I had a spare laptop they could borrow for a few days. Since I had a spare one that I use for my experiments, I said yes. To make sure everything was working before I handed it over, I booted up the laptop. The system started up and installed a bunch of updates before it even gave me the login prompt. I let it update, and once all the updates were applied I ended up on the login page, which is where this whole painful odyssey started.

The laptop was originally used by one of the employees at Jani’s company and was retired because they needed a faster computer. It had multiple accounts on it: one for the user and one administrator account for me (which was a local account). The login page only showed the option to log in as the employee using their Microsoft account. Usually there is an option to select alternate accounts from a list (or enter them manually), but that wasn’t the case here. Even though I had local logins enabled on the laptop, I was not getting any option to select other users.

I then spent a whole lot of time trying to enable local users on the system by booting into recovery mode and manually adding the users. All the steps I tried were for Windows 10, because that is what was installed on the laptop, but after a while I noticed that the recovery screen mentioned something about recovering/resetting the Windows 11 system on the laptop. That is when I realized that the stupid thing had upgraded itself to Windows 11, and since Windows 11 makes it difficult to use local accounts, it had removed the option to select alternate accounts.

The only way left to log in was to request a login code sent to the employee’s email account and use that. But by this time I was considering a full reinstall, since even after logging in I would have to reconfigure the system for my use, and if I was doing that I might as well format the drive and reinstall the OS.

In the end I reinstalled Windows 10 on the machine and, surprise surprise, everything worked the way it was supposed to. Thankfully there wasn’t any data on the machine that I minded losing, so it was easy to reset and reinstall. Now I just need to make sure the stupid thing doesn’t upgrade itself again, but since my account is now the primary account on the machine, I’ll still be able to log in even if the system upgrades to Windows 11.

I really dislike working with Windows; every time I have to, I end up wasting tons of time solving stupid issues I don’t see on Linux.

– Suramya

January 9, 2026

Conscience of a Hacker aka The Hacker Manifesto turns 40

Filed under: My Thoughts,Tech Related — Suramya @ 10:42 PM

The Conscience of a Hacker, also known as The Hacker Manifesto, turned 40 yesterday. If any single document shaped entire generations of hackers, it is the Hacker Manifesto. The manifesto was first published in Phrack Volume One, Issue 7, Phile 3 of 10. The full text is below:

\/\The Conscience of a Hacker/\/

by

+++The Mentor+++

Written on January 8, 1986
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Another one got caught today, it’s all over the papers. “Teenager
Arrested in Computer Crime Scandal”, “Hacker Arrested after Bank Tampering”…
Damn kids. They’re all alike.

But did you, in your three-piece psychology and 1950’s technobrain,
ever take a look behind the eyes of the hacker? Did you ever wonder what
made him tick, what forces shaped him, what may have molded him?
I am a hacker, enter my world…
Mine is a world that begins with school… I’m smarter than most of
the other kids, this crap they teach us bores me…
Damn underachiever. They’re all alike.

I’m in junior high or high school. I’ve listened to teachers explain
for the fifteenth time how to reduce a fraction. I understand it. “No, Ms.
Smith, I didn’t show my work. I did it in my head…”
Damn kid. Probably copied it. They’re all alike.

I made a discovery today. I found a computer. Wait a second, this is
cool. It does what I want it to. If it makes a mistake, it’s because I
screwed it up. Not because it doesn’t like me…
Or feels threatened by me…
Or thinks I’m a smart ass…
Or doesn’t like teaching and shouldn’t be here…
Damn kid. All he does is play games. They’re all alike.

And then it happened… a door opened to a world… rushing through
the phone line like heroin through an addict’s veins, an electronic pulse is
sent out, a refuge from the day-to-day incompetencies is sought… a board is
found.
“This is it… this is where I belong…”
I know everyone here… even if I’ve never met them, never talked to
them, may never hear from them again… I know you all…
Damn kid. Tying up the phone line again. They’re all alike…

You bet your ass we’re all alike… we’ve been spoon-fed baby food at
school when we hungered for steak… the bits of meat that you did let slip
through were pre-chewed and tasteless. We’ve been dominated by sadists, or
ignored by the apathetic. The few that had something to teach found us will-
ing pupils, but those few are like drops of water in the desert.

This is our world now… the world of the electron and the switch, the
beauty of the baud. We make use of a service already existing without paying
for what could be dirt-cheap if it wasn’t run by profiteering gluttons, and
you call us criminals. We explore… and you call us criminals. We seek
after knowledge… and you call us criminals. We exist without skin color,
without nationality, without religious bias… and you call us criminals.
You build atomic bombs, you wage wars, you murder, cheat, and lie to us
and try to make us believe it’s for our own good, yet we’re the criminals.

Yes, I am a criminal. My crime is that of curiosity. My crime is
that of judging people by what they say and think, not what they look like.
My crime is that of outsmarting you, something that you will never forgive me
for.

I am a hacker, and this is my manifesto. You may stop this individual,
but you can’t stop us all… after all, we’re all alike.

+++The Mentor+++

It was a significant force in shaping how I thought about computers and why I went into computer security. I first read the manifesto in late 1997, after I got my first computer and became active on the Internet and various BBS forums. Sadly, I don’t remember the exact site where I found it, but I can tell you it had a profound impact on me. I have always been someone who wants to know how things work and why they are the way they are. In fact, my parents went out and bought a series of books called “Tell me Why?” to answer my questions.

Then I found this post, which put into plain and simple words things I was just starting to work out. It proudly said that wanting knowledge is not a bad thing, and neither is wanting to know how things work and why they are done in a particular way. I still follow the same basic principle in all my work, and it has helped me immensely.

Source: @phrack@haunted.computer

– Suramya

January 6, 2026

KDE’s Full form: Kool Desktop Environment

Filed under: Computer Software,Linux/Unix Related,My Thoughts,Tech Related — Suramya @ 5:12 PM

One of the cool things about Linux, which also confuses a lot of people who are used to Windows, is that it lets you choose from different desktop environments. So if you like a minimalist setup or are on an old system with limited resources, you can use something like IceWM or Fluxbox. Others like GNOME, which is more like a Mac interface. I personally prefer KDE, which is similar in layout to Windows and has a lot of good widgets and built-in functionality that I like.

I have been using KDE since I first started using Linux back in early 2000, but until today I never really thought about what the acronym stood for. While surfing the web I came across the full form and found it funny. KDE originally stood for: Kool Desktop Environment. It was first announced on a Linux mailing list way back on 14th Oct 1996: New Project: Kool Desktop Environment. Programmers wanted!

From that small beginning, KDE has grown into one of the most popular desktop environments out there and is constantly being updated.

Thought I should share so that others also know…

– Suramya

January 5, 2026

Wasted hours of my life due to Copilot and AI on Win 11 laptop

Over the weekend Jani asked me to take a look at her laptop because it was heating up quite a bit and the CPU fan was almost constantly running at high speed. So I took the laptop and ran a bunch of virus scans and malware removal tools on it. I disabled some programs that didn’t need to be running all the time (Adobe was a big one), but the issue still wasn’t solved.

After wasting about 3 hours of my life on this, I remembered that she is using Windows 11 and that Copilot is enabled by default on all Win11 systems. So I disabled Copilot, and almost immediately the CPU utilization dropped and the system stopped heating up so much. Then I disabled Copilot in all the Office tools (Word/Excel etc.) and in Notepad. I mean, why on earth does Notepad need Copilot/AI? It is a plain-text note-taking program… it shouldn’t have any AI in it.

The amount of energy being wasted by ‘AI’, not just in data centers but on laptops, desktops, phones etc., is mind-boggling. If it worked well it would still make some sense, but it doesn’t. In fact, it is almost comically bad, to the point of being dangerous.

I used to update all the software on my systems almost on autopilot, but now I have to look at each upgrade to see what is being added. This is so I can avoid the AI crap that is being added to all software. For example, Calibre, which is one of the best programs for organizing/converting e-books, recently added an AI chatbot to “Allow asking AI questions about any book in your calibre library.” This was almost universally condemned, and the project was forked to remove the AI-related nonsense. Similarly, other software has added AI without warning, and it is exhausting to have to vet every single upgrade before pushing it out.

I am happy that I run Linux so I don’t have to deal with the nonsense that MS and other big companies have been pushing out in the name of AI.

– Suramya

January 2, 2026

Steganography: Hiding data in Document Files using color tags

Steganography is the art of hiding information within container files to conceal the very existence of the embedded information. Media files have been the most common containers for embedded data, and as a result they face a lot of scrutiny when they are transferred. Most DLP (Data Leak Prevention) systems focus on media files when checking for steganography. Word documents, on the other hand, are common enough that they can be used as containers for hidden information without raising flags.

In this paper, we explore hiding secret data in a Word document by inserting color tags that alter the color of each character in the document, encoding data without changing how the document looks.

Modern DLP systems can detect hidden information in media files such as images, videos or audio files by analyzing the files for modifications and potentially identifying the hidden data. To send data without detection, a new method of hiding data needs to be found. In this paper we look at how to hide text in a Word document by modifying its color tags. This allows us to exfiltrate data using Word files with minimal risk of detection by existing tools.

Introduction and History

Steganography is the art of hiding data or a message inside another file or object. This object can be an image, text, audio or video file. The word has Greek roots and is a combination of steganos (“concealed, protected”) and graphia (“writing”).

The first known use of steganography was in ancient Greece around 440 B.C., when the Greek ruler Histiaeus shaved the head of a slave and tattooed a secret message on the slave’s scalp. He then waited for the hair to grow back to hide the message and sent the slave to the recipient, who shaved the slave’s head to read it. (UK Essays, 2021) Another example from the same period is Demaratus, who sent a warning about a forthcoming attack to Greece by carving the message into the wood of a wax tablet before covering it with a fresh coat of wax. The tablet, which looked blank, was delivered to Greece along with other blank tablets, and the Greeks removed the wax layer to read the hidden message. (Perera, 2011)

In more modern times, steganography was used during the Second World War by the Germans, who used microdots to shrink complete documents to the size of a dot, which was then placed on a normal-looking letter or document. Another frequently used technique was to encode messages in knitted scarves or sweaters sent to operatives. Every knitted garment is made of different combinations of just two stitches: a knit stitch, which is smooth and looks like a “v”, and a purl stitch, which looks like a horizontal line or a little bump. By knitting a specific combination of knits and purls in a predetermined pattern, spies could pass on a custom piece of fabric from which the recipient could read the secret message. (Zarrelli, 2021)

With the arrival of the digital age, it became possible to encode messages in digital files, and steganography evolved to make use of the new medium.

How Digital Steganography works

Most digital files contain sections that can be altered without any obvious effect on the file. Modern techniques hide data in files using one of the following approaches:

Adding bits to a file:

In this approach, the hidden text is added to the “file header”, which usually contains information such as the file type or the resolution and color depth of a photo. This method is relatively easy to detect by looking at the difference in file size. For example, if we add 1 MB of secret data to a 4 MB file, the output file grows to 5 MB, which is easy to spot when the resulting file is compared with the original.
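To make the file-size point concrete, here is a minimal Python sketch. It appends the payload to the end of the container rather than inside a real header, because header layouts differ from format to format; the 4-byte length trailer is a hypothetical layout chosen for illustration, not part of any file format.

```python
def append_payload(container: bytes, payload: bytes) -> bytes:
    """Append the payload plus a 4-byte big-endian length trailer (hypothetical layout)."""
    return container + payload + len(payload).to_bytes(4, "big")


def read_payload(stego: bytes) -> bytes:
    """Use the trailer to find where the appended payload starts."""
    length = int.from_bytes(stego[-4:], "big")
    return stego[-4 - length:-4]


cover = bytes(4 * 1024 * 1024)   # stand-in for a 4 MB container file
secret = b"meet at the usual place"
stego = append_payload(cover, secret)
print(len(stego) - len(cover))   # 27: the file grows by len(secret) + 4 bytes
print(read_payload(stego))       # b'meet at the usual place'
```

Whatever the exact placement, the output is always larger than the input by at least the payload size, which is exactly what a size comparison against the original catches.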

Changing the Least Significant Bit (LSB):

To resolve the problem of changing file size, a new technique was created that relies on the fact that the least significant bits (LSBs) in a file can be altered without significantly changing the content, i.e. if the container is an image, the altered image looks the same to the human eye. As an example, in a 24-bit image file each pixel is made up of three bytes of data corresponding to the colors red, green, and blue. LSB steganography changes the last bit of each of those bytes to hide one bit of data, which allows a user to hide data in the file without changing the file size. The same technique can be applied to other media files, such as video or audio files.
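The LSB idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the container is raw, uncompressed channel bytes (R, G, B, R, G, B, …) that have already been decoded from an image file, and assuming the receiver already knows the payload length; parsing a real image format is left out. Lossy re-encoding such as JPEG compression would typically destroy these bits.

```python
def embed_lsb(channels: bytes, payload: bytes) -> bytearray:
    """Hide each payload bit in the least significant bit of one channel byte."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]  # MSB first
    if len(bits) > len(channels):
        raise ValueError("container too small for this payload")
    out = bytearray(channels)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return out


def extract_lsb(channels: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the LSBs of the first n_bytes * 8 channel bytes."""
    out = bytearray()
    for start in range(0, n_bytes * 8, 8):
        value = 0
        for channel in channels[start:start + 8]:
            value = (value << 1) | (channel & 1)
        out.append(value)
    return bytes(out)


# 16 pixels of raw R, G, B bytes (48 bytes) can carry 48 bits, i.e. 6 bytes.
pixels = bytes(range(100, 148))
stego = embed_lsb(pixels, b"Hi")
print(len(stego) == len(pixels))  # True: the size does not change
print(extract_lsb(stego, 2))      # b'Hi'
```

Each channel byte changes by at most 1 out of 255, which is why the altered image looks the same to a human viewer.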

The larger the container file, the more data can be encoded into it, which is why images, video and audio files are so popular with steganography users: they allow a user to hide large quantities of data in a single file. The major limitation of using media files is that if the target doesn’t usually send or receive media files, it is a break in routine if they suddenly start sending or receiving such files.
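A quick capacity estimate shows how the amount of hidden data scales with container size. This sketch assumes an uncompressed 24-bit image, one hidden bit per color channel, and ignores any length header a real tool would also need to store.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 1) -> int:
    """Payload bytes an uncompressed image can hold with LSB embedding."""
    return width * height * channels * bits_per_channel // 8


print(lsb_capacity_bytes(1920, 1080))  # 777600 bytes, roughly 759 KiB
print(lsb_capacity_bytes(640, 480))    # 115200 bytes
```

A single full-HD image can therefore carry several hundred kilobytes, far more than a text document of similar size could hide.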

Word documents and text files, on the other hand, are the bread and butter of all organizations, and every user sends and receives a lot of documents over the course of the day. So, if we can hide data in a Word file, it becomes easier to exfiltrate the data.

How to hide data in a text file

There are many ways to hide information in a text file, and some of them have already been used historically for this purpose; digital text just gives us a new medium for the hidden text. Some of the options are as follows:

Using patterns of letters within words

In this technique, the user sends a normal-looking message or document to another user, hiding a secret message in it that can only be read by taking the ith letter of each word. The advantage is that you can send a lot of data using this technique, but the disadvantage is that the message can end up sounding very stilted because of the constraints the steganography imposes.
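Only the reading side lends itself to code, since writing a natural-sounding cover text is a manual job. Here is a minimal Python sketch of the decoder, assuming that a word is a run of ASCII letters and that words shorter than i letters are simply skipped (both conventions are assumptions made for illustration).

```python
import re


def hidden_message(text: str, i: int) -> str:
    """Collect the i-th letter (1-based) of every word long enough to have one."""
    words = re.findall(r"[A-Za-z]+", text)  # assumption: a word is a run of ASCII letters
    return "".join(word[i - 1] for word in words if len(word) >= i)


print(hidden_message("Have everyone lunch late, okay?", 1))  # Hello
```

The example cover sentence already reads a little oddly, which illustrates the stilted-text drawback mentioned above.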

Using the Whitespace in the document to hide data

Another option is to use spacing differences in the file to encode a message. For example, the sender can put one space after a full stop to represent a 0 and two spaces to represent a 1. By looking at the spacing, the recipient can spell out the secret message. The main problem with this approach is that it does not allow a large quantity of data to be sent in a file, but the advantage is that it is harder to detect.
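The one-space/two-space scheme can be sketched in Python as follows. It assumes every sentence ends with a full stop and that no other full stops (abbreviations, decimals) appear in the text, since those would be misread as payload gaps; the function names are hypothetical.

```python
import re


def encode_spaces(sentences: list[str], bits: list[int]) -> str:
    """Join sentences: one space after a full stop means 0, two spaces mean 1."""
    if len(bits) != len(sentences) - 1:
        raise ValueError("need exactly one bit per gap between sentences")
    out = sentences[0]
    for bit, sentence in zip(bits, sentences[1:]):
        out += " " * (bit + 1) + sentence
    return out


def decode_spaces(text: str) -> list[int]:
    """Read the bits back from the run of spaces after each full stop."""
    return [len(gap) - 1 for gap in re.findall(r"\.( +)(?=\S)", text)]


text = encode_spaces(["The report is attached.", "Please review it.", "Thanks."], [1, 0])
print(repr(text))           # 'The report is attached.  Please review it. Thanks.'
print(decode_spaces(text))  # [1, 0]
```

Only one bit fits in each sentence gap, which illustrates why this approach carries so little data.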

In this paper we look at a third way to hide data in a document, by modifying the color tags in the document, which we cover in more detail in the next section.

Hiding information using color tags in a Word Document

All versions of MS Office since 2007 save files in the Office Open XML format, in which the XML files are zipped together to create a file in the DOCX format. Word files allow a user to show text in multiple colors by inserting the corresponding color tag into the file (Microsoft, 2021). When the color of the displayed text is changed, Word adds a tag like the following to the document.xml file inside the ZIP archive to record the font color: <w:color w:val="000000"/>. The tag stores the color of the text as a six-digit hex value, where 000000 is black and FFFFFF is white.
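Because a .docx file is just a ZIP archive, its font colors can be inspected with Python's standard library alone. This sketch, using a hypothetical helper name `colored_runs`, opens word/document.xml and lists the color value and text of every run that carries a w:color tag. It only reads the document, so parsing with ElementTree is safe here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def colored_runs(docx_path: str) -> list[tuple[str, str]]:
    """Return (hex color, text) for every run in word/document.xml with a w:color tag."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    result = []
    for run in root.iter(f"{W}r"):
        color = run.find(f"{W}rPr/{W}color")  # font color lives in the run properties
        if color is not None:
            text = "".join(t.text or "" for t in run.iter(f"{W}t"))
            result.append((color.get(f"{W}val"), text))
    return result
```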

Each pair of hex digits in the color value is one byte, corresponding to the red, green or blue channel. In each pair the second digit is the low-order one, so changing it between 0 and 1 flips the least significant bit of that channel without the output color looking significantly different to the viewer. So, visually speaking, the font color represented by the hex value 000000 looks almost exactly the same as the color represented by the hex value 010101. By altering the second digit in each pair from 0 to 1 or vice versa, information can be encoded into the file without adding text or information that can be found by security systems or reviewers. Since the data is in XML format, the sender can insert data into the document by inserting a color tag for each character. The process to hide the data would look like the following:
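To make the idea concrete, this Python sketch stores three bits in one color value, one per channel. It interprets "the second digit of each pair" as the least significant bit of each channel byte; for black text (000000) that is exactly the 0-or-1 second digit described above. The function names are hypothetical.

```python
def encode_color(base: str, bits: tuple[int, int, int]) -> str:
    """Store three bits in the LSBs of the R, G and B bytes of a hex color."""
    channels = [int(base[i:i + 2], 16) for i in (0, 2, 4)]
    return "".join(f"{(c & 0xFE) | b:02X}" for c, b in zip(channels, bits))


def decode_color(value: str) -> tuple[int, int, int]:
    """Read the LSB of each channel back out of a hex color."""
    return tuple(int(value[i:i + 2], 16) & 1 for i in (0, 2, 4))


print(encode_color("000000", (1, 1, 1)))  # 010101
print(decode_color("000101"))             # (0, 1, 1)
```

Working on the bit rather than the literal hex digit keeps the change to one step per channel even when the base color is not black (e.g. 1F becomes 1E, not 10).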

  • The user provides a Word file to be used as input. The file must contain enough text to allow the sender to encode the data.
  • The system extracts the contents of the document by unzipping the file.
  • The content of the document is stored in the ‘document.xml’ file under the word folder created in the previous step.
  • The system extracts the text from the file by stripping the XML tags.
  • For each character in the text, it adds a color tag like the <w:color w:val="000000"/> tag shown above. The second digit in each pair is set to 0 or 1 depending on the data being encoded.
  • The original tags are restored to the file along with the newly created tags.
  • The resulting file is saved as document.xml in the word folder.
  • The folder is compressed as a ZIP file and renamed with a .docx extension.
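The hiding steps above can be sketched in Python with only the standard library. Several simplifications are assumptions of this sketch: only "plain" runs (a w:r holding a single w:t with no formatting) are rewritten, everything else in document.xml is copied through untouched, the text is assumed to be black so each channel becomes 00 or 01, and the secret is packed most significant bit first. A production tool would need a real XML parser; all names are hypothetical.

```python
import re
import zipfile
from xml.sax.saxutils import escape, unescape

# Hypothetical simplification: match only runs with a single w:t and no formatting.
PLAIN_RUN = re.compile(r"<w:r(?:\s[^>]*)?><w:t(?:\s[^>]*)?>([^<]*)</w:t></w:r>")


def hide_bits_in_xml(xml: str, bits: list[int]) -> str:
    """Split each plain run into one run per character, each with a color tag
    whose R, G and B bytes carry the next three payload bits (black base)."""
    bits = list(bits) + [0] * (-len(bits) % 3)  # pad to whole color tags
    pos = 0

    def rewrite(match: re.Match) -> str:
        nonlocal pos
        runs = []
        for ch in unescape(match.group(1)):
            chunk = bits[pos:pos + 3] or [0, 0, 0]  # text after the payload stays 000000
            pos += 3
            color = "".join(f"{b:02X}" for b in chunk)  # each channel is 00 or 01
            runs.append(
                f'<w:r><w:rPr><w:color w:val="{color}"/></w:rPr>'
                f'<w:t xml:space="preserve">{escape(ch)}</w:t></w:r>'
            )
        return "".join(runs)

    out = PLAIN_RUN.sub(rewrite, xml)
    if pos < len(bits):
        raise ValueError("cover document has too little text for the payload")
    return out


def hide_in_docx(src: str, dst: str, secret: bytes) -> None:
    """Unzip src, rewrite word/document.xml, and zip every part into dst."""
    bits = [(byte >> (7 - i)) & 1 for byte in secret for i in range(8)]
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = hide_bits_in_xml(data.decode("utf-8"), bits).encode("utf-8")
            zout.writestr(item, data)
```

Each character carries three bits, so a secret of n bytes needs at least ceil(8n / 3) characters of plain-run text.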

The resulting file will contain the hidden data with little visual indication of the changes made to the document, and can be mailed out as usual with little chance of detection.

The recipient would follow these steps to extract the hidden data from the file:

  • Unzip the document to extract the content
  • Extract all the font color tags in the file
  • Read the second digit of every pair in each color code
  • Save the values in a separate file that contains the secret information.
  • Review the information at your leisure.
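The recipient's steps can be sketched in Python as follows. It assumes the cover document had no font color tags of its own (any existing ones would be read as payload), that the bits are packed most significant bit first, and that the secret does not end in zero bytes, since trailing zero padding is stripped. All function and file names are hypothetical.

```python
import re
import zipfile

COLOR_VAL = re.compile(r'<w:color w:val="([0-9A-Fa-f]{6})"')


def bits_from_xml(xml: str) -> list[int]:
    """Read the LSB of the R, G and B byte of every color tag, in document order."""
    bits = []
    for value in COLOR_VAL.findall(xml):
        bits.extend(int(value[i:i + 2], 16) & 1 for i in (0, 2, 4))
    return bits


def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack bits MSB-first; drop a trailing partial byte and zero padding."""
    usable = len(bits) - len(bits) % 8
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, usable, 8))
    return data.rstrip(b"\x00")


def extract_from_docx(path: str, out_path: str) -> bytes:
    """Unzip the document, decode the color tags and save the secret to a file."""
    with zipfile.ZipFile(path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    secret = bits_to_bytes(bits_from_xml(xml))
    with open(out_path, "wb") as f:
        f.write(secret)
    return secret
```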

This technique is fairly easy to implement and requires only minimal coding skills. If the setup doesn’t allow users to send out Word documents, the same technique can also be used to hide data in the HTML source of a website that the recipient would then download and extract. The same can also be accomplished by encoding data in emails sent from the user’s account.

Detection Techniques for hidden data in documents

Like any technique for sending hidden data, the one we just discussed has weaknesses that can be used to detect hidden messages encoded in a document. However, such detection is not easy, and most currently available tools will not be able to detect data hidden using this technique. This is because most commercial tools on the market focus on detecting hidden data in media files such as images, videos or audio files, as these have traditionally been the most common containers used to hide data. Some of the options available to detect the possibility of hidden data are as follows:

  • Create a tool that examines all outgoing documents and counts the number of font tags in use in each document. If the count is over a certain threshold, the file can be quarantined for review by a human.
  • Use a tool that checks the size a given document is expected to be based on the amount of text in it. If the file is significantly larger (due to an anomalously high number of tags), it can be quarantined for review.
    • We would need to take into account any images etc. embedded in the file when performing the analysis.
  • Create a machine learning model that is trained to detect files with hidden data.
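The first heuristic in the list can be sketched as a ratio of font color tags to visible characters: ordinary documents tend to have at most one color tag per run of text, while the encoding described above adds roughly one per character. The 0.5 threshold and the function names are hypothetical and would need tuning against real traffic.

```python
import re

COLOR_TAG = re.compile(r"<w:color\b")
TEXT = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")  # w:t elements only, not w:tab etc.


def color_tag_ratio(document_xml: str) -> float:
    """Number of font color tags per character of visible text."""
    chars = sum(len(t) for t in TEXT.findall(document_xml))
    return len(COLOR_TAG.findall(document_xml)) / max(chars, 1)


def should_quarantine(document_xml: str, threshold: float = 0.5) -> bool:
    """Flag documents with suspiciously many color tags for human review."""
    return color_tag_ratio(document_xml) > threshold
```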

Conclusion

Any data or file being sent outside the organization’s network can be used to exfiltrate information from it. The trick to detecting these attempts is to create a baseline of normal activity, such as the sizes of the files transferred during a regular day, and to create alerts that notify administrators when there is a significant variation from the baseline.
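A very simple version of the baseline idea is a standard-deviation test on daily outbound volume. This is a sketch under stated assumptions: the history is a list of daily totals in whatever unit is logged (e.g. MB), and k = 3 is an arbitrary example cutoff, not a recommended value.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float, k: float = 3.0) -> bool:
    """Flag today's volume if it differs from the historical mean by more
    than k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two days of history for a baseline")
    return abs(today - mean(history)) > k * stdev(history)


history = [100, 110, 90, 105, 95]  # hypothetical daily outbound MB
print(is_anomalous(history, 180))  # True
print(is_anomalous(history, 115))  # False
```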

Done correctly, this will decrease the risk of data exfiltration, but no detection technique is perfect, so reviews and audits need to be done periodically to ensure that the system is still secure.

References

Microsoft. (2021, August 25). File format reference for Word, Excel, and PowerPoint. Deploy Office | Microsoft Docs. Retrieved September 19, 2021, from https://docs.microsoft.com/en-us/deployoffice/compat/office-file-format-reference.
Perera, H. L. (2011, February 4). History of steganography. hareenlaks. Retrieved September 19, 2021, from http://hareenlaks.blogspot.com/2011/04/history-of-steganography.html.
UK Essays. (2021, August 12). The history & background of steganography. UK Essays. Retrieved September 19, 2021, from https://www.ukessays.com/essays/english-language/background-of-steganography.php.
Zarrelli, N. (2021, June 10). The wartime spies who used knitting as an espionage tool. Atlas Obscura. Retrieved September 19, 2021, from https://www.atlasobscura.com/articles/knitting-spies-wwi-wwii.


Note: This was originally written as a paper for one of my classes at EC-Council University in Q3 2021.

– Suramya

