Suramya's Blog : Welcome to my crazy life…

November 28, 2020

My Backup strategy and how it has evolved over the years

I am a firm believer in backing up my data, some people say that I am paranoid about backing up data and I do not dispute it. All my data is backed up on multiple drives and locations and still I feel that I need additional backup. This is because I read the news and there have been multiple cases where people lost their data because they hadn’t backed it up. Initially I wasn’t that serious about it but when I was in college and working at the helpdesk, a phd student came in crying because her entire PHD thesis was on a Zip Drive and it wasn’t working anymore. She didn’t have a backup and was basically screwed. We tried a bunch of stuff to recover the data but didn’t manage to recover anything. That made me realize that I needed a better backup procedure so started my journey in creating recoverable backups.

My first backup system was a partition on my drive called backup where I created a copy of all my important data (This is back in 2000/2001). Then I realized that if the drive died then I would loose access to the backup partition as well, and I started looking for alternatives. This is around the time when I had bought a CD Writer so all my important data was backed up to CD’s and I was confident that I could recover any lost data. Shortly afterwards I moved to DVD’s for easier storage. However, I didn’t realize till a lot later that CD’s & DVD’s start becoming unreadable quite easily. Thankfully I didn’t loose any data but it was a rude awakening to find that the disks I had expected to keep my data safe were starting to become unreadable within a few years.

I then did a bunch of research online and found that the best medium for storing data long term is still Hard Drives. I didn’t want to store anything online because I want my data to be in my control so any online backup system was out of the question. I added multiple drives to my desktop and started syncing the data from the desktop & laptop to the backup drive using rync. This ensured that the important data was in three locations at any given time: My Desktop, My Laptop and the Backup drive. (Plus a DVD copy that I made of all my data every year)

I continued with this backup strategy for a few years but then realized that I had no way to go back to a previous version of any given document, if I deleted a file or wanted to go back to an older version of a file I only had 24 hours before the changes were synced to the backup drive before it was unrecoverable. There was a case where I ended up having to dig through my DVD backups to find the original version of a file that I had changed. So I did a bit of research and found rdiff-backup. It allows a user to back up one directory to another and generates an incremental backup. So we can recover/restore files based on date range. The best part is that the software is highly efficient, once the initial backup is done it only transmits the changes to the files in subsequent runs. Now that I have been using it I can restore a snapshot of my data going back to 2012 quite easily.

I was quite happy with this setup for a while, but while reading an article on best backup practices I realized that I was still depending only on 1 location for the backup data (the rdiff-data snapshots) and the best practices stated that you should also store it in an external drive or offsite location to prevent viruses/ransomware from deleting backups. So I bought a 5TB external drive and created an encrypted partition on the same to store all my important data. But I was still unhappy because all of this was still stored at my home so if I had a fire or something I would still end up loosing the data even though my external drive was kept in a safe. I still didn’t want to store data online but that was still the best way to ensure I had offsite backup. I initially thought about setting a server at my parents place in Delhi and backup there but that didn’t work out for various reasons. Plus I didn’t want to have to call them and troubleshoot backup issues over the phone.

Around this time I was reading about encrypted partitions and came up with the idea of creating an encrypted container file to store my data and then backup the container file online. I followed the steps I outlined in my post How to encrypt your Hard-drive in Linux and created the encrypted container. Once I finished that I had to upload the container to my webhost since I had unlimited storage space as per my contract. Initially I wasn’t able to because they had restricted my account’s quota but a call to their customer support sorted it out after a bit of argument and explaining what I was doing. The next hurdle I faced was uploading the file to the server because of the ridiculously low upload speed I was getting from Airtel. I had a 40 mbps connection at the time but the upload speed was restricted to 1 mbps because of ‘reasons’. After arguing with their support for a while, I was complaining about it at work and one of the folks suggest I check out ACT Internet. I checked out their plans and was quite impressed with the offerings so I switched over to ACT and was able to upload the container file quickly and painlessly.

Once the container was uploaded, I had to tackle the next problem in the process which was on how to update the files in the container without having to upload the entire container to the host. I experimented with a few solutions and then came up with the following solution:

1. Mount the remote partition as a local mount using sshfs. I mounted the partition locally using the following command: (please replace with the correct hostname and username before using)

/usr/sbin/runuser -l suramya -c "sshfs -o allow_other @hostname.com:. /mnt/offsite/"

2. Once the remote partition was mounted locally, I was able to use the usual commands to mount the encrypted partition to another location using the following command:

/usr/sbin/cryptsetup luksOpen /mnt/offsite/container/Enc_vol1.img enc --key-file /root/UserKey.dat
mount /dev/mapper/enc /mnt/stash/

In an earlier iteration of the code I wasn’t using the keyfile so had to manually enter the password everytime I wanted to backup to the offsite location. This meant that the backup was done randomly as and when I remembered to run the command manually. A few days ago I finally configured it to run automatically after adding the keyfile as a decryption key. (Obviously the keyfile should be protected and not be accessible to others because it allows users to decrypt the data without entering a password.) Now the offsite backup runs once a week while the local backup runs daily and I still backup the Backup partition to the external drive as well manually as and when I remember to do so.

In all I was quite happy with my setup but then I was updating the encrypted container and a network issue made be believe that my remote container had become corrupted (it wasn’t but I thought it was). At the same time I was fooling around with Microsoft One Drive and saw that I had 1TB of storage available over there since I was a Office 365 subscriber. This gave me the idea of backing up the Container to OneDrive as well as my site hosting.

I first tried copying the entire container to the drive and hit a limit because the file was too large. So I thought I would split the file into 5GB parts and then sync them to OneDrive using rclone. After installing rclone. I configured it to connect to OneDrive by issuing the following command and following the onscreen prompts:

rclone config

I then created a folder on OnDrive called container to store the split files and then tried uploading a test file using the command:

rclone copy $file OneDrive:container

Where OneDrive is the name of my provider that I configured in the previous step. This was successful so I just needed to create a script that did the following:

1. Update the Container file with the latest backup
2. Split the Container file into 5GB pieces using the following command:

split --verbose -d -b5GB /mnt/repository/Container/Enc_vol1.img /mnt/repository/Container/Enc_vol_

3. Upload the pieces to Ondrive.

for file in `ls /mnt/repository/Container/Enc_vol_* |sort`; do  echo "$file";  /usr/bin/rclone copy $file OneDrive:container -v &> /tmp/oneDriveSync.log; done

This command uploads the pieces to the drive one at a time and is a bit slow because it maxes out the upload speed to ~2mbps. If you split the uploads and run the command in parallel then you get a lot faster speed. Keep in mind that if you are uploading more than 10 files at a time you will start getting errors about too many open connections and then you have to wait for a few hours before you can upload again. It took a while to upload the chunks but now my files are stored in yet another location and the system is configured to sync to Onedrive once a month.

So, as of now my files are backed up as following:

  • /mnt/Backup: Local Drive. All changes are backed up daily using rdiff-backup
  • /mnt/offsite: Encrypted Container stored online. All changes are backed up weekly using rsync
  • OneDrive: Encrypted Container stored at Microsoft OneDrive. All changes are backed up monthly using rsync
  • External Drive: Encrypted backup stored in an External Hard-drive using rsync. Changes are backed up infrequently manually.
  • Laptop: All Important files are copied over to the laptop using Unison/rsync manually so that I can access my data while traveling

Finally, I am also considering backing up the snapshot data to BlueRay disks but it will take time so haven’t gotten around to it yet.

Since I have this elaborate backup procedure I wasn’t worried much when one of my disks died last week and was able to continue work without issues or worries about loosing data. I still think I can enhance the backups I take but for now I am good. If you are interested in my backup script an extract of the code is listed below:

function check_failure ()
{
	if [ $? == 0 ]; then
		logger "INFO: $1 Succeeded"
	else
		logger "FATAL: Execution of $1 failed"
		wall "FATAL: Execution of $1 failed"
		exit 1
	fi
}

###
# Syncing to internal Backup Drive
###

function local_backup ()
{
	export BACKUP_ROOT=/mnt/Backup/Snapshots
	export PARENT_ROOT=/mnt/repository

	logger "INFO: Starting System Backup"

	rdiff-backup -v 5 /mnt/data/Documents/ $BACKUP_ROOT/Documents/
	check_failure "Backing up Documents"

	rdiff-backup -v 5 /mnt/repository/Documents/Jani/ $BACKUP_ROOT/Jani_Documents/
	check_failure "Backing up Jani Documents"

	rdiff-backup -v 5 $PARENT_ROOT/Programs/ $BACKUP_ROOT/Programs/
	check_failure "Backing up Programs"

	..
	..

	logger "INFO: All Backups Completed Successfully."
}

### 
# Syncing to Off-Site Backup location
###

function offsite_backup
{
	export PARENT_ROOT=/mnt/repository

	# First we mount the remote directory to local
	logger "INFO: Mounting External Drive"
	/usr/sbin/runuser -l suramya -c "sshfs -o allow_other username@remotehost:. /mnt/offsite/"
	check_failure "Mounting External Drive"

	# Open the Encrypted Partition
	logger "INFO: Opening Encrypted Partition. Please provide password."
	/usr/sbin/cryptsetup luksOpen /mnt/offsite/container/Enc_vol1.img enc --key-file /root/keyfile1
	check_failure "Mounting Encrypted Partition Part 1"

	# Mount the device
	logger "INFO: Mounting the drive"
	mount /dev/mapper/enc /mnt/stash/
	check_failure "Mounting Encrypted Partition Part 2"

	logger "INFO: Starting System Backup"
	rsync -avz --delete  /mnt/data/Documents /mnt/stash/
	check_failure "Backing up Documents offsite"
	rsync -avz --delete /mnt/repository/Documents/Jani/ /mnt/stash/Jani_Documents/
	check_failure "Backing up Jani Documents offsite"
	..
	..
	..

	umount /mnt/stash/
	/usr/sbin/cryptsetup luksClose enc
	umount /mnt/offsite/

	logger "INFO: Offsite Backup Completed"
}

This is how I make sure my data is backed up. All of Jani’s data is also backed up to my system using robocopy as she is running Windows and then the data gets backed up by the scripts I explained above as usual. I also have scripts to backup my website/blog/databases but that’s done using a simple script. Let me know if you are interested and I will share them as well.

This is all for now. Let me know if you have any questions about the backup strategy or if you want to make fun of me. 🙂 This is all for now. Will write more later.

– Suramya

October 1, 2020

Windows XP and Server 2003 successfully compiled from leaked source code

Filed under: Computer Security,Computer Software — Suramya @ 9:39 AM

Last week in a major leak the source code for Windows XP & Windows Server 2003 was leaked on the Internet via the 4chan website. Post which it propagated like wildfire across the internet via torrents & mirrors. There were some doubts cast about the authenticity of the leak but knowledgeable folks who reviewed the code claimed that the leak looked authentic.

Now a developer who goes by the name NTDEV successfully compiled Windows XP from the leaked source code. Unfortunately it looks like the XP source code is missing some important files due to which they were unable to compile critical files such as Winlogon.exe. Which makes it impossible to install the compiled Windows XP to try it out. Fortunately they had better luck with the Windows Server 2003 source code and were able to install the compiled copy on a VM successfully.

NTDEV posted a 22 min video showcasing their journey and you can check it out here if you are interested. Their Twitter feed has more information and screenshots of their process & proof.

You can probably expect a lot more information & details on the source to be published over the next few weeks as people go over the code and then start publishing their findings.

– Suramya

September 30, 2020

How to fix vlc’s Core dumping issue while playing some videos

Over the past 2 days I found that the VLC install on my computer was suddenly having issues playing some of the video files on my computer. Initially I thought that it was a problem with the video file, then I realized that this was also happening with videos that had be playing fine earlier. When I ran vlc from the command line to play the problem video it gave the following output on screen when it crashed:

[00005587b42751b0] dummy interface: using the dummy interface module…
[00007f00c4004980] egl_x11 gl error: cannot select OpenGL API
[00007f00c4004980] gl gl: Initialized libplacebo v2.72.0 (API v72)
[00007f00c402a310] postproc filter error: Unsupported input chroma (VAOP)
[00007f00bd986e50] chain filter error: Too high level of recursion (3)
[00007f00c4028d40] main filter error: Failed to create video converter
[00007f00bd986e50] chain filter error: Too high level of recursion (3)
[00007f00c4028d40] main filter error: Failed to create video converter
[00007f00bd986e50] chain filter error: Too high level of recursion (3)
[00007f00c4028d40] main filter error: Failed to create video converter
[00007f00bd986e50] chain filter error: Too high level of recursion (3)


[00007f00c44265c0] chain filter error: Too high level of recursion (3)
[00007f00c4414240] main filter error: Failed to create video converter
[00007f00bd9020d0] main filter error: Failed to create video converter
[00007f00cc047d70] main video output error: Failed to create video converter
[00007f00cc047d70] main video output error: Failed to compensate for the format changes, removing all filters
[00007f00c4004980] gl gl: Initialized libplacebo v2.72.0 (API v72)

A google search told me that a possible solution was to disable hardware acceleration in the Video settings but that didn’t fix my problem. So I took a look at the kernel.log file in /var/log and I got the following error when the program crashed:

Sep 30 21:11:44 StarKnight kernel: [173399.132554] vlc[91472]: segfault at 28000000204 ip 00007f2d8916c1d8 sp 00007f2d8aa69db0 error 4 in libpostproc.so.55.7.100[7f2d8915c000+1d000]
Sep 30 21:11:44 StarKnight kernel: [173399.132568] Code: 98 48 8d 44 07 20 0f 18 08 8b 44 24 08 4d 8d 0c 1a 4d 8d 04 2b 85 c0 0f 85 cb fd ff ff 4c 8b 6c 24 28 4b 8d 04 29 4b 8d 14 20 <41> 0f 6f 01 43 0f 6f 0c 29 41 0f 7f 00 43 0f 7f 0c 20 43 0f 6f 04

Spent about an hour searching for the solution using the details from the kernel.log but got nowhere. Finally I found a forum post where one of the solutions offered was to remove the vlc configuration files, since I didn’t have any other bright idea’s I renamed the vlc config folder by issuing the following command:

mv ~/.config/vlc ~/.config/vlc_09302020

Then I started vlc and just like that everything started working again. 🙂 Not sure what caused the settings to get borked in the first place but the issue is fixed now so all is well.

– Suramya

September 29, 2020

Mounting a Network drive over ssh in Windows using WinFsp & SSHFS-Win

I have computers running both Windows & Linux and at times I need to share files between them and I have been looking for a convenient way to access the files from my Linux machine from my Windows machine without having to run SAMBA on the Linux. This is because historically SAMBA has been a security nightmare and I don’t want to run extra services on the computer if I can avoid it. Earlier this week I finally found a way to mount my Linux directories on Windows as a network mount over SSH using WinFsp & SSHFS-Win and I have been running it for a couple of days so far without any issues. (So far)

Follow these steps to enable SSHFS-Win on your windows machine:

Install WinFsp (Windows File System Proxy)

WinFsp is a set of software components for Windows computers that allows the creation of user mode file systems similar to FUSE (Filesystem in Userspace) in the Unix/Linux world. You can download it from the project’s GIT repository. The Installation file is available by clicking on the download link under ‘Releases’ near the top right corner of the page. The latest version is WinFsp 2020.1 at the time of this writing.

You install the software by running the MSI file you downloaded and the default options worked for me without modification.

Install SSHFS For Windows

SSHFS-Win is a minimal port of SSHFS to Windows. It is available for download from the project’s Git repository. You can compile from source or download the installation file by clicking on the download link under ‘Releases’ near the top right corner of the page. The latest version is SSHFS-Win 2020 at the time of this writing.

Please note that you will need to have WinFsp installed already before you can install SSHFS-Win successfully.

Usage:

Once you have installed both the software you can start using them and map a network drive to a directory using Windows Explorer or the net use command. Instructions for use are as below (Taken from the project Documentation):

In Windows Explorer select This PC > Map Network Drive and enter the desired drive letter and SSHFS path using the following UNC syntax:

\\sshfs\REMUSER@HOST[\PATH]

The first time you map a particular SSHFS path you will be prompted for the SSH username and password which can be saved using the Windows Credential Manager so that you don’t get prompted for it again. In order to unmap the drive, right-click on the drive icon in Windows Explorer and select Disconnect.


Visual demo of how to Map a Network drive using SSHFS-Win

You can map a network drive from the command line as well using the net use command:

net use X: \\sshfs\suramya@StarKnight

You will then be prompted for the password and once you authenticate you can use the new drive as usual. You can unmap the drive as follows:

net use X: /delete

I find this quite useful and hope you do as well.

Thanks to MakerLab, Department of Computer Science, HKU for pointing me in the correct direction

– Suramya

September 27, 2020

Using ncdu to Check Disk Space Usage In Linux

One of the common tasks I face on my Linux system is to identify what files/directories are using the most space. The traditional way to find out is to go to the top level directory and run a ‘du -hs *’ (without the quotes) on the directory and then cd into each directory, rinse and repeat. The other option available is to right click on the folder in Dolphin or any other file manager and select Properties. With the same process as before when you go into each directory individually, right click and get the properties. This is very tedious and time consuming.

Instead you can use ncdu (NCurses Disk Usage) for looking at the storage space utilization on your computer as it has a lot of advantages. It is designed to find space hogs on a remote server where you don’t have an entire graphical setup available. It is fast, simple and very easy to use. I have been using it for a while now and absolutely love it.

To Install ncdu on a Debian system, you can issue the following command:

apt-get install ncdu

Once you have it installed, the usage it very simple. Simply open a command prompt and issue the following command:

ncdu

It will start in the current directory and index all the sub-directories under it. The initial scan can take a while depending on the size of the directories under the current directory. But its comparable to the time taken when running du -hs on the directory. Once the program completes its scan, you get a simple ncurses based interface that you can navigate using the keyboard.


ncdu display for my home directory

All directories & are listed with their sizes in human readable format sorted by size with the largest files & directories at the top (in the default view). You can go into a directory by selecting it and hitting enter. The sizes for the subdirectory are immediately shown without having to run additional commands. You can also delete directories & files from within ncdu by hitting the delete key which is a huge timesaver.

If you haven’t tried it out do check it out. You will love it.

– Suramya

September 21, 2020

Diffblue’s Cover is an AI powered software that can write full Unit Tests for you

Filed under: Computer Related,Computer Software,Interesting Sites — Suramya @ 6:19 PM

Writing Unit Test cases for your software is one of the most boring parts of Software Development even though having accurate tests allows us to develop code faster & with more confidence. Having a full test suite allows a developer to ensure that the changes they have made didn’t break other parts of the project that were working fine earlier. This make Unit tests an essential part of CI/CD (Continuous Integration and Continuous Delivery) pipelines. It is therefore hard to do frequent releases without rigorous unit testing. For example SQLite database engine has 640 times as much testing code as code in the engine itself:

As of version 3.33.0 (2020-08-14), the SQLite library consists of approximately 143.4 KSLOC of C code. (KSLOC means thousands of “Source Lines Of Code” or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 640 times as much test code and test scripts – 91911.0 KSLOC.

Unfortunately, since the tests are boring and don’t give immediate tangible results they are the first casualties when a team is under a time crunch for delivery. This is where Diffblue’s Cover comes into play. Diffblue was spun out of the University of Oxford following their research into how to use AI to write tests automatically. Cover uses AI to write a complete Unit Test including logic that reflects the behavior of the program as compared to the other existing tools that generate Unit Tests based on Templates and depend on the user to provide the logic for the test.

Cover has now been released as a free Community Edition for people to see what the tool can do and try it out themselves. You can download the software from here, and the full datasheet on the software is available here.


Using Cover IntelliJ plug-in to write tests

The software is not foolproof as in it doesn’t identify bugs in the source code. It assumes that the code is working correctly when the tests are added in, so if there is incorrect logic in the code it won’t be able to help you. On the other hand if the original logic was correct then it will let you know if the changes made break any of the existing functionality.

Lodge acknowledged the problem, telling us: “The code might have bugs in it to begin with, and we can’t tell if the current logic that you have in the code is correct or not, because we don’t know what the intent is of the programmer, and there’s no good way today of being able to express intent in a way that a machine could understand.

“That is generally not the problem that most of our customers have. Most of our customers have very few unit tests, and what they typically do is have a set of tests that run functional end-to-end tests that run at the end of the process.”

Lodge’s argument is that if you start with a working application, then let Cover write tests, you have a code base that becomes amenable to high velocity delivery. “Our customers don’t have any unit tests at all, or they have maybe 5 to 10 per cent coverage. Their issue is not that they can’t test their software: they can. They can run end-to-end tests that run right before they cut a release. What they don’t have are unit tests that enable them to run a CI/CD pipeline and be able to ship software every day, so typically our customers are people who can ship software twice a year.”

The software is currently only compatible with Java & IntelliJ but work is ongoing to incorporate other coding languages & IDEs.

Thanks to Theregister.com for the link to the initial story.

– Suramya

September 9, 2020

Augmented Reality Geology

Filed under: Computer Software,Emerging Tech,Interesting Sites — Suramya @ 10:17 PM

A lot of times when you look at Augmented Reality (AR), it seems like a solution looking for problem. We still haven’t found the Killer App for AR like the VisiCalc spreadsheet was the killer app for the Apple II and Lotus 1-2-3 & Excel were for the IBM PC. There are various initiatives underway but no one has hit the jackpot yet. There are applications that allow a Doctor to see a reference text or diagram in a heads up display when they’re operating which is something that’s very useful but that’s a niche market. We need something broader in scope and there is a lot of effort focused on the educational field where they’re trying to see if they can use augmented reality in classrooms.

One of the Implementations that sounds very cool is by an app that I found recently where they are using it to project a view of rocks and minerals etc for geology students using AR. Traditionally students are taught by showing them actual physical samples of the minerals and 2D images of larger scale items like meteor craters or strata. The traditional way has its own problems of storage and portability but with AR you can look at a meteor crater in a 3D view, and the teacher can walk you through visually on how it looks and what geological stresses etc formed around it. The same is also possible for minerals and crystals along with other things.

There’s a new app, called GeoXplorer available on both Android and iOS that allows you to achieve this. The app was created by the Fossett Laboratory for Virtual Planetary Exploration to help students understand the complex, three-dimensional nature of geologic structures without having to travel all over the world. The app has a lot of models programmed into the system already with more on the way. Thanks to interest from other fields they are looking at including models of proteins, art, and archeology as well into the App.

“You want to represent that data, not in a projective way like you would do on a screen on a textbook, but actually in a three-dimensional way,” Pratt said. “So you can actually look around it [and] manipulate it exactly how you would do in real life. The thing with augmented reality that we found most attractive [compared to virtual reality] is that it provides a much more intuitive teacher-student setting. You’re not hidden behind avatars. You can use body-language cues [like] eye contact to direct people to where you want to go.”

Working with the Unity game engine, Pratt has since put together a flexible app called GeoXplorer (for iOS and Android) for displaying other models. There is already a large collection of crystalline structure models for different minerals, allowing you to see how all the atoms are arranged. There are also a number of different types of rocks, so you can see what those minerals look like in the macro world. Stepping up again in scale, there are entire rock outcrops, allowing for a genuine geology field-trip experience in your living room. Even bigger, there are terrain maps for landscapes on Earth, as well as on the Moon and Mars.

Its still a work in progress but I think it’s going to be something which is going to be really cool and might be quite a big thing coming soon into classrooms around the world. The one major constraint that I can see is right now, you have to use your phone as the AR gateway which makes it a bit cumbersome to use, something like a Microsoft HoloLens or other augmented reality goggles will make it really easy to use and make it more natural, but obviously the cost factor of these lenses is a big problem. Keeping that in mind it’s easy to understand why they went with the Phone as the AR gateway instead of a Hololens or something similar.

From Martian terrain samples collected by NASA’s Mars Reconnaissance Orbiter to Devil’s Tower in Wyoming to rare hand samples too delicate to handle, the team is constantly expanding the catalog of 3D models available through GeoXplorer and if you have a model you’d like to see added to the app please get in contact with the Fossett Lab at fossett.lab@wustl.edu.

– Suramya

August 30, 2020

How to write using inclusive language with the help of Microsoft Word

Filed under: Computer Software,Knowledgebase,My Thoughts,Techie Stuff — Suramya @ 11:59 PM

One of the key aspects of Inclusion is Inclusive language, and its very easy to use non-inclusive/gender specific language in our everyday writings. For example, when you meet a mixed gender group of people almost everyone will say something to the effect of ‘Hey Guys’. I was guilty of the same and it took a concentrated effort on my part to change my greeting to ‘Hey Folks’ and other similar changes. Its the same case with written communication and most people default to male gender focused writing. Recently I found out that Microsoft Office‘s correction tools, which most might associate with bad grammar or improper verb usage, secretly have options that help catch non-inclusive language, including gender and sexuality bias. So I wanted to share it with everyone.

Below are instructions on how to find & enable the settings:

  • Open MS Word
  • Click on File -> Options
  • Select ‘Proofing’ from the menu in the left corner and then scroll down on the right side to ‘Writing Style’ and click on the ‘Settings’ button.
  • Scroll down to the “Inclusiveness” section, select all of the checkboxes that you want Word to check for in your documents, and click the “OK” button. In some versions of Word you will need to scroll down to the ‘Inclusive Language’ section (its all the way near the bottom) and check the ‘Gender-Specific Language’ box instead.
  • Click Ok

It doesn’t sound like a big deal when you refer to someone by the wrong gender but trust me its a big deal. If you don’t believe me try addressing a group of men as ‘Hello Ladies’ and then wait for the reactions. If you can’t address a group of guys as ladies then you shouldn’t refer to a group of ladies as guys either. I think it is common courtesy and requires minimal effort over the long term (Initially things will feel a bit awkward but then you get used to it).

Well this is all for now. Will write more later.

– Suramya

August 29, 2020

You can be identified online based on your browsing history

Filed under: Computer Related,Computer Software,My Thoughts,Techie Stuff — Suramya @ 7:29 PM

Reliably Identifying people online is a bedrock of the million dollar advertising industry and as more and more users become privacy conscious browsers have been adding features to increase the user’s privacy and reduce the probability of them getting identified online. Users can be identified by Cookies, Super Cookies etc etc. Now there is a research paper (Replication: Why We Still Can’t Browse in Peace: On the Uniqueness and Reidentifiability of Web Browsing Histories) that claims to be able to identify users based on their browsing histories. It is built on top of previous research Why Johnny Can’t Browse in Peace: On the Uniqueness of Web Browsing History Patterns and re-validates the findings of the previous paper and builds on top of it.

We examine the threat to individuals’ privacy based on the feasibility of reidentifying users through distinctive profiles of their browsing history visible to websites and third parties. This work replicates and

extends the 2012 paper Why Johnny Can’t Browse in Peace: On the Uniqueness of Web Browsing History Patterns[48]. The original work demonstrated that browsing profiles are highly distinctive and stable.We reproduce those results and extend the original work to detail the privacy risk posed by the aggregation of browsing histories. Our dataset consists of two weeks of browsing data from ~52,000 Firefox users. Our work replicates the original paper’s core findings by identifying 48,919 distinct browsing profiles, of which 99% are unique. High uniqueness hold seven when histories are truncated to just 100 top sites. Wethen find that for users who visited 50 or more distinct do-mains in the two-week data collection period, ~50% can be reidentified using the top 10k sites. Reidentifiability rose to over 80% for users that browsed 150 or more distinct domains.Finally, we observe numerous third parties pervasive enough to gather web histories sufficient to leverage browsing history as an identifier.

Original paper

Olejnik, Castelluccia, and Janc [48] gathered data in a project aimed at educating users about privacy practices. For the analysis presented in [48] they used the CSS :vis-ited browser vulnerability [8] to determine whether various home pages were in a user’s browsing history. That is, they probed users’ browsers for 6,000 predefined “primary links” such as www.google.com and got a yes/no for whether that home page was in the user’s browsing history. A user may have visited that home page and then cleared their browsing history, in which case they would not register a hit. Additionally a user may have visited a subpage e.g. www.google.com/maps but not www.google.com in which case the probe for www.google.com would also not register a hit. The project website was open for an extended period of time and recorded profiles between January 2009 and May 2011 for 441,627 unique users, some of whom returned for multiple history tests, allowing the researchers to study the evolution of browser profiles as well. With this data, they examined the uniqueness of browsing histories.

This brings to mind a project that I saw a few years ago that would give you a list of websites from the top 1k websites that you had visited in the past using javascript and some script-fu. Unfortunately I can’t find the link to the site right now as I don’t remember the name and a generic search is returning random sites. If I find it I will post it here as it was quite interesting.

Well this is all for now. Will post more later.

– Suramya

August 27, 2020

Optimizing the making of peanut butter and banana sandwich using computer vision and machine learning

Filed under: Computer Related,Computer Software,Techie Stuff — Suramya @ 12:42 AM

The current Pandemic is forcing people to stay at home depriving them of activities that kept them occupied in the past so people are getting a bit stir-crazy & bored of staying at home. Its worse for developers/engineers as you never know what will come out from the depths of a bored programmer’s mind. Case in point is the effort spent by Ethan Rosenthal in writing Machine Learning/Computer Vision code to Optimizing the coverage of the banana slices on his peanut butter & Banana sandwich so that there is the same amount of banana in every mouthful. The whole exercise took him a few months to complete and he is quite proud of the results.

It’s really quite simple. You take a picture of your banana and bread, pass the image through a deep learning model to locate said items, do some nonlinear curve fitting to the banana, transform to polar coordinates and “slice” the banana along the fitted curve, turn those slices into elliptical polygons, and feed the polygons and bread “box” into a 2D nesting algorithm
[…]
If you were a machine learning model (or my wife), then you would tell me to just cut long rectangular strips along the long axis of the banana, but I’m not a sociopath. If life were simple, then the banana slices would be perfect circles of equal diameter, and we could coast along looking up optimal configurations on packomania. But alas, life is not simple. We’re in the middle of a global pandemic, and banana slices are elliptical with varying size.

The problem of fitting arbitrary polygons (sliced circular banana pieces) in a box (the bread piece) is NP-hard so the ideal solution is practically uncomputable and Rosenthal’s solution is a good approximation of the optimal solution in a reasonable time frame. The final solution is available as a command-line package called “nannernest” which takes a photo of the bread piece & banana as its argument and returns the an optimal slice-and-arrange pattern for the given combination.


Sample output created by nannernest

Check out the code & the full writeup on the project if you are interested. Even though the application is silly it’s a good writeup on using Machine Learning & Computer Vision for a project.

Source: Boing Boing

– Suramya

Older Posts »

Powered by WordPress