Modern search engines crawl most public sites on a regular basis and, unlike legacy search engines, can also find and index data or files that are not linked to from any other source. This allows a search engine to index files that may contain sensitive data or details about vulnerabilities. Using this publicly available information, attackers can search for such data without touching the target system directly, leaving little trace for defenders to detect. Most organizations are not aware of the information being leaked this way or of how it compromises their security. The availability of the Google Hacking Database allows even minimally skilled attackers to search for this information quickly and efficiently.
This poses a high risk to organizations that leak sensitive data. There is no sure-fire solution to this problem, and even the most careful organizations will expose data that, when combined with other sources, gives attackers a view of the organization’s digital assets and systems.
The popular image of a hacker is an attacker sitting in a dark room, typing commands into a terminal and gaining access within a very short period of time. In real life, attackers spend a lot of time performing reconnaissance before they even engage with the target system. One popular way of performing reconnaissance is to use search engines like Google to find data. This technique is called Google Hacking and was introduced to the public in 2004 by Johnny Long. He defined it as “the art of creating complex search engine queries in order to filter through large amounts of search results for information related to computer security” (Johnny, 2004). Attackers use Google Hacking to uncover sensitive information about a company or to find potential security vulnerabilities.
The Google Hacking Database (GHDB) is a consolidated database of queries collected over the years from contributions by researchers, hackers and the general public. These queries can be used to find sensitive data on websites, such as files containing passwords, configurations, financial information, error messages, firewall logs and other such data (Google Hacking Database, 2021). The database is in an easy-to-consume format and allows users to look up queries that return specific types of data.
This database gives attackers the queries needed to find specific types of data, leveraging the indexing power of Google to locate information that should not have been exposed to the public.
How Google Hacking Works
Google allows a user to search for information using keywords combined with search operators that limit the results. With the queries available in the Google Hacking Database, an attacker can search for specific information and limit the search to a given target domain. There are multiple kinds of queries, each targeting a specific kind of information. Some of the categories are:
- Advisories and vulnerabilities: Locate vulnerable servers based on product or version-specific setups with known vulnerabilities.
- Sensitive directories: Find directories with files that contain sensitive information.
- Files containing passwords: Locate files containing passwords.
- Pages containing login portals: Locate login pages for various services.
- Error messages: Find files with error messages that may contain details about the system.
Below are examples of the various queries that are available and the kind of data they expose.
Searching for passwords stored in files
Users sometimes store passwords in plain text files or Excel spreadsheets that are accidentally uploaded to a public site. These are then indexed by Google (or other search engines) and can be found using specific queries. For example:
allintext:"*.@gmail.com" OR "password" OR "username" filetype:xlsx
This query searches for Excel (.xlsx) files whose text contains a gmail.com address, “password” or “username”. Because the terms are joined with OR, it returns files containing any of them. If required, we can limit the search to a specific site using the “site:” search operator.
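As a rough illustration of the “site:” restriction, the sketch below prefixes a query with a site: operator and builds a search URL that can be opened manually in a browser. The helper names `scope_to_site` and `search_url` are hypothetical and `example.com` is a placeholder domain. It only builds strings and does not send any requests.

```python
from urllib.parse import quote_plus


def scope_to_site(query: str, domain: str) -> str:
    """Prefix a search query with a site: operator for one domain."""
    return f"site:{domain} {query}"


def search_url(query: str) -> str:
    """Build a Google search URL for manual review in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# The query is the example from the text; example.com is a placeholder.
query = scope_to_site(
    'allintext:"*.@gmail.com" OR "password" OR "username" filetype:xlsx',
    "example.com",
)
print(query)
print(search_url(query))
```

Running the same scoped queries against your own domain is a simple way to see what an attacker would see.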
Searching for log files
Log files can reveal a lot of sensitive information if they are exposed to the public. Error logs and access logs can expose details such as the PHP version you are running, the CMS version and the operating system. Exposed firewall or system logs can reveal usernames, the firewall version and configuration details. Similarly, SQL logs can expose sensitive data. Combined with other sources, this information can give an attacker a foothold in the system. For example:
allintext:username filetype:log
This query returns .log files that contain the text “username”. The following query returns directory listings in which an errors.log file is publicly accessible:
intitle:"index of" errors.log
SSH private keys
SSH private keys are used to prove a user’s identity during SSH connections, which allows users to authenticate to servers without a password. If a private key is exposed, anyone who obtains it can impersonate that user, and if passwordless logins are enabled, the key lets the attacker log in to the server directly. The following query returns directory listings that contain a publicly accessible private key:
intitle:index.of id_rsa -id_rsa.pub
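A defender can look for the same exposure from the inside. The following is a minimal sketch, assuming private keys are stored in the usual PEM-style text format whose first line looks like `-----BEGIN OPENSSH PRIVATE KEY-----` or `-----BEGIN RSA PRIVATE KEY-----`. The function names and the idea of scanning a web root directory are illustrative assumptions, not part of the original paper.

```python
import re
from pathlib import Path

# PEM-style header found at the top of private key files, for example
# "-----BEGIN OPENSSH PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(rb"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def looks_like_private_key(path: Path, max_bytes: int = 4096) -> bool:
    """Return True if the first max_bytes of the file contain a key header."""
    try:
        with path.open("rb") as handle:
            return PRIVATE_KEY_HEADER.search(handle.read(max_bytes)) is not None
    except OSError:
        # Unreadable files are skipped rather than treated as findings.
        return False


def find_private_keys(web_root: Path) -> list[Path]:
    """List files under web_root (a hypothetical public directory) that look like private keys."""
    return sorted(
        p for p in web_root.rglob("*") if p.is_file() and looks_like_private_key(p)
    )
```

Calling `find_private_keys(Path("/var/www/html"))`, with the path replaced by your own public directory, would list candidate files to remove. Any key found this way should be treated as compromised and rotated.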
Login Portals
Organizations often expose their development or staging systems to the internet for testing and rely on obscurity for protection. These systems are vulnerable because development systems often lack the protections and controls applied to production systems. In addition, systems that were never meant to be public, such as router login pages and CMS admin sections, are often reachable and increase the organization’s attack surface. A sample query to find login pages for the Cisco Email Security Virtual Appliance is listed below:
intitle:"Cisco Email Security Virtual Appliance" inurl:csrfkey=
SQL dumps
Sites sometimes create SQL data dumps for backup or restoration purposes, and these dumps often contain a lot of sensitive data. Using a search query similar to the one listed below, attackers can find these dumps and explore the data:
ext:sql | ext:txt intext:"-- phpMyAdmin SQL Dump --" + intext:"admin"
Many more queries are available in the database to search for specific data, and more are added every day.
Famous attacks that used Google Hacking/Google Dorks
Attacks using Google Hacking/Google Dorks are difficult to identify due to the passive nature of the technique. Even so, there have been a few notable cases where attackers used it against an organization’s systems. One such case is described below.
N.Y. Dam attack from Iran, 2013
Between 2011 and 2013, Hamid Firoozi from Iran gained access to the Bowman Avenue Dam in Rye Brook, New York, after using Google searches to find an unprotected computer that controlled the dam’s sluice gates (Matthews, 2016). The issue is widespread enough that the Department of Homeland Security and the FBI jointly released a warning about Google dorking: “By searching for specific file types and keywords, malicious cyber actors can locate information such as usernames and passwords, e-mail lists, sensitive documents, bank account details, and website vulnerabilities” (FBI, 2014).
Detection of Google Hacking Attacks
Detection of these attacks is difficult due to the passive nature of the attack. However, one technique that is quite successful is a honeypot approach. Organizations can store files with fake information that looks authentic and important, such as username and password combinations or SSH private keys that belong to non-existent accounts. Because these accounts do not exist, no one should be attempting to log in to them for legitimate purposes. A login attempt against one of these accounts, or an access to one of the files, is therefore a strong sign that a Google Hacking attack is in progress, and the source IP address can be flagged for follow-up or blocking. We can also lure the attacker into a fake network that is monitored to identify what information they are looking for.
Using that information, we can take further preventive measures to protect the system.
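To make the honeypot idea concrete, here is a minimal sketch that scans web server access log lines for requests to decoy files and groups them by client IP. It assumes the server writes logs in the Common or Combined Log Format. The decoy paths and the `decoy_hits` function are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical decoy files planted in a public directory; adjust to your setup.
DECOY_PATHS = {"/backup/passwords.xlsx", "/keys/id_rsa"}

# Captures the client IP and request path from Common/Combined Log Format lines.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def decoy_hits(log_lines):
    """Map each client IP to the set of decoy paths it requested."""
    hits = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        # Ignore any query string when comparing against the decoy list.
        path = match.group("path").split("?", 1)[0]
        if path in DECOY_PATHS:
            hits[match.group("ip")].add(path)
    return dict(hits)


sample = [
    '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET /keys/id_rsa HTTP/1.1" 200 1675',
    '198.51.100.2 - - [10/Oct/2021:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 512',
]
print(decoy_hits(sample))
```

Search engine crawlers will also fetch the decoys, since that is how the files end up in the index. Hits from known crawlers are expected and would need to be filtered out before flagging an address.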
Prevention Techniques for Google Hacking attacks
There are a few steps that we can take to avoid leaking sensitive data to attackers using Google Dorks:
- Protect sensitive and private data with authentication.
- Don’t expose development systems to the internet. If that is not possible, restrict access by IP address.
- Run regular vulnerability scans on your website/domain. Many scanners now incorporate checks for popular Google Dork queries.
- Run manual dork queries against your site to locate leaks before attackers do.
- Add checks to your servers to find sensitive files in public directories, such as any file with an extension other than php/asp/html. These can be potential leaks.
- If you find sensitive content exposed, you can request its removal by using the Google Search Console.
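The server-side check for unexpected files mentioned in the list above could be sketched as follows. The allowlist mirrors the php/asp/html example from the text, and the function name and the directory passed to it are assumptions. A real site would likely extend the allowlist with its legitimate static assets such as css, js and images.

```python
from pathlib import Path

# Extensions expected in the public web root. This mirrors the php/asp/html
# example in the text and is an assumption; extend it for your own assets.
ALLOWED_EXTENSIONS = {".php", ".asp", ".html"}


def unexpected_files(web_root: Path, allowed=ALLOWED_EXTENSIONS):
    """Return files under web_root whose extension is not in the allowlist."""
    flagged = []
    for path in web_root.rglob("*"):
        # Files with no extension (for example id_rsa) are flagged as well.
        if path.is_file() and path.suffix.lower() not in allowed:
            flagged.append(path)
    return sorted(flagged)
```

Running `unexpected_files(Path("/var/www/html"))` from a scheduled job, with the path replaced by your own public directory, gives a list of files to review. An allowlist is used rather than a blocklist so that unknown file types are flagged by default.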
Conclusion
Google Hacking allows an attacker to perform reconnaissance against your organization in a passive way, collecting information that can then be combined with other sources to give them a foothold. Preventing such information leaks is a good way to protect your organization’s systems, and the techniques listed above can help with that. You can also subscribe to services that perform these checks on your behalf.
This paper covered some of the techniques available to detect and prevent Google Hacking attacks. While these techniques will not protect against all attacks, they will reduce the attack surface and protect you against most attackers.
Note: This was originally written as a paper for one of my classes at EC-Council University in Q2 2021, which is why the tone is a lot more formal than my regular posts.