IT infrastructure reconnaissance – part 1 (Google hacking)

The basis of any web application or infrastructure security test is reconnaissance, i.e. the collection of all subdomains, IP addresses, and other publicly available information. It is good practice to use several tools in parallel during reconnaissance, which greatly increases the effectiveness of this testing phase: information missed by one tool can be found by others (thanks to differences in their search algorithms or in what the tools are designed to target). This is our motivation to publish a series of articles on passive (but not only passive) reconnaissance. The series starts with search engines as one of the tools useful in penetration tests.

The information presented below is for educational purposes only. Accessing data found during reconnaissance may be a breach of the law. Make sure you are acting legally before using these techniques.

Google

Search engines can be the most basic source of information during initial reconnaissance, and they are usually a very good starting point for further testing. Paradoxically, you can find a lot of interesting information there, in critical cases even sensitive files (for example, invoices, personal data, passwords, etc.). The great advantage of this passive reconnaissance method is that we leave no trace of our presence on the server of the application we are testing, unlike active reconnaissance, which bombards the servers with DNS queries or with path enumeration by vulnerability scanners. Since manually browsing all the results is cumbersome and the desired information often appears only on later result pages, search engine filters are a very important tool.

Note: Because different search engines use independent algorithms, it is important not to limit yourself to one, but to run the same queries in several of them. The most popular are Google, Bing, and Yahoo.
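
Running the same query in several engines is easier if the search URLs are generated rather than retyped. Below is a minimal Python sketch that only builds the URLs for you to open in a browser; it sends no requests. The base URLs and query-parameter names (q for Google and Bing, p for Yahoo) are assumptions based on the engines' public search pages, and search_urls is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed public search endpoints and their query parameter names.
# These are the URLs a browser uses, not an official API.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
}


def search_urls(query: str) -> dict[str, str]:
    """Return one search URL per engine for the same query string."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    # The URLs are only printed; open them manually to compare results.
    for engine, url in search_urls('"hacking sekurak"').items():
        print(f"{engine}: {url}")
```

urlencode takes care of escaping quotes, spaces, and colons, so operators survive the trip into the URL unchanged.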

To keep things simple, I will use the most popular search engine, www.google.pl. A list of the most popular filters can be found here.

Instead of describing every filter separately, I will focus on a few of the most interesting and useful examples.

Before you start hacking with search engines, you need to understand how the search algorithm works and how to deal with unwanted results. Search engine filters work much like the filters in online shops: instead of wading through entire pages of results, we can pull out only those that interest us.

Leaving the security industry aside for a moment, let’s assume that we want to look up a selected website and basic information about it, while excluding subdomains, similar addresses, and whole pages of unrelated results.

In this case, simply use the info filter, as shown for www.facebook.com in Picture nr 1.

Picture nr 1

Filters speed up searches and often make the query more precise. Staying with social media, here is a small task: how do you find social media sites similar to Facebook? You could type “Facebook-like pages” into Google, but first, what you learn depends on which pages happen to have published such a list, and second, there is a more professional way. For this task, the related filter is enough (used exactly like info in the example above); it shows pages of a similar nature to the one in the query. The results for our task are shown in Picture nr 2.

Picture nr 2

With the simple examples behind us, the next important and interesting topic is logical operators, which Google handles very well.

The most important operators:

  • “ ” – Google searches for exactly the phrase given, e.g. “hacking sekurak”.
  • OR/AND – logical operators that probably everyone knows. Interesting fact: the “|” sign (the pipe) is often used as a replacement for the OR operator.
  • () – used as in mathematics, to group terms so that operators and filters apply to several phrases in one query.
  • - (minus sign) – excludes results containing the word written directly after it (an interesting case in Picture nr 3).
  • * – used as a wildcard, e.g. “securitum pentesting *”.

Picture nr 3

The example above is interesting because, if used incorrectly, the minus operator can “cut out” part of the phrase you are looking for.
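
Because the operators listed above are plain text, a query can be composed from small pieces. This Python sketch uses hypothetical helper names and only builds query strings; the extra terms (pentest, audit, jobs) are illustrative and not taken from the article.

```python
# Hypothetical helpers that compose Google query strings from the operators
# described above. They only build text; nothing is sent anywhere.


def phrase(text: str) -> str:
    """Exact-phrase match: wrap the text in double quotes."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Logical OR of several terms, grouped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def exclude(term: str) -> str:
    """Minus operator: the dash must be attached directly to the word."""
    return f"-{term}"


def query(*parts: str) -> str:
    """Terms separated by spaces are implicitly combined with AND."""
    return " ".join(parts)


if __name__ == "__main__":
    print(query(phrase("hacking sekurak")))
    print(phrase("securitum pentesting *"))
    print(query(any_of("pentest", "audit"), "securitum", exclude("jobs")))
```

Keeping the minus attached to the excluded word is the detail that decides what gets cut out of the results.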

These are just simple building blocks on the way to Google hacking. Even if you do not work in the security industry, I recommend getting familiar with them: basic filters often make everyday Google searches easier.

Good news for the lazy: Google provides a dedicated form for precise searches at https://www.google.com/advanced_search.

As I wrote above, it is not worth limiting yourself to one search engine, so it is a good idea to keep the filter lists of other engines among your bookmarks, e.g. the one for Bing (http://help.bing.microsoft.com/#apex/18/en-us/10001/-1).

Most of these filters overlap with those available in Google. One filter not offered by other search engines deserves attention: ip:, which lets you find web pages by the IP address of the server hosting them.
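
As an illustration of Bing’s ip: filter, the Python sketch below validates an IPv4 address and builds the corresponding Bing search URL. It assumes the https://www.bing.com/search?q=... URL form and that ip: expects a dotted-quad IPv4 address; the function name is hypothetical, and 203.0.113.10 comes from a range reserved for documentation.

```python
import ipaddress
from urllib.parse import urlencode


def bing_ip_query(address: str) -> str:
    """Build a Bing search URL that uses the ip: filter for an IPv4 address."""
    # Assumption: ip: takes a dotted-quad IPv4 address. IPv4Address raises
    # ValueError (AddressValueError) for anything that is not valid IPv4.
    ip = ipaddress.IPv4Address(address)
    return "https://www.bing.com/search?" + urlencode({"q": f"ip:{ip}"})


if __name__ == "__main__":
    # 203.0.113.10 belongs to TEST-NET-3, reserved for documentation.
    print(bing_ip_query("203.0.113.10"))
```

Validating first avoids sending queries with typos in the address, which would silently return unrelated pages.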

Google hacking

Finally, we come to the heart of this article: how to hack with the help of Google, and is it possible at all? Of course it is! This technique is called Google Hacking (or Google Dorking) and is usually an integral part of security tests.

For readers looking for mysterious-looking programs that gather information automatically: those exist too and will be described soon. Later in this series we will use specialized tools dedicated to individual reconnaissance tasks.

Why does reconnaissance begin with search engines? Often the simplest solutions are the best, and the biggest problem in security is the human factor. Expensive programs, complicated scripts, and similar pentester aids may find nothing, while the final nail in the coffin is waiting on the Internet, one query away. Misconfigurations, directory listings, database backups, passwords and logins, or invoices are just some of the things that can be easily found by someone who knows how to look for them.

Below you will find examples of filter patterns.

The three filters that are most commonly used for Google hacking are:

  • intitle – searches for specific phrases in the page title (Picture nr 4)
  • inurl – searches for specific phrases in the URL (Picture nr 4)
  • filetype – searches for files of a selected type containing a given phrase

Picture nr 4
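
The three filters share the same operator:value shape, which makes them easy to combine, for example inurl to define the target and intitle to say what we are looking for. This Python sketch uses hypothetical helper names and example values; it assumes the common convention of writing no space after the colon and quoting multi-word values.

```python
SUPPORTED = {"intitle", "inurl", "filetype"}


def dork_filter(operator: str, value: str) -> str:
    """Return operator:value, quoting the value when it contains spaces."""
    if operator not in SUPPORTED:
        raise ValueError(f"unsupported operator: {operator}")
    # No space after the colon; multi-word values need double quotes.
    if " " in value:
        value = f'"{value}"'
    return f"{operator}:{value}"


def dork(*filters: str, keywords: str = "") -> str:
    """Join filters and optional keywords; spaces act as an implicit AND."""
    return " ".join(part for part in (*filters, keywords) if part)


if __name__ == "__main__":
    # Hypothetical example: directory listings whose URL contains "backup".
    print(dork(dork_filter("intitle", "index of"), dork_filter("inurl", "backup")))
    # Hypothetical example: PDF files mentioning "invoice".
    print(dork(dork_filter("filetype", "pdf"), keywords="invoice"))
```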

Intitle

How do you start searching? First, we need to define what we, as security testers, are interested in during the initial phase of reconnaissance.

Surprisingly often, page titles contain the names of the technologies used, their versions, and ports: in fact, all the key information we want at this stage.

Without any additional software, you can look for a classic of the security industry: outdated software. Picture nr 5 shows a sample query that finds pages running a specific version of Apache on Ubuntu.

Picture nr 5

Why is this filter so important? It lets us obtain results that could not be found by analysing the URL of a page alone. It is most often combined with the inurl filter: inurl defines our target and intitle specifies exactly what we are looking for. I will give some examples of this below.

Inurl

Page titles are an important source of information, but we often have to separate the wheat from the chaff, because this kind of search frequently produces so-called false positives.

The intitle filter shows what data is available at a given address. To locate such data precisely, the inurl filter can be very helpful.

The easiest way to use inurl is simply to type the phrase you would like to see in the URL. In addition to the quoted phrase, we can add keywords separated by the “|” character (the so-called pipe) to indicate the topics we want to search around.

It is worth mentioning that programming knowledge is often useful in Google Hacking. With this reconnaissance method you can precisely search for interesting paths, places on a website, or addresses without resorting to brute force, but you need to know which phrases might “trigger” interesting information. In the example below you will find login pages for Bitcoin wallets. Why did we choose the keyword user_login? It is simply a “shot” based on experience. Sometimes names, URLs, or directories have to be searched for “blindly”.

For example, the query inurl:"user_login/" bitcoin | crypto | wallet will most likely return login links to Bitcoin wallets, like the example in Picture nr 6.

Picture nr 6
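
The Bitcoin wallet query consists of two parts: the quoted inurl phrase that defines the target, and the pipe-separated topic keywords. This Python sketch (the function name is hypothetical) only rebuilds that query string, written without a space after inurl:.

```python
def inurl_with_topics(path: str, topics: list[str]) -> str:
    """Return inurl:"<path>" followed by topic keywords joined with a pipe (OR)."""
    if not topics:
        raise ValueError("at least one topic keyword is required")
    return f'inurl:"{path}" ' + " | ".join(topics)


if __name__ == "__main__":
    print(inurl_with_topics("user_login/", ["bitcoin", "crypto", "wallet"]))
```

Swapping the path fragment while keeping the topic list is a cheap way to test several “shots” of the kind described above.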

In discussions of reconnaissance with various search engines (especially those that index Internet-connected devices, such as Shodan), the flagship example is finding Internet-accessible cameras that are not password-protected. Google may not return as many results as a dedicated tool, but the filter inurl:"ViewerFrame?Mode" is enough to see what it can do. The links Google finds this way lead to cameras in live mode, as in Picture nr 7.

Picture nr 7

As with dedicated search engines, this lets us find cameras that are accessible from the Internet around the clock. Sometimes it is even possible to control them: move the camera, zoom in, and so on.

Since I have already mentioned Shodan, I would like to answer a question that will probably come up in the comments: yes, it is possible to search for open ports using Google! Sometimes a URL contains an explicit port number after the host name, such as :8080.

This means that the website runs on a port chosen by the administrator (in this case, port 8080). Finding URLs constructed this way with Google Hacking is not difficult at all.

The only thing we have to take care of is excluding the port number from the text of the returned pages (here, 8080). Without this exclusion, the list of unwanted results would be long, because 8080 also appears in nicknames, proper names, and object descriptions. Of course, port 8080 is just an example; this approach works for any port.

The query inurl:8080 -intext:8080 returns results like those in Picture nr 8.

Picture nr 8
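
The port trick combines inurl: with an -intext: exclusion. The Python sketch below builds that query for any port and, for comparison, uses the standard library’s urlsplit to read the explicit port from a URL. The helper names and the example.com URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def port_dork(port: int) -> str:
    """Match the port in URLs but drop pages that merely mention the number."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"inurl:{port} -intext:{port}"


def explicit_port(url: str) -> Optional[int]:
    """Return the port written explicitly after the host name, or None."""
    return urlsplit(url).port


if __name__ == "__main__":
    print(port_dork(8080))
    print(explicit_port("http://example.com:8080/login"))
    print(explicit_port("https://example.com/"))
```

Note that inurl:8080 is a plain text match, so it can also hit URLs where 8080 appears in a path or parameter rather than as the port; checking the results with something like explicit_port helps weed those out.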

Filetype

This option is most often used when searching for sensitive documents, configurations, or logs that, for unknown reasons, are publicly available.

In the example below (Picture nr 9), Google finds xls files containing the phrase twitter.com. Sometimes this turns up spreadsheets with financial records, employee data, and other confidential information.

Picture nr 9

Of course, other formats are worth checking too, .doc and .pdf in particular. Sometimes it is even worth going through hundreds of .jpg images, because confidential scans that should never have been published on the Internet occasionally turn up in such formats.
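
Because the same phrase is worth checking in several formats, the queries can be generated in a loop. This Python sketch assumes queries of the form filetype:<extension> <phrase> (the exact query from Picture nr 9 is not reproduced here) and uses the formats mentioned in the text; the function name is hypothetical.

```python
# Formats mentioned in the text; extend the tuple as needed.
FORMATS = ("xls", "doc", "pdf", "jpg")


def filetype_queries(phrase: str, formats: tuple[str, ...] = FORMATS) -> list[str]:
    """Return one filetype: query per format for the same phrase."""
    return [f"filetype:{ext} {phrase}" for ext in formats]


if __name__ == "__main__":
    for q in filetype_queries("twitter.com"):
        print(q)
```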

Practical examples from a Pentester’s life

Finally, some information about how Google hacking is used in real security tests. A good place to start searching for valuable data this way is https://www.exploit-db.com/google-hacking-database/, a database of many useful queries with descriptions. Here are some practical examples:

Searching for SQL database backups for WordPress

Documents in .doc format with default logins and passwords

Searching for SQL syntax errors

Summary

These are just a few examples from a deep well of possibilities. The truth is that when searching, we are limited only by our imagination :). It is impossible to cover every interesting application of Google hacking, because what we want to find determines our approach. However, I hope I have shown what this technique is and why it is valued by professionals in the security industry.