In the previous parts of this series, I talked about various types of search engines useful in the reconnaissance phase and the curiosities that can be found with their help.
Real tests can go differently: sometimes this phase already reveals critical errors. Often, however, it yields only residual information about the target being investigated. Regardless of these results, the pentester’s next step is the proper phase of reconnaissance, i.e. recognition of the client’s infrastructure. The aim is to determine what the whole network may look like, what solutions should be expected and what the scope of the tests is. I mean a situation in which a tester receives specific IP addresses for testing and then searches for the corresponding services, pages or login screens. On this basis, the tester can determine the primary attack vectors.
The methodology of real tests usually looks as follows: the tester collects a range of IP addresses and domains and, based on this knowledge, creates a list of subdomains and then of subsequent paths. How can such information be found quickly and effectively? Two reconnaissance methods can be used here:
- passive – searching for information available on the internet without interfering or leaving a trace of presence on the client side,
- active – using tools and methods that collect the needed information by interacting with systems on the client side.
Leaving aside for a moment which method we want to use, we have to think about where such information can be obtained at all. What do the target’s IP addresses and domain names have in common? DNS: the DNS servers contain what we need in this phase. The routes differ, but the goal is the same: the longest possible list of domains.
For readers who are not professionally involved in this field, I will first raise one more important point, which explains what actual reconnaissance is. Let’s assume that we have a server and the main domain google.com. To find out the range of addresses for testing and the size of the infrastructure, we search for further subdomains: translate.google.com, cloud.google.com, ads.google.com and so on. Suppose that in this phase we could not find any interesting data that could serve as a kind of “shortcut” in the testing phase. Does this mean the end of the reconnaissance and the start of the classic tests? Not necessarily, because we then learn that the owner of google.com not only has German and Polish versions of the domain but also owns the domains gmail.com and youtube.com, which may be located on the same server. Suppose further that the last two domains have not been developed for years, and the errors that occur there offer great opportunities for easy escalation of further tests.
The search for domains can be split along two axes (Picture nr 1).
It may also happen that the two language versions of the application differ not only in content but also in functionality, so that, for example, the German-language website exposes a field that does not exist in the Polish version and that the tester can use for an attack.
As you can see, the task of finding the largest number of domains and subdomains is a bit complicated, and it is not possible to find all the results belonging to both axes with one simple method.
This article walks through methods that allow such a search to be conducted effectively. To keep a logical sequence, I will start with passive reconnaissance, which can be used for both the x-axis and the y-axis (Picture nr 1).
VirusTotal
The portal, https://www.virustotal.com/, is probably known to the majority of readers, but it is impossible not to mention it in the subject of reconnaissance. Its most important and basic functionality is the use of more than 60 antivirus programs to check the security of the indicated URL or file that can be uploaded from the computer. This option is very useful when you want to download a file from an untrusted source or make sure that the file you have on your computer is safe before you open it.
An example of how VirusTotal works for a potentially “dangerous program” can be seen here.
However, apart from the analysis of file security, VirusTotal offers very interesting possibilities related to the reconnaissance of a selected domain. In the URL tab, just type in the domain you are interested in, for example, ‘google.com’. VirusTotal provides a lot of data, which makes it easier to collect information about our target.
We must remember that Google has a gigantic infrastructure, and queries for the google.com domain generate a huge list of search results. It serves here only as an illustrative example for the purposes of the article. During real tests, the number of results is usually small enough that, with a little practice, the tester can find their way through the maze of discovered information and create a working model of the infrastructure or define potential targets.
The first thing that catches the eye is the IP addresses of the searched URL (Picture nr 2).
IP addresses are already a lead that the pentester can follow. In this case, we note that the 216.58.217.0/24 subnetwork is likely to be our target, which often leads to a “search” for further targets across the entire subnetwork.
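To make the idea of "searching the entire subnetwork" concrete, here is a minimal sketch that expands a CIDR block into candidate addresses and checks whether a discovered address falls inside it. The function names are hypothetical and chosen only for illustration; the subnet is the one from the example above.

```python
import ipaddress


def hosts_in_subnet(cidr):
    """Return every usable host address in the given CIDR block as strings."""
    network = ipaddress.ip_network(cidr, strict=True)
    return [str(host) for host in network.hosts()]


def in_subnet(address, cidr):
    """Check whether a single IP address belongs to the given CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


# Example: candidate targets for the subnet noted above.
candidates = hosts_in_subnet("216.58.217.0/24")
print(len(candidates), candidates[0], candidates[-1])
```

Such a list can then be fed to whatever lookup or scanning step the agreed scope of the tests allows.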
Then we get a Whois type data set, which is a solid handful of information related to the registration of a given website:
- date of registration of the domain
- the expiry date of the domain
- owner’s data
- email address used during domain registration
- registrar data
- the name servers of the domain
This type of information can be very useful during social engineering tests or red teaming, but it is also used for infrastructure tests.
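The same kind of Whois data can also be collected outside of VirusTotal. The sketch below is only an illustration: it sends a plain WHOIS request over TCP port 43 and splits the reply into "Key: value" fields. The default server `whois.verisign-grs.com` (the registry for .com) and the function names are assumptions made for this example, and field names differ between registries.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a plain WHOIS request and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(raw):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip comment and footer lines as well as keys without a value.
        if sep and key and value and not key.startswith(("%", "#", ">>>")):
            fields.setdefault(key, []).append(value)
    return fields
```

Calling `parse_whois(whois_query("google.com"))` would then give a dictionary from which the registration date, expiry date, registrar and name servers can be read, as far as the registry publishes them.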
An easy way to search through domains along the x-axis in Picture nr 1 is to take the registrant’s data or email address and use a reverse Whois tool (https://viewdns.info/reversewhois/) to find other domains registered with the same data. Then (depending on the scope of the tests) you can try to gain control of the server through vulnerabilities on the pages found this way.
VirusTotal is a great source of subdomains for the domain we are interested in. Below are examples of subdomains that VirusTotal listed for the main domain, google.com.
Each record listed by VirusTotal can be “expanded” by clicking on it. This will redirect you to the page where related pages will be shown for the given address (Picture nr 5).
The same situation applies to each of the searched subdomains, for which we can see the related IP addresses (Picture nr 6).
As we can see, the picture quickly becomes complicated. This shows that without outlining the structure of the infrastructure and identifying what can be expected in it, you cannot count on reliable tests.
Such information collected together in the early stages of the reconnaissance can be very useful and provide the ground for further exploration.
I recommend exploring how VirusTotal works on your own, for example, for the Google domain: https://www.virustotal.com/#/domain/google.com.
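The subdomain list shown in the web interface can also be retrieved programmatically. The following is a rough sketch, not something described in this article: it assumes the public VirusTotal API v3, its `/domains/{domain}/subdomains` relationship endpoint, the `x-apikey` header and a response whose `data` array holds objects with an `id` field. An API key is required, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed VirusTotal API v3 relationship endpoint for subdomains.
VT_SUBDOMAINS_URL = "https://www.virustotal.com/api/v3/domains/{domain}/subdomains"


def fetch_subdomains_page(domain, api_key):
    """Request one page of subdomain objects for the given domain."""
    request = urllib.request.Request(
        VT_SUBDOMAINS_URL.format(domain=domain),
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_subdomains(payload):
    """Pull the unique domain names out of an API response payload."""
    items = payload.get("data", [])
    return sorted({item["id"] for item in items if item.get("type") == "domain"})
```

Keeping the parsing step separate from the network call makes it easy to merge the names with lists coming from other sources.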
RiskIQ
RiskIQ is a website that offers a range of security products. Among the free products worthy of attention is the RiskIQ Community Edition, which gives a lot of opportunities for reconnaissance.
This is a great alternative to VirusTotal. Surprisingly often, I manage to find results that other search engines of this type omit. The tool also gathers in one place the functions of several other tools described in this series.
Initially, the complexity of the search results may be intimidating, but it is really worth taking a moment to get to know this tool. It is a valuable item in the pentester’s arsenal.
An example search result (or rather, the first few subdomains of the 45,000 found!) is shown below:
Not all of these domains need to be accessible or exist at all, but it is already a question of probability. The chance of finding a valuable domain is greater for 45 thousand results than, for example, for a thousand.
I encourage every reader to check the options, filters and other tabs offered by RiskIQ and to make it one of the primary tools used in reconnaissance.
crt.sh
Let’s assume that we can collect available subdomains using ready-made databases (such as VirusTotal). You may wonder if there is any other way to enlarge your list of subdomains, ideally with those that allow you to attack the entire infrastructure.
Before answering, let’s think about websites from an entirely different perspective. Let’s take a look at an example of the address we see in the browser:
The use of SSL/TLS encryption by websites increases the security of users, and the users themselves (even those who are utterly unfamiliar with security issues) pay attention to the green padlock next to the URL. Since not long ago, Chrome has also marked websites that do not use encryption as “not secure”. This both encourages and forces website owners and administrators to use SSL/TLS encryption. For such a deployment to be possible, the certificate currently has to be added to the Certificate Transparency (CT) database. Briefly, CT is based on log servers to which every certification authority can send information about the certificates it issues. This means that for each site (including subdomains) using an SSL/TLS certificate, there should be an entry in a public database, which also contains information about expired certificates.
Certificates will not always be the best source of information, though. The site administrator can give the site a so-called wildcard certificate, which means registering the address *.example.com with the certification authority. Only that record is placed in CT; all subdomains are covered by it and do not appear in the database individually. However, it turns out that ordering certificates only for specific subdomains (and thus revealing their existence in CT) is a very popular practice. Coming back to the subject of searching for as many subdomains as possible: why not use this as one of the reconnaissance techniques?
The most popular tool that can be used for this purpose is the search engine www.crt.sh. It allows you to search for the certificates registered for a given domain. A very helpful option, and the one most used during security tests, is the wildcard character (%), which makes it possible to search for the subdomains of a main domain.
The simplest example is the query %.facebook.com, which lists all subdomains of facebook.com whose certificates are or were recorded in the CT database.
A piece of the crt.sh search list for such a query is shown below:
Apart from a simple list of subdomains, which may not seem interesting until the records are verified, you can immediately look for exciting elements of the infrastructure. For example, the query vpn%.domain.com may reveal subdomains used to support a VPN service (Picture nr 10).
This search engine may look inconspicuous at first glance, but it is really worth your attention. Extracting additional subdomains of a given infrastructure from the Certificate Transparency database should be part of every security tester’s mandatory toolkit.
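Queries like the ones above can be automated. The sketch below assumes that crt.sh returns JSON when `output=json` is added to the query and that each entry carries a `name_value` field with one or more names separated by newlines. The function names are hypothetical, and the leading `*.` of wildcard entries is stripped so that only the covered parent name remains.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(pattern):
    """Build a crt.sh search URL; '%' in the pattern acts as the wildcard."""
    query = urllib.parse.urlencode({"q": pattern, "output": "json"})
    return "https://crt.sh/?" + query


def fetch_crtsh(pattern):
    """Download the certificate entries matching the pattern."""
    with urllib.request.urlopen(crtsh_url(pattern), timeout=60) as response:
        return json.load(response)


def names_from_entries(entries):
    """Return the unique, lower-cased host names found in the entries."""
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # keep the parent name of a wildcard entry
            if name:
                names.add(name)
    return sorted(names)
```

For example, `names_from_entries(fetch_crtsh("%.facebook.com"))` would give a deduplicated list that still has to be verified, because some of the names may no longer exist.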
DNS trails
When discussing the subject of reconnaissance, it is impossible not to mention searching for valuable information in the history of DNS. What is this?
On the Internet, there are various services available that provide full DNS history so that we can view the history of changes for a given address. Sometimes, information that is no longer in the search engines and cannot be found in any other way remains there.
Let’s assume that there is a test server on which a new version of the website is being prepared. Once the new website is deployed, the old copy remains on the test server. Such a server may not even have a domain name, only an IP address, but due to the inattention of programmers, it may still be accessible from the Internet. With access to the test server and the developer version of the application, we can reach information that may later be critical for testing the deployed version.
In this case, www.securitytrails.com may be very useful. As the name itself (and the first tab that opens after searching for a record) indicates, the basic functionality of the portal is to display DNS information about the domain.
Here we have information about individual DNS records:
It is worth spending a few minutes on it. Clicking on the appropriate records often gives the tester a better view of what the whole tested topology looks like.
Of course, since the service provides DNS-related data, there is also a tab that lists all subdomains associated with the searched domain.
In my opinion, using this service is a good start for reconnaissance. Although the individual pieces of information found here may not be as detailed as on other services, SecurityTrails is a good starting point for the next phases of the reconnaissance and a great place to see what the respective DNS records look like.
DNSdumpster
DNSdumpster is another service for passive reconnaissance, but it focuses mainly on information strictly related to DNS servers. With DNSdumpster, we can get data such as:
- DNS servers
- MX records
- TXT records
- Host records (A)
A very interesting feature of this search engine is the graph it generates for the searched phrase, in which IP addresses, DNS servers, records and domains are linked together, giving a good view of what the topology may roughly look like.
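The idea behind such a graph can be approximated locally. The sketch below is not how DNSdumpster works internally; it is only an illustration that links host names to the IP addresses they share. `resolve_a_records` asks the local resolver (so it is no longer purely passive), while `build_ip_graph` works on any already collected mapping. Both function names are hypothetical.

```python
import socket
from collections import defaultdict


def resolve_a_records(hostname):
    """Return the IPv4 addresses the local resolver reports for hostname."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return []  # the name does not resolve or the lookup failed
    return addresses


def build_ip_graph(records):
    """Map each IP address to the sorted host names that point at it."""
    graph = defaultdict(set)
    for hostname, addresses in records.items():
        for address in addresses:
            graph[address].add(hostname)
    return {ip: sorted(hosts) for ip, hosts in graph.items()}
```

Addresses shared by several host names are often a hint that the services run on the same machine or behind the same load balancer.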
Reconnaissance is a complex process, in which the aim is to collect as much information as possible, and it cannot be assumed that the reconnaissance focuses only on searching for all possible addresses or domains.
Useful information that often helps to plan the attack vector is knowledge of the technologies used across the tested infrastructure. Initially, the testers try to obtain as much of it as possible only from generally available external sources.
BuiltWith
This part of the reconnaissance allows you to determine what technologies the tester will work with, what technological solutions he can expect and to slowly form the vector of attack on a given infrastructure. Besides, a frequent task of the tester during tests (even when testing web applications) is to check whether it is possible to obtain information about the software used by the customer or its version. BuiltWith can be a good starting point for such tasks.
Reconnaissance is a rather complex task, which (as I mentioned in the introduction) aims to collect as much information as possible about the target of the tests. Let’s assume that the subject of subdomains, IP addresses and DNS servers (and related information) is closed. What else can you look for? Something that is very useful during testing: the technology and software used in the tested infrastructure. Some of this information can be obtained from the tools listed above, for example:
- SecurityTrails
- RiskIQ
A good place prepared especially for such purposes is www.builtwith.com, whose task is to match as many technologies as possible to the searched website.
Examples of the information offered by BuiltWith:
- Analytics and Tracking (Presence of Facebook pixel, AdWords, user activity)
- Widgets (WordPress add-ons, plugins)
- Frameworks (used frameworks)
- Content Management Systems
- JavaScript Libraries
- SSL Certificate
- Web Server
- Document Information (e.g. X-XSS-Protection or X-Frame-Options or HSTS headers)
Examples of interesting information that may be useful during further tests are presented in the picture below:
For readers interested in BuiltWith and in what its search results look like, here is the result for google.com: https://builtwith.com/?https%3a%2f%2fwww.google.com%2f.
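Part of what BuiltWith lists under "Document Information" and "Web Server" can be cross-checked directly from the response headers. The sketch below is an assumption-laden illustration and not BuiltWith's method: it sends a single HEAD request to the target (so, unlike BuiltWith, it touches the client's server) and reports which of the security headers named above are missing and which headers disclose technology. The header lists and function names are chosen for this example only.

```python
import urllib.request

SECURITY_HEADERS = ("X-XSS-Protection", "X-Frame-Options", "Strict-Transport-Security")
DISCLOSURE_HEADERS = ("Server", "X-Powered-By")


def fetch_headers(url):
    """Send one HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=15) as response:
        return dict(response.headers.items())


def summarize_headers(headers):
    """Report missing security headers and headers that reveal technology."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "missing_security": [h for h in SECURITY_HEADERS if h.lower() not in lowered],
        "technology_hints": {
            h: lowered[h.lower()] for h in DISCLOSURE_HEADERS if h.lower() in lowered
        },
    }
```

Header names are compared case-insensitively, because servers do not spell them consistently.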
WebArchive
In the context of searching for information, especially information that has been forgotten and can be very useful, it is impossible not to mention the Web Archive (https://web.archive.org).
This is a website which, since 1996, using specialised crawlers, ‘saves’ information appearing on the Internet. At the moment of writing the article, it has more than 300 billion pages, 16 million text files, 3 million photos and many other files!
After entering the selected URL, we first get an overview of when and how many times successive versions of the page have been saved by the crawler, as in Picture nr 16.
It gets more interesting, because we can see what the page looked like in any of the snapshots. For example, this is how Google presented itself in 1999 (Picture nr 17):
Now it’s time to look at how this powerful tool can be used during reconnaissance. Using Web Archive, you can search for valuable files that administrators have forgotten, such as old documentation or information about old API methods. Then, during the tests, it is worth checking whether these methods were really removed from the code or only from the documentation.
When searching for such information, it may be useful to follow the “Summary of” hyperlink, which shows a categorised breakdown of what has been saved for a given page, as in Picture nr 18.
The ‘Explore URLs’ option available at the bottom of the list allows you to view the addresses of all saved resources. There is also a search box, which helps to filter the relevant files and find the resources of interest to us (Picture nr 19).
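A similar list of saved addresses can be pulled without the browser. The sketch below assumes the Wayback Machine CDX endpoint at `https://web.archive.org/cdx/search/cdx` with its `url`, `output`, `fl`, `collapse` and `limit` parameters, and that with `output=json` the first row of the reply holds the column names. The function names and the default extension list are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(domain, limit=1000):
    """Build a CDX query listing archived URLs under the given domain."""
    params = {
        "url": domain + "/*",           # prefix match on everything below the domain
        "output": "json",
        "fl": "original,timestamp,mimetype",
        "collapse": "urlkey",           # one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_cdx(domain, limit=1000):
    """Download the raw CDX rows for the domain."""
    with urllib.request.urlopen(cdx_url(domain, limit), timeout=60) as response:
        return json.load(response)


def rows_to_records(rows):
    """Turn the header row plus data rows into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def interesting(records, extensions=(".pdf", ".doc", ".docx", ".json", ".xml", ".bak", ".sql")):
    """Keep only records whose URL path ends with one of the extensions."""
    return [
        record for record in records
        if urllib.parse.urlparse(record["original"]).path.lower().endswith(extensions)
    ]
```

Filtering by extension is only one possible heuristic; old API paths or documentation pages may need keyword matching instead.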
To prove that Web Archive is really useful, I will now show a scenario in which it can be used for reconnaissance (reconnaissance bordering on testing).
It often happens that a page changes its front-end, for example the graphic design, while nothing on the back-end side is modified for a long time. Following this trail, it is worth analysing all the snapshots in web.archive.org and looking at their code. It may turn out that the administrator at some point installed or enabled a specific plug-in (which is visible in the archived code), but there is no trace of it in the current code. For the pentester, this may be valuable information. It can be assumed that the administrator did not remove the plug-in but only turned it off. In this case, if the plug-in has a vulnerability that does not require the plug-in to be turned on, it can be used as an attack vector. For example, the pentester can directly request the plug-in’s directory and try to find a vulnerability or obtain valuable information for further testing.
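The comparison described in this scenario can be sketched in a few lines. The example below assumes a WordPress site, where plug-in assets are normally served from `/wp-content/plugins/<name>/`; for other platforms the pattern would have to be adapted. It takes the HTML of an archived snapshot and of the current page (however they were obtained) and lists the plug-ins that appear only in the archive. The function names are hypothetical.

```python
import re

# Assumption: WordPress-style plug-in paths; adjust the pattern for other CMSs.
PLUGIN_PATH = re.compile(r"/wp-content/plugins/([A-Za-z0-9._-]+)/")


def plugins_in_html(html):
    """Return the set of plug-in directory names referenced in the HTML."""
    return set(PLUGIN_PATH.findall(html))


def vanished_plugins(archived_html, current_html):
    """List plug-ins referenced in the archived page but not in the current one."""
    return sorted(plugins_in_html(archived_html) - plugins_in_html(current_html))
```

Each name returned by `vanished_plugins` is a candidate directory to request on the live site in order to check whether the plug-in files are still present.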
Ending
There are many methods of conducting reconnaissance. There is no single proven recipe for it. Each tester, based on his or her experience, creates a personal database of tools, which he or she then uses for testing.
It seems to me, however, that it is crucial to understand how the whole reconnaissance process works and what type of information each technique can uncover. With such knowledge, you can choose tools, depending on your preferences, or even write your own scripts.
In the next part, I’ll introduce the scripts and programs that make it easier to use the tools described in this series and those that enable active reconnaissance.