Google can be a hacker’s best friend: it has indexed over 30 trillion web pages, many of which contain sensitive data that was never meant to be public. As a company, it is important to perform reconnaissance on your own websites so that you do not fall victim to this.
There are many tools that make this process easy, such as theHarvester, which is included in the Kali Linux OS. This tool can search Google, Bing, LinkedIn, and Twitter to find indexed email addresses as well as subdomains of websites. With email addresses, an attacker can craft a phishing email and send it to every address found. The chance of success increases with the number of addresses targeted, since all the attacker needs is one person to click the link.
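To get a feel for what "finding indexed email addresses and subdomains" means when auditing your own site, here is a minimal Python sketch that pulls both out of a block of text, such as saved search results. It is not theHarvester's actual code; the function names and the simplified regular expressions are assumptions made for illustration.

```python
import re

# Simplified email pattern for illustration; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text, domain):
    """Return sorted, de-duplicated addresses in text at exactly @domain."""
    suffix = "@" + domain.lower()
    found = {match.lower() for match in EMAIL_RE.findall(text)}
    return sorted(email for email in found if email.endswith(suffix))


def find_subdomains(text, domain):
    """Return sorted, de-duplicated host names in text that end in .domain."""
    pattern = re.compile(
        r"\b((?:[A-Za-z0-9-]+\.)+" + re.escape(domain) + r")\b",
        re.IGNORECASE,
    )
    return sorted({match.lower() for match in pattern.findall(text)})


# Hypothetical sample text standing in for scraped search results.
sample = (
    "Contact Alice@yourwebsite.com or bob@yourwebsite.com for help. "
    "Unrelated: eve@other.org. Hosts seen: mail.yourwebsite.com "
    "and dev.mail.yourwebsite.com"
)
print(find_emails(sample, "yourwebsite.com"))
print(find_subdomains(sample, "yourwebsite.com"))
```

Every address this turns up for your own domain is one a phishing campaign could target, so the list is a reasonable starting point for deciding who needs awareness training.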
To see the options for theHarvester, just open a terminal window and type: theharvester
Here are the options:
The basic syntax that would work for most searches is:
theharvester -d yourwebsite.com -l 2000 -b google
The one option you might want to change is -l 2000, which limits the search to 2000 results. A good way to choose the limit is to go directly to Google, search for site:yourwebsite.com, and use the number of pages returned as your limit.
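If you run this against several domains, it can help to build the command in a script instead of retyping it. The following is a small sketch, assuming only the -d, -l, and -b options described above; the helper name build_harvester_command is made up for this example.

```python
import shlex


def build_harvester_command(domain, limit=2000, source="google"):
    """Build the theharvester argument list for one domain.

    limit is the value passed to -l; set it to roughly the number of
    pages a site:yourwebsite.com search reports.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive number of results")
    return ["theharvester", "-d", domain, "-l", str(limit), "-b", source]


# Example: the site: search reported about 3500 pages.
command = build_harvester_command("yourwebsite.com", limit=3500)
print(shlex.join(command))
# prints: theharvester -d yourwebsite.com -l 3500 -b google
```

Keeping the command as a list rather than one string means it can be handed straight to Python's subprocess module without worrying about shell quoting.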
If you are writing a reconnaissance report, you will probably want to export the results to a text file for reference. You can do this easily by typing your command followed by:
| tee yourfilename.txt
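If you drive the tool from a script rather than the terminal, the same "show it and save it" behavior can be approximated in Python. This is only a sketch of the idea behind tee; the function name run_and_tee is hypothetical, and it assumes theharvester is installed and on your PATH when you actually call it.

```python
import subprocess
import sys


def run_and_tee(command, output_path):
    """Run command, echo its standard output, and save a copy to output_path.

    Like the shell's `| tee`, only standard output is captured; error
    messages still go straight to the terminal. Returns the exit code.
    """
    with open(output_path, "w", encoding="utf-8") as log:
        process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
        for line in process.stdout:
            sys.stdout.write(line)  # show the line as it arrives
            log.write(line)         # and keep a copy for the report
        return process.wait()


# Example call (not executed here because it needs theharvester installed):
# run_and_tee(
#     ["theharvester", "-d", "yourwebsite.com", "-l", "2000", "-b", "google"],
#     "yourfilename.txt",
# )
```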
Problems with theHarvester:
These search engines will eventually block you from retrieving thousands of results because they do not want you scraping their data without permission. These blocks usually go away within hours to a day and can be circumvented immediately by using a VPN. The more elegant solution is to get an API key for Google CSE (Custom Search Engine) so you are not blocked for bot-like behavior.
Here is a good video for how to get a Google CSE API key: https://www.youtube.com/watch?v=Bxy8Yqp5XX0
Once you have the key, you need to add it to: discovery/GoogleCSE.py
Once the API key is added, you just have to use -b googleCSE
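For context on why an API key avoids the blocking problem: instead of scraping result pages, each search becomes an authenticated request to Google's Custom Search JSON API. The sketch below only builds such a request URL so you can see what the key and search engine ID are used for. It is not necessarily how theHarvester constructs its requests internally, and the key and engine ID shown are placeholders.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, query, start=1, num=10):
    """Build a Custom Search JSON API request URL.

    The API returns at most 10 results per request, so larger result
    sets are paged through by increasing start.
    """
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {
        "key": api_key,   # your API key
        "cx": engine_id,  # ID of your custom search engine
        "q": query,
        "start": start,
        "num": num,
    }
    return CSE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; substitute your own values.
url = build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "site:yourwebsite.com")
print(url)
```

Because every request carries your key, usage counts against your API quota rather than tripping the bot detection that blocks plain scraping.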