SourceWolf – A CLI Web Crawler Tool in Linux

Web crawling is the process of indexing data on web pages with a program or automated script. These scripts go by many names, including web crawler, spider, spider bot, and often simply crawler. Manual crawling consumes a lot of time when the target scope is large. SourceWolf is an automated script, written in Python, that crawls the directories on a domain's server and records the status code of each response. This helps the tester quickly focus on the pages that respond with a 200 or 301. SourceWolf is an open-source, free-to-use tool, and it supports custom wordlists for brute-forcing. Its output handling is convenient: results are stored in a main output directory that contains separate sub-directories, one per status code.

Note: Make sure you have Python installed on your system, as this is a Python-based tool. For the installation process, see: Python Installation Steps on Linux.
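SourceWolf's own implementation is not shown here, but the status-code bucketing described above is easy to picture. The following is a minimal, hypothetical sketch using only the Python standard library: it probes candidate paths from a wordlist, surfaces redirects instead of following them so a 301 stays visible, and files each URL under a sub-directory named after its status code. The host and wordlist in the usage comment are made up for illustration.

```python
# A minimal, hypothetical sketch of status-code bucketing using only the
# standard library (not SourceWolf's actual implementation).
import os
import urllib.request
from urllib.error import HTTPError, URLError


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses (e.g. 301) instead of silently following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


opener = urllib.request.build_opener(NoRedirect())


def probe(base_url, wordlist, out_dir="output"):
    """Request base_url/<word> for each word and record the URL under
    a sub-directory named after the response's status code."""
    for word in wordlist:
        url = base_url.rstrip("/") + "/" + word
        try:
            with opener.open(url, timeout=5) as resp:
                status = resp.status
        except HTTPError as err:
            status = err.code   # 3xx/4xx/5xx responses still carry a code
        except URLError:
            continue            # host unreachable; skip this word
        # One sub-directory per status code, mirroring the layout described above.
        bucket = os.path.join(out_dir, str(status))
        os.makedirs(bucket, exist_ok=True)
        with open(os.path.join(bucket, "urls.txt"), "a") as f:
            f.write(url + "\n")


# Hypothetical usage with a tiny custom wordlist:
# probe("https://w3wiki.org", ["api", "admin", "login"])
```

The NoRedirect handler is the one subtle point: urllib follows redirects by default, so without it a 301 would be reported as the 200 it redirects to.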
Naming conventions

To crawl files stored locally, we must follow a few naming conventions. These conventions are in place so that SourceWolf can directly identify the hostname and thereby parse all the endpoints, including the relative ones.
Consider a URL: https://w3wiki.org/api/
- Remove the protocol (https://) and the trailing slash (if any) from the URL → w3wiki.org/api
- Replace '/' with '@' → w3wiki.org@api
- Save the response as a text file with the file name obtained above.
So the file finally looks like w3wiki.org@api.txt. A helper that automates this conversion is sketched below.
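The conversion is mechanical, so a small script can do it for you. The following sketch is illustrative and not part of SourceWolf; it assumes the response should be fetched over HTTP with the standard library and saved as UTF-8 text.

```python
# Hypothetical helper (not part of SourceWolf) that saves a response
# under the local-file naming convention described above.
import urllib.request
from urllib.parse import urlparse


def sourcewolf_filename(url):
    """https://w3wiki.org/api/ -> w3wiki.org@api.txt"""
    parsed = urlparse(url)
    # Drop the protocol, then strip any trailing slash.
    path = (parsed.netloc + parsed.path).rstrip("/")
    # Replace the remaining '/' characters with '@' and add .txt.
    return path.replace("/", "@") + ".txt"


def save_response(url):
    """Fetch the URL and write its body to the conventionally named file."""
    filename = sourcewolf_filename(url)
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    with open(filename, "w", encoding="utf-8") as f:
        f.write(body)
    return filename


# save_response("https://w3wiki.org/api/") writes w3wiki.org@api.txt
```

Running save_response("https://w3wiki.org/api/") would write the body to w3wiki.org@api.txt, the name SourceWolf expects when crawling local files.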