Friday, January 4, 2019

Scraping Tor web services

In cybersecurity it is sometimes useful to browse the deep web looking for infamous malware that you would never find on the clear web. Sometimes you come across an interesting malware repository that you want to download in full. Since we are talking about malware, it is important to keep as much distance as possible from the samples. I would never use a Windows system to access them, and even on Linux I prefer to work from the command line. However, scraping a whole Tor web service from the command line is not always easy.

Searching the internet for a way to do this (scrape a Tor web service) leads to programming forums that recommend Haskell, curl, Python, PHP... That was not what I was after. I wanted a Linux tool that would clone the web service locally. httrack is fine for that, but I had always used it through its Windows UI, and in this case I was connected to my server over SSH. On top of that, the target was an onion service, so I somehow needed the tool to understand .onion addresses.

All this could be resolved very easily with:

# apt install tor

This installs Tor together with wrapper tools such as torify and torsocks, which route a program's connections through Tor so that it can reach .onion addresses.
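Before going further, it can be worth confirming that traffic wrapped by torsocks really leaves through Tor. The following is a minimal sketch, not part of the original steps. It assumes that curl is installed, that the tor service is running, and that the Tor Project's check.torproject.org/api/ip endpoint answers with a small JSON document containing an IsTor field. The function names is_tor_response and check_tor are hypothetical.

```shell
#!/bin/sh
# Sketch only: check whether torsocks-wrapped traffic leaves through Tor.
# Assumptions: curl is installed, the tor service is running, and
# https://check.torproject.org/api/ip answers with JSON such as
# {"IsTor":true,"IP":"..."}.

# Hypothetical helper: read a JSON reply on stdin and succeed only if
# it reports IsTor as true.
is_tor_response() {
    grep -q '"IsTor"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: fetch the check page through torsocks and test it.
check_tor() {
    torsocks curl -s https://check.torproject.org/api/ip | is_tor_response
}
```

After loading these functions into the shell, running check_tor && echo "Tor is working" should print the message only when the request went out through Tor.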

# apt install httrack

...httrack for Linux can be launched from the command line...

# torsocks httrack http://iec56w4ibovnb4w.onion

This downloaded the whole onion service to my local machine (it was a very simple, unauthenticated service).
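For repeated use, the same command can be wrapped in a small shell function. This is a minimal sketch under my own assumptions: the function name mirror_onion and the DRY_RUN variable are hypothetical, and -O is the httrack option that sets the output directory for the mirror.

```shell
#!/bin/sh
# Sketch only: wrap "torsocks httrack" so each mirror lands in its own directory.
# mirror_onion and DRY_RUN are hypothetical names, not part of tor or httrack.

mirror_onion() {
    url=$1
    out=$2

    if [ -z "$url" ] || [ -z "$out" ]; then
        echo "usage: mirror_onion URL OUTPUT_DIR" >&2
        return 2
    fi

    # Extract the host part: drop the scheme, then any path and port.
    host=${url#*://}
    host=${host%%/*}
    host=${host%%:*}

    # Refuse anything that is not an onion host.
    case "$host" in
        *.onion) ;;
        *) echo "not an onion address: $host" >&2; return 1 ;;
    esac

    # With DRY_RUN set, only print the command that would be run.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "torsocks httrack $url -O $out"
        return 0
    fi

    # Create the target directory, then mirror the site through Tor.
    mkdir -p "$out" && torsocks httrack "$url" -O "$out"
}
```

For example, DRY_RUN=1 mirror_onion http://iec56w4ibovnb4w.onion ./mirror prints the command without contacting the service, which is a cheap way to check the arguments before starting a long download.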

Hope it helps!