Getting files from external urls

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
StH
Posts: 3
Joined: Thu Dec 06, 2012 4:09 am

Getting files from external urls

Post by StH »

I'm trying to set up filters to get PDF files from a website.
The website I'm crawling is http://www.finn.no/finn/realestate/home ... t=1&page=1. I need to crawl all four pages of the search results.
What I need to download are the PDF files linked in (almost) every result, where the link text is "Salgsoppgave".
Whatever filters I create, the scan just completes without any results. Can anyone help?

Support
Site Admin
Posts: 1892
Joined: Sun Oct 02, 2011 10:49 am

Re: Getting files from external urls

Post by Support »

Here are the filters to scan the pages for PDF files. I was not able to narrow it down to "Salgsoppgave", but it seems those are the only links that point to PDFs. I ran a scan on that link and got 52 PDF files.

Code:

[BlackWidow v6.00 filters]
URL = http://www.finn.no/finn/realestate/homes/result?orgId=-1017&sort=1&page=1
[ ] Expert mode
[ ] Scan everything
[x] Scan whole site
Local depth: 0
[x] Scan external links
[ ] Only verify external links
External depth: 0
Default index page: 
Startup referrer: 
[ ] Slow down by 2:2 seconds
4 threads
[x] Replace "? with "result? using plain text
[x] Follow &page=\d+$ using regular expression
[x] Add \.pdf$ from URL using regular expression
[end]
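
For anyone who wants to check the two regular expressions outside BlackWidow, here is a minimal Python sketch. It assumes the "Follow" and "Add" filters behave like an unanchored regex search against the full URL, which this thread does not confirm. The `classify` function and the example.com URLs are made up for illustration.

```python
import re

# Patterns copied from the filter list above.
FOLLOW_PAGE = re.compile(r"&page=\d+$")
ADD_PDF = re.compile(r"\.pdf$")


def classify(url: str) -> str:
    """Return 'add', 'follow', or 'skip' for a URL.

    Assumed semantics: a regex search over the whole URL, with the
    'Add' rule checked before the 'Follow' rule.
    """
    if ADD_PDF.search(url):
        return "add"
    if FOLLOW_PAGE.search(url):
        return "follow"
    return "skip"


# Hypothetical URLs, chosen to show what the trailing `$` does.
samples = [
    # page parameter is last: matches the Follow rule
    "http://www.finn.no/finn/realestate/homes/result?orgId=-1017&sort=1&page=2",
    # ends in .pdf: matches the Add rule
    "http://example.com/docs/salgsoppgave.pdf",
    # .pdf is not at the end of the URL: no match
    "http://example.com/docs/salgsoppgave.pdf?download=1",
    # page parameter is not last: no match
    "http://www.finn.no/finn/realestate/homes/result?orgId=-1017&page=2&sort=1",
]

for url in samples:
    print(classify(url), url)
```

Because both patterns end in `$`, a PDF link with a query string after `.pdf`, or a result page where `page=` is not the last parameter, would not match under these assumptions.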
Your support team.
http://SoftByteLabs.com

StH

Re: Getting files from external urls

Post by StH »

That's brilliant! Thanks a lot!

StH

Re: Getting files from external urls

Post by StH »

It seems this script no longer works. I checked whether the website had changed, but I don't see any changes.
Any idea why it's not working?

Support
Site Admin

Re: Getting files from external urls

Post by Support »

Try this one, it seems to work...

Code:

[BlackWidow v6.00 filters]
URL = http://www.finn.no/finn/realestate/homes/result?orgId=-1017&sort=1&page=1
[ ] Expert mode
[ ] Scan everything
[x] Scan whole site
Local depth: 0
[x] Scan external links
[ ] Only verify external links
External depth: 0
Default index page: 
Startup referrer: 
[ ] Slow down by 2:2 seconds
4 threads
[x] Replace "? with "result? using plain text
[x] Follow \?finnkode=\d+$ using regular expression
[x] Follow &page=\d+$ using regular expression
[x] Add \.pdf$ from URL using regular expression
[end]
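
The only difference from the earlier filter set is the extra `\?finnkode=\d+$` Follow rule. One possible reading, not confirmed anywhere in this thread, is that the PDFs are now linked from the individual listing pages and no longer from the result pages. The Python sketch below illustrates that reading on a made-up link graph. The `LINKS` table, the `crawl` function, and every site.example and files.example URL are hypothetical, and the matching semantics are again assumed to be a plain regex search.

```python
import re
from collections import deque

# Patterns copied from the filter list above.
FINNKODE_RULE = re.compile(r"\?finnkode=\d+$")
PAGE_RULE = re.compile(r"&page=\d+$")
ADD_PDF = re.compile(r"\.pdf$")

# Hypothetical link graph: page URL -> links found on that page.
LINKS = {
    "http://site.example/result?sort=1&page=1": [
        "http://site.example/result?sort=1&page=2",
        "http://site.example/object?finnkode=111",
        "http://site.example/about",
    ],
    "http://site.example/result?sort=1&page=2": [
        "http://site.example/object?finnkode=222",
    ],
    "http://site.example/object?finnkode=111": ["http://files.example/111.pdf"],
    "http://site.example/object?finnkode=222": ["http://files.example/222.pdf"],
    "http://site.example/about": ["http://files.example/brochure.pdf"],
}


def crawl(start, follow_rules):
    """Breadth-first walk that collects .pdf links.

    Only links matching one of follow_rules are visited; links matching
    ADD_PDF are collected instead of being visited.
    """
    seen = {start}
    queue = deque([start])
    collected = []
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link in seen:
                continue
            seen.add(link)
            if ADD_PDF.search(link):
                collected.append(link)
            elif any(rule.search(link) for rule in follow_rules):
                queue.append(link)
    return collected


start = "http://site.example/result?sort=1&page=1"
print(crawl(start, [PAGE_RULE]))                 # page rule only
print(crawl(start, [FINNKODE_RULE, PAGE_RULE]))  # both Follow rules
```

In this toy graph the page rule alone never reaches a listing page, so nothing is collected. With the finnkode rule added, the two listing PDFs are found, while the PDF linked from the unrelated "about" page is still left out.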
Your support team.
http://SoftByteLabs.com
