BW barks up all branches of the folder tree

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
Latro Dektes
Posts: 3
Joined: Sun Mar 23, 2014 5:46 pm

Post by Latro Dektes » Thu Feb 07, 2019 2:48 pm

Goal: I would like to download PDF files similar to this one: www.boston.gov/sites/default/files/imce ... 0_2018.pdf

Try 1) I pasted that URL into BlackWidow. Unfortunately, BW crashes when I press "Scan".

Try 2) I shortened the URL to www.boston.gov/sites/default/files/imce ... s/2018-08/
which, not surprisingly, gave a 404 error.
When I try to scan that URL, BW gives me a link error.

Try 3) I shortened the URL to www.boston.gov
BW happily scanned that URL and after a long time it eventually found the folders that interest me.

So only Try 3 worked for me. QUESTION: Is there a way to make BW focus on only part of the directory tree?

Support
Site Admin
Posts: 1756
Joined: Sun Oct 02, 2011 10:49 am

Re: BW barks up all branches of the folder tree

Post by Support » Thu Feb 07, 2019 4:04 pm

To get the PDFs, you have to find the page where they are all listed. Taking your example PDF link, which page did you find it on?
Your support team.
http://SoftByteLabs.com
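
To illustrate why the listing page is the right starting point, here is a minimal Python sketch of what any crawler has to do: read a page's HTML and collect the links that end in .pdf. It does not show how BlackWidow works internally. The names `PdfLinkParser` and `pdf_links`, the sample HTML, and the example.org address are all made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects the targets of <a> tags that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.links.append(absolute)


def pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical listing page, standing in for the real page that cites the PDFs.
SAMPLE = """
<ul>
  <li><a href="/files/2018-08/agenda.pdf">Agenda</a></li>
  <li><a href="minutes.PDF?download=1">Minutes</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

print(pdf_links(SAMPLE, "https://example.org/board/"))
```

A crawler can only follow links it has actually seen in a page, which is why starting from the page that cites the PDFs finds them quickly, while starting from the site root means walking everything in between.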

Latro Dektes
Posts: 3
Joined: Sun Mar 23, 2014 5:46 pm

Re: BW barks up all branches of the folder tree

Post by Latro Dektes » Thu Feb 07, 2019 4:25 pm

Yes, it is best to begin at the citing page. I hadn't realized that they are listed here:
www.boston.gov/departments/inspectional ... ard-appeal
So that solves today's chore.

But while I have your attention, let me ask whether you are surprised by the failure of my first 2 tries.
Try 1 showed that BW crashes if you begin with the URL of a PDF file.
Try 2 showed that BW cannot proceed if your starting URL returns an error.

Support
Site Admin
Posts: 1756
Joined: Sun Oct 02, 2011 10:49 am

Re: BW barks up all branches of the folder tree

Post by Support » Thu Feb 07, 2019 6:06 pm

I use v6.3, and scanning just the PDF works. Scanning the PDF's directory fails not because BW is incapable, but because the server doesn't allow directory indexing.
Your support team.
http://SoftByteLabs.com
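
As a rough illustration of the directory-indexing point, the Python sketch below classifies what a server sent back for a folder URL. It is a heuristic under stated assumptions, not a description of BlackWidow's behavior: the function name `classify_folder_response` is hypothetical, and the "Index of /" title check only matches the auto-generated listings that servers such as Apache and nginx typically produce when indexing is enabled.

```python
import re


def classify_folder_response(status, body):
    """Classify a folder URL's response as 'listing', 'no-listing', or 'page'.

    status: HTTP status code returned for the folder URL.
    body:   response body as text.
    """
    # A 403 or 404 on a folder URL means the server offers no listing there,
    # so a crawler has no links to follow from it.
    if status in (403, 404):
        return "no-listing"
    # Assumption: auto-generated listings carry a title like "Index of /path".
    if status == 200 and re.search(r"<title>\s*Index of\s+/", body, re.IGNORECASE):
        return "listing"
    # Anything else is an ordinary page (or a response this sketch ignores).
    return "page"


print(classify_folder_response(404, ""))
print(classify_folder_response(200, "<title>Index of /files</title>"))
```

In Try 2 the shortened folder URL returned a 404, which falls into the "no-listing" case: the files exist, but the server gives a crawler no page that enumerates them.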

Latro Dektes
Posts: 3
Joined: Sun Mar 23, 2014 5:46 pm

Re: BW barks up all branches of the folder tree

Post by Latro Dektes » Thu Feb 07, 2019 8:35 pm

Thank you. I'm all set now.
End of line.
