I am probably being very silly but...

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

I am probably being very silly but...

Post by LambethWeb »

Hi
I've been using BW v5 quite happily for ages, both to download our whole website and intranet and to check links across the whole sites.

In version 6 I am trying to download all the HTML pages in our site and can't get beyond the home page. I've checked 'Scan the whole site', 'Download while scanning' and 'Preserve directory structure'.

Either I enter the URL in the browser without actually bringing the site up, in which case I get the home page and nothing else, or I do browse to the home page and the address field immediately goes to 'about:blank' instead of the home page's URL.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

Hi,

In the Filters window, did you select "Scan everything" at the top of the window?

If the URL in the address bar is not what you need after using the browser, simply change it without pressing Enter. BW will scan the text in the address bar, regardless of the displayed page.
Your support team.
http://SoftByteLabs.com

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Yes, I checked 'Scan everything...'

I tried it with the following filters:

[ ] Expert mode
[x] Scan everything
[x] Scan whole site
Local depth: 0
[ ] Scan external links
[x] Only verify external links
External depth: 0
Default index page: index.html
Browser user agent: Mozilla/4.0 (compatible; MSIE 7.0; BlackWidow v6 - http://SoftByteLabs.com)
Startup referrer:
[ ] Slow down by 10:60 seconds
6 threads
[ ] Do not follow */systems/* using wildcard
[ ] Do not follow */staffservices/* using wildcard
[ ] Do not follow *jpg using wildcard
[ ] Do not follow *gif using wildcard
[ ] Do not follow *xls using wildcard
[ ] Do not follow *doc using wildcard
[ ] Do not follow *pdf using wildcard
[ ] Do not follow */news/* using wildcard
[ ] Do not follow */dat/* using wildcard
[ ] Do not follow */lambethfirst/* using wildcard
[ ] Do not follow */PeopleSearchWebForms/* using wildcard
[ ] Do not follow */404.htm using wildcard
[ ] Do not follow *?wbc_purpose=* using wildcard

I also tried without any of the 'Do not follow' filters, with no success: I still get the home page at most.
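
For reference, the 'Do not follow' entries in the list above are wildcard patterns applied to URLs. The following is a minimal Python sketch of how such patterns could be checked against a URL. It assumes that '*' is the only wildcard, that matching is case-insensitive and that the pattern must cover the whole URL; BlackWidow's real matching rules may differ. The function names and the example.com URLs are made up for illustration.

```python
import re

# A few of the patterns from the filter list above.
DO_NOT_FOLLOW = ["*/systems/*", "*jpg", "*/404.htm", "*?wbc_purpose=*"]

def wildcard_to_regex(pattern):
    """Compile a pattern in which '*' is the only wildcard (an assumption)."""
    # Escape everything, then turn the escaped '*' back into "match anything".
    return re.compile(re.escape(pattern).replace(r"\*", ".*"), re.IGNORECASE)

def is_excluded(url, patterns=DO_NOT_FOLLOW):
    """Return True if the whole URL matches any 'Do not follow' pattern."""
    return any(wildcard_to_regex(p).fullmatch(url) for p in patterns)
```

With these assumptions, is_excluded("http://www.example.com/systems/page.htm") is True and is_excluded("http://www.example.com/home.htm") is False, so none of the listed patterns would by itself explain a scan that stops at the home page.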

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

OK, then how is that home page formatted? Is it HTML? Does it redirect? Is it Flash, JavaScript etc.? Are there any href tags? It looks like BW is not finding any links in the page you are starting the scan from. If you have a web URL that I can take a look at, just PM it to me if you don't want to post it here.
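
One way to check whether a saved copy of the start page contains any plain href links is to run it through an HTML parser. This is a minimal sketch using Python's standard html.parser module; the class and function names and the sample markup are illustrative only, and it says nothing about how BlackWidow itself extracts links. Links that are only created by JavaScript or Flash will not show up here.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_links(html):
    """Return the list of href values found in the given HTML text."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Made-up sample page: one real link and one anchor without href.
sample = '<html><body><a href="/news/index.htm">News</a><a name="top"></a></body></html>'
print(find_links(sample))  # prints ['/news/index.htm']
```

An empty list for the real home page would point to a page without static links (or an empty download) rather than to a filter setting.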

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »


Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

It worked without a problem for me. Here are the filters I used:

[BlackWidow v6.00 filters]
URL = http://www.lambeth.gov.uk/home.htm
[ ] Expert mode
[x] Scan everything
[x] Scan whole site
Local depth: 0
[ ] Scan external links
[ ] Only verify external links
External depth: 0
Default index page: 
Browser user agent: Mozilla/4.0 (compatible; MSIE 7.0; BlackWidow v6 - http://SoftByteLabs.com)
Startup referrer: 
[ ] Slow down by 10:60 seconds
4 threads
[end]

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Hi

I copied your settings and get a 'can't resolve' message or similar. I am accessing the web via a proxy server. Is there anywhere I can specify this?

Thanks

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

That's because the filters contained the URL I used! What does the URL you start the scan from look like?

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Hi

The error message I get when I copy your settings/filters is 'The server name or address could not be resolved'. It sounds to me as though it doesn't like our firewall and/or our proxy connection?

Thanks

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

Yes it is. Just change the URL before you scan. If you use a proxy, how do you go about it? Do you have to log in first? Is the proxy URL combined with the site URL?

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Hi

We use an 'automatic configuration script' to provide our LAN settings. This does send internet traffic through a proxy. Is that any help?
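
For context, an 'automatic configuration script' is normally a proxy auto-config (PAC) file: a JavaScript file defining FindProxyForURL(url, host), which returns a string such as "PROXY host:port; DIRECT" listing the routes to try in order. The Python sketch below only interprets such a return string; it does not evaluate the script itself. The function name and the proxy address are hypothetical and are not our real settings.

```python
def parse_pac_result(result):
    """Split a FindProxyForURL return value into (method, address) pairs.

    'DIRECT' entries have no address, so the address is None for them.
    """
    choices = []
    for entry in result.split(";"):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries such as a trailing ';'
        method = parts[0].upper()
        address = parts[1] if len(parts) > 1 else None
        choices.append((method, address))
    return choices

# Hypothetical return value; proxy.example.org is not a real setting.
print(parse_pac_result("PROXY proxy.example.org:8080; DIRECT"))
# prints [('PROXY', 'proxy.example.org:8080'), ('DIRECT', None)]
```

A program that ignores the PAC file behaves as if every URL were 'DIRECT', which on a network that only reaches the internet through a proxy could produce a 'could not be resolved' error.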

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

There is no login and we use 'normal' URLs.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

If you put in the following code as the first line of the script...

BrowseTo('www.google.com');

it will bring up a browser window and should show the Google page. Does it? If so, does the script now work?

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

I inserted:

BrowseTo('www.google.com');

immediately before:

case ScannerEvent of
  Starting:
    begin
      ExternalLinkDepth = 0; // set to 0 for no external links.
      LocalLinkDepth = 2; // set to high number for no limit.
      ScanWholeSite = No; // Stay within StartupURL
    end;

The result was an error message saying 'Function expected on line 112 at position 10.'

Line 112 is where the BrowseTo('www.google.com'); is.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

My mistake, I thought we were using a script, so disregard my last post.

If you use the BlackWidow browser, can you access the sites?

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Yes, but sometimes the URL changes to about:blank

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

In this case, put the URL back but do not press Enter so that it stays there. BW will use the text in that bar to start the scan from.

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Thanks but I still get 'The server name or address could not be resolved'

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

I don't know what to say! This error only happens when the address is wrong. Have you tried another URL, such as http://www.google.com, to see if it comes up with the same error? If it does, then something is blocking BW from accessing the network and that's why it can't resolve the address.
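
The same kind of check can be made outside BlackWidow by asking the operating system to resolve a couple of host names directly. This is a minimal Python sketch using socket.getaddrinfo; the function names are illustrative, and the resolver parameter exists only so the lookup can be replaced in a test. It checks direct name resolution only and makes no use of any proxy.

```python
import socket

def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if this machine can resolve host directly (no proxy)."""
    try:
        resolver(host, 80)
        return True
    except socket.gaierror:
        return False

def summarize(hosts, resolver=socket.getaddrinfo):
    """Map each host name to True/False depending on direct DNS resolution."""
    return {host: can_resolve(host, resolver) for host in hosts}
```

Calling summarize(["www.google.com", "www.lambeth.gov.uk"]) on the affected PC shows which names resolve. If every external name fails while the normal browser still works, the browser is probably relying on the proxy to look names up, and any program that bypasses the proxy would see a resolution error.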

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Yes - same thing with google.

BW 5.22 connected to http://www.lambeth.gov.uk/ and our intranet at http://intranet.lambeth.gov.uk/ without any problem. As I have already said, I connect with the internet via a proxy. BW 5.22 had no problem with this. I wonder why version 6 does?

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

OK, in your original post you said you only get the home page, so you do get one file then, right? Can you download it and open it in a text viewer to see what's in it? Does it contain a URL, a proxy setting or something?

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Unfortunately, although it looks like a file is returned (home.html), when I download it and open it in Notepad, Dreamweaver etc. it is completely empty. It has a file size of 0.
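
To see whether any other downloaded files are affected, the download folder can be checked for zero-byte files. A minimal Python sketch follows; the function name is illustrative and the folder to scan is whatever download location is in use.

```python
import os

def empty_files(folder):
    """Return the paths of all zero-byte files under folder, sorted."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                found.append(path)
    return sorted(found)
```

A zero-byte home.html suggests that no response body was received at all, which would also explain why no links are found to follow.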

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

In the Filters window, under "Browser agent", is it set to BlackWidow? If so, can you click on the ... button to the right of it? It should change to something like this:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
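
To test outside BlackWidow whether a server answers differently depending on the user agent, a request can be sent with an explicit User-Agent header. This is a minimal sketch using Python's urllib.request; the function names are illustrative, and the agent string is a shortened form of the one above. Note that urllib picks up proxy settings from environment variables or the system settings, not necessarily from an automatic configuration script.

```python
import urllib.request

# Shortened form of the browser-style user agent shown above.
BROWSER_AGENT = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; "
                 "Trident/5.0)")

def build_request(url, user_agent=BROWSER_AGENT):
    """Create a request object that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_length(url, user_agent=BROWSER_AGENT, timeout=10):
    """Download url and return the number of bytes in the response body."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())
```

Comparing fetch_length(url) for the browser-style agent and for the BlackWidow agent string would show whether the server returns an empty page to one of them.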

LambethWeb
Posts: 16
Joined: Wed Feb 29, 2012 10:55 am

Re: I am probably being very silly but...

Post by LambethWeb »

Hi

I have this in the Browser user agent field:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.0.3705; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

The same issues are still happening.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Re: I am probably being very silly but...

Post by Support »

Is your IE disabled?
