Search found 13 matches

by jdahlin
Fri Jul 20, 2012 11:02 am
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 7821

Re: Filter Help

A reboot fixed the "not adding to structure" issue. In my other adjustments, I've managed to screw up the "not following links like news/articles/2011/12" (perhaps because there is no trailing slash). Also, I turned off "Scan External Links" because I saw a few external URLs pop up in the scan again. ...
by jdahlin
Fri Jul 20, 2012 8:24 am
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 7821

Re: Filter Help

Thanks a bunch - the scanning is working now. For some reason... nothing shows up in "Structure", even if I tell it to download (which I don't want it to do... I just need a siteMap). I left it running overnight... perhaps I just need a reboot. [BlackWidow v6.00 filters] URL = http://intranet.ops.tia...
by jdahlin
Thu Jul 19, 2012 7:11 pm
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 7821

Re: Filter Help

Looks to be working much better, except, oddly, it is now getting hung up on all URLs that do not have a file name in them (not just the ones I described in the "new_articles/2011/1/" example). I presume I can modify this line so it applies to every directory and forces a file of some sort (*.*) [x] ...
by jdahlin
Thu Jul 19, 2012 5:37 pm
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 7821

Filter Help

I am trying to scan our intranet. I started off with "Scan Everything", "Stay within the full URL", and nothing for the "External Links". But, instead of staying within "intranet.companyName.org" it also went to "diretcory.companyName.org". There are also a couple of directories that I want to exclude...
by jdahlin
Fri Dec 23, 2011 7:30 am
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

Kicked off a scan last night... it ran for 13 hours and collected over 210,000 pages from about 400 different websites. Perhaps there is a way of changing "[x] Follow tiaa-cref\.org\/ using regular expression" so that it does not follow things that are not tiaa-cref.org but still follows external links?
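One possible reason the scan wandered across roughly 400 sites is that an unanchored pattern such as tiaa-cref\.org\/ matches anywhere in a URL, including inside another site's path or query string. The sketch below illustrates, in Python's re module, a pattern anchored on the host instead. Whether BlackWidow's filter syntax accepts the same expression is an assumption, and the names FOLLOW and should_follow are hypothetical.

```python
import re

# Assumption: the filter engine supports anchors and groups the way Python's
# `re` module does. The pattern is tied to the host part of the URL so that
# "tiaa-cref.org/" appearing later in some other site's URL does not match.
FOLLOW = re.compile(r"^https?://([a-z0-9-]+\.)*tiaa-cref\.org(/|$)", re.IGNORECASE)


def should_follow(url: str) -> bool:
    """Return True only when the URL's host is tiaa-cref.org or a subdomain."""
    return FOLLOW.match(url) is not None
```

With this pattern, both the http:// and https:// forms of the site match, while a third-party URL that merely mentions tiaa-cref.org/ in its query string does not.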
by jdahlin
Thu Dec 22, 2011 10:02 am
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

Thanks for the assist. I did a scan last night from home but had to abort it when it was 6 hours into its run... I noticed that the settings you posted are set to scan external sites. The structure that returned contained lots and lots of links for non-tiaa-cref sites. Are these links includ...
by jdahlin
Wed Dec 21, 2011 4:14 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

BTW, I don't need to download any of it... just looking for a site inventory.
by jdahlin
Wed Dec 21, 2011 3:29 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

I need all html, pdf, and swf from the entire site. http://www.tiaa-cref.org redirects to https://www.tiaa-cref.org/public.index.html But I need everything from both http:// and https:// (There are a few directories that I don't need, plus what is in robots.txt, but it's easy enough to throw them aw...
by jdahlin
Wed Dec 21, 2011 3:06 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

This page contains lots of links to news articles which demonstrate the issue. None of the news articles appear in my scan. http://www.tiaa-cref.org/plansponsors/news/tiaa-cref-news/index.html This page is an example of one of the news articles: http://www.tiaa-cref.org/public/about/news/articles/ge...
by jdahlin
Wed Dec 21, 2011 2:22 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

I think the issue with stopping "two links deep" was due to BW treating the switch from http://www.mysite to https://www.mysite as a move to an external URL, and I have external URLs turned off. Thanks for the filters - I'll use those instead of the ones I have defined. New issue: Our site contains a <scri...
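The http-to-https diagnosis above can be illustrated with a small sketch: if "external" is decided by comparing whole URL prefixes, a scheme change looks like a different site, whereas comparing hostnames only does not. This is not how BlackWidow is known to work internally; same_site is a hypothetical helper and the .example hostnames in the usage note are placeholders.

```python
from urllib.parse import urlsplit


def same_site(a: str, b: str) -> bool:
    """Compare two URLs by hostname only, ignoring scheme, port, and path."""
    host_a = urlsplit(a).hostname  # hostname is lower-cased by urlsplit
    host_b = urlsplit(b).hostname
    return host_a is not None and host_a == host_b
```

For example, same_site("http://www.mysite.example/", "https://www.mysite.example/public/index.html") treats the redirect target as internal, so a crawler using this test would keep following links after the switch to https.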
by jdahlin
Tue Dec 20, 2011 3:04 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

Handling "Forbidden" It is scanning directories that do not have an index.html and receiving a "Forbidden" which is correct... but it keeps trying again, and again, and again, until all 6 of my threads are occupied by these URLs. How do I teach it to just throw the URL into "link errors" and move o...
by jdahlin
Tue Dec 20, 2011 2:28 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Re: Scan stopping at 2 links deep

Example... Entering our starting URL "http://www.blahblah.org" (which redirects to https://www.blahblah/public/index.html) and running the scan, it gets as far as "http://www.blahblah.org/public/support/forms/index.html". All told, it located 304 pages. But, this page has links to lots of HTML pages, ...
by jdahlin
Tue Dec 20, 2011 12:10 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 17452

Scan stopping at 2 links deep

I have the radio button for "Scan the whole site" checked, but it wants to stop after only 2 links deep from the home page. Other than stopping early, the site structure returned appears to be exactly what I want. If I manually navigate to one of the URLs returned from this scan and re-scan, it picks...