Search found 13 matches

by jdahlin
Fri Jul 20, 2012 11:02 am
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 10436

Re: Filter Help

A reboot fixed the "not adding to structure" issue. In my other adjustments, I've managed to screw up the "not following links like news/articles/2011/12" (perhaps because there is no trailing slash). Also, I turned off "Scan External Links" because I saw a few external ...
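To make the trailing-slash question concrete, here is a minimal Python sketch of one way to tell a directory-style URL (no file name in the last path segment) from a file URL. It is not BlackWidow's logic; the function name `has_file_name` is hypothetical, and it assumes that "has a file name" simply means "the last path segment has an extension".

```python
import posixpath
from urllib.parse import urlsplit


def has_file_name(url: str) -> bool:
    """Return True if the last path segment carries a file extension.

    Assumption: a segment such as "12" or "" (trailing slash) is treated
    as a directory; a segment such as "index.html" is treated as a file.
    """
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return bool(posixpath.splitext(last_segment)[1])


# Both spellings of the directory are classified the same way.
print(has_file_name("http://intranet.companyName.org/news/articles/2011/12"))
print(has_file_name("http://intranet.companyName.org/news/articles/2011/12/"))
print(has_file_name("http://intranet.companyName.org/news/index.html"))
```

Note that a directory whose name contains a dot (for example "v1.2") would be misclassified as a file by this heuristic.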
by jdahlin
Fri Jul 20, 2012 8:24 am
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 10436

Re: Filter Help

Thanks a bunch - the scanning is working now. For some reason... nothing shows up in "Structure", even if I tell it to download (which I don't want it to do... I just need a siteMap). I left it running overnight... perhaps I just need a reboot. [BlackWidow v6.00 filters] URL = http://intran...
by jdahlin
Thu Jul 19, 2012 7:11 pm
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 10436

Re: Filter Help

Looks to be working much better, except, oddly, it is now getting hung up on all URLs that do not have a file name in them (not just the ones I described in the "new_articles/2011/1/" example). I presume I can modify this line so it applies to every directory and forces a file of some sort ...
by jdahlin
Thu Jul 19, 2012 5:37 pm
Forum: BlackWidow
Topic: Filter Help
Replies: 7
Views: 10436

Filter Help

I am trying to scan our intranet. I started off with "Scan Everything", "Stay within the full URL", and nothing for the "External Links". But, instead of staying within "intranet.companyName.org" it also went to "diretcory.companyName.org". There ar...
by jdahlin
Fri Dec 23, 2011 7:30 am
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

Kicked off a scan last night... it ran for 13 hours and collected over 210,000 pages from about 400 different websites. Perhaps there is a way of changing "[x] Follow tiaa-cref\.org\/ using regular expression" so that it does not follow things that are not tiaa-cref.org but still follows external links?
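One possible reason a pattern like tiaa-cref\.org\/ pulls in other sites is that it is unanchored, so it can match anywhere in a URL (a query string, for example). The Python sketch below shows a host-anchored alternative. The pattern and the name `should_follow` are illustrative assumptions; whether BlackWidow's filter dialog accepts the same regex syntax is not confirmed here.

```python
import re

# Unanchored pattern, as quoted in the post: matches anywhere in the URL.
UNANCHORED = re.compile(r"tiaa-cref\.org/")

# Hypothetical anchored pattern: the host itself must be tiaa-cref.org
# or one of its subdomains, over http or https, with an optional port.
SAME_SITE = re.compile(
    r"^https?://([^/]+\.)?tiaa-cref\.org(:\d+)?(/|$)", re.IGNORECASE
)


def should_follow(url: str) -> bool:
    """Return True only when the URL's host is tiaa-cref.org or a subdomain."""
    return SAME_SITE.match(url) is not None


external = "http://example.com/redirect?to=tiaa-cref.org/public"
print(UNANCHORED.search(external) is not None)  # unanchored pattern matches
print(should_follow(external))                  # anchored pattern does not
print(should_follow("https://www.tiaa-cref.org/public/index.html"))
```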
by jdahlin
Thu Dec 22, 2011 10:02 am
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

Thanks for the assist. I did a scan last night from home but had to abort it when it was 6 hours into its run... One thing I noticed in the settings you posted is that it is set to scan external sites. The structure that returned contained lots and lots of links for non-tiaa-cref sites. Are these links includ...
by jdahlin
Wed Dec 21, 2011 4:14 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

BTW, I don't need to download any of it... just looking for a site inventory.
by jdahlin
Wed Dec 21, 2011 3:29 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

I need all html, pdf, and swf from the entire site. http://www.tiaa-cref.org redirects to https://www.tiaa-cref.org/public.index.html But I need everything from both http:// and https:// (There are a few directories that I don't need, plus what is in robots.txt, but it's easy enough to throw them aw...
by jdahlin
Wed Dec 21, 2011 3:06 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

This page contains lots of links to news articles which demonstrate the issue. None of the news articles appear in my scan. http://www.tiaa-cref.org/plansponsors/news/tiaa-cref-news/index.html This page is an example of one of the news articles: http://www.tiaa-cref.org/public/about/news/articles/ge...
by jdahlin
Wed Dec 21, 2011 2:22 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

I think the issue with stopping "two links deep" was due to BW treating the switch from http://www.mysite to https://www.mysite as a jump to an external URL, which I have turned off. Thanks for the filters - I'll use those instead of the ones I have defined. New issue: Our site contai...
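As an illustration of the http/https point, here is a minimal Python sketch of a same-site check that compares host names only and ignores the scheme. It is an assumption about how such a check could be written, not a description of how BlackWidow decides what counts as external; `same_site` is a hypothetical name.

```python
from urllib.parse import urlsplit


def same_site(url_a: str, url_b: str) -> bool:
    """Compare host names only, so http:// and https:// count as one site."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname


# The scheme differs, but the host is the same.
print(same_site("http://www.mysite/", "https://www.mysite/public/index.html"))
```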
by jdahlin
Tue Dec 20, 2011 3:04 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

Handling "Forbidden": It is scanning directories that do not have an index.html and receiving a "Forbidden" which is correct... but it keeps trying again, and again, and again, until all 6 of my threads are occupied by these URLs. How do I teach it to just throw the URL into "...
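The behaviour being asked for can be sketched as a retry policy that treats "Forbidden" as a final answer. This is a generic Python illustration, not BlackWidow's code or settings; the status set, the function name `should_retry`, and the `max_attempts` default are all assumptions.

```python
# Assumed set of HTTP statuses that will not change on retry.
PERMANENT_FAILURES = {401, 403, 404, 410}


def should_retry(status: int, attempts: int, max_attempts: int = 3) -> bool:
    """Decide whether a failed request is worth another attempt.

    A 403 Forbidden is recorded as an error straight away, so it never
    ties up a worker thread; only server-side errors (5xx) are retried,
    and only up to max_attempts.
    """
    if status in PERMANENT_FAILURES:
        return False
    return status >= 500 and attempts < max_attempts


print(should_retry(403, attempts=1))  # Forbidden: give up immediately
print(should_retry(503, attempts=1))  # temporary server error: try again
```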
by jdahlin
Tue Dec 20, 2011 2:28 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Re: Scan stopping at 2 links deep

Example... Entering our starting URL "http://www.blahblah.org" (which redirects to https://www.blahblah/public/index.html) and running the scan it gets as far as "http://www.blahblah.org/public/support/forms/index.html". All told, it located 304 pages. But, this page has links to...
by jdahlin
Tue Dec 20, 2011 12:10 pm
Forum: BlackWidow
Topic: Scan stopping at 2 links deep
Replies: 14
Views: 21666

Scan stopping at 2 links deep

I have the radio button for "Scan the whole site" selected, but it wants to stop after only 2 links deep from the home page. Other than stopping early, the site structure returned appears to be exactly what I want. If I manually navigate to one of the URLs returned from this scan and re-scan...