Search found 31 matches

by alpha2
Tue Feb 19, 2013 3:22 pm
Forum: BlackWidow
Topic: virtualization
Replies: 1
Views: 3954

virtualization

Hi,

Is virtualization possible, e.g. running BlackWidow in VirtualBox or VMware (with Tor)? Or are there any known problems?

Regards,

Alpha
by alpha2
Sat Feb 09, 2013 6:19 pm
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

Does BrownRecluse support HTTPS? I was so happy that BW runs without problems (in principle). If I could use it, that would be even better...
by alpha2
Fri Feb 08, 2013 7:42 am
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

Did you test it on your PC? How many lines of code does it accept? For me, it doesn't start the crawl if there are too many.

Is there any plan for a future release where you can export and import link lists and run the tool with large numbers of links?
by alpha2
Thu Feb 07, 2013 3:24 am
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

I tried to copy the 500,000 lines of code (Scanlink(...)) into expert mode (it worked fine with 100, so it is not a code problem). I couldn't run it. The code itself is only 25 MB, but memory consumption grew drastically - to more than 350 MB. Thus the memory consumptio...
by alpha2
Wed Feb 06, 2013 6:50 pm
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

Yes, I sometimes do some programming, but I didn't have the time to go through the whole documentation. Maybe I can live without this feature and just crawl the sites in the scan list. After BW has gone through the scan list, does it continue automatically with the links from the parsed document...
by alpha2
Wed Feb 06, 2013 6:00 pm
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

Starting from a page abc, I would like to follow abc/h1..h9, but not k1..k9. After I have parsed all of h1..h9, I would like to move on to the next line in the script.
by alpha2
Wed Feb 06, 2013 10:56 am
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

Hi, what does this mean: "It's up to you to define what to follow or not. The way it is, it'll scan what's on the list, but if you want to go further, you'll have to ScanLink() the links within the page." How do I define what to follow and what not, and to what depth? E.g. I want to follow ju...
by alpha2
Mon Feb 04, 2013 4:49 pm
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

How many lines of code are possible? Infinite?

Does this approach only scan the sites in the list, or can I also go, let's say, two links further with certain rules (e.g. don't follow *xyz*, but follow *zzz*)? A rough sketch of the kind of rules I mean follows below.

Or does this approach only scan the links in the list?
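To make the kind of rules I mean concrete, here is a rough sketch in plain Python (not BW syntax; the file names and the *zzz* / *xyz* patterns are only examples):

# Minimal sketch of include/exclude rules for discovered links.
# "found_links.txt" and "filtered_links.txt" are made-up example names.
from fnmatch import fnmatch

FOLLOW = ["*zzz*"]   # patterns I do want to follow
SKIP = ["*xyz*"]     # patterns I do not want to follow

def should_follow(url):
    # excluded patterns win over included ones
    if any(fnmatch(url, p) for p in SKIP):
        return False
    return any(fnmatch(url, p) for p in FOLLOW)

with open("found_links.txt") as f:
    links = [line.strip() for line in f if line.strip()]

with open("filtered_links.txt", "w") as out:
    for url in links:
        if should_follow(url):
            out.write(url + "\n")

I.e. excluded patterns would win over included ones, and only the links that pass would be handed on for scanning.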
by alpha2
Mon Feb 04, 2013 5:04 am
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

Re: List of pages to be scanned - export / import

P.S. 1) Is it possible to access external files in expert mode? E.g.:

open file x
while not eof()
  lnk = Readln()
  Scanlink(lnk);
loop

When the system scans the links as per the Scanlink commands, is it also possible to define how deep it scans and with which patterns? E.g. starting from abc.htlm...
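In case expert mode cannot open external files, the loop above could at least be emulated outside of BW: read the link file with a small helper script and paste the generated Scanlink() lines into expert mode. A minimal Python sketch (file names are only examples, and very large generated scripts may of course hit the memory limits described above):

# Read a link list from an external file and turn every line into a
# Scanlink() call that can be pasted into expert mode.
# "links.txt" and "scanlink_script.txt" are only example names.
with open("links.txt") as src, open("scanlink_script.txt", "w") as out:
    for line in src:
        lnk = line.strip()
        if lnk:
            out.write('Scanlink("%s");\n' % lnk)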
by alpha2
Mon Feb 04, 2013 3:28 am
Forum: BlackWidow
Topic: List of pages to be scanned - export / import
Replies: 18
Views: 23060

List of pages to be scanned - export / import

Hi, I understood that in version 5 it was possible to export and import the pages to be scanned, so that you could stop a crawl, export the "remaining" and come back later. This feature is no longer available in version 6. Will it come in version 7 or in an interim release? It would be very useful if you st...
by alpha2
Mon Nov 26, 2012 9:37 am
Forum: BlackWidow
Topic: no pictures
Replies: 8
Views: 7010

Re: no pictures

You are right. Sorry, I misunderstood the size. But when looking into the code, I see that there is a lot of stuff - even without the pictures.
by alpha2
Wed Nov 21, 2012 2:34 pm
Forum: BlackWidow
Topic: no pictures
Replies: 8
Views: 7010

Re: no pictures

That's not what I meant. I don't have problems finding the links. The question is whether I can download only the pages, but not the pictures within them. E.g. the pages have a size of 250 kB on average, but most of those 250 kB are images. I would like to download the page with only th...
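I don't know whether BW itself can skip the embedded pictures during the download, but as a fallback they could be stripped from the saved files afterwards. A rough Python sketch, assuming the pictures sit in <img> tags with data: URIs (the file names are only examples and the regex is deliberately crude):

# Post-process a saved page: drop <img> tags whose pictures are embedded
# directly in the HTML as data: URIs, to shrink the file.
import re

with open("page.html", encoding="utf-8", errors="replace") as f:
    html = f.read()

slim = re.sub(r"<img\b[^>]*src\s*=\s*['\"]data:[^>]*>", "", html,
              flags=re.IGNORECASE)

with open("page_noimages.html", "w", encoding="utf-8") as f:
    f.write(slim)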
by alpha2
Tue Nov 20, 2012 5:37 pm
Forum: BlackWidow
Topic: no pictures
Replies: 8
Views: 7010

Re: no pictures

Yes. By limiting it to sites that also contain the name after the slash, I try to avoid any surprises.
by alpha2
Tue Nov 20, 2012 4:38 pm
Forum: BlackWidow
Topic: no pictures
Replies: 8
Views: 7010

Re: no pictures

www dot vorname insert dot com slash name,elias dot again html - but as 99% of the site's pages are of that kind, I scan the whole site anyway with these parameters:

[BlackWidow v6.00 filters]
URL = ...
[ ] Expert mode
[x] Scan everything
[x] Scan whole site
Local depth: 0
[x] Scan external links
[ ] Only...
by alpha2
Tue Nov 20, 2012 4:10 pm
Forum: BlackWidow
Topic: no pictures
Replies: 8
Views: 7010

no pictures

Hi,

(btw. I bought BW in the meantime... ;-) ) I wanted to download from a page, vorname then dot com. The files are pretty big. Is there any chance to download the files without the pictures? They are within the HTML file, not separate.

Regards,

Alpha
by alpha2
Tue Nov 06, 2012 1:38 am
Forum: BlackWidow
Topic: google results pages
Replies: 3
Views: 4692

Re: google results pages

Does this problem also exist if I crawl with only one stream, slowly and with long delays? Speed is not the problem.

Is there any other search engine, like Bing or so, that can be spidered?
by alpha2
Mon Nov 05, 2012 6:26 pm
Forum: BlackWidow
Topic: google results pages
Replies: 3
Views: 4692

google results pages

Hi, thanks for the hints so far. I would like to save the Google results pages so I can compare them over time. Are there any standard settings I should use in order to save the Google results pages locally for a certain query? I had problems, and it didn't work at all for me. But I thought,...
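For comparison, this is roughly what I want to end up with, sketched outside of BW in plain Python (the query is only an example, and Google may refuse such automated requests):

# Fetch the results page for one fixed query and save it with a timestamp,
# so the saved copies can be diffed later. Query and User-Agent are examples.
import time
import urllib.request

query = "example+search+terms"
url = "https://www.google.com/search?q=" + query
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

html = urllib.request.urlopen(req, timeout=30).read().decode("utf-8", "replace")

stamp = time.strftime("%Y%m%d-%H%M%S")
with open("google-" + stamp + ".html", "w", encoding="utf-8") as f:
    f.write(html)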
by alpha2
Sat Nov 03, 2012 6:54 pm
Forum: BlackWidow
Topic: restart a crawl?
Replies: 7
Views: 6623

Re: restart a crawl?

Where do I get BW version 5? I'm still experimenting with BW in the trial period.
by alpha2
Sat Nov 03, 2012 6:22 pm
Forum: BlackWidow
Topic: restart a crawl?
Replies: 7
Views: 6623

Re: restart a crawl?

So there is no way to somehow export the "remaining"? I would be interested to have a look at this list anyway. So you can neither show nor export nor import the remaining items? The sleep mode is not so easy, as I'm not the only user of the PC. Other users might want to shut down both BW and th...
by alpha2
Sat Nov 03, 2012 5:11 pm
Forum: BlackWidow
Topic: restart a crawl?
Replies: 7
Views: 6623

Re: restart a crawl?

It restarted somehow, but before, the list showed 400 remaining. When I restarted it, there were 0 remaining. Somehow the remaining files got lost.
by alpha2
Sat Nov 03, 2012 1:36 pm
Forum: BlackWidow
Topic: restart a crawl?
Replies: 7
Views: 6623

restart a crawl?

Hi, crawling now works fantastically. Thanks for the hints. Sorry - I have another question: the crawl detects a lot of links. As it is a shared computer and the internet connection is slow, I cannot wait until everything is crawled. Is there any chance to store the current status, including the links alrea...
by alpha2
Fri Nov 02, 2012 7:19 pm
Forum: BlackWidow
Topic: no re-visiting
Replies: 14
Views: 10390

Re: no re-visiting

I've sent the PM. Did you get it? (It is in my Outbox, but not in Sent.) Maybe you can understand the problem without clicking on the links...
by alpha2
Fri Nov 02, 2012 6:58 pm
Forum: BlackWidow
Topic: no re-visiting
Replies: 14
Views: 10390

Re: no re-visiting

"So then, you only need to scan everything in aaa, aab, aac but exclude all xx1, xx2, xx3 etc., correct?" No. aaa.htm is a page of its own, and I don't need anything else within it. My problem is that in 99% of all cases aaa/xxx1 etc. comes first and aaa.html comes later. On the other hand, if the ...
by alpha2
Fri Nov 02, 2012 6:18 pm
Forum: BlackWidow
Topic: no re-visiting
Replies: 14
Views: 10390

Re: no re-visiting

Support wrote: So you mean that aaa/xxx1 is the same as aab/xxx1?
No. www.sbl.net/aaa is the same as www.sbl.net/aaa/xxx1 is the same as www.sbl.net/aaa/xxx2 ...
And www.sbl.net/aab is the same as www.sbl.net/aab/xxx1 is the same as www.sbl.net/aab/xxx2 ...
But www.sbl.net/aaa is different from www.sbl.net/aab.
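In other words, everything under the same first path segment should count as one page. As a rough illustration (plain Python, not BW syntax; the URLs are the same examples as above):

# Treat www.sbl.net/aaa, /aaa/xxx1, /aaa/xxx2 ... as the same page by
# normalizing every link to its first path segment before de-duplication.
from urllib.parse import urlparse

def normalize(url):
    parts = urlparse(url)
    first = parts.path.strip("/").split("/")[0]   # "aaa" from "/aaa/xxx1"
    return parts.scheme + "://" + parts.netloc + "/" + first

seen = set()
for url in ["http://www.sbl.net/aaa",
            "http://www.sbl.net/aaa/xxx1",
            "http://www.sbl.net/aab/xxx2"]:
    key = normalize(url)
    if key not in seen:
        seen.add(key)
        print("visit:", key)      # visits /aaa once and /aab once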
by alpha2
Fri Nov 02, 2012 6:08 pm
Forum: BlackWidow
Topic: no re-visiting
Replies: 14
Views: 10390

Re: no re-visiting

OK, I'm not following you here. Let's say the site is called http://www.sbl.net. From what I understand, there are links like http://www.sbl.net/aaa/xxx1 and http://www.sbl.net/aaa/xxx2 and http://www.sbl.net/aaa/xxx3 etc., right? There is also http://www.sbl.net/aab/xxx1 and http://www.sbl.net/aab/xxx2...