blackwidow vs. brown recluse and blackwidow newby qns

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
Dev
Posts: 4
Joined: Sun Jan 08, 2012 8:54 pm

Post by Dev »

Hi
I have been using BlackWidow for a couple of days to download pages from discussion forums, for example .com/discussions

I have a couple of questions:
- Why does the scanner stop at random intervals? I first thought it stops after reaching a certain depth, but that doesn't seem to be the reason. Also, if I give it the page it stopped at, it continues further. The above discussion forum has about 930 pages, each with 15 links to discussion threads. It would be nice if it didn't stop in the middle, so I could set it up and leave it until it is done.
- If I download the files while it is scanning, it saves them with numbers instead of the actual page names. But if I select specific files and download them, it keeps the original names. Is there a setting I am missing that would let me set the file-naming convention, so that I don't have to scan first and download later? That would save a lot of time.
- The file extensions are all wrong, but that is OK; I change them to .doc.
- The page has a left-side menu and some random stuff in the right sidebar. Is it possible to somehow program it to save only the middle column, which has the body text, and ignore the menus? Right now I run a batch macro afterwards to clean the menu links out of the documents.
- The first two lines of each document cleaned this way have information about when the thread was created, who created it, and other miscellaneous details. Is it possible to record this information in an Excel file, a CSV, or something similar?

I thought BrownRecluse might be useful for doing all of this, instead of just the scanning and downloading I am doing with BlackWidow, but I am not a programmer and was not able to figure out whether BrownRecluse can actually do all the things I need. Can you please tell me if it can?
Thanks,
Dev
Last edited by Dev on Sun Dec 16, 2012 4:35 pm, edited 1 time in total.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Post by Support »

Dev wrote:- why does the scanner stop at random intervals.
It's not supposed to stop like this. Possible reasons: the site detects the speed at which you fetch pages and images and blocks your access for 30 seconds or so; your Internet connection is poor; or you are using too many threads.
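For anyone scripting their own downloads, the same idea (one request at a time, a pause between pages, and a longer wait after a failure) can be sketched in Python. This is only an illustration, not how BlackWidow works internally; the names `fetch_all` and `http_get` and all of the delay values are assumptions made up for the example.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_get(url: str) -> bytes:
    """Download one page; urllib raises an OSError subclass on network errors."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], bytes] = http_get,
              delay: float = 2.0,        # assumed pause between pages, in seconds
              max_retries: int = 3,      # assumed number of attempts per page
              backoff: float = 30.0,     # assumed base wait after a failure
              sleep: Callable[[float], None] = time.sleep
              ) -> Dict[str, Optional[bytes]]:
    """Fetch pages one at a time (a single thread), pausing between requests."""
    pages: Dict[str, Optional[bytes]] = {}
    for url in urls:
        for attempt in range(1, max_retries + 1):
            try:
                pages[url] = fetch(url)
                break
            except OSError:
                if attempt == max_retries:
                    pages[url] = None         # give up on this page, keep going
                else:
                    sleep(backoff * attempt)  # wait longer after each failure
        sleep(delay)  # fixed pause so the site is not hit too quickly
    return pages
```

A page that still fails after the last attempt is recorded as `None` rather than stopping the whole run, so the list of failed pages can be retried later.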
Dev wrote:- If I download the files while it is scanning, it saves them with numbers instead of the actual page names. But if I select specific files and download them, it keeps the original names. Is there a setting I am missing that would let me set the file-naming convention, so that I don't have to scan first and download later? That would save a lot of time.
If the pages are named something like /forum.php?file=blabla, then the scanner will save the file using a generic unique name. But if the server provides a "save as" file name, you can see it in the Structure by scrolling the list to the right and looking at the "Save as" column. If that column has file names in it, then BlackWidow should detect this and use those names instead.
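The naming rule described above can be illustrated with a small Python sketch. It is a guess at the general logic, not BlackWidow's actual code: `choose_save_name`, the six-digit numbering, and the `.html` fallback extension are all assumptions. The server's "save as" name is taken to be the `filename` parameter of the HTTP `Content-Disposition` header.

```python
import os
from email.message import Message
from typing import Optional
from urllib.parse import urlsplit


def choose_save_name(url: str, content_disposition: Optional[str], counter: int) -> str:
    """Pick a local file name for a downloaded page (hypothetical helper)."""
    # 1. Prefer the "save as" name the server suggests, if there is one.
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        suggested = msg.get_filename()
        if suggested:
            return os.path.basename(suggested)  # drop any directory part
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    # 2. A plain URL such as /threads/intro.html already carries a usable name.
    if last_segment and not parts.query:
        return last_segment
    # 3. Query-string pages (/forum.php?file=blabla) get a generic unique name.
    return f"{counter:06d}.html"
```

With this rule, every `/forum.php?...` page would otherwise collapse to the same name, `forum.php`, which is why a numbered fallback is needed when the server sends no "save as" name.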
Dev wrote:- The file extensions are all wrong, but that is OK; I change them to .doc.
- The page has a left-side menu and some random stuff in the right sidebar. Is it possible to somehow program it to save only the middle column, which has the body text, and ignore the menus? Right now I run a batch macro afterwards to clean the menu links out of the documents.
No, it cannot modify the pages it has saved. Well, it could be possible, but we would need to write a script just for that.
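As a rough idea of what such a clean-up script involves, here is a minimal Python sketch that keeps only the text of one column of an already saved page. It is not a BlackWidow or BrownRecluse script. It assumes the middle column is an element with a known `id` (the value `content` is made up; the real page would have to be inspected), and the names `ColumnText` and `extract_column` are illustrative.

```python
from html.parser import HTMLParser


class ColumnText(HTMLParser):
    """Collect the text inside the element whose id matches `target_id`."""

    # Void tags never have a closing tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1                     # nested tag inside the column
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                      # entering the column

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_column(html: str, target_id: str = "content") -> str:
    """Return the text of the chosen column, one text fragment per line."""
    parser = ColumnText(target_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The depth counter relies on tags being properly closed; pages with unclosed `<p>` or `<li>` tags would need a more forgiving parser.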
Dev wrote:- The first two lines of each document cleaned this way have information about when the thread was created, who created it, and other miscellaneous details. Is it possible to record this information in an Excel file, a CSV, or something similar?
Not with BlackWidow, but BrownRecluse can.
Dev wrote:I thought BrownRecluse might be useful for doing all of this, instead of just the scanning and downloading I am doing with BlackWidow, but I am not a programmer and was not able to figure out whether BrownRecluse can actually do all the things I need. Can you please tell me if it can?
Yes, it can, but if you cannot program, we can write you a script to handle the specifics. A script works only on the site it is made for. And yes, it can pull whatever data the site provides and save it to a CSV or tab-delimited text file.
Your support team.
http://SoftByteLabs.com

Dev
Posts: 4
Joined: Sun Jan 08, 2012 8:54 pm

Post by Dev »

Thanks for the clarification. I was using 6 threads, so I have reduced that and also slowed the scan down by 40 sec. Let's see if that helps.

So if I want you to write a BrownRecluse program for a specific website, for example http://www. .com/discussion, what would I have to do? I have a tentative idea of the data I want from those pages, but I would have to discuss what is doable and what is not.
Last edited by Dev on Sun Dec 16, 2012 4:35 pm, edited 1 time in total.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Post by Support »

If you can see the data on the page, then it can be extracted. Just tell me the layout of the data and where it is, and we'll do the rest. For example: you need name, street, city, state, zip, phone, and email from pages in /directory/.
Your support team.
http://SoftByteLabs.com

Dev
Posts: 4
Joined: Sun Jan 08, 2012 8:54 pm

Post by Dev »

Thanks.

I want an Excel sheet made up of a list of all the links that start with http://www. .com/discussions/
For example: http://www. .com/discussions/

For each of these items, I want the first cell of the row to be the name of the link. In the above example it would be xxxx. If the first cell is the whole link for the page, I am fine with that too.

Each page has 3 columns; the middle one has the main text of the discussion thread. The header looks as follows:
"xxxx"

The second cell should be the username from the second line ("By xyz").
The third cell should be the date and time that follow.
The fourth, fifth, and sixth cells should have the numbers that follow,
and the seventh should have the text of that post.

Is it possible to create an Excel sheet like this?
Last edited by Dev on Sun Dec 16, 2012 4:36 pm, edited 1 time in total.

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Post by Support »

It looks possible, but with BrownRecluse, not BlackWidow. The data will be saved in a text or CSV file, which can then be imported into Excel.
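To show what such a file would look like, here is a minimal Python sketch of the seven-column layout requested above (link, username, date and time, three numbers, post text) written as CSV. It is an illustration only, not BrownRecluse script syntax; the column names, `ThreadRow`, and `write_rows` are made up, and the sample row is dummy data.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Iterable, TextIO


@dataclass
class ThreadRow:
    """One spreadsheet row per discussion thread (hypothetical column names)."""
    link: str
    username: str
    posted_at: str
    number_1: str
    number_2: str
    number_3: str
    text: str


def write_rows(rows: Iterable[ThreadRow], out: TextIO) -> None:
    """Write a header line followed by one CSV line per thread."""
    writer = csv.writer(out)
    writer.writerow([f.name for f in fields(ThreadRow)])
    for row in rows:
        # The csv module quotes commas, quotes, and line breaks in the post text.
        writer.writerow(astuple(row))


if __name__ == "__main__":
    # Dummy values, only to show the shape of the output.
    sample = ThreadRow("http://example.com/discussions/1", "xyz",
                       "2012-12-16 16:35", "1", "2", "3", 'He said "hi", then left.')
    buffer = io.StringIO()
    write_rows([sample], buffer)
    print(buffer.getvalue())
```

When writing to a real file, opening it with `open(path, "w", newline="", encoding="utf-8-sig")` generally lets Excel recognise the text as UTF-8 when the file is opened.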
Your support team.
http://SoftByteLabs.com

Dev
Posts: 4
Joined: Sun Jan 08, 2012 8:54 pm

Post by Dev »

I downloaded BrownRecluse before; I will buy it if somebody can write this program. Should I post this same information on the BrownRecluse Q&A board?

Support
Site Admin
Posts: 1879
Joined: Sun Oct 02, 2011 10:49 am

Post by Support »

Yes please. When you have your order#, send me a private message and I'll get started on the script.
Your support team.
http://SoftByteLabs.com
