Blocked links

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
pilotmike55
Posts: 17
Joined: Wed Jan 27, 2016 4:11 pm

Blocked links

Post by pilotmike55 » Wed Jan 27, 2016 4:18 pm

I was working on a script to download some JPGs from a site and noticed that not all of the links on the webpage were found. I added an output command to the FoundLink event and could see links being found up to a certain part of the page, then stopping, then picking up again. It was almost as if a section was skipped over or blocked.

Any thoughts?

Support
Site Admin
Posts: 1679
Joined: Sun Oct 02, 2011 10:49 am

Re: Blocked links

Post by Support » Wed Jan 27, 2016 4:27 pm

Is it one of those pages that loads another portion when you scroll down? Many sites do this now, as opposed to loading everything in one go.
Your support team.
http://SoftByteLabs.com

pilotmike55
Posts: 17
Joined: Wed Jan 27, 2016 4:11 pm

Re: Blocked links

Post by pilotmike55 » Wed Jan 27, 2016 4:37 pm

Not that I can tell. It appears the page loads completely and I see the links when viewing the source.

Support
Site Admin
Posts: 1679
Joined: Sun Oct 02, 2011 10:49 am

Re: Blocked links

Post by Support » Wed Jan 27, 2016 4:46 pm

Can you post or PM me your script?

pilotmike55
Posts: 17
Joined: Wed Jan 27, 2016 4:11 pm

Re: Blocked links

Post by pilotmike55 » Wed Jan 27, 2016 4:52 pm

Here is the link: http://www.brswimwear.com/content/14-al ... el-gallery
Here is the script:

case ScannerEvent of

  Starting:
    begin
      ExternalLinkDepth = 0; // set to 0 for no external links.
      LocalLinkDepth = 10; // set to high number for no limit.
      ScanWholeSite = No; // Stay within StartupURL
    end;

  BeforeFetch:
    begin
      // Fetch only html pages to parse for links.
      output('DocumentURL = ' + DocumentURL);
      output('DocumentType = ' + DocumentType);
      AcceptEvent = (DocumentType ~= 'text/html');
    end;

  AfterFetch:
    begin
      Output('Fetched documentURL ' + DocumentURL);
      /*for each matching('"([^"]+\.swf)"') in Document as aLink do begin*/
      for each matching('/img/') in Document as aLink do begin
        aLink.ResolveRelative(DocumentURL); // resolve links like ../foo/bar/
        Scanlink(aLink); // add the link to the scan queue.
      end;
    end;

  BeforeParsing:
    begin
      Output('Parsing document ' + DocumentURL);
      /*Document.Replace('/tn_', '/'); // try to make a large image from a thumbnail.*/
      AcceptEvent = Yes;
    end;

  BeforeAdding:
    begin
      Output('DocumentURL ' + DocumentURL);
      Output('DocumentType ' + DocumentType);
      AcceptEvent = (DocumentType ~= 'image/'); // add any kind of images.
    end;

  FoundLink:
    begin
      /*Output('Link ' + FoundLinkURL + ' found');
      Output('Root is ' + FoundLinkURL.InRootURI(StartupURL));
      Output('Base is ' + FoundLinkURL.InBaseURI(StartupURL));*/
      if (ScanWholeSite and FoundLinkURL.InRootURI(StartupURL)) or
         ((not ScanWholeSite) and FoundLinkURL.InBaseURI(StartupURL))
      then
        AcceptEvent = (FoundLinkDepth <= LocalLinkDepth)
      else
        AcceptEvent = (FoundLinkDepth <= ExternalLinkDepth);
    end;

  Finishing:
    begin
      Output('Done');
    end;

  else
    AcceptEvent = No;

end;

Support
Site Admin
Posts: 1679
Joined: Sun Oct 02, 2011 10:49 am

Re: Blocked links

Post by Support » Wed Jan 27, 2016 5:48 pm

The line...

for each matching('/img/') in Document as aLink do begin

is wrong because matching() takes a regular expression, and this one matches only the text /img/ rather than the URL that contains it. You need to catch the whole URL in between the quotes...

for each matching('"([^"]*/img/[^"]*)') in Document as aLink do begin

I put the parentheses inside the quotes so that the quotes themselves are not captured as part of the URL. Running it gives me 483 pictures on every scan.
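
To see the difference between the two patterns outside BlackWidow, here is a small Python sketch. It assumes Python's re module behaves like BlackWidow's regex engine for these simple patterns, and the HTML fragment is made up for illustration (the real page markup is not shown in this thread).

```python
import re

# Hypothetical HTML fragment, invented for this example.
html = '<a href="/img/cms/gallery/a.jpg"><img src="/img/cms/gallery/tn_a.jpg"></a>'

# Original pattern: each match is only the literal text "/img/",
# so there is no usable URL to resolve or queue.
bare = re.findall(r'/img/', html)

# Corrected pattern: start at a double quote, then capture every
# non-quote character around "/img/". The capture group sits after
# the quote, so the quote is not part of the result.
full = re.findall(r'"([^"]*/img/[^"]*)', html)

print(bare)  # ['/img/', '/img/']
print(full)  # ['/img/cms/gallery/a.jpg', '/img/cms/gallery/tn_a.jpg']
```

With the bare pattern, every "link" handed to the scan queue is just /img/, which would explain why whole stretches of the page seemed to be skipped.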

JPM
Posts: 1
Joined: Wed Jan 27, 2016 6:55 pm

Registration Info

Post by JPM » Wed Jan 27, 2016 7:06 pm

I'm not sure where to post this. I haven't been on here in ages.
I'm a registered user of BlackWidow and lost my registration info. I have sent an email but haven't heard back yet. Can one of the support team help me with this? I will send another email to support.

Also, is Michael still around?

Many thanks

Jim

Support
Site Admin
Posts: 1679
Joined: Sun Oct 02, 2011 10:49 am

Re: Blocked links

Post by Support » Wed Jan 27, 2016 8:50 pm

Hello Jim,

Michael here :)

I'll have your registration sent to you shortly...
