Blocked links

Posted: Wed Jan 27, 2016 4:18 pm
by pilotmike55
I was working on a script to download some jpgs from a site. I noticed that not all of the links on the webpage were found. I added an output command to the FoundLink event and I could see links being found up to a certain part of the page, then stop, then pick up again. It is almost as if a section was skipped over or blocked.

Any thoughts?

Re: Blocked links

Posted: Wed Jan 27, 2016 4:27 pm
by Support
Is it one of those pages where, when you scroll down, it loads another portion? Many sites do this now, as opposed to loading everything in one go.

Re: Blocked links

Posted: Wed Jan 27, 2016 4:37 pm
by pilotmike55
Not that I can tell. It appears the page loads completely and I see the links when viewing the source.

Re: Blocked links

Posted: Wed Jan 27, 2016 4:46 pm
by Support
Can you post or PM me your script?

Re: Blocked links

Posted: Wed Jan 27, 2016 4:52 pm
by pilotmike55
Here is the link http://www.brswimwear.com/content/14-al ... el-gallery
Here is the script
case ScannerEvent of

  Starting:
  begin
    ExternalLinkDepth = 0; // set to 0 for no external links.
    LocalLinkDepth = 10; // set to high number for no limit.
    ScanWholeSite = No; // Stay within StartupURL
  end;

  BeforeFetch:
  begin
    // Fetch only html pages to parse for links.
    output('DocumentURL = ' + DocumentURL);
    output('DocumentType = ' + DocumentType);
    AcceptEvent = (DocumentType ~= 'text/html');
  end;

  AfterFetch:
  begin
    Output('Fetched documentURL ' + DocumentURL);
    /*for each matching('"([^"]+\.swf)"') in Document as aLink do begin*/
    for each matching('/img/') in Document as aLink do begin
      aLink.ResolveRelative(DocumentURL); // resolve links like ../foo/bar/
      Scanlink(aLink); // add the link to the scan queue.
    end;
  end;

  BeforeParsing:
  begin
    Output('Parsing document ' + DocumentURL);
    /*Document.Replace('/tn_', '/'); // try to make a large image from a thumbnail.*/
    AcceptEvent = Yes;
  end;

  BeforeAdding:
  begin
    Output('DocumentURL ' + DocumentURL);
    Output('DocumentType ' + DocumentType);
    AcceptEvent = (DocumentType ~= 'image/'); // add any kind of images.
  end;

  FoundLink:
  begin
    /*Output('Link ' + FoundLinkURL + ' found');
    Output('Root is ' + FoundLinkURL.InRootURI(StartupURL));
    Output('Base is ' + FoundLinkURL.InBaseURI(StartupURL));*/
    if (ScanWholeSite and FoundLinkURL.InRootURI(StartupURL)) or
       ((not ScanWholeSite) and FoundLinkURL.InBaseURI(StartupURL))
    then
      AcceptEvent = (FoundLinkDepth <= LocalLinkDepth)
    else
      AcceptEvent = (FoundLinkDepth <= ExternalLinkDepth);
  end;

  Finishing:
  begin
    Output('Done');
  end;

  else
    AcceptEvent = No;

end;

Re: Blocked links

Posted: Wed Jan 27, 2016 5:48 pm
by Support
The line...

Code:

for each matching('/img/') in Document as aLink do begin

is wrong because it uses regular expressions and you have it set to match only the text /img/ rather than the URL around it. You need to catch the whole URL between the quotes...

Code:

for each matching('"([^"]*/img/[^"]*)') in Document as aLink do begin

I put the parentheses inside the quotes so that the quotes themselves are not captured as part of the URL. Running it gives me 483 pictures on every scan.
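
For anyone who wants to see the difference between the two patterns outside the scanner, here is a minimal sketch in Python. It uses Python's re module as a stand-in, on the assumption that the scanner's regular expressions behave the same way for this simple pattern. The page fragment and the example.com address are made up for illustration, and urljoin only approximates what ResolveRelative is described as doing in the script comment.

```python
import re
from urllib.parse import urljoin

# Hypothetical page address and HTML fragment, invented for this sketch.
page_url = "http://www.example.com/content/gallery"
html = '''
<a href="/img/cms/photo1.jpg"><img src="/img/cms/tn_photo1.jpg"></a>
<a href="../img/cms/photo2.jpg">photo 2</a>
<a href="/content/other-page">other</a>
'''

# Original pattern: each match is only the literal text "/img/",
# so there is no usable URL to queue.
bare = re.findall(r'/img/', html)

# Corrected pattern: an opening quote, then a capture group holding
# everything up to the next quote, as long as it contains "/img/".
# The group sits after the quote, so the quote is not captured.
quoted = re.findall(r'"([^"]*/img/[^"]*)', html)

# Resolve each captured link against the page address
# (an approximation of the ResolveRelative step in the script).
resolved = [urljoin(page_url, link) for link in quoted]

print(bare)
print(quoted)
print(resolved)
```

The first list contains nothing but the fragment /img/ repeated, which would explain why the image links never made it into the scan queue, while the second holds the full paths that can be resolved and fetched.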

Registration Info

Posted: Wed Jan 27, 2016 7:06 pm
by JPM
I'm not sure where to post this. I haven't been on here in ages.
I'm a registered user of Black Widow and lost my registration
info. I have sent an email but haven't heard back yet. Can one of the support
team help me with this? I will send another email to support.

Also, is Michael still around?

Many thanks

Jim

Re: Blocked links

Posted: Wed Jan 27, 2016 8:50 pm
by Support
Hello Jim,

Michael here :)

I'll have your registration sent to you shortly...