A sitesucking issue...

BlackWidow scans websites (it's a site ripper). It can download an entire website or just portions of one.

A sitesucking issue...

Post by RodOfNOD »

http://www.eyvindearle.com/ImageViewer.aspx?id=1

with the id running from 1 to, say, 999.

I want to suck down all the JPEGs, but in the source for each page the image path appears not as a number but as a name:

http://www.eyvindearle.com/images/xl_im ... cturne.jpg

for example, that would be the one on the first page.

How do I suck all the JPGs on each of the pages?


I had this code, which at least looked at all the pages...

Code:

  Starting:
  begin
    for x = 1 to 9999 do begin
      lnk = 'http://www.eyvindearle.com/ImageViewer.aspx?id='+x;
      ScanLink(lnk);
    end;
  end;
The actual page code looks like:

Code:

    <img id="imgPic" src="images/xl_images/Nocturne.jpg" style="border-width:0px;" />
Thanks!


Re: A sitesucking issue...

Post by Support »

Here is a script that will do just that. Change the loop for x = 1 to 19 to whatever range you need...

Code:

case ScannerEvent of

  Starting:
  begin
    for x = 1 to 19 do begin
      lnk = 'http://www.eyvindearle.com/ImageViewer.aspx?id='+x;
      ScanLink(lnk);
    end;
  end;

  BeforeFetch:
  begin
    AcceptEvent = (DocumentURL ~= '/ImageViewer\.aspx\?id=\d+$');
  end;

  AfterFetch:
  begin
    for each matching('src="(images/[^"]+)"') in Document as aLink do begin
      aLink.ResolveRelative(DocumentURL);
      ScanLink(aLink);
    end;
  end;

  FoundLink:
  begin
    AcceptEvent = (FoundLinkURL ~= '\.jpg$');
  end;

  BeforeAdding:
  begin
    AcceptEvent = (DocumentType ~= 'image/');
  end;

else
  AcceptEvent = No;

end;
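In short: Starting queues the numbered viewer pages, BeforeFetch fetches only those pages, AfterFetch pulls every src="images/..." out of each page and scans it, FoundLink accepts only links ending in .jpg, and BeforeAdding keeps only documents served as images.

For comparison only, here is roughly the same idea outside BlackWidow, as a small Python sketch. The src="images/...jpg" pattern and the 1 to 19 range are taken from the pages quoted in this thread, and the output folder name is made up, so adjust them to suit:

Code:

import os
import re
import urllib.request
from urllib.parse import urljoin

BASE = 'http://www.eyvindearle.com/ImageViewer.aspx?id={}'
OUT_DIR = 'eyvindearle_jpgs'   # hypothetical output folder
IMG_SRC = re.compile(r'src="(images/[^"]+\.jpg)"', re.IGNORECASE)

os.makedirs(OUT_DIR, exist_ok=True)

for page_id in range(1, 20):   # same 1..19 range as the script above
    page_url = BASE.format(page_id)
    try:
        html = urllib.request.urlopen(page_url).read().decode('utf-8', 'replace')
    except OSError:
        continue               # skip ids that fail to load
    for rel_src in IMG_SRC.findall(html):
        img_url = urljoin(page_url, rel_src)   # make the relative src absolute
        name = os.path.basename(rel_src)
        with open(os.path.join(OUT_DIR, name), 'wb') as fh:
            fh.write(urllib.request.urlopen(img_url).read())
        print('saved', name)

Run it with any recent Python 3: each viewer page is fetched, the relative image path is resolved against the page URL, and the JPG is saved under its original file name.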
Your support team.
http://SoftByteLabs.com


Re: A sitesucking issue...

Post by RodOfNOD »

Thanks!
