Hyperion Booklets

BlackWidow scans websites (it's a site ripper). It can download an entire website, or download portions of a site.
music_lover
Posts: 19
Joined: Thu May 03, 2012 1:46 pm

Post by music_lover » Thu May 03, 2012 2:23 pm

I'm wondering whether you could create a BlackWidow (BW) web file for http://www.hyperion-records.co.uk/notes/

Clicking this link: http://www.hyperion-records.co.uk/notes ... 7580-B.pdf takes me to a PDF version of a CD booklet. I'd like to grab them all.

Thanks!

Support
Site Admin
Posts: 1851
Joined: Sun Oct 02, 2011 10:49 am

Post by Support » Thu May 03, 2012 2:28 pm

Clicking the links you provided gives me a 'hot linking' error. Starting from the main page, where do I click to reach these two links?
Your support team.
http://SoftByteLabs.com

Post by music_lover » Thu May 03, 2012 2:35 pm

The first link gives an error; the second one shouldn't. I hope that's enough. I can't answer your question because I just don't know.

Update: well, they seem to have changed things since this morning... possibly because a keen IT person noticed me trying to get to these files.

Is there anything that can be done?

Another update: it seems that getting to them is as easy as this: click "catalogue indexes", then "artists", then (for example) "singers", then (for example) "all singers", then any one of the singers listed, then any one of their CDs. On the left there's a "View sleeve notes/artwork (PDF)" link, which takes you to the PDF.

Does that help?
Last edited by music_lover on Thu May 03, 2012 2:48 pm, edited 1 time in total.

Post by Support » Thu May 03, 2012 2:43 pm

Can you backtrack the links? I mean, which page are they on?

Post by music_lover » Thu May 03, 2012 2:50 pm

See my edit above and let me know if that's good enough.

Post by Support » Thu May 03, 2012 2:54 pm

OK, that works. So do you want to scan the entire site for the PDFs, or just one artist?

Post by music_lover » Thu May 03, 2012 3:12 pm

The entire site for .PDFs.

Thanks!

Post by Support » Thu May 03, 2012 3:13 pm

OK, let me work on it and I'll post the filters here. Give me a few hours.

Post by Support » Thu May 03, 2012 5:59 pm

Here are the filters. Copy them, then in the Filters window click the "Paste Settings" button and start the scan.

Code:

[BlackWidow v6.00 filters]
URL = http://www.hyperion-records.co.uk/ai.asp?ai=A_Ind_10_1&vw=dc
[ ] Expert mode
[ ] Scan everything
[x] Scan whole site
Local depth: 0
[x] Scan external links
[ ] Only verify external links
External depth: 0
Default index page: 
Startup referrer: 
[ ] Slow down by 10:60 seconds
4 threads
[x] Follow /a\.asp\?a=A[^&]+&vw=dc$ using regular expression
[x] Follow /dc\.asp\?dc=D_[^&]+&vw=dc$ using regular expression
[x] Add \.pdf$ from URL using regular expression
[end]
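For anyone adapting these rules, here is a rough Python sketch of how the three regular expressions above sort URLs into "follow", "add" (download) and "skip". It assumes the patterns behave like Python's `re.search` against the full URL, which may not exactly match BlackWidow's own regex engine. The function name `classify` and the sample URLs are made up for illustration; the URLs are shaped to fit the patterns, not taken from the site.

```python
import re

# The two "Follow" rules and the one "Add" rule from the filter settings.
# Assumption: each is applied as an unanchored search on the full URL.
FOLLOW_RULES = [
    re.compile(r"/a\.asp\?a=A[^&]+&vw=dc$"),
    re.compile(r"/dc\.asp\?dc=D_[^&]+&vw=dc$"),
]
ADD_RULE = re.compile(r"\.pdf$")


def classify(url: str) -> str:
    """Hypothetical helper: return 'add', 'follow', or 'skip' for a URL."""
    if ADD_RULE.search(url):
        return "add"
    if any(rule.search(url) for rule in FOLLOW_RULES):
        return "follow"
    return "skip"


if __name__ == "__main__":
    # Made-up URLs that fit (or just miss) the patterns.
    samples = [
        "http://www.hyperion-records.co.uk/a.asp?a=A123&vw=dc",
        "http://www.hyperion-records.co.uk/dc.asp?dc=D_CDA12345&vw=dc",
        "http://www.hyperion-records.co.uk/notes/12345-B.pdf",
        "http://www.hyperion-records.co.uk/a.asp?a=A123&vw=dc&page=2",
    ]
    for url in samples:
        print(classify(url), url)
```

Because both follow rules end in `$`, a URL with anything after `&vw=dc` (like the `&page=2` sample) falls through to "skip".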

Post by music_lover » Thu May 03, 2012 6:24 pm

OK thanks... it's doing its thing. I'll let you know how it works out!

Post by music_lover » Thu May 03, 2012 6:51 pm

Well, that just didn't work. I do appreciate your efforts, but absolutely nothing downloads. I have "download while scanning" checked and a valid download folder selected.

It scans and scans and scans... 2153s time but nothing downloads.

Any ideas?

Post by Support » Thu May 03, 2012 7:36 pm

That's because it goes through every letter (A, B, C ... Z), each letter lists a ton of names, and only after that does it reach the PDFs. So you need to let it run for a long while!
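To illustrate why a scan like this can run a long time before the first PDF shows up, here is a toy Python model of a breadth-first crawl over a hypothetical site shaped like the one described: index page, then letters, then artists, then CDs, with each CD page linking to one PDF. The helper names and page counts are invented, and BlackWidow's real queue order (and its 4 threads) may differ; the sketch only shows that, in a breadth-first crawl, every HTML page in the tree is fetched before the first PDF comes off the queue.

```python
import string
from collections import deque


def build_site(letters, artists_per_letter, cds_per_artist):
    """Hypothetical link graph: index -> letters -> artists -> CDs -> one PDF each."""
    site = {"index": [f"letter:{letter}" for letter in letters]}
    for letter in letters:
        artists = [f"artist:{letter}{i}" for i in range(artists_per_letter)]
        site[f"letter:{letter}"] = artists
        for artist in artists:
            cds = [f"cd:{artist}-{j}" for j in range(cds_per_artist)]
            site[artist] = cds
            for cd in cds:
                site[cd] = [f"{cd}.pdf"]  # one booklet per CD page
    return site


def crawl(site, start):
    """Breadth-first crawl. Returns (HTML pages fetched before the first PDF, all PDFs)."""
    seen = {start}
    queue = deque([start])
    fetched = 0
    first_pdf_at = None
    pdfs = []
    while queue:
        page = queue.popleft()
        if page.endswith(".pdf"):
            if first_pdf_at is None:
                first_pdf_at = fetched
            pdfs.append(page)
            continue  # PDFs are downloaded, not parsed for more links
        fetched += 1
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return first_pdf_at, pdfs


if __name__ == "__main__":
    # Made-up sizes: 26 letters, 50 artists per letter, 10 CDs per artist.
    first, pdfs = crawl(build_site(string.ascii_uppercase, 50, 10), "index")
    print(first, len(pdfs))  # 14327 13000
```

With those made-up sizes, 14,327 HTML pages (1 index + 26 letters + 1,300 artists + 13,000 CDs) are fetched before the first of the 13,000 PDFs is reached.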

Post by music_lover » Thu May 03, 2012 8:26 pm

Hmm... I'm pretty sure it stopped doing much of anything after scanning the 2000+ links, but I'll try it again ;)

Post by music_lover » Thu May 03, 2012 8:53 pm

OK so I tried again and it scanned 2,153 links and stopped. Nothing downloaded.

What's next?

Post by Support » Thu May 03, 2012 10:07 pm

OK, this one works. Here's how to use it...

In the top-right corner of the Filters window, click the "Expert" button and paste the script below into it. Then start the scan from this URL...

http://www.hyperion-records.co.uk/ai.as ... 10_1&vw=dc

Code:

case ScannerEvent of

  BeforeFetch:
  begin
    AcceptEvent =
      (DocumentURL ~= '/a\.asp\?a=A[^&]+&vw=dc$') or
      (DocumentURL ~= '/dc\.asp\?dc=D_[^&]+&vw=dc$')
    ;
  end;

  AfterFetch:
  begin
    for each matching('href="(dc\.asp\?dc=D_[^&]+&vw=dc)"') in Document as aLink do begin
      aLink.ResolveRelative(DocumentURL);
      Scanlink(aLink);
    end;
    for each matching("href='([^']*\.pdf)'") in Document as aLink do begin
      aLink.ResolveRelative(DocumentURL);
      Scanlink(aLink);
    end;
  end;

  FoundLink:
  begin
    AcceptEvent =
      (FoundLinkURL ~= '/a\.asp\?a=A[^&]+&vw=dc$') or
      (FoundLinkURL ~= '/dc\.asp\?dc=D_[^&]+&vw=dc$') or
      (FoundLinkURL ~= '\.pdf$')
    ;
  end;

  BeforeAdding:
  begin
    AcceptEvent = (DocumentType ~= 'pdf');
  end;

else
  AcceptEvent = No;

end;
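To check the script's patterns against a saved page outside BlackWidow, the AfterFetch and FoundLink branches can be approximated in Python. This is a sketch, not BlackWidow's engine: it assumes `~=` means an unanchored regex search and that `ResolveRelative` resolves a link the way `urllib.parse.urljoin` does. The function names `after_fetch` and `found_link`, the HTML snippet, and the page URL are hypothetical.

```python
import re
from urllib.parse import urljoin

# href patterns from the AfterFetch branch: dc.asp links in double quotes,
# PDF links in single quotes.
DC_HREF = re.compile(r'href="(dc\.asp\?dc=D_[^&]+&vw=dc)"')
PDF_HREF = re.compile(r"href='([^']*\.pdf)'")

# URL shapes accepted by the FoundLink branch.
ACCEPTED = [
    re.compile(r"/a\.asp\?a=A[^&]+&vw=dc$"),
    re.compile(r"/dc\.asp\?dc=D_[^&]+&vw=dc$"),
    re.compile(r"\.pdf$"),
]


def after_fetch(document_url, html):
    """Pull dc.asp and .pdf links out of raw HTML, resolved against the page URL."""
    links = []
    for pattern in (DC_HREF, PDF_HREF):
        for match in pattern.finditer(html):
            links.append(urljoin(document_url, match.group(1)))
    return links


def found_link(url):
    """Return True if the URL has one of the three accepted shapes."""
    return any(pattern.search(url) for pattern in ACCEPTED)


if __name__ == "__main__":
    page_url = "http://www.hyperion-records.co.uk/a.asp?a=A1&vw=dc"
    html = """<a href="dc.asp?dc=D_CDA1&vw=dc">CD</a>
<a href='notes/1-B.pdf'>Sleeve notes</a>"""
    for link in after_fetch(page_url, html):
        print(found_link(link), link)
```

One thing this makes easy to check: if a page writes the ampersand in the `href` as `&amp;`, the `dc.asp` pattern stops matching, because `[^&]+&vw=dc` expects a bare `&` directly before `vw=dc`.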

Post by music_lover » Fri May 04, 2012 12:50 pm

Well, that worked perfectly. Woke up this morning to a folder filled with PDFs.
Awesome!

Thanks!

Post by Support » Fri May 04, 2012 2:23 pm

You are welcome.
