
Website lookup from a list and link capture

Posted: Wed Jun 12, 2013 6:18 am
by gcarmich
I installed the BrownRecluse trial today. I'm not sure how to create the routine I have in mind. Here is a description:
I have a list of several hundred URLs in .csv (could be .txt) format. I'd like to go to each webpage (URL) and check for (or capture) the "contact", "contacts", or "contact us" link on the page, if it exists, then create a file (or update the original list) containing the original URL plus the link (http://...) to the "contacts" page when one is found. That way I end up with a list of the pages that have "contact" links and can go directly to those pages for more information. Is this a routine that BrownRecluse could do?

For example: the input file would contain

http://www.softbytelabs.com/

and the output file would have

http://www.softbytelabs.com/ , http://www.softbytelabs.com/us/contacts.html

Thank you,

Re: Website lookup from a list and link capture

Posted: Wed Jun 12, 2013 10:41 am
by Support
Here is a script that will do this...

Code:

PerlRegEx = Yes;
Output.Clear;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

outfile = fn + '.contacts.txt';
f2 = New(File);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

	lnk = f.Read;

	if Link.Get(lnk) then begin
	  cnt = WildGet(Link.Data, 'href="([^"]+contact[^"]+)"');
	  if cnt = Nothing then
	    cnt = WildGet(Link.Data, "href='([^']+contact[^']+)'");
	  if cnt then begin
	  	cnt = Link.FixUp(lnk);
	  	f2.Write(lnk+tab+cnt);
	  	Output(cnt);
		end;
	end;

end;

f2.close;
f.close;
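
A note on the pattern: href="([^"]+contact[^"]+)" captures the quoted href value whenever "contact" appears with at least one character on each side of it, and the second WildGet covers single-quoted attributes. The same PCRE can be sanity-checked outside BrownRecluse; for example, in Python (purely illustrative):

Code:

import re

html = '<a href="/us/contacts.html">Contact us</a>'
m = re.search(r'href="([^"]+contact[^"]+)"', html)
print(m.group(1) if m else None)  # prints /us/contacts.html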

Re: Website lookup from a list and link capture

Posted: Thu Jun 13, 2013 12:28 pm
by gcarmich
Thanks! I'll give it a try.

Re: Website lookup from a list and link capture

Posted: Thu Jun 13, 2013 2:27 pm
by gcarmich
It runs but it appears to be returning just the original URL and it does not include the "contact" URL. Also, it doesn't seem to write out the .txt file after completion - I see the output in the "runtime outputs" window. I'm new to this and might be doing something wrong.

Re: Website lookup from a list and link capture

Posted: Thu Jun 13, 2013 5:31 pm
by Support
A typo on line 26 of the code (Link.FixUp was being passed lnk instead of cnt); here is the corrected version...

Code:

PerlRegEx = Yes;
Output.Clear;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

outfile = fn + '.contacts.txt';
f2 = New(File);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

   lnk = f.Read;

   if Link.Get(lnk) then begin
     cnt = WildGet(Link.Data, 'href="([^"]+contact[^"]+)"');
     if cnt = Nothing then
       cnt = WildGet(Link.Data, "href='([^']+contact[^']+)'");
     if cnt then begin
        cnt = Link.FixUp(cnt);
        f2.Write(lnk+tab+cnt);
        Output(cnt);
      end;
   end;

end;

f2.close;
f.close;

Re: Website lookup from a list and link capture

Posted: Tue Jun 18, 2013 6:51 am
by gcarmich
Thanks for the sample code. I emailed a request to the Softbyte Labs developers for a quote on some coding but I haven't received a response yet. Is there a way to verify they received the request?

Thanks.

Re: Website lookup from a list and link capture

Posted: Wed Jun 19, 2013 11:42 am
by Support
Yes, they have, but we are so busy right now finishing up Raylectron that it may be another couple of days.

Re: Website lookup from a list and link capture

Posted: Wed Jun 19, 2013 12:08 pm
by gcarmich
Ok. Np. Thank you.

Re: Website lookup from a list and link capture

Posted: Tue Jun 25, 2013 3:23 pm
by Support
Here is a script for the request you emailed me the other day. I did not test it, so if there are any errors, let me know...

Code:

PerlRegEx = Yes;
Output.Clear;

Keywords = 'contact,about,music';

sk = New(Stack);
sk.Split(Keywords, ',');
sk.Reverse;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

outfile = fn + '.output.txt';
f2 = New(File);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

	lnk = f.Read;

	if Link.Get(lnk) then begin
		hdr = Link.Headers;
		f2.Write(lnk+crlf);
		f2.Write(hdr+crlf);
		for i = 1 to sk.Count do begin
		  k = sk.Items[i];
			cnt = WildGet(Link.Data, 'href="([^"]+'+k+'[^"]+)"');
			if cnt = Nothing then
				cnt = WildGet(Link.Data, "href='([^']+"+k+"[^']+)'");
			if cnt then begin
				cnt = Link.FixUp(cnt);
				f2.Write(cnt);
			end;
		end;
		f2.Write('-'*80+crlf);
	end;

end;

f2.close;
f.close;

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 5:06 am
by gcarmich
I ran it today but the code produced no output. I selected the source file when prompted - do I need to do anything else?

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 9:57 am
by Support
The input file should have one URL per line, and you should also set your own keywords on the Keywords = '...' line near the top of the script.
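
For example, the input file could look like this, one URL per line (the second entry is just a placeholder):

Code:

http://www.softbytelabs.com/
http://www.example.com/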

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 2:21 pm
by gcarmich
The input file does have a single URL per line - the same file I ran successfully with the earlier code. I left the third line as is, and the keywords do appear on the pages those URLs point to.

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 5:51 pm
by Support
What do you mean by "The keywords were on the URL referenced pages"?

The way I have it set up, the input file is one URL per line and the keywords are stored in the script itself. Isn't that how you wanted it?

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 6:38 pm
by gcarmich
I mean that some of the pages do have the keywords on them, so at least a few of the URLs should have shown up in the output.

Re: Website lookup from a list and link capture

Posted: Fri Jun 28, 2013 6:59 pm
by Support
OK, this one is working; I just tested it on 3 sites. It creates the output file in the same folder as the input file, with the same name but with .output.txt appended to it...

Code:

PerlRegEx = Yes;
Output.Clear;

Keywords = 'contact,about,music';

sk = New(Stack);
sk.Split(Keywords, ',');
sk.Reverse;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

DecodeFileName(fn, [drv,dir,fln], [Drive,Directory,FileName]);
outfile = drv+dir+fln + '.output.txt';
f2 = New(File);
f2.Open(outfile);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

   lnk = f.Read;

   if Link.Get(lnk) then begin
      hdr = Link.Headers;
      f2.Write('-'*80+crlf);
      f2.Write(lnk+crlf);
      f2.Write('-'*80+crlf);
      f2.Write(Trim(hdr)+crlf);
      f2.Write('-'*80+crlf);
      for i = 1 to sk.Count do begin
        k = sk.Items[i];
         cnt = WildGet(Link.Data, 'href="([^"]*'+k+'[^"]*)"');
         if cnt = Nothing then
            cnt = WildGet(Link.Data, "href='([^']*"+k+"[^']*)'");
         if cnt then begin
            cnt = Link.FixUp(cnt);
            f2.Write(cnt+crlf);
         end;
      end;
   end;
   f2.Write(crlf);
   f2.Write(crlf);

end;

f2.close;
f.close;
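
For comparison, here is the core of that loop translated to plain Python (an illustrative sketch only, not the script above: the file names and keyword list are assumptions, urljoin stands in for Link.FixUp, and urlopen for Link.Get):

Code:

import re
import urllib.request
from urllib.parse import urljoin

keywords = ['contact', 'about', 'music']   # mirrors the Keywords line above
with open('urls.txt') as src, open('urls.output.txt', 'w') as out:
    for line in src:
        url = line.strip()
        if not url:
            continue
        try:
            resp = urllib.request.urlopen(url, timeout=10)
            html = resp.read().decode('utf-8', 'replace')
        except (OSError, ValueError):
            continue                        # skip unreachable or malformed URLs
        out.write('-' * 80 + '\n' + url + '\n' + '-' * 80 + '\n')
        out.write(str(resp.headers).strip() + '\n' + '-' * 80 + '\n')
        for k in keywords:
            m = (re.search(r'href="([^"]*' + re.escape(k) + r'[^"]*)"', html)
                 or re.search(r"href='([^']*" + re.escape(k) + r"[^']*)'", html))
            if m:
                out.write(urljoin(url, m.group(1)) + '\n')   # resolve relative links
        out.write('\n\n')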

Re: Website lookup from a list and link capture

Posted: Sat Jun 29, 2013 7:23 am
by gcarmich
Perfect! Thank you

Re: Website lookup from a list and link capture

Posted: Thu Jul 04, 2013 7:47 am
by gcarmich
Is it possible to search for the keywords in both the link and the visible text on the page that references the link, and then return the link as the existing code does? Sometimes the page says "Contacts" but the link itself does not contain the word. Does that make sense?

Re: Website lookup from a list and link capture

Posted: Thu Jul 04, 2013 11:11 am
by Support
OK, this one will also find links that have a keyword in the clickable text...

Code:

PerlRegEx = Yes;
Output.Clear;

Keywords = 'contact,about,music';

sk = New(Stack);
sk.Split(Keywords, ',');
sk.Reverse;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

DecodeFileName(fn, [drv,dir,fln], [Drive,Directory,FileName]);
outfile = drv+dir+fln + '.output.txt';
f2 = New(File);
f2.Open(outfile);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

   lnk = f.Read;

   if Link.Get(lnk) then begin
      hdr = Link.Headers;
      f2.Write('-'*80+crlf);
      f2.Write(lnk+crlf);
      f2.Write('-'*80+crlf);
      f2.Write(Trim(hdr)+crlf);
      f2.Write('-'*80+crlf);
      for i = 1 to sk.Count do begin
        k = sk.Items[i];
         cnt = WildGet(Link.Data, 'href="([^"]*'+k+'[^"]*)"');
         if cnt = Nothing then
            cnt = WildGet(Link.Data, "href='([^']*"+k+"[^']*)'");
         if cnt = Nothing then
            cnt = WildGet(Link.Data, '<a[^>]+href="([^"]+)">[^<]*'+k);
         if cnt = Nothing then
            cnt = WildGet(Link.Data, "<a[^>]+href='([^']+)'>[^<]*"+k);
         if cnt then begin
            cnt = Link.FixUp(cnt);
            f2.Write(cnt+crlf);
         end;
      end;
   end;
   f2.Write(crlf);
   f2.Write(crlf);

end;

f2.close;
f.close;
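
The two extra patterns at the end are what pick up links where the keyword only appears in the visible text, e.g. <a href="/reach-us.html">Contact</a>. A quick illustration of that case in Python (outside BrownRecluse, with re.I added so "Contact" matches "contact"; WildGet's case handling may differ):

Code:

import re

html = '<a href="/reach-us.html">Contact</a>'
m = re.search(r'<a[^>]+href="([^"]+)">[^<]*contact', html, re.I)
print(m.group(1) if m else None)  # prints /reach-us.html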

Re: Website lookup from a list and link capture

Posted: Thu Jul 04, 2013 4:33 pm
by gcarmich
thank you

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 12:50 pm
by gcarmich
If the spider encounters a site with a login screen, is it possible to "cancel" the login and proceed to the next URL in the list?

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 12:55 pm
by gcarmich
How can I add "http://www." to the beginning of the URLs in the source .txt file before the spider tries to access the URL?

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 1:04 pm
by Support
Using the last posted script, on the line...

lnk = f.Read;

change it to...

lnk = 'http://www.' + f.Read;
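
One caveat in case the list is mixed: that change prefixes every line, so an entry that already starts with http:// would come out as http://www.http://... If that could happen, one option is to clean up the list beforehand; a hypothetical preprocessing pass in Python (file names are placeholders):

Code:

# Prepend a scheme only where one is missing (hypothetical cleanup step).
with open('urls.txt') as src, open('urls.fixed.txt', 'w') as out:
    for line in src:
        url = line.strip()
        if not url:
            continue
        if not url.startswith(('http://', 'https://')):
            url = 'http://www.' + url
        out.write(url + '\n')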

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 1:10 pm
by gcarmich
thanks. Did you see my question about login screens?

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 1:38 pm
by Support
oh no I didn't :roll:

I don't think there is a way, because that window is from IE, not BR!

Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 1:53 pm
by gcarmich
If I uninstalled IE, would that prevent the login window from coming up and interrupting the process?