
Re: Website lookup from a list and link capture

Posted: Wed Jul 24, 2013 2:34 pm
by Support
Unfortunately not, because BR needs IE to begin with!

Posted: Wed Jul 24, 2013 2:37 pm
by gcarmich
OK, thanks.

Posted: Thu Jul 25, 2013 10:12 pm
by gcarmich
Could the script run a timer that forces the spider to move on to the next URL once the timer expires? Would this get around the login popup?

Posted: Thu Jul 25, 2013 10:41 pm
by Support
Unfortunately not!

Posted: Tue Jul 30, 2013 1:37 pm
by gcarmich
Using this script, how can I pull the metadata from the page header (the keywords and their content) into the delimited output file?

Posted: Tue Jul 30, 2013 2:18 pm
by Support
Can you give me an example?

Posted: Tue Jul 30, 2013 6:55 pm
by gcarmich
From this URL:

http://www.usaultimate.org/index.html

If I look at the page source, lines 5 and 6 show:

<title id="PageTitle">USA Ultimate | Home Page</title>
<meta name="keywords" content="USA Ultimate,UPA,Ultimate,Disc,National Governing Body,Ultimate Players Association,US Ultimate,Spirit of the Game,Self Officiated,Sportsmanship,SOTG,Frisbee,College Ultimate,Club Ultimate,Youth Ultimate,Juniors Ultimate,Juniors Frisbee,Championships,Sanctioning,Observers,Coaching,WFDF,Nationals,Regionals,Sectionals,Score Reporter,Ultrastar,Tournament,League,Ultimate Videos,Ultimate Photos,Ultimate Tournament,Huck,Pull,Flying Disc,Layout,Forehand,Backhand,Hammer,Field Sport" />

I'd like to write the keywords and the page title to the .txt file.

If possible, I'd also like to output the date the page was last updated.
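The last-update question is not answered later in the thread. One common source for it is the HTTP Last-Modified response header, which servers may or may not send (dynamically generated pages often omit it). As a rough Python sketch, not BrownRecluse code, assuming the response headers are already available as a name-to-value mapping (last_modified is a hypothetical helper name), the header can be parsed like this:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified response header as a datetime, or None.

    None means the server did not send the header, or sent a value that
    could not be parsed as an HTTP date.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "last-modified":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                # Older Pythons raise TypeError here, 3.10+ raises ValueError.
                return None
    return None


if __name__ == "__main__":
    print(last_modified({"Last-Modified": "Wed, 24 Jul 2013 14:34:00 GMT"}))
```

An empty result should be read as "unknown", not as "never updated".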

Posted: Tue Jul 30, 2013 10:11 pm
by Support
OK, no problem, but are you implementing this in an existing script? If so, which one?

Posted: Wed Jul 31, 2013 7:01 am
by gcarmich
I made some slight modifications to the last script you provided in this thread. I wanted the output in a delimited format, so I chose "|" as the delimiter, because "," caused problems when reading the output file into Excel (there is probably a better delimiter to use). I also changed the script to write just the server status code instead of the full header.

PerlRegEx = Yes;
Output.Clear;

Keywords = 'about,contact';

sk = New(Stack);
sk.Split(Keywords, ',');
sk.Reverse;

fn = SelectFile('Open file...');
if fn = Nothing then Terminate;

f = New(File);
f.Open(fn);
f.Seek(BeginningOfFile);

DecodeFileName(fn, [drv,dir,fln], [Drive,Directory,FileName]);
outfile = drv+dir+fln + '.output.txt';
f2 = New(File);
f2.Open(outfile);
f2.Truncate;

Link = New(URL);

while f.Position < f.Size do begin

  lnk = f.Read;

  if Link.Get(lnk) then begin
    hdr = Link.ServerCode;
    f2.Write(lnk+'|');
    f2.Write(Trim(hdr)+'|');
    for i = 1 to sk.Count do begin
      k = sk.Items[i];
      cnt = WildGet(Link.Data, 'href="([^"]*'+k+'[^"]*)"');
      if cnt = Nothing then
        cnt = WildGet(Link.Data, "href='([^']*"+k+"[^']*)'");
      if cnt = Nothing then
        cnt = WildGet(Link.Data, '<a[^>]+href="([^"]+)">[^<]*'+k);
      if cnt = Nothing then
        cnt = WildGet(Link.Data, "<a[^>]+href='([^']+)'>[^<]*"+k);
      if cnt then begin
        cnt = Link.FixUp(cnt);
        f2.Write(cnt+'|');
      end;
    end;
  end;
  f2.Write(crlf);
end;

f2.close;
f.close;
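For readers following along outside BrownRecluse, the keyword lookup in this script (try a double-quoted href containing the keyword, then a single-quoted one, then links whose visible text contains the keyword, and resolve the first hit against the page URL) can be sketched in Python. This is an illustrative translation under stated assumptions: find_keyword_link is a hypothetical name, urllib's urljoin stands in for Link.FixUp, and WildGet's exact matching rules (case sensitivity, for example) may differ from Python's re.

```python
import re
from typing import Optional
from urllib.parse import urljoin


def find_keyword_link(html: str, page_url: str, keyword: str) -> Optional[str]:
    """Return the first link matching keyword, made absolute, or None."""
    k = re.escape(keyword)  # treat the keyword literally, not as a regex
    patterns = [
        r'href="([^"]*' + k + r'[^"]*)"',    # keyword inside a double-quoted href
        r"href='([^']*" + k + r"[^']*)'",    # keyword inside a single-quoted href
        r'<a[^>]+href="([^"]+)">[^<]*' + k,  # keyword in the link text, double quotes
        r"<a[^>]+href='([^']+)'>[^<]*" + k,  # keyword in the link text, single quotes
    ]
    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            # urljoin turns a relative "/about.html" into an absolute URL.
            return urljoin(page_url, match.group(1))
    return None
```

Like the original, it stops at the first pattern that matches, so the order of the patterns decides which link wins.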

Posted: Wed Jul 31, 2013 2:13 pm
by Support
OK, then add these two lines right after the line "if Link.Get(lnk) then begin":

  PageTitle = WildGet(Link.Data, '<title[^>]*>([^<]*)');
  PageKeywords = WildGet(Link.Data, '<meta\s+name="keywords"\s+content="([^"]+)"');

Then add PageTitle and PageKeywords to your f2.Write command. I use TAB as the delimiter because Excel handles tab-delimited text well when importing:

f2.Write(cnt+TAB+PageTitle+TAB+PageKeywords);
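The same two regular expressions also work in Python's re module. The sketch below is a Python illustration, not BrownRecluse code, and the helper names page_fields and tsv_row are hypothetical. It pulls the title and keywords out of a page and formats one tab-delimited row with the csv module. The csv module quotes any field that itself contains the delimiter or a quote character, so a stray tab cannot shift the columns. With that quoting, even a comma delimiter would keep a comma-separated keyword list in one column.

```python
import csv
import io
import re

# Same patterns as the WildGet calls above.
TITLE_RE = re.compile(r'<title[^>]*>([^<]*)')
KEYWORDS_RE = re.compile(r'<meta\s+name="keywords"\s+content="([^"]+)"')


def page_fields(html: str) -> tuple[str, str]:
    """Return (title, keywords); a missing value becomes an empty string."""
    title = TITLE_RE.search(html)
    keywords = KEYWORDS_RE.search(html)
    return (title.group(1).strip() if title else "",
            keywords.group(1) if keywords else "")


def tsv_row(fields: list[str]) -> str:
    """Format one tab-delimited line; csv quotes fields containing a tab."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(fields)
    return buf.getvalue()
```

Returning empty strings for missing values keeps every row the same width.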

Posted: Wed Jul 31, 2013 2:14 pm
by Support
One more thing: when you want to paste code into a message, click the Code button; it will show your code the same way as mine.

Posted: Thu Aug 01, 2013 9:18 pm
by gcarmich
Works great, thanks! I see the Code button; I'll use it next time.

Can BrownRecluse compile and execute javascript code?

Posted: Thu Aug 01, 2013 9:22 pm
by Support
It does not execute JavaScript because that would be a huge security risk. But if you have a page with some JavaScript code, you can copy it and, with a little modification, it will run in BrownRecluse to some extent.

Posted: Thu Aug 01, 2013 9:23 pm
by gcarmich
OK, thanks.

Posted: Sun Aug 18, 2013 8:21 pm
by gcarmich
For the script in this thread, is there a way to retrieve the server IP address for each URL and place it in the .txt output file?

Posted: Sun Aug 18, 2013 8:44 pm
by Support
I do not believe so. I can't find any reference for getting the IP.
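BrownRecluse may not expose the IP address, but outside it the lookup is an ordinary DNS query. As an illustrative Python sketch, not a BrownRecluse feature (server_ip is a hypothetical helper name), the host name can be taken from each URL and resolved with socket.gethostbyname:

```python
import socket
from urllib.parse import urlsplit


def server_ip(url: str) -> str:
    """Resolve the host part of url to one IPv4 address via DNS."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in {url!r}")
    # gethostbyname returns a single IPv4 address; hosts with several
    # addresses (or IPv6 only) need socket.getaddrinfo instead.
    return socket.gethostbyname(host)
```

The answer can differ by location and over time for sites behind load balancers or CDNs, so it describes where the lookup pointed when the script ran.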

Posted: Wed Aug 21, 2013 9:06 am
by gcarmich
If I wanted to capture phone numbers, URLs, and ZIP codes in my script using:

\((?<AreaCode>\d{3})\)\s*(?<Number>\d{3}(?:-|\s*)\d{4})(?x) # Phone numbers

(?<Protocol>\w+):\/\/(?<Domain>[\w.]+\/?)\S*(?x) # URL

(?<Zip>\d{5})-(?<Sub>\d{4})(?x) # Zip Codes

Can I insert these expressions directly into the script?

Posted: Wed Aug 21, 2013 11:36 am
by Support
Yes, you can, but I would suggest trying them first in the "Expression evaluator" window to make sure they work.
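The patterns above use .NET-style named groups, (?<Name>...), and apparently rely on the trailing (?x) to turn the "# ..." text after it into a comment. To test them outside BrownRecluse, here is a rough Python translation. It is a sketch that has not been checked against BrownRecluse's regex engine. Python's re spells named groups (?P<Name>...), and since Python 3.11 an inline flag such as (?x) is only accepted at the start of a pattern, so the flags and comments are dropped here. As in the originals, the ZIP pattern only matches the ZIP+4 form (12345-6789). The helper name extract is hypothetical.

```python
import re

PHONE_RE = re.compile(r"\((?P<AreaCode>\d{3})\)\s*(?P<Number>\d{3}(?:-|\s*)\d{4})")
URL_RE = re.compile(r"(?P<Protocol>\w+)://(?P<Domain>[\w.]+/?)\S*")
ZIP_RE = re.compile(r"(?P<Zip>\d{5})-(?P<Sub>\d{4})")


def extract(text: str) -> dict[str, list[str]]:
    """Collect every phone number, URL domain, and ZIP+4 code found in text."""
    return {
        "phones": [f"({m['AreaCode']}) {m['Number']}" for m in PHONE_RE.finditer(text)],
        "domains": [m["Domain"] for m in URL_RE.finditer(text)],
        "zips": [f"{m['Zip']}-{m['Sub']}" for m in ZIP_RE.finditer(text)],
    }
```

Note that the Domain group keeps a trailing slash when one follows the host name, exactly as the original pattern does.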

Posted: Mon Aug 26, 2013 9:23 pm
by gcarmich
The script runs great, but I am having trouble breaking out the results from the output file in an organized way. Would it be possible to output the file with column headings for each keyword and header result, leaving the field empty when there is no result?

Posted: Tue Sep 03, 2013 8:27 pm
by gcarmich
I've noticed that some of the captured data contains line feeds, and occasionally tabs. Is there a way to encapsulate the output, or to remove the line feeds/tabs when the data is captured?

Posted: Tue Sep 03, 2013 8:40 pm
by Support
Yes, you can remove them from the variable holding the data, for example:

PageTitle = PageTitle - '\t|\r|\n';

The | character means "OR", as in "this or that"; \t means a TAB, \r a RETURN and \n a NEWLINE. You can use x = x - 'some regex text'; with other regular expressions as well.
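In Python, the counterpart of subtracting a regex is re.sub with an empty replacement string. The sketch below mirrors the BrownRecluse line and adds a variation that is not from the thread: it replaces each run of breaks with a single space, so words on either side of a line break don't run together. Both function names are hypothetical.

```python
import re


def remove_breaks(text: str) -> str:
    # Same idea as PageTitle - '\t|\r|\n': delete every TAB, CR and LF.
    return re.sub(r"\t|\r|\n", "", text)


def collapse_breaks(text: str) -> str:
    # Variation: turn each run of TAB/CR/LF into one space, then trim the ends.
    return re.sub(r"[\t\r\n]+", " ", text).strip()
```

Either one should be applied to a field before it is written, so the delimiter and line breaks in the output file stay unambiguous.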

Posted: Fri Sep 06, 2013 6:53 am
by gcarmich
The script runs great, but I am having trouble breaking out the results from the output file in an organized way. Would it be possible to output the file with column headings for each keyword and header result, leaving the field empty when there is no result?

Posted: Fri Sep 06, 2013 8:05 am
by gcarmich
Or would it be better to output a separate file for each keyword/header result, since there might be more than one result for each keyword/header query?

Posted: Fri Sep 06, 2013 10:44 am
by Support
What you need to do is write every field to the file, even when it is empty, so that the columns stay aligned. The way you have it now, a field is written only if it is not empty:

if cnt then begin
  cnt = Link.FixUp(cnt);
  f2.Write(cnt+'|');
end;

It should be:

if cnt then
  cnt = Link.FixUp(cnt);
f2.Write(cnt+'|');
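The principle here (always write every column, empty or not, and put a heading row first) maps directly onto Python's csv.DictWriter. Its restval argument fills in any column a row doesn't mention. The sketch below is a Python illustration, not BrownRecluse code. The column names and write_report are hypothetical. If a keyword can match several links, one option is to join them into a single cell (for example with a space), so each page still gets exactly one row.

```python
import csv
import io

# Hypothetical column layout: two fixed fields plus one column per keyword.
KEYWORDS = ["about", "contact"]
COLUMNS = ["url", "status"] + KEYWORDS


def write_report(rows, out):
    """Write a heading line, then one line per row; missing fields stay empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_report([
        {"url": "http://a.example/", "status": "200", "about": "http://a.example/about"},
        {"url": "http://b.example/", "status": "404"},
    ], buf)
    print(buf.getvalue())
```

Because every row carries the same number of tab-separated fields, Excel lines each keyword up under its own heading.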