Google Search, Results Sites Extraction...

BeownReclise is a programmable web spider that scans a web site and retrieves the information you need. For example, you could scan a real estate web site, collect all of the agents' addresses, phone numbers and emails, save the data to a tab-delimited database file, and then import that file into Excel.
ryseely
Posts: 4
Joined: Wed Mar 07, 2012 6:59 am

Google Search, Results Sites Extraction...

Post by ryseely »

I need some help on where to start. I looked at the page source and it looks like mostly script and CSS; I am not able to find much in the source to extract.

Any suggestions?
Where do I begin?

Thanks!!! :roll:
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: Google Search, Results Sites Extraction...

Post by Support »

What exactly do you need to extract from the Google search results?
Your support team.
https://SoftByteLabs.com
ryseely
Posts: 4
Joined: Wed Mar 07, 2012 6:59 am

Re: Google Search, Results Sites Extraction...

Post by ryseely »

The "TEXT SITE" name.

Here is an example. I searched for "pallet", and the first result was:

Pallet - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Pallet - Cached - Similar
You +1'd this publicly. Undo
A pallet sometimes called a skid, is a flat transport structure that supports goods in a stable fashion while being lifted by a forklift, pallet jack, front loader or other ...

EUR-pallet - Plastic pallet - 463L master pallet - Pallet crafts


"en.wikipedia.org/wiki/Pallet" is what I need.

Thanks!!! ;)
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: Google Search, Results Sites Extraction...

Post by Support »

OK, so you need the text that's in green, then?
ryseely
Posts: 4
Joined: Wed Mar 07, 2012 6:59 am

Re: Google Search, Results Sites Extraction...

Post by ryseely »

en.wikipedia.org/wiki/Pallet

Yes, the text in green.

Thanks!!! 8-)
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: Google Search, Results Sites Extraction...

Post by Support »

OK, here is the script. When you run it, a browser window will open. Do your search, then close the browser to get the results.

Code:

PerlRegEx = Yes;
Output.Clear;

Link = New(URL);
rx   = New(RegEx);

Link.Location = BrowseTo('www.google.com');

loop
  Link.Get;

  rx.Data = Link.Data;
  rx.Mask = '<cite>.*?</cite>';

  while rx.Match do begin
    lnk = rx.Value - '<[^>]*>';
    Output(lnk);
  end;

  rx.Mask = '<a href="([^"]+)"[^>]*><[^>]*><[^>]*><[^>]*>Next';
  if not rx.Match then Break;

  Link.Location = Link.Fixup(Decode(rx.Value[1]));
end;
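
For anyone who wants to try the two regular expressions outside BeownReclise, here is a minimal Python sketch of the same idea: grab every `<cite>...</cite>` block, strip the inner tags, then look for the "Next" link. The sample markup, the function names, and the base URL are made up for illustration. It also assumes that `Decode` undoes HTML entities and that `Link.Fixup` resolves a relative link against the current page, which the script above does not spell out.

```python
import re
from html import unescape
from urllib.parse import urljoin

# Hypothetical markup shaped like the result page the script targets.
SAMPLE_HTML = (
    '<div><cite>en.wikipedia.org/wiki/<b>Pallet</b></cite></div>'
    '<div><cite>www.example.com/<b>pallet</b>s</cite></div>'
    '<a href="/search?q=pallet&amp;start=10" class="pn">'
    '<span><span><span>Next</span></span></span></a>'
)

CITE_RE = re.compile(r'<cite>.*?</cite>')
TAG_RE = re.compile(r'<[^>]*>')
NEXT_RE = re.compile(r'<a href="([^"]+)"[^>]*><[^>]*><[^>]*><[^>]*>Next')


def extract_cites(html):
    """Return the text of each <cite> block with all tags removed."""
    # Mirrors: lnk = rx.Value - '<[^>]*>' (assumed to delete every tag).
    return [TAG_RE.sub('', m.group(0)) for m in CITE_RE.finditer(html)]


def next_page_url(html, base_url):
    """Return the absolute URL of the 'Next' link, or None on the last page."""
    m = NEXT_RE.search(html)
    if m is None:
        return None
    # Assumed stand-ins for Decode (entity decoding) and Link.Fixup (URL resolution).
    return urljoin(base_url, unescape(m.group(1)))


if __name__ == '__main__':
    for cite in extract_cites(SAMPLE_HTML):
        print(cite)
    print(next_page_url(SAMPLE_HTML, 'https://www.google.com/search?q=pallet'))
```

The patterns are copied from the script unchanged, so they are just as tied to the page markup: if the green text is no longer wrapped in `<cite>`, or the "Next" link is nested differently, the masks need adjusting.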
ryseely
Posts: 4
Joined: Wed Mar 07, 2012 6:59 am

Re: Google Search, Results Sites Extraction...

Post by ryseely »

Works like a dream.

Thanks for your help!!! :P
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: Google Search, Results Sites Extraction...

Post by Support »

You are welcome :)