Extract data from Amazon search results

BrownRecluse is a programmable web spider. It scans a web site and retrieves the information you need from it. For example, you could scan a real estate web site, collect all of the agents' addresses, phone numbers and emails, and save this data to a tab-delimited database file, which you could then import into Excel.
steve1040
Posts: 5
Joined: Wed Jul 25, 2012 2:31 pm

Post by steve1040 »

I need some help extracting song information from Amazon, starting from this search results link:
http://www.amazon.com/s/qid=1345588772/ ... A625151011

I need to extract it to a pipe-delimited file with these fields:
Song title|Artist|Album|Time

Each page contains 50 songs. I would like the script to go to the next page and keep extracting, looping until there are no more pages.

I looked at the page source, and each field is different for each song.
How can I do this?

Thanks
Steve
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: Extract data from Amazon search results

Post by Support »

Here is a script that will do just that...

Code:

Output.Clear;
PerlRegEx = Yes;

Link = New(URL);
rx = New(RegEx);

NextPage = 'http://www.amazon.com/s/qid=1345588772/ref=sr_pg_1?ie=UTF8&keywords=gospel%20mp3&page=1&rh=n%3A163856011%2Cn%3A!624868011%2Cn%3A624905011%2Ck%3Agospel%20mp3%2Cp_n_feature_browse-bin%3A625151011%2Cp_n_feature_browse-bin%3A625151011';

while NextPage do begin

	Link.Get(NextPage);
	NextPage = WildGet(Link.Data, '<span class="pagnNext"><a href="([^"]+)');

	rx.Data = Link.Data;
	rx.Mask = '<table border="0" cellspacing="0" cellpadding="0">\s*<tr>\s*<td width="25">\d+\.&nbsp;</td>\s*<td><a href="[^"]+">(.*?)</a></td>\s*</tr>\s*</table>\s*</td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="priceCol[^"]+">(.*?)</td>\s*';
	rx.Reset;

	while rx.Match do begin

		f1 = Trim(Decode(rx.Value[1]));
		f2 = Trim(Decode(rx.Value[2]));
		f3 = Trim(Decode(rx.Value[3]));
		f4 = Trim(Decode(rx.Value[4]));

		DataLine = f1 +'|'+ f2 +'|'+ f3 +'|'+ f4;
		Output(DataLine);

	end;

end;
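For anyone not using BrownRecluse, the same two steps (follow the "next page" link until there is none, and turn each regex match into a pipe-delimited line) can be sketched in Python. This is a minimal sketch under stated assumptions, not a translation guaranteed to behave like the script above: it reuses the script's two patterns unchanged, assumes `html.unescape` does roughly what `Decode` does, and assumes `WildGet` returns the first capture group. `fetch`, `sample_row` and `PAGES` are hypothetical stand-ins (no network access), and Amazon's real 2012 markup has very likely changed since.

```python
import html
import re
from urllib.parse import urljoin

# Row pattern copied from the BrownRecluse script, split across raw strings.
ROW_PATTERN = re.compile(
    r'<table border="0" cellspacing="0" cellpadding="0">\s*<tr>\s*'
    r'<td width="25">\d+\.&nbsp;</td>\s*<td><a href="[^"]+">(.*?)</a></td>\s*'
    r'</tr>\s*</table>\s*</td>\s*'
    r'<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*'
    r'<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*'
    r'<td class="priceCol[^"]+">(.*?)</td>\s*'
)
# Next-page pattern from the script's WildGet call.
NEXT_PATTERN = re.compile(r'<span class="pagnNext"><a href="([^"]+)')


def extract_rows(page_html):
    """Yield one pipe-delimited line per matched row (decoded and trimmed)."""
    for match in ROW_PATTERN.finditer(page_html):
        fields = [html.unescape(group).strip() for group in match.groups()]
        yield "|".join(fields)


def extract_all(fetch, start_url):
    """Follow next-page links until none remain; `fetch` maps a URL to HTML."""
    lines = []
    next_url = start_url
    while next_url:
        page_html = fetch(next_url)
        found = NEXT_PATTERN.search(page_html)
        # Assumption: hrefs may be relative and HTML-escaped, so decode and
        # resolve them against the current page URL.
        next_url = urljoin(next_url, html.unescape(found.group(1))) if found else None
        lines.extend(extract_rows(page_html))
    return lines


def sample_row(n, title, artist, album, time):
    """Build one hypothetical result row shaped to match ROW_PATTERN."""
    return (
        '<table border="0" cellspacing="0" cellpadding="0">\n<tr>\n'
        f'<td width="25">{n}.&nbsp;</td>\n'
        f'<td><a href="/song/{n}">{title}</a></td>\n'
        '</tr>\n</table>\n</td>\n'
        f'<td class="titleCol x"><a href="/artist/{n}">{artist}</a></td>\n'
        f'<td class="titleCol x"><a href="/album/{n}">{album}</a></td>\n'
        f'<td class="priceCol x">{time}</td>\n'
    )


# Two hypothetical pages: the first links to the second, the second has no next link.
PAGES = {
    "http://example.com/s?page=1": (
        sample_row(1, "Song A", "Artist A", "Album &amp; More", "3:45")
        + '<span class="pagnNext"><a href="/s?page=2&amp;x=1">Next</a></span>'
    ),
    "http://example.com/s?page=2&x=1": sample_row(2, "Song B", "Artist B", "Album B", "4:02"),
}

if __name__ == "__main__":
    for line in extract_all(PAGES.__getitem__, "http://example.com/s?page=1"):
        print(line)
```

Run as a script, this should print `Song A|Artist A|Album & More|3:45` and then `Song B|Artist B|Album B|4:02`. As in the original script, the next-page link is read before the rows are extracted, and the loop ends as soon as a page has no `pagnNext` link.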
Your support team.
https://SoftByteLabs.com
steve1040
Posts: 5
Joined: Wed Jul 25, 2012 2:31 pm

Re: Extract data from Amazon search results

Post by steve1040 »

Wow!
Thanks