Extract data from Amazon search results

Posted: Tue Aug 21, 2012 5:53 pm
by steve1040
I need some help extracting song information from Amazon.
From this Amazon link
http://www.amazon.com/s/qid=1345588772/ ... A625151011

I need to extract these fields to a pipe-delimited file:
Song title|Artist|Album|Time

Each page contains 50 songs. I would like to go to the next page and keep extracting, looping until there are no more pages.

I looked at the page source, and the markup around each field is different for each song.
How can I do this?

Thanks
Steve

Re: Extract data from Amazon search results

Posted: Tue Aug 21, 2012 6:14 pm
by Support
Here is a script that will do just that...

Output.Clear;
PerlRegEx = Yes;

Link = New(URL);
rx = New(RegEx);

NextPage = 'http://www.amazon.com/s/qid=1345588772/ref=sr_pg_1?ie=UTF8&keywords=gospel%20mp3&page=1&rh=n%3A163856011%2Cn%3A!624868011%2Cn%3A624905011%2Ck%3Agospel%20mp3%2Cp_n_feature_browse-bin%3A625151011%2Cp_n_feature_browse-bin%3A625151011';

while NextPage do begin

	Link.Get(NextPage);
	NextPage = WildGet(Link.Data, '<span class="pagnNext"><a href="([^"]+)');

	rx.Data = Link.Data;
	rx.Mask = '<table border="0" cellspacing="0" cellpadding="0">\s*<tr>\s*<td width="25">\d+\.&nbsp;</td>\s*<td><a href="[^"]+">(.*?)</a></td>\s*</tr>\s*</table>\s*</td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="priceCol[^"]+">(.*?)</td>\s*';
	rx.Reset;

	while rx.Match do begin

		f1 = Trim(Decode(rx.Value[1]));
		f2 = Trim(Decode(rx.Value[2]));
		f3 = Trim(Decode(rx.Value[3]));
		f4 = Trim(Decode(rx.Value[4]));

		DataLine = f1 +'|'+ f2 +'|'+ f3 +'|'+ f4;
		Output(DataLine);

	end;

end;
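
For anyone who wants to try the same approach outside this script, here is a rough Python sketch of the loop: fetch a page, pull four fields per row with a regular expression, join them with '|', then follow the "next page" link until there is none. The `pagnNext` pattern comes from the script above; the row markup (`<tr class="song">` with four `<td>` cells), the function names, and the `fetch` callable are assumptions made for illustration, not Amazon's actual markup or any real API.

```python
import html
import re
from urllib.parse import urljoin

# "Next page" link pattern, taken from the script above.
NEXT_RE = re.compile(r'<span class="pagnNext"><a href="([^"]+)')

# Hypothetical, simplified row markup: one <tr class="song"> with four cells.
# The real page needs a longer mask, like the one in the script above.
ROW_RE = re.compile(
    r'<tr class="song">\s*'
    r'<td>(.*?)</td>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*<td>(.*?)</td>',
    re.S,
)


def extract_rows(page):
    """Return one pipe-delimited line per matched row."""
    lines = []
    for match in ROW_RE.finditer(page):
        # Decode HTML entities and trim whitespace in each captured field.
        fields = [html.unescape(field).strip() for field in match.groups()]
        lines.append("|".join(fields))
    return lines


def next_page_url(page, current_url):
    """Return the absolute URL of the next page, or None on the last page."""
    match = NEXT_RE.search(page)
    if match is None:
        return None
    # href values are HTML-encoded and may be relative to the current page.
    return urljoin(current_url, html.unescape(match.group(1)))


def scrape(start_url, fetch):
    """Follow next-page links from start_url and collect every row.

    fetch is any callable that takes a URL and returns the page HTML.
    """
    lines = []
    seen = set()
    url = start_url
    while url and url not in seen:  # the seen set guards against link loops
        seen.add(url)
        page = fetch(url)
        lines.extend(extract_rows(page))
        url = next_page_url(page, url)
    return lines
```

Passing `fetch` in as a parameter keeps the sketch testable offline; in practice it could be any function that takes a URL and returns the page HTML as a string.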

Re: Extract data from Amazon search results

Posted: Tue Aug 21, 2012 6:20 pm
by steve1040
Wow!
Thanks