I need some help extracting song information from Amazon.
From this Amazon link:
http://www.amazon.com/s/qid=1345588772/ ... A625151011
I need to extract the results to a pipe-delimited file in this format:
Song title|Artist|Album|Time
Each page contains 50 songs. I would like to go to the next page and keep extracting, looping until there are no more pages.
I looked at the page source, and the markup for each field is different for each song.
How can I do this?
Thanks
Steve
Extract data from Amazon search results
Re: Extract data from Amazon search results
Here is a script that will do just that...
Code:
Output.Clear;
PerlRegEx = Yes;
Link = New(URL);
rx = New(RegEx);
NextPage = 'http://www.amazon.com/s/qid=1345588772/ref=sr_pg_1?ie=UTF8&keywords=gospel%20mp3&page=1&rh=n%3A163856011%2Cn%3A!624868011%2Cn%3A624905011%2Ck%3Agospel%20mp3%2Cp_n_feature_browse-bin%3A625151011%2Cp_n_feature_browse-bin%3A625151011';
while NextPage do begin
  Link.Get(NextPage);
  NextPage = WildGet(Link.Data, '<span class="pagnNext"><a href="([^"]+)');
  rx.Data = Link.Data;
  rx.Mask = '<table border="0" cellspacing="0" cellpadding="0">\s*<tr>\s*<td width="25">\d+\. </td>\s*<td><a href="[^"]+">(.*?)</a></td>\s*</tr>\s*</table>\s*</td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*<td class="priceCol[^"]+">(.*?)</td>\s*';
  rx.Reset;
  while rx.Match do begin
    f1 = Trim(Decode(rx.Value[1]));
    f2 = Trim(Decode(rx.Value[2]));
    f3 = Trim(Decode(rx.Value[3]));
    f4 = Trim(Decode(rx.Value[4]));
    DataLine = f1 +'|'+ f2 +'|'+ f3 +'|'+ f4;
    Output(DataLine);
  end;
end;
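For anyone who wants to follow the same logic outside this tool, here is a rough Python sketch of what the script above does: fetch a page, pull four fields per row with a regular expression, join them with '|', then follow the "pagnNext" link until there is none. The function names (extract_rows, next_page_url, scrape) and the fetch callback are made up for illustration and are not part of the script above. The pattern is copied from the mask in the script, so it only matches markup shaped the way that mask expects, and the fourth field is simply whatever the priceCol cell contains.

```python
import html
import re
from urllib.parse import urljoin

# Row pattern copied from the mask in the script above (four capture groups).
ROW_RE = re.compile(
    r'<table border="0" cellspacing="0" cellpadding="0">\s*<tr>\s*'
    r'<td width="25">\d+\. </td>\s*<td><a href="[^"]+">(.*?)</a></td>\s*'
    r'</tr>\s*</table>\s*</td>\s*'
    r'<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*'
    r'<td class="titleCol[^"]+"><a href="[^"]+">(.*?)</a></td>\s*'
    r'<td class="priceCol[^"]+">(.*?)</td>'
)

# Same "next page" pattern the script passes to WildGet.
NEXT_RE = re.compile(r'<span class="pagnNext"><a href="([^"]+)')


def extract_rows(page):
    """Return one pipe-delimited line per row matched on the page."""
    lines = []
    for match in ROW_RE.finditer(page):
        # Decode HTML entities and trim whitespace in each captured field.
        fields = [html.unescape(group).strip() for group in match.groups()]
        lines.append("|".join(fields))
    return lines


def next_page_url(page, current_url):
    """Return the absolute URL of the next page, or None on the last page."""
    match = NEXT_RE.search(page)
    if match is None:
        return None
    # The href may be relative and entity-encoded, so decode and resolve it.
    return urljoin(current_url, html.unescape(match.group(1)))


def scrape(start_url, fetch):
    """Walk the result pages; fetch is any callable mapping a URL to HTML text."""
    lines = []
    seen = set()
    url = start_url
    while url and url not in seen:  # stop on the last page or on a repeated URL
        seen.add(url)
        page = fetch(url)
        lines.extend(extract_rows(page))
        url = next_page_url(page, url)
    return lines
```

To try it, pass scrape a start URL and a fetch function of your own (for example one that wraps urllib.request.urlopen and decodes the response), then write the returned lines to a file. Keeping the download step as a parameter means the parsing can be checked against saved pages without touching the network.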
Your support team.
https://SoftByteLabs.com
Re: Extract data from Amazon search results
Wow!
Thanks