CraigList search

Posted: Mon Apr 01, 2013 5:41 pm
by innovasolutions
Hi,

I want to be able to specify a string to search for in any Craigslist for-sale-by-owner listings. The string can match anything in the subject of a post. If this cannot be scripted, I am okay with doing this part manually.

However, the search might yield multiple pages of matches. So I then want to take each of the listings on each page, get the description, the price, when it was posted, and the e-mail(s) to respond to, and produce a comma- or pipe-delimited file.

Can you help me?

Thank you
Raj

Re: CraigList search

Posted: Wed Apr 03, 2013 7:52 pm
by Support
This should not be too hard. Do you have a URL of where you want to start the scan so we can better understand what you need and how to do it?

Re: CraigList search

Posted: Thu Apr 04, 2013 8:17 pm
by innovasolutions
Hi,

I am looking to see the price of the tickets, who is selling them, and when.

http://chicago.craigslist.org/search/ti ... k=&maxAsk=

So, I am searching for Blackhawks tickets sold by owner in Chicago. "blackhawks" must match the title of the post. I want to get pipe-delimited output of the date, the e-mail to respond to, the price, and the title of the post. If we can also get the actual text of the post, that would be superb as well.

The goal is to upload the output into a spreadsheet to do some analysis.

Thank you,
Raj

Re: CraigList search

Posted: Mon Apr 08, 2013 3:27 pm
by Support
What I can do for you is search the post titles, and anything with "blackhawks" in the title will get pulled. I can pull the title, the price, the date, and the CL email. Will this work for you?

Re: CraigList search

Posted: Mon Apr 08, 2013 6:35 pm
by innovasolutions
I think that should be fine. I appreciate your help.

Re: CraigList search

Posted: Mon Apr 08, 2013 7:08 pm
by Support
Here it is...

Code:

Delimiter = '|'; // you can use TAB or anything else.

PerlRegEx = Yes;
Output.Clear;

inet = New(URL);
rx = New(RegEx);

NextLink = 'http://chicago.craigslist.org/search/tix?zoomToPosting=&query=blackhawks&srchType=T&minAsk=&maxAsk=';

unless NextLink = Nothing do begin

  inet.Get(NextLink);
  NextPage = WildGet(inet.Data, 'href="([^"]+)">Next');

  rx.Data = inet.Data;
  rx.Mask = '"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"';
  rx.Reset;

  while rx.Match do begin
    lnk = Decode(rx.Value[1]);
    inet.Get(lnk);

    baTitle = Decode(WildGet(inet.Data, 'postingTitle\s*=\s*"([^"]+)"'));
    if baTitle ~!= 'blackhawks' then Iterate;
    price = WildGet(baTitle, '\$[.0-9]+');
    email = Decode(WildGet(inet.Data, '"mailto:([^"]+)"'));
    email -= '\?.*';
    posted = Trim(Decode(WildGet(inet.Data, 'Posted:\s*<date>([^<]+)')));
    if posted = Nothing then
      posted = Trim(Decode(WildGet(inet.Data, 'Edited:\s*<date>([^<]+)')));

    DataLine =
      posted +Delimiter+
      email  +Delimiter+
      price  +Delimiter+
      baTitle
    ;
    Output(DataLine);
  end;

end;
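
For anyone who wants to try the same field handling outside this scripting tool, here is a rough Python sketch of three steps from the script above: pulling a dollar amount out of the title, trimming the query string off a mailto target, and joining the fields with a pipe. The function names and the sample values are made up for illustration; only the two regular expressions and the field order come from the script.

```python
import re

DELIMITER = "|"  # same separator the script uses


def extract_price(title):
    """Return the first dollar amount found in the title, or '' if none."""
    match = re.search(r"\$[.0-9]+", title)
    return match.group(0) if match else ""


def clean_email(mailto_target):
    """Drop the '?subject=...' part that can follow the address in a mailto link."""
    return re.sub(r"\?.*", "", mailto_target)


def build_line(posted, email, price, title):
    """Join the fields in the script's order: date, email, price, title."""
    return DELIMITER.join([posted, email, price, title])


# Made-up sample values, for illustration only.
title = "2 Blackhawks tickets - $150"
line = build_line(
    "2013-04-08",
    clean_email("seller@example.com?subject=tickets"),
    extract_price(title),
    title,
)
print(line)
```

Note that a title containing the delimiter itself would break the columns, so a tab or another rarely used character may be a safer choice when the titles are free text.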

Re: CraigList search

Posted: Tue Apr 09, 2013 4:51 pm
by innovasolutions
Thank you. It works well but the script doesn't terminate. Do you know why?

Re: CraigList search

Posted: Tue Apr 09, 2013 6:12 pm
by Support
Ah yes, replace NextPage on line 15 with NextLink and it will work. My mistake!
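
To illustrate why that one variable name matters, here is a small Python sketch of the same paging pattern. It is only an analogy under assumptions: the `PAGES` dictionary stands in for real HTTP fetches, and `find_next` and `crawl` are hypothetical helper names. The point is that the loop condition tests one variable, so the "Next" link has to be stored back into that same variable.

```python
import re

# Hypothetical in-memory "site" (URL -> HTML). A real run would fetch over HTTP.
PAGES = {
    "page1": '<a href="page2">Next</a> listing A',
    "page2": '<a href="page3">Next</a> listing B',
    "page3": "listing C",  # last page: no Next link
}


def find_next(html):
    """Return the href of the 'Next' link, or None when the page has none."""
    match = re.search(r'href="([^"]+)">Next', html)
    return match.group(1) if match else None


def crawl(start):
    """Follow Next links from the start page and return the pages visited."""
    visited = []
    next_link = start
    while next_link is not None:
        html = PAGES[next_link]
        visited.append(next_link)
        # Reassign the variable the loop condition tests. Storing the result
        # under a different name would leave next_link unchanged, and the
        # loop would keep fetching the first page forever.
        next_link = find_next(html)
    return visited


print(crawl("page1"))
```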

Re: CraigList search

Posted: Tue Apr 09, 2013 6:39 pm
by innovasolutions
Works like a champ! Thanks

Re: CraigList search

Posted: Tue Apr 09, 2013 7:04 pm
by innovasolutions
There is one more problem. It doesn't pick up the last page.

Re: CraigList search

Posted: Tue Apr 09, 2013 9:50 pm
by Support
It does, but it skips anything not in Chicago. Some of the URLs have /nwc/ in them instead of /chc/.

So if you want, replace /chc/ on line 18 with /[^/]+/ and that will give you all of them.
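
The effect of that change can be checked with a quick Python sketch. The two patterns are the mask from the script and the same mask with /[^/]+/ substituted; the sample HTML and its listing IDs are made up for the test.

```python
import re

# Made-up sample HTML: one city listing (/chc/) and one suburb listing (/nwc/).
html = (
    '<a href="http://chicago.craigslist.org/chc/tix/1111111111.html">A</a>'
    '<a href="http://chicago.craigslist.org/nwc/tix/2222222222.html">B</a>'
)

# Original mask: only matches listings under /chc/.
CITY_ONLY = r'"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"'
# Relaxed mask: [^/]+ accepts any single path segment as the area code.
ANY_AREA = r'"(http://chicago\.craigslist\.org/[^/]+/tix/\d+\.html)"'

city_links = re.findall(CITY_ONLY, html)
all_links = re.findall(ANY_AREA, html)

print(len(city_links))  # 1
print(len(all_links))   # 2
```

Because [^/]+ cannot cross a slash, the relaxed mask still requires exactly one segment between the host and /tix/.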

Re: CraigList search

Posted: Wed Apr 10, 2013 7:07 am
by innovasolutions
Yup, thank you, I realized that as well. Different suburbs have different codes, so I changed it to match all of them. I am getting the hang of this. Hopefully, as I play with it, I can write a few of these on my own.
Go Blackhawks!