CraigList search

BeownReclise is a programmable web spider that scans a web site and retrieves the information you need. For example, you could scan a real estate web site, collect all of the agent addresses, phone numbers and emails, place the data into a tab-delimited file, and then import that file into Excel.
innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Post by innovasolutions »

Hi,

I want to be able to specify a string to search for in any Craigslist for-sale-by-owner section. The string can match anything in the subject of a post. If this cannot be scripted, I am okay with doing this part manually.

However, the search might yield multiple pages of matches. So I then want to take each of the listings on each page, get the description, price, posting date and the e-mail(s) to respond to, and produce a comma- or pipe-delimited file.

Can you help me?

Thank you
Raj
Support
Site Admin
Posts: 3004
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support »

This should not be too hard. Do you have a URL of where you want to start the scan so we can better understand what you need and how to do it?
Your support team.
https://SoftByteLabs.com

Post by innovasolutions »

Hi,

I am looking to see the price of the tickets, and who is selling what and when.

http://chicago.craigslist.org/search/ti ... k=&maxAsk=

So, I am searching for Blackhawks tickets sold by owner in Chicago. "blackhawks" must match the title of the post. I want to get pipe-delimited output of the date, the e-mail to respond to, the price, and the title of the post. If we can also get the actual text of the post, that would be superb as well.

Goal is to upload into a spreadsheet to do some analysis.

Thank you,
Raj
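
The pipe-delimited output requested above can be sketched in Python with the standard `csv` module. This is only an illustration of the file format, not part of the BeownReclise script. The rows, e-mail addresses and column names are made-up sample values. Using `csv.writer` rather than plain string joining means a title that itself contains a `|` is quoted, so a spreadsheet import still sees four columns.

```python
import csv
import io

# Hypothetical sample rows: (date, email, price, title). Not real listings.
rows = [
    ("2013-03-10", "seller1@example.com", "$120", "Blackhawks vs Wings - $120"),
    ("2013-03-11", "seller2@example.com", "$95", "2 Blackhawks tix | lower bowl $95"),
]

# Write to an in-memory buffer; a real run would open a file instead.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerow(["date", "email", "price", "title"])  # assumed header names
writer.writerows(rows)

output = buf.getvalue()
print(output)

# Reading it back with the same delimiter recovers four fields per row,
# even for the title that contains a pipe character.
parsed = list(csv.reader(io.StringIO(output), delimiter="|"))
```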

Post by Support »

What I can do for you is search the post titles, and anything with "blackhawks" in the title will get pulled. I can pull the title, the price, the date and the CL email. Will this work for you?

Post by innovasolutions »

I think it should be fine. I appreciate your help.

Post by Support »

Here it is...

Delimiter = '|'; // you can use TAB or anything else.

PerlRegEx = Yes;
Output.Clear;

inet = New(URL);
rx = New(RegEx);

NextLink = 'http://chicago.craigslist.org/search/tix?zoomToPosting=&query=blackhawks&srchType=T&minAsk=&maxAsk=';

unless NextLink = Nothing do begin

  inet.Get(NextLink);
  NextPage = WildGet(inet.Data, 'href="([^"]+)">Next');

  rx.Data = inet.Data;
  rx.Mask = '"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"';
  rx.Reset;

  while rx.Match do begin
    lnk = Decode(rx.Value[1]);
    inet.Get(lnk);

    baTitle = Decode(WildGet(inet.Data, 'postingTitle\s*=\s*"([^"]+)"'));
    if baTitle ~!= 'blackhawks' then Iterate;
    price = WildGet(baTitle, '\$[.0-9]+');
    email = Decode(WildGet(inet.Data, '"mailto:([^"]+)"'));
    email -= '\?.*';
    posted = Trim(Decode(WildGet(inet.Data, 'Posted:\s*<date>([^<]+)')));
    if posted = Nothing then
      posted = Trim(Decode(WildGet(inet.Data, 'Edited:\s*<date>([^<]+)')));

    DataLine =
      posted +Delimiter+
      email  +Delimiter+
      price  +Delimiter+
      baTitle
    ;
    Output(DataLine);
  end;

end;
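
For readers who do not use BeownReclise, the per-listing extraction in the script can be approximated in Python. This is a rough sketch, not a translation verified against BeownReclise. It assumes that `~!=` is a case-insensitive "does not contain" test, that `Decode` unescapes HTML entities, and that `email -= '\?.*'` strips everything from the `?` onward. The function name `extract_fields` and the sample HTML are hypothetical; only the regular expressions come from the script.

```python
import html
import re

def extract_fields(page_html, keyword="blackhawks", delimiter="|"):
    """Build one delimited line for a listing page.

    Returns None when the posting title does not contain the keyword.
    """
    title_m = re.search(r'postingTitle\s*=\s*"([^"]+)"', page_html)
    title = html.unescape(title_m.group(1)) if title_m else ""
    if keyword.lower() not in title.lower():
        return None  # skip listings whose title lacks the keyword

    # The price is taken from the title, as in the script.
    price_m = re.search(r'\$[.0-9]+', title)
    price = price_m.group(0) if price_m else ""

    # Take the mailto address and drop any "?subject=..." suffix.
    email_m = re.search(r'"mailto:([^"]+)"', page_html)
    email = html.unescape(email_m.group(1)) if email_m else ""
    email = re.sub(r'\?.*', '', email)

    # Prefer the "Posted:" date, fall back to "Edited:".
    posted_m = (re.search(r'Posted:\s*<date>([^<]+)', page_html)
                or re.search(r'Edited:\s*<date>([^<]+)', page_html))
    posted = posted_m.group(1).strip() if posted_m else ""

    return delimiter.join([posted, email, price, title])

# Hypothetical listing markup, shaped only to exercise the patterns above.
sample = '''
<script>var postingTitle = "Blackhawks vs Wings - $120";</script>
Posted: <date>2013-03-10, 8:47PM CDT</date>
<a href="mailto:abc123@sale.craigslist.org?subject=Blackhawks">reply</a>
'''
print(extract_fields(sample))
```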

Post by innovasolutions »

Thank you. It works well but the script doesn't terminate. Do you know why?

Post by Support »

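Ah yes, replace NextPage with NextLink on line 15 (the line that calls WildGet for the "Next" link) and it will work. The loop tests NextLink, but the script stores the next page's URL in NextPage, so NextLink never changes and the loop never ends. My mistake!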
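
The same pagination pattern can be sketched in Python to show why the variable name matters. This is an illustration under stated assumptions, not BeownReclise code. The `crawl` function, the `fetch` parameter and the three fake pages are hypothetical; only the "Next" link pattern is taken from the script.

```python
import re

def crawl(start_url, fetch):
    """Follow "Next" links until a page has none; return the URLs visited."""
    visited = []
    next_link = start_url
    while next_link:
        page = fetch(next_link)
        visited.append(next_link)
        m = re.search(r'href="([^"]+)">Next', page)
        # Assign to the same variable the loop condition tests.
        # Storing the result under a different name would leave
        # next_link unchanged and the loop would never terminate.
        next_link = m.group(1) if m else None
    return visited

# Hypothetical three-page result set keyed by URL.
pages = {
    "page1": '<a href="page2">Next</a>',
    "page2": '<a href="page3">Next</a>',
    "page3": 'no more results',
}
print(crawl("page1", pages.__getitem__))
```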

Post by innovasolutions »

Works like a champ! Thanks

Post by innovasolutions »

There is one more problem. It doesn't pick up the last page.

Post by Support »

It does, but it skips anything not in Chicago. Some of the URLs have /nwc/ in them instead of /chc/.

So if you want, change /chc/ on line 18 to /[^/]+/ and that will give you all of them.
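
The effect of that change can be checked with Python's `re` module, which accepts both patterns as written. The two listing URLs below are made up for the test; the patterns are the script's original mask and the same mask with `/chc/` replaced by `/[^/]+/`.

```python
import re

# Original mask: only matches listings under the /chc/ area code.
narrow = re.compile(r'"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"')
# Widened mask: any single path segment is accepted as the area code.
broad = re.compile(r'"(http://chicago\.craigslist\.org/[^/]+/tix/\d+\.html)"')

# Hypothetical search-result markup with one /chc/ and one /nwc/ listing.
sample = (
    '<a href="http://chicago.craigslist.org/chc/tix/1111111111.html">a</a>'
    '<a href="http://chicago.craigslist.org/nwc/tix/2222222222.html">b</a>'
)

narrow_hits = narrow.findall(sample)
broad_hits = broad.findall(sample)
print(len(narrow_hits), len(broad_hits))  # prints: 1 2
```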

Post by innovasolutions »

Yup, thank you, I realized that as well. Different suburbs have different codes, so I changed it to match all of them. I am getting the hang of this. Hopefully as I play with it I can write a few of these on my own.
Go Blackhawks!