CraigList search

BeownReclise is a programmable web spider that scans a web site and retrieves the information you need. For example, you could scan a real estate web site, collect all of the agents' addresses, phone numbers, and emails, and write the data to a tab-delimited file that you can then import into Excel.
innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

CraigList search

Post by innovasolutions » Mon Apr 01, 2013 5:41 pm

Hi,

I want to be able to specify a string to search for in any Craigslist for-sale-by-owner section. The string can match anything in the subject of a post. If this cannot be scripted, I am okay with doing this part manually.

However, the search might yield multiple pages of matches. So I then want to take each of the listings on each page, get the description, the price, when it was posted, and the e-mail address(es) to respond to, and produce a comma- or pipe-delimited file.

Can you help me?

Thank you
Raj

Support
Site Admin
Posts: 1720
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support » Wed Apr 03, 2013 7:52 pm

This should not be too hard. Do you have a URL of where you want to start the scan so we can better understand what you need and how to do it?
Your support team.
http://SoftByteLabs.com

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Thu Apr 04, 2013 8:17 pm

Hi,

I am looking to see the price of the tickets, who is selling them, and when.

http://chicago.craigslist.org/search/ti ... k=&maxAsk=

So, I am searching for Blackhawks tickets sold by owner in Chicago. "blackhawks" must match the title of the post. I want to get pipe-delimited output of the date, the e-mail address to respond to, the price, and the title of the post. If we can also get the actual text of the post, that would be superb as well.

Goal is to upload into a spreadsheet to do some analysis.

Thank you,
Raj
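
The requested output format can be illustrated independently of BeownReclise. The following is a minimal Python sketch, not part of the original thread, that writes records as pipe-delimited text with the standard `csv` module. The sample rows, the column order, and the helper name `to_pipe_delimited` are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows in the order: posted, email, price, title.
rows = [
    ("2013-04-04", "seller@example.org", "$120", "Blackhawks tickets, 2 seats"),
    ("2013-04-05", "other@example.org", "$95", "Blackhawks vs Blues | lower level"),
]


def to_pipe_delimited(records):
    """Return the records as pipe-delimited text, one record per line."""
    buffer = io.StringIO()
    # csv.writer quotes any field that itself contains the delimiter,
    # so a "|" inside a title does not break the column layout.
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    writer.writerow(["posted", "email", "price", "title"])
    writer.writerows(records)
    return buffer.getvalue()


text = to_pipe_delimited(rows)
print(text)
```

A spreadsheet import that is told to split on `|` and to honor double quotes should then place each field in its own column, even when a title contains a pipe character.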

Support
Site Admin
Posts: 1720
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support » Mon Apr 08, 2013 3:27 pm

What I can do for you is search the post titles and pull anything with "blackhawks" in it. I can pull the title, the price, the date, and the CL email. Will this work for you?

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Mon Apr 08, 2013 6:35 pm

I think that should be fine. I appreciate your help.

Support
Site Admin
Posts: 1720
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support » Mon Apr 08, 2013 7:08 pm

Here it is...

Code:

Delimiter = '|'; // you can use TAB or anything else.

PerlRegEx = Yes;
Output.Clear;

inet = New(URL);
rx = New(RegEx);

NextLink = 'http://chicago.craigslist.org/search/tix?zoomToPosting=&query=blackhawks&srchType=T&minAsk=&maxAsk=';

unless NextLink = Nothing do begin

  inet.Get(NextLink); // fetch the current page of search results
  NextPage = WildGet(inet.Data, 'href="([^"]+)">Next');

  rx.Data = inet.Data;
  rx.Mask = '"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"';
  rx.Reset;

  while rx.Match do begin
    lnk = Decode(rx.Value[1]);
    inet.Get(lnk); // fetch the individual listing

    baTitle = Decode(WildGet(inet.Data, 'postingTitle\s*=\s*"([^"]+)"'));
    if baTitle ~!= 'blackhawks' then Iterate; // skip titles without the keyword
    price = WildGet(baTitle, '\$[.0-9]+'); // the price is taken from the title
    email = Decode(WildGet(inet.Data, '"mailto:([^"]+)"'));
    email -= '\?.*'; // drop the ?... part of the mailto link
    posted = Trim(Decode(WildGet(inet.Data, 'Posted:\s*<date>([^<]+)')));
    if posted = Nothing then
      posted = Trim(Decode(WildGet(inet.Data, 'Edited:\s*<date>([^<]+)')));

    DataLine =
      posted +Delimiter+
      email  +Delimiter+
      price  +Delimiter+
      baTitle
    ;
    Output(DataLine);
  end;

end;
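
The per-listing extraction steps in the script can also be sketched in Python for readers who want to try the patterns on their own. This is an illustration, not a translation of BeownReclise semantics: the function names are hypothetical, and the case-insensitive keyword check is an assumption about how the `~!=` test behaves.

```python
import re


def extract_price(title):
    """Return the first dollar amount found in the title, or '' if none."""
    match = re.search(r"\$[.0-9]+", title)
    return match.group(0) if match else ""


def strip_mailto_query(address):
    """Drop everything from the first '?' onward (e.g. '?subject=...')."""
    return re.sub(r"\?.*", "", address)


def title_matches(title, keyword):
    """Assumed behavior: case-insensitive substring test on the title."""
    return keyword.lower() in title.lower()


# Hypothetical values standing in for data scraped from one listing.
title = "2 Blackhawks tickets - $150"
mailto = "seller@example.org?subject=Tickets&body=Hi"

if title_matches(title, "blackhawks"):
    print("|".join(["Apr 8", strip_mailto_query(mailto), extract_price(title), title]))
```

Because the price pattern is `\$[.0-9]+`, an amount written with a thousands separator such as `$1,200` would be cut off at the comma.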

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Tue Apr 09, 2013 4:51 pm

Thank you. It works well but the script doesn't terminate. Do you know why?

Support
Site Admin
Posts: 1720
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support » Tue Apr 09, 2013 6:12 pm

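Ah yes, replace NextPage with NextLink on line 15 (the line that reads the "Next" link with WildGet) and it will work. The loop tests NextLink, which was never updated, so it kept fetching the first page. My mistake!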
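
The same pagination pattern, sketched in Python under stated assumptions: `PAGES` is a hypothetical in-memory stand-in for the search result pages, and `max_pages` is an added safety limit that is not in the original script. The point is that the variable tested by the loop is the one that receives the next link.

```python
# Hypothetical "site": each URL maps to (listings on that page, next URL or None).
PAGES = {
    "page1": (["a", "b"], "page2"),
    "page2": (["c"], "page3"),
    "page3": (["d", "e"], None),
}


def crawl(start_url, max_pages=100):
    """Follow 'next' links from start_url and collect every listing."""
    collected = []
    next_link = start_url
    pages_fetched = 0
    # The loop ends when a page has no "next" link, or at the safety limit.
    while next_link is not None and pages_fetched < max_pages:
        listings, following = PAGES[next_link]
        collected.extend(listings)
        # Assign to the loop variable itself; storing the link under a
        # different name would leave next_link unchanged forever.
        next_link = following
        pages_fetched += 1
    return collected


print(crawl("page1"))
```

A page limit like `max_pages` is a cheap guard against this class of mistake, since a missed update then stops after a bounded number of requests.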

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Tue Apr 09, 2013 6:39 pm

Works like a champ! Thanks

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Tue Apr 09, 2013 7:04 pm

There is one more problem: it doesn't pick up the last page.

Support
Site Admin
Posts: 1720
Joined: Sun Oct 02, 2011 10:49 am

Re: CraigList search

Post by Support » Tue Apr 09, 2013 9:50 pm

It does, but it skips anything not in Chicago. Some of the URLs have /nwc/ in them instead of /chc/.

So if you want, replace /chc/ on line 18 (the rx.Mask line) with /[^/]+/ and that will give you all of them.
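
The effect of that change can be checked with a small Python sketch. The HTML snippet and the posting IDs are made up for illustration; only the two patterns come from the thread.

```python
import re

# Original mask: only listings under the /chc/ area code.
NARROW = re.compile(r'"(http://chicago\.craigslist\.org/chc/tix/\d+\.html)"')
# Relaxed mask: any single path segment in place of /chc/.
BROAD = re.compile(r'"(http://chicago\.craigslist\.org/[^/]+/tix/\d+\.html)"')

# Hypothetical fragment of a search results page.
html = (
    '<a href="http://chicago.craigslist.org/chc/tix/1111111111.html">A</a>'
    '<a href="http://chicago.craigslist.org/nwc/tix/2222222222.html">B</a>'
)

narrow_hits = NARROW.findall(html)
broad_hits = BROAD.findall(html)

print(len(narrow_hits), len(broad_hits))
```

`[^/]+` matches one or more characters that are not a slash, so it accepts any area code while still requiring exactly one path segment before `/tix/`.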

innovasolutions
Posts: 7
Joined: Sun Mar 10, 2013 8:47 pm

Re: CraigList search

Post by innovasolutions » Wed Apr 10, 2013 7:07 am

Yup, thank you, I realized that as well. Different suburbs have different codes, so I changed it to match all of them too. I am getting the hang of this. Hopefully, as I play with it, I can write a few of these on my own.
Go Blackhawks!
