Extracting street address, phone number, and email from a page

BeownReclise is a programmable web spider. It scans a web site and retrieves the information you need. For example, you could scan a real estate web site, collect all of the agents' addresses, phone numbers, and emails, place this data into a tab-delimited database file, and then import it into Excel.
gcarmich
Posts: 29
Joined: Wed Jun 12, 2013 6:15 am

Post by gcarmich » Wed Jul 10, 2013 11:19 pm

For the USA, is it possible to extract the street address, city, state, and ZIP code (and possibly phone and email) without knowing where this text appears on a web page, and return the extracted information in a text file? For example, have a spider read a text file of URLs:

http://www.kennedy-center.org
http://www.nbm.org

and return in a delimited text file the original URL and the address/phone from the page:

http://www.kennedy-center.org, 2700 F Street, NW Washington, DC 20566, 800-444-1324, 202-467-4600
http://www.nbm.org, 401 F Street NW, Washington, D.C. 20001, 202.272.2448

Thanks,
Gilbert

Support
Site Admin
Posts: 1849
Joined: Sun Oct 02, 2011 10:49 am

Re: Extracting street address, phone number, and email from a page

Post by Support » Thu Jul 11, 2013 9:56 am

Yes and no! The format changes from site to site, and not all sites write the address and phone number the same way. We've done something similar for a single realtor web site, and it didn't work that well. For example, a phone number can appear as (555)555-1212, 555-555-1212, 5555551212, 555.555.1212, etc. Then you have PO BOX, p.o box, PO-box, suite#, etc. This means we would have to program in all the possibilities. It's a big job, and the results will likely fall short of expectations.
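To give a rough idea of what "programming in all possibilities" involves for phone numbers alone, here is a minimal sketch in Python. It is not BeownReclise script code; the names PHONE_RE and extract_phones are hypothetical, and the pattern only covers the four formats mentioned above. Extensions, country codes, and other spellings would each need more work, and addresses are harder still.

```python
import re

# Assumed pattern: 10-digit US numbers, with optional parentheses around
# the area code and an optional '-', '.', or whitespace separator.
# The lookarounds keep it from matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)

def extract_phones(text):
    """Return phone numbers found in text, normalized to NNN-NNN-NNNN."""
    return ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]

if __name__ == "__main__":
    line = "2700 F Street, NW Washington, DC 20566, 800-444-1324, 202-467-4600"
    print(extract_phones(line))
```

Even this small pattern can pick up 10-digit strings that are not phone numbers (order numbers, IDs), which is part of why the results vary so much from site to site.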
Your support team.
http://SoftByteLabs.com
