
Extracting street address, phone number, and email from a page

Posted: Wed Jul 10, 2013 11:19 pm
by gcarmich
For the USA, is it possible to extract the street address, city, state, and ZIP code (and possibly the phone number and email) from a webpage without knowing where that text appears on the page, and return the extracted information in a text file? For example, have a spider read a text file of URLs such as:

http://www.kennedy-center.org
http://www.nbm.org

and return a delimited text file containing the original URL and the address/phone number(s) from each page:

http://www.kennedy-center.org, 2700 F Street, NW Washington, DC 20566, 800-444-1324, 202-467-4600
http://www.nbm.org, 401 F Street NW, Washington, D.C. 20001, 202.272.2448
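For anyone trying this, here is a minimal Python sketch of the spider described above, using only the standard library. It assumes one URL per line in the input file and writes one CSV row per URL. The regular expressions, the helper names (`html_to_text`, `extract_contacts`, `fetch`, `process`) and the file names are illustrative assumptions, not a tested product. The patterns are rough US-only heuristics and will miss or mangle many real-world layouts.

```python
import csv
import re
import urllib.request
from html.parser import HTMLParser

# Rough US-centric heuristics (assumptions, not a complete address grammar):
# a house number, up to 80 address-like characters, a two-letter state
# (optionally dotted, e.g. "D.C."), then a 5-digit ZIP with optional +4.
ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+[A-Za-z0-9 .,#'-]{3,80}?\b[A-Z]\.?[A-Z]\.?,?\s+\d{5}(?:-\d{4})?\b"
)
# 10-digit phone numbers written as (555)555-1212, 555-555-1212, 555.555.1212, 5555551212.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Flatten HTML to one line of text with collapsed whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def extract_contacts(html):
    """Return (addresses, phones, emails) found anywhere in the page's visible text."""
    text = html_to_text(html)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    addresses = list(dict.fromkeys(ADDRESS_RE.findall(text)))
    phones = list(dict.fromkeys(PHONE_RE.findall(text)))
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    return addresses, phones, emails


def fetch(url, timeout=15):
    """Download a page, decoding with the charset the server declares (UTF-8 if none)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def process(url_file, out_file):
    """Read one URL per line and write URL, addresses, phones, emails as CSV rows."""
    with open(url_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    with open(out_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for url in urls:
            try:
                addresses, phones, emails = extract_contacts(fetch(url))
            except OSError as exc:  # urllib's URLError/HTTPError are OSError subclasses
                writer.writerow([url, f"ERROR: {exc}"])
                continue
            # Several hits per field are joined with " | "; csv quotes embedded commas.
            writer.writerow([url, " | ".join(addresses), " | ".join(phones), " | ".join(emails)])
```

Usage would be something like `process("urls.txt", "contacts.csv")`, with both file names hypothetical. Anything shown only in an image, split across unusual markup, or written in a format the patterns don't anticipate will simply be missed.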

Thanks,
Gilbert

Re: Extracting street address, phone number, and email from a page

Posted: Thu Jul 11, 2013 9:56 am
by Support
Yes and no! The format changes from site to site, and not every site writes the address or phone number the same way. We've done something similar for a single website for realtors, and it didn't work that well. Phone numbers, for example, appear as (555)555-1212, 555-555-1212, 5555551212, 555.555.1212, etc. Then you have PO BOX, p.o box, PO-box, suite#, etc. This means we would have to program in every possible variation. It's a big job, and the results may still not be what you expect.
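To make the point concrete, here is a small Python sketch showing how the variants listed above might be folded into a few regular expressions. The helpers `normalize_phones`, `po_box_numbers` and `suite_numbers` are hypothetical, and the patterns are assumptions for illustration only. Each new site tends to add another variation they do not cover, which is exactly the maintenance problem described above.

```python
import re

# One pattern for the phone layouts above: (555)555-1212, 555-555-1212,
# 5555551212 and 555.555.1212. Groups capture area code, exchange, line number.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})\b")

# PO box spellings: "PO BOX", "p.o box", "PO-box", "P.O. Box"; captures the box number.
PO_BOX_RE = re.compile(r"\bP\.?\s*O\.?[\s-]*BOX\b\s*#?\s*(\d+)?", re.IGNORECASE)

# Suite spellings: "Suite 200", "suite# 200", "Ste. 4B". The \b after the keyword
# keeps words such as "Stephen" from matching.
SUITE_RE = re.compile(r"\b(?:suite|ste)\b\.?\s*#?\s*(\w+)", re.IGNORECASE)


def normalize_phones(text):
    """Return every phone number found, rewritten in one style: 555-555-1212."""
    return [f"{area}-{exchange}-{line}" for area, exchange, line in PHONE_RE.findall(text)]


def po_box_numbers(text):
    """Return the box number of each PO box mention (None if no number follows)."""
    return [m.group(1) for m in PO_BOX_RE.finditer(text)]


def suite_numbers(text):
    """Return the identifier following each suite mention."""
    return SUITE_RE.findall(text)
```

For instance, `normalize_phones("(555)555-1212, 555.555.1212")` yields the same `555-555-1212` twice, which makes duplicates easy to drop. Formats such as `1-800-...` or extensions would each need yet another rule.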