A script to browse search results in a defined manner and store email addresses

BrownRecluse is a programmable web spider. Scan a web site and retrieve from it the information you need. You could scan a real estate web site, collect all of the agents' addresses, phone numbers and emails, and place this data into a tab-delimited database file. You could then import this data into Excel, for example.
fire_and_ice
Posts: 2
Joined: Mon Dec 23, 2013 5:31 am

A script to browse search results in a defined manner and store email addresses

Post by fire_and_ice » Mon Dec 23, 2013 9:20 am

Hello there,

I'm totally new to BrownRecluse, and I have spent some time reading the manual. It seems that BrownRecluse is able to perform a task that interests me.
In this topic, I would like to describe the task that I want to perform, as precisely as possible, and obtain a script that I can run on my computer.

Create a text file. The robot fills in this file as it crawls.
A new line is added to the file whenever the robot finds an email address on a page during the exploration.
This line lists the URL where the email address was found, the email itself, and any other email addresses also present on the page.
File organization
1st column = URL
2nd column = email 1
Xth column = email x
The columns are separated by tabs.
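
To make the file layout above concrete, here is a minimal sketch in Python (not BrownRecluse's own scripting language). The function name `append_row` and the example.org addresses are hypothetical and only serve as an illustration; a real run would open the text file in append mode instead of using an in-memory buffer.

```python
import csv
import io


def append_row(out, url, emails):
    """Write one tab-separated line: the URL, then every email found on that page."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([url, *emails])


# Usage with an in-memory buffer standing in for the output file.
buffer = io.StringIO()
append_row(buffer, "http://example.org/page1", ["a@example.org", "b@example.org"])
append_row(buffer, "http://example.org/page2", ["c@example.org"])
print(buffer.getvalue(), end="")
```

Each line can hold a different number of columns, which matches the "Xth column = email x" layout and still imports cleanly into a spreadsheet.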


Robot

The robot starts with this page:
http://www.ncbi.nlm.nih.gov/pubmed/?ter ... ll+size%22

1) First, I would like to explore the search pages (more details in step 2 below). There are ~22143 results spread over 1108 pages. I would like to open each page of the search to perform some operations. If there are no more pages, I want the script to end.

2) On each result page: I want to explore only the links of the search page (more details in step 3 below). Once the requested exploration and tasks are performed for a link, I want to return to that result page and explore the remaining results the same way until there are no unexplored links left. When there are no unexplored links left, I want to go to the next page.

3) Open each result URL.
First, I want to check whether there is an email address in a particular field of the web page. It is in a dropdown menu, a “toggle”. When this menu is closed (the default), the code looks like this:

<a title="Open/close author information list" class="jig-ncbitoggler ui-widget ui-ncbitoggler" href="#" aria-disabled="false" role="button" aria-expanded="false"><span class="ui-ncbitoggler-master-text">Author information </span><span class="ui-icon ui-icon-triangle-1-e"></span></a>

When it is open, the source code looks like this:

<a title="Open/close author information list" class="jig-ncbitoggler ui-widget ui-ncbitoggler-open" href="#" aria-disabled="false" role="button" aria-expanded="true"><span class="ui-ncbitoggler-master-text">Author information </span><span class="ui-icon ui-icon-triangle-1-s"></span></a>

If there is an email address in this dropdown content, I want it stored in the text file, along with the absolute URL of the page and any other email addresses found. Then, I want to go back to the results page.
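
As a rough illustration of this check, the Python sketch below pulls email addresses out of a piece of page text with a regular expression and drops duplicates. The pattern is a deliberately simple approximation of an email address, and the sample fragment is invented, since the actual markup of the opened author information list is not shown in this post.

```python
import re

# Simplified pattern: good enough for typical addresses, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the email addresses found in text, in order, without duplicates."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


# Hypothetical content of the author information block.
fragment = ("<li>Department of Biology. j.doe@example.edu. "
            "Also lab@example.org and j.doe@example.edu</li>")
emails = find_emails(fragment)
print(emails)  # ['j.doe@example.edu', 'lab@example.org']
```

The sentence-ending period after an address is left out of the match because the pattern requires the final dot to be followed by letters.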

If there are no email addresses in this dropdown content, I want to explore a link that is always located in the same place on the result page. Most of the time it is a clickable logo. Sometimes it's just a line of text. The HTML code looks like this. The target URL is the href value of the link.

<div class="supplemental col three_col last">
<h2 class="offscreen_noflow">Supplemental Content</h2>
<div>
<div class="icons"><a href="http://www.karger.com?DOI=000356405&typ=pdf" ref="PrId=3030&itool=Abstract-def&uid=24334922&nlmid=0370307&db=pubmed&log$=linkouticon" journal="Acta Cytol" target="_blank"><img alt="Icon for S. Karger AG, Basel, Switzerland" title="Read full text in S. Karger AG, Basel, Switzerland" src="//www.ncbi.nlm.nih.gov/corehtml/query/egi ... nlm_ft.gif" border="0"></a> </div>


If there are no more URLs to explore at this location, return to the search results.
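
One possible way to pick out that link, sketched in Python with the standard library's `html.parser`: collect the `href` of every `<a>` inside `<div class="icons">`. The class name `LinkoutParser` is hypothetical, and the sample is a shortened version of the markup quoted above with closing tags added; it assumes the link always sits inside the `icons` div.

```python
from html.parser import HTMLParser


class LinkoutParser(HTMLParser):
    """Collect href values of <a> tags found inside <div class="icons">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the icons div (0 means outside)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the icons div, or track a nested div once inside it.
            if self.depth or "icons" in (attrs.get("class") or "").split():
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


sample = (
    '<div class="supplemental col three_col last"><div>'
    '<div class="icons"><a href="http://www.karger.com?DOI=000356405&typ=pdf" '
    'target="_blank">Full text</a> </div></div></div>'
    '<a href="http://example.org/other">unrelated link</a>'
)
parser = LinkoutParser()
parser.feed(sample)
print(parser.links)
```

Links outside the `icons` div are ignored, and an empty `parser.links` list corresponds to the "no more URLs to explore" case.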

4) In this new link, I would like to search for emails in the entire page. Whether or not any email addresses are found, when the search is over the robot returns to the current page in the search results.
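
The overall navigation in steps 1 and 2 can be sketched as a simple loop, shown here in Python rather than as a BrownRecluse script. The function `crawl` and its three callbacks are hypothetical placeholders: how result links and the next page are actually obtained from the site is not covered here, so a small dictionary stands in for the search pages.

```python
def crawl(first_page, get_result_links, get_next_page, visit_result):
    """Walk result pages in order; visit every result link on a page before moving on."""
    page = first_page
    visited = set()
    while page is not None:
        for link in get_result_links(page):
            if link not in visited:      # skip links that were already explored
                visited.add(link)
                visit_result(link)       # steps 3 and 4 would happen in here
        page = get_next_page(page)       # None means there are no more pages
    return visited


# Toy stand-in for the search: two result pages held in a dict.
pages = {
    "page1": {"links": ["r1", "r2"], "next": "page2"},
    "page2": {"links": ["r2", "r3"], "next": None},
}
seen = []
crawl("page1",
      lambda p: pages[p]["links"],
      lambda p: pages[p]["next"],
      seen.append)
print(seen)  # ['r1', 'r2', 'r3']
```

Because each result is handled inside the inner loop, "returning to the result page" needs no extra step: the loop simply continues with the next link and ends when no next page exists.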

Can you please help me with a script to perform this task? If there is something unclear, I'll answer.

Support
Site Admin
Posts: 1731
Joined: Sun Oct 02, 2011 10:49 am

Re: A script to browse search results in a defined manner and store email addresses

Post by Support » Tue Dec 24, 2013 1:03 am

Hello,

I've looked at the site and we can write this script, but we can only do so for customers of BrownRecluse, and there is a fee of $35 to write the script.
Your support team.
http://SoftByteLabs.com

fire_and_ice
Posts: 2
Joined: Mon Dec 23, 2013 5:31 am

Re: A script to browse search results in a defined manner and store email addresses

Post by fire_and_ice » Fri Jan 03, 2014 5:06 am

Hello, and happy new year!

Thank you for your reply. I do intend to buy BrownRecluse, of course, as long as it is capable of doing what I want. So if you tell me that you can write the script and that it is possible, I will buy it and pay the extra fee for you to develop the script.

However, since I understand there is an extra charge for script development, I took the time to think the script through carefully. I have added some functionality and modified some of the rest. I would like to send the complete script storyboard to a developer to get confirmation that it is possible.

Whom can I send the script storyboard to?

Thank you in advance.

Support
Site Admin
Posts: 1731
Joined: Sun Oct 02, 2011 10:49 am

Re: A script to browse search results in a defined manner and store email addresses

Post by Support » Fri Jan 03, 2014 12:30 pm

You can send it to me via a PM (private message).

Have a Happy and Healthy New Year...
Your support team.
http://SoftByteLabs.com
