
Need a solution to remove server code

Posted: Wed Mar 27, 2019 3:08 pm
by avalanch
Hi there, I need a solution that can scan .htm, .html and other HTML file extensions. It needs to handle well over 100K files per run without crashing, and it must be able to remove what looks like timestamped, server-generated code such as:

Code:

<!-- text below generated by server. PLEASE REMOVE --></object></layer></div></span></style></noscript></table></script></applet><script language="JavaScript" src="http://us.i1.yimg.com/us.yimg.com/i/mc/mc.js"></script><script language="JavaScript" src="http://us.js2.yimg.com/us.js.yimg.com/lib/smb/js/hosting/cp/js_source/geov2_001.js"></script><script language="javascript">geovisit();</script><noscript><img src="http://visit.geocities.yahoo.com/visit.gif?us1240811868" alt="setstats" border="0" width="1" height="1"></noscript>

<IMG SRC="http://geo.yahoo.com/serv?s=76001073&t=1240811868&f=us-w6" ALT=1 WIDTH=1 HEIGHT=1>

Code:

<!-- text below generated by server. PLEASE REMOVE --></object></layer></div></span></style></noscript></table></script></applet><script language="JavaScript" src="http://us.i1.yimg.com/us.yimg.com/i/mc/mc.js"></script><script language="JavaScript" src="http://us.js2.yimg.com/us.js.yimg.com/lib/smb/js/hosting/cp/js_source/geov2_001.js"></script><script language="javascript">geovisit();</script><noscript><img src="http://visit.geocities.yahoo.com/visit.gif?us1256469497" alt="setstats" border="0" width="1" height="1"></noscript>

<IMG SRC="http://geo.yahoo.com/serv?s=76001083&t=1256469497&f=us-w8" ALT=1 WIDTH=1 HEIGHT=1>
Ideally it would also scan through subfolders and use a regex or some other filter to match and delete those lines, including any dynamic text between them, since they are timestamped.
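A minimal sketch of that filter in Python, assuming every injected block starts with the exact "PLEASE REMOVE" comment and ends with the geo.yahoo.com/serv tracking IMG tag, as in both samples above. A non-greedy match between those two fixed anchors skips over the timestamped parts without having to spell them out. The names INJECTED_BLOCK and strip_injected are made up for illustration.

```python
import re

# Assumption: each injected block begins with this exact comment and ends with
# the geo.yahoo.com/serv tracking <IMG>, as in the two samples in this post.
INJECTED_BLOCK = re.compile(
    r'<!-- text below generated by server\. PLEASE REMOVE -->'  # fixed start marker
    r'.*?'  # closing tags plus <script>/<noscript> parts whose timestamps vary
    r'<img\s+src="http://geo\.yahoo\.com/serv\?[^"]*"[^>]*>'  # tracking pixel that ends the block
    r'(?:\r?\n)?',  # the line break right after it, if present
    re.IGNORECASE | re.DOTALL,  # the IMG tag is upper-case and the block spans lines
)


def strip_injected(html: str) -> tuple[str, int]:
    """Return html with every injected block removed, plus how many were removed."""
    return INJECTED_BLOCK.subn("", html)
```

Because the pattern only relies on the two fixed ends, the changing numbers in the visit.gif and serv URLs never need to be matched explicitly. A page where the comment appears without the tracking IMG is left unchanged rather than truncated.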

Is there any solution available from softbyte labs or can one be created?

I am working with the GeoCities torrent, which you can find online with a simple search. Fully extracted, it is over 900 GB and consists of many millions of files.

A typical folder contains thousands of files and is a gigabyte or so, made up of the usual static files you would expect: mostly .js, .css, .htm, .html, .gif, .jpeg and .jpg, with a surprisingly small number of .png files considering they were originally on GeoCities.
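For a tree this size, one approach is to walk it lazily with os.walk and handle one file at a time, so memory use is bounded by the largest single page, and to catch errors per file so one unreadable file does not stop a 100K+ file run. This is a minimal Python sketch under stated assumptions, not a tested tool: HTML_EXTENSIONS, clean_file and clean_tree are hypothetical names, the extension list is only a starting point, and the pattern is the same assumed comment-to-tracking-IMG match compiled as bytes so pages in unknown encodings are never decoded and re-encoded. Each changed file is written to a temporary file in the same folder and then swapped in with os.replace. Try it on a copy of one folder first.

```python
import os
import re
import shutil
import sys
import tempfile

# Hypothetical extension list; extend it with whatever variants the archive contains.
HTML_EXTENSIONS = {".htm", ".html", ".shtm", ".shtml", ".xhtml"}

# Assumed block shape: fixed comment marker ... geo.yahoo.com/serv tracking <IMG>.
# Bytes pattern, so files are processed without guessing their text encoding.
INJECTED_BLOCK = re.compile(
    rb'<!-- text below generated by server\. PLEASE REMOVE -->'
    rb'.*?'
    rb'<img\s+src="http://geo\.yahoo\.com/serv\?[^"]*"[^>]*>'
    rb'(?:\r?\n)?',
    re.IGNORECASE | re.DOTALL,
)


def clean_file(path: str) -> bool:
    """Strip injected blocks from one file; return True if it was rewritten."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned, count = INJECTED_BLOCK.subn(b"", data)
    if not count:
        return False  # nothing to remove, leave the file untouched
    # Write to a temp file in the same folder, then replace the original in one step.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(cleaned)
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temp files behind
        raise
    return True


def clean_tree(root: str) -> tuple[int, int]:
    """Recursively clean every HTML-like file under root; return (scanned, changed)."""
    scanned = changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in HTML_EXTENSIONS:
                continue  # skip .js, .css, images, etc.
            path = os.path.join(dirpath, name)
            scanned += 1
            try:
                if clean_file(path):
                    changed += 1
            except OSError as exc:
                # Report and keep going instead of aborting the whole run.
                print(f"skipped {path}: {exc}", file=sys.stderr)
    return scanned, changed


if __name__ == "__main__":
    total, cleaned = clean_tree(sys.argv[1])
    print(f"scanned {total} files, cleaned {cleaned}")
```

Usage would be along the lines of running the script with the extracted folder as its only argument. Since files without the marker are never rewritten, a second run over the same tree should report zero cleaned files.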

Re: Need a solution to remove server code

Posted: Wed Mar 27, 2019 5:08 pm
by Support
We do not have such software, but if you know how to program in Pascal, you could use our BrownRecluse software to do just that.