Need a solution to remove server code
Posted: Wed Mar 27, 2019 3:08 pm
Hi there, I need a solution that can scan .htm, .html and other HTML file extensions. It needs to handle more than 100K files per run without crashing, and it must be able to remove what looks like timestamped, server-generated code (two examples are included below).
Ideally it would also scan through subfolders, and use a regex or some other filter to match and delete those lines, including any dynamic text between them, since they are timestamped.
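For the subfolder part, one option is a small Python sketch like the following. It walks the tree lazily with `os.walk` and yields only files whose extension is in a hypothetical `HTML_EXTENSIONS` set. The exact list of "other various html extensions" is an assumption, and `iter_html_files` is an illustrative name, not an existing tool.

```python
import os
from typing import Iterator

# Hypothetical extension list; extend it with whatever other HTML variants
# actually turn up in the archive.
HTML_EXTENSIONS = {".htm", ".html", ".shtml", ".shtm", ".xhtml"}


def iter_html_files(root: str) -> Iterator[str]:
    """Yield every HTML-like file under root, including all subfolders.

    os.walk visits one directory at a time, so paths are produced lazily
    instead of being collected into one huge list first.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so ".HTM" matches as well as ".htm".
            if os.path.splitext(name)[1].lower() in HTML_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Usage: `for path in iter_html_files("/path/to/extracted/archive"): ...`. The case-insensitive comparison matters for old sites, where pages may be named `.HTM` as well as `.htm`.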
Is there any solution available from softbyte labs or can one be created?
I am working with the GeoCities torrent, which you can find online with a simple search. Fully extracted, it is over 900GB and consists of many millions of files.
A typical folder contains thousands of files and is a gigabyte or so in size. It holds the usual static files you would expect: mostly .js, .css, .htm, .html, .gif, .jpeg and .jpg, with a surprisingly low number of .png files considering they were originally on GeoCities.
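At 100K+ files per run, a single unreadable or locked file should not abort the whole job. A minimal per-file sketch, assuming the pages are cleaned in place: it reads raw bytes so the many encodings in old pages are never decoded, skips files with no match, and writes through a temporary file plus `os.replace` so an interrupted run does not leave half-written pages. `clean_file` and `clean_many` are hypothetical helpers that accept any compiled bytes regex.

```python
import os
import re
import shutil
import tempfile
from typing import Iterable, Tuple


def clean_file(path: str, pattern: "re.Pattern[bytes]") -> bool:
    """Delete every match of pattern from the file; return True if it changed."""
    # Work on raw bytes: old pages use many encodings, and nothing here needs
    # to decode them.
    with open(path, "rb") as f:
        data = f.read()
    cleaned, count = pattern.subn(b"", data)
    if count == 0:
        return False  # no match: leave the file (and its mtime) untouched

    # Write to a temp file in the same directory, then swap it in atomically,
    # so an interrupted run never leaves a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(cleaned)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True


def clean_many(paths: Iterable[str], pattern: "re.Pattern[bytes]") -> Tuple[int, int]:
    """Clean each path in turn; report failures and carry on instead of crashing."""
    changed = failed = 0
    for path in paths:
        try:
            if clean_file(path, pattern):
                changed += 1
        except OSError as exc:
            failed += 1
            print(f"skipped {path}: {exc}")
    return changed, failed
```

Because `paths` can be any iterable, including a generator, the run never needs the full list of files in memory.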
Code:
<!-- text below generated by server. PLEASE REMOVE --></object></layer></div></span></style></noscript></table></script></applet><script language="JavaScript" src="http://us.i1.yimg.com/us.yimg.com/i/mc/mc.js"></script><script language="JavaScript" src="http://us.js2.yimg.com/us.js.yimg.com/lib/smb/js/hosting/cp/js_source/geov2_001.js"></script><script language="javascript">geovisit();</script><noscript><img src="http://visit.geocities.yahoo.com/visit.gif?us1240811868" alt="setstats" border="0" width="1" height="1"></noscript>
<IMG SRC="http://geo.yahoo.com/serv?s=76001073&t=1240811868&f=us-w6" ALT=1 WIDTH=1 HEIGHT=1>
Code:
<!-- text below generated by server. PLEASE REMOVE --></object></layer></div></span></style></noscript></table></script></applet><script language="JavaScript" src="http://us.i1.yimg.com/us.yimg.com/i/mc/mc.js"></script><script language="JavaScript" src="http://us.js2.yimg.com/us.js.yimg.com/lib/smb/js/hosting/cp/js_source/geov2_001.js"></script><script language="javascript">geovisit();</script><noscript><img src="http://visit.geocities.yahoo.com/visit.gif?us1256469497" alt="setstats" border="0" width="1" height="1"></noscript>
<IMG SRC="http://geo.yahoo.com/serv?s=76001083&t=1256469497&f=us-w8" ALT=1 WIDTH=1 HEIGHT=1>
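Both samples begin with the same fixed marker comment and end with a `geo.yahoo.com/serv` tracking `<IMG>`. Only the numbers differ (`us1240811868` vs `us1256469497`, and the `s=`, `t=` and `f=` values). A sketch of a regex built on that assumption matches from the marker, lazily across whatever sits in between, up to and including the tracking `<IMG>` and its line break. `SERVER_BLOCK` and `strip_server_code` are hypothetical names. Pages where the injected block looks different would not match and would be left unchanged.

```python
import re
from typing import Tuple

# Assumption (based on the two samples): each injected block starts with this
# exact comment and ends with the geo.yahoo.com/serv tracking <IMG>; the
# timestamps and IDs in between vary from file to file.
SERVER_BLOCK = re.compile(
    rb"<!-- text below generated by server\. PLEASE REMOVE -->"
    rb".*?"                                      # injected scripts/noscript, any length
    rb"<img\s[^>]*geo\.yahoo\.com/serv\?[^>]*>"  # tracking pixel that closes the block
    rb"[ \t]*(?:\r?\n)?",                        # trailing spaces and one line break
    re.IGNORECASE | re.DOTALL,                   # IMG vs img; .*? may span lines
)


def strip_server_code(data: bytes) -> Tuple[bytes, int]:
    """Return (cleaned bytes, number of injected blocks removed)."""
    return SERVER_BLOCK.subn(b"", data)
```

The `visit.gif` image inside the `<noscript>` does not end the match early, because its host is `visit.geocities.yahoo.com`, which does not contain `geo.yahoo.com/serv`.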