Directory Crawler that looks like a user-agent

Sorry if this posted more than once.

At my job we have a secure website. Every hit to the site is captured by the tracking system to the SQL Server database.

We need to create an inventory system that can look at the data and tell us about the assets on the site.

To get the appropriate data into the database, we need to use a directory crawler that can hit every asset and every item that it finds in the directory structure.

Is there such a crawler, that can appear to be a user-agent, that can crawl a secure website? Is there such a crawler in ColdFusion?

Any ideas or pointers to such a crawler would be very much appreciated.

Thanks in advance,
Jo-Anne Head

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Want to reach the ColdFusion community with something they want? Let them know on the House of Fusion mailing lists
Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:324328
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=17837.14401.4
Re: Directory Crawler that looks like a user-agent

> At my job we have a secure website. Every hit to the site is captured by the tracking
> system to the SQL Server database.
>
> We need to create an inventory system that can look at the data and tell us about
> the assets on the site.
>
> To get the appropriate data into the database, we need to use a directory crawler that
> can hit every asset and every item that it finds in the directory structure.
>
> Is there such a crawler, that can appear to be a user-agent, that can crawl a secure
> website? Is there such a crawler in ColdFusion?

If you want to do this using CF, you can use the CFHTTP tag as Michael mentioned. But it's not clear to me that you're capturing the information on each page; if you just need to make an HTTP request to each page so that the existing logging system logs visits, there are far easier approaches, such as wget.

Of course, crawlers typically only follow plain ol' HTML links, so if you have forms-driven navigation or JavaScript-driven navigation you'll have to figure out how to get to everything.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/

Fig Leaf Software provides the highest caliber vendor-authorized instruction at our training centers in Washington DC, Atlanta, Chicago, Baltimore, Northern Virginia, or on-site at your location. Visit http://training.figleaf.com/ for more information!

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:324332
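To make the "request every page so the existing logging sees a visit" idea concrete, here is a minimal sketch of a link-following crawler that sends a browser-like User-Agent header. It is written in Python rather than ColdFusion, purely as an illustration. The `USER_AGENT` string, the function names, and the `max_pages` limit are all assumptions invented for this example. It only follows `href` and `src` attributes on the same host, and it does not handle logins, forms, or JavaScript navigation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: a made-up browser-like string; substitute whatever your tracking expects.
USER_AGENT = "Mozilla/5.0 (compatible; InventoryCrawler/0.1)"


class LinkParser(HTMLParser):
    """Collects every href and src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (fragments removed) for all links in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, link)).url for link in parser.links]


def fetch(url):
    """Request one URL with the custom User-Agent and return the body as text."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=500):
    """Breadth-first crawl of same-host links; returns the URLs requested."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            body = fetch_page(url)
        except OSError:
            continue  # unreachable or error response: skip it and keep going
        visited.append(url)
        for link in extract_links(body, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("https://your-site.example/")` (a placeholder URL) would request each discovered page and asset once. Passing a different `fetch_page` function is one way to add authentication or to test the crawl logic without touching the network.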
Re: Directory Crawler that looks like a user-agent

And you have to take into account navigation via JavaScript, where links could be inside included .js files.

An intelligent cfhttp-based bot can do it all, but you need to code it. There is probably a non-CF-based software solution already out there to do just this.

On Tue, Jul 7, 2009 at 7:04 PM, Dave Watts <dwatts@...> wrote:
> Of course, crawlers typically only follow plain ol' HTML links, so if
> you have forms-driven navigation or JavaScript-driven navigation
> you'll have to figure out how to get to everything.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:324334
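For links buried in included .js files, one rough approach is to scan the script text for quoted strings that look like URLs. The sketch below is a heuristic only, again in Python for illustration. The regular expression, the list of file extensions, and the function name `urls_in_script` are assumptions made up for this example. It will miss URLs that are built by string concatenation and may pick up strings that are not links at all.

```python
import re
from urllib.parse import urljoin

# Heuristic (an assumption, not a standard): a quoted literal counts as a URL if it
# starts with "/" or "http(s)://", or ends in one of a few common file extensions.
URL_LITERAL = re.compile(
    r"""["']((?:https?://|/)[^"'\s]+|[^"'\s]+\.(?:cfm|html?|js|css|pdf|png|gif|jpe?g))["']"""
)


def urls_in_script(script_text, base_url):
    """Return absolute URLs for URL-like string literals, in order, without repeats."""
    found = []
    for match in URL_LITERAL.finditer(script_text):
        url = urljoin(base_url, match.group(1))
        if url not in found:
            found.append(url)
    return found
```

The URLs returned here could be fed into the same queue a crawler uses for ordinary HTML links, so that pages reachable only through script still get requested.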