Re: need code
Doing the cat sounds good. How would you arrange the concatenation? That is, what file goes first, second, third, and so on? Python or PHP might be a better choice. Do you have any budget at all?

Gus

On Jul 3, 2009, at 4:44 PM, Dr Skip wrote:
> Alan B. Pearce wrote:
>> You mean the way IE8 will save a web page as a .mht file, with pictures and all in it? I don't know how compressed it is.
>
> Not sure about IE8, but Firefox with addons will do that, but it's a page at a time. There are other addons that suck web sites or dirs too, but they don't create one file with all the pages concatenated together.
>
>> As Tamas has mentioned, this can be done with CHMs in Windows. I think the HTMLHelp tool from Microsoft compiles these - but I've only ever used that utility as part of the Sandcastle toolchain (auto-documents .NET XML commented code), so don't know how flexible it is. You'll need to grab it from the MS website - there's two of them IIRC, version 1 and version 2. Hope this helps.
>>
>> Regards,
>>
>> Pete Restall
>
> I'll try to take a look at this, but I suspect it will be view a page - save the page - give the page a name, etc, go to next page and do the same, etc.
>
> I may never be heard from again.... :-O
>
> Optimally, I need something that will cat a directory worth of html files with some limited intelligence to strip out headers and metadata and such so the whole lot would end up as one file that is readable. Maybe a <printing> page break between what used to be the individual pages.
>
> Even a command line tool. I'm no Perl expert, but I think Perl would be well suited (but beyond my abilities these days). It would have to incorporate a lot of html knowledge though to selectively strip out stuff as it wrote the one big file as a one file html doc...
>
> -Skip

http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at http://mailman.mit.edu/mailman/listinfo/piclist
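
For illustration, the "cat a directory of HTML files" idea could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the pages are simple enough that a regular expression can find the body element, files are joined in filename order (the thread never settles the ordering question), and the names `extract_body`, `concatenate` and `PAGE_BREAK` are made up for this example. The page break uses the CSS `page-break-after` property, which only takes effect when the combined file is printed.

```python
import re
from pathlib import Path

# Assumption: pages are simple enough that a regex can locate <body>...</body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

# Printing page break placed between what used to be separate pages.
PAGE_BREAK = '<div style="page-break-after: always"></div>'


def extract_body(html):
    """Return the contents of <body>, or the whole text if no body tag is found."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def concatenate(directory, title="Combined pages"):
    """Join the bodies of all .html/.htm files in `directory`, in filename order."""
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in (".html", ".htm")
    )
    bodies = [
        extract_body(p.read_text(encoding="utf-8", errors="replace"))
        for p in files
    ]
    joined = f"\n{PAGE_BREAK}\n".join(bodies)
    # Each page's own <head> (title, metadata, styles) is dropped;
    # one fresh head is written for the combined document.
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n{joined}\n</body></html>\n"
    )
```

Writing the result out would be a single call such as `Path("combined.html").write_text(concatenate("pages"), encoding="utf-8")`, where both names are placeholders. Messier HTML would need a real parser rather than a regular expression.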

Re: need code
I presume there will be images to include in the concatenated documents. How do you plan to deal with them? Does one find an image URL in the HTML and grab the image, put it in a local folder with an updated name and updated link in the concatenated HTML?

Gus
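
The image handling Gus describes (find the image reference, copy the image to a local folder under a new name, update the link) might be sketched in Python as follows. This assumes the images are already stored locally next to the page, that `src` values are quoted, and that the combined HTML file will sit in the parent of the image folder. The function name `localize_images` and its parameters are hypothetical; remote images are left untouched rather than downloaded.

```python
import re
import shutil
from pathlib import Path

# Captures: (1) everything up to the src value, (2) the quote character, (3) the value.
IMG_SRC_RE = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def localize_images(html, page_path, image_dir, prefix):
    """Copy locally stored images into `image_dir` and rewrite their links.

    `prefix` is prepended to each copied file name so that images with the
    same name from different pages do not overwrite each other.
    """
    page_path = Path(page_path)
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)

    def replace(match):
        src = match.group(3)
        if "://" in src or src.startswith("data:"):
            return match.group(0)  # remote or inline image: leave as-is
        source = page_path.parent / src
        if not source.is_file():
            return match.group(0)  # missing image: leave the link alone
        new_name = f"{prefix}_{source.name}"
        shutil.copyfile(source, image_dir / new_name)
        quote = match.group(2)
        # The new link is relative to the parent of image_dir.
        return f"{match.group(1)}{quote}{image_dir.name}/{new_name}{quote}"

    return IMG_SRC_RE.sub(replace, html)
```

Fetching images that exist only on the web would need an extra download step, which is not shown here.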

Re: need code
The older tool from Canon (it was commercial ware but in the deep discount $5 bin when I got it for Win 95) did all that. Point it at a page, give it some selection as to how deep to go and what domains (much like HTTrack or wget) or dirs and filespecs if any, and it fetched and put it in one big doc. I don't remember its file format in the end, but it could be printed to any printer including pdf (if one had acrobat in those days).

I never got it to work on XP or NT, so I don't even know where it is now. Probably went to the thrift shop. It was very useful, but I think Canon just didn't want to be in the software biz unless it was based on a specific hard product of theirs. I also thought it was such an obviously useful tool that there would be more like it as the web took off. Now I can't find anything like it, except for web spiders that will recreate the site dir locally but not put them all in one doc...

Budgeting here is an odd activity - don't ask. ;) Given the fact that the function seems so obvious for a tool like Canon had, and the condition of the economy, et al, it just needs to get done in one's 'extra' time...

-Skip

AGSCalabrese wrote:
> I presume there will be images to include in the concatenated documents. How do you plan to deal with them?
> Does one find an image URL in the HTML and grab the image, put it in a local folder with an updated name and updated link in the concatenated HTML?
>
> Gus

Re: need code
On Thu, Jul 2, 2009 at 9:12 PM, Dr Skip <drskip@...> wrote:
> I'm looking for something as simple as a Perl script to something as a stand-alone windows program that will take web pages (locally stored perhaps) and 'compile' them into a single document.

If the HTML is relatively simple it sounds trivial to write a script that appends HTML files into a single HTML file but cuts out extraneous header and style info.

--
Martin K.
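
As a rough illustration of "cuts out extraneous header and style info", the following Python sketch removes script and style blocks plus stand-alone meta and link tags from a page before it is appended. It assumes relatively simple HTML, as Martin does; the function name `strip_style_info` is invented for this example, and inline `style="..."` attributes are deliberately left alone.

```python
import re

# <script>...</script> and <style>...</style> blocks, including their contents.
BLOCK_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)

# Stand-alone metadata tags that carry no visible content.
VOID_RE = re.compile(r"<(meta|link)\b[^>]*>", re.IGNORECASE)


def strip_style_info(html):
    """Remove script/style blocks and meta/link tags from simple HTML."""
    html = BLOCK_RE.sub("", html)
    return VOID_RE.sub("", html)
```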

Re: need code
> I'm looking for something as simple as a Perl script to something as a stand-alone windows program that will take web pages (locally stored perhaps) and 'compile' them into a single document. Canon sold one at one time (Win95 era) - it would do it as whole pages, and shrink to fit (so it must have done them as images, but they were very clear), and I think you could tell it how deep you wanted or by individual pages.

I'm afraid I don't get the main point - how is that connected to the [EE] tag? Do you need to pack the code into some electronics? What electronics do you target with a Perl script and a stand-alone windows program?

Thanks

Re: need code
Dr Skip wrote:
> I'm looking for something as simple as a Perl script to something as a stand-alone windows program that will take web pages (locally stored perhaps) and 'compile' them into a single document.

I'm not sure what custom pre-processing you want to do, but just printing all documents in a folder to PDF doesn't seem so complex. pdfFactory for example is a PDF printer, and until you (manually) save the document, it just accumulates everything printed to it into a single PDF.

So get pdfFactory (they have an eval version), print a few files to it and see whether that's what you want. Then write a simple script that prints all files in a directory to it and you're done. Manual intervention is needed to save the PDF when it's done printing all files in the directory.

There are probably other PDF printers out there that work similarly.

Gerhard
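
The "simple script that prints all files in a directory" could be sketched in Python as below. Several assumptions apply: it relies on `os.startfile` with the `"print"` verb, which exists only on Windows; the PDF printer is assumed to be the default printer; and the application registered for HTML files must support printing this way. The function name `print_directory` and the `send_to_printer` hook are hypothetical, the latter added so the file selection can be tried without a printer. Print jobs are handed off asynchronously, so the order in which pages reach the PDF is not guaranteed.

```python
import os
from pathlib import Path


def print_directory(directory, send_to_printer=None):
    """Send every .html/.htm file in `directory` to the printer, in filename order.

    Returns the list of file names that were submitted.
    """
    if send_to_printer is None:
        # Windows only: ask the application registered for the file type
        # to print the file on the default printer.
        send_to_printer = lambda path: os.startfile(path, "print")

    files = sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in (".html", ".htm")
    )
    for path in files:
        send_to_printer(str(path))
    return [p.name for p in files]
```

Saving the accumulated PDF remains a manual step, as Gerhard notes.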