Want a Pentium SBC with no bells or whistles


Re: need code

by AGSCalabrese

Doing the cat sounds good.
How would you arrange the concatenation?
      That is, which file goes first, second, third, and so on?
Python or PHP might be a better choice.
Do you have any budget at all?
Gus


> On Jul 3, 2009, at 4:44 PM, Dr Skip wrote:
>
> Alan B. Pearce wrote:
>>
>> You mean the way IE8 will save a web page as a .mht file, with pictures and
>> all in it? I don't know how compressed it is.
>>
>
> Not sure about IE8, but Firefox with addons will do that, but it's a page at a
> time. There are other addons that suck web sites or dirs too, but they don't
> create one file with all the pages concatenated together.
>
>> As Tamas has mentioned, this can be done with CHMs in Windows.  I think the
>> HTMLHelp tool from Microsoft compiles these - but I've only ever used that
>> utility as part of the Sandcastle toolchain (auto-documents .NET XML commented
>> code), so don't know how flexible it is.  You'll need to grab it from the MS
>> website - there's two of them IIRC, version 1 and version 2.  Hope this
>> helps.
>>
>> Regards,
>>
>> Pete Restall
>
> I'll try to take a look at this, but I suspect it will be view a page - save
> the page - give the page a name, etc, go to next page and do the same, etc.
>
> I may never be heard from again.... :-O
>
> Optimally, I need something that will cat a directory worth of html files with
> some limited intelligence to strip out headers and metadata and such so the
> whole lot would end up as one file that is readable. Maybe a <printing> page
> break between what used to be the individual pages.
>
> Even a command line tool. I'm no Perl expert, but I think Perl would be well
> suited (but beyond my abilities these days). It would have to incorporate a lot
> of html knowledge though to selectively strip out stuff as it wrote the one big
> file as a one file html doc...
>
> -Skip
--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist

Re: need code

by AGSCalabrese

I presume there will be images to include in the concatenated
documents.  How do you plan to deal with them ?
Does one find an image URL in the HTML and grab the image, put it
in a local folder with an updated name and updated link in the
concatenated HTML ?
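
One way to picture that approach, as a rough Python sketch only: find each
<img src=...>, work out its absolute URL, pick a local file name with a
per-page prefix so names from different pages do not collide, and rewrite the
link. The names localize_images and fetch_all, the "images" folder and the
example.com URL are all made up for illustration, and the regular expression
assumes fairly simple, quoted src attributes.

```python
import os
import posixpath
import re
import urllib.request
from urllib.parse import urljoin, urlsplit

# Matches <img ... src="..."> or src='...'; assumes the attribute is quoted.
IMG_SRC_RE = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def localize_images(html, base_url, prefix, image_dir="images"):
    """Rewrite every <img src> to a prefixed file name under image_dir.

    Returns the rewritten HTML and a list of (absolute_url, local_path)
    pairs that still have to be downloaded.
    """
    downloads = []

    def rewrite(match):
        # Resolve relative links against the page the HTML came from.
        absolute = urljoin(base_url, match.group(3))
        name = posixpath.basename(urlsplit(absolute).path) or "image"
        local = f"{image_dir}/{prefix}_{name}"
        downloads.append((absolute, local))
        quote = match.group(2)
        return f"{match.group(1)}{quote}{local}{quote}"

    return IMG_SRC_RE.sub(rewrite, html), downloads


def fetch_all(downloads):
    """Download each image to its local path (needs network access)."""
    for url, local in downloads:
        os.makedirs(os.path.dirname(local), exist_ok=True)
        urllib.request.urlretrieve(url, local)
```

Calling localize_images(page_html, "http://example.com/docs/page1.html",
"page1") would return the page with its image links pointing into images/,
plus the list to hand to fetch_all. Images referenced from CSS or scripts are
not handled here.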

Gus

Re: need code

by Dr Skip

The older tool from Canon (it was commercial ware but in the deep discount $5
bin when I got it for Win 95) did all that. Point it at a page, give it some
selection as to how deep to go and what domains (much like HTTrack or wget) or
dirs and filespecs if any, and it fetched and put it in one big doc. I don't
remember its file format in the end, but it could be printed to any printer
including PDF (if one had Acrobat in those days). I never got it to work on XP
or NT, so I don't even know where it is now. Probably went to the thrift shop.

It was very useful, but I think Canon just didn't want to be in the software
biz unless it was based on a specific hard product of theirs. I also thought it
was such an obviously useful tool that there would be more like it as the web
took off.

Now I can't find anything like it, except for web spiders that will recreate
the site dir locally but not put everything in one doc...

Budgeting here is an odd activity - don't ask. ;) Given the fact that the
function seems so obvious for a tool like Canon had, and the condition of the
economy, et al, it just needs to get done in one's 'extra' time...

-Skip

AGSCalabrese wrote:
> I presume there will be images to include in the concatenated
> documents.  How do you plan to deal with them ?
> Does one find an image URL in the HTML and grab the image, put it
> in a local folder with an updated name and updated link in the
> concatenated HTML ?
>
> Gus

Re: need code

by Peter Restall


On Jul 03, 2009; 11:44pm, Dr Skip wrote:

> I'll try to take a look at this, but I suspect it will be view a page - save
> the page - give the page a name, etc, go to next page and do the same, etc.
>
> I may never be heard from again.... :-O

I pointed out the HTMLHelp compiler precisely because of this; there is a
command-line interface for HTMLHelp that the Sandcastle tool uses in its
scripts.  Basically Sandcastle takes one or more XML files of comments,
transforms them via XSLT into HTML documents (one document per page - which
is one per class/method/property, etc. - a fair few in a large project) and
then invokes the HTMLHelp tool to compile them all together into a single CHM
that can be viewed in the Windows Help Viewer (that dodgy util that pops up
when you hit F1 in an application).  Very useful for documenting .NET APIs.

AFAIK, there are two HTMLHelp versions - 1.4 (I think) is phased out in Vista,
where it's the newfangled 2.0 (again, I think).  Sandcastle can be downloaded
from:

http://www.microsoft.com/Downloads/details.aspx?FamilyID=e82ea71d-da89-42ee-a715-696e3a4873b2&displaylang=en

But you won't want Sandcastle; scroll down the page and there's the list of
links to the required/optional software.  It looks like they've changed the
name to HTML Help Workshop, although I reckon it's the MS Help Compiler that
you're after.

I'll insert a disclaimer in case I'm leading you on a wild goose chase :)
I've only ever used this in the context of running Sandcastle, which calls
this tool in its scripts; therefore the (rather large ?) assumption is that
you should also be able to call this tool from within a simple script - even
a batch script - that will glue your HTML bits and pieces together.  But
reading your other messages, it may not be what you're after - if you're
looking for a PDF of all the pages then this tool won't do.  Although
obviously once you have generated a CHM there are other manipulations you can
do to it.

In regards to your comments about stripping out headers and the like, it may
be worthwhile looking into XSLT; but that will only work if the HTML files are
reasonably well-formed and pretty uniform (otherwise you'll end up writing as
many XSLTs as there are file variations - we'd never see you again if you
ended up doing that either :)  And if you've never used XSLTs before, then as
fantastic as they are, they're likely to turn you into a basket case as well
(another 'write-only' language) !

Without too many specifics and a fair number of assumptions, I'd reckon an
initial stab at a batch script would be something like:

        wget your HTML files
        for %%f in ( *.html ) do xslt transform
        call htmlhelp to generate chm from transformed xslt

But it may be more work than you're after - if you can find an off-the-shelf
tool that does the job for a couple of quid then it would save you a great
deal of time and effort.  Unfortunately I do not know what they would call
such a tool !
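
For the last step of the outline above, the help compiler works from a project
file that names the output CHM and lists the topic pages. The following Python
is only a sketch of generating such a file for a directory of pages: the
function name write_hhp and the default file names are invented here, and the
[OPTIONS]/[FILES] layout shown is the minimal form as I understand it, so
check it against the HTML Help Workshop documentation before relying on it.

```python
from pathlib import Path


def write_hhp(html_dir, project_name="pages.hhp", chm_name="pages.chm",
              title="Combined pages"):
    """Write a minimal HTML Help project file next to the pages it lists.

    Topics are listed in file-name order; the first one becomes the
    default topic. Returns the path of the project file.
    """
    html_dir = Path(html_dir)
    topics = sorted(p.name for p in html_dir.glob("*.html"))
    if not topics:
        raise ValueError(f"no .html files found in {html_dir}")
    lines = [
        "[OPTIONS]",
        f"Compiled file={chm_name}",
        f"Default topic={topics[0]}",
        f"Title={title}",
        "",
        "[FILES]",
        *topics,
    ]
    project = html_dir / project_name
    project.write_text("\n".join(lines) + "\n")
    return project
```

The resulting project file would then be passed to the command-line compiler
that ships with HTML Help Workshop (hhc, as far as I know) from the batch
script.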

Regards,

Pete Restall

Re: need code

by M.L.-2

On Thu, Jul 2, 2009 at 9:12 PM, Dr Skip<drskip@...> wrote:

>
> I'm looking for something as simple as a Perl script to something as a
> stand-alone windows program that will take web pages (locally stored perhaps)
> and 'compile' them into a single document.


If the HTML is relatively simple it sounds trivial to write a script
that appends HTML files into a single HTML file but cuts out
extraneous header and style info.
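
A minimal sketch of that idea in Python, assuming the pages are simple enough
that a regular expression can find the <body> section: keep only what is
inside <body>, drop <script> and <style> blocks, and put a CSS page break
between what used to be separate pages. The function names and the default
title are invented for illustration; badly formed HTML would need a real
parser.

```python
import re
from pathlib import Path

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
STRIP_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>",
                      re.IGNORECASE | re.DOTALL)
# Printing browsers start a new page at this element.
PAGE_BREAK = '<div style="page-break-before: always"></div>'


def extract_body(html):
    """Return the markup inside <body>, minus <script> and <style> blocks."""
    match = BODY_RE.search(html)
    inner = match.group(1) if match else html  # no <body> tag: keep it all
    return STRIP_RE.sub("", inner).strip()


def concatenate(paths, title="Combined pages"):
    """Join the bodies of the given HTML files into one HTML document."""
    bodies = [
        extract_body(Path(p).read_text(encoding="utf-8", errors="replace"))
        for p in paths
    ]
    joined = f"\n{PAGE_BREAK}\n".join(bodies)
    return (f"<html><head><title>{title}</title></head>\n"
            f"<body>\n{joined}\n</body></html>\n")
```

The order of the pages is simply the order of the list passed in, so
something like concatenate(sorted(Path("pages").glob("*.html"))) would give
file-name order; the "pages" directory is a placeholder.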

--
Martin K.

Re: need code

by Marechiare

> I'm looking for something as simple as a Perl script to
> something as a stand-alone windows program that will
> take web pages (locally stored perhaps) and 'compile'
> them into a single document. Canon sold one at one
> time (Win95 era) - it would do it as whole pages, and
> shrink to fit (so it must have done them as images, but
> they were very clear), and I think you could tell it how
> deep you wanted or by individual pages.

I'm afraid I don't get the main point - how is that connected to the [EE]
tag? Do you need to pack the code into some electronics? What
electronics do you target with a Perl script and a stand-alone Windows
program?

Thanks

Re: need code

by Gerhard Fiedler

Dr Skip wrote:

> I'm looking for something as simple as a Perl script to something as a
> stand-alone windows program that will take web pages (locally stored
> perhaps) and 'compile' them into a single document.

I'm not sure what custom pre-processing you want to do, but just
printing all documents in a folder to PDF doesn't seem so complex.
pdfFactory for example is a PDF printer, and until you (manually) save
the document, it just accumulates everything printed to it into a single
PDF.

So get pdfFactory (they have an eval version), print a few files to it
and see whether that's what you want. Then write a simple script that
prints all files in a directory to it and you're done. Manual
intervention is needed to save the PDF when it's done printing all files
in the directory.
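
The "simple script" could look roughly like the Python below. It is a sketch
under assumptions: it relies on os.startfile with the "print" verb, which only
exists on Windows and hands each file to whatever program is registered for
printing that file type, and it assumes the PDF printer has been made the
default printer. The function name and the send_to_printer hook are invented
here; the hook is only there so the loop can be tried without a printer.

```python
import os
from pathlib import Path


def print_directory(directory, pattern="*.html", send_to_printer=None):
    """Send every matching file to the default printer, in name order.

    Returns the file names in the order they were submitted.
    """
    if send_to_printer is None:
        # Windows only: ask the shell to run the "print" action on the file.
        def send_to_printer(path):
            os.startfile(path, "print")

    files = sorted(Path(directory).glob(pattern))
    for path in files:
        send_to_printer(str(path))
    return [p.name for p in files]
```

Print jobs started this way may run asynchronously, so the order in which
pages reach the PDF printer is not guaranteed; a pause between files might be
needed in practice.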

There are probably other PDF printers out there that work similarly.

Gerhard