[patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

Dear Cabal maintainers,

A couple of days ago I was unable to cabal-install a library from
Hackage -- 00-index.tar.gz was not downloaded completely.  Even after I
fetched the index with wget, the other .tar.gz files (packages) still
could not be downloaded in full.

This problem has been reported already (see
http://hackage.haskell.org/trac/hackage/ticket/562).

06/11/09 09:49:25 changed by duncan:
> Why has this started cropping up all of a sudden? Never seen this
> before then 3 reports in as many days. Do we suspect HTTP-4000.0.6 ->
> 7 perhaps?

06/11/09 13:02:47 changed by michaeldever:
> So it's definitely a problem with the HTTP package in my opinion. I'm
> not sure if it is a problem with the package's proxy handling, as it
> does download some of the package, but not all of it.
>
> Seeing as both the Zlib library and tar yield an end-of-stream error,
> it's something that I reckon is happening during transport.

06/13/09 09:05:26 changed by michaeldever:
> http://trac.haskell.org/http/ticket/8#comment:1

The bug is not in the HTTP API, but in the way cabal-install uses it.

The type of the HTTP response body is polymorphic within the `HTTP' library
(rspBody :: Response a -> a), but it is specialized to Lazy.ByteString by
cabal-install's `getHTTP' function (Distribution/Client/HttpUtils.hs).

Once the type of the response body is changed to _strict_ ByteString, files
get downloaded through the proxy completely.

The attached module [proxy-POC.hs] makes this quite apparent (you need
to be behind a proxy; HTTP >= 4000.0.8):

    vvv@takeshi:~/src$ time runhaskell proxy-POC.hs
    Content-Length:   1200593
    bytes downloaded: 2408
    proxy-POC.hs: user error (sizes differ)
   
    real    0m1.210s
    user    0m0.556s
    sys     0m0.052s
    vvv@takeshi:~/src$ time runhaskell -DSTRICT proxy-POC.hs
    Content-Length:   1200593
    bytes downloaded: 1200593
   
    real    0m17.956s
    user    0m0.620s
    sys     0m0.028s
    vvv@takeshi:~/src$ runhaskell proxy-POC.hs    # repeatable
    Content-Length:   1200593
    bytes downloaded: 2408
    proxy-POC.hs: user error (sizes differ)

There are only 4 lines that need to be changed (2 in HttpUtils.hs and 2
in Fetch.hs); see the accompanying patch.
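
For readers without the patch at hand, here is a minimal sketch of the
conversion used on the Fetch.hs side: once the body arrives as a strict
ByteString, it is wrapped as a one-chunk lazy ByteString before being
written out. The helper name `toLazyBody' and the sample bytes are
assumptions made for illustration; only `fromChunks' comes from the patch.

```haskell
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy

-- | Wrap a fully received strict body as a one-chunk lazy ByteString,
-- mirroring the `fromChunks [rspBody rsp]` call in the Fetch.hs patch.
-- (The name toLazyBody is hypothetical.)
toLazyBody :: Strict.ByteString -> Lazy.ByteString
toLazyBody body = Lazy.fromChunks [body]

main :: IO ()
main = do
  -- Hypothetical stand-in for a downloaded body (gzip magic bytes).
  let body = Strict.pack [0x1f, 0x8b, 0x08, 0x00]
      lazy = toLazyBody body
  -- The conversion neither drops nor adds bytes.
  print (Lazy.length lazy == fromIntegral (Strict.length body))
```

The conversion itself is cheap; the cost of the strict variant is that the
whole body is held in memory before anything is written.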

...And could anyone explain to me why lazy ByteStrings don't cause
cropped downloads in a proxy-free environment?

Thank you.

--
vvv


diff -ru /tmp/cabal-install-0.6.2/Distribution/Client/Fetch.hs ./Distribution/Client/Fetch.hs
--- /tmp/cabal-install-0.6.2/Distribution/Client/Fetch.hs 2009-02-19 15:07:52.000000000 +0200
+++ ./Distribution/Client/Fetch.hs 2009-09-18 18:34:41.989206160 +0300
@@ -73,6 +73,7 @@
          ( Response(..) )
 import Network.Stream
          ( ConnError(..) )
+import Data.ByteString.Lazy ( fromChunks )
 
 
 downloadURI :: Verbosity
@@ -89,7 +90,7 @@
     Right rsp
       | rspCode rsp == (2,0,0)
      -> do info verbosity ("Downloaded to " ++ path)
-           writeFileAtomic path (rspBody rsp)
+           writeFileAtomic path $ fromChunks [rspBody rsp]
      --FIXME: check the content-length header matches the body length.
      --TODO: stream the download into the file rather than buffering the whole
      --      thing in memory.
diff -ru /tmp/cabal-install-0.6.2/Distribution/Client/HttpUtils.hs ./Distribution/Client/HttpUtils.hs
--- /tmp/cabal-install-0.6.2/Distribution/Client/HttpUtils.hs 2009-02-19 15:07:52.000000000 +0200
+++ ./Distribution/Client/HttpUtils.hs 2009-09-18 18:35:34.722456689 +0300
@@ -15,8 +15,8 @@
          , setOutHandler, setErrHandler, setProxy, request)
 import Control.Monad
          ( mplus, join, liftM2 )
-import qualified Data.ByteString.Lazy as ByteString
-import Data.ByteString.Lazy (ByteString)
+import qualified Data.ByteString as ByteString
+import Data.ByteString (ByteString)
 #ifdef WIN32
 import System.Win32.Types
          ( DWORD, HKEY )



{-# LANGUAGE CPP #-}
{-# OPTIONS_GHC -Wall #-}
-- | Re: cabal-install update fails going through a HTTP proxy
-- [hackage ticket #562]
--
-- See http://hackage.haskell.org/trac/hackage/ticket/562#comment:9
--
module Main where

import Network.HTTP.Proxy (fetchProxy)
import Network.HTTP (Request(..), Response(..), Header(..), RequestMethod(..),
                     HeaderName(..))
import Network.HTTP.Headers (lookupHeader)
import Network.Browser (browse, request, setProxy, setOutHandler, setErrHandler)
import Network.Stream (Result)

import System.Environment (getArgs)
import Network.URI (URI(..), parseURI)
import Distribution.Simple.Utils (warn, debug)
import Distribution.Verbosity (normal, {-deafening,-} Verbosity)
import Data.Maybe (fromMaybe)
import Control.Monad (when)

--- Problems (with downloading via a proxy) disappear once strict
--- ByteStrings are used.  Thus the fix is to remove `.Lazy' suffixes.

-- #define STRICT
#ifdef STRICT
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
#else
import qualified Data.ByteString.Lazy as BS
import Data.ByteString.Lazy (ByteString)
#endif

{-----------------------------------------------------------------------
vvv@takeshi:~/src$ time runhaskell proxy-POC.hs
Content-Length:   1200593
bytes downloaded: 2408
proxy-POC.hs: user error (sizes differ)

real    0m1.210s
user    0m0.556s
sys     0m0.052s
vvv@takeshi:~/src$ time runhaskell -DSTRICT proxy-POC.hs
Content-Length:   1200593
bytes downloaded: 1200593

real    0m17.956s
user    0m0.620s
sys     0m0.028s
-----------------------------------------------------------------------}

main :: IO ()
main = do
  args <- getArgs
  rsp  <- fetch $ (args ++ ["http://hackage.haskell.org/packages/archive/"
                            ++ "00-index.tar.gz"]) !! 0
  let lenHeader = fromMaybe "" $ lookupHeader HdrContentLength (rspHeaders rsp)
      lenBody   = show $ BS.length $ rspBody rsp
  report lenHeader lenBody

report :: String -> String -> IO ()
report hdr bdy = do
  when (avail hdr) (putStrLn $ "Content-Length:   " ++ hdr)
  putStrLn ("bytes downloaded: " ++ bdy)
  when (avail hdr && hdr /= bdy) (fail "sizes differ")
    where avail = not . null

fetch :: String -> IO (Response ByteString)
fetch s = do
  case parseURI s of
    Nothing -> fail ("fetch: unable to parse URI: " ++ s)
    Just u  -> do
            Right rsp <- getHTTP normal u
            return rsp

------------------------------------------------------------------------
-- The following functions are copied (with minimal changes) from
-- cabal-install's 'Distribution.Client.HttpUtils'.

-- |Carry out a GET request, using the local proxy settings
getHTTP :: Verbosity -> URI -> IO (Result (Response ByteString))
getHTTP verbosity uri = do
                 -- p   <- proxy verbosity
                 p <- fetchProxy False
                 let req = mkRequest uri
                 (_, resp) <-
                     browse $ do
                          setErrHandler (warn verbosity . ("http error: "++))
                          setOutHandler (debug verbosity)
                          setProxy p
                          request req
                 return (Right resp)

mkRequest :: URI -> Request ByteString
mkRequest uri = Request{ rqURI     = uri
                       , rqMethod  = GET
                       , rqHeaders = [Header HdrUserAgent userAgent]
                         ++ [Header HdrCacheControl "no-cache"] -- XXX *new*
                       , rqBody    = BS.empty }
  -- where userAgent = "cabal-install/" ++ display Paths_cabal_install.version
  where userAgent = "proxy-POC.hs"
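
The FIXME in Fetch.hs asks for a check that the Content-Length header
matches the body length, which is what `report' above does on strings.
A pure variant of that check might look like the sketch below. The type
`LengthCheck' and the function `checkLength' are hypothetical names, and
treating an unparseable header as absent is an assumption, not something
cabal-install is known to do.

```haskell
import qualified Data.ByteString.Lazy as Lazy
import Data.Char (isSpace)
import Data.Int (Int64)

-- | Outcome of comparing the Content-Length header with the body.
data LengthCheck
  = NoUsableHeader          -- header missing or not a number
  | Complete                -- lengths agree
  | Mismatch Int64 Int64    -- expected length, received length
  deriving (Eq, Show)

-- | Compare an optional Content-Length value with the actual body.
-- Note that computing the length forces the whole lazy body.
checkLength :: Maybe String -> Lazy.ByteString -> LengthCheck
checkLength Nothing _ = NoUsableHeader
checkLength (Just hdr) body =
  case reads hdr of
    [(expected, rest)]
      | all isSpace rest ->
          let received = Lazy.length body
          in if received == expected
               then Complete
               else Mismatch expected received
    _ -> NoUsableHeader

main :: IO ()
main = do
  -- 2408 zero bytes stand in for the truncated download seen above.
  let body = Lazy.replicate 2408 0
  print (checkLength (Just "1200593") body)
  print (checkLength (Just "2408") body)
  print (checkLength Nothing body)
```

Reporting the mismatch at download time would point at the transport
rather than at zlib or tar, which only see the truncated file later.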


_______________________________________________
Libraries mailing list
Libraries@...
http://www.haskell.org/mailman/listinfo/libraries

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

On Sat, Sep 19, 2009 at 3:17 AM, Duncan Coutts <duncan@...> wrote:
> But that is exactly what makes me think it's a bug in the HTTP library.
> The HTTP library provides instances for String, strict ByteString and
> lazy ByteString. With one instance provided by the HTTP library your
> test program fails and with the other it works.

I see...

> It not completely implausible that it could be the fault of the way we
> use the HTTP library. For example if we were holding onto the lazy
> ByteString for a long period without demanding all of it then perhaps
> that could upset the network flow by causing timeouts or something,
> however I don't think anything like that is going on here. The code
> pretty swiftly takes the response and writes the content out to disk.
>
> As I mentioned in the Cabal ticket, I'd be very interested in hearing
> Sigbjorn's diagnosis before considering whether we want to work around
> the problem by switching to strict ByteString. If at all possible I
> would prefer to stick to lazy ByteString.

Okay, let's wait then.

Thank you, Duncan.

--
vvv

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Felipe Lessa

On Sat, Sep 19, 2009 at 12:50:54AM +0300, Valery V. Vorotyntsev wrote:
> ...And could anyone explain to me why lazy ByteStrings don't cause
> cropped downloads in a proxy-free environment?

What about a lazy ByteString with rnf as soon as possible?

--
Felipe.

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Duncan Coutts

On Sat, 2009-09-19 at 09:46 -0300, Felipe Lessa wrote:
> On Sat, Sep 19, 2009 at 12:50:54AM +0300, Valery V. Vorotyntsev wrote:
> > ...And could anyone explain to me why lazy ByteStrings don't cause
> > cropped downloads in a proxy-free environment?
>
> What about a lazy ByteString with rnf as soon as possible?

That's essentially what we are doing by writing the result to file.
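
For illustration, forcing a lazy ByteString right after it is obtained
could be sketched as follows. This uses `evaluate' on the length instead
of `rnf', which has the same effect of walking every chunk; the helper
name `forceBody' and the sample input are assumptions, and in the real
code the body would come from the HTTP response.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as Lazy
import qualified Data.ByteString.Lazy.Char8 as Lazy8

-- | Force every chunk of a lazy ByteString before returning it.
-- Computing the length has to traverse the whole chunk list, so any
-- pending reads behind the lazy value happen here, not later.
-- (The name forceBody is hypothetical.)
forceBody :: Lazy.ByteString -> IO Lazy.ByteString
forceBody body = do
  _ <- evaluate (Lazy.length body)
  return body

main :: IO ()
main = do
  -- Hypothetical stand-in for a response body.
  body <- forceBody (Lazy8.pack "stand-in for a response body")
  print (Lazy.length body)
```

Writing the body to a file demands the same chunks in the same order,
which is why the two approaches are expected to behave alike.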

Duncan


Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

On Sun, Nov 15, 2009 at 12:10 AM, Sigbjorn Finne
<sigbjorn.finne@...> wrote:
>
> I've got a tentative fix in for this, not disabling closing altogether (that
> wouldn't be compliant with Connection:close handling), but delaying
> it until EOF is reached...at least that's the intent. It may not be lazy
> enough, but I'm unable to verify either way right now.
>
> If anyone's interested in testing, the repo contains the changes made --
>
>  git://code.galois.com/HTTPbis.git/

Thanks, Sigbjorn!

I'll test it this Monday (by running `cabal update' with new HTTP).

--
vvv

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

> On Sun, Nov 15, 2009 at 12:10 AM, Sigbjorn Finne
> <sigbjorn.finne@...> wrote:
>>
>> I've got a tentative fix in for this, not disabling closing altogether (that
>> wouldn't be compliant with Connection:close handling), but delaying
>> it until EOF is reached...at least that's the intent. It may not be lazy
>> enough, but I'm unable to verify either way right now.
>>
>> If anyone's interested in testing, the repo contains the changes made --
>>
>>  git://code.galois.com/HTTPbis.git/

On Sun, Nov 15, 2009 at 12:52 AM, Valery V. Vorotyntsev
<valery.vv@...> wrote:
> Thanks, Sigbjorn!
>
> I'll test it this Monday (by running `cabal update' with new HTTP).

Sorry, Sigbjorn, it didn't work.

I've installed HTTP-4000.0.9 (the git version) and rebuilt and reinstalled
cabal-install with it. The problem persists:

    $ cabal update
    Downloading the latest package list from hackage.haskell.org
    cabal: Codec.Compression.Zlib: premature end of compressed stream
    $
    $ runhaskell proxy-POC.hs
    Content-Length:   1304519
    bytes downloaded: 2408
    proxy-POC.hs: user error (sizes differ)

I've rolled back to patched HTTP-4000.0.8, the one with connection
closing commented out in `sendHTTP_notify'. I need this proxy
downloading thing to just work...

                                * * *

AFAIU, you cannot test HTTP operation over a proxy server, can you? I'm
not sure, but installing squid might help expose the bug.

And if you let me speculate a bit... What if we employ ByteString's
hGetContents for reading from the connection?  `hGetContents' closes
the handle automatically upon reaching EOF[1]. And since ByteString does
its own buffering, the HTTP package would not need as many
{read,write}Block, buffer{Get,Put}Block calls[2] as there are now. The
code would be simpler, with fewer places for our current bug to hide.

  [1] http://is.gd/4W9sp
  [2] http://is.gd/4W97Y

This is just an idea, nothing more.
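
The close-on-EOF behaviour mentioned above can be observed with a small
sketch. It reads from a temporary file rather than a socket, so it says
nothing about proxies or the HTTP package; the file name and contents are
made up for the demonstration, and `readAllAndCheck' is a hypothetical
helper.

```haskell
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Strict8
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO
  ( Handle, IOMode(ReadMode), hClose, hIsClosed
  , openBinaryFile, openBinaryTempFile )

-- | Read everything from a handle with the strict hGetContents and
-- report how many bytes arrived and whether the handle is now closed.
readAllAndCheck :: Handle -> IO (Int, Bool)
readAllAndCheck h = do
  body   <- Strict.hGetContents h
  closed <- hIsClosed h
  return (Strict.length body, closed)

main :: IO ()
main = do
  -- A temporary file stands in for the network connection (assumption).
  tmpDir      <- getTemporaryDirectory
  (path, out) <- openBinaryTempFile tmpDir "hgetcontents-demo.bin"
  Strict.hPut out (Strict8.pack "stand-in for a response body")
  hClose out

  h <- openBinaryFile path ReadMode
  (n, closed) <- readAllAndCheck h
  putStrLn ("bytes read: " ++ show n)
  putStrLn ("handle closed after read: " ++ show closed)
  removeFile path
```

Whether a handle obtained from a socket behaves the same way behind the
office proxy is exactly the open question, so this only demonstrates the
library contract, not a fix.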

--
Regards,
vvv

Re: #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

On Mon, Nov 16, 2009 at 3:26 PM, Sigbjorn Finne
<sigbjorn.finne@...> wrote:
> Thanks - that's good, now we know :)  I suspect tempering the notion of
> close and doing
> a socket shutdown is the way forward here. I'll see if I can play with this
> some tonight.

Good luck with that!  :)

--
vvv

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

> Valery V. Vorotyntsev wrote:
>> Sorry, Sigbjorn, it didn't work.

Sigbjorn Finne wrote:
> Thanks - that's good, now we know :)

I'm not so sure any more.

>> I've rolled back to patched HTTP-4000.0.8, the one with connection
>> closing commented out in `sendHTTP_notify'. I need this proxy
>> downloading thing to just work...

Cabal client reinstalled with "patched" HTTP-4000.0.8 was still having
occasional failures today. My proxy-POC.hs always succeeded, while
cabal failed quite often with the familiar

    cabal: Codec.Compression.Zlib: premature end of compressed stream

This means that proxy-POC.hs is not trustworthy: it doesn't expose the
bug in 100% of cases.

I am going to roll back even further tomorrow: I'll patch [1]
cabal-install to use strict ByteStrings and see if /that/ does the
job.  (I'm not sure of anything any more.)

  [1] http://hpaste.org/fastcgi/hpaste.fcgi/view?id=9447#a9504

> I suspect tempering the notion of close and doing a socket shutdown
> is the way forward here.

So do I...  If my suspicions are of any significance after several
delusive "Wolf!" cries [2,3].

  [2] http://hackage.haskell.org/trac/hackage/ticket/562#comment:9
  [3] http://trac.haskell.org/http/ticket/8#comment:7

> I'll see if I can play with this some tonight.

I've installed squid proxy server. Gotta make this bug exposable...

--
Regards,
vvv

Re: [patch] #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

> Valery V. Vorotyntsev wrote:
>> Could you please notify me, if you make any commits tomorrow in day time?
>>
>> git port is not accessible for me in working hours: the proxied LAN allows
>> outbound connection to 80 and 443 ports only. But when I know there are
>> new commits, I'll establish GPRS connection and download 'em.

Sigbjorn Finne wrote:
> Certainly, but no immediate plans to do so -- you can see repo updates via
> http://code.galois.com/
> (and subscribe to feeds to be notified that way.)

Oh, I didn't know about these feeds. Thanks!

> It'd be nice if code.galois allowed clone/pulls via http..

It would be, yes. Only a few sites seem to care.

I'm starting to wonder whether proxied [office] LANs are really that rare
nowadays.  :)

--
vvv

Re: #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

On Mon, Nov 16, 2009 at 11:15 PM, Valery V. Vorotyntsev
<valery.vv@...> wrote:

> Cabal client reinstalled with "patched" HTTP-4000.0.8 was still having
> occasional failures today. My proxy-POC.hs always succeeded, while
> cabal failed quite often with the familiar
>
>    cabal: Codec.Compression.Zlib: premature end of compressed stream
>
> This means that proxy-POC.hs is not trustworthy: it doesn't expose the
> bug in 100% of cases.
>
> I am going to roll back even further tomorrow: I'll patch [1]
> cabal-install to use strict ByteStrings and see if /that/ does the
> job.  (I'm not sure of anything any more.)
>
>  [1] http://hpaste.org/fastcgi/hpaste.fcgi/view?id=9447#a9504

Strict bytestrings in cabal-install do the job, yes.

> I've installed squid proxy server. Gotta make this bug exposable...

Downloading through local squid did not manifest the bug.

                                * * *

Sigbjorn, I am stuck. If you have some idea on how to pursue this bug
(print debugging with Debug.Trace?), I'll be delighted to test it.

The bug is reproducible from within the office LAN (read: ``business
hours, 08:00 to 16:00 UTC''):

    $ cabal update -v3
    Downloading the latest package list from hackage.haskell.org
    Sending:
    GET http://hackage.haskell.org/packages/archive/00-index.tar.gz HTTP/1.1
    Proxy-Authorization: Basic SG93IGFyZSB5b3U/
    User-Agent: cabal-install/0.6.2
    Host: hackage.haskell.org
    proxy uri host: xproxy, port: :3128
    Creating new connection to xproxy:3128
    Received:
    HTTP/1.0 200 OK
    Date: Mon, 23 Nov 2009 12:15:23 GMT
    Server: Apache/2.2.3 (Debian)
    Last-Modified: Mon, 23 Nov 2009 01:21:45 GMT
    ETag: "388dda-141754-a6117c40"
    Accept-Ranges: bytes
    Content-Length: 1316692
    Content-Type: application/x-tar
    Content-Encoding: x-gzip
    X-Cache: MISS from xproxy.foo.bar.ua
    X-Cache-Lookup: MISS from xproxy.foo.bar.ua:3128
    Via: 1.1 xproxy.foo.bar.ua:3128 (squid/2.7.STABLE6)
    Connection: close
    Downloaded to /home/vvv/.cabal/packages/hackage.haskell.org/00-index.tar.gz
    cabal: Codec.Compression.Zlib: premature end of compressed stream

Have fun!

--
vvv

Re: #562: cabal-install update fails going through a HTTP proxy

by Valery V. Vorotyntsev

Sigbjorn Finne <sigbjorn.finne@...> wrote:
>
> Despite repeated attempts from this end, I am unable to reproduce
> this via local proxies.
>
> Hence chasing down whatever problem is biting you (and others) is
> tricky without a repro case.

I'll try this change tomorrow:

-----BEGIN DIFF-----
diff --git a/Network/TCP.hs b/Network/TCP.hs
index 7d3dbe7..04b8e2a 100644
--- a/Network/TCP.hs
+++ b/Network/TCP.hs
@@ -34,7 +34,7 @@ module Network.TCP

 import Network.BSD (getHostByName, hostAddresses)
 import Network.Socket
-   ( Socket, SockAddr(SockAddrInet), SocketOption(KeepAlive)
+  ( Socket, SockAddr(SockAddrInet), SocketOption(KeepAlive, Linger)
    , SocketType(Stream), inet_addr, connect
    , shutdown, ShutdownCmd(..)
    , sClose, setSocketOption, getPeerName
@@ -189,6 +189,7 @@ openTCPConnection_ :: BufferType ty => String -> Int -> Bool -
 openTCPConnection_ uri port stashInput = do
     s <- socket AF_INET Stream 6
     setSocketOption s KeepAlive 1
+    setSocketOption s Linger 5
     hostA <- getHostAddr uri
     let a = SockAddrInet (toEnum port) hostA
     catchIO (connect s a) (\e -> sClose s >> ioError e)
-----END DIFF-----

| If there is still data waiting to be transmitted over the
| connection, normally `close' tries to complete this transmission.
| You can control this behavior using the `SO_LINGER' socket option to
| specify a timeout period; see *note _Socket Options_.
 [http://www.gnu.org/s/libc/manual/html_node/Closing-a-Socket.html]

--
vvv