Read attached file from webpage

by caymanag

I am trying to figure out how to use HttpClient to read a file that a web page serves as an attachment. Below are the (slightly simplified) request and response headers from a browser session. Note that the response headers specify the file as Content-Disposition: attachment; filename="myfile.csv". A web browser downloads this file correctly, but I don't know how to read it from HttpClient code. My guess is that I am not handling the chunked transfer encoding, and that this is why I read 0 bytes.
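
HttpClient 3.x removes the chunked framing itself: the stream returned by getResponseBodyAsStream() already yields only the chunk payloads, so Transfer-Encoding: chunked on its own should not cause an empty read. To show what that decoding step involves, here is a minimal JDK-only sketch of a chunked-body decoder. ChunkedDecoder is a hypothetical name; chunk extensions are ignored and trailer headers are skipped rather than parsed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration only: HttpClient already performs this decoding internally.
// Simplifications: chunk extensions (";name=value") are dropped, trailers are skipped.
final class ChunkedDecoder {

    // Reads one CRLF-terminated line (without the CRLF); returns "" at end of stream.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');                 // strip any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);             // chunk size is hexadecimal
            if (size == 0) {
                // Last chunk: skip trailer lines up to the terminating empty line.
                String trailer;
                do {
                    trailer = readLine(in);
                } while (!trailer.isEmpty());
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {                              // read exactly `size` bytes
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new IOException("Truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                                     // consume CRLF after chunk data
        }
    }
}
```

For example, the wire bytes 4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n decode to the nine bytes of "Wikipedia", while a body consisting only of 0\r\n\r\n decodes to zero bytes.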

Request headers:
GET /dir1/dir2/dir3?par1=val1&par2=val2&par3=val3 HTTP/1.1
Host: webhost.server.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: cookie1=cookievalue

Response headers:
HTTP/1.1 200 OK
Set-Cookie: cookie2=value2
Content-Disposition: attachment; filename="myfile.csv"
Content-Type: text/csv; charset=UTF-8
Content-Encoding: gzip
Transfer-Encoding: chunked
Date: Wed, 09 Sep 2009 02:31:38 GMT
Expires: Wed, 09 Sep 2009 02:31:38 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
Server: servername

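The Content-Encoding: gzip in the browser's response is most likely there because the browser asked for it with Accept-Encoding: gzip,deflate. The HttpClient request in the debug log below sends no Accept-Encoding header, and that response carries no Content-Encoding. HttpClient 3.x does not decompress response bodies on its own, so if that request header is added to mimic the browser, the caller has to unwrap the body. A minimal JDK-only sketch; ContentDecoding is a hypothetical helper name, "deflate" is assumed to be zlib-wrapped, and unknown codings are passed through unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: undoes a Content-Encoding that HttpClient 3.x leaves in place.
final class ContentDecoding {
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw;                                   // no Content-Encoding header: identity
        }
        String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
        if (enc.equals("gzip") || enc.equals("x-gzip")) {
            return new GZIPInputStream(raw);              // reads the gzip header immediately
        }
        if (enc.equals("deflate")) {
            return new InflaterInputStream(raw);          // assumes zlib-wrapped deflate data
        }
        return raw;                                       // identity or unknown: pass through
    }
}
```

In HttpClient 3.x terms, the inputs would be getMethod.getResponseBodyAsStream() and the value of the header returned by getMethod.getResponseHeader("Content-Encoding"), which is null when the header is absent.
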
Here are the debug messages:

2009/09/08 23:02:48:363 EDT [DEBUG] header - >> "GET /dir1/dir2/dir3?par1=val1&par2=val2&par3=val3 HTTP/1.1 HTTP/1.1[\r][\n]"
2009/09/08 23:02:48:364 EDT [DEBUG] HttpMethodBase - Adding Host request header
2009/09/08 23:02:48:366 EDT [DEBUG] header - >> "User-Agent: Jakarta Commons-HttpClient/3.1[\r][\n]"
2009/09/08 23:02:48:366 EDT [DEBUG] header - >> "Host:webhost.server.com[\r][\n]"
2009/09/08 23:02:48:367 EDT [DEBUG] header - >> "Cookie: cookie1=cookieval[\r][\n]"
2009/09/08 23:02:48:370 EDT [DEBUG] header - >> "[\r][\n]"
2009/09/08 23:02:48:439 EDT [DEBUG] header - << "HTTP/1.1 200 OK[\r][\n]"
2009/09/08 23:02:48:440 EDT [DEBUG] header - << "HTTP/1.1 200 OK[\r][\n]"
2009/09/08 23:02:48:441 EDT [DEBUG] header - << "Set-Cookie: cookie2=value2[\r][\n]"
2009/09/08 23:02:48:442 EDT [DEBUG] header - << "Content-disposition: attachment; filename="myfile.csv"[\r][\n]"
2009/09/08 23:02:48:443 EDT [DEBUG] header - << "Content-Type: text/csv; charset=UTF-8[\r][\n]"
2009/09/08 23:02:48:444 EDT [DEBUG] header - << "Transfer-Encoding: chunked[\r][\n]"
2009/09/08 23:02:48:444 EDT [DEBUG] header - << "Date: Wed, 09 Sep 2009 03:02:48 GMT[\r][\n]"
2009/09/08 23:02:48:445 EDT [DEBUG] header - << "Expires: Wed, 09 Sep 2009 03:02:48 GMT[\r][\n]"
2009/09/08 23:02:48:446 EDT [DEBUG] header - << "Cache-Control: private, max-age=0[\r][\n]"
2009/09/08 23:02:48:447 EDT [DEBUG] header - << "X-Content-Type-Options: nosniff[\r][\n]"
2009/09/08 23:02:48:448 EDT [DEBUG] header - << "Server: servername[\r][\n]"
2009/09/08 23:02:48:448 EDT [DEBUG] header - << "[\r][\n]"
2009/09/08 23:02:48:450 EDT [DEBUG] HttpMethodBase - Cookie accepted: "cookie2=value2"
2009/09/08 23:02:48:451 EDT [DEBUG] HttpConnection - Input data available
2009/09/08 23:03:51:935 EDT [DEBUG] header - << "[\r][\n]"
2009/09/08 23:03:51:936 EDT [DEBUG] HttpMethodBase - Resorting to protocol version default close connection policy
2009/09/08 23:03:51:936 EDT [DEBUG] HttpMethodBase - Should NOT close connection, using HTTP/1.1
2009/09/08 23:03:51:937 EDT [DEBUG] HttpConnection - Releasing connection back to connection manager.

Here is the code that I wrote:

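    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.GetMethod;

    // client (an HttpClient), urlString, queryStringNameValuePairs and
    // requestHeaders (both NameValuePair[]) are set up elsewhere.
    GetMethod getMethod = new GetMethod(urlString);
    try {
        // Add any query parameters
        if (queryStringNameValuePairs != null) {
            getMethod.setQueryString(queryStringNameValuePairs);
        }

        // Add any request headers
        if (requestHeaders != null) {
            for (NameValuePair nvp : requestHeaders) {
                getMethod.addRequestHeader(nvp.getName(), nvp.getValue());
            }
        }

        int statuscode = client.executeMethod(getMethod);

        if (statuscode == HttpStatus.SC_OK) {
            // Here is where I had hoped to read the file; decode with the charset
            // from Content-Type (UTF-8 here) rather than the platform default
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    getMethod.getResponseBodyAsStream(), getMethod.getResponseCharSet()));
            StringBuilder buf = new StringBuilder();
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                buf.append(line).append('\n');
            }
            System.out.println(buf.toString());  // prints nothing
        }
    } finally {
        // Return the connection to the connection manager even if something fails
        getMethod.releaseConnection();
    }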
       
Unfortunately, there is never any data to read.   Thank you,  Cayman.
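
One way to separate "the server sent no body" from "the body was lost on the way to the String" is to count the raw bytes before any Reader, charset or readLine() is involved. A minimal JDK-only sketch; BodyCopier is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a body byte for byte and reports how many bytes it saw.
// No Reader, charset or line splitting is involved, so a return value of 0 means
// the (already de-chunked) body really was empty.
final class BodyCopier {
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {   // -1 signals end of stream
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

With HttpClient 3.x the input would be getMethod.getResponseBodyAsStream(), which can return null when there is no body, and the output could be a FileOutputStream for myfile.csv.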

Re: Read attached file from webpage

by olegk

On Tue, Sep 08, 2009 at 08:55:34PM -0700, caymanag wrote:

I see no content sent by the server at all. Is this wire log complete?

Oleg
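
HttpClient 3.x logs message headers and message bodies to two separate loggers, httpclient.wire.header and httpclient.wire.content, so a wire log that contains only header lines cannot show whether a body arrived. With log4j, enabling httpclient.wire.content (or simply httpclient.wire) at DEBUG should add the body bytes to the log. The sketch below instead uses the commons-logging SimpleLog system properties described in the HttpClient 3.x logging guide; it must run before HttpClient is first used, and it switches the logging backend away from log4j, so treat it as a quick diagnostic. WireLogging is a hypothetical class name.

```java
// Hypothetical helper: turns on HttpClient 3.x wire logging (headers and content)
// through commons-logging's SimpleLog backend. Call it before any HttpClient class
// obtains its logger, e.g. first thing in main().
final class WireLogging {
    static void enable() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // "httpclient.wire" covers both httpclient.wire.header and httpclient.wire.content
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient", "debug");
    }
}
```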

> --
> View this message in context: http://www.nabble.com/Read-attached-file-from-webpage-tp25358160p25358160.html
> Sent from the HttpClient-User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@...
For additional commands, e-mail: httpclient-users-help@...


Re: Read attached file from webpage

by caymanag

The log is the DEBUG output I get from my session via log4j. From the response header Content-disposition: attachment; filename="myfile.csv" I can see that there is a file, myfile.csv, to be downloaded; I just don't know how to do it. If I paste the same GET URL into a web browser, I do get the file.

I changed the reader code as follows:
    int[] buf = new int[10000];
    int ptr = 0;
    while (ptr < buf.length) {
        buf[ptr++] = reader.read();  // read() returns -1 once the stream is exhausted
    }

All that I end up reading is 13,10 followed by lots of -1 values.
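
Codes 13 and 10 are carriage return and line feed, and Reader.read() returns -1 at end of stream (and keeps returning -1 afterwards), so what the loop above saw was a single CRLF and then nothing. A minimal sketch that stops at end of stream instead of recording -1 repeatedly; CharDump and its limit parameter are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads characters until end of stream (or `limit` characters)
// and renders each one as its numeric code, e.g. "13,10" for a lone CRLF.
final class CharDump {
    static String dump(Reader reader, int limit) throws IOException {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        int c;
        // Stop as soon as read() reports end of stream with -1.
        while (count < limit && (c = reader.read()) != -1) {
            if (count > 0) {
                sb.append(',');
            }
            sb.append(c);
            count++;
        }
        return sb.toString();
    }
}
```

Passed a reader over "\r\n", dump returns "13,10"; passed an empty reader, it returns an empty string.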

Re: Read attached file from webpage

by olegk

On Wed, Sep 09, 2009 at 04:45:52AM -0700, caymanag wrote:

> All that I end up reading is 13,10 followed by lots of -1 values.

That's because there is no content.

Oleg

Re: Read attached file from webpage

by caymanag

Ok, I agree that there is apparently no content to read. When a web browser sees the header Content-disposition: attachment; filename="myfile.csv", it knows to pop up a dialog box and/or save the file to the file system. What do I need to do in a client application to read an attachment once I see this same header? Do I issue another GET and somehow append the filename?
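
No second GET should be needed: Content-Disposition: attachment only tells a browser to offer the response body as a download and suggests a filename for it; the file's bytes are the body of that same response (here, apparently empty). A minimal JDK-only sketch of what a browser does with that header, assuming the body stream and header value have already been obtained. AttachmentSaver and the fallback name download.bin are hypothetical, and the filename parsing is simplified: no filename*= (RFC 5987) values and no semicolons inside quoted names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: saves a response body under the name suggested by Content-Disposition.
final class AttachmentSaver {

    // Extracts the filename parameter from a value such as: attachment; filename="myfile.csv"
    static String filenameFrom(String contentDisposition, String fallback) {
        if (contentDisposition == null) {
            return fallback;
        }
        for (String part : contentDisposition.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "filename=", 0, 9)) {
                String v = p.substring(9).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);      // strip surrounding quotes
                }
                // Keep only the last path segment so the server cannot choose the directory.
                v = v.replace('\\', '/');
                v = v.substring(v.lastIndexOf('/') + 1);
                return v.isEmpty() ? fallback : v;
            }
        }
        return fallback;
    }

    // Streams the body into dir/<filename>, replacing any existing file of that name.
    static Path save(InputStream body, String contentDisposition, Path dir) throws IOException {
        Path target = dir.resolve(filenameFrom(contentDisposition, "download.bin"));
        Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

With HttpClient 3.x the arguments would be getMethod.getResponseBodyAsStream() and the value of getMethod.getResponseHeader("Content-Disposition"), after checking that the header lookup did not return null.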

Re: Read attached file from webpage

by Sam Berlin

Your best bet for figuring out what your browser is sending and receiving is
a tool like Wireshark.

Sam

On Wed, Sep 9, 2009 at 8:35 AM, caymanag <caymanag@...> wrote:

> What do I need to do in a client application to read an attachment, once I
> see this same header? Do I issue another GET and somehow append the filename?