Last redirect URL

by droidin.net

I have rather simple HttpClient 4 code that calls HttpGet to get HTML output. The HTML comes back with script and image locations all relative (e.g. <img src="/images/foo.jpg"/>), so I need the calling URL to make them absolute (<img src="http://foo.com/images/foo.jpg"/>). Now comes the problem: during the call there may be one or two 302 redirects, so the original URL no longer reflects the location of the HTML. How do I get the final URL of the returned content, given all the redirects I may (or may not) have?

I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() - couldn't find anything.
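
For the link-rewriting step described above, a minimal sketch using only java.net.URI from the JDK might look like the following. It assumes the final URL (after any redirects) is already known. The class name RelativeLinks and the method toAbsolute are made up for illustration and are not part of HttpClient.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient: resolves a link found in the
// HTML against the URL the page was finally served from.
final class RelativeLinks {

    // baseUrl is assumed to be the final URL after any redirects.
    static String toAbsolute(String baseUrl, String link) {
        // URI.resolve applies standard relative-reference resolution; links
        // that are already absolute come back unchanged.
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "http://foo.com/landing/index.html";
        System.out.println(toAbsolute(base, "/images/foo.jpg"));
        System.out.println(toAbsolute(base, "css/site.css"));
        System.out.println(toAbsolute(base, "http://cdn.example.org/lib.js"));
    }
}
```

Note that URI.create throws IllegalArgumentException for strings that are not valid URIs (for example, links containing unescaped spaces), so real-world HTML would likely need some cleanup first.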

Re: Last redirect URL

by Ken Krugler


On Sep 21, 2009, at 2:30pm, droidin.net wrote:

>
> I have rather simple HttpClient 4 code that calls HttpGet to get HTML output.
> The HTML returns with scripts and image locations all set to local (e.g.
> /images/foo.jpg) so I need calling URL to make these into absolute
> (http://foo.com/images/foo.jpg). Now comes the problem - during the call there
> may be one or two 302 redirects so the original URL no longer reflects
> the location of HTML. How do I get the latest URL of the returned content
> given all the redirects I may (or may not) have?
>
> I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() -
> couldn't find anything.

From past posts on the list, I thought httpMethod.getURI() would return the final URL.

-- Ken


--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@...
For additional commands, e-mail: httpclient-users-help@...


Re: Last redirect URL

by olegk

On Mon, Sep 21, 2009 at 05:14:05PM -0700, Ken Krugler wrote:

> From past posts on the list, I thought httpMethod.getURI() would return
> the final URL.

Ken,

This is only partially correct. The original request object remains unmodified.
So, one needs to retrieve the internal HttpUriRequest and HttpHost objects from
the execution context in order to find out the final request URI / target host.
For details see:

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e205

Hope this helps

Oleg
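
To illustrate the last step described above, here is a rough sketch of combining the target host with the request URI. To stay runnable with only the JDK it takes plain strings: hostUri stands in for the string form of the HttpHost and requestUri for the string form of the request URI, both of which would be read from the execution context. The class and method names are invented for illustration, and handling both relative and absolute request URIs is an assumption, not something confirmed in this thread.

```java
import java.net.URI;

// Hypothetical helper: the two arguments stand in for values read from the
// HttpClient execution context after execute() has returned.
final class FinalUrl {

    // hostUri:    e.g. "http://foo.com:8080" (scheme, host, optional port)
    // requestUri: e.g. "/landing/index.html?x=1", or a full absolute URI
    static String combine(String hostUri, String requestUri) {
        URI request = URI.create(requestUri);
        if (request.isAbsolute()) {
            // Already carries its own scheme and host; use it as-is.
            return request.toString();
        }
        // Otherwise take scheme, host and port from the target host.
        return URI.create(hostUri).resolve(request).toString();
    }

    public static void main(String[] args) {
        System.out.println(combine("http://foo.com", "/landing/index.html"));
        System.out.println(combine("http://foo.com:8080", "/a?b=1"));
        System.out.println(combine("http://foo.com", "http://bar.com/moved"));
    }
}
```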



Re: Last redirect URL

by Ken Krugler

Hi Oleg,

On Sep 22, 2009, at 2:47am, Oleg Kalnichevski wrote:

> Ken,
>
> This is only partially correct. The original request object remains unmodified.
> So, one needs to retrieve the internal HttpUriRequest and HttpHost objects from
> the execution context in order to find out the final request URI / target host.
> For details see:
>
> http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e205

Worked like a charm, thanks for the ref!

I assume that if the HttpClient.execute(method, context) call returns
without throwing an exception, the call to:

context.getAttribute(ExecutionContext.HTTP_TARGET_HOST)

...will always return a valid, non-null HttpHost, yes?

Thanks,

-- Ken







Re: Last redirect URL

by olegk

Ken Krugler wrote:

> Worked like a charm, thanks for the ref!
>
> I assume that if the HttpClient.execute(method, context) call returns
> without throwing an exception, the call to:
>
> context.getAttribute(ExecutionContext.HTTP_TARGET_HOST)
>
> ...will always return a valid, non-null HttpHost, yes?
>
> Thanks,
>

Yes, it should

Oleg


Re: Last redirect URL

by Ken Krugler

Hi all,

Just to log it someplace public, here's what I wound up doing to get the
final URL (probably overkill re all of the to/from URI/URL stuff):

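    // Excerpt from a larger method (HttpClient 4.0). Types used here:
    //   java.net.{MalformedURLException, URI, URL}
    //   org.apache.http.{HttpHost, HttpStatus}
    //   org.apache.http.client.methods.{HttpGet, HttpUriRequest}
    //   org.apache.http.protocol.{BasicHttpContext, ExecutionContext, HttpContext}

    getter = new HttpGet(new URI(url));
    HttpContext localContext = new BasicHttpContext();
    response = _httpClient.execute(getter, localContext);

    int httpStatus = response.getStatusLine().getStatusCode();
    if (httpStatus != HttpStatus.SC_OK) {
        throw new HttpFetchException(url, "Error fetching " + url, httpStatus, headerMap);
    }

    // After execute() returns, the context holds the target host and request
    // that were actually used for the last hop, i.e. after any redirects.
    HttpHost host = (HttpHost) localContext.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
    HttpUriRequest finalRequest = (HttpUriRequest) localContext.getAttribute(ExecutionContext.HTTP_REQUEST);

    try {
        // Resolve the (possibly relative) request URI against the target host.
        URL hostUrl = new URI(host.toURI()).toURL();
        redirectedUrl = new URL(hostUrl, finalRequest.getURI().toString()).toExternalForm();
    } catch (MalformedURLException e) {
        LOGGER.warn("Invalid host/uri specified in final fetch: " + host + finalRequest.getURI());
        // Fall back to the URL originally requested.
        redirectedUrl = url;
    }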


The key bit of documentation was at:

http://hc.apache.org/httpcomponents-client/tutorial/html/httpagent.html#d4e1022

-- Ken

