Last redirect URL
I have some rather simple HttpClient 4 code that calls HttpGet to fetch HTML output. The HTML comes back with script and image locations all set to local paths (e.g. <img src="/images/foo.jpg"/>), so I need the calling URL to make these absolute (<img src="http://foo.com/images/foo.jpg"/>). Now comes the problem: during the call there may be one or two 302 redirects, so the original URL no longer reflects the location of the HTML. How do I get the final URL of the returned content, given all the redirects I may (or may not) have?
I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() and couldn't find anything.

Re: Last redirect URL
On Sep 21, 2009, at 2:30pm, droidin.net wrote:

> I have rather simple HttpClient 4 code that calls HttpGet to get HTML output.
> The HTML returns with scripts and image locations all set to local (e.g.
> /images/foo.jpg ) so I need calling URL to make these into absolute (
> http://foo.com/images/foo.jpg Now comes the problem - during the call there
> may be one or two 302 redirects so the original URL is no longer reflects
> the location of HTML. How do I get the latest URL of the returned content
> given all the redirects I may (or may not) have?
>
> I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() -
> couldn't find anything.

From past posts on the list, I thought httpMethod.getURI() would return the final URL.

-- Ken

--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@...
For additional commands, e-mail: httpclient-users-help@...

Re: Last redirect URL
On Mon, Sep 21, 2009 at 05:14:05PM -0700, Ken Krugler wrote:

> From past posts on the list, I thought httpMethod.getURI() would return
> the final URL.
>
> -- Ken

Ken,

This is only partially correct. The original request object remains unmodified. So, one needs to retrieve the internal HttpUriRequest and HttpHost objects from the execution context in order to find out the final request URI / target host. For details see:

http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e205

Hope this helps

Oleg

Re: Last redirect URL
Hi Oleg,

On Sep 22, 2009, at 2:47am, Oleg Kalnichevski wrote:

> This is only partially correct. The original request object remains
> unmodified. So, one needs to retrieve the internal HttpUriRequest and
> HttpHost objects from the execution context in order to find out the
> final request URI / target host. For details see:
>
> http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e205

Worked like a charm, thanks for the ref!

I assume that if the HttpClient.execute(method, context) call returns without throwing an exception, the call to:

    context.getAttribute(ExecutionContext.HTTP_TARGET_HOST)

...will always return a valid, non-null HttpHost, yes?

Thanks,

-- Ken

Re: Last redirect URL
Ken Krugler wrote:

> I assume that if the HttpClient.execute(method, context) call returns
> w/o throwing an exception, the call to:
>
> context.getAttribute(ExecutionContext.HTTP_TARGET_HOST)
>
> ...will always return a valid, non-null HttpHost, yes?

Yes, it should

Oleg

Re: Last redirect URL
Hi all,

Just to log it someplace public, here's what I wound up doing to get the final URL (probably overkill re all of the to/from URI/URL stuff):

    getter = new HttpGet(new URI(url));
    HttpContext localContext = new BasicHttpContext();
    response = _httpClient.execute(getter, localContext);

    int httpStatus = response.getStatusLine().getStatusCode();
    if (httpStatus != HttpStatus.SC_OK) {
        throw new HttpFetchException(url, "Error fetching " + url, httpStatus, headerMap);
    }

    // After any redirects, the context holds the final target host and request.
    HttpHost host = (HttpHost) localContext.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
    HttpUriRequest finalRequest = (HttpUriRequest) localContext.getAttribute(ExecutionContext.HTTP_REQUEST);

    try {
        URL hostUrl = new URI(host.toURI()).toURL();
        redirectedUrl = new URL(hostUrl, finalRequest.getURI().toString()).toExternalForm();
    } catch (MalformedURLException e) {
        LOGGER.warn("Invalid host/uri specified in final fetch: " + host + finalRequest.getURI());
        // Fall back to the URL that was originally requested.
        redirectedUrl = url;
    }

> This is only partially correct. The original request object remains
> unmodified. So, one needs to retrieve the internal HttpUriRequest and
> HttpHost objects from the execution context in order to find out the
> final request URI / target host. For details see:
>
> http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e205

The key bit of documentation was at:

http://hc.apache.org/httpcomponents-client/tutorial/html/httpagent.html#d4e1022

-- Ken
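
For the original goal of turning local paths such as /images/foo.jpg into absolute ones, the host-plus-request-URI step can probably also be done with java.net.URI alone. The following is only a minimal sketch, not code from this thread: the class and method names (RelativeLinkResolver, finalUrl, absolutize) are made up for illustration, and it assumes the host string (e.g. from HttpHost#toURI()) and the final request URI have already been read from the execution context as described above.

```java
import java.net.URI;

// Hypothetical helper, for illustration only; uses nothing beyond java.net.URI.
final class RelativeLinkResolver {

    // Combines the final target host (e.g. "http://foo.com") with the URI of
    // the final request. Assumption: the request URI is either absolute or
    // starts with "/", so it resolves cleanly against a host with no path.
    static URI finalUrl(String hostUri, String requestUri) {
        return URI.create(hostUri).resolve(requestUri);
    }

    // Resolves a link found in the fetched HTML against the final URL.
    static String absolutize(URI base, String link) {
        return base.resolve(link).toString();
    }

    public static void main(String[] args) {
        URI base = finalUrl("http://foo.com", "/pages/index.html");
        System.out.println(base);                                // http://foo.com/pages/index.html
        System.out.println(absolutize(base, "/images/foo.jpg")); // http://foo.com/images/foo.jpg
        System.out.println(absolutize(base, "css/site.css"));    // http://foo.com/pages/css/site.css
    }
}
```

Resolving against the final URL rather than the originally requested one is the point here: after a redirect, a link like css/site.css is relative to wherever the HTML actually came from.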