squid seems to be altering Content-Type response headers

squid seems to be altering Content-Type response headers

by celejar

Hi list,

I'm a squid newbie, and I'm baffled by this.  I see URLs that come in
as 'text/xml', but after a while, squid starts declaring them as
'text/html'.  This seems to happen after I get a TCP_REFRESH_HIT/304
or TCP_REFRESH_HIT/200.  E.g.:

1257062345.706      0 127.0.0.1 TCP_HIT/200 43811 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257062349.093      0 127.0.0.1 TCP_HIT/200 43811 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257062369.819      0 127.0.0.1 TCP_HIT/200 43811 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257062411.195      0 127.0.0.1 TCP_HIT/200 43811 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257062417.219      0 127.0.0.1 TCP_HIT/200 43846 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257062417.768    125 127.0.0.1 TCP_MISS/200 1591 GET http://wordpress.org/favicon.ico - DIRECT/72.233.56.139 application/octet-stream
1257062446.025    129 127.0.0.1 TCP_REFRESH_HIT/304 461 GET http://wordpress.org/development/feed/ - DIRECT/72.233.56.138 text/html

...

1257062489.765      0 127.0.0.1 TCP_HIT/200 43795 GET http://wordpress.org/development/feed/ - NONE/- text/html
1257062561.439      0 127.0.0.1 TCP_HIT/200 43796 GET http://wordpress.org/development/feed/ - NONE/- text/html
1257062568.249      0 127.0.0.1 TCP_HIT/200 43796 GET http://wordpress.org/development/feed/ - NONE/- text/html

When I do:

~# squidclient -m PURGE http://wordpress.org/development/feed/
HTTP/1.0 200 OK
Server: squid/2.7.STABLE7
Date: Sun, 01 Nov 2009 08:12:53 GMT
Content-Length: 0
Expires: Sun, 01 Nov 2009 08:12:53 GMT
X-Cache: MISS from localhost.localdomain
X-Cache-Lookup: NONE from localhost.localdomain:3128
Via: 1.0 localhost.localdomain:3128 (squid/2.7.STABLE7)
Connection: close

Then all's right with the world again:

1257063219.237    395 127.0.0.1 TCP_MISS/200 43779 GET http://wordpress.org/development/feed/ - DIRECT/72.233.56.138 text/xml

...

1257063263.068      0 127.0.0.1 TCP_HIT/200 43809 GET http://wordpress.org/development/feed/ - NONE/- text/xml
1257063265.805      0 127.0.0.1 TCP_HIT/200 43809 GET http://wordpress.org/development/feed/ - NONE/- text/xml

But the pattern seems to be that soon it will begin to return the
incorrect 'text/html' again.
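
For anyone wanting to pin down exactly which request flips the type, here is a rough Python sketch that scans access.log lines and reports each point where the logged Content-Type for a URL differs from the previous entry for that URL. It assumes squid's native log format (the one shown above, with ten whitespace-separated fields); the function names `parse_line` and `content_type_changes` are made up for illustration.

```python
def parse_line(line):
    """Pick the fields of interest out of one native-format access.log line.

    Assumed field order: time, elapsed, client, result/status, bytes,
    method, URL, ident, hierarchy/peer, content type.
    """
    fields = line.split()
    if len(fields) < 10:
        return None  # blank, truncated, or otherwise unexpected line
    return {
        "time": float(fields[0]),
        "result": fields[3],
        "url": fields[6],
        "mime": fields[9],
    }


def content_type_changes(lines):
    """Return (time, url, old_type, new_type, result) for every type flip."""
    last_seen = {}  # url -> content type logged most recently
    changes = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        previous = last_seen.get(entry["url"])
        if previous is not None and previous != entry["mime"]:
            changes.append(
                (entry["time"], entry["url"], previous,
                 entry["mime"], entry["result"])
            )
        last_seen[entry["url"]] = entry["mime"]
    return changes
```

Feeding it the open log file, e.g. `content_type_changes(open(path))` with `path` pointing at wherever the access.log lives on the system, gives the result code logged on the line where each change happened.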

These posts claim that squid shouldn't ever be altering the
Content-Type headers:

http://www.mail-archive.com/squid-users@.../msg67818.html
http://www.squid-cache.org/mail-archive/squid-users/200111/1119.html

Am I missing something obvious?  Is this a misconfiguration on my part
(I have a stock Debian Sid installation), or is there something wrong
with the WordPress site (so far I've only seen the problem with
wordpress.com and wordpress.org, but I haven't tested extensively)?

Celejar
--
foffl.sourceforge.net - Feeds OFFLine, an offline RSS/Atom aggregator
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator


Re: squid seems to be altering Content-Type response headers

by Henrik Nordstrom-5

On Sun, 2009-11-01 at 03:19 -0500, Celejar wrote:

> 1257062417.219      0 127.0.0.1 TCP_HIT/200 43846 GET http://wordpress.org/development/feed/ - NONE/- text/xml

> 1257062446.025    129 127.0.0.1 TCP_REFRESH_HIT/304 461 GET http://wordpress.org/development/feed/ - DIRECT/72.233.56.138 text/html

Here 72.233.56.138 apparently said that the content-type should be
updated to text/html but that the response otherwise is the same as
before.

> These posts claim that squid shouldn't ever be altering the
> Content-Type headers:

It doesn't. But a received 304 response may update the stored headers
with new values given by the origin server.
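
As a rough illustration of that update step (a simplified model in Python, not squid's actual code), a cache that revalidates an entry and gets a 304 back can be thought of as overlaying the headers from the 304 onto the headers it already has stored. The function name `apply_304` and the small set of skipped connection-level headers are assumptions made for this sketch.

```python
# Headers that describe the connection rather than the cached entity.
# This short list is an assumption for the sketch, not an exhaustive one.
CONNECTION_LEVEL = {"connection", "keep-alive", "transfer-encoding"}


def apply_304(stored_headers, not_modified_headers):
    """Return stored headers overlaid with those carried by a 304 reply.

    Header names are compared case-insensitively, so 'Content-type'
    in the 304 replaces a stored 'Content-Type'.
    """
    merged = {name.lower(): value for name, value in stored_headers.items()}
    for name, value in not_modified_headers.items():
        key = name.lower()
        if key in CONNECTION_LEVEL:
            continue  # not part of the stored entity's metadata
        merged[key] = value
    return merged
```

Under this model, a stored `Content-Type: text/xml` combined with a 304 that carries `Content-type: text/html` ends up as text/html, while the cached body stays untouched. That matches what the log shows.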

Regards
Henrik


Re: squid seems to be altering Content-Type response headers

by celejar

On Sun, 01 Nov 2009 21:48:32 +0100
Henrik Nordstrom <henrik@...> wrote:

> On Sun, 2009-11-01 at 03:19 -0500, Celejar wrote:
>
> > 1257062417.219      0 127.0.0.1 TCP_HIT/200 43846 GET
> > http://wordpress.org/development/feed/ - NONE/- text/xml
>
> > 1257062446.025    129 127.0.0.1 TCP_REFRESH_HIT/304 461 GET
> > http://wordpress.org/development/feed/ - DIRECT/72.233.56.138
> > text/html
>
> Here 72.233.56.138 apparently said that the content-type should be
> updated to text/html but that the response otherwise is the same as
> before.

Right you are.  I ran wireshark, which shows the headers as:

HTTP/1.0 304 Not Modified
X-Pingback: http://wordpress.org/development/xmlrpc.php
Last-Modified: Sat, 31 Oct 2009 21:28:00 GMT
ETag: "a4aa82a49dbe294617210eb367fa0997"
Content-type: text/html
Date: Mon, 02 Nov 2009 01:25:17 GMT
Server: LiteSpeed
Connection: close
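
The same check can probably be done without a packet capture. Below is a minimal Python sketch that sends a conditional GET straight to the origin (bypassing squid) and returns the status code and Content-Type of the reply. The function name `revalidate` and its parameters are made up for illustration, and a server may well answer an HTTP/1.1 client differently from how it answered squid.

```python
import http.client


def revalidate(host, path, etag, port=80, timeout=10):
    """Send a conditional GET and return (status, Content-Type header).

    `etag` should include its surrounding double quotes, exactly as the
    server sent it in the ETag header.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"If-None-Match": etag})
        response = conn.getresponse()
        response.read()  # drain any body so the connection closes cleanly
        return response.status, response.getheader("Content-Type")
    finally:
        conn.close()


# Example call, using the ETag from the capture above:
# revalidate("wordpress.org", "/development/feed/",
#            '"a4aa82a49dbe294617210eb367fa0997"')
```

A 304 status together with a Content-Type that differs from the one on the original 200 response would point at the origin server rather than the cache.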

> > These posts claim that squid shouldn't ever be altering the
> > Content-Type headers:
>
> It doesn't. But a received 304 response may update the stored headers
> with new values given by the origin server.

So I guess that this is some sort of bug in WordPress, which I've reported:

http://core.trac.wordpress.org/ticket/11060

Thanks,
Celejar