Site not considered safe revisited



by Thomas R Bailey

Sebastian, I just noticed something.

Are you using libwww-perl or Python-urllib/2 or even PHP/5 to GET the playlist?
You might think about changing the user agent in your script to spiff(+http://xspf.org) or xspf(+http://xspf.org). All of these scripting languages are used extensively with unadulterated user agent strings to unmercifully hack at web servers all over cyberspace. I've had to block them as user agents to protect my files and bandwidth as have many many other people.

Examples I found are here:
Perl: http://kobesearch.cpan.org/htdocs/libwww-perl/LWP/UserAgent.html
Python: http://diveintopython.org/http_web_services/user_agent.html
PHP: http://www.seopher.com/articles/how_to_change_your_php_user_agent_to_avoid_being_blocked_when_using_curl
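
For the Python case, a minimal sketch of what such a change might look like. It uses `urllib.request`, the Python 3 successor to the urllib2 module discussed in this thread; the function names are hypothetical, and the user agent string is simply the one suggested above, not anything the validator is known to send.

```python
import urllib.request

# Assumption: the identifier suggested in the message above.
USER_AGENT = "xspf(+http://xspf.org)"


def build_request(url, user_agent=USER_AGENT):
    """Build a Request that carries our own User-Agent header
    instead of the library default ("Python-urllib/x.y")."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_playlist(url, timeout=10):
    """Download the playlist at `url` and return its raw bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_playlist("http://example.com/playlist.xspf")` (a placeholder URL) would then send the custom string in place of the default one.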

-Tom
xspfphp.sourceforge.net


_______________________________________________
Playlist mailing list
Playlist@...
http://lists.musicbrainz.org/mailman/listinfo/playlist

Re: Site not considered safe revisited

by rbu

On Thursday 16 April 2009, Tom wrote:

> Are you using libwww-perl or Python-urllib/2 or even PHP/5 to GET
> the playlist?
>
> You might think about changing the user agent in your script to
> spiff(+http://xspf.org) or xspf(+http://xspf.org). All of these
> scripting languages are used extensively with unadulterated user
> agent strings to unmercifully hack at web servers all over
> cyberspace. I've had to block them as user agents to protect my files
> and bandwidth as have many many other people.

Installing protection based on User-Agents is most harmful. You will
unnecessarily cause pain for users of your site (such as people using
RSS readers), yourself and others (as seen here). Furthermore, you
probably miss most of the "attacks" on your site because they are
shadowed by people who know how to alter the User-Agent and will pop up
as some Windows XP Internet Explorer in your stats.
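
To illustrate the point, here is a rough sketch assuming a filter that matches on User-Agent prefixes. The blocklist and the function name are hypothetical and only mirror the three agents named in the quoted message; this is not the actual configuration of any site in this thread.

```python
# Hypothetical blocklist, taken from the agents named in the quoted message.
BLOCKED_PREFIXES = ("libwww-perl", "Python-urllib", "PHP")


def is_blocked(user_agent):
    """Return True if the User-Agent starts with a blocked prefix
    (compared case-insensitively). A missing header is not blocked."""
    ua = (user_agent or "").strip().lower()
    return ua.startswith(tuple(p.lower() for p in BLOCKED_PREFIXES))


# An honest script with the library default is rejected...
print(is_blocked("Python-urllib/2.5"))
# ...while the same script claiming to be a browser passes unnoticed.
print(is_blocked("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

The filter only ever sees the string the client chooses to send, so it catches clients that leave the default in place and misses anyone who changes it.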

While it is possible to alter the User-Agent HTTP header using urllib2
in Python, I would recommend you rethink your filters, work around the
problems you are causing (by downloading the file yourself and use the
file upload) or submit a patch.



Robert


Re: Site not considered safe revisited

by rbu

Hi Tom

I took the liberty of replying on-list. Your mail is fully quoted below,
plus replies inline.

On Thursday 16 April 2009, Tom wrote:

> Robert Buchholz wrote:
> > On Thursday 16 April 2009, Tom wrote:
> >> Are you using libwww-perl or Python-urllib/2 or even PHP/5 to GET
> >> the playlist?
> >>
> >> You might think about changing the user agent in your script to
> >> spiff(+http://xspf.org) or xspf(+http://xspf.org). All of
> >> these scripting languages are used extensively with unadulterated
> >> user agent strings to unmercifully hack at web servers all over
> >> cyberspace. I've had to block them as user agents to protect my
> >> files and bandwidth as have many many other people.
> >
> > Installing protection based on User-Agents is most harmful.
>
> I don't think I understand why you think it's harmful. Less than
> effective, perhaps but harmful?
> The only other alternative is to stop serving files. I'm not
> protecting content, only bandwidth.

It is harmful because it causes non-obvious failures for people
without malicious intent. It costs time and causes frustration.


> If a person can use content, they can make a copy of it.
>
> > You will
> > unnecessarily cause pain for users of your site (such as people
> > using RSS readers), yourself and others (as seen here).
>
> It's not my site that has the issue.

Do you mean you are not running trbailey.net, or do you mean that
trbailey.net is not causing the problem? If the latter, then I
disagree. You cannot install a filter on your site and then behave as
if others misbehaved and got caught by the filter.
Which RFC or other standard has been violated?

> > Furthermore, you
> > probably miss most of the "attacks" on your site because they are
> > shadowed by people who know how to alter the User-Agent and will
> > pop up as some Windows XP Internet Explorer in your stats.
>
> That's a nice philosophy and possibly correct, however, /any script
> that does not clearly identify itself is asking to be blocked/.

The script clearly identifies itself. It is a Python 2.5 script, and it
is using urllib2 to download the content.

> User
> agent blocking is not the only tool and generally only partially
> successful. But since implementing domain and UA blocking I no longer
> see my domain appear in "free music" playlists. It's but one tiny
> tool in a larger toolbox. It's my bandwidth I'm most interested in
> protecting, not site content. I'm not concerned if someone wants a
> copy of a file. They are welcome to download it the same way I did.
> /I'm concerned when a commercial ad server like dizzler wraps my
> bandwidth in advertising to make money/. If I see 500 hits from one
> ua/domain and it's been downloading for hours I can safely assume
> it's not a regular user. But that might not apply to all sites. UA
> screening works well for me, but it won't for everyone.
>
> > While it is possible to alter the User-Agent HTTP header using
> > urllib2 in Python, I would recommend you rethink your filters, work
> > around the problems you are causing (by downloading the file
> > yourself and use the file upload) or submit a patch.
>
> I'm not sure if you understand the context in which this discussion
> is occurring.
> I'm not attempting to download or acquire a file.

Maybe I was unclear here. I meant that to work around the User-Agent
filter, you can download the XSPF file from the site in question and
submit it to the validator using its "File Upload" field.



> I'm asking if the author of the validator at spiff dot com can or
> will alter the user agent he's using to GET pages to be validated.
>
> I have no idea what you mean when you say submit a patch?
> Patch for what, his custom spiff validator?

That is what I meant. The validator is open source.

> -Tom


Robert




Re: Site not considered safe revisited

by Thomas R Bailey

Robert Buchholz wrote:
> Hi Tom
>
> I took the liberty of replying on-list. Your mail is fully quoted below,
> plus replies inline.

No problem.

> > It's not my site that has the issue.
>
> Do you mean you are not running trbailey.net, or do you mean that
> trbailey.net is not causing the problem? If the latter, then I
> disagree. You cannot install a filter on your site and then behave as
> if others misbehaved and got caught by the filter.
> Which RFC or other standard has been violated?

Courtesy and unrealistic expectation.
It's unrealistic to think or expect people in the big bit pool not to use UA screening as a tool.
Every other major bot script that requests pages or files identifies itself via the user agent string with a URL back to its origin.

> The script clearly identifies itself. It is a Python 2.5 script, and it
> is using urllib2 to download the content.

I disagree. It's xspf.org that's downloading the content, not the script. The script is a non-entity agent, a tool used to facilitate the download. Furthermore, presenting the language of the script as a user agent looks like obfuscation to the average webmaster dufus like me. If you're the author, you must have considered the consequences of a robot downloader operating with a default UA string, especially in this day and age when so many kids use ready-made IRC bot scripts to infest the information highways of the world. A default UA string, IMHO, is like a "highwayman leaping from the bushes" on the road to London.
  
> > User agent blocking is not the only tool and generally only partially
> > successful. But since implementing domain and UA blocking I no longer
> > see my domain appear in "free music" playlists. It's but one tiny
> > tool in a larger toolbox. It's my bandwidth I'm most interested in
> > protecting, not site content. I'm not concerned if someone wants a
> > copy of a file. They are welcome to download it the same way I did.
> > /I'm concerned when a commercial ad server like dizzler wraps my
> > bandwidth in advertising to make money/. If I see 500 hits from one
> > ua/domain and it's been downloading for hours I can safely assume
> > it's not a regular user. But that might not apply to all sites. UA
> > screening works well for me, but it won't for everyone.
> >
> > > While it is possible to alter the User-Agent HTTP header using
> > > urllib2 in Python, I would recommend you rethink your filters, work
> > > around the problems you are causing (by downloading the file
> > > yourself and use the file upload) or submit a patch.
> >
> > I'm not sure if you understand the context in which this discussion
> > is occurring.
> > I'm not attempting to download or acquire a file.
>
> Maybe I was unclear here. I meant that to work around the User-Agent
> filter, you can download the XSPF file from the site in question and
> submit it to the validator using its "File Upload" field.

Context:
I'm working on a dynamic playlist generator so the content in question is a dynamic playlist and it's obviously on my site. I submitted a URL that failed to download because I don't allow unfettered access to my site by unidentified or known pirate scripts. I think it's more courteous and correct to clearly identify the site that requests the file, not just the tool being used to request it. In the end it either works or it doesn't. I can add an exception for now but I think it prudent to plan for such a change in the future.
  
> > I'm asking if the author of the validator at spiff dot com can or
> > will alter the user agent he's using to GET pages to be validated.
> >
> > I have no idea what you mean when you say submit a patch?
> > Patch for what, his custom spiff validator?
>
> That is what I meant. The validator is open source.

I wasn't aware it was. I've seen no references on the site to obtaining the source, but I haven't been looking. As far as I know, xspf.org is the only installation.
I'm not a Python programmer, but I'll be happy to submit a patch that IMHO properly identifies the site requester with its IP or domain and an optional URL.
I'm not terribly familiar with patches, but I'll muddle through. It can't be much different from SVN updates.
Thanks for the info and the chat.
-Tom

