Proposal for SafeString, refactoring of XSS protection code

View: New views
10 Messages — Rating Filter:   Alert me  

Proposal for SafeString, refactoring of XSS protection code

by Andy Staudacher-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Introducing `new SafeString($dirty_string)` with a magic __toString() method, converting the dirty html auto-magically to a HTML escaped version.

Details:

Usage

It's mostly all hidden - the magic happens behind the scenes.

In Controllers:

// Always clean / purify in Views. Just assign the string to a View var here.
$view->filename = $item->filename;
// t() returns safe HTML (purified on the translation server side)
$view->message = t("Hello world");
// t() parameters are cleaned automatically
$view->message = t("Hello %name", array('name' => $user->name));

In Views:

// For variables that can contain HTML
<?= $title->purified_html() ?>
// For variables that are not expected to contain HTML
// Do nothing (it's a SafeString instance)
<?= $filename ?>
// Magic does not apply to view variables that aren't strings
<?= SafeString::of($item->foo) ?>

Opinions?

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Bharat Mediratta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


For closure -- Andy, Tim and I discussed this at the meetup and came up
with a reasonable compromise that will keep our APIs slim while still
giving us the security we need.

Andy Staudacher wrote:

> Introducing `new SafeString($dirty_string)` with a magic __toString()
> method, converting the dirty html auto-magically to a HTML escaped version.
>
> Details:
> http://docs.google.com/Doc?docid=0AQL3WDIhuD0IZGc4M2I2cTJfMTRkbWtmZmRuNA
>
>
>       Usage
>
> It's mostly all hidden - the magic happens behind the scenes.
>
>
>         In Controllers:
>
> // Always clean / purify in Views. Just assign the string to a View var
> here.
> $view->filename = $item->filename;
> // t() returns safe HTML (purified on the translation server side)
> $view->message = t("Hello world");
> // t() parameters are cleaned automatically
> $view->message = t("Hello %name", array('name' => $user->name));
>
>
>         In Views:
>
> // For variables that can contain HTML
> <?= $title->purified_html() ?>
> // For variables that are not expected to contain HTML
> // Do nothing (it's a SafeString instance)
> <?= $filename ?>
> // Magic does not apply to view variables that aren't strings
> <?= SafeString::of($item->foo) ?>
>
> Opinions?
>
>
> ------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>
>
> ------------------------------------------------------------------------
>
> __[ g a l l e r y - d e v e l ]_________________________
>
> [ list info/archive --> http://gallery.sf.net/lists.php ]
> [ gallery info/FAQ/download --> http://gallery.sf.net ]


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Andy Staudacher-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I've just pushed these changes to the gallery master branch.

I'll update gallery-contrib code later tonight.

Changes:
  p::clean() -> html::clean()
  p::purify() -> html::purify()

  use html::clean() in views on all PHP output with the exception of...
    - using html::purify() for PHP variables which can contain HTML (e.g. comments, item titles, etc.)
    - t() and t2() don't need to be cleaned. The returned string is always clean HTML.
    - $theme->pager() and most other $theme methods return clean HTML. No need to run this through html::clean().

  use t() and t2() in JavaScript as: var js_var = <?= t("Hello")->for_js() ?>;

  escape PHP vars for JavaScript: var js_var = <?= html::js_string($some_php_string) ?>;

  escape PHP vars for HTML element attributes: <a title="<?= html::clean_attribute($php_string) ?>">

  when passing a URL to t() and t2() use html::mark_safe(): <span><?= t('Click <a href="%url">here</a>', array("url" => html::mark_safe(url::site("foo")))) ?></span>

We'll need to create more docs for this of course.

 - Andy


On Sun, Aug 30, 2009 at 2:12 PM, Bharat Mediratta <bharat@...> wrote:

For closure -- Andy, Tim and I discussed this at the meetup and came up
with a reasonable compromise that will keep our APIs slim while still
giving us the security we need.

Andy Staudacher wrote:
> Introducing `new SafeString($dirty_string)` with a magic __toString()
> method, converting the dirty html auto-magically to a HTML escaped version.
>
> Details:
> http://docs.google.com/Doc?docid=0AQL3WDIhuD0IZGc4M2I2cTJfMTRkbWtmZmRuNA
>
>
>       Usage
>
> It's mostly all hidden - the magic happens behind the scenes.
>
>
>         In Controllers:
>
> // Always clean / purify in Views. Just assign the string to a View var
> here.
> $view->filename = $item->filename;
> // t() returns safe HTML (purified on the translation server side)
> $view->message = t("Hello world");
> // t() parameters are cleaned automatically
> $view->message = t("Hello %name", array('name' => $user->name));
>
>
>         In Views:
>
> // For variables that can contain HTML
> <?= $title->purified_html() ?>
> // For variables that are not expected to contain HTML
> // Do nothing (it's a SafeString instance)
> <?= $filename ?>
> // Magic does not apply to view variables that aren't strings
> <?= SafeString::of($item->foo) ?>
>
> Opinions?
>
>
> ------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>
>
> ------------------------------------------------------------------------
>
> __[ g a l l e r y - d e v e l ]_________________________
>
> [ list info/archive --> http://gallery.sf.net/lists.php ]
> [ gallery info/FAQ/download --> http://gallery.sf.net ]



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Bharat Mediratta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Great work!  Some comments...

> Changes:
>   p::clean() -> html::clean()
>   p::purify() -> html::purify()

These seem fine.

>   use html::clean() in views on all PHP output with the exception of...
>     - using html::purify() for PHP variables which can contain HTML
> (e.g. comments, item titles, etc.)
>     - t() and t2() don't need to be cleaned. The returned string is
> always clean HTML.

Does this mean that we expect the inputs to t() to be clean already?
What if I do:

<?= t("%foo", array("foo" => "<html>")) ?>

Will that return "<html>" or "<html>"?

>   use t() and t2() in JavaScript as: var js_var = <?=
> t("Hello")->for_js() ?>;
>
>   escape PHP vars for JavaScript: var js_var = <?=
> html::js_string($some_php_string) ?>;
>
>   escape PHP vars for HTML element attributes: <a title="<?=
> html::clean_attribute($php_string) ?>">

These names seem inconsistent.  Can you make the first one
html::clean_js() so that they're all clean_xxx() methods?  OR am I
missing something?

>   when passing a URL to t() and t2() use html::mark_safe(): <span><?=
> t('Click <a href="%url">here</a>', array("url" =>
> html::mark_safe(url::site("foo")))) ?></span>

And here, can we make it html::mark_clean() so that clean is in the API
somewhere?

-Bharat

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Andy Staudacher-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, Aug 30, 2009 at 9:07 PM, Bharat Mediratta <bharat@...> wrote:

Great work!  Some comments...


Changes:
 p::clean() -> html::clean()
 p::purify() -> html::purify()

These seem fine.


 use html::clean() in views on all PHP output with the exception of...
   - using html::purify() for PHP variables which can contain HTML (e.g. comments, item titles, etc.)
   - t() and t2() don't need to be cleaned. The returned string is always clean HTML.

Does this mean that we expect the inputs to t() to be clean already? What if I do:

<?= t("%foo", array("foo" => "<html>")) ?>

Will that return "<html>" or "&lt;html&gt;"?

The latter. If you do <?= t("<span>%foo</span>", array("foo" => "<html>")) ?>
it will return "<span>&lt;html&gt;</span>".

If you have parameters to t() that are clean already, do:
   <?= t("<span>%foo</span>", array("foo" => html::mark_safe("<html>"))) ?>
and it will return "<span><html></span>".

 use t() and t2() in JavaScript as: var js_var = <?= t("Hello")->for_js() ?>;

 escape PHP vars for JavaScript: var js_var = <?= html::js_string($some_php_string) ?>;

 escape PHP vars for HTML element attributes: <a title="<?= html::clean_attribute($php_string) ?>">

These names seem inconsistent.  Can you make the first one html::clean_js() so that they're all clean_xxx() methods?  OR am I missing something?

Already considered and rejected. js_string() is indeed not the same as clean() just for js.
clean() implies the returned value is safe for inclusion in HTML.
clean_js() would imply that the returned value is safe for use in JS, anywhere.
js_string() is very descriptive, yet short. It returns a JS string.

clean_attribute() is another story. It somewhere similar to clean(), just specific for attributes.

 when passing a URL to t() and t2() use html::mark_safe(): <span><?= t('Click <a href="%url">here</a>', array("url" => html::mark_safe(url::site("foo")))) ?></span>

And here, can we make it html::mark_clean() so that clean is in the API somewhere?

OK, makes sense. Will do.

 - Andy 

-Bharat


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Bharat Mediratta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andy Staudacher wrote:

> On Sun, Aug 30, 2009 at 9:07 PM, Bharat Mediratta <bharat@...
> <mailto:bharat@...>> wrote:
>
>
>     Great work!  Some comments...
>
>
>         Changes:
>          p::clean() -> html::clean()
>          p::purify() -> html::purify()
>
>
>     These seem fine.
>
>
>          use html::clean() in views on all PHP output with the exception
>         of...
>            - using html::purify() for PHP variables which can contain
>         HTML (e.g. comments, item titles, etc.)
>            - t() and t2() don't need to be cleaned. The returned string
>         is always clean HTML.
>
>
>     Does this mean that we expect the inputs to t() to be clean already?
>     What if I do:
>
>     <?= t("%foo", array("foo" => "<html>")) ?>
>
>     Will that return "<html>" or "<html>"?
>
>
> The latter. If you do <?= t("<span>%foo</span>", array("foo" =>
> "<html>")) ?>
> it will return "<span><html></span>".
>
> If you have parameters to t() that are clean already, do:
>    <?= t("<span>%foo</span>", array("foo" => html::mark_safe("<html>"))) ?>
> and it will return "<span><html></span>".
>
>          use t() and t2() in JavaScript as: var js_var = <?=
>         t("Hello")->for_js() ?>;
>
>          escape PHP vars for JavaScript: var js_var = <?=
>         html::js_string($some_php_string) ?>;
>
>          escape PHP vars for HTML element attributes: <a title="<?=
>         html::clean_attribute($php_string) ?>">
>
>
>     These names seem inconsistent.  Can you make the first one
>     html::clean_js() so that they're all clean_xxx() methods?  OR am I
>     missing something?
>
>
> Already considered and rejected. js_string() is indeed not the same as
> clean() just for js.
> clean() implies the returned value is safe for inclusion in HTML.
> clean_js() would imply that the returned value is safe for use in JS,
> anywhere.
> js_string() is very descriptive, yet short. It returns a JS string.

I understand now from looking at the code.  So before we had:

  var x = "<?= $some_string ?>";

Now we have:

  var x = <?= html::js_string($some_string) ?>;

I recognize that you're adding the outer quotation marks as a
convenience in the js_string() function, but I worry that this will be a
little confusing to JS developers who are expecting strings to be inside
quotation marks.  Perhaps we could have it do:

  var x = "<?= html::clean_js_string($some_string) ?>";

That's longer, but it's got "clean" in it as well as "js_string" and a
JS developer would see the outer quotes so would know that anything
inside it is definitely in a JS string context.

thoughts?

Everything else looks fine to me.

>
> clean_attribute() is another story. It somewhere similar to clean(),
> just specific for attributes.
>
>          when passing a URL to t() and t2() use html::mark_safe():
>         <span><?= t('Click <a href="%url">here</a>', array("url" =>
>         html::mark_safe(url::site("foo")))) ?></span>
>
>
>     And here, can we make it html::mark_clean() so that clean is in the
>     API somewhere?
>
>
> OK, makes sense. Will do.
>
>  - Andy
>
>
>     -Bharat
>
>


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Andy Staudacher-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Aug 31, 2009 at 11:26 AM, Bharat Mediratta <bharat@...> wrote:
Andy Staudacher wrote:
> On Sun, Aug 30, 2009 at 9:07 PM, Bharat Mediratta <bharat@...
> <mailto:bharat@...>> wrote:
>
>
>     Great work!  Some comments...
>
>
>         Changes:
>          p::clean() -> html::clean()
>          p::purify() -> html::purify()
>
>
>     These seem fine.
>
>
>          use html::clean() in views on all PHP output with the exception
>         of...
>            - using html::purify() for PHP variables which can contain
>         HTML (e.g. comments, item titles, etc.)
>            - t() and t2() don't need to be cleaned. The returned string
>         is always clean HTML.
>
>
>     Does this mean that we expect the inputs to t() to be clean already?
>     What if I do:
>
>     <?= t("%foo", array("foo" => "<html>")) ?>
>
>     Will that return "<html>" or "&lt;html&gt;"?
>
>
> The latter. If you do <?= t("<span>%foo</span>", array("foo" =>
> "<html>")) ?>
> it will return "<span>&lt;html&gt;</span>".
>
> If you have parameters to t() that are clean already, do:
>    <?= t("<span>%foo</span>", array("foo" => html::mark_safe("<html>"))) ?>
> and it will return "<span><html></span>".
>
>          use t() and t2() in JavaScript as: var js_var = <?=
>         t("Hello")->for_js() ?>;
>
>          escape PHP vars for JavaScript: var js_var = <?=
>         html::js_string($some_php_string) ?>;
>
>          escape PHP vars for HTML element attributes: <a title="<?=
>         html::clean_attribute($php_string) ?>">
>
>
>     These names seem inconsistent.  Can you make the first one
>     html::clean_js() so that they're all clean_xxx() methods?  OR am I
>     missing something?
>
>
> Already considered and rejected. js_string() is indeed not the same as
> clean() just for js.
> clean() implies the returned value is safe for inclusion in HTML.
> clean_js() would imply that the returned value is safe for use in JS,
> anywhere.
> js_string() is very descriptive, yet short. It returns a JS string.

I understand now from looking at the code.  So before we had:

 var x = "<?= $some_string ?>";

Now we have:

 var x = <?= html::js_string($some_string) ?>;

I recognize that you're adding the outer quotation marks as a
convenience in the js_string() function, but I worry that this will be a
little confusing to JS developers who are expecting strings to be inside
quotation marks.  Perhaps we could have it do:

Minor tidbit: It's not just convenience to include the delimiting quotation marks. It's about semantics. What is returned is a JavaScript string object.

More importantly, there's no way (without tons of effort) I can guarantee that some string is "clean JavaScript." What's clean (purified / harmless) JavaScript anyway?
I can just guarantee that the returned string is safe to use as a JavaScript string. That the returned string won't add any active code to your JavaScript (unless you use the JavaScript string later as event handler or the like).

So returning a complete JavaScript string object is very clear IMO. Just like json_encode().

 - Andy

 var x = "<?= html::clean_js_string($some_string) ?>";

That's longer, but it's got "clean" in it as well as "js_string" and a
JS developer would see the outer quotes so would know that anything
inside it is definitely in a JS string context.

thoughts?

Everything else looks fine to me.

>
> clean_attribute() is another story. It somewhere similar to clean(),
> just specific for attributes.
>
>          when passing a URL to t() and t2() use html::mark_safe():
>         <span><?= t('Click <a href="%url">here</a>', array("url" =>
>         html::mark_safe(url::site("foo")))) ?></span>
>
>
>     And here, can we make it html::mark_clean() so that clean is in the
>     API somewhere?
>
>
> OK, makes sense. Will do.
>
>  - Andy
>
>
>     -Bharat
>
>



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Bharat Mediratta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andy Staudacher wrote:

> On Mon, Aug 31, 2009 at 11:26 AM, Bharat Mediratta <bharat@...
> <mailto:bharat@...>> wrote:
>
>     Andy Staudacher wrote:
>     > On Sun, Aug 30, 2009 at 9:07 PM, Bharat Mediratta
>     <bharat@... <mailto:bharat@...>
>     > <mailto:bharat@... <mailto:bharat@...>>> wrote:
>     >
>     >
>     >     Great work!  Some comments...
>     >
>     >
>     >         Changes:
>     >          p::clean() -> html::clean()
>     >          p::purify() -> html::purify()
>     >
>     >
>     >     These seem fine.
>     >
>     >
>     >          use html::clean() in views on all PHP output with the
>     exception
>     >         of...
>     >            - using html::purify() for PHP variables which can contain
>     >         HTML (e.g. comments, item titles, etc.)
>     >            - t() and t2() don't need to be cleaned. The returned
>     string
>     >         is always clean HTML.
>     >
>     >
>     >     Does this mean that we expect the inputs to t() to be clean
>     already?
>     >     What if I do:
>     >
>     >     <?= t("%foo", array("foo" => "<html>")) ?>
>     >
>     >     Will that return "<html>" or "<html>"?
>     >
>     >
>     > The latter. If you do <?= t("<span>%foo</span>", array("foo" =>
>     > "<html>")) ?>
>     > it will return "<span><html></span>".
>     >
>     > If you have parameters to t() that are clean already, do:
>     >    <?= t("<span>%foo</span>", array("foo" =>
>     html::mark_safe("<html>"))) ?>
>     > and it will return "<span><html></span>".
>     >
>     >          use t() and t2() in JavaScript as: var js_var = <?=
>     >         t("Hello")->for_js() ?>;
>     >
>     >          escape PHP vars for JavaScript: var js_var = <?=
>     >         html::js_string($some_php_string) ?>;
>     >
>     >          escape PHP vars for HTML element attributes: <a title="<?=
>     >         html::clean_attribute($php_string) ?>">
>     >
>     >
>     >     These names seem inconsistent.  Can you make the first one
>     >     html::clean_js() so that they're all clean_xxx() methods?  OR am I
>     >     missing something?
>     >
>     >
>     > Already considered and rejected. js_string() is indeed not the same as
>     > clean() just for js.
>     > clean() implies the returned value is safe for inclusion in HTML.
>     > clean_js() would imply that the returned value is safe for use in JS,
>     > anywhere.
>     > js_string() is very descriptive, yet short. It returns a JS string.
>
>     I understand now from looking at the code.  So before we had:
>
>      var x = "<?= $some_string ?>";
>
>     Now we have:
>
>      var x = <?= html::js_string($some_string) ?>;
>
>     I recognize that you're adding the outer quotation marks as a
>     convenience in the js_string() function, but I worry that this will be a
>     little confusing to JS developers who are expecting strings to be inside
>     quotation marks.  Perhaps we could have it do:
>
>
> Minor tidbit: It's not just convenience to include the delimiting
> quotation marks. It's about semantics. What is returned is a JavaScript
> string object.
>
> More importantly, there's no way (without tons of effort) I can
> guarantee that some string is "clean JavaScript." What's clean (purified
> / harmless) JavaScript anyway?
> I can just guarantee that the returned string is safe to use as a
> JavaScript string. That the returned string won't add any active code to
> your JavaScript (unless you use the JavaScript string later as event
> handler or the like).
>
> So returning a complete JavaScript string object is very clear IMO. Just
> like json_encode().

I agree that to some developers this will be clear.  But I'm betting
we're going to see a lot of people using this API wrong because they
simply won't expect that quotation marks are added automatically.
They're going to add a second set of quotes around it, and that's going
to cause weird errors.

>
>  - Andy
>
>      var x = "<?= html::clean_js_string($some_string) ?>";
>
>     That's longer, but it's got "clean" in it as well as "js_string" and a
>     JS developer would see the outer quotes so would know that anything
>     inside it is definitely in a JS string context.
>
>     thoughts?
>
>     Everything else looks fine to me.
>
>     >
>     > clean_attribute() is another story. It somewhere similar to clean(),
>     > just specific for attributes.
>     >
>     >          when passing a URL to t() and t2() use html::mark_safe():
>     >         <span><?= t('Click <a href="%url">here</a>', array("url" =>
>     >         html::mark_safe(url::site("foo")))) ?></span>
>     >
>     >
>     >     And here, can we make it html::mark_clean() so that clean is
>     in the
>     >     API somewhere?
>     >
>     >
>     > OK, makes sense. Will do.
>     >
>     >  - Andy
>     >
>     >
>     >     -Bharat
>     >
>     >
>
>


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Andy Staudacher-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Aug 31, 2009 at 11:43 AM, Bharat Mediratta <bharat@...> wrote:
Andy Staudacher wrote:
> On Mon, Aug 31, 2009 at 11:26 AM, Bharat Mediratta <bharat@...
> <mailto:bharat@...>> wrote:
>
>     Andy Staudacher wrote:
>     > On Sun, Aug 30, 2009 at 9:07 PM, Bharat Mediratta
>     <bharat@... <mailto:bharat@...>
>     > <mailto:bharat@... <mailto:bharat@...>>> wrote:
>     >
>     >
>     >     Great work!  Some comments...
>     >
>     >
>     >         Changes:
>     >          p::clean() -> html::clean()
>     >          p::purify() -> html::purify()
>     >
>     >
>     >     These seem fine.
>     >
>     >
>     >          use html::clean() in views on all PHP output with the
>     exception
>     >         of...
>     >            - using html::purify() for PHP variables which can contain
>     >         HTML (e.g. comments, item titles, etc.)
>     >            - t() and t2() don't need to be cleaned. The returned
>     string
>     >         is always clean HTML.
>     >
>     >
>     >     Does this mean that we expect the inputs to t() to be clean
>     already?
>     >     What if I do:
>     >
>     >     <?= t("%foo", array("foo" => "<html>")) ?>
>     >
>     >     Will that return "<html>" or "&lt;html&gt;"?
>     >
>     >
>     > The latter. If you do <?= t("<span>%foo</span>", array("foo" =>
>     > "<html>")) ?>
>     > it will return "<span>&lt;html&gt;</span>".
>     >
>     > If you have parameters to t() that are clean already, do:
>     >    <?= t("<span>%foo</span>", array("foo" =>
>     html::mark_safe("<html>"))) ?>
>     > and it will return "<span><html></span>".
>     >
>     >          use t() and t2() in JavaScript as: var js_var = <?=
>     >         t("Hello")->for_js() ?>;
>     >
>     >          escape PHP vars for JavaScript: var js_var = <?=
>     >         html::js_string($some_php_string) ?>;
>     >
>     >          escape PHP vars for HTML element attributes: <a title="<?=
>     >         html::clean_attribute($php_string) ?>">
>     >
>     >
>     >     These names seem inconsistent.  Can you make the first one
>     >     html::clean_js() so that they're all clean_xxx() methods?  OR am I
>     >     missing something?
>     >
>     >
>     > Already considered and rejected. js_string() is indeed not the same as
>     > clean() just for js.
>     > clean() implies the returned value is safe for inclusion in HTML.
>     > clean_js() would imply that the returned value is safe for use in JS,
>     > anywhere.
>     > js_string() is very descriptive, yet short. It returns a JS string.
>
>     I understand now from looking at the code.  So before we had:
>
>      var x = "<?= $some_string ?>";
>
>     Now we have:
>
>      var x = <?= html::js_string($some_string) ?>;
>
>     I recognize that you're adding the outer quotation marks as a
>     convenience in the js_string() function, but I worry that this will be a
>     little confusing to JS developers who are expecting strings to be inside
>     quotation marks.  Perhaps we could have it do:
>
>
> Minor tidbit: It's not just convenience to include the delimiting
> quotation marks. It's about semantics. What is returned is a JavaScript
> string object.
>
> More importantly, there's no way (without tons of effort) I can
> guarantee that some string is "clean JavaScript." What's clean (purified
> / harmless) JavaScript anyway?
> I can just guarantee that the returned string is safe to use as a
> JavaScript string. That the returned string won't add any active code to
> your JavaScript (unless you use the JavaScript string later as event
> handler or the like).
>
> So returning a complete JavaScript string object is very clear IMO. Just
> like json_encode().

I agree that to some developers this will be clear.  But I'm betting
we're going to see a lot of people using this API wrong because they
simply won't expect that quotation marks are added automatically.
They're going to add a second set of quotes around it, and that's going
to cause weird errors.

Either way we go, both error modes are quite dangerous.
 - With ::js_string(), when adding double-quotes around it, the payload (e.g. "; do_something_scary();") could inject XSS.
 - With ::clean_js(), when forgetting to add quotes around it, the payload could obviously inject XSS as well.

But I can add a check in the XSS scanner to verify that a js_string isn't preceded by a double-quote.

If we rule out security as an issue, what other error modes do we have?
 - With ::js_string(), when adding quotes around it, it will break in weird ways.
 - With ::clean_js(), when forgetting to add quotes around it, it will break in weird ways.

So that's just the same either way.

In the end it's 
a) just a question of what's more intuitive, what's easier to understand. In this regard, I don't see any difference between the two approaches.
b) semantics about what "clean" means and "clean" should not be in this method name, for the reasons mentioned before. The returned string is not clean for JS.

I'm still not inclined to change this function name / behavior.

 - Andy


>
>  - Andy
>
>      var x = "<?= html::clean_js_string($some_string) ?>";
>
>     That's longer, but it's got "clean" in it as well as "js_string" and a
>     JS developer would see the outer quotes so would know that anything
>     inside it is definitely in a JS string context.
>
>     thoughts?
>
>     Everything else looks fine to me.
>
>     >
>     > clean_attribute() is another story. It somewhere similar to clean(),
>     > just specific for attributes.
>     >
>     >          when passing a URL to t() and t2() use html::mark_safe():
>     >         <span><?= t('Click <a href="%url">here</a>', array("url" =>
>     >         html::mark_safe(url::site("foo")))) ?></span>
>     >
>     >
>     >     And here, can we make it html::mark_clean() so that clean is
>     in the
>     >     API somewhere?
>     >
>     >
>     > OK, makes sense. Will do.
>     >
>     >  - Andy
>     >
>     >
>     >     -Bharat
>     >
>     >
>
>



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]

Re: Proposal for SafeString, refactoring of XSS protection code

by Bharat Mediratta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andy Staudacher wrote:
> In the end it's
> a) just a question of what's more intuitive, what's easier to
> understand. In this regard, I don't see any difference between the two
> approaches.
> b) semantics about what "clean" means and "clean" should not be in this
> method name, for the reasons mentioned before. The returned string is
> not clean for JS.
>
> I'm still not inclined to change this function name / behavior.

Ok.  I'm making an argument based on intuition, but perhaps we should
wait until we see what developers actually do with the API.  I'm fine
with leaving it alone for now.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
__[ g a l l e r y - d e v e l ]_________________________

[ list info/archive --> http://gallery.sf.net/lists.php ]
[ gallery info/FAQ/download --> http://gallery.sf.net ]