Re: Regex to catch <p>s


Re: Regex to catch <p>s

by Ryan S-4




<clip>
>  To say I suck at regex is an understatement, so I really need any help I can get on this. I have a page of text with different html tags in it, but each "block" of text has a <p> or a < class="something"> tag... anybody have a regex that will catch each of these paragraphs and put them into an array?


If you're using php5 you can use DOM's getElementsByTagName.

If you still think you need to do some sort of regex it is possible
but it will be buggy at best.


</clip>
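
For comparison, here is a minimal sketch of the DOM route suggested above, assuming the DOM extension that ships with PHP 5 is available. The sample markup in $html is made up for illustration, and real pages may need more error handling than this.

```php
<?php
// Made-up sample markup, not taken from the original page.
$html = '<p>first</p><div>skip</div><p class="blah">second</p>';

$doc = new DOMDocument();
// Collect parser warnings internally; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text of every <p>, with or without a class attribute.
$paragraphs = array();
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = $p->textContent;
}

print_r($paragraphs);
```

Each entry in $paragraphs is the text of one <p> element, whatever attributes the tag carries.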

Nope, need a regex... guess I have no choice, either chancy regex or nothing... I know for a fact that the first paragraph tag won't contain a class, and for the <p> tags that contain a class="blah", does it matter that I know exactly what the class name is?




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: Regex to catch <p>s

by Shawn McKenzie



// \b skips <pre>/<param>; the s modifier lets paragraphs span several lines
preg_match_all('|<p\b[^>]*>(.*?)</p>|is', $myText, $myArray);
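
A small sketch of how a call like this fills the array. The contents of $myText are made up, and the pattern uses \b plus the s and i modifiers as one reasonable choice, not the only one.

```php
<?php
// Made-up sample input: two paragraphs (one spanning two lines) and a <pre>.
$myText = "<p>first</p>\n<p class=\"blah\">second\nline</p>\n<pre>not a paragraph</pre>";

// \b stops <pre> from matching, .*? stops at the first </p>,
// and the s modifier lets . match newlines inside a paragraph.
preg_match_all('|<p\b[^>]*>(.*?)</p>|is', $myText, $myArray);

// $myArray[0] holds the full matches, $myArray[1] the text between the tags.
print_r($myArray[1]);
```

Nested or unclosed <p> tags will still trip this up, which is the "buggy at best" caveat from earlier in the thread.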



Re: Regex to catch <p>s

by vester_s


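$tag_regex=array(
  // plain <p>...</p>: keep only the text between the tags
  '/<p\s*>(.*?)<\/p>/si' => '$1',
  // any tag with a class attribute: keep only the text between the tags
  '/<(\w+)\b[^>]*\bclass\s*=[^>]*>(.*?)<\/\1>/si' => '$2'
);

// preg_replace() returns the page with these tags stripped, as a string
$paragraphs=preg_replace(array_keys($tag_regex),array_values($tag_regex),$page);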

I am not sure which tag you mean by < class="something">, but this RE should match any plain <p> tag (the first pattern in the array) and any tag that has a class attribute on it (the second pattern in the array).
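
A hedged sketch of the pattern => replacement idea on made-up input. The $page contents, the exact patterns, and the newline separator are assumptions for illustration. The point is that preg_replace() hands back a string, so one more step is needed to get an array.

```php
<?php
// Made-up sample input.
$page = '<p>first</p><p class="blah">second</p>';

// Each match is reduced to its inner text followed by a newline.
$tag_regex = array(
    // plain <p>...</p>
    '/<p\s*>(.*?)<\/p>/si' => '$1' . "\n",
    // any tag carrying a class attribute
    '/<(\w+)\b[^>]*\bclass\s*=[^>]*>(.*?)<\/\1>/si' => '$2' . "\n",
);

// The patterns are applied in order, each to the result of the previous one.
$paragraphs = preg_replace(array_keys($tag_regex), array_values($tag_regex), $page);

// Split on the separator to end up with one array entry per block.
$blocks = explode("\n", trim($paragraphs));

print_r($blocks);
```

Splitting on newlines only works when the blocks themselves contain none. A separator that cannot occur in the text would be safer on real pages.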

You can find other examples of this kind of HTML parsing in the PHP... try googling it. :)

HTH


Re: Regex to catch <p>s

by Aschwin Wesselius-2


Ryan S wrote:

> Hey!
>
> Thanks for replying!
>
> <clip>
> It is obvious I haven't had my caffeine yet. This is my last try to
> get the pattern straight:
>
> <?php
>
> $html = <<<END_OF_HTML
>
> <b>hello</b>
> <b class="blah">hello</b>
> <p>those</p>
> <p class="blah">hello</p>
> <a>hello</a>
> <a href="url">this</a>
> <a>rose</a>
> <a href="regex yo">hello</a>
> <a>nose</a>
> <a id="2" href="regex yo">hello</a>
> <p>that</p>
> <p class="blah" title="whatever">hello</p>
> END_OF_HTML;
>
> $tags = array();
> $tags[] = 'p';
> $tags[] = 'a';
>
> $attr = array();
> $attr[] = 'class';
> $attr[] = 'href';
>
> $vals = array();
> $vals[] = 'blah';
> $vals[] = 'url';
> $vals[] = 'yo';
>
> $text = array();
> $text[] = 'hello';
> $text[] = 'this';
> $text[] = 'that';
>
> $tags = implode('|', $tags);
> $attr = implode('|', $attr);
> $vals = implode('|', $vals);
> $text = implode('|', $text);
>
> $pattern =
> '/<('.$tags.')[^>]*('.$attr.')?[^>]*('.$vals.')?[^>]*>('.$text.')[^<\/]*<\/\1>/i';
>
> echo $pattern."\n";
> echo "--------------------\n";
>
> preg_match_all($pattern, $html, $matches);
>
> var_dump($matches);
>
> ?>
> </clip>
>
> I don't get why you added this:
> $tags[] = 'a';
>
> Does that mean I will have to add entries like that for all the html
> tags that I think will be on the page?

Hi,

I said before that the example could be a little bit overkill, but it
gives a quick example of how to find any of the given tags, with any of
the given attributes and any of the given text between the opening and
closing tag.
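
On the question of listing every tag: only the tags you want to catch need to go in the list. Below is a stripped-down sketch of the same idea with the attribute and text filters left out. The helper name build_tag_pattern() and the sample $html are hypothetical and not part of the earlier example.

```php
<?php
// Hypothetical helper: builds a pattern for just the tag names passed in.
function build_tag_pattern(array $tags)
{
    // preg_quote() escapes any regex metacharacters in the tag names.
    $quoted = array();
    foreach ($tags as $tag) {
        $quoted[] = preg_quote($tag, '/');
    }
    // \1 makes the closing tag match whichever opening tag was found.
    return '/<(' . implode('|', $quoted) . ')\b[^>]*>(.*?)<\/\1>/is';
}

// Made-up sample markup.
$html = '<b>bold</b><p>those</p><p class="blah">hello</p><a href="url">this</a>';

// Only <p> is listed, so <b> and <a> are ignored.
preg_match_all(build_tag_pattern(array('p')), $html, $matches);

// $matches[1] holds the tag names, $matches[2] the text between the tags.
print_r($matches[2]);
```

Passing array('p', 'a') instead would also pick up the link text, which is all that the extra $tags[] = 'a'; line was doing.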

And yes, it might be incomplete or maybe not even accurate, but it does
give you a head start on your solution. There will always be people who
can give you a shorter, cleaner, more beautiful example, but I hope
that it was helpful for you or will be helpful for someone else.

Cheers,

Aschwin Wesselius