These are the regular expressions used for "shortcodes" in WordPress (one for the whole tag, the other for its attributes):

return '(.?)\[('.$tagregexp.')\b(.*?)(?:(\/))?\](?:(.+?)\[\/\2\])?(.?)';
$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';

It parses things like

[foo bar="baz"]content[/foo]


[foo /]
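To make the capture groups concrete, here is a minimal sketch that ports both patterns to Python's re module (not PHP's preg functions). The TAGS list and the parse_atts helper are hypothetical stand-ins of mine: in WordPress, $tagregexp is built from the registered shortcode names, and this helper only collects named attributes.

```python
import re

# Hypothetical tag list; in WordPress, $tagregexp is built from the registered shortcode names.
TAGS = ["foo", "baz"]
tagregexp = "|".join(re.escape(t) for t in TAGS)

# Port of the tag pattern quoted above.
SHORTCODE = re.compile(
    r"(.?)\[(" + tagregexp + r")\b(.*?)(?:(\/))?\](?:(.+?)\[\/\2\])?(.?)"
)

# Port of the attribute pattern, split across lines for readability.
ATTS = re.compile(
    r'(\w+)\s*=\s*"([^"]*)"(?:\s|$)'      # name="value"
    r"|(\w+)\s*=\s*'([^']*)'(?:\s|$)"     # name='value'
    r"|(\w+)\s*=\s*([^\s'\"]+)(?:\s|$)"   # name=value
    r'|"([^"]*)"(?:\s|$)'                 # "positional value"
    r"|(\S+)(?:\s|$)"                     # positional value
)


def parse_atts(text):
    """Collect named attributes only; positional values are ignored in this sketch."""
    atts = {}
    for m in ATTS.finditer(text):
        for name_group in (1, 3, 5):
            if m.group(name_group):
                atts[m.group(name_group)] = m.group(name_group + 1)
    return atts


m1 = SHORTCODE.search('[foo bar="baz"]content[/foo]')
m2 = SHORTCODE.search("[foo /]")

print(m1.groups())               # groups: before, tag, attributes, slash, content, after
print(m2.groups())
print(parse_atts(m1.group(3)))   # attributes of the first example
```

Group 3 carries the raw attribute text, group 4 the slash of a self-closing tag, and group 5 the enclosed content.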

On the WordPress Trac they say it's a little flawed, but my main problem is that it doesn't support shortcodes within attributes, as in

[foo bar="[baz /]"]content[/foo]

because the regex ends the main shortcode at the first occurrence of a closing bracket, so in the example it renders

[foo bar="[baz /]



and the remainder, "]content[/foo], is shown as is.
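The cut-off can be reproduced with a small sketch, again using a Python port of the tag pattern rather than PHP. The two-tag alternation foo|baz is a hypothetical stand-in for $tagregexp.

```python
import re

# Port of the tag pattern, with a hypothetical two-tag alternation.
SHORTCODE = re.compile(
    r"(.?)\[(foo|baz)\b(.*?)(?:(\/))?\](?:(.+?)\[\/\2\])?(.?)"
)

m = SHORTCODE.search('[foo bar="[baz /]"]content[/foo]')

print(repr(m.group(3)))  # attribute text, cut off before the nested tag's /]
print(repr(m.group(4)))  # the / of the nested tag, taken as the self-closing marker
print(repr(m.group(5)))  # what the pattern treats as content
```

In this port the attribute group stops just before the nested tag's /], so the bar value is never closed, and the leftover "] ends up at the start of the content group.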

Is there any way to alter the regex so that it skips any occurrence of [ and ], and the content between them, when it occurs inside the opening tag or self-closing tag?

What is your ultimate goal? Even if WordPress’ regex were better, the shortcode wouldn't be executed.

return '(.?)\[('.$tagregexp.')\b((?:"[^"]*"|.)*?)(?:/)?\](?:(.+?)\[\/\2\])?(.?)';

is a variation on the first regex in which the part that matches the attributes has been changed to capture quoted strings in full, regardless of what is inside them:

((?:"[^"]*"|.)*?)

rather than

(.*?)

Note that it does not handle strings with escaped quote characters inside them (yet; it can be done, but is it necessary?). I haven't changed anything else because I don't know the syntax of WordPress shortcodes.
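A quick way to check both claims is a sketch like the following, which ports the modified pattern to Python's re module. The literal foo stands in for $tagregexp, and the second input is a made-up example of a backslash-escaped quote.

```python
import re

# Port of the modified pattern; "foo" stands in for $tagregexp.
MODIFIED = re.compile(
    r'(.?)\[(foo)\b((?:"[^"]*"|.)*?)(?:/)?\](?:(.+?)\[\/\2\])?(.?)'
)

ok = MODIFIED.search('[foo bar="[baz /]"]content[/foo]')
print(repr(ok.group(3)))   # attributes, with the quoted [baz /] intact
print(repr(ok.group(4)))   # content

# A backslash-escaped quote still ends the quoted string early.
bad = MODIFIED.search(r'[foo bar="a \"[b]\" c"]x[/foo]')
print(repr(bad.group(3)))
```

Because the slash is no longer captured, the content moves from group 5 to group 4 in this variant.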

But it looks like it could be cleaned up a little by removing unnecessary backslashes and parentheses:

return '(.?)\[(foo)\b((?:"[^"]*"|.)*?)/?\](?:(.+?)\[/\2\])?(.?)';

Perhaps further improvements are warranted. I'm a bit worried about the imprecise use of the dot in the above snippet, and I'd rather use (?:"[^"]*"|[^/\]])* instead of (?:"[^"]*"|.)*?, but I don't know whether that would break something else. Also, I don't know what the leading and trailing (.?) are for. They don't match anything in your example, so I can't tell their purpose.
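One case where the stricter character class behaves differently can be sketched as follows. These are opening-tag-only versions of the two sub-patterns, ported to Python's re module. The unquoted URL attribute is a hypothetical input, and whether real shortcodes use such values is an assumption.

```python
import re

# Opening-tag-only versions of the two attribute sub-patterns discussed above.
LAZY_DOT = re.compile(r'\[(foo)\b((?:"[^"]*"|.)*?)/?\]')
NEGATED = re.compile(r'\[(foo)\b((?:"[^"]*"|[^/\]])*)/?\]')

nested = '[foo bar="[baz /]"]'
slashy = "[foo url=http://example.com]"   # hypothetical unquoted value containing /

print(NEGATED.search(nested).group(2))    # quoted nested tag is still captured whole
print(LAZY_DOT.search(slashy).group(2))   # lazy dot accepts the unquoted slashes
print(NEGATED.search(slashy))             # negated class finds no match here
```

So the negated class handles the nested example, but it rejects unquoted attribute values that contain a slash, which may or may not matter in practice.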

Do you want a drop-in replacement for that regex? This one allows attribute values to contain things that look like tags, as in your example:


Or, in more readable form:

/(.?)                      # could be [
 \[(\w+)\b                 # tag name
 ((?:[^"'\[\]]++           # attributes
    |"[^"]*+"              # ...or a double-quoted string
    |'[^']*+'              # ...or a single-quoted string
  )*+)
 \]
 (?:(?<=(\/)\])            # '/' if self-closing
   |([^\[\]]*+)            # ...or content
    \[\/\2\]               # ...and closing tag
 )(.?)                     # could be ]
/x
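For experimenting, here is a runnable sketch in the spirit of the commented pattern above, ported to Python's re.VERBOSE. It is an adaptation under my own assumptions, not the exact regex: the groups are named, plain quantifiers replace the possessive ones, and the slash of a self-closing tag is detected with a lookbehind but not captured in its own group.

```python
import re

# Assumption-laden sketch: named groups and plain quantifiers replace the
# numbered groups and possessive quantifiers of the pattern above.
TAG = re.compile(
    r"""
    (?P<pre>.?)                    # could be [
    \[(?P<tag>\w+)\b               # tag name
    (?P<attrs>
      (?: [^"'\[\]]                # any character except quotes and brackets
        | "[^"]*"                  # ...or a double-quoted string
        | '[^']*'                  # ...or a single-quoted string
      )*
    )
    \]
    (?: (?<=/\])                   # self-closing if the tag ended in /]
      | (?P<content>[^\[\]]*)      # ...or content
        \[/(?P=tag)\]              # ...and closing tag
    )
    (?P<post>.?)                   # could be ]
    """,
    re.VERBOSE,
)

nested = TAG.search('[foo bar="[baz /]"]content[/foo]')
print(repr(nested.group("attrs")), repr(nested.group("content")))

selfclosing = TAG.search("[foo /]")
print(repr(selfclosing.group("attrs")), repr(selfclosing.group("content")))
```

As in the description that follows, the slash of a self-closing tag ends up inside the attributes group.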

As I understand it, $tagregexp in the original is an alternation of all the tag names that have been defined; I substituted \w+ for readability. Everything the original regex captures, this one does too, and in the same groups. The only difference is that the / in a self-closing tag is captured in group #3 along with the attributes, as well as in its own group (#4).

I don't think the other regex needs to be changed unless you want to add full support for tags embedded in attribute values. That would also mean allowing for escaped quotes in this one, and I don't know how you would want to do that. Doubling them would be my guess; that's how Textpattern does it, and WordPress is supposedly based on that.
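If doubling were the chosen convention, the quoted-string part could look something like this sketch in Python's re module. The convention itself, the QUOTED pattern and the unquote helper are all hypothetical; nothing here is taken from WordPress or Textpattern code.

```python
import re

# Hypothetical convention: inside a double-quoted value, "" stands for a literal ".
QUOTED = re.compile(r'"((?:[^"]|"")*)"')


def unquote(raw):
    """Turn doubled quotes back into single ones."""
    return raw.replace('""', '"')


m = QUOTED.search('bar="say ""[hi]"" now"')
value = unquote(m.group(1))
print(value)
```

The same sub-pattern would then replace "[^"]*" wherever the tag regex matches a quoted attribute value.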

This is a good example of why applications like WordPress shouldn't be implemented with regexes. The only way to add or change functionality is by making the regexes bigger, uglier, and harder to maintain.