Are square brackets permitted in URLs?
I noticed that Apache Commons HttpClient (3..1) throws an IOException, whereas wget and Opera accept square brackets.
URL example: http://example.com/path/to/file.html
My HTTP client encounters such URLs, but I am unsure whether to patch the code to accept them or to throw an exception (as it really ought to).
RFC 3986 states:
A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.
So in theory you should not be seeing such URIs in the wild, because they should arrive encoded.
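To illustrate the one place where RFC 3986 does allow brackets, here is a minimal Python sketch using the standard library's urllib.parse. The URL is made up for the example (2001:db8::/32 is the documentation prefix), so treat it as an assumption rather than anything from the question.

```python
from urllib.parse import urlsplit

# Hypothetical URL whose host is an IPv6 literal enclosed in square brackets.
parts = urlsplit("http://[2001:db8::1]:8080/path/to/file.html")

print(parts.netloc)    # host and port, brackets included
print(parts.hostname)  # host with the brackets stripped
print(parts.port)      # port as an integer
```

The brackets here delimit the host only; they say nothing about whether brackets are acceptable in the path or query.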
Any browser or web-enabled software that accepts URLs and does not throw an exception when special characters are introduced is almost certainly encoding those characters behind the scenes. Curly brackets, square brackets, spaces, etc. all have specific encoded representations so that they do not cause conflicts. As the previous answers say, the safest way to deal with these is to URL-encode them before handing them to anything that will try to resolve the URL.
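A rough sketch of that "encode before handing it on" step, in Python with only the standard library. The function name encode_path and the bracketed sample URL are my own assumptions for illustration; this is not what any particular browser or HTTP client does internally.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path(url):
    """Percent-encode characters in the path that should not appear literally."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and leaves existing %XX escapes untouched.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

# Hypothetical URL with square brackets in the path.
print(encode_path("http://example.com/path/to/file[3].html"))
# http://example.com/path/to/file%5B3%5D.html
```

Keeping "%" in the safe set avoids double-encoding a URL that is already escaped, at the cost of letting a stray literal "%" through. Which trade-off is right depends on where your URLs come from.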
I know this question is a bit old, but I wanted to note that PHP uses brackets to pass arrays in a URL.
http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3
In this case
$_GET['bar'] will contain
array(1, 2, 3).
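For what it's worth, a client does not have to send those brackets literally. Here is a small Python sketch (standard library only; the parameter name bar[] follows the PHP array convention described above) that builds the same query with the brackets percent-encoded. As far as I know PHP decodes the parameter names before looking for brackets, so it should treat both forms alike, but I have not verified that on every version.

```python
from urllib.parse import parse_qs, urlencode

# Build a PHP-style array query; urlencode percent-encodes "[" and "]".
query = urlencode([("bar[]", 1), ("bar[]", 2), ("bar[]", 3)])
print(query)            # bar%5B%5D=1&bar%5B%5D=2&bar%5B%5D=3

# Python's own parser decodes the name but gives the brackets no special meaning.
print(parse_qs(query))  # {'bar[]': ['1', '2', '3']}
```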
Pretty much the only characters not allowed in pathnames are # and ?, because they signify the end of the path.
The URI RFC will have the definitive answer:
Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.
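If you would rather reject such URLs than silently fix them, a check along these lines might be a starting point. It is a sketch in Python. The unsafe set is taken from RFC 1738, section 2.2, which I believe is the passage quoted above, and the helper name unsafe_chars is made up.

```python
# Unsafe characters per RFC 1738, section 2.2:
# space, <, >, ", #, %, and the "other" group { } | \ ^ ~ [ ] `
UNSAFE = set(' <>"#%{}|\\^~[]`')

def unsafe_chars(text):
    """Return the unsafe characters found in text, in order of first appearance."""
    found = []
    for ch in text:
        if ch in UNSAFE and ch not in found:
            found.append(ch)
    return found

# Hypothetical path containing square brackets.
print(unsafe_chars("/path/to/file[3].html"))  # ['[', ']']
```

Run it on a single component such as the path rather than on a whole URL, since "#" and "%" have legitimate roles as the fragment delimiter and the escape prefix.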
The answer is that they should be hex encoded, but knowing Postel's law, most things will accept them verbatim.
If you are using the Commons HttpClient class, take a look at the org.apache.commons.httpclient.util.URIUtil class, specifically the encode() method. Use it to URI-encode the URL before trying to fetch it.
StackOverflow seems not to encode them:
Best to URL-encode those, because they are clearly not supported by all web servers. Sometimes, even when there is a standard, not everyone follows it.
According to the URL specification, square brackets are not valid URL characters.
Here are the relevant snippets:
The "national" and "punctuation" characters do not appear in any productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | ^ | ~
punctuation < | >