I'm struggling with a very basic regex condition in my .htaccess file, which I hope someone may be able to shed some light on. The basic premise is that I need to instruct Apache to rewrite any .html extension into a .var extension. I'd thought the rule would be utterly trivial:
RewriteRule ^([^.]+)\.html$ $1.var
However, the [^.] part simply doesn't work. Bizarrely, it does work like so:
RewriteRule ^([^A-Z]+)\.html$ $1.var
I don't understand why this latter rule works. Assuming I'm requesting a file called "index.html", $1 should match "index." and the ".html" bit should then fail to match.
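The behaviour comes down to backtracking. The sketch below uses Python's `re` module as a stand-in for the PCRE engine that mod_rewrite uses, on the assumption that both treat these simple constructs the same way. It shows that the greedy `[^A-Z]+` first consumes all of "index.html" and then gives characters back until `\.html$` can match, so `$1` ends up as "index" rather than "index.".

```python
import re

# The two patterns from the question.
RULE_NO_UPPER = re.compile(r"^([^A-Z]+)\.html$")
RULE_NO_DOT = re.compile(r"^([^.]+)\.html$")

path = "index.html"

# [^A-Z]+ greedily matches all of "index.html", then backtracks
# one character at a time until "\.html$" can match the rest.
m1 = RULE_NO_UPPER.match(path)
print(m1.group(1))  # index

# [^.]+ stops at the first dot, which is exactly where "\.html" begins.
m2 = RULE_NO_DOT.match(path)
print(m2.group(1))  # index

# The substitution the RewriteRule performs, expressed with re.sub.
print(RULE_NO_DOT.sub(r"\1.var", path))  # index.var
```

On a plain name like this, the original `[^.]+` rule also captures "index". If it seemed to fail, something else in the request path may be the cause, for example a dot in a directory name.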
To widen the scope of the question slightly, I'm really racking my brain over how to implement a multilingual site. I don't like Apache's MultiViews option because it forces a flat directory structure on us, with file extensions that many development tools don't recognise. I could go the .var type-map route, but I'm finding that Apache's default config doesn't support that very well either (hence my foray into regex land). So since I'm using mod_rewrite anyway, I'm thinking I might go the whole hog: whenever a request for a title.html file is received and that file doesn't exist, check whether there is an XX/title.html file instead, where "XX" is the language code derived from the user's preferences.
This would produce a neater directory structure, although it might not work as well as the .var approach when the language preferred by the user's browser isn't supported by my site (in that case .var would fall back to EN or similar).
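The proposed lookup order can be modelled outside Apache to make the fallback behaviour concrete. This is a hypothetical Python sketch, not mod_rewrite configuration. `preferred_languages` and `resolve_request` are illustrative names. The `existing_files` set stands in for the file-existence checks Apache would do on disk. The default of "en" mirrors the EN fallback described above, and the Accept-Language parsing is deliberately simplified.

```python
# Hypothetical model of the proposed lookup order (illustration only;
# these functions are not part of Apache or mod_rewrite).

def preferred_languages(accept_language):
    """Return primary language codes from an Accept-Language header,
    highest q-value first. A deliberately simplified parser."""
    weighted = []
    for position, part in enumerate(accept_language.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Keep only the primary subtag, e.g. "en-gb" -> "en".
        weighted.append((-q, position, tag.split("-")[0]))
    weighted.sort()  # highest q first; ties keep header order
    codes = []
    for neg_q, _, code in weighted:
        if neg_q < 0 and code not in codes:  # q=0 means "not acceptable"
            codes.append(code)
    return codes


def resolve_request(path, accept_language, existing_files, default_lang="en"):
    """Return the file to serve for `path`, or None if nothing fits.

    `existing_files` stands in for the file-existence checks Apache
    would perform on disk.
    """
    if path in existing_files:  # the requested file exists: serve it as-is
        return path
    for lang in preferred_languages(accept_language) + [default_lang]:
        candidate = f"{lang}/{path}"  # e.g. "fr/title.html"
        if candidate in existing_files:
            return candidate
    return None


files = {"en/title.html", "fr/title.html"}
print(resolve_request("title.html", "fr-FR,fr;q=0.9,en;q=0.8", files))
print(resolve_request("title.html", "de", files))
```

Here a French-preferring browser gets fr/title.html, while a German one falls back to en/title.html. Inside Apache, the same idea would presumably rely on RewriteCond's -f file test rather than a Python set.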
Any ideas? Thanks.
Why not just use ^(.*)\.html$? This will match any string that ends in .html. After all, filenames can contain more than one dot.
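The point about multiple dots can be illustrated with a short Python sketch. Python's `re` stands in for PCRE here, and the sample paths are made up. The sketch applies the same substitution as `RewriteRule ^(.*)\.html$ $1.var`.

```python
import re

# The suggested pattern: capture everything before a trailing ".html".
HTML_SUFFIX = re.compile(r"^(.*)\.html$")

paths = ["index.html", "archive.2010.html", "v1.2/docs/index.html"]

# Mirror "RewriteRule ^(.*)\.html$ $1.var". The greedy .* backtracks
# only as far as the final ".html", so earlier dots stay in $1.
rewritten = [HTML_SUFFIX.sub(r"\1.var", p) for p in paths]
print(rewritten)
```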
[^A-Z]+ will match index when the regex is used case-sensitively. Maybe that's why? Why [^.]+ should fail is beyond me, though.
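The case-sensitivity point can be checked the same way. In this sketch, Python's `re.IGNORECASE` serves as a rough analogue of mod_rewrite's `[NC]` flag, which is an assumption about equivalent behaviour for this particular pattern.

```python
import re

PATTERN = r"^([^A-Z]+)\.html$"

# Case-sensitive (the default, like a RewriteRule without [NC]):
# "index" contains no uppercase letters, so the rule matches.
case_sensitive = re.match(PATTERN, "index.html")
print(case_sensitive.group(1))  # index

# Case-insensitive: [^A-Z] now also excludes lowercase letters,
# so no character of "index" can match and the whole match fails.
case_insensitive = re.match(PATTERN, "index.html", re.IGNORECASE)
print(case_insensitive)  # None
```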
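. matches everything except newlines (outside a character class).
Within a character class, a leading ^ means "not".
+ means one or more of the preceding character class.
So when you write ([^.]+), keep in mind that a . inside square brackets is a literal dot, not the "any character" wildcard: the expression means "match one or more characters that are not a dot". It therefore stops at the first dot in the path, so the rule only matches when the dot in ".html" is the first dot; a path with a dot earlier on, such as v1.2/index.html, will not match.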
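The meaning of `.` inside brackets is easy to get wrong, so it is worth checking against a real engine. This sketch again uses Python's `re` as a stand-in for PCRE, and the paths are invented examples.

```python
import re

# Inside a character class, "." is an ordinary character, so [^.]
# means "any single character except a literal dot".
NOT_A_DOT = re.compile(r"[^.]")
print(bool(NOT_A_DOT.fullmatch("a")))   # True
print(bool(NOT_A_DOT.fullmatch(".")))   # False
print(bool(NOT_A_DOT.fullmatch("\n")))  # True: newlines are not excluded

# Consequence for the rule: a dot anywhere before ".html" breaks it.
RULE = re.compile(r"^([^.]+)\.html$")
print(bool(RULE.match("docs/index.html")))       # True
print(bool(RULE.match("v1.2/docs/index.html")))  # False
```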
^([^A-Z]+)\.html$ works because it matches one or more characters that are not uppercase letters. If you have any uppercase letters before the ".html" in your URL, that one will fail too.
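A quick Python check of that claim, with Python's `re` standing in for PCRE and paths invented for illustration:

```python
import re

NO_UPPERCASE = re.compile(r"^([^A-Z]+)\.html$")

# Lowercase paths match, including ones with extra dots in them.
print(bool(NO_UPPERCASE.match("about/index.html")))  # True
print(bool(NO_UPPERCASE.match("v1.2/index.html")))   # True
# A single uppercase letter anywhere before ".html" breaks the match.
print(bool(NO_UPPERCASE.match("About/index.html")))  # False
```

The second case also shows that, unlike `[^.]+`, this pattern tolerates dots earlier in the path.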
Tim Pietzcker's suggestion is correct: just use ^(.*)\.html$, bearing in mind that this won't work in the odd case that you have newlines in your URL.
In the odd case that you really do have URLs with newlines in them, you can use ^([\d\D]+)\.html$, which will match digits and non-digits (i.e. everything) up to the ".html".
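The two newline cases can be compared directly in a short Python sketch. It assumes Python's `re` behaves like PCRE for `.` and `[\d\D]`: neither makes "dot matches newline" the default. The path is a contrived example.

```python
import re

path_with_newline = "odd\nname.html"

# "." does not match a newline (without re.DOTALL), so this fails.
DOT_STAR = re.compile(r"^(.*)\.html$")
print(DOT_STAR.match(path_with_newline))  # None

# [\d\D] is "a digit or a non-digit", i.e. any character at all.
ANY_CHAR = re.compile(r"^([\d\D]+)\.html$")
print(repr(ANY_CHAR.match(path_with_newline).group(1)))  # 'odd\nname'
```

In Python, compiling `^(.*)\.html$` with `re.DOTALL` would have the same effect as the `[\d\D]` trick.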