I am looking for a way to optimize this regular expression:


This extracts the sixth field of a dot-separated string of arbitrary length, up to "_tid".

So if the input looks like this:


it will return


Funnily enough, if I simply remove the _tid part of the regex, leaving ^(?:([^.]+).?){6}, I get the performance I was looking for: one to two seconds per million strings checked. With the _tid, it takes up to a few minutes.

Is there a better way of doing this?

EDIT: Ah, I forgot to mention, this is in Apache Pig, so everything has to be done within the regex clause.

I'd first split the String on ., take the sixth part, split that on _, and take the first part:


Not tested!
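
For reference, the split approach described above could be sketched in Java roughly as follows. The class name, method name, and sample input are made up for illustration, and the null return for short inputs is an assumption about the desired behavior.

```java
class SplitExtract {
    // Returns the part of the sixth dot-separated field that precedes the
    // first '_', or null if the input has fewer than six fields.
    // (Returning null for short inputs is an assumption, not from the question.)
    static String sixthFieldPrefix(String input) {
        // String.split takes a regex, so the dot has to be escaped.
        String[] fields = input.split("\\.");
        if (fields.length < 6) {
            return null;
        }
        // Limit 2 keeps empty pieces, so a field such as "_" yields "" instead
        // of an empty array.
        return fields[5].split("_", 2)[0];
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the question.
        System.out.println(sixthFieldPrefix("a.b.c.d.e.f_tid.g"));
    }
}
```

This only helps where plain Java is available; it does not satisfy the constraint of staying inside a single regex clause.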

You forgot to escape the dot; try this:


This way your regex has fewer options to match. An unescaped "." matches any character (except line-break characters).

Another possibility I see is removing the optional dot:
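
To make the difference concrete, here is a small Java sketch comparing the two variants. It assumes the original pattern was the fragment quoted in the question with _tid appended; that reconstruction, the class name, and the sample inputs are assumptions. With the unescaped dot, each repetition may consume any character as its "separator", so the engine has more ways to split the input before it gives up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedDotDemo {
    // Assumed original: the fragment quoted in the question plus "_tid".
    static final Pattern UNESCAPED = Pattern.compile("^(?:([^.]+).?){6}_tid");
    // Same pattern with the dot escaped, so it only matches a literal '.'.
    static final Pattern ESCAPED = Pattern.compile("^(?:([^.]+)\\.?){6}_tid");

    // Returns the last capture of group 1, or null when there is no match.
    static String extract(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "a.b.c.d.e.f_tid.g"; // hypothetical input
        System.out.println(extract(UNESCAPED, input));
        System.out.println(extract(ESCAPED, input));
    }
}
```

Both variants return the same field on a matching input; the difference is in how much backtracking is possible, which shows up mostly on inputs that do not match.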


View it here on Regexr

This one seems to run faster than yours:


This one gives me the best performance results:

    Pattern p = Pattern.compile(".*\\.([^_]+)_tid.*");
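
A minimal sketch of how that pattern might be used, with a made-up class name and sample inputs. Note that it does not count fields: it captures whatever sits between the last matching dot and _tid, so it only agrees with the original regex when _tid really is in the sixth field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TidPatternDemo {
    // Compile once and reuse; the pattern is the one given above.
    static final Pattern P = Pattern.compile(".*\\.([^_]+)_tid.*");

    // Returns the text between a dot and "_tid", or null when there is no match.
    static String extract(String input) {
        Matcher m = P.matcher(input);
        // matches() requires the whole input to match, hence the .* on both ends.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("a.b.c.d.e.f_tid.g")); // hypothetical input
        System.out.println(extract("x.y_tid"));           // matches even with two fields
    }
}
```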