I have a site running Drupal and Apache Solr, with Tomcat hosting Solr. I edited the schema.xml in Tomcat to add UTF-8 support, which enabled searching for UTF-8 characters.

However, the result set now behaves oddly. When searching for words containing UTF-8 characters, Solr also returns words with the "equivalent" plain character, and vice versa.

For example, searching for lag (law) also returns låg (low), which are completely different words in Swedish. Is this easy to configure, and if so, where?

It looks like you have the ASCIIFoldingFilterFactory set up in your schema.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
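To find the culprit, search schema.xml for lines like the ones below inside the analyzers of the field type used for your text fields. This is a minimal sketch of what to look for, not a copy of your schema. Either line folds accented characters such as "å" to "a", so "låg" and "lag" end up as the same term.

```xml
<!-- Character filter: rewrites the raw text before tokenizing, using the
     accent mappings in mapping-ISOLatin1Accent.txt (e.g. "å" becomes "a"). -->
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

<!-- Token filter: folds non-ASCII characters in each token to their
     ASCII equivalents (e.g. "å" becomes "a"). -->
<filter class="solr.ASCIIFoldingFilterFactory"/>
```

If either one appears in both the index and the query analyzer, a search for lag will match documents containing låg.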

This is configurable in Solr. When Solr indexes a document (see type="index"), it runs the text through the analyzer and filters you defined in your schema. Likewise, when you issue a search (see type="query"), the query is processed by the query analyzer and its filters. Both are defined in the schema. I recommend using the analysis page of the Solr admin web interface to compare how your query and your indexed text are processed.

For instance:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="false" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

For instance, we could add solr.ISOLatin1AccentFilterFactory, which replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents.

I recommend looking at the schema again.
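As a sketch of where such a filter would sit (the other filters from the example are trimmed for brevity), it goes after the tokenizer alongside the other token filters. This is exactly the kind of folding that makes lag and låg match, so it is shown here to help you recognize it, not as something to add for Swedish text.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Replaces accented ISO Latin 1 characters with unaccented
       equivalents, e.g. "låg" becomes "lag" -->
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

To keep matching consistent, the same filter would normally go in the type="query" analyzer too. Otherwise a query for låg would no longer match documents containing låg, because those were indexed as lag.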

OK, thanks to both of you!

Commenting out this line:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

in both the type="index" and type="query" analyzers did the trick.

Note the <!-- --> comment in the query analyzer below, which also disables the ASCIIFoldingFilterFactory line:

  <analyzer type="query">
    <!--
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
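Only the query analyzer is shown above. Assuming the index analyzer of the same field type carries the same charFilter line, the matching index-side change could look like the sketch below; the remaining index-time filters are not shown in the post, so they are left as a placeholder comment. Because index-time analysis is applied only when a document is indexed, after this change you would need to restart Tomcat (or reload the Solr core) and reindex the content from Drupal so existing documents lose their folded terms.

```xml
<analyzer type="index">
  <!-- Disabled so "å", "ä" and "ö" are no longer folded to "a" and "o" at index time:
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  -->
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- ...the rest of the existing index-time filters, unchanged... -->
</analyzer>
```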