Greetings,

I have the following Apache Lucene snippet, which is giving me some nice results:

        int numHits=100;
        int resultsPerPage=100;
        IndexSearcher searcher=new IndexSearcher(reader);
        TopScoreDocCollector collector=TopScoreDocCollector.create(numHits,true);
        Query q=parser.parse(queryString);
        searcher.search(q,collector);
        ScoreDoc[] hits=collector.topDocs(0*resultsPerPage,resultsPerPage).scoreDocs;

        Results r=new Results();
        r.length=hits.length;
        for(int i=0;i<hits.length;i++){
            Document doc=searcher.doc(hits[i].doc);
            double distanceKm=getGreatCircleDistance(lucene2double(doc.get("lat")), lucene2double(doc.get("lng")), Double.parseDouble(userLat), Double.parseDouble(userLng));
            double newRelevance=((1/distanceKm)*Math.log(hits[i].score)/Math.log(2))*(0-1);
            System.out.println(hits[i].doc+"\t"+hits[i].score+"\t"+doc.get("content")+"\t"+"Km="+distanceKm+"\trlvnc="+String.valueOf(newRelevance));
        } 

What I want to know is: is hits[i].score always between 0 and 1? It appears so, but I cannot be sure. I have checked the Lucene documentation (class ScoreDoc) with no success. As you can see, "newRelevance" is computed from the logarithm of hits[i].score. I need hits[i].score to be between 0 and 1, because if it is below zero the logarithm is undefined and I get an error, and if it is above 1 the sign changes from negative to positive.

I hope a Lucene expert out there can give me some insight.

Thank you,

Yes, the score will be between 0 and 1.

When Lucene computes the score, it finds individual scores for term hits within fields, etc., and totals them. When the highest-rated hit has a total greater than 1, all the document scores are normalised to be between 0 and 1, with the highest-rated document getting a score of 1. If no document's total was greater than 1, no normalisation happens and the scores are returned as-is. This is why the top document sometimes has a score of 1 and at other times has a score less than 1.
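
The rule described above can be sketched in plain Java. This is only an illustration of the described behaviour, not Lucene's own code; the class and method names (ScoreNormalizer, normalizeIfAboveOne) are hypothetical.

```java
final class ScoreNormalizer {
    /**
     * Returns a copy of the scores. If the highest score is greater than 1,
     * every score is divided by it, so the top score becomes exactly 1.
     * Otherwise the scores are returned unchanged.
     */
    static float[] normalizeIfAboveOne(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = scores.clone();
        if (max > 1f) {
            for (int i = 0; i < out.length; i++) {
                out[i] = out[i] / max;
            }
        }
        return out;
    }
}
```

With raw totals of 4, 2 and 1 this yields 1, 0.5 and 0.25, while totals of 0.8 and 0.4 come back untouched, which matches the "sometimes 1, sometimes less than 1" observation.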


EDIT: Having done a little more research, the answer is probably no. In the version of Lucene I am familiar with (v2.3.2), searches go through the Hits object, whose GetMoreDocs() method normalises scores if any of them are greater than 1. In later versions this no longer seems to be the case, because the Hits class is no longer used. Whether your scores will be between 0 and 1 depends on which version of Lucene you are using and which mechanism is used to search.

To quote from the Lucene mailing list:

The score is an arbitrary number > 0. It is not normalized to anything; it should only be used to e.g. sort the results
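
If the raw score can exceed 1, one way to keep the logarithm in the question's formula from changing sign is to divide each score by the top score of the same result set first. The sketch below assumes that approach; RelevanceSketch, newRelevance and the topScore parameter are hypothetical names, and the formula is the one from the question, not anything Lucene provides.

```java
final class RelevanceSketch {
    /** Base-2 logarithm, computed the same way as in the question. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /**
     * Distance-weighted relevance from the question, applied to a score
     * that is first made relative to the best hit of the same query.
     * The relative score lies in (0, 1], so log2 is never positive and
     * the result is never negative.
     */
    static double newRelevance(double score, double topScore, double distanceKm) {
        if (score <= 0 || topScore <= 0 || distanceKm <= 0) {
            throw new IllegalArgumentException("score, topScore and distanceKm must be > 0");
        }
        // Clamp to 1 in case the caller passes a score above topScore.
        double relative = Math.min(score / topScore, 1.0);
        return -(1.0 / distanceKm) * log2(relative);
    }
}
```

In the question's loop, topScore could be taken from hits[0].score, since the collector returns hits sorted by descending score. Note that a distance of zero still has to be handled separately, which this sketch does by rejecting it.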

I believe that Lucene scores are always normalised, i.e. the top-scoring hit gets 1 (or close to it). The values should then always be between 0 and 1. By extension, this means that the scores have no objective meaning, i.e. they cannot be compared with anything other than the other hits in the same result set.

Disclaimer: I'm not a Lucene expert. This is based only on my observations of Lucene in action. I have never seen this actually documented, so I might have got completely the wrong end of the stick.

The scores are between 0 and 1, but the top score does not have to be 1. Scores are always relative to each other, and a direct comparison should not be made between the scores of two different queries.