I am building a system in which text files are fed to a StandardAnalyzer, and the contents of each file are then replaced by the output of the StandardAnalyzer (which tokenizes the text and removes all of the stop words). The code I've written so far is:

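    File f = new File(path);

    TokenStream stream = analyzer.tokenStream("contents",
            new StringReader(readFileToString(f)));

    CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);

    stream.reset(); // must be called before incrementToken() in Lucene 4.0 and later
    while (stream.incrementToken()) {
        // The analyzed token text; surrounding whitespace is already gone here
        String term = charTermAttribute.toString();
    }
    stream.end();
    stream.close();

    // Following is the readFileToString(File f) function
    String readFileToString(File f) throws IOException {
        StringBuilder textBuilder = new StringBuilder();
        String ls = System.getProperty("line.separator");
        // try-with-resources closes the scanner and the underlying file stream
        try (Scanner scanner = new Scanner(new FileInputStream(f))) {
            while (scanner.hasNextLine()) {
                textBuilder.append(scanner.nextLine()).append(ls);
            }
        }
        return textBuilder.toString();
    }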

readFileToString(f) is a simple function that converts the file contents to a string. The output I am getting is the individual words, with the spaces and newlines between them removed. Is there a way to preserve the original spaces and newline characters in the analyzer output, so that I can replace the original file contents with the filtered terms from the StandardAnalyzer and present them in a readable form?

Tokenizers record each term's position in the text, so in theory you could use that position to work out how many characters lie between consecutive tokens. They don't keep the text that was between the tokens, though. So from the token stream alone you could restore spaces, but not newlines.
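If the original string is still at hand (as it is in the question, since it comes from readFileToString), the gaps can be copied from it instead of guessed. The following is only a rough sketch of that idea using plain Java: `GapRestorer`, `Token`, and `rebuild` are hypothetical names, not Lucene classes, and the start/end character offsets are assumed to have been collected for each token beforehand. Only whitespace is copied from each gap, so removed stop words and punctuation stay removed.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class GapRestorer {

    /** A token's text plus its character offsets in the original string. */
    static final class Token {
        final String term;
        final int start; // inclusive offset of the token in the original text
        final int end;   // exclusive offset of the token in the original text

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Joins the tokens, re-inserting the whitespace found between them in the original. */
    static String rebuild(String original, List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        int previousEnd = 0;
        for (Token t : tokens) {
            appendWhitespace(original, previousEnd, t.start, out);
            out.append(t.term);
            previousEnd = t.end;
        }
        // Keep any trailing whitespace, such as a final newline
        appendWhitespace(original, previousEnd, original.length(), out);
        return out.toString();
    }

    /** Copies only the whitespace characters of original[from, to) into out. */
    private static void appendWhitespace(String original, int from, int to, StringBuilder out) {
        for (int i = from; i < to; i++) {
            char c = original.charAt(i);
            if (Character.isWhitespace(c)) {
                out.append(c);
            }
        }
    }
}
```

With Lucene, the offsets would presumably come from an OffsetAttribute (its startOffset() and endOffset() methods) read in the same loop as the CharTermAttribute. Note that the space next to a removed stop word is kept, so you may want to collapse runs of spaces afterwards.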

If you are comfortable with JFlex, you could modify the tokenizer to treat newlines as a token. That's probably more work than any gain you'd get from it, though.