UPDATE: Solved

I had been calling FTPClient.setFileType() before I logged in, causing the FTP server to use the default mode (ASCII) no matter what I set it to. The client, on the other hand, was behaving as though the file type had been properly set. BINARY mode is now working exactly as desired, transporting the file byte-for-byte in all cases. All I had to do was a little traffic sniffing in wireshark and then mimicking the FTP commands using netcat to see what was going on. Why didn't I think of that on Friday!? Thanks, everyone, for the help!
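For anyone who wants to see the command order without netcat, here is a rough sketch of the control-connection dialogue. It is not commons-net code: FtpLoginOrder and its methods are hypothetical names made up for illustration, the server replies are canned strings so it runs offline, and it assumes every reply is a single line (real servers may send multi-line replies or skip the 331 step).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of commons-net: it only illustrates command order.
class FtpLoginOrder {

    // Sends one CRLF-terminated command line and returns the server's reply line.
    static String send(BufferedReader in, Writer out, String command) throws IOException {
        out.write(command + "\r\n");
        out.flush();
        return in.readLine();
    }

    // Fails if the reply does not start with the expected three-digit code.
    static void expect(String reply, String code) throws IOException {
        if (reply == null || !reply.startsWith(code)) {
            throw new IOException("Expected " + code + " but got: " + reply);
        }
    }

    // Assumption: the server answers each step with a single-line reply.
    static void loginThenSetBinary(BufferedReader in, Writer out, String user, String pass)
            throws IOException {
        expect(in.readLine(), "220");                  // greeting
        expect(send(in, out, "USER " + user), "331");  // password required
        expect(send(in, out, "PASS " + pass), "230");  // logged in
        expect(send(in, out, "TYPE I"), "200");        // binary ("image") type, only after login
    }

    public static void main(String[] args) throws IOException {
        // Canned replies stand in for a real server so the sketch runs offline.
        BufferedReader in = new BufferedReader(new StringReader(
                "220 ready\r\n331 need password\r\n230 logged in\r\n200 type set to I\r\n"));
        StringWriter out = new StringWriter();
        loginThenSetBinary(in, out, "user", "pass");
        System.out.print(out); // the commands, in the order they were sent
    }
}
```

With commons-net the same ordering would be connect(), then login(), and only then setFileType(FTP.BINARY_FILE_TYPE).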

I have an xml file, utf-16 encoded, which I am downloading from an FTP site using apache's commons-net-2.0 java library's FTPClient. It offers support for two transfer modes: ASCII_FILE_TYPE and BINARY_FILE_TYPE, the difference being that ASCII will replace line separators with the appropriate local line separator ('\r\n' or just '\n' -- in hex, 0x0d0a or just 0x0a). My problem is this: I have a test file, utf-16 encoded, that contains the following:

<?xml version='1.0' encoding='utf-16'?>
<data>
    <blah>blah</blah>
</data>

Here's the hex:
0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
0000040: 0066 002d 0031 0036 0027 003f 003e 000a .f.-.1.6.'.?.>..
0000050: 003c 0064 0061 0074 0061 003e 000a 0009 .<.d.a.t.a.>....
0000060: 003c 0062 006c 0061 0068 003e 0062 006c .<.b.l.a.h.>.b.l
0000070: 0061 0068 003c 002f 0062 006c 0061 0068 .a.h.<./.b.l.a.h
0000080: 003e 000a 003c 002f 0064 0061 0074 0061 .>...<./.d.a.t.a
0000090: 003e 000a                               .>..
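If vim is not handy, a file with the same bytes as the dump above could probably be generated with a few lines of Java. This is only a sketch: MakeTestXml is a made-up class name, the output path defaults to test.xml in the current directory, and it assumes big-endian UTF-16 with no byte-order mark and a tab for the indent, since that is what the hex shows (003c first, 0009 before the blah element).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for recreating the test file; not part of the original test code.
class MakeTestXml {

    // Same text as the sample file, with '\n' line endings and a tab indent.
    static final String XML =
            "<?xml version='1.0' encoding='utf-16'?>\n"
          + "<data>\n"
          + "\t<blah>blah</blah>\n"
          + "</data>\n";

    // UTF_16BE encodes two bytes per character, high byte first, and writes no BOM.
    static byte[] testFileBytes() {
        return XML.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "test.xml");
        Files.write(target, testFileBytes());
        System.out.println("Wrote " + testFileBytes().length + " bytes to " + target);
    }
}
```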

When I use ASCII mode for this file it transfers correctly, byte-for-byte; the result has the same md5sum. Great. When I use BINARY transfer mode, which is not supposed to do anything but shuffle bytes from an InputStream into an OutputStream, the result is that the newlines (0x0a) are converted into carriage return + newline pairs (0x0d0a). Here's the hex after binary transfer:

0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
0000040: 0066 002d 0031 0036 0027 003f 003e 000d .f.-.1.6.'.?.>..
0000050: 0a00 3c00 6400 6100 7400 6100 3e00 0d0a ..<.d.a.t.a.>...
0000060: 0009 003c 0062 006c 0061 0068 003e 0062 ...<.b.l.a.h.>.b
0000070: 006c 0061 0068 003c 002f 0062 006c 0061 .l.a.h.<./.b.l.a
0000080: 0068 003e 000d 0a00 3c00 2f00 6400 6100 .h.>....<./.d.a.
0000090: 7400 6100 3e00 0d0a                     t.a.>...

Not only does it convert the newline characters (which it should not), but it doesn't respect the utf-16 encoding (not that I'd expect it to know that it should, it's just a dumb FTP pipe). The result is unreadable without further processing to realign the bytes. I would just use ASCII mode, but my application will also be moving real binary data (mp3 files and jpeg images) across the same pipe. Using BINARY transfer mode on these binary files also causes them to have random 0x0ds injected into their contents, which can't safely be removed since the binary data often contains legitimate 0x0d0a sequences. If I use ASCII mode on these files, then the "clever" FTPClient converts these 0x0d0as into 0x0a, leaving the file inconsistent no matter what I do.
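To make the two failure modes concrete, here is a small offline sketch of encoding-unaware newline translation. It is an assumption about the kind of byte-level rewriting that would produce these symptoms, not code taken from commons-net or from any FTP server, and NewlineTranslationDemo and its methods are names invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only models the symptoms described in the post.
class NewlineTranslationDemo {

    // Encoding-unaware LF -> CRLF translation: inserts 0x0d before every 0x0a byte.
    static byte[] lfToCrlf(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : input) {
            if (b == 0x0a) {
                out.write(0x0d);
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Encoding-unaware CRLF -> LF translation: drops every 0x0d that is followed by 0x0a.
    static byte[] crlfToLf(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < input.length; i++) {
            boolean crBeforeLf = input[i] == 0x0d && i + 1 < input.length && input[i + 1] == 0x0a;
            if (!crBeforeLf) {
                out.write(input[i]);
            }
        }
        return out.toByteArray();
    }

    // Lowercase hex without separators, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16 text: the injected 0x0d shifts every following byte by one.
        byte[] text = ">\n<".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(hex(text));             // 003e000a003c
        System.out.println(hex(lfToCrlf(text)));   // 003e000d0a003c

        // Binary data: a legitimate 0x0d0a pair loses its 0x0d.
        byte[] binary = {(byte) 0xff, 0x0d, 0x0a, 0x10};
        System.out.println(hex(crlfToLf(binary))); // ff0a10
    }
}
```

The second line of output has the same shape as the binary-mode dump above (003e 000d 0a00 3c...), and the last line shows why stripping 0x0d from real binary data is not safe.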

I guess my question(s) is(are): does anyone know of any good FTP libraries for java that just move the damned bytes from there to here, or am I going to have to hack up apache commons-net-2.0 and maintain my own FTP client code just for this simple application? Has anyone else dealt with this bizarre behavior? Any suggestions would be appreciated.

I checked out the commons-net source code and it doesn't look like it's responsible for the weird behavior when BINARY mode is used. But the InputStream it's reading from in BINARY mode is just a java.io.BufferedInputStream wrapped around a socket InputStream. Do these lower level java streams ever do any weird byte-manipulation? I would be shocked if they did, but I don't see what else could be going on here.
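A quick way to rule out the buffering layer is to push every possible byte value through a BufferedInputStream and compare. The sketch below does that offline; BufferedPassThroughCheck is a made-up name, and a ByteArrayInputStream stands in for the socket stream, so it says nothing about what a server actually sends.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sanity check; the in-memory stream is a stand-in for the socket.
class BufferedPassThroughCheck {

    // Drains the stream into a byte array using a small read buffer.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8]; // deliberately small so several reads are needed
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Every byte value from 0x00 to 0xff, including 0x0a and 0x0d.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] copy = readAll(new BufferedInputStream(new ByteArrayInputStream(all), 16));
        System.out.println(Arrays.equals(all, copy)); // true
    }
}
```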

EDIT 1:

Here's a minimal piece of code that mimics what I'm doing to download the file. To compile, just do

javac -classpath /path/to/commons-net-2.0.jar Main.java

To run, you'll need directories /tmp/ascii and /tmp/binary for the file to download to, as well as an ftp site set up with the file sitting in it. The code will also need to be configured with the appropriate ftp host, username and password. I put the file on my testing ftp site under the test/ folder and called the file test.xml. The test file should at least have more than one line, and be utf-16 encoded (this may not be necessary, but will help to recreate my exact situation). I used vim's :set fileencoding=utf-16 command after opening a new file and entered the xml text referenced above. Finally, to run, just do

java -cp .:/path/to/commons-net-2.0.jar Main

Code:

(NOTE: this code was modified to use a custom FTPClient object, linked below under "EDIT 2")

import java.io.*;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.CRC32;
import org.apache.commons.net.ftp.*;

public class Main implements java.io.Serializable
{
    public static void main(String[] args) throws Exception
    {
        Main main = new Main();
        main.doTest();
    }

    private void doTest() throws Exception
    {
        String host = "ftp.host.com";
        String user = "user";
        String pass = "pass";

        String asciiDest = "/tmp/ascii";
        String binaryDest = "/tmp/binary";

        String remotePath = "test/";
        String remoteFilename = "test.xml";

        System.out.println("TEST.XML ASCII");
        MyFTPClient client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
        File path = new File("/tmp/ascii");
        downloadFTPFileToPath(client, "test/", "test.xml", path);
        System.out.println("");

        System.out.println("TEST.XML BINARY");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
        path = new File("/tmp/binary");
        downloadFTPFileToPath(client, "test/", "test.xml", path);
        System.out.println("");

        System.out.println("TEST.MP3 ASCII");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
        path = new File("/tmp/ascii");
        downloadFTPFileToPath(client, "test/", "test.mp3", path);
        System.out.println("");

        System.out.println("TEST.MP3 BINARY");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
        path = new File("/tmp/binary");
        downloadFTPFileToPath(client, "test/", "test.mp3", path);
    }

    public static File downloadFTPFileToPath(MyFTPClient ftp, String remoteFileLocation, String remoteFileName, File path)
        throws Exception
    {
        // path to remote resource
        String remoteFilePath = remoteFileLocation + "/" + remoteFileName;

        // create local result file object
        File resultFile = new File(path, remoteFileName);

        // local file output stream, wrapped so a CRC32 of the written bytes can be reported
        CheckedOutputStream fout = new CheckedOutputStream(new FileOutputStream(resultFile), new CRC32());

        try {
            // try to read data from remote server
            if (ftp.retrieveFile(remoteFilePath, fout)) {
                System.out.println("FileOut: " + fout.getChecksum().getValue());
                return resultFile;
            } else {
                throw new Exception("Failed to download file completely: " + remoteFilePath);
            }
        } finally {
            // always release the local file handle, even when the download fails
            fout.close();
        }
    }

    public static MyFTPClient createFTPClient(String url, String user, String pass, int type)
        throws Exception
    {
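        MyFTPClient ftp = new MyFTPClient();
        ftp.connect(url);
        // NOTE: setFileType() is called here before login(); per the UPDATE at the
        // top of this post, that ordering is what left the server in ASCII mode.
        if (!ftp.setFileType( type )) {
            throw new Exception("Failed to set file type on ftpClient object");
        }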

        // check for successful connection
        int reply = ftp.getReplyCode();
        if (!FTPReply.isPositiveCompletion(reply)) {
            ftp.disconnect();
            throw new Exception("Failed to connect properly to FTP");
        }

        // attempt login
        if (!ftp.login(user, pass)) {
            String msg = "Failed to login to FTP";
            ftp.disconnect();
            throw new Exception(msg);
        }

        // success! return connected MyFTPClient.
        return ftp;
    }

}

EDIT 2:

Okay, I followed the CheckedXputStream advice and here are my results. I made a copy of apache's FTPClient called MyFTPClient, and I wrapped both the SocketInputStream and the BufferedInputStream in a CheckedInputStream using CRC32 checksums. Furthermore, I wrapped the FileOutputStream that I give to FTPClient to store the output in a CheckedOutputStream with a CRC32 checksum. The code for MyFTPClient is posted here and I've modified the above test code to use this version of the FTPClient (tried to post a gist URL to the modified code, but I need 10 reputation points to post more than one URL!). I ran it against test.xml and test.mp3, and the results were thus:

14:00:08,644 DEBUG [main,TestMain] TEST.XML ASCII
14:00:08,919 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
14:00:08,919 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
14:00:08,954 DEBUG [main,FTPUtils] FileOut CRC32: 866869773

14:00:08,955 DEBUG [main,TestMain] TEST.XML BINARY
14:00:09,270 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
14:00:09,270 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
14:00:09,310 DEBUG [main,FTPUtils] FileOut CRC32: 2739864033

14:00:09,310 DEBUG [main,TestMain] TEST.MP3 ASCII
14:00:10,635 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
14:00:10,635 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
14:00:10,636 DEBUG [main,FTPUtils] FileOut CRC32: 2352009735

14:00:10,636 DEBUG [main,TestMain] TEST.MP3 BINARY
14:00:11,482 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
14:00:11,482 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
14:00:11,483 DEBUG [main,FTPUtils] FileOut CRC32: 60615183

This makes basically zero sense whatsoever, because here are the md5sums of the corresponding files:

bf89673ee7ca819961442062eaaf9c3f  ascii/test.mp3
7bd0e8514f1b9ce5ebab91b8daa52c4b  binary/test.mp3
ee172af5ed0204cf9546d176ae00a509  original/test.mp3

104e14b661f3e5dbde494a54334a6dd0  ascii/test.xml
36f482a709130b01d5cddab20a28a8e8  binary/test.xml
104e14b661f3e5dbde494a54334a6dd0  original/test.xml

I am baffled. I swear I haven't permuted the filenames/paths at any point in this process, and I've triple-checked every step. It must be something simple, but I haven't the foggiest idea where to look next. In the interest of practicality I'm going to proceed by calling out to the shell to do my FTP transfers, but I intend to pursue this until I understand what the hell is going on. I'll update this thread with my findings, and I'll continue to appreciate any contributions anyone may have. Hopefully this will be useful to someone at some point!
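For reference, the checksum wrapping described under EDIT 2 boils down to something like the sketch below. It is not the MyFTPClient code itself: ChecksumCopyDemo and copyWithChecksums are made-up names, and in-memory streams stand in for the socket and the output file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical demo of CRC32-checked streams; not the MyFTPClient implementation.
class ChecksumCopyDemo {

    // Copies source to sink and returns {CRC32 of bytes read, CRC32 of bytes written}.
    static long[] copyWithChecksums(InputStream source, OutputStream sink) throws IOException {
        CheckedInputStream in = new CheckedInputStream(source, new CRC32());
        CheckedOutputStream out = new CheckedOutputStream(sink, new CRC32());
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
        return new long[] {in.getChecksum().getValue(), out.getChecksum().getValue()};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x00, 0x3e, 0x00, 0x0a, 0x0d, 0x0a};
        long[] sums = copyWithChecksums(new ByteArrayInputStream(data), new ByteArrayOutputStream());
        System.out.println("in=" + sums[0] + " out=" + sums[1]);
        System.out.println(sums[0] == sums[1]); // true for a plain byte-for-byte copy
    }
}
```

If anything between the two wrappers rewrites bytes (as an ASCII-mode newline conversion would), the two values should differ, which is the kind of mismatch to look for in the "FileOut" lines of the log above.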