random memes }

Why FileInputStream is slow - continued

As noted earlier reads through FileInputStream were radically slower than I had expected. Hit a roadblock when I could not find the sources for SetByteArrayRegion(). Got the JDK7 sources to build locally, so went looking again - and finally found a mess of macros. A *partial* expansion of the SetByteArrayRegion() definition:

    DT_VOID_RETURN_MARK_DECL(SetByteArrayRegion);

    JNI_ENTRY(void, jni_SetByteArrayRegion(JNIEnv *env, jbyteArray array, jsize start, jsize len, const ElementType *buf))
      JNIWrapper("SetByteArrayRegion");
      DTRACE_PROBE5(hotspot_jni, SetByteArrayRegion__entry, env, array, start, len, buf);
      DT_VOID_RETURN_MARK(SetByteArrayRegion);
      typeArrayOop dst = typeArrayOop(JNIHandles::resolve_non_null(array));
      if (start < 0 || len < 0 || ((unsigned int)start + (unsigned int)len > (unsigned int)dst->length())) {
        THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException());
      } else {
        if (len > 0) {
          int sc = typeArrayKlass::cast(dst->klass())->log2_element_size();
          memcpy((u_char*) dst->byte_at_addr(start),
                 (u_char*) buf,
                 len < < sc);
        }
      }
    JNI_END

I think I know what this code is doing ... but no guarantees. Have to admit to having serious doubts about this code. This sort of massive macro expansion is more a characteristic of C code, not C++ code. Is this really necessary? I cannot tell - without spending a lot more time.

Re-wrote the readBytes() function to something more sensible - a single I/O call with no buffer allocation or copies.

    int
    readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
              jint off, jint len, jfieldID fid)
    {
        int nread, datalen;
        jbyte* pBuffer;
        FD fd;

        if (IS_NULL(bytes)) {
            JNU_ThrowNullPointerException(env, 0);
            return -1;
        }
        datalen = (*env)->GetArrayLength(env, bytes);

        if ((off < 0) || (off > datalen) ||
            (len < 0) || ((off + len) > datalen) || ((off + len) < 0)) {
            JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", 0);
            return -1;
        }

        if (len == 0) {
            return 0;
        }

        fd = GET_FD(this, fid);
        if (fd == -1) {
            JNU_ThrowIOException(env, "Stream Closed");
            return -1;
        }

        pBuffer = (*env)->GetByteArrayElements(env,bytes,0);
        nread = IO_Read(fd, pBuffer+off, len);
        (*env)->ReleaseByteArrayElements(env,bytes,pBuffer,0);
        if (0 < nread) {
            /* The read() is good */
        } else if (nread == JVM_IO_ERR) {
            JNU_ThrowIOExceptionWithLastError(env, "Read error");
        } else if (nread == JVM_IO_INTR) { /* EOF */
            JNU_ThrowByName(env, "java/io/InterruptedIOException", 0);
        } else { /* EOF */
            nread = -1;
        }

        return nread;
    }

To measure the effect of the modified FileInputStream, first we need the measure throughput with feed-workers:

    preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r `which cat` logs/_3.5G > /dev/null
    Sun Aug 17 23:30:29 2008
    Scanning: logs/_3.5G
    Done with: logs/_3.5G
    Worker #12263 ended with status: 0
    Sun Aug 17 23:32:00 2008
    Elapsed (ms): 90235, total (MB): 3394
    Scanned 37 MB/s

    real    1m30.242s
    user    0m0.848s
    sys     0m11.437s

    preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r `which cat` logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB > /dev/null
    Sun Aug 17 23:46:58 2008
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Scanning: logs/_400MB
    Done with: logs/_400MB
    Worker #13303 ended with status: 0
    Sun Aug 17 23:47:02 2008
    Elapsed (ms): 4337, total (MB): 4243
    Scanned 978 MB/s

    real    0m4.346s
    user    0m0.384s
    sys     0m3.932s

For this test machine, the C++ code manages 37MB/s reading from disk, and 978MB/s reading cached file data. (The machine used for the timings in this article is different from prior articles. The differences make sense - this is a new laptop with a faster CPU and slower disk.)

Running the prior Java file reader using FileInputStream against the stock JDK7 build:

    preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java -version
    openjdk version "1.7.0-internal"
    OpenJDK Runtime Environment (build 1.7.0-internal-preston_2008_08_12_18_09-b00)
    OpenJDK 64-Bit Server VM (build 14.0-b01, mixed mode)
    preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java  -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB
    Wide Finder in Java

    FILE /home/preston/workspace/wide-finder-j/logs/_400MB

    Time for READ - to get file cache consistent
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1760
    Rate: 241 MB/s

    Time for READ - for measure
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1737>
    Rate: 244 MB/s

    FILE /home/preston/workspace/wide-finder-j/logs/_400MB

    Time for READ - to get file cache consistent
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1750
    Rate: 242 MB/s

    Time for READ - for measure
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1741
    Rate: 243 MB/s

Running the same Java code, but using the JDK7 build with the FileInputStream using the modified readBytes():

    preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.new/bin/java  -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB
    Wide Finder in Java

    FILE /home/preston/workspace/wide-finder-j/logs/_400MB

    Time for READ - to get file cache consistent
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1770
    Rate: 239 MB/s

    Time for READ - for measure
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1734
    Rate: 244 MB/s

    FILE /home/preston/workspace/wide-finder-j/logs/_400MB

    Time for READ - to get file cache consistent
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1762
    Rate: 240 MB/s

    Time for READ - for measure
    File: /home/preston/workspace/wide-finder-j/logs/_400MB
    Size: 444945620
    Read: 444945620
    Time: 1734
    Rate: 244 MB/s

Right. No difference? Odd - there must be something eating a lot of CPU in the Java/JVM code. I'm not tooled up for profiling the Java/JVM code, so ... not sure if I am will get to the bottom of this one.

The point of this exercise is that I believe the FileInputStream read() code should - for large buffers - come very close to the performance of C++ code. Something is knocking down the cached-file-data read rate for Java code to about a fourth of C++ code. This is a good point to apply profiling - and the block of time needed may be more than I am willing to spend.