Skip to content

Why FileInputStream is slow – continued

As noted earlier reads through FileInputStream were radically slower than I had expected. Hit a roadblock when I could not find the sources for SetByteArrayRegion(). Got the JDK7 sources to build locally, so went looking again – and finally found a mess of macros. A *partial* expansion of the SetByteArrayRegion() definition:

DT_VOID_RETURN_MARK_DECL(SetByteArrayRegion);

JNI_ENTRY(void, jni_SetByteArrayRegion(JNIEnv *env, jbyteArray array, jsize start, jsize len, const ElementType *buf))
  JNIWrapper("SetByteArrayRegion");
  DTRACE_PROBE5(hotspot_jni, SetByteArrayRegion__entry, env, array, start, len, buf);
  DT_VOID_RETURN_MARK(SetByteArrayRegion); 
  typeArrayOop dst = typeArrayOop(JNIHandles::resolve_non_null(array)); 
  if (start < 0 || len < 0 || ((unsigned int)start + (unsigned int)len > (unsigned int)dst->length())) { 
    THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException()); 
  } else { 
    if (len > 0) { 
      int sc = typeArrayKlass::cast(dst->klass())->log2_element_size(); 
      memcpy((u_char*) dst->byte_at_addr(start), 
             (u_char*) buf, 
             len < < sc);    
    } 
  } 
JNI_END

I think I know what this code is doing ... but no guarantees. Have to admit to having serious doubts about this code. This sort of massive macro expansion is more a characteristic of C code, not C++ code. Is this really necessary? I cannot tell - without spending a lot more time.

Re-wrote the readBytes() function to something more sensible - a single I/O call with no buffer allocation or copies.

int
readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
          jint off, jint len, jfieldID fid)
{
    int nread, datalen;
    jbyte* pBuffer;
    FD fd;

    if (IS_NULL(bytes)) {
        JNU_ThrowNullPointerException(env, 0);
        return -1;
    }
    datalen = (*env)->GetArrayLength(env, bytes);

    if ((off < 0) || (off > datalen) ||
        (len < 0) || ((off + len) > datalen) || ((off + len) < 0)) {
        JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", 0);
        return -1;
    }

    if (len == 0) {
        return 0;
    } 

    fd = GET_FD(this, fid);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        return -1;
    }

    pBuffer = (*env)->GetByteArrayElements(env,bytes,0);
    nread = IO_Read(fd, pBuffer+off, len);
    (*env)->ReleaseByteArrayElements(env,bytes,pBuffer,0);
    if (0 < nread) {
        /* The read() is good */
    } else if (nread == JVM_IO_ERR) {
        JNU_ThrowIOExceptionWithLastError(env, "Read error");
    } else if (nread == JVM_IO_INTR) { /* EOF */
        JNU_ThrowByName(env, "java/io/InterruptedIOException", 0);
    } else { /* EOF */
        nread = -1;
    }

    return nread;
}

To measure the effect of the modified FileInputStream, first we need the measure throughput with feed-workers:

preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r `which cat` logs/_3.5G > /dev/null
Sun Aug 17 23:30:29 2008
Scanning: logs/_3.5G
Done with: logs/_3.5G
Worker #12263 ended with status: 0
Sun Aug 17 23:32:00 2008
Elapsed (ms): 90235, total (MB): 3394
Scanned 37 MB/s

real	1m30.242s
user	0m0.848s
sys	0m11.437s

preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r `which cat` logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB > /dev/null
Sun Aug 17 23:46:58 2008
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Scanning: logs/_400MB
Done with: logs/_400MB
Worker #13303 ended with status: 0
Sun Aug 17 23:47:02 2008
Elapsed (ms): 4337, total (MB): 4243
Scanned 978 MB/s

real	0m4.346s
user	0m0.384s
sys	0m3.932s

For this test machine, the C++ code manages 37MB/s reading from disk, and 978MB/s reading cached file data. (The machine used for the timings in this article is different from prior articles. The differences make sense – this is a new laptop with a faster CPU and slower disk.)

Running the prior Java file reader using FileInputStream against the stock JDK7 build:

preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java -version
openjdk version "1.7.0-internal"
OpenJDK Runtime Environment (build 1.7.0-internal-preston_2008_08_12_18_09-b00)
OpenJDK 64-Bit Server VM (build 14.0-b01, mixed mode)
preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java  -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB
Wide Finder in Java

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1760
Rate: 241 MB/s

Time for READ - for measure
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1737>
Rate: 244 MB/s

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1750
Rate: 242 MB/s

Time for READ - for measure
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1741
Rate: 243 MB/s

Running the same Java code, but using the JDK7 build with the FileInputStream using the modified readBytes():

preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.new/bin/java  -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB
Wide Finder in Java

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1770
Rate: 239 MB/s

Time for READ - for measure
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1734
Rate: 244 MB/s

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1762
Rate: 240 MB/s

Time for READ - for measure
File: /home/preston/workspace/wide-finder-j/logs/_400MB
Size: 444945620
Read: 444945620
Time: 1734
Rate: 244 MB/s

Right. No difference? Odd – there must be something eating a lot of CPU in the Java/JVM code. I’m not tooled up for profiling the Java/JVM code, so … not sure if I am will get to the bottom of this one.

The point of this exercise is that I believe the FileInputStream read() code should – for large buffers – come very close to the performance of C++ code. Something is knocking down the cached-file-data read rate for Java code to about a fourth of C++ code. This is a good point to apply profiling – and the block of time needed may be more than I am willing to spend.