random memes }

Why FileInputStream is slow - continued

As noted earlier reads through FileInputStream were radically slower than I had expected. Hit a roadblock when I could not find the sources for SetByteArrayRegion(). Got the JDK7 sources to build locally, so went looking again - and finally found a mess of macros. A *partial* expansion of the SetByteArrayRegion() definition:

DT_VOID_RETURN_MARK_DECL(SetByteArrayRegion);

JNI_ENTRY(void, jni_SetByteArrayRegion(JNIEnv env, jbyteArray array, jsize start, jsize len, const ElementType buf)) JNIWrapper("SetByteArrayRegion"); DTRACE_PROBE5(hotspot_jni, SetByteArrayRegion__entry, env, array, start, len, buf); DT_VOID_RETURN_MARK(SetByteArrayRegion); typeArrayOop dst = typeArrayOop(JNIHandles::resolve_non_null(array)); if (start < 0 || len < 0 || ((unsigned int)start + (unsigned int)len > (unsigned int)dst->length())) { THROW(vmSymbols::java_lang_ArrayIndexOutOfBoundsException()); } else { if (len > 0) { int sc = typeArrayKlass::cast(dst->klass())->log2_element_size(); memcpy((u_char) dst->byte_at_addr(start), (u_char) buf, len < < sc); } } JNI_END

I think I know what this code is doing ... but no guarantees. Have to admit to having serious doubts about this code. This sort of massive macro expansion is more a characteristic of C code, not C++ code. Is this really necessary? I cannot tell - without spending a lot more time.

Re-wrote the readBytes() function to something more sensible - a single I/O call with no buffer allocation or copies.

int readBytes(JNIEnv env, jobject this, jbyteArray bytes, jint off, jint len, jfieldID fid) { int nread, datalen; jbyte pBuffer; FD fd;

if (IS_NULL(bytes)) { JNU_ThrowNullPointerException(env, 0); return -1; } datalen = (*env)->GetArrayLength(env, bytes);

if ((off < 0) || (off > datalen) || (len < 0) || ((off + len) > datalen) || ((off + len) < 0)) { JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", 0); return -1; }

if (len == 0) { return 0; }

fd = GET_FD(this, fid); if (fd == -1) { JNU_ThrowIOException(env, "Stream Closed"); return -1; }

pBuffer = (*env)->GetByteArrayElements(env,bytes,0); nread = IO_Read(fd, pBuffer+off, len); (env)->ReleaseByteArrayElements(env,bytes,pBuffer,0); if (0 < nread) { / The read() is good / } else if (nread == JVM_IO_ERR) { JNU_ThrowIOExceptionWithLastError(env, "Read error"); } else if (nread == JVM_IO_INTR) { / EOF / JNU_ThrowByName(env, "java/io/InterruptedIOException", 0); } else { / EOF */ nread = -1; }

return nread; }

To measure the effect of the modified FileInputStream, first we need the measure throughput with feed-workers:

preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r which cat logs/_3.5G > /dev/null Sun Aug 17 23:30:29 2008 Scanning: logs/_3.5G Done with: logs/_3.5G Worker #12263 ended with status: 0 Sun Aug 17 23:32:00 2008 Elapsed (ms): 90235, total (MB): 3394 Scanned 37 MB/s

real 1m30.242s user 0m0.848s sys 0m11.437s

preston@mercury:~/workspace/feed-workers$ time ./feed-workers -n 1 -r which cat logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB logs/_400MB > /dev/null Sun Aug 17 23:46:58 2008 Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Scanning: logs/_400MB Done with: logs/_400MB Worker #13303 ended with status: 0 Sun Aug 17 23:47:02 2008 Elapsed (ms): 4337, total (MB): 4243 Scanned 978 MB/s

real 0m4.346s user 0m0.384s sys 0m3.932s

For this test machine, the C++ code manages 37MB/s reading from disk, and 978MB/s reading cached file data. (The machine used for the timings in this article is different from prior articles. The differences make sense - this is a new laptop with a faster CPU and slower disk.)

Running the prior Java file reader using FileInputStream against the stock JDK7 build:

preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java -version openjdk version "1.7.0-internal" OpenJDK Runtime Environment (build 1.7.0-internal-preston_2008_08_12_18_09-b00) OpenJDK 64-Bit Server VM (build 14.0-b01, mixed mode) preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.base/bin/java -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB Wide Finder in Java

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1760 Rate: 241 MB/s

Time for READ - for measure File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1737> Rate: 244 MB/s

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1750 Rate: 242 MB/s

Time for READ - for measure File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1741 Rate: 243 MB/s

Running the same Java code, but using the JDK7 build with the FileInputStream using the modified readBytes():

preston@mercury:~/workspace/wide-finder-j$ ~/sources/j2sdk-image.new/bin/java -jar upload/wide-finder-j.jar logs/_400MB logs/_400MB Wide Finder in Java

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1770 Rate: 239 MB/s

Time for READ - for measure File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1734 Rate: 244 MB/s

FILE /home/preston/workspace/wide-finder-j/logs/_400MB

Time for READ - to get file cache consistent File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1762 Rate: 240 MB/s

Time for READ - for measure File: /home/preston/workspace/wide-finder-j/logs/_400MB Size: 444945620 Read: 444945620 Time: 1734 Rate: 244 MB/s

Right. No difference? Odd - there must be something eating a lot of CPU in the Java/JVM code. I'm not tooled up for profiling the Java/JVM code, so ... not sure if I am will get to the bottom of this one.

The point of this exercise is that I believe the FileInputStream read() code should - for large buffers - come very close to the performance of C++ code. Something is knocking down the cached-file-data read rate for Java code to about a fourth of C++ code. This is a good point to apply profiling - and the block of time needed may be more than I am willing to spend.