Wednesday, May 9, 2012

Disagreement on the Spdy Receive Window

‹prev | My Chain | next›

After some initial difficulty, I ended yesterday with spdylay able to make spdy/3 connections to the spdy-v3 branch of node-spdy. It was able to make a connection, but not complete—likely due to a control flow issue in either spdylay or node-spdy. So tonight I set out to isolate the problem and, ideally, fix it.

But first, I update spdylay. As I mentioned, I had some initial difficulty with the connect primarily due to a bug in node-spdy's SETTINGS frame. Chrome seems to silently ignore the problem, proceeding as if the extra data were not present. Spdylay, on the other hand, silently failed in response to the problem.

I had fully intended to report the silent failure as an issue, but was pleasantly surprised to find that Tatsuhiro Tsujikawa had not only read last night's work, but already added the feature. So I:
git fetch origin
git merge origin/master
make
make install
With that, I get started investigating my control flow issue.

First up tonight, I try a simpler case than last night—a default control flow request. That is, I call spdycat on my very large image without an alternate receive window size (no -w flag):
➜  spdylay git:(master) spdycat -v -n https://localhost:3000/images/pipelining.jpg
[  0.003] NPN select next protocol: the remote server offers:
          * spdy/3
          * spdy/2
          * http/1.1
          * http/1.0
          NPN selected the protocol: spdy/3
[  0.009] send SYN_STREAM frame <version=3, flags=1, length=118>
          (stream_id=1, assoc_stream_id=0, pri=3)
          :host: localhost:3000
          :method: GET
          :path: /images/pipelining.jpg
          :scheme: https
          :version: HTTP/1.1
          accept: */*
          user-agent: spdylay/0.1.0
[  0.010] recv SETTINGS frame <version=3, flags=0, length=12>
          (niv=1)
          [4(1):100]
[  0.018] recv SYN_REPLY frame <version=3, flags=0, length=34>
          (stream_id=1) 
          :status: 200  
          :version: HTTP/1.1
[  0.018] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.018] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.018] recv DATA frame (stream_id=1, flags=0, length=1300)
...
[  0.422] send WINDOW_UPDATE frame 
          (stream_id=1, delta_window_size=33160)
...
[  0.462] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.462] recv DATA frame (stream_id=1, flags=0, length=369)
[  0.462] recv DATA frame (stream_id=1, flags=1, length=0)
[  0.462] send GOAWAY frame 
          (last_good_stream_id=0)
As there should be, spdycat issues numerous WINDOW_UPDATEs to the server along the way to indicate that it is ready to process more data. When node-spdy sees this, it send the data along until the receive window is again empty and blocks for another WINDOW_UPDATE.

Everything looks OK there although the last line is somewhat enigmatic. It seems like stream_id=1 was successful, but spdycat reports the last "good" stream_id as "0". As a quick sanity check, I load the page containing this image in Chrome. In the SPDY tab of chrome://net-internals, I see that the last two DATA frames are 1300 and 369 bytes long, just as with spdycat:
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 1300
--> stream_id = 7
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 369
--> stream_id = 7
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 0
--> stream_id = 7
(the last DATA frame is a FIN by virtue of its zero size)

So I take from this that spdycat and Chrome both agree up on the default case. As for the non-default case, when I tell spdycat to use an initial receive window of 16kb (214 = 16kb), I still do not get complete transfer:
➜  spdylay git:(master) spdycat -v -w14 -n https://localhost:3000/images/pipelining.jpg
...
[  0.008] send SETTINGS frame 
          (niv=1)
          [7(0):16384]
[  0.013] send SYN_STREAM frame 
          (stream_id=1, assoc_stream_id=0, pri=3)
          :host: localhost:3000
          :method: GET
          :path: /images/pipelining.jpg
          :scheme: https
          :version: HTTP/1.1
          accept: */*
          user-agent: spdylay/0.1.0
...
[  0.231] send WINDOW_UPDATE frame 
          (stream_id=1, delta_window_size=9100)
[  0.256] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.295] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.295] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.295] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.296] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.296] recv DATA frame (stream_id=1, flags=0, length=1300)

                                 

Fatal
There are actually six WINDOW_UPDATEs issued by spdycat and processed correctly by node-spdy. For whatever reason, that last one fails. The most obvious cause is that spdycat and node-spdy disagree on the remaining window size. To answer that, I need to see what node-spdy thinks remains...

I add a console.log() to node-spdy to show me this information:
Stream.prototype._writeData = function _writeData(fin, buffer) {
  if (this.framer.version == 3) {
    if (this._transferWindowSize > 0) {
      this._transferWindowSize -= buffer.length;
      console.log("I think the transferWindowSize is: " + this._transferWindowSize);
    }

    if (this._transferWindowSize < 0) {
      this._transferWindowBuffer.push([fin, buffer]);
      return;
    }
  }

  // actually write to the stream...
};
I re-run spdycat -w14, and again get the unexplained pause. Back in the node-spdy window, I find:
I think the transferWindowSize is: 15084
I think the transferWindowSize is: 13784
I think the transferWindowSize is: 12484
I think the transferWindowSize is: 11184
I think the transferWindowSize is: 9884
I think the transferWindowSize is: 8584
I think the transferWindowSize is: 7284
I think the transferWindowSize is: 5984
I think the transferWindowSize is: 4684
I think the transferWindowSize is: 3384
I think the transferWindowSize is: 2084
I think the transferWindowSize is: 784
I think the transferWindowSize is: -516
I think the transferWindowSize is: 7284
I think the transferWindowSize is: 5984
I think the transferWindowSize is: 4684
I think the transferWindowSize is: 3384
I think the transferWindowSize is: 2084
I think the transferWindowSize is: 784
I think the transferWindowSize is: -516
...
The pattern of 2084, 784 and -516 remaining repeats itself over and over again until...
...
I think the transferWindowSize is: 1424
I think the transferWindowSize is: 124
I think the transferWindowSize is: -1176
...
I do not know if that is important because it is followed by another successful pattern of 2084, 784 and -516, followed by the one that hangs the connection:
...
I think the transferWindowSize is: 2084
I think the transferWindowSize is: 784
I think the transferWindowSize is: -516
I think the transferWindowSize is: 7284
I think the transferWindowSize is: 5984
I think the transferWindowSize is: 4684
I think the transferWindowSize is: 3384
I think the transferWindowSize is: 2084
I think the transferWindowSize is: 784
I think the transferWindowSize is: -516
Unfortunately, I seem to be stuck here. I think node-spdy is calculating this correctly—it definitely works with Chrome and spdylay for the default case.

Update: I am able to spdylay to transfer the large image if I switch the receive window exhaustion around in node-spdy. That is, if I calculate the receive window size after the decision is made as to whether or not the window is exhausted:
Stream.prototype._writeData = function _writeData(fin, buffer) {
  if (this.framer.version == 3) {
    if (this._transferWindowSize <= 0) {
      this._transferWindowBuffer.push([fin, buffer]);
      return;
    }

    if (this._transferWindowSize > 0) {
      this._transferWindowSize -= buffer.length;
    }
  }

  // actually write to the stream...
};
Then, everything works in spdycat for the default case (no -w option):
➜  spdylay git:(master) spdycat -v -n https://localhost:3000/images/pipelining.jpg
...
[  0.162] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.162] recv DATA frame (stream_id=1, flags=0, length=369)
[  0.162] recv DATA frame (stream_id=1, flags=1, length=0)
[  0.162] send GOAWAY frame 
          (last_good_stream_id=0)
It works with -w16 (which should be the same as the default case):
➜  spdylay git:(master) spdycat -v -w16 -n https://localhost:3000/images/pipelining.jpg
...
[  0.166] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.166] recv DATA frame (stream_id=1, flags=0, length=369)
[  0.166] recv DATA frame (stream_id=1, flags=1, length=0)
[  0.167] send GOAWAY frame 
          (last_good_stream_id=0)
And it even works with a non-default transfer window like 16kb:
➜  spdylay git:(master) spdycat -v -w14 -n https://localhost:3000/images/pipelining.jpg
...
[  0.514] recv DATA frame (stream_id=1, flags=0, length=1300)
[  0.514] recv DATA frame (stream_id=1, flags=0, length=369)
[  0.514] recv DATA frame (stream_id=1, flags=1, length=0)
Fatal
(I am unsure why there is a "Fatal" at the end there—it also pops up on occasion for the -w16 case)

So problem solved? I just need to keep the updated calculation and I am done, right? Well, no. Unfortunately Chrome does not like this control flow calculation. Both large images in my sample web page now fail to load:


And, checking the SPDY tab, I see that Chrome disagrees with the transfer window size:
...
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 1300
--> stream_id = 7
SPDY_SESSION_SEND_RST_STREAM
--> description = "Negative recv window size"
--> status = 1
--> stream_id = 7
...
I know that this updated implementation is incorrect because Chrome refuses it. That said, I don't know that spdylay's implementation is incorrect—my implementation may only work incidentally. Hopefully Tatsuhiro is paying attention again...



Day #381

1 comment:

  1. node-spdy sends 1300 bytes DATA frame regardless of the remaining window size. If the current window size if less than 1300, sending 1300 bytes DATA leads to negative window size; Chrome does not like this and issues RST_STREAM with "Negative recv window size". For my environment, this occurs 1 out of 10 access from Chrome, I think this is timing issue. To fix this problem, the easy approach is slice data according to remaining window size:

    diff --git a/lib/spdy/server.js b/lib/spdy/server.js
    index 8f1a89e..7e4e623 100644
    --- a/lib/spdy/server.js
    +++ b/lib/spdy/server.js
    @@ -471,19 +471,22 @@ Stream.prototype.drainWindow = function(size) {
    // Internal function
    //
    Stream.prototype._writeData = function _writeData(fin, buffer) {
    + var len;
    if (this.framer.version == 3) {
    - if (this._transferWindowSize > 0)
    - this._transferWindowSize -= buffer.length;
    -
    - if (this._transferWindowSize < 0) {
    + if (this._transferWindowSize <= 0) {
    this._transferWindowBuffer.push([fin, buffer]);
    return;
    }
    + len = Math.min(this._transferWindowSize, buffer.length);
    + if(len < buffer.length) {
    + this._transferWindowBuffer.push([fin, buffer.slice(len)]);
    + fin = false;
    + }
    + this._transferWindowSize -= len;
    }
    -
    this.lock(function() {
    var stream = this,
    - frame = this.framer.dataFrame(this.id, fin, buffer);
    + frame = this.framer.dataFrame(this.id, fin, buffer.slice(0, len));

    stream.connection.scheduler.schedule(stream, frame);
    stream.connection.scheduler.tick();

    I tested above change with Chrome and have not encountered flow control error so far.

    As for "Fatal" output of spdycat, it turns out node-spdy closes TCP socket after sending response. I think this is not fatal because all response data has been sent, so I'll change the code not to display this.

    As for last-good-stream-id, the SPDY spec says that it is the last stream ID the sender of GOAWAY replied to. In this case, spdycat does not replied any stream, it is 0.

    ReplyDelete