TUN-8621: Prevent QUIC connection from closing before grace period after unregistering
Some checks failed
Check / check (1.22.x, macos-latest) (push) Has been cancelled
Check / check (1.22.x, ubuntu-latest) (push) Has been cancelled
Check / check (1.22.x, windows-latest) (push) Has been cancelled

Whenever cloudflared receives a SIGTERM or SIGINT it goes into graceful shutdown mode, which unregisters the connection and closes the control stream. Unregistering makes it so we no longer receive any new requests and makes the edge close the connection, allowing in-flight requests to finish (within a 3 minute period).
 This was working fine for http2 connections, but the quic proxy was cancelling the context as soon as the controls stream ended, forcing the process to stop immediately.

 This commit changes the behavior so that we wait the full grace period before cancelling the request
This commit is contained in:
GoncaloGarcia
2024-08-30 12:51:20 +01:00
committed by Gonçalo Garcia
parent ab0bce58f8
commit e05939f1c9
8 changed files with 53 additions and 16 deletions

View File

@@ -83,6 +83,7 @@ type QUICConnection struct {
rpcTimeout time.Duration
streamWriteTimeout time.Duration
gracePeriod time.Duration
}
// NewQUICConnection returns a new instance of QUICConnection.
@@ -100,6 +101,7 @@ func NewQUICConnection(
packetRouterConfig *ingress.GlobalRouterConfig,
rpcTimeout time.Duration,
streamWriteTimeout time.Duration,
gracePeriod time.Duration,
) (*QUICConnection, error) {
udpConn, err := createUDPConnForConnIndex(connIndex, localAddr, logger)
if err != nil {
@@ -136,6 +138,7 @@ func NewQUICConnection(
connIndex: connIndex,
rpcTimeout: rpcTimeout,
streamWriteTimeout: streamWriteTimeout,
gracePeriod: gracePeriod,
}, nil
}
@@ -158,8 +161,17 @@ func (q *QUICConnection) Serve(ctx context.Context) error {
// In the future, if cloudflared can autonomously push traffic to the edge, we have to make sure the control
// stream is already fully registered before the other goroutines can proceed.
errGroup.Go(func() error {
defer cancel()
return q.serveControlStream(ctx, controlStream)
// err is equal to nil if we exit due to unregistration. If that happens we want to wait the full
// amount of the grace period, allowing requests to finish before we cancel the context, which will
// make cloudflared exit.
if err := q.serveControlStream(ctx, controlStream); err == nil {
select {
case <-ctx.Done():
case <-time.Tick(q.gracePeriod):
}
}
cancel()
return err
})
errGroup.Go(func() error {
defer cancel()