stats/opentelemetry: add trace event for name resolution delay #8074
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8074      +/-   ##
==========================================
- Coverage   82.29%   82.10%   -0.19%
==========================================
  Files         387      387
  Lines       39065    38947     -118
==========================================
- Hits        32150    31979     -171
- Misses       5584     5635      +51
- Partials     1331     1333       +2
stats/handlers.go
Outdated
@@ -38,6 +38,8 @@ type RPCTagInfo struct {
// FailFast indicates if this RPC is failfast.
// This field is only valid on client side, it's always false on server side.
FailFast bool
// NameResolutionDelay indicates if the RPC was delayed due to address resolution.
// NameResolutionDelay indicates if there was a delay in the name resolution.
// This field is only valid on client side, it's always false on server side.
Done
stream.go
Outdated
@@ -212,9 +216,13 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
}
// Provide an opportunity for the first RPC to see the first service config
// provided by the resolver.
if err := cc.waitForResolvedAddrs(ctx); err != nil {
isDelayed, err := cc.waitForResolvedAddrs(ctx)
nit: isDelayed -> nameResDelayed/nameResolutionDelayed
Done
stream.go
Outdated
@@ -416,8 +424,9 @@ func (cs *clientStream) newAttemptLocked(isTransparent bool) (*csAttempt, error)
method := cs.callHdr.Method
var beginTime time.Time
shs := cs.cc.dopts.copts.StatsHandlers
isDelayed, _ := ctx.Value(nameResolutionDelayKey).(bool)
Same naming comment as above (isDelayed -> nameResolutionDelayed).
Done
stream.go
Outdated
@@ -212,9 +216,13 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
}
// Provide an opportunity for the first RPC to see the first service config
// provided by the resolver.
if err := cc.waitForResolvedAddrs(ctx); err != nil {
isDelayed, err := cc.waitForResolvedAddrs(ctx)
Returning a bool and an error is not good practice in Go. It breaks the established error-handling pattern, because a returned bool conventionally indicates success or failure. Can we do something better? It might be fine if we can't, but let's try to find a better way.
I propose the following approach:
1. Add a nameResolutionDelay field to the ClientConn struct to store the delay state.
2. Modify waitForResolvedAddrs to set the nameResolutionDelay field on ClientConn instead of returning a boolean.
3. Read the nameResolutionDelay field from ClientConn within newAttemptLocked.
I don't think we want to add the field to ClientConn, because a ClientConn is shared by many RPCs, not restricted to a single one. Returning a struct sounds better, but it would have no field other than the boolean. Let's keep (bool, error) for now, but make sure the docstring describes what the bool means.
Just to give you more context:
Test Case 1: Fast Path (Line 699) Setup:
Test Case 2: Waiting Path (Line 703) Setup:
Done
stream.go
Outdated
var mc serviceconfig.MethodConfig
var onCommit func()
newStream := func(ctx context.Context, done func()) (iresolver.ClientStream, error) {
return newClientStreamWithParams(ctx, desc, cc, method, mc, onCommit, done, opts...)
return newClientStreamWithParams(ctx, desc, cc, method, mc, onCommit, done, rpcInfo, opts...)
What if we just pass the bool value instead of the struct? Does that work? If so, it's simpler.
Done
test/clientconn_test.go
Outdated
// EmptyCall is a simple RPC that returns an empty response.
func (s *server) EmptyCall(_ context.Context, _ *testgrpc.Empty) (*testgrpc.Empty, error) {
return &testgrpc.Empty{}, nil
Same here: it should be testpb.Empty{}.
Done
test/clientconn_test.go
Outdated
defer cancel()
client := testgrpc.NewTestServiceClient(clientConn)
// First RPC call should succeed immediately.
if _, err := client.EmptyCall(ctx, &testgrpc.Empty{}); err != nil {
It should be testpb.Empty{}.
Done
test/clientconn_test.go
Outdated
defer cleanup()

statsHandler := &testStatsHandler{}
resolverBuilder := manual.NewBuilderWithScheme("instant")
Keep it simple, e.g. rb := manual.NewBuilderWithScheme("")
done
test/clientconn_test.go
Outdated
resolverBuilder := manual.NewBuilderWithScheme("instant")
resolverBuilder.InitialState(resolver.State{Addresses: []resolver.Address{{Addr: stub.Address}}})
// Create a ClientConn using the manual resolver.
clientConn, err := grpc.NewClient(resolverBuilder.Scheme()+":///test.server",
Keep it simple, e.g. cc, err := grpc.NewClient(resolverBuilder.Scheme()+":///test.server",
Done
test/clientconn_test.go
Outdated
close(resolutionReady)
case <-rpcCompleted:
t.Fatal("RPC completed prematurely before resolution was updated!")
case <-time.After(5 * time.Second):
Use the test context for the timeout:
case <-ctx.Done():
t.Fatalf("Test setup timed out: %v", ctx.Err())
}
Done
test/clientconn_test.go
Outdated
t.Fatalf("RPC failed after resolution: %v", err)
}
t.Log("RPC completed successfully after resolution.")
case <-time.After(5 * time.Second):
Same here: use the test context for the timeout.
Done
test/clientconn_test.go
Outdated
grpc.WithStatsHandler(statsHandler),
)
if err != nil {
t.Fatalf("grpc.NewClient error: %v", err)
This can be: t.Fatalf("NewClient() failed: %v", err)
test/clientconn_test.go
Outdated
defer cleanup()

statsHandler := &testStatsHandler{}
clientConn, resolverBuilder := createTestClient(t, "delayed", statsHandler)
This can be: cc, rb := createTestClient(t, "delayed", statsHandler)
Done
test/clientconn_test.go
Outdated
}()
// Simulate delayed resolution and unblock it via resolutionReady
go func() {
<-resolutionReady
Before <-resolutionReady, you can add t.Logf("RPC waiting for resolved addresses").
Done
@@ -52,6 +52,9 @@ func populateSpan(rs stats.RPCStats, ai *attemptInfo) {
)
// increment previous rpc attempts applicable for next attempt
atomic.AddUint32(&ai.previousRPCAttempts, 1)
if ai.nameResolutionDelayed {
This is wrong. As per https://github.com/grpc/proposal/blob/master/A72-open-telemetry-tracing.md#tracing-information, "Delayed name resolution" should be an event on the call span, not the attempt span.
This should be the right place: https://github.com/grpc/grpc-go/blob/master/stats/opentelemetry/client_tracing.go#L34. Before creating the attempt span, retrieve the current call span using trace.SpanFromContext and add an event to that span. Before adding it, also check whether the event already exists, and only add it if it does not.
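A minimal sketch of the suggested fix. The OpenTelemetry Go `trace.Span` interface offers `AddEvent` but no way to read back recorded events, so "check whether the event already exists" needs per-call state; this sketch uses a `sync.Once` for that. `recordingSpan` is a hypothetical stand-in for the span returned by `trace.SpanFromContext`, `callInfo` and `startAttempt` are hypothetical, and the event name string is an assumption to be checked against A72.

```go
package main

import (
	"fmt"
	"sync"
)

// recordingSpan is a hypothetical stand-in for the call span that
// trace.SpanFromContext would return; it only records event names.
type recordingSpan struct {
	mu     sync.Mutex
	events []string
}

// AddEvent appends an event name, mirroring the name argument of
// trace.Span.AddEvent (event options omitted).
func (s *recordingSpan) AddEvent(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, name)
}

// callInfo holds per-call (not per-attempt) tracing state.
type callInfo struct {
	callSpan        *recordingSpan
	delayEventAdded sync.Once
}

// delayedNameResolutionEvent is an assumed event name; confirm it in A72.
const delayedNameResolutionEvent = "Delayed name resolution complete"

// startAttempt runs before each attempt span is created. If the RPC was
// delayed by name resolution, the event is added to the call span exactly
// once, even when retries create several attempts.
func startAttempt(ci *callInfo, nameResolutionDelayed bool) {
	if nameResolutionDelayed {
		ci.delayEventAdded.Do(func() {
			ci.callSpan.AddEvent(delayedNameResolutionEvent)
		})
	}
	// ... create the attempt span as a child of the call span here ...
}

func main() {
	ci := &callInfo{callSpan: &recordingSpan{}}
	for attempt := 0; attempt < 3; attempt++ {
		startAttempt(ci, true)
	}
	fmt.Println(ci.callSpan.events) // [Delayed name resolution complete]
}
```

Keeping the guard on per-call state (rather than on `attemptInfo`) matches the review point that the event belongs to the call span.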
if _, err := client.EmptyCall(ctx, &testpb.Empty{}); err != nil {
t.Fatalf("First RPC failed unexpectedly: %v", err)
}
// Verify that name resolution did not happen.
This doesn't mean "name resolution did not happen". It should be either "name resolution did not happen again" or "Verify that the RPC was not blocked waiting for the resolver to return addresses, indicating no name resolution delay". I prefer the latter.
if err != nil {
t.Fatalf("RPC failed after resolution: %v", err)
}
if !statsHandler.nameResolutionDelayed {
Add a comment here saying that this verifies the RPC was blocked waiting for the resolver to return addresses, indicating a name resolution delay.
stats/opentelemetry: add trace event for name resolution delay.
RELEASE NOTES: None