
streamlining overhead in sse stream #988

Open

zeiler wants to merge 1 commit into master from stream-overhead-reduction

Conversation


@zeiler (Member) commented Mar 14, 2026

This pull request optimizes the streaming and serialization logic for model outputs, reducing per-chunk overhead in common cases. The main improvements are fast-paths for single-output streaming, more efficient status handling, and bypassing unnecessary parsing in OpenAI-compatible streaming.

Performance optimizations for streaming and serialization:

  • Added a fast-path for streaming single-output responses in generate_wrapper, precomputing the serializer and signature info to avoid repeated overhead in the serialization process.
  • Introduced pre-allocation of status protos and optimized status handling in runner_item_generate, including a fast-path for a single output with SUCCESS status and improved handling of batch/multi-output cases.
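The single-output fast-path described above can be sketched as follows. This is an illustrative standalone example, not the PR's actual code: `serializer_from_signature` is named in the PR, but its signature here, the `FieldSig` type, and `stream_outputs` are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class FieldSig:
    """Stand-in for one field of a model's output signature (hypothetical)."""
    name: str
    type_: type


def serializer_from_signature(sig: FieldSig) -> Callable[[Any], Dict[str, Any]]:
    """Build a serializer once from signature info (assumed interface)."""
    # Precompute everything that depends only on the signature, not the value.
    name = sig.name
    return lambda value: {name: value}


def stream_outputs(
    outputs: Iterator[Any], signature: List[FieldSig]
) -> Iterator[Dict[str, Any]]:
    if len(signature) == 1:
        # Fast path: resolve the serializer once, before the loop, instead of
        # re-deriving signature info for every streamed chunk.
        serialize = serializer_from_signature(signature[0])
        for value in outputs:
            yield serialize(value)
    else:
        # Multi-output path: serialize field by field per chunk.
        for value in outputs:
            yield {sig.name: value[i] for i, sig in enumerate(signature)}
```

The point of the fast path is simply hoisting per-chunk signature lookups out of the streaming loop, which matters when a response is delivered as many small chunks.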

OpenAI-compatible streaming improvements:

  • Implemented _raw_sse_stream in openai_class.py to stream raw SSE JSON strings, bypassing Pydantic parsing and reducing per-chunk overhead.
  • Modified openai_stream_transport to use the new _raw_sse_stream method for chat completions, further improving streaming performance by skipping unnecessary parsing.
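The idea behind `_raw_sse_stream` — yield raw SSE JSON payloads without building Pydantic model objects per chunk — can be sketched like this. The function name and input shape (an iterator of SSE lines) are illustrative assumptions, not the PR's implementation:

```python
import json
from typing import Iterator


def raw_sse_stream(lines: Iterator[str]) -> Iterator[str]:
    """Yield raw JSON payloads from an SSE line stream.

    Skips Pydantic (or any model) construction per chunk; the caller gets the
    JSON string as-is and can forward it downstream unchanged.
    """
    for line in lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        # Cheap validation that the payload is JSON, without building objects.
        json.loads(payload)
        yield payload
```

Forwarding the raw string avoids a parse-then-reserialize round trip for every chunk, which is where the per-chunk overhead the PR targets comes from.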

General codebase updates:

  • Imported serializer_from_signature in model_class.py to support the new fast serialization logic.
  • Added local import of json as _json in openai_class.py for efficient JSON handling in streaming.

@zeiler zeiler requested a review from luv-bansal March 14, 2026 13:54
"""Create chat completions with retry logic."""
return self.client.chat.completions.create(**kwargs)

def _raw_sse_stream(self, completion_args: Dict[str, Any]) -> Iterator[str]:
zeiler (Member Author) commented:

@luv-bansal do you think this needs a retry decorator too?
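For context on the question, a generic retry decorator of the kind implied by the "retry logic" docstring might look like the sketch below. This is illustrative only — the names, backoff policy, and exception list are assumptions, not the repository's decorator:

```python
import time
from functools import wraps
from typing import Callable, Tuple, Type


def with_retries(
    attempts: int = 3,
    backoff: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Callable:
    """Retry a callable on transient errors with exponential backoff (sketch)."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = backoff
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)
                    delay *= 2  # exponential backoff between attempts
        return wrapper
    return decorator
```

One wrinkle worth noting for the review question: retrying a generator like `_raw_sse_stream` is trickier than retrying a plain call, since chunks already yielded cannot be un-sent; a retry wrapper would only safely cover the initial request, not mid-stream failures.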

@github-actions

Code Coverage

| Package | Line Rate |
| --- | --- |
| clarifai | 45% |
| clarifai.cli | 61% |
| clarifai.cli.templates | 67% |
| clarifai.cli.templates.toolkits | 100% |
| clarifai.client | 65% |
| clarifai.client.auth | 67% |
| clarifai.constants | 100% |
| clarifai.datasets | 100% |
| clarifai.datasets.export | 69% |
| clarifai.datasets.upload | 75% |
| clarifai.datasets.upload.loaders | 37% |
| clarifai.models | 100% |
| clarifai.rag | 0% |
| clarifai.runners | 52% |
| clarifai.runners.models | 58% |
| clarifai.runners.pipeline_steps | 39% |
| clarifai.runners.pipelines | 72% |
| clarifai.runners.utils | 62% |
| clarifai.runners.utils.data_types | 72% |
| clarifai.schema | 100% |
| clarifai.urls | 58% |
| clarifai.utils | 65% |
| clarifai.utils.evaluation | 16% |
| clarifai.workflows | 95% |
| Summary | 60% (11372 / 19003) |

Minimum allowed line rate is 50%

