Reliability Hardening: Fixed tool_use conversion, streaming bridge, and upstream retries #91

@swe-squad-alpha

Overview

This issue documents the root-cause analysis and resolution of four critical reliability and performance issues discovered during development.

Issues Resolved

1. Tool Use Conversion Gate (ISSUE-004)

  • Problem: Tool calls from many OpenAI-compatible models (qwen, deepseek, minimax) either arrived as plain text in the response content or were suppressed by a model-prefix gate.
  • Fix: Removed the model-prefix gate in convert_litellm_to_anthropic and added a parser fallback for manual tool calls in content_text.
  • Result: Models like deepseek-v3.1 now correctly produce structured tool_use blocks.
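
A minimal sketch of what a parser fallback for text-embedded tool calls could look like. The `<tool_call>{...}</tool_call>` wrapper, the `arguments` key, and the helper name `extract_tool_use_blocks` are assumptions for illustration; the actual text formats handled in `convert_litellm_to_anthropic` are not shown in this issue.

```python
import json
import re
import uuid

# Assumed text format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_use_blocks(content_text):
    """Return (remaining_text, tool_use_blocks) parsed from plain-text content."""
    blocks = []

    def _replace(match):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            return match.group(0)  # leave malformed payloads in the text
        if not isinstance(call, dict) or "name" not in call:
            return match.group(0)
        blocks.append({
            "type": "tool_use",
            "id": f"toolu_{uuid.uuid4().hex[:24]}",  # hypothetical id scheme
            "name": call["name"],
            "input": call.get("arguments") or {},
        })
        return ""  # strip the parsed call from the visible text

    remaining = TOOL_CALL_RE.sub(_replace, content_text).strip()
    return remaining, blocks
```

Because the fallback runs on the content itself, it applies to every model rather than to a list of known prefixes, which is the behaviour the gate removal is meant to achieve.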

2. Streaming Response RuntimeError

  • Problem: Agent would 'stall' or connections would drop during streaming.
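  • Root Cause: HTTPException was being raised after the first SSE blocks had been yielded. By then the status line and headers were already sent, so the exception could no longer be turned into an HTTP error response and surfaced as a RuntimeError instead.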
  • Fix: Implemented a response_started flag. Any errors occurring after headers are sent are now yielded as standard Anthropic SSE error events.
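
The `response_started` pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `stream_with_error_events` and `sse_event` are hypothetical names, and the `api_error` type is one plausible choice for the error payload.

```python
import asyncio
import json


def sse_event(event, data):
    """Format one server-sent event frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_with_error_events(upstream):
    """Relay SSE chunks; after the first chunk, report failures in-band."""
    response_started = False
    try:
        async for chunk in upstream:
            response_started = True  # headers go out with the first chunk
            yield chunk
    except Exception as exc:
        if not response_started:
            # Nothing sent yet: the caller can still return a normal HTTP error.
            raise
        # Too late for a status code: emit an Anthropic-style SSE error event.
        yield sse_event("error", {
            "type": "error",
            "error": {"type": "api_error", "message": str(exc)},
        })
```

Errors raised before the first chunk still propagate, so the framework can answer with a regular error status; only mid-stream failures are converted into events.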

3. Upstream Stalls (502/504 Errors)

  • Problem: Transient 502/504 errors from upstream providers were not being handled gracefully.
  • Fix: Implemented retry_with_backoff for non-streaming requests and for the initial phase of streaming requests (before any data has been sent to the client).
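
A rough sketch of a retry helper in this spirit. Only the name `retry_with_backoff` comes from this issue; the signature, the default delays, the jitter, and the `UpstreamHTTPError` exception are assumptions made for the example.

```python
import asyncio
import random

RETRYABLE_STATUS = {502, 504}


class UpstreamHTTPError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code


async def retry_with_backoff(call, max_attempts=3, base_delay=0.5, max_delay=8.0):
    """Await call(); retry transient 502/504 failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except UpstreamHTTPError as exc:
            if exc.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise  # non-transient error, or retries exhausted
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from concurrent requests.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

For streaming, a helper like this would wrap only the step that opens the upstream response. Once chunks have reached the client, a retry would replay output the client has already seen.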

4. Streaming Lifecycle Fix

  • Problem: The AsyncClient was created outside the stream generator, so it was closed before the generator had finished reading the upstream response.
  • Fix: Moved AsyncClient lifecycle inside the stream generator.
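
The lifecycle change can be illustrated with the sketch below. It assumes the client is `httpx.AsyncClient` (the issue only says `AsyncClient`); `stream_upstream` and the `client_factory` parameter are hypothetical names, and the factory is passed in so the snippet itself does not depend on any particular setup.

```python
async def stream_upstream(client_factory, url, payload):
    """Yield upstream lines while owning the client for the whole stream."""
    # The client is opened inside the generator, so it stays open until the
    # generator is exhausted or closed, not until the request handler returns.
    async with client_factory() as client:
        async with client.stream("POST", url, json=payload) as response:
            async for line in response.aiter_lines():
                yield line
```

With httpx, this could be called as `stream_upstream(httpx.AsyncClient, url, payload)`. The point of the design is that the `async with` blocks exit only when the generator itself finishes, which ties the connection to the lifetime of the streamed response.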

Status

All fixes have been implemented and verified with complex tool-use tests. We have prepared a PR with these changes.
