[backfill:2025-10-15..2025-10-15] Code Generation - 4 paper(s) #115

@Zhaoyang-Chu

📋 backfill:2025-10-15..2025-10-15 · Code Generation - 4 paper(s)

Auto-processed by the arXiv crawler pipeline. Review each paper and reply with the commands below.


1. David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation

Authors: Philipp Bauerfeind, Amir Salarpour, David Fernandez, Pedram MohajerAnsari, Johannes Reschke, et al.
Venue: arXiv 2025/10 | benchmark empirical

The paper introduces NL2Scenic, a benchmark and dataset for generating automotive scenario code in the Scenic DSL from natural language descriptions. It evaluates thirteen large language models using both text-based and execution-based metrics, finding that GPT-4o performs best while smaller open-source models show competitive performance.
📄 Paper


2. Training LLM Agents to Empower Humans

Authors: Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, et al.
Venue: arXiv 2025/10 | benchmark

Proposes Empower, a self-supervised fine-tuning method that maximizes human empowerment to create more effective assistive agents. The authors evaluate this approach through a user study and a new multi-turn code assistance benchmark, demonstrating improved human-agent collaboration.
📄 Paper


3. Automated Network Protocol Testing with LLM Agents

Authors: Yunze Wei, Kaiwen Wei, Shibo Du, Jianyu Wang, Zhangzhong Liu, et al.
Venue: arXiv 2025/10

NeTestLLM is a multi-agent system for end-to-end automated network protocol testing that translates specifications into executable test cases. It uses hierarchical protocol understanding and iterative generation with runtime feedback to refine test artifacts and improve coverage across protocols such as OSPF and BGP.
📄 Paper


4. A Matter of Representation: Towards Graph-Based Abstract Code Generation

Authors: Nyx Iskandar, Hisham Bedri, Andy Tsen
Venue: arXiv 2025/10 | benchmark

This paper explores graph-based abstract code generation by proposing JSON representations to enable LLMs to generate logic for visual programming environments. It introduces ScratchTest, a benchmark based on a Python re-implementation of Scratch, to evaluate how different graph representations affect generation accuracy.
📄 Paper


Review commands (comment on this issue):

  • /approve all - accept all papers
  • /approve 1,3 - accept papers 1 and 3
  • /reject 2 - discard paper 2
  • /approve 1,3 /reject 2 - mixed
  • /edit 1 category=code_generation - change category before approving
  • /edit 1 venue=ICSE 2026 - fix venue
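
For reviewers curious how a comment might be interpreted, here is a minimal sketch of a parser for the commands listed above. It is an assumption-based illustration, not the pipeline's actual implementation: the grammar is inferred only from the six examples, and the names `ReviewDecision`, `parse_review_comment`, and `paper_count` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar inferred from the examples in this issue:
# a command runs until the next " /approve", " /reject", " /edit", or end of line.
COMMAND = re.compile(
    r"/(approve|reject|edit)\s+(.*?)(?=\s+/(?:approve|reject|edit)\b|$)"
)


@dataclass
class ReviewDecision:
    approved: set[int] = field(default_factory=set)
    rejected: set[int] = field(default_factory=set)
    # paper number -> {field name: new value}
    edits: dict[int, dict[str, str]] = field(default_factory=dict)


def parse_review_comment(comment: str, paper_count: int) -> ReviewDecision:
    """Collect approve/reject/edit commands from a review comment."""
    decision = ReviewDecision()
    for line in comment.splitlines():
        for match in COMMAND.finditer(line.strip()):
            command, args = match.group(1), match.group(2).strip()
            if command == "edit":
                # "/edit 1 venue=ICSE 2026" -> paper 1, field "venue", value "ICSE 2026"
                number, _, assignment = args.partition(" ")
                key, _, value = assignment.partition("=")
                decision.edits.setdefault(int(number), {})[key.strip()] = value.strip()
                continue
            if args == "all":
                numbers = set(range(1, paper_count + 1))
            else:
                numbers = {int(part) for part in args.split(",") if part.strip()}
            target = decision.approved if command == "approve" else decision.rejected
            target.update(numbers)
    return decision
```

Under these assumptions, `parse_review_comment("/approve 1,3 /reject 2", paper_count=4)` yields papers 1 and 3 as approved and paper 2 as rejected. The sketch does not validate paper numbers against `paper_count` or resolve a paper that is both approved and rejected; how the real pipeline handles those cases is not stated in this issue.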
