[backfill:2025-10-15..2025-10-15] Code Generation — 4 paper(s)

## 📋 backfill:2025-10-15..2025-10-15 · Code Generation — 4 paper(s)

> Auto-processed by the arXiv crawler pipeline. Review each paper and reply with the commands below.

---

### 1. David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation
**Authors:** Philipp Bauerfeind, Amir Salarpour, David Fernandez, Pedram MohajerAnsari, Johannes Reschke, et al.  
**Venue:** arXiv 2025/10 | `benchmark` `empirical`  
> The paper introduces NL2Scenic, a benchmark and dataset for generating automotive scenario code in the Scenic DSL from natural language descriptions. It evaluates thirteen large language models using both text-based and execution-based metrics, finding that GPT-4o performs best while smaller open-source models show competitive performance.
[📄 Paper](https://arxiv.org/abs/2510.14115)

---

### 2. Training LLM Agents to Empower Humans
**Authors:** Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, et al.  
**Venue:** arXiv 2025/10 | `benchmark`  
> Proposes Empower, a self-supervised fine-tuning method that maximizes human empowerment to create more effective assistive agents. The authors evaluate this approach through a user study and a new multi-turn code assistance benchmark, demonstrating improved human-agent collaboration.
[📄 Paper](https://arxiv.org/abs/2510.13709)

---

### 3. Automated Network Protocol Testing with LLM Agents
**Authors:** Yunze Wei, Kaiwen Wei, Shibo Du, Jianyu Wang, Zhangzhong Liu, et al.  
**Venue:** arXiv 2025/10  
> NeTestLLM is a multi-agent system for end-to-end automated network protocol testing that translates protocol specifications into executable test cases. It uses hierarchical protocol understanding and iterative generation with runtime feedback to refine test artifacts and improve coverage across protocols such as OSPF and BGP.
[📄 Paper](https://arxiv.org/abs/2510.13248)

---

### 4. A Matter of Representation: Towards Graph-Based Abstract Code Generation
**Authors:** Nyx Iskandar, Hisham Bedri, Andy Tsen  
**Venue:** arXiv 2025/10 | `benchmark`  
> This paper explores graph-based abstract code generation by proposing JSON representations to enable LLMs to generate logic for visual programming environments. It introduces ScratchTest, a benchmark based on a Python re-implementation of Scratch, to evaluate how different graph representations affect generation accuracy.
[📄 Paper](https://arxiv.org/abs/2510.13163)

---

**Review commands** (comment on this issue):
- `/approve all` — accept all papers
- `/approve 1,3` — accept papers 1 and 3
- `/reject 2` — discard paper 2
- `/approve 1,3 /reject 2` — accept papers 1 and 3 and discard paper 2 in a single comment
- `/edit 1 category=code_generation` — change category before approving
- `/edit 1 venue=ICSE 2026` — fix venue
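
For reference, the command syntax listed above could be parsed roughly as follows. This is only a sketch inferred from the examples in this issue, not the pipeline's actual parser: the names `Review`, `parse_review`, and `COMMAND_RE` are hypothetical, and it assumes that edit values never contain a `/`.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar inferred from the examples in this issue.
# Assumption: a command's argument runs until the next "/" or end of comment.
COMMAND_RE = re.compile(r"/(approve|reject|edit)\b\s*([^/]*)")


@dataclass
class Review:
    approved: set = field(default_factory=set)
    rejected: set = field(default_factory=set)
    edits: dict = field(default_factory=dict)  # paper number -> {field: value}


def _parse_ids(arg: str, paper_count: int) -> set:
    """Turn 'all' or a comma-separated list such as '1,3' into paper numbers."""
    if arg == "all":
        return set(range(1, paper_count + 1))
    ids = {int(part) for part in arg.split(",") if part.strip()}
    out_of_range = {n for n in ids if not 1 <= n <= paper_count}
    if out_of_range:
        raise ValueError(f"unknown paper number(s): {sorted(out_of_range)}")
    return ids


def parse_review(comment: str, paper_count: int) -> Review:
    """Collect every command found in one review comment."""
    review = Review()
    for match in COMMAND_RE.finditer(comment):
        verb, arg = match.group(1), match.group(2).strip()
        if verb == "edit":
            # Expected shape: "<paper number> <field>=<value>"
            number, _, assignment = arg.partition(" ")
            key, sep, value = assignment.partition("=")
            if not sep:
                raise ValueError(f"malformed edit command: {arg!r}")
            (paper,) = _parse_ids(number, paper_count)
            review.edits.setdefault(paper, {})[key.strip()] = value.strip()
        else:
            target = review.approved if verb == "approve" else review.rejected
            target.update(_parse_ids(arg, paper_count))
    return review
```

With this sketch, `parse_review("/approve 1,3 /reject 2", paper_count=4)` yields `approved == {1, 3}` and `rejected == {2}`. What should happen when the same paper is both approved and rejected is left undecided here, since the issue does not specify it.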


[backfill:2025-10-15..2025-10-15] Code Generation — 4 paper(s) #115
