feat: add L1 instruction cache to core fetcher by mohamedx2 · Pull Request #58 · adam-maj/tiny-gpu

mohamedx2 · 2026-05-15T16:53:37Z

This pull request introduces the first major architectural upgrade from the "Next Steps", which is adding an internal Instruction Cache!

Previously, the fetcher.sv inside each core blindly issued memory fetch requests to the external Program Memory Controller for every single instruction, resulting in significant delays, especially for loops (like the matmul.asm kernel).

With this update:

Implemented a tiny, direct-mapped L1 instruction cache in the fetcher.sv logic.
The fetcher transitions straight to FETCHED in 1 cycle on a Cache Hit, bypassing the external Read operation.
Updated the roadmap in README.md to show the first checklist ticket as complete.

Copilot

Pull request overview

This PR adds an instruction-side caching mechanism to the core fetcher to reduce repeated program-memory read requests (especially beneficial for tight loops), and marks the corresponding roadmap item as complete.

Changes:

Added an instruction cache path in src/fetcher.sv to return cached instructions without issuing external program-memory reads on hits.
Added a skills-lock.json plus embedded-systems “skill” documentation under .agents/skills/....
Updated README.md “Next Steps” checklist item for instruction caching to completed.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/fetcher.sv	Adds instruction cache storage/valid tracking and hit/miss logic to bypass program memory on hits.
README.md	Marks “Add a simple cache for instructions” as completed in the roadmap checklist.
skills-lock.json	Adds a lockfile describing an embedded-systems skill dependency (but path currently doesn’t match repo layout).
.agents/skills/embedded-systems/SKILL.md	Adds embedded-systems skill definition doc.
.agents/skills/embedded-systems/references/rtos-patterns.md	Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/power-optimization.md	Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/microcontroller-programming.md	Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/memory-optimization.md	Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/communication-protocols.md	Adds embedded-systems reference material.

Comments suppressed due to low confidence (3)

src/fetcher.sv:46

Resetting every icache_data entry in a for-loop will synthesize to a large amount of reset logic (and may prevent inferring a memory/RAM). Since icache_valid is already cleared on reset, the data array doesn’t need to be reset; dropping the loop (or using an init file for simulation only) will reduce area and improve synth/FPGA mapping.

            icache_valid <= 0;
            for (i = 0; i < (1<<PROGRAM_MEM_ADDR_BITS); i = i + 1) begin
                icache_data[i] <= 0;
            end

src/fetcher.sv:56

icache_valid/icache_data are indexed with current_pc (fixed 8-bit), but their depth is based on PROGRAM_MEM_ADDR_BITS. If PROGRAM_MEM_ADDR_BITS != 8, this can become an out-of-range access or unintended truncation. To keep the parameter meaningful, make current_pc width PROGRAM_MEM_ADDR_BITS (and propagate through core/scheduler) or explicitly slice/cast the index to PROGRAM_MEM_ADDR_BITS.

                        if (icache_valid[current_pc]) begin
                            // Cache Hit
                            fetcher_state <= FETCHED;
                            instruction <= icache_data[current_pc];
                            mem_read_valid <= 0;

src/fetcher.sv:62

This PR’s main behavioral change is the cache-hit fast path (FETCHED without asserting mem_read_valid). There are existing cocotb integration tests, but they only assert final memory results and won’t catch regressions where the cache stops being used. Adding a test that asserts program memory reads drop on repeated PCs (e.g., a tight loop) or that cycle count decreases would make the cache behavior verifiable.

                    if (core_state == 3'b001) begin
                        if (icache_valid[current_pc]) begin
                            // Cache Hit
                            fetcher_state <= FETCHED;
                            instruction <= icache_data[current_pc];
                            mem_read_valid <= 0;
                        end else begin
                            // Cache Miss
                            fetcher_state <= FETCHING;
                            mem_read_valid <= 1;
                            mem_read_address <= current_pc;
                        end

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    // Simple Instruction Cache
+    reg [PROGRAM_MEM_DATA_BITS-1:0] icache_data [0:(1<<PROGRAM_MEM_ADDR_BITS)-1];
+    reg [(1<<PROGRAM_MEM_ADDR_BITS)-1:0] icache_valid;
+    integer i;


+    "embedded-systems": {
+      "source": "jeffallan/claude-skills",
+      "sourceType": "github",
+      "skillPath": "skills/embedded-systems/SKILL.md",


+{
+  "version": 1,
+  "skills": {
+    "embedded-systems": {
+      "source": "jeffallan/claude-skills",
+      "sourceType": "github",
+      "skillPath": "skills/embedded-systems/SKILL.md",
+      "computedHash": "046558a8e6cd2c7e3e1da41360d64687fc72514329e04aad15e5bb7c4d19681b"
+    }
+  }
+}


mohamedx2 added 2 commits May 15, 2026 17:43

chore: add embedded-systems agent skill

df4d820

feat: add L1 instruction cache to core fetcher

2f8f444

Copilot AI review requested due to automatic review settings May 15, 2026 16:53

Copilot AI reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add L1 instruction cache to core fetcher#58

feat: add L1 instruction cache to core fetcher#58
mohamedx2 wants to merge 2 commits into
adam-maj:masterfrom
mohamedx2:feature/instruction-cache

mohamedx2 commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mohamedx2 commented May 15, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants