Skip to content

feat: add L1 instruction cache to core fetcher#58

Open
mohamedx2 wants to merge 2 commits into
adam-maj:masterfrom
mohamedx2:feature/instruction-cache
Open

feat: add L1 instruction cache to core fetcher#58
mohamedx2 wants to merge 2 commits into
adam-maj:masterfrom
mohamedx2:feature/instruction-cache

Conversation

@mohamedx2
Copy link
Copy Markdown

This pull request introduces the first major architectural upgrade from the "Next Steps", which is adding an internal Instruction Cache!

Previously, the fetcher.sv inside each core blindly issued memory fetch requests to the external Program Memory Controller for every single instruction, resulting in significant delays, especially for loops (like the matmul.asm kernel).

With this update:

  1. Implemented a tiny, direct-mapped L1 instruction cache in the fetcher.sv logic.
  2. The fetcher transitions straight to FETCHED in 1 cycle on a Cache Hit, bypassing the external Read operation.
  3. Updated the roadmap in README.md to show the first checklist ticket as complete.

Copilot AI review requested due to automatic review settings May 15, 2026 16:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an instruction-side caching mechanism to the core fetcher to reduce repeated program-memory read requests (especially beneficial for tight loops), and marks the corresponding roadmap item as complete.

Changes:

  • Added an instruction cache path in src/fetcher.sv to return cached instructions without issuing external program-memory reads on hits.
  • Added a skills-lock.json plus embedded-systems “skill” documentation under .agents/skills/....
  • Updated README.md “Next Steps” checklist item for instruction caching to completed.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/fetcher.sv Adds instruction cache storage/valid tracking and hit/miss logic to bypass program memory on hits.
README.md Marks “Add a simple cache for instructions” as completed in the roadmap checklist.
skills-lock.json Adds a lockfile describing an embedded-systems skill dependency (but path currently doesn’t match repo layout).
.agents/skills/embedded-systems/SKILL.md Adds embedded-systems skill definition doc.
.agents/skills/embedded-systems/references/rtos-patterns.md Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/power-optimization.md Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/microcontroller-programming.md Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/memory-optimization.md Adds embedded-systems reference material.
.agents/skills/embedded-systems/references/communication-protocols.md Adds embedded-systems reference material.
Comments suppressed due to low confidence (3)

src/fetcher.sv:46

  • Resetting every icache_data entry in a for-loop will synthesize to a large amount of reset logic (and may prevent inferring a memory/RAM). Since icache_valid is already cleared on reset, the data array doesn’t need to be reset; dropping the loop (or using an init file for simulation only) will reduce area and improve synth/FPGA mapping.
            icache_valid <= 0;
            for (i = 0; i < (1<<PROGRAM_MEM_ADDR_BITS); i = i + 1) begin
                icache_data[i] <= 0;
            end

src/fetcher.sv:56

  • icache_valid/icache_data are indexed with current_pc (fixed 8-bit), but their depth is based on PROGRAM_MEM_ADDR_BITS. If PROGRAM_MEM_ADDR_BITS != 8, this can become an out-of-range access or unintended truncation. To keep the parameter meaningful, make current_pc width PROGRAM_MEM_ADDR_BITS (and propagate through core/scheduler) or explicitly slice/cast the index to PROGRAM_MEM_ADDR_BITS.
                        if (icache_valid[current_pc]) begin
                            // Cache Hit
                            fetcher_state <= FETCHED;
                            instruction <= icache_data[current_pc];
                            mem_read_valid <= 0;

src/fetcher.sv:62

  • This PR’s main behavioral change is the cache-hit fast path (FETCHED without asserting mem_read_valid). There are existing cocotb integration tests, but they only assert final memory results and won’t catch regressions where the cache stops being used. Adding a test that asserts program memory reads drop on repeated PCs (e.g., a tight loop) or that cycle count decreases would make the cache behavior verifiable.
                    if (core_state == 3'b001) begin
                        if (icache_valid[current_pc]) begin
                            // Cache Hit
                            fetcher_state <= FETCHED;
                            instruction <= icache_data[current_pc];
                            mem_read_valid <= 0;
                        end else begin
                            // Cache Miss
                            fetcher_state <= FETCHING;
                            mem_read_valid <= 1;
                            mem_read_address <= current_pc;
                        end

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/fetcher.sv
Comment on lines +32 to +35
// Simple Instruction Cache
reg [PROGRAM_MEM_DATA_BITS-1:0] icache_data [0:(1<<PROGRAM_MEM_ADDR_BITS)-1];
reg [(1<<PROGRAM_MEM_ADDR_BITS)-1:0] icache_valid;
integer i;
Comment thread skills-lock.json
"embedded-systems": {
"source": "jeffallan/claude-skills",
"sourceType": "github",
"skillPath": "skills/embedded-systems/SKILL.md",
Comment thread skills-lock.json
Comment on lines +1 to +11
{
"version": 1,
"skills": {
"embedded-systems": {
"source": "jeffallan/claude-skills",
"sourceType": "github",
"skillPath": "skills/embedded-systems/SKILL.md",
"computedHash": "046558a8e6cd2c7e3e1da41360d64687fc72514329e04aad15e5bb7c4d19681b"
}
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants