reforcemind · elprofesoriqo · Mar 31, 2026 · Mar 29, 2026 · Mar 29, 2026 · Mar 30, 2026
diff --git a/README.md b/README.md
@@ -6,14 +6,18 @@ Multi-Agent Reinforcement Learning (MARL) cybersecurity simulator mathematically
 **Project:** NetForge RL
 **GNN-based Policy Model:** https://github.com/elprofesoriqo/GNN-based-Policy-Model-for-MARL-Cyber
 
-## Architectural Overhaul Notice
+## Architectural Changes & State-of-the-Art Modeling
 
-This repository represents a complete structural redesign of the original CybORG framework. I took ownership of this branch because the legacy CybORG environment was fundamentally restricted to single-agent, turn-based paradigms (utilizing nested OpenAI Gym wrappers) which artificially broke parallel gradients and hindered true Multi-Agent research.
+This repository is a substantial redesign of the legacy CybORG / CAGE challenge environment. Building on the foundational work by DSTG, NetForge RL moves from a synchronous, fully observable game to a high-fidelity, physically constrained network simulation designed for Sim-to-Real transfer.
 
 ### What is Different?
-1. **Parallel Execution via PettingZoo:** The core simulator is now strictly built upon the `pettingzoo.ParallelEnv` standard instead of monolithic Gym wrappers. Red and Blue teams act in a simultaneous time vacuum, and the engine natively resolves their conflicting action intents.
-2. **Abstract Action Engine:** Actions no longer mutate simulator state directly via complex monolithic switch statements. `BaseAction` computes an `ActionEffect` (JSON representation of physical network impact), which the core environment evaluates and securely commits.
-3. **No Legacy Bloat:** I have deleted all obsolete OpenAI Gym references, redundant CAGE challenge sub-modules, and unneeded demo code. 
+1. **Interruptible Tick-Based Engine:** CybORG's instantaneous actions are gone. NetForge RL runs on an asynchronous `current_tick` clock, and every action has a `duration`. Actions can be interrupted in real time: if the SOC isolates a host mid-exfiltration, the attacker's action is aborted.
+2. **Strict POMDP Isolation & Fog of War:** Defenders do not see the ground truth. They receive telemetry alerts from a new `siem_log_buffer` that is subject to `log_latency`, and background noise agents obscure the genuinely malicious alerts.
+3. **MultiDiscrete Tensors & Procedural Networks:** To avoid overfitting to a static topology and a combinatorial explosion of flat actions, action spaces use `MultiDiscrete` arrays (e.g. `[ActionType, TargetIP]`). Topologies are procedurally generated with up to 50 active nodes, and padding with dynamic masks covers the unused slots.
+4. **Attack Economics & Cost Mechanics:** Each agent is limited by operational budgets (`agent_funds`, `agent_compute`). Reckless defensive isolation incurs large business-downtime penalties modeled on real-world SLA fines.
+5. **Cyber-Physical (OT) Convergence:** The generator creates distinct `OT_Subnets` containing `PLC` nodes that model thermodynamic vulnerabilities. Red operators can inflict catastrophic kinetic impacts (`+10000/-10000` rewards) that override logical state tracking entirely.
+6. **Social Engineering (Stochastics):** Red teams can bypass DMZ architectures with `SpearPhishing`, whose success is scaled by a randomly rolled `human_vulnerability_score`. Blue counters this by spending budget on `SecurityAwarenessTraining`.
+7. **Ray RLlib & PyTorch LSTMs:** Ships with custom PyTorch models that combine recurrent memory (LSTMs) with boolean action masking, which removes invalid actions from the policy output.
 
 ### Simulator Architecture Flow
 
@@ -37,10 +41,10 @@ graph TD
 The environment is designed to be highly plug-and-play. 
 
 ```python
-from marl_cyborg.environment.parallel_env import ParallelMarlCyborg
+from netforge_rl.environment.parallel_env import NetForgeRLEnv
 
 # Instantiate the native PettingZoo environment
-env = ParallelMarlCyborg(scenario_config={})
+env = NetForgeRLEnv(scenario_config={})
 
 # Reset to get parallel Gymnasium boxes
 observations, infos = env.reset()
@@ -53,50 +57,38 @@ print("Blue Box:", observations["Blue"])
 
 The primary reason for this fork is extensibility. Want to add an *ARP Poisoning* attack? 
 
-Simply inherit the `BaseAction` inside `marl_cyborg/actions/network/arp_poison.py`, write how it modifies the theoretical `ActionEffect`, and the engine natively calculates the physics resolution. See `marl_cyborg.actions.network.ip_fragmentation.IPFragmentationAction` for a physical example of this structural implementation.
+Subclass `BaseAction` in a new module such as `netforge_rl/actions/network/arp_poison.py`, define how it populates the `ActionEffect`, and the engine resolves the outcome. See `netforge_rl.actions.network.ip_fragmentation.IPFragmentationAction` for a concrete implementation.
 
 ## License & Accreditation
 This project is built upon the foundational work provided by the original CybORG contributors (CyberSecurityCRC / DSTG). The core internal simulator physics remain preserved, while the outward translation layers, action hierarchy, and Multi-Agent APIs have been entirely redesigned by Igor Jankowski.
 
 ## Repository Structure
 
-- `marl_cyborg/`: Core simulation environment
+- `netforge_rl/`: Core simulation environment
   - `actions/`: Contains definitions for all `BaseAction` implementations.
-    - `red_actions.py`: Red team offensive actions.
-    - `blue_actions.py`: Blue team defensive actions.
-  - `core/`: State, Observation, and Action abstract base classes.
+  - `agents/`: Contains specialized agents such as `GreenAgent` (background noise simulation).
+  - `core/`: State, Observation, and Action abstract base classes enforcing physical constraints.
   - `environment/`:
-    - `parallel_env.py`: The primary PettingZoo MARL environment.
+    - `parallel_env.py`: The primary asynchronous PettingZoo MARL environment.
     - `pcap_synthesizer.py`: Generates synthetic offline `.pcap` network traffic mappings.
 - `train_curriculum.py`: Example RL training script.
 - `test_physics.py`: Physics unit tests.
 
 ## Available Actions
 
-All actions are natively available to the RL models through the environment's discrete action space (`Discrete(256)`). The engine dynamically scales and maps these 11 actions per team against all available network IPs.
+All actions are available to the RL models through the environment's `MultiDiscrete` action space, whose components the PyTorch policy maps to logits.
 
 ### Red Team (Offensive)
-1. **NetworkScan**: Scans a target subnet for active IP addresses.
-2. **DiscoverRemoteSystems**: Performs a Ping Sweep to pinpoint active hosts.
-3. **DiscoverNetworkServices**: Port scans a host to enumerate running services.
-4. **ExploitRemoteService**: Exploits a vulnerability on a target IP to gain User privileges.
-5. **PrivilegeEscalate**: Escalates from User to Root access.
-6. **Impact**: Destroys/encrypts data on a compromised host (Ransomware/Wiper).
-7. **ExploitBlueKeep**: Exploits RDP (CVE-2019-0708) on Port 3389.
-8. **ExploitEternalBlue**: Exploits SMB (MS17-010) on Port 445.
-9. **ExploitHTTP_RFI**: Remote File Inclusion exploit targeting Port 80.
-10. **JuicyPotato**: Local privilege escalation via DCOM (Windows).
-11. **V4L2KernelExploit**: Local privilege escalation via Video4Linux kernel vulns (Linux).
+1. **NetworkScan / DiscoverRemoteSystems / DiscoverNetworkServices**: Passive and active reconnaissance via subnet scans, ping sweeps, and port scans.
+2. **SpearPhishing**: Bypasses perimeter defenses by exploiting human error inside user networks.
+3. **ExploitRemoteService / ExploitEternalBlue...**: Gains user privileges by exploiting CVEs that match the target's OS version and open ports.
+4. **PrivilegeEscalate**: Escalates from a restricted user account to `Root`/`System`.
+5. **Impact**: Executes ransomware, modeled with standard IT failure metrics.
+6. **OverloadPLC (Kinetic)**: Abuses the thermodynamic model of a compromised OT network to force a kinetic destruction sequence in the episode.
 
 ### Blue Team (Defensive)
-1. **IsolateHost**: Disconnects a host completely from the network.
-2. **RestoreHost**: Brings an isolated host back online from a clean snapshot.
-3. **Monitor**: Actively monitors traffic on a specific subnet or host for anomalies.
-4. **Analyze**: Deep scans a specific host for malware signatures or unauthorized user activity.
-5. **DeployDecoy**: Deploys a generic fake service (Apache/Tomcat/Femitter) to bait attackers.
-6. **Remove**: Removes unauthorized user privileges.
-7. **RestoreFromBackup**: Purges an infected host and restores it to a clean baseline from a backup.
-8. **DecoyApache**: Deploys a fake Apache web server (Port 80) honeypot.
-9. **DecoySSHD**: Deploys a fake SSH daemon (Port 22) honeypot.
-10. **DecoyTomcat**: Deploys a fake Tomcat server (Port 8080) honeypot.
-11. **Misinform**: Injects false host telemetry or alters logging to feed Red agents fake data.
+1. **IsolateHost / RestoreHost**: Logically quarantines suspected nodes and brings them back online (incurs tracked SLA business-downtime penalties).
+2. **Monitor / Analyze**: Asynchronous deep network and host scans that bypass the standard physical delays.
+3. **SecurityAwarenessTraining**: Spends financial budget to reduce `human_vulnerability_scores`, hardening users against phishing payloads.
+4. **DeployHoneytoken (Active Deception)**: Secretly seeds RAM-based tokens that trigger an unavoidable, zero-delay, severity-10 SIEM alert when read by Red's automated lateral-mapping capabilities.
+5. **DecoyApache / DecoySSHD / DeployDecoy**: Deploys visible decoy services (ports 80 and 22) that tie up attacker compute in dead-end execution loops.
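
The README describes a `MultiDiscrete` action of the form `[ActionType, TargetIP]` with boolean action masking over a padded topology of up to 50 nodes. A minimal sketch of that idea follows, in plain Python so it runs without the repository. The function names, the four-entry action list, and the uniform logits are illustrative assumptions, not the project's actual model code.

```python
import math
import random

MAX_NODES = 50  # README: topologies are padded to at most 50 nodes
# Illustrative subset of action types; the real list lives in the repository.
ACTION_TYPES = ["NetworkScan", "SpearPhishing", "PrivilegeEscalate", "Impact"]


def masked_softmax(logits, mask):
    """Softmax where entries whose mask is False receive probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("mask excludes every choice")
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(type_logits, target_logits, type_mask, target_mask, rng):
    """Sample one [ActionType, TargetIP] pair, one categorical per component."""
    type_probs = masked_softmax(type_logits, type_mask)
    target_probs = masked_softmax(target_logits, target_mask)
    action_type = rng.choices(range(len(type_probs)), weights=type_probs)[0]
    target = rng.choices(range(len(target_probs)), weights=target_probs)[0]
    return [action_type, target]


# Hypothetical episode: 12 real hosts, the remaining 38 slots are padding.
active_nodes = 12
target_mask = [i < active_nodes for i in range(MAX_NODES)]
type_mask = [True, True, False, True]  # assume PrivilegeEscalate is unavailable
rng = random.Random(0)
action = sample_action(
    [0.0] * len(ACTION_TYPES), [0.0] * MAX_NODES, type_mask, target_mask, rng
)
print(action)
```

Masking the logits before the softmax, rather than rejecting invalid samples afterwards, keeps the probability mass on valid choices only, so padded node slots are never selected.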
diff --git a/changelog.md b/changelog.md
@@ -1,11 +1,11 @@
 # Changelog
 
-All notable changes to the `marl_cyborg` project will be documented in this file.
+All notable changes to the `netforge_rl` project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ## [3.0.0] - 2026-02-28
 ### Added
-- **PettingZoo API Core Integration**: Created `marl_cyborg/environment/parallel_env.py` substituting the legacy wrapper paradigm with `pettingzoo.ParallelEnv`, explicitly allowing concurrent multi-agent action steps.
+- **PettingZoo API Core Integration**: Created `netforge_rl/environment/parallel_env.py` substituting the legacy wrapper paradigm with `pettingzoo.ParallelEnv`, explicitly allowing concurrent multi-agent action steps.
 - **Gymnasium Box Compatibility**: All spaces natively map to `gymnasium.spaces` APIs instead of arbitrary nested classes.
 - **`BaseAction` / `BaseObservation` Abstract Hierarchy**: Abstracted action mutation. Cyber attacks no longer edit the state directly, but rather return a theoretical JSON impact via `ActionEffect` allowing the environment to resolve simultaneity conflicts natively.
 - **Python 3.12 Support (Native)**: Enforced via the new `pyproject.toml` definition.

diff --git a/marl_cyborg/core/action.py b/marl_cyborg/core/action.py
diff --git a/marl_cyborg/environment/__init__.py b/marl_cyborg/environment/__init__.py
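
The README's interruptible tick-based engine (a `current_tick` clock, actions with a `duration`, and aborts when the SOC isolates a host) can be sketched as a toy scheduler. `TickEngine`, `PendingAction`, and the `Exfiltrate` action name are hypothetical, and isolation is treated as instantaneous here for brevity; the real engine may differ on all of these points.

```python
from dataclasses import dataclass


@dataclass
class PendingAction:
    agent: str
    name: str
    target: str
    finishes_at: int  # tick at which the action completes


class TickEngine:
    """Toy clock: actions take `duration` ticks and can be aborted mid-flight."""

    def __init__(self):
        self.current_tick = 0
        self.pending = []
        self.isolated = set()
        self.log = []  # (tick, action name, outcome)

    def submit(self, agent, name, target, duration):
        self.pending.append(
            PendingAction(agent, name, target, self.current_tick + duration)
        )

    def isolate(self, host):
        """Blue isolates a host; Red actions still running against it abort."""
        self.isolated.add(host)
        still_running = []
        for action in self.pending:
            if action.agent == "Red" and action.target == host:
                self.log.append((self.current_tick, action.name, "aborted"))
            else:
                still_running.append(action)
        self.pending = still_running

    def step(self):
        """Advance the clock by one tick and complete any finished actions."""
        self.current_tick += 1
        still_running = []
        for action in self.pending:
            if action.finishes_at <= self.current_tick:
                self.log.append((self.current_tick, action.name, "completed"))
            else:
                still_running.append(action)
        self.pending = still_running


engine = TickEngine()
engine.submit("Red", "Exfiltrate", "10.0.0.5", duration=3)
engine.submit("Red", "NetworkScan", "10.0.0.9", duration=2)
engine.step()               # tick 1: both actions are still running
engine.isolate("10.0.0.5")  # the SOC isolates the host mid-exfiltration
engine.step()               # tick 2: the scan completes
print(engine.log)
```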