Feat/async ticks and networking #3
Merged
…asking, OT/SCADA PLC nodes, Business Downtime economics, and dynamic procedural network padding.
Added OverloadPLC termination rewards, DMZ SpearPhishing bypass, SecurityAwareness mitigation logic, and RAM-seeded Honeytoken Active Deception.
Completely purged legacy MARL configurations. Registered procedurally generated topologies, Dictionary POMDP observations, and ConflictResolution physics engines mapped securely for Ray execution.
1. Core Compute & Scalability (RL Fundamentals)
To train agents at enterprise or research scale, the environment must be able to generate millions of steps per second.
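The Category -> Target -> Payload decomposition used for this section's action space can be sketched with the standard library alone. The vocabularies below (three categories, four /24 subnets, five payload_* placeholders) are hypothetical. A Gymnasium environment would more likely declare the same layout as spaces.MultiDiscrete([n_categories, n_targets, n_payloads]); the manual sampling here only makes the conditional payload step explicit.

```python
import random

# Hypothetical sub-action vocabularies; names are illustrative placeholders.
CATEGORIES = ["Scan", "Exploit", "Exfiltrate"]
TARGETS = [f"10.0.{subnet}.0/24" for subnet in range(4)]
PAYLOADS = [f"payload_{i}" for i in range(5)]


def flat_action_count() -> int:
    """Size of a flat Discrete space that enumerates every combination."""
    return len(CATEGORIES) * len(TARGETS) * len(PAYLOADS)


def factored_output_count() -> int:
    """Logits needed when each component gets its own output head."""
    return len(CATEGORIES) + len(TARGETS) + len(PAYLOADS)


def sample_action(rng: random.Random) -> tuple[int, int, int]:
    """Pick Category, then Target, then Payload within one tick.

    The payload slot is conditioned on the category: only Exploit uses it,
    so other categories pin it to 0 (a simple form of action masking).
    """
    category = rng.randrange(len(CATEGORIES))
    target = rng.randrange(len(TARGETS))
    if CATEGORIES[category] == "Exploit":
        payload = rng.randrange(len(PAYLOADS))
    else:
        payload = 0
    return category, target, payload


def describe(action: tuple[int, int, int]) -> str:
    """Render an action tuple as a readable string."""
    category, target, payload = action
    text = f"{CATEGORIES[category]} -> {TARGETS[target]}"
    if CATEGORIES[category] == "Exploit":
        text += f" -> {PAYLOADS[payload]}"
    return text


if __name__ == "__main__":
    print(flat_action_count(), factored_output_count())
    rng = random.Random(0)
    for _ in range(3):
        print(describe(sample_action(rng)))
```

With these placeholder sizes a flat Discrete space needs 60 entries while three separate heads need 12 logits in total; the gap grows multiplicatively as each vocabulary grows.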
… Dict or MultiDiscrete spaces. The agent makes sequential decisions within a single tick: Category (e.g., Exploit) -> Target (IP Mask) -> Payload (CVE). With |C| categories, |T| targets and |P| payloads, the policy needs only |C| + |T| + |P| outputs instead of one per combination (|C| × |T| × |P|), which drastically reduces the combinatorial explosion of the search space.
Each env.reset() call dynamically generates a new graph-based topology (via NetworkX) that follows a logical business architecture (Internet -> DMZ -> Intranet -> Secure Zone). This supports zero-shot transfer to unseen topologies and guards against overfitting to a single static map.
2. Semantic & Operational Realism (Hyper-Realism)
Minimizing the Sim-to-Real gap to ensure agent strategies are viable in actual corporate networks.
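An ActionRegistry in which every action class must declare MITRE ATT&CK tactic IDs can be sketched as a class decorator, with a helper that renders a trained agent's action sequence as a numbered playbook. The register decorator, the three action classes and the playbook format are hypothetical illustrations, not NetForge_RL's actual code; the tactic IDs themselves (TA0007 Discovery, TA0008 Lateral Movement, TA0010 Exfiltration) are real ATT&CK identifiers.

```python
import re

# Hypothetical registry: action class name -> action class.
ACTION_REGISTRY: dict[str, type] = {}
_TACTIC_ID = re.compile(r"TA\d{4}")


def register(*tactics: str):
    """Class decorator that refuses actions without valid tactic metadata."""
    def wrap(cls: type) -> type:
        if not tactics or not all(_TACTIC_ID.fullmatch(t) for t in tactics):
            raise ValueError(f"{cls.__name__}: invalid tactics {tactics!r}")
        cls.tactics = list(tactics)
        ACTION_REGISTRY[cls.__name__] = cls
        return cls
    return wrap


@register("TA0007")  # Discovery
class NetworkScan:
    pass


@register("TA0008")  # Lateral Movement
class RemoteServiceLogin:
    pass


@register("TA0010")  # Exfiltration
class ExfiltrateData:
    pass


def to_playbook(trajectory: list[str]) -> list[str]:
    """Render a sequence of action names as numbered playbook steps."""
    return [
        f"{step}. {name} [{', '.join(ACTION_REGISTRY[name].tactics)}]"
        for step, name in enumerate(trajectory, start=1)
    ]


if __name__ == "__main__":
    for line in to_playbook(["NetworkScan", "RemoteServiceLogin", "ExfiltrateData"]):
        print(line)
```

Validating the metadata at registration time means an untagged action fails when the module is imported rather than silently producing an incomplete playbook later.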
The ActionRegistry pattern dictates that every action class carries strict metadata (e.g., tactics: ["TA0008"], the MITRE ATT&CK tactic ID for Lateral Movement). This enables the generation of human-readable "Playbooks" once the agent is trained.
3. Game Theory & Research Infrastructure
Training cannot rely on static heuristic bots. It requires a co-evolutionary ecosystem.
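One common way to build a co-evolutionary setup, assumed here rather than taken from NetForge_RL, is a snapshot pool per side: each learner periodically freezes a copy of itself, and the opposing learner trains against a mix of the newest snapshot and older ones so it does not overfit to a single opponent. The OpponentPool class, its latest_prob and max_size parameters, and the string "policies" are hypothetical stand-ins; real training is stubbed out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class OpponentPool:
    """Frozen policy snapshots for one side (Red or Blue)."""
    latest_prob: float = 0.5   # chance of facing the newest snapshot
    max_size: int = 10         # oldest snapshots are evicted first
    snapshots: list = field(default_factory=list)

    def add(self, snapshot) -> None:
        self.snapshots.append(snapshot)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample(self, rng: random.Random):
        if not self.snapshots:
            raise ValueError("pool is empty")
        if len(self.snapshots) == 1 or rng.random() < self.latest_prob:
            return self.snapshots[-1]
        # Otherwise pick uniformly among the older snapshots.
        return rng.choice(self.snapshots[:-1])


def coevolve(generations: int, seed: int = 0) -> list:
    """Alternate Red and Blue updates; returns (learner, opponent) pairs."""
    rng = random.Random(seed)
    red_pool, blue_pool = OpponentPool(), OpponentPool()
    red_pool.add("red_v0")
    blue_pool.add("blue_v0")
    matchups = []
    for gen in range(1, generations + 1):
        blue_opponent = blue_pool.sample(rng)
        red_pool.add(f"red_v{gen}")    # stand-in for training Red vs blue_opponent
        matchups.append((f"red_v{gen}", blue_opponent))
        red_opponent = red_pool.sample(rng)
        blue_pool.add(f"blue_v{gen}")  # stand-in for training Blue vs red_opponent
        matchups.append((f"blue_v{gen}", red_opponent))
    return matchups
```

Sampling older snapshots keeps pressure on strategies that a previous opponent generation could exploit, which a single fixed heuristic bot cannot provide.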
4. "Beyond SOTA" Innovations (Exclusive to NetForge_RL)
Features designed to outclass current academic and commercial simulators:
A. Cloud-Native & Ephemeral Resources (K8s):
Current SOTA environments simulate static bare-metal servers.
NetForge_RL will introduce "Ephemerality": nodes (representing Kubernetes Pods) can be automatically destroyed and spun up again by an Auto-Scaler every few dozen ticks. The Red agent must learn to infect base images (Supply Chain) or execute Container Breakouts before its foothold evaporates.
B. Active Deception (Honeypots & Honeytokens):
The Blue agent gains actions to inject fake credentials (honeytokens) into RAM or spin up decoy services. When the Red agent ingests this data, its internal observation vector is "poisoned," forcing it to waste ticks attacking nonexistent targets while its interactions with the decoys raise critical-priority alerts.
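The honeytoken mechanic can be sketched with a toy host model, assuming that Blue plants a fake credential in a host's RAM, Red's credential dump copies everything in RAM into Red's observation, and any later use of the fake credential yields a decoy target plus a critical alert. The Host and World classes and the plant_honeytoken, dump_credentials and use_credential functions are hypothetical names, not NetForge_RL's API.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ram_credentials: list = field(default_factory=list)


@dataclass
class World:
    honeytokens: set = field(default_factory=set)
    alerts: list = field(default_factory=list)


def plant_honeytoken(world: World, host: Host, token: str) -> None:
    """Blue action: seed a fake credential into the host's RAM."""
    host.ram_credentials.append(token)
    world.honeytokens.add(token)


def dump_credentials(host: Host, red_obs: dict) -> None:
    """Red action: harvest everything in RAM; fakes look identical to Red."""
    red_obs.setdefault("credentials", []).extend(host.ram_credentials)


def use_credential(world: World, token: str, red_obs: dict) -> bool:
    """Red action: try to log in with a harvested credential.

    A honeytoken never grants access: it adds a decoy to Red's target list
    (poisoning its observation) and raises a critical alert for Blue.
    Any other credential is treated as valid in this toy model.
    """
    if token in world.honeytokens:
        red_obs.setdefault("targets", []).append(f"decoy-for-{token}")
        world.alerts.append(("CRITICAL", f"honeytoken {token} used"))
        return False
    return True
```

Because the fake entry sits alongside real ones in RAM, Red only learns it was deceived after spending a tick on it, which is the cost the text describes.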
C. Zero-Trust Architecture (ZTA) Simulation:
Moving beyond perimeter firewalls, the environment simulates continuous identity verification. Lateral movement success depends on a dynamic "Trust Score" attached to the session token, which degrades over time or upon anomalous behavior.
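The Trust Score can be sketched as a per-session value that decays multiplicatively every tick, drops by a fixed penalty on each anomalous event, and directly sets the success probability of a lateral-movement attempt. The SessionToken class, the decay factor of 0.95, the penalty of 0.2 and the linear mapping to probability are all assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class SessionToken:
    trust: float = 1.0
    decay: float = 0.95            # multiplicative decay per tick (assumed)
    anomaly_penalty: float = 0.2   # flat drop per anomalous action (assumed)

    def tick(self) -> None:
        """Trust erodes simply because time passes."""
        self.trust *= self.decay

    def flag_anomaly(self) -> None:
        """Anomalous behavior cuts trust sharply, never below zero."""
        self.trust = max(0.0, self.trust - self.anomaly_penalty)


def attempt_lateral_move(token: SessionToken, rng: random.Random) -> bool:
    """Succeeds with probability equal to the current trust score."""
    return rng.random() < token.trust
```

With these values an idle session falls to about 0.90 after two ticks, while a burst of anomalies can zero it out and make every lateral move fail.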
D. LLM-Driven SIEM Log Generation:
Instead of returning hardcoded strings, NetForge_RL can optionally pipe action vectors through a lightweight local Large Language Model to generate highly realistic, unstructured system event logs (e.g., Sysmon data), forcing the Blue agent's pipeline to process noisy, real-world text data.
Networks are operated by humans. The environment introduces simulated "User Nodes" (NPCs) that generate background traffic and possess a stochastic "Vulnerability Score". The Red agent can execute SpearPhishing or WateringHole attacks. The Blue agent can counter with SecurityAwarenessTraining, which temporarily reduces the users' susceptibility but costs operational budget. This forces agents to account for human error, not just software bugs.
F. Cyber-Physical Convergence (ICS/OT & SCADA Segments):
Moving beyond data exfiltration, the environment includes Operational Technology (OT) subnets representing physical infrastructure (e.g., PLCs, cooling systems, power grids). Compromising these nodes manipulates continuous physical state variables (e.g., temperature, pressure). This allows research into catastrophic "Kinetic Impact" scenarios, where the reward function shifts from digital access to physical process disruption.
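A toy model of one OT segment, assuming a single PLC that drives a cooling loop: the process temperature follows a first-order update, and once the PLC is compromised (loosely in the spirit of the OverloadPLC action named in this PR's commit log) cooling stops and the temperature climbs past a safety limit. The CoolingProcess class, every constant, and the binary attacker reward are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CoolingProcess:
    temperature: float = 40.0   # degrees C; the cooled steady state for these constants
    ambient: float = 20.0
    heat_input: float = 2.0     # degrees added per tick by the physical process
    cooling_gain: float = 0.1   # fraction of (T - ambient) removed per tick
    safe_limit: float = 60.0
    plc_compromised: bool = False

    def step(self) -> float:
        """Advance one tick and return the attacker's reward for it."""
        if self.plc_compromised:
            cooling = 0.0  # hijacked PLC no longer drives the cooling loop
        else:
            cooling = self.cooling_gain * (self.temperature - self.ambient)
        self.temperature += self.heat_input - cooling
        return self.reward()

    def reward(self) -> float:
        """Pays out only once the physical process is pushed past its limit."""
        return 1.0 if self.temperature > self.safe_limit else 0.0
```

With a healthy PLC the temperature holds at ambient + heat_input / cooling_gain = 40 °C; after compromise it rises 2 °C per tick and first exceeds the 60 °C limit on the eleventh tick, so the reward depends on physical state rather than on digital access.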
G. Attack Economics & Asymmetric Resource Budgets:
Actions are no longer "free" apart from the ticks they consume. Both agents operate under strict economic constraints:
Each IsolateHost or DropSubnetRoute action actively penalizes the Blue agent's reward function by simulating lost business revenue (Business Downtime). This forces Blue to prioritize surgical remediation over blanket network shutdowns.
H. Dynamic SOAR & YARA Synthesis:
The Blue agent's action space is elevated from simple "Block IP" commands to dynamic rule generation. The defender can synthesize and deploy programmatic signatures (e.g., simplified YARA or Snort rules). The environment's physics engine dynamically evaluates the Red agent's subsequent payloads against these newly deployed regex/signature structures, enabling true automated incident response (SOAR) simulation.
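A simplified stand-in for this signature check, assuming Blue's synthesized rules are plain regular expressions (real YARA and Snort rules have their own syntax, which is not parsed here): Blue deploys a named rule, and every subsequent Red payload is matched against the live rule set before it may execute. The SignatureEngine class, the rule name and the sample payloads are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SignatureEngine:
    rules: dict = field(default_factory=dict)  # rule name -> compiled pattern

    def deploy(self, name: str, pattern: str) -> None:
        """Blue action: compile and activate a new signature."""
        self.rules[name] = re.compile(pattern, re.IGNORECASE)

    def evaluate(self, payload: str) -> list:
        """Return the names of all active rules the payload triggers."""
        return [name for name, rx in self.rules.items() if rx.search(payload)]

    def allows(self, payload: str) -> bool:
        """A Red payload executes only if no active rule matches it."""
        return not self.evaluate(payload)
```

Checking each payload against the current rule set is what lets a freshly synthesized signature affect Red's very next action instead of only future episodes.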