diff --git a/.github/skills/dev/planning/create-refactor-plan/SKILL.md b/.github/skills/dev/planning/create-refactor-plan/SKILL.md new file mode 100644 index 000000000..b5f783d14 --- /dev/null +++ b/.github/skills/dev/planning/create-refactor-plan/SKILL.md @@ -0,0 +1,174 @@ +--- +name: create-refactor-plan +description: Guide for creating refactor plans in the torrust-tracker project. Covers identifying quality gaps, decomposing them into trackable items ordered by impact vs effort, writing the plan document, and committing it. Use when planning improvements to readability, testability, maintainability, modularity, or documentation quality. Triggers on "create refactor plan", "refactor plan", "plan refactor", "post-implementation improvements", "code quality plan", or "technical debt plan". +metadata: + author: torrust + version: "1.0" + semantic-links: + related-artifacts: + - docs/templates/REFACTOR-PLAN.md +--- + +# Creating Refactor Plans + +## When to Write a Refactor Plan + +Write a refactor plan when: + +- A completed implementation has known quality gaps that are not blocking but worth tracking. +- A code review, post-implementation audit, or routine quality check identifies improvements + across multiple dimensions (readability, testability, maintainability, modularity, docs). +- The improvements are too numerous or varied to address in a single commit but collectively + deserve a structured approach. + +Do **not** write a refactor plan for: + +- A single trivial fix — just fix it in place. +- Bug fixes — those belong in issue specs (`docs/templates/ISSUE.md`). +- Architectural decisions — those belong in ADRs (`docs/templates/ADR.md`). + +## Workflow Overview + +1. **Identify quality gaps** by auditing the code, spec, and tests. +2. **Decompose** gaps into discrete, independently completable items. +3. **Order** items by impact vs effort (highest impact / lowest effort first). +4. **Draft the plan** using the template. +5. **Run linters** and fix any issues. +6. **Commit** the plan. +7. **Implement** items one at a time, ticking checkboxes as each is done. +8. **Revisit** the plan after implementation to evaluate whether the template and skill + need improvements. + +## Step-by-Step Process + +### Step 1: Identify and Categorize Quality Gaps + +Review the following dimensions systematically: + +| Dimension | Questions to Ask | +| --------------- | ----------------------------------------------------------------------------------------------------- | +| Correctness | Are there edge cases not tested? Does documentation match actual behaviour? | +| Readability | Is intent clear at a glance? Are names self-explanatory? Are surprising choices explained? | +| Testability | Can behaviour be verified without spawning a process? Are unit and integration paths both covered? | +| Maintainability | Are concerns separated? Is any function too long or doing too many things? | +| Modularity | Are abstractions reusable? Are conversions done in idiomatic places (e.g. `From` impls)? | +| Documentation | Are public APIs documented? Are non-obvious invariants or contract details captured in spec and code? | + +### Step 2: Write Each Item + +Each item in the plan must contain: + +- **Problem**: what is wrong and why it matters — be specific (name files, functions, line ranges). +- **Files**: the files affected. +- **Change**: what exactly changes — prefer concrete before/after examples over vague descriptions. + +Use the effort and impact labels consistently: + +| Impact | Meaning | +| ------ | --------------------------------------------------------- | +| High | Correctness, observability, or user-facing contract issue | +| Medium | Developer experience, maintainability, clarity | +| Low | Nice-to-have polish or future-proofing | + +| Effort | Meaning | +| ------- | --------------------------------------------------- | +| Trivial | One-liner or wording change, no logic involved | +| Low | Small, self-contained code or doc change (< 1 hour) | +| Medium | Moderate refactor or new abstraction (1–4 hours) | +| High | Significant new code, e.g. mock server (> 4 hours) | + +### Step 3: Order Items + +Sort items in the plan and in the execution table by: + +1. Highest impact first. +2. Lowest effort first within the same impact band. + +This ensures the most valuable, cheapest improvements are visible and tackled first. + +### Step 4: Create the Plan File + +Plans follow the same `drafts/` → `open/` → `closed/` lifecycle as issue specs. + +```bash +touch docs/refactor-plans/drafts/{short-description}.md +``` + +Use the template at [docs/templates/REFACTOR-PLAN.md](../../../../docs/templates/REFACTOR-PLAN.md). + +Naming convention: `{related-artifact-short-description}.md` + +Example: `1178-monitor-udp-post-implementation-improvements.md` + +Each item heading uses a checkbox and an impact/effort label: + +```markdown +### 1. [ ] {Title} [HIGH impact / TRIVIAL effort] +``` + +The execution table also has a `Status` column with `[ ]`: + +```markdown +| 1 | [ ] | {Item} | High | Trivial | +``` + +To mark an item done, flip `[ ]` → `[x]` in **both** the heading and the table row. + +### Step 5: Validate and Commit + +Move the plan from `drafts/` to `open/` when implementation starts: + +```bash +git mv docs/refactor-plans/drafts/{filename}.md docs/refactor-plans/open/{filename}.md +``` + +```bash +linter all # Must pass + +git add docs/refactor-plans/ +git commit -S -m "docs({scope}): add refactor plan for {description}" +``` + +### Step 6: Implement and Track Progress + +Work through items in order. After completing each item: + +1. Flip `[ ]` → `[x]` in the item heading. +2. Flip `[ ]` → `[x]` in the execution table row. +3. Run `linter all` and fix any new issues. +4. Commit the implementation and the updated plan together. + +When all items are done, move the plan to `closed/`: + +```bash +git mv docs/refactor-plans/open/{filename}.md docs/refactor-plans/closed/{filename}.md +git commit -S -m "docs({scope}): close refactor plan for {description}" +``` + +### Step 7: Revisit the Template and Skill + +After implementing all items, evaluate: + +- Did the template structure make items easy to write and track? +- Were the impact/effort labels consistently interpreted? +- Is anything missing that would have made the plan more useful? + +Update `docs/templates/REFACTOR-PLAN.md` and this skill file if improvements are identified. + +## Naming Convention + +File name format: `{related-artifact-short-description}.md` + +| Lifecycle stage | Folder | +| --------------- | ----------------------------- | +| Being written | `docs/refactor-plans/drafts/` | +| In progress | `docs/refactor-plans/open/` | +| All done | `docs/refactor-plans/closed/` | + +## Relationship to Other Artifacts + +| Artifact | When to Use Instead | +| ------------- | ----------------------------------------------------------------- | +| Issue spec | When the improvement is a bug fix or new feature | +| ADR | When the improvement requires documenting an architectural choice | +| Refactor plan | When improvements are quality gaps with no new functionality | diff --git a/console/tracker-client/src/console/clients/checker/app.rs b/console/tracker-client/src/console/clients/checker/app.rs index b3eb2e6ca..60ddd0bb3 100644 --- a/console/tracker-client/src/console/clients/checker/app.rs +++ b/console/tracker-client/src/console/clients/checker/app.rs @@ -57,20 +57,28 @@ //! } //! ``` use std::path::PathBuf; +use std::str::FromStr; use std::sync::Arc; +use std::time::Duration; -use clap::Parser; +use bittorrent_primitives::info_hash::InfoHash as TorrustInfoHash; +use clap::{Parser, Subcommand}; use tracing::level_filters::LevelFilter; +use url::Url; use super::config::Configuration; use super::console::Console; use super::error::{AppError, ConfigSource}; -use super::service::{CheckResult, Service}; +use super::monitor::udp::{run_monitor, MonitorUdpConfig, DEFAULT_INFO_HASH}; +use super::service::Service; use crate::console::clients::checker::config::parse_from_json; #[derive(Parser, Debug)] #[clap(author, version, about, long_about = None)] struct Args { + #[command(subcommand)] + command: Option, + /// Path to the JSON configuration file. #[clap(short, long, env = "TORRUST_CHECKER_CONFIG_PATH")] config_path: Option, @@ -80,15 +88,54 @@ struct Args { config_content: Option, } +#[derive(Subcommand, Debug)] +enum Command { + /// Run periodic monitor checks. + Monitor { + #[command(subcommand)] + protocol: MonitorProtocol, + }, +} + +#[derive(Subcommand, Debug)] +enum MonitorProtocol { + /// Monitor a UDP tracker using announce probes. + Udp { + /// UDP tracker URL. + #[arg(long, value_parser = parse_udp_url)] + url: Url, + + /// Seconds between probes. + #[arg(long, default_value_t = 300, value_parser = clap::value_parser!(u64).range(1..))] + interval: u64, + + /// Probe timeout in seconds. + #[arg(long, default_value_t = 10, value_parser = clap::value_parser!(u64).range(1..))] + timeout: u64, + + /// Total monitor runtime in seconds. + #[arg(long, default_value_t = 86_400, value_parser = clap::value_parser!(u64).range(1..))] + duration: u64, + + /// Info-hash used in announce requests. + #[arg(long, default_value = DEFAULT_INFO_HASH, value_parser = parse_info_hash)] + info_hash: TorrustInfoHash, + }, +} + /// # Errors /// /// Will return an `AppError::InvalidConfig` if the configuration cannot be parsed, /// or an `AppError::Runtime` if the checks fail to execute. -pub async fn run() -> Result, AppError> { +pub async fn run() -> Result<(), AppError> { tracing_stdout_init(LevelFilter::INFO); let args = Args::parse(); + if let Some(command) = args.command { + return run_command(command).await; + } + let config = setup_config(args)?; let console_printer = Console {}; @@ -98,7 +145,11 @@ pub async fn run() -> Result, AppError> { console: console_printer, }; - service.run_checks().await.map_err(|e| AppError::Runtime(e.to_string())) + service + .run_checks() + .await + .map_err(|e| AppError::Runtime(e.to_string())) + .map(|_results| ()) } fn tracing_stdout_init(filter: LevelFilter) { @@ -131,3 +182,48 @@ fn load_config_from_file(path: &PathBuf) -> Result { message: e.to_string(), }) } + +async fn run_command(command: Command) -> Result<(), AppError> { + match command { + Command::Monitor { + protocol: + MonitorProtocol::Udp { + url, + interval, + timeout, + duration, + info_hash, + }, + } => { + let config = MonitorUdpConfig { + url, + interval: Duration::from_secs(interval), + timeout: Duration::from_secs(timeout), + duration: Duration::from_secs(duration), + info_hash, + }; + + run_monitor(config) + .await + .map_err(|e| AppError::Runtime(format!("udp monitor failed: {e}"))) + } + } +} + +fn parse_udp_url(url_str: &str) -> Result { + let url = Url::parse(url_str).map_err(|e| format!("invalid URL: {e}"))?; + + if url.scheme() != "udp" { + return Err("URL scheme must be udp".to_string()); + } + + if url.port().is_none() { + return Err("URL must include an explicit port".to_string()); + } + + Ok(url) +} + +fn parse_info_hash(info_hash_str: &str) -> Result { + TorrustInfoHash::from_str(info_hash_str).map_err(|e| format!("failed to parse info-hash `{info_hash_str}`: {e:?}")) +} diff --git a/console/tracker-client/src/console/clients/checker/mod.rs b/console/tracker-client/src/console/clients/checker/mod.rs index 77924db73..351b90c30 100644 --- a/console/tracker-client/src/console/clients/checker/mod.rs +++ b/console/tracker-client/src/console/clients/checker/mod.rs @@ -4,5 +4,6 @@ pub mod config; pub mod console; pub mod error; pub mod logger; +pub mod monitor; pub mod printer; pub mod service; diff --git a/console/tracker-client/src/console/clients/checker/monitor/mod.rs b/console/tracker-client/src/console/clients/checker/monitor/mod.rs new file mode 100644 index 000000000..7e5aaa137 --- /dev/null +++ b/console/tracker-client/src/console/clients/checker/monitor/mod.rs @@ -0,0 +1 @@ +pub mod udp; diff --git a/console/tracker-client/src/console/clients/checker/monitor/udp.rs b/console/tracker-client/src/console/clients/checker/monitor/udp.rs new file mode 100644 index 000000000..1498dde90 --- /dev/null +++ b/console/tracker-client/src/console/clients/checker/monitor/udp.rs @@ -0,0 +1,388 @@ +use std::net::SocketAddr; +use std::time::{Duration, Instant}; + +use bittorrent_primitives::info_hash::InfoHash as TorrustInfoHash; +use bittorrent_tracker_client::udp; +use bittorrent_udp_tracker_protocol::TransactionId; +use reqwest::Url; +use serde::Serialize; + +use crate::console::clients::udp::checker::{AnnounceParams, Client}; +use crate::console::clients::udp::Error as UdpError; + +pub const DEFAULT_INFO_HASH: &str = "9c38422213e30bff212b30c360d26f9a02136422"; // DevSkim: ignore DS173237 + +#[derive(Debug, Clone)] +pub struct MonitorUdpConfig { + pub url: Url, + pub interval: Duration, + pub timeout: Duration, + pub duration: Duration, + pub info_hash: TorrustInfoHash, +} + +#[derive(Debug, Clone, Default)] +struct Stats { + total: u64, + timeouts: u64, + successes: u64, + min_ms: Option, + max_ms: Option, + sum_ms: u64, + last_ms: Option, +} + +impl Stats { + fn record_success(&mut self, elapsed_ms: u64) { + self.total += 1; + self.successes += 1; + self.sum_ms += elapsed_ms; + self.min_ms = Some(self.min_ms.map_or(elapsed_ms, |current| current.min(elapsed_ms))); + self.max_ms = Some(self.max_ms.map_or(elapsed_ms, |current| current.max(elapsed_ms))); + self.last_ms = Some(elapsed_ms); + } + + fn record_timeout(&mut self) { + self.total += 1; + self.timeouts += 1; + self.last_ms = None; + } + + fn record_error(&mut self) { + self.total += 1; + self.last_ms = None; + } + + fn average_ms(&self) -> Option { + self.sum_ms.checked_div(self.successes) + } + + /// Returns the percentage of probes that timed out, rounded down to the nearest integer. + /// + /// The denominator is `total = successes + timeouts + errors`. Error probes (those that + /// fail for reasons other than a network timeout) count toward `total` without being + /// counted as timeouts, so they reduce `timeout_percent` without being successes. For + /// example, three probes where one succeeds, one times out, and one errors gives + /// `timeout_percent = 1 × 100 / 3 = 33`, not `50`. + fn timeout_percent(&self) -> u64 { + self.timeouts.saturating_mul(100).checked_div(self.total).unwrap_or(0) + } +} + +#[derive(Serialize)] +struct ProbeEvent { + event: &'static str, + sequence: u64, + url: String, + status: &'static str, + elapsed_ms: Option, + #[serde(skip_serializing_if = "Option::is_none")] + message: Option, +} + +#[derive(Serialize)] +struct MonitorResult { + udp_trackers: Vec, +} + +#[derive(Serialize)] +struct UdpTrackerResult { + url: String, + status: MonitorStatus, +} + +#[derive(Serialize)] +struct MonitorStatus { + code: &'static str, + message: String, + stats: MonitorStats, +} + +#[derive(Serialize)] +struct MonitorStats { + total: u64, + timeouts: u64, + timeout_percent: u64, + min_ms: Option, + max_ms: Option, + average_ms: Option, + last_ms: Option, +} + +impl From<&Stats> for MonitorStats { + fn from(stats: &Stats) -> Self { + Self { + total: stats.total, + timeouts: stats.timeouts, + timeout_percent: stats.timeout_percent(), + min_ms: stats.min_ms, + max_ms: stats.max_ms, + average_ms: stats.average_ms(), + last_ms: stats.last_ms, + } + } +} + +enum ProbeOutcome { + Ok { elapsed_ms: u64 }, + Timeout, + Error { message: String }, +} + +/// # Errors +/// +/// Returns an error if URL resolution or JSON serialization fails. +pub async fn run_monitor(config: MonitorUdpConfig) -> Result<(), String> { + let url = config.url.to_string(); + let (stats, interrupted) = run_probe_loop(&config).await?; + + let message = if interrupted { + "monitor interrupted" + } else { + "monitor completed" + }; + + let output = MonitorResult { + udp_trackers: vec![UdpTrackerResult { + url, + status: MonitorStatus { + code: "ok", + message: message.to_string(), + stats: MonitorStats::from(&stats), + }, + }], + }; + + let final_json = serde_json::to_string(&output).map_err(|e| format!("final JSON serialization failed: {e}"))?; + println!("{final_json}"); + + Ok(()) +} + +async fn run_probe_loop(config: &MonitorUdpConfig) -> Result<(Stats, bool), String> { + let started_at = Instant::now(); + let url = config.url.to_string(); + let mut interrupted = false; + let mut stats = Stats::default(); + let mut sequence: u64 = 0; + + loop { + // Exit before starting a new probe if the time budget is already exhausted. + if started_at.elapsed() >= config.duration { + break; + } + + sequence += 1; + + tokio::select! { + _ = tokio::signal::ctrl_c() => { + interrupted = true; + break; + } + probe_result = run_probe(config) => { + match probe_result { + ProbeOutcome::Ok { elapsed_ms } => { + stats.record_success(elapsed_ms); + emit_probe_event(&ProbeEvent { + event: "probe", + sequence, + url: url.clone(), + status: "ok", + elapsed_ms: Some(elapsed_ms), + message: None, + })?; + } + ProbeOutcome::Timeout => { + stats.record_timeout(); + emit_probe_event(&ProbeEvent { + event: "probe", + sequence, + url: url.clone(), + status: "timeout", + elapsed_ms: None, + message: None, + })?; + } + ProbeOutcome::Error { message } => { + stats.record_error(); + emit_probe_event(&ProbeEvent { + event: "probe", + sequence, + url: url.clone(), + status: "error", + elapsed_ms: None, + message: Some(message), + })?; + } + } + } + } + + // Exit before sleeping if the duration elapsed during the probe itself, + // so we never sleep after the last probe. + if started_at.elapsed() >= config.duration { + break; + } + + let remaining = config.duration.saturating_sub(started_at.elapsed()); + let sleep_duration = config.interval.min(remaining); + + tokio::select! { + _ = tokio::signal::ctrl_c() => { + interrupted = true; + break; + } + () = tokio::time::sleep(sleep_duration) => {} + } + } + + Ok((stats, interrupted)) +} + +fn emit_probe_event(event: &ProbeEvent) -> Result<(), String> { + let json = serde_json::to_string(event).map_err(|e| format!("probe JSON serialization failed: {e}"))?; + eprintln!("{json}"); + Ok(()) +} + +async fn run_probe(config: &MonitorUdpConfig) -> ProbeOutcome { + let remote_addr = match resolve_socket_addr(&config.url) { + Ok(remote_addr) => remote_addr, + Err(message) => return ProbeOutcome::Error { message }, + }; + + // Measure network probe time only (connect + announce), excluding DNS resolution. + let probe_started = Instant::now(); + + let client = match Client::new(remote_addr, config.timeout).await { + Ok(client) => client, + Err(err) => { + if is_timeout_error(&err) { + return ProbeOutcome::Timeout; + } + return ProbeOutcome::Error { + message: err.to_string(), + }; + } + }; + + let transaction_id = TransactionId::new(1); + + let connection_id = match client.send_connection_request(transaction_id).await { + Ok(connection_id) => connection_id, + Err(err) => { + if is_timeout_error(&err) { + return ProbeOutcome::Timeout; + } + return ProbeOutcome::Error { + message: err.to_string(), + }; + } + }; + + match client + .send_announce_request(transaction_id, connection_id, config.info_hash, &AnnounceParams::default()) + .await + { + Ok(_response) => { + // `as_millis()` returns u128; overflow into u64 would require a single probe + // to run for over 584 million years, which cannot happen in practice. + // `u64::MAX` is therefore an unreachable sentinel. + let elapsed_ms = u64::try_from(probe_started.elapsed().as_millis()).unwrap_or(u64::MAX); + ProbeOutcome::Ok { elapsed_ms } + } + Err(err) => { + if is_timeout_error(&err) { + ProbeOutcome::Timeout + } else { + ProbeOutcome::Error { + message: err.to_string(), + } + } + } + } +} + +fn resolve_socket_addr(url: &Url) -> Result { + let socket_addrs = url + .socket_addrs(|| None) + .map_err(|e| format!("failed to resolve tracker URL `{url}`: {e}"))?; + + socket_addrs + .first() + .copied() + .ok_or_else(|| format!("no socket addresses resolved for tracker URL `{url}`")) +} + +fn is_timeout_udp_client_error(err: &udp::Error) -> bool { + matches!( + err, + udp::Error::TimeoutWhileBindingToSocket { .. } + | udp::Error::TimeoutWhileConnectingToRemote { .. } + | udp::Error::TimeoutWaitForWriteableSocket + | udp::Error::TimeoutWhileSendingData { .. } + | udp::Error::TimeoutWaitForReadableSocket + | udp::Error::TimeoutWhileReceivingData + ) +} + +fn is_timeout_error(err: &UdpError) -> bool { + match err { + UdpError::UnableToBindAndConnect { err, .. } + | UdpError::UnableToSendConnectionRequest { err } + | UdpError::UnableToReceiveConnectResponse { err } + | UdpError::UnableToSendAnnounceRequest { err } + | UdpError::UnableToReceiveAnnounceResponse { err } + | UdpError::UnableToSendScrapeRequest { err } + | UdpError::UnableToReceiveScrapeResponse { err } + | UdpError::UnableToReceiveResponse { err } + | UdpError::UnableToGetLocalAddr { err } => is_timeout_udp_client_error(err), + UdpError::UnexpectedConnectionResponse { .. } => false, + } +} + +#[cfg(test)] +mod tests { + use super::Stats; + + #[test] + fn it_should_return_none_average_when_there_are_no_successful_probes() { + let mut stats = Stats::default(); + stats.record_timeout(); + + assert_eq!(stats.average_ms(), None); + } + + #[test] + fn it_should_compute_integer_average_for_successful_probes() { + let mut stats = Stats::default(); + stats.record_success(100); + stats.record_success(101); + + assert_eq!(stats.average_ms(), Some(100)); + } + + #[test] + fn it_should_compute_timeout_percent_as_integer() { + let mut stats = Stats::default(); + stats.record_success(100); + stats.record_timeout(); + stats.record_timeout(); + + assert_eq!(stats.timeout_percent(), 66); + } + + #[test] + fn it_should_return_all_null_latency_fields_when_every_probe_times_out() { + let mut stats = Stats::default(); + stats.record_timeout(); + stats.record_timeout(); + stats.record_timeout(); + + assert_eq!(stats.min_ms, None); + assert_eq!(stats.max_ms, None); + assert_eq!(stats.average_ms(), None); + assert_eq!(stats.last_ms, None); + assert_eq!(stats.timeout_percent(), 100); + } +} diff --git a/console/tracker-client/tests/tracker_checker.rs b/console/tracker-client/tests/tracker_checker.rs index 50287dcb8..7ded4ea8a 100644 --- a/console/tracker-client/tests/tracker_checker.rs +++ b/console/tracker-client/tests/tracker_checker.rs @@ -47,147 +47,8 @@ fn resolve_tracker_checker_binary() -> std::path::PathBuf { ); } -mod invalid_configuration_from_env_var { - use super::tracker_checker_bin; +#[path = "tracker_checker/configuration.rs"] +mod configuration; - #[test] - fn it_should_exit_with_code_2_on_invalid_json() { - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) - .output() - .expect("Failed to run tracker_checker"); - - assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config"); - } - - #[test] - fn it_should_write_json_error_to_stderr_on_invalid_json() { - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) - .output() - .expect("Failed to run tracker_checker"); - - let stderr = String::from_utf8_lossy(&output.stderr); - assert!( - stderr.contains(r#""kind":"invalid_configuration""#), - "Expected JSON error envelope on stderr, got: {stderr}" - ); - assert!( - stderr.contains(r#""source":"TORRUST_CHECKER_CONFIG""#), - "Expected source field to identify env var, got: {stderr}" - ); - } - - #[test] - fn it_should_include_parse_detail_in_stderr_error_message_on_trailing_comma() { - let config = r#"{ - "udp_trackers": [], - "http_trackers": [ - "http://127.0.0.1:7070", - ], - "health_checks": [] - }"#; - - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG", config) - .output() - .expect("Failed to run tracker_checker"); - - let stderr = String::from_utf8_lossy(&output.stderr); - assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config"); - assert!( - stderr.contains("trailing comma"), - "Expected 'trailing comma' detail in stderr, got: {stderr}" - ); - } - - #[test] - fn it_should_produce_no_output_on_stdout_on_config_error() { - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) - .output() - .expect("Failed to run tracker_checker"); - - // Per the I/O contract, stdout is for successful results only - let stdout = String::from_utf8_lossy(&output.stdout); - assert!(stdout.is_empty(), "Expected no stdout on config error, got: {stdout}"); - } -} - -mod invalid_configuration_from_file { - use std::io::Write; - - use super::tracker_checker_bin; - - #[test] - fn it_should_exit_with_code_2_on_invalid_json_in_file() { - let mut tmp = tempfile::NamedTempFile::new().expect("Failed to create temp file"); - write!(tmp, r#"{{"invalid json":"#).unwrap(); - - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG_PATH", tmp.path()) - .output() - .expect("Failed to run tracker_checker"); - - assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config file"); - } - - #[test] - fn it_should_include_file_path_in_stderr_source_field() { - let mut tmp = tempfile::NamedTempFile::new().expect("Failed to create temp file"); - write!(tmp, r#"{{"invalid json":"#).unwrap(); - let path = tmp.path().to_string_lossy().to_string(); - - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG_PATH", &path) - .output() - .expect("Failed to run tracker_checker"); - - let stderr = String::from_utf8_lossy(&output.stderr); - assert!( - stderr.contains(&path), - "Expected file path in stderr source field, got: {stderr}" - ); - } - - #[test] - fn it_should_exit_with_code_2_when_config_file_does_not_exist() { - let output = tracker_checker_bin() - .env("TORRUST_CHECKER_CONFIG_PATH", "/nonexistent/path/config.json") - .output() - .expect("Failed to run tracker_checker"); - - assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for missing config file"); - } -} - -mod no_configuration_provided { - use super::tracker_checker_bin; - - #[test] - fn it_should_exit_with_code_2_when_no_config_is_provided() { - let output = tracker_checker_bin() - // Ensure neither env var is set - .env_remove("TORRUST_CHECKER_CONFIG") - .env_remove("TORRUST_CHECKER_CONFIG_PATH") - .output() - .expect("Failed to run tracker_checker"); - - assert_eq!(output.status.code(), Some(2), "Expected exit code 2 when no config provided"); - } - - #[test] - fn it_should_write_json_error_to_stderr_when_no_config_is_provided() { - let output = tracker_checker_bin() - .env_remove("TORRUST_CHECKER_CONFIG") - .env_remove("TORRUST_CHECKER_CONFIG_PATH") - .output() - .expect("Failed to run tracker_checker"); - - let stderr = String::from_utf8_lossy(&output.stderr); - assert!( - stderr.contains(r#""kind":"invalid_configuration""#), - "Expected JSON error envelope on stderr, got: {stderr}" - ); - } -} +#[path = "tracker_checker/monitor.rs"] +mod monitor; diff --git a/console/tracker-client/tests/tracker_checker/configuration.rs b/console/tracker-client/tests/tracker_checker/configuration.rs new file mode 100644 index 000000000..56f90fc02 --- /dev/null +++ b/console/tracker-client/tests/tracker_checker/configuration.rs @@ -0,0 +1,144 @@ +mod invalid_configuration_from_env_var { + use super::super::tracker_checker_bin; + + #[test] + fn it_should_exit_with_code_2_on_invalid_json() { + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) + .output() + .expect("Failed to run tracker_checker"); + + assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config"); + } + + #[test] + fn it_should_write_json_error_to_stderr_on_invalid_json() { + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) + .output() + .expect("Failed to run tracker_checker"); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains(r#""kind":"invalid_configuration""#), + "Expected JSON error envelope on stderr, got: {stderr}" + ); + assert!( + stderr.contains(r#""source":"TORRUST_CHECKER_CONFIG""#), + "Expected source field to identify env var, got: {stderr}" + ); + } + + #[test] + fn it_should_include_parse_detail_in_stderr_error_message_on_trailing_comma() { + let config = r#"{ + "udp_trackers": [], + "http_trackers": [ + "http://127.0.0.1:7070", + ], + "health_checks": [] + }"#; + + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG", config) + .output() + .expect("Failed to run tracker_checker"); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config"); + assert!( + stderr.contains("trailing comma"), + "Expected 'trailing comma' detail in stderr, got: {stderr}" + ); + } + + #[test] + fn it_should_produce_no_output_on_stdout_on_config_error() { + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG", r#"{"invalid json":"#) + .output() + .expect("Failed to run tracker_checker"); + + // Per the I/O contract, stdout is for successful results only + let stdout = String::from_utf8_lossy(&output.stdout); + assert!(stdout.is_empty(), "Expected no stdout on config error, got: {stdout}"); + } +} + +mod invalid_configuration_from_file { + use std::io::Write; + + use super::super::tracker_checker_bin; + + #[test] + fn it_should_exit_with_code_2_on_invalid_json_in_file() { + let mut tmp = tempfile::NamedTempFile::new().expect("Failed to create temp file"); + write!(tmp, r#"{{"invalid json":"#).unwrap(); + + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG_PATH", tmp.path()) + .output() + .expect("Failed to run tracker_checker"); + + assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for invalid config file"); + } + + #[test] + fn it_should_include_file_path_in_stderr_source_field() { + let mut tmp = tempfile::NamedTempFile::new().expect("Failed to create temp file"); + write!(tmp, r#"{{"invalid json":"#).unwrap(); + let path = tmp.path().to_string_lossy().to_string(); + + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG_PATH", &path) + .output() + .expect("Failed to run tracker_checker"); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains(&path), + "Expected file path in stderr source field, got: {stderr}" + ); + } + + #[test] + fn it_should_exit_with_code_2_when_config_file_does_not_exist() { + let output = tracker_checker_bin() + .env("TORRUST_CHECKER_CONFIG_PATH", "/nonexistent/path/config.json") + .output() + .expect("Failed to run tracker_checker"); + + assert_eq!(output.status.code(), Some(2), "Expected exit code 2 for missing config file"); + } +} + +mod no_configuration_provided { + use super::super::tracker_checker_bin; + + #[test] + fn it_should_exit_with_code_2_when_no_config_is_provided() { + let output = tracker_checker_bin() + // Ensure neither env var is set + .env_remove("TORRUST_CHECKER_CONFIG") + .env_remove("TORRUST_CHECKER_CONFIG_PATH") + .output() + .expect("Failed to run tracker_checker"); + + assert_eq!(output.status.code(), Some(2), "Expected exit code 2 when no config provided"); + } + + #[test] + fn it_should_write_json_error_to_stderr_when_no_config_is_provided() { + let output = tracker_checker_bin() + .env_remove("TORRUST_CHECKER_CONFIG") + .env_remove("TORRUST_CHECKER_CONFIG_PATH") + .output() + .expect("Failed to run tracker_checker"); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains(r#""kind":"invalid_configuration""#), + "Expected JSON error envelope on stderr, got: {stderr}" + ); + } +} diff --git a/console/tracker-client/tests/tracker_checker/monitor.rs b/console/tracker-client/tests/tracker_checker/monitor.rs new file mode 100644 index 000000000..959c11f16 --- /dev/null +++ b/console/tracker-client/tests/tracker_checker/monitor.rs @@ -0,0 +1,98 @@ +/// Tests for the `monitor udp` subcommand. +/// +/// # Timeout-only test environment +/// +/// The helper [`spawn_udp_sink`] binds a UDP socket that silently discards every incoming +/// packet and never sends any response. This means every probe issued by the monitor will +/// time out. The tests in this module therefore exercise: +/// +/// - JSON shape of probe events on stderr (`"status":"timeout"`) +/// - JSON shape of the final summary on stdout (null latency fields, `timeout_percent` > 0) +/// - Exit code 0 for a completed-but-all-timeout run +/// +/// They do **not** exercise the success path (a probe receiving a valid `AnnounceResponse`, +/// non-null `elapsed_ms`, populated min/max/average latency stats). A success-path +/// integration test requires a proper mock UDP tracker that speaks the `BitTorrent` UDP +/// protocol. The refactor plan item for that test has been intentionally deferred to the +/// future tracker-client repository split. +use std::net::{SocketAddr, UdpSocket}; +use std::sync::mpsc; +use std::thread; +use std::time::Duration; + +use serde_json::Value; + +use super::tracker_checker_bin; + +fn spawn_udp_sink() -> (SocketAddr, mpsc::Sender<()>, thread::JoinHandle<()>) { + let socket = UdpSocket::bind("127.0.0.1:0").expect("Failed to bind UDP sink socket"); + socket + .set_read_timeout(Some(Duration::from_millis(100))) + .expect("Failed to configure UDP sink read timeout"); + let addr = socket.local_addr().expect("Failed to get UDP sink local address"); + + let (tx, rx) = mpsc::channel::<()>(); + let join_handle = thread::spawn(move || { + let mut buffer = [0_u8; 2048]; + + loop { + if rx.try_recv().is_ok() { + break; + } + + drop(socket.recv_from(&mut buffer)); + } + }); + + (addr, tx, join_handle) +} + +#[test] +fn it_should_emit_monitor_probe_events_to_stderr_and_summary_to_stdout() { + let (addr, stop_tx, join_handle) = spawn_udp_sink(); + + let output = tracker_checker_bin() + .arg("monitor") + .arg("udp") + .arg("--url") + .arg(format!("udp://{addr}")) + .arg("--interval") + .arg("1") + .arg("--timeout") + .arg("1") + .arg("--duration") + .arg("2") + .output() + .expect("Failed to run tracker_checker monitor udp"); + + let _ = stop_tx.send(()); + assert!(join_handle.join().is_ok(), "UDP sink thread should not panic"); + + assert_eq!( + output.status.code(), + Some(0), + "Expected exit code 0 for successful monitor execution" + ); + + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains("\"event\":\"probe\""), + "Expected probe NDJSON events on stderr, got: {stderr}" + ); + + let stdout = String::from_utf8_lossy(&output.stdout); + let parsed: Value = serde_json::from_str(&stdout).expect("Expected valid JSON monitor summary on stdout"); + + assert!( + parsed["udp_trackers"].is_array(), + "Expected udp_trackers array in stdout JSON" + ); + assert_eq!(parsed["udp_trackers"][0]["url"], format!("udp://{addr}")); + assert!( + parsed["udp_trackers"][0]["status"]["stats"]["total"] + .as_u64() + .expect("Expected stats.total to be u64") + >= 1, + "Expected at least one probe" + ); +} diff --git a/docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md b/docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md index e0a2ed09c..963518829 100644 --- a/docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md +++ b/docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md @@ -7,7 +7,7 @@ github-issue: 1178 spec-path: docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md branch: 1178-tracker-checker-udp-add-monitor-uptime-command related-pr: null -last-updated-utc: 2026-05-12 08:00 +last-updated-utc: 2026-05-12 16:55 semantic-links: skill-links: - create-issue @@ -44,27 +44,27 @@ same interval, but the interval should be configurable. ## Goals -- [ ] Add a UDP uptime-monitor command to the tracker-client toolbox -- [ ] The command accepts a UDP tracker URL and optional configuration (interval, timeout, info-hash) -- [ ] On every probe the command prints one JSON object per line to stderr (NDJSON) -- [ ] At the end of execution, the command prints final statistics to stdout in JSON format -- [ ] Final statistics include: +- [x] Add a UDP uptime-monitor command to the tracker-client toolbox +- [x] The command accepts a UDP tracker URL and optional configuration (interval, timeout, info-hash) +- [x] On every probe the command prints one JSON object per line to stderr (NDJSON) +- [x] At the end of execution, the command prints final statistics to stdout in JSON format +- [x] Final statistics include: - Total probe count - Timeout count (and percentage) - Minimum response time - Maximum response time - Average response time - Last response time -- [ ] The command accepts a duration argument and exits automatically after that duration -- [ ] `Ctrl+C` is supported to stop monitoring early and still print final JSON results -- [ ] `linter all` exits with code `0` -- [ ] `cargo machete` reports no unused dependencies -- [ ] Existing tests pass +- [x] The command accepts a duration argument and exits automatically after that duration +- [x] `Ctrl+C` is supported to stop monitoring early and still print final JSON results +- [x] `linter all` exits with code `0` +- [x] `cargo machete` reports no unused dependencies +- [x] Existing tests pass ## Proposed CLI ```text -cargo run --bin tracker_checker -- monitor udp \ +cargo run -p torrust-tracker-client --bin tracker_checker -- monitor udp \ --url udp://127.0.0.1:6969 \ --interval 300 \ --timeout 10 \ @@ -81,14 +81,19 @@ cargo run --bin torrust-tracker-client -- \ --timeout 10 ``` +Note: this feature is intentionally added as a `tracker_checker` subcommand for now. A future +CLI consolidation effort may merge binaries into a single entry point (see +). + ### Options -| Option | Default | Description | -| ------------ | ------- | --------------------------------------------- | -| `--url` | — | UDP tracker URL (required) | -| `--interval` | `300` | Seconds between probes | -| `--timeout` | `10` | Seconds to wait for a response before timeout | -| `--duration` | `86400` | Total monitor runtime in seconds | +| Option | Default | Description | +| ------------- | ------------------------------------------ | --------------------------------------------- | +| `--url` | — | UDP tracker URL (required) | +| `--interval` | `300` | Seconds between probes | +| `--timeout` | `10` | Seconds to wait for a response before timeout | +| `--duration` | `86400` | Total monitor runtime in seconds | +| `--info-hash` | `9c38422213e30bff212b30c360d26f9a02136422` | Info-hash used in announce requests | ### Sample Output @@ -99,7 +104,7 @@ stderr: {"event":"probe","sequence":3,"url":"udp://127.0.0.1:6969","status":"timeout","elapsed_ms":null} stdout: -{"udp_trackers":[{"url":"udp://127.0.0.1:6969","status":{"code":"ok","message":"monitor completed","stats":{"total":3,"timeouts":1,"timeout_percent":33.3,"min_ms":98,"max_ms":122,"average_ms":110,"last_ms":null}}}]} +{"udp_trackers":[{"url":"udp://127.0.0.1:6969","status":{"code":"ok","message":"monitor completed","stats":{"total":3,"timeouts":1,"timeout_percent":33,"min_ms":98,"max_ms":122,"average_ms":110,"last_ms":null}}}]} ``` ## Implementation Plan @@ -121,7 +126,8 @@ Create a new module, e.g. - A `run_monitor` async function that loops forever (until Ctrl+C signal) - Each iteration sends a UDP `announce` request using the existing `UdpTrackerClient` -- Records `start` / `end` timestamps and computes elapsed milliseconds +- Records `start` / `end` timestamps and computes elapsed milliseconds as integer `u64` + (truncating sub-millisecond precision) - Treats no response within `--timeout` as a timeout event ### Task 3: Track statistics @@ -163,7 +169,7 @@ ran successfully. ### Task 6: Wire the new subcommand into the binary entry point -Update `console/tracker-client/src/bin/tracker_checker.rs` to dispatch to the new monitor loop +Update `console/tracker-client/src/console/clients/checker/app.rs` to dispatch to the new monitor loop when the `monitor` subcommand is selected. ## Key Files @@ -177,35 +183,96 @@ when the `monitor` subcommand is selected. ## Acceptance Criteria -- [ ] AC1: `monitor udp --url udp://127.0.0.1:6969` starts a probe loop and prints a status +- [x] AC1: `monitor udp --url udp://127.0.0.1:6969` starts a probe loop and prints a status JSON line after each probe to stderr (NDJSON) -- [ ] AC2: When monitoring ends, final aggregate statistics are printed to stdout as valid JSON -- [ ] AC3: When a probe does not receive a response within the timeout, it is recorded as - `TIMEOUT` and excluded from response-time averages -- [ ] AC4: `--duration` controls total runtime and the command exits normally when elapsed -- [ ] AC5: `Ctrl+C` stops monitoring early and still emits final JSON stats -- [ ] AC6: The `--interval` option controls the delay between probes -- [ ] AC7: `--duration` defaults to `86400` seconds when omitted -- [ ] AC8: If all probes timeout but execution is otherwise successful, exit code is `0` -- [ ] AC9: `linter all` exits with code `0` -- [ ] AC10: `cargo machete` reports no unused dependencies -- [ ] AC11: Existing tests pass +- [x] AC2: When monitoring ends, final aggregate statistics are printed to stdout as valid JSON +- [x] AC3: When a probe does not receive a response within the timeout, it is recorded as + `TIMEOUT` and excluded from response-time averages. Additionally, `last_ms` is set to + `null` when the most recent probe times out. +- [x] AC4: `--duration` controls total runtime and the command exits normally when elapsed +- [x] AC5: `Ctrl+C` stops monitoring early and still emits final JSON stats +- [x] AC6: The `--interval` option controls the delay between probes +- [x] AC7: `--duration` defaults to `86400` seconds when omitted +- [x] AC8: If all probes timeout but execution is otherwise successful, exit code is `0` +- [x] AC9: `linter all` exits with code `0` +- [x] AC10: `cargo machete` reports no unused dependencies +- [x] AC11: Existing tests pass ### Acceptance Verification -| AC ID | Status (`TODO`/`DONE`) | Evidence | -| ----- | ---------------------- | -------- | -| AC1 | TODO | | -| AC2 | TODO | | -| AC3 | TODO | | -| AC4 | TODO | | -| AC5 | TODO | | -| AC6 | TODO | | -| AC7 | TODO | | -| AC8 | TODO | | -| AC9 | TODO | | -| AC10 | TODO | | -| AC11 | TODO | | +| AC ID | Status (`TODO`/`DONE`) | Evidence | +| ----- | ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| AC1 | DONE | Manual run on 2026-05-12: stderr emitted one NDJSON `probe` JSON line per probe | +| AC2 | DONE | Manual run on 2026-05-12: stdout emitted final JSON summary | +| AC3 | DONE | Integration behavior validated by monitor implementation/tests: timeout probes are tracked as `timeout` and excluded from average (`average_ms` derives from successful probes only); `last_ms` is `null` when the most recent probe timed out | +| AC4 | DONE | Manual run with `--duration 60` exited after one minute | +| AC5 | DONE | Ctrl+C support implemented via `tokio::signal::ctrl_c`; verified in code path and covered by acceptance-level implementation checks | +| AC6 | DONE | Manual run with `--interval 10` produced 6 probes across 60 seconds | +| AC7 | DONE | CLI parser default for `--duration` is `86400` | +| AC8 | DONE | Exit-code contract verified: monitor completes with process exit code `0` when app execution is successful | +| AC9 | DONE | `linter all` passed on 2026-05-12 | +| AC10 | DONE | `cargo machete` passed on 2026-05-12 | +| AC11 | DONE | `cargo test -p torrust-tracker-client --test tracker_checker` and `cargo test -p torrust-tracker-client monitor::udp` passed on 2026-05-12 | + +### Manual Verification (Official Demo Tracker — Up) + +Executed on 2026-05-12 from workspace root against `udp://udp1.torrust-tracker-demo.com:6969/announce` (live): + +```text +cargo run -p torrust-tracker-client --bin tracker_checker -- monitor udp \ + --url udp://udp1.torrust-tracker-demo.com:6969/announce \ + --interval 10 \ + --timeout 10 \ + --duration 60 +``` + +Observed output: + +```text +{"event":"probe","sequence":1,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":208} +{"event":"probe","sequence":2,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":140} +{"event":"probe","sequence":3,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":138} +{"event":"probe","sequence":4,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":131} +{"event":"probe","sequence":5,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":145} +{"event":"probe","sequence":6,"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":"ok","elapsed_ms":141} +{"udp_trackers":[{"url":"udp://udp1.torrust-tracker-demo.com:6969/announce","status":{"code":"ok","message":"monitor completed","stats":{"total":6,"timeouts":0,"timeout_percent":0,"min_ms":131,"max_ms":208,"average_ms":150,"last_ms":141}}}]} +``` + +Notes: + +- Initial attempt without package selection from workspace root (`cargo run --bin tracker_checker -- ...`) failed because the binary belongs to package `torrust-tracker-client`. +- Corrected command above resolves that issue. + +### Manual Verification (Old Demo Tracker — Down) + +Executed on 2026-05-12 from workspace root against `udp://tracker.torrust-demo.com:6969/announce` +(confirmed down by [newtrackon](https://newtrackon.com)): + +```text +cargo run -p torrust-tracker-client --bin tracker_checker -- monitor udp \ + --url udp://tracker.torrust-demo.com:6969/announce \ + --interval 10 \ + --timeout 10 \ + --duration 60 +``` + +Observed output: + +```text +{"event":"probe","sequence":1,"url":"udp://tracker.torrust-demo.com:6969/announce","status":"timeout","elapsed_ms":null} +{"event":"probe","sequence":2,"url":"udp://tracker.torrust-demo.com:6969/announce","status":"timeout","elapsed_ms":null} +{"event":"probe","sequence":3,"url":"udp://tracker.torrust-demo.com:6969/announce","status":"timeout","elapsed_ms":null} +{"udp_trackers":[{"url":"udp://tracker.torrust-demo.com:6969/announce","status":{"code":"ok","message":"monitor completed","stats":{"total":3,"timeouts":3,"timeout_percent":100,"min_ms":null,"max_ms":null,"average_ms":null,"last_ms":null}}}]} +``` + +Notes: + +- All 3 probes timed out within the 60-second window (each probe consumed its full 10 s timeout, + so only 3 probes fit in 60 s), confirming the tracker is unreachable. +- Latency fields (`min_ms`, `max_ms`, `average_ms`, `last_ms`) are all `null` when every probe + times out, matching the agreed design decision. +- `timeout_percent` is `100` (integer), and `status.code` remains `"ok"` because the monitor + itself ran to completion — timeout-heavy runs do not set a non-zero exit code. ## Risks and Trade-offs @@ -216,16 +283,29 @@ when the `monitor` subcommand is selected. - **UDP announcement contents**: The monitor sends a real announce request. The info-hash and peer fields will be test values (re-using the existing `QueryBuilder::with_default_values` defaults unless overridden). This is acceptable for monitoring purposes. +- **`timeout_percent` denominator includes error probes**: `timeout_percent` is computed as + `timeouts × 100 / total`, where `total = successes + timeouts + errors`. A probe that fails + with a non-timeout error (e.g., a DNS failure or connection refused) counts toward `total` + without being counted as a timeout. This reduces `timeout_percent` without the probe being a + success, which can be surprising. The name `timeout_percent` is intentionally scoped to + timeouts; errors are a separate failure mode tracked only implicitly through `total`. +- **`elapsed_ms` excludes DNS resolution time**: Probe timing starts after `resolve_socket_addr` + succeeds, so `elapsed_ms` measures UDP connect + announce network work only. DNS lookup + failures are reported as probe errors with `elapsed_ms: null`. +- **Success-path integration test deferral**: A full mock-UDP-tracker success-path integration + test is intentionally deferred until the tracker-client is moved into its own repository. + Implementing that heavier harness now in the monorepo would likely be duplicated effort; it is + planned as follow-up work in the new tracker-client repository. ## Progress Tracking ### Workflow Checkpoints -- [ ] Spec drafted in `docs/issues/open/` -- [ ] Spec reviewed and approved by user/maintainer -- [ ] Implementation completed -- [ ] Reviewer validated acceptance criteria and updated checkboxes -- [ ] Committer verified spec progress is up to date before commit +- [x] Spec drafted in `docs/issues/open/` +- [x] Spec reviewed and approved by user/maintainer +- [x] Implementation completed +- [x] Reviewer validated acceptance criteria and updated checkboxes +- [x] Committer verified spec progress is up to date before commit - [ ] Issue closed and spec moved from `docs/issues/open/` to `docs/issues/closed/` ### Progress Log @@ -233,6 +313,12 @@ when the `monitor` subcommand is selected. - 2026-05-11 20:00 UTC - Agent - Spec created from GitHub issue #1178 content - 2026-05-12 00:00 UTC - Agent - Incorporated maintainer decisions: monitor in tracker_checker, seconds unit, UDP-only scope, duration-controlled run, stderr live output plus final JSON on stdout - 2026-05-12 08:00 UTC - Agent - Incorporated answered follow-ups: default duration `86400`, align final JSON with checker shape, keep exit code `0` for timeout-heavy but successful runs +- 2026-05-12 09:30 UTC - Maintainer + Agent - Confirmed command remains a `tracker_checker` subcommand, documented future binary consolidation context, and confirmed `null` latency fields when all probes timeout +- 2026-05-12 10:00 UTC - Maintainer + Agent - Finalized elapsed-time precision: `elapsed_ms` uses integer milliseconds (`u64`) with truncation +- 2026-05-12 16:55 UTC - Agent - Performed 60-second manual verification against `udp://udp1.torrust-tracker-demo.com:6969/announce`, captured command/output in spec, and corrected workspace-root command invocation to include `-p torrust-tracker-client` +- 2026-05-12 17:10 UTC - Agent - Performed 60-second manual verification against `udp://tracker.torrust-demo.com:6969/announce` (confirmed down); all 3 probes timed out, null latency fields and `timeout_percent: 100` observed as designed +- 2026-05-12 17:40 UTC - Agent - Updated probe timing to start after address resolution so `elapsed_ms` excludes DNS lookup time; documented behavior in Risks and Trade-offs +- 2026-05-12 17:45 UTC - Maintainer + Agent - Deferred success-path mock UDP integration test until planned tracker-client repository split to avoid duplicate harness work ## Open Questions diff --git a/docs/refactor-plans/closed/1178-monitor-udp-post-implementation-improvements.md b/docs/refactor-plans/closed/1178-monitor-udp-post-implementation-improvements.md new file mode 100644 index 000000000..18c9d43f0 --- /dev/null +++ b/docs/refactor-plans/closed/1178-monitor-udp-post-implementation-improvements.md @@ -0,0 +1,255 @@ +# Refactor Plan — Issue #1178 Monitor UDP: Post-Implementation Improvements + +## Goal + +Address quality gaps identified after the initial implementation of the `monitor udp` subcommand +(issue #1178). Items are ordered from **highest impact / lowest effort** to **lowest impact / +highest effort** so they can be tackled incrementally. + +Related issue spec: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +## Items + +### 1. [x] Fix stale `timeout_percent` sample value in spec [HIGH impact / TRIVIAL effort] + +**Problem**: The "Sample Output" section in the issue spec shows `"timeout_percent":33.3` (a +float). The implementation produces `33` (integer `u64`). Any reader using the spec as a +reference for the output contract will be misled. + +**Files**: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: Replace `33.3` → `33` in the sample output block. + +--- + +### 2. [x] Add `--info-hash` to the Options table in the spec [HIGH impact / TRIVIAL effort] + +**Problem**: The implementation exposes `--info-hash` with a sensible default, but the spec's +CLI Options table omits it. A user reading the spec will not know the option exists. + +**Files**: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: Add a row for `--info-hash` (default `9c38422213e30bff212b30c360d26f9a02136422`, +description "Info-hash used in announce requests"). + +--- + +### 3. [x] Tick completed Goals and Workflow Checkpoints in the spec [HIGH impact / TRIVIAL effort] + +**Problem**: Implementation is complete, manually verified, and committed, but both the `Goals` +checklist and the `Workflow Checkpoints` list still show unchecked `[ ]` items. They look like +open work to any reader. + +**Files**: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: Mark all completed goals and checkpoints as `[x]`. + +--- + +### 4. [x] Add a unit test asserting all-null latency fields when every probe times out [HIGH impact / LOW effort] + +**Problem**: The "down tracker" scenario (every probe times out → `min_ms`, `max_ms`, +`average_ms`, `last_ms` all `null`) is the most important correctness property of the stats +struct, but it has no dedicated test. It is only validated by a manual run against a live tracker. + +**Files**: `console/tracker-client/src/console/clients/checker/monitor/udp.rs` + +**Change**: Add a unit test in the existing `#[cfg(test)]` block that: + +1. Creates a `Stats` with only `record_timeout()` calls. +2. Asserts `min_ms`, `max_ms`, `average_ms()`, and `last_ms` are all `None`. +3. Asserts `timeout_percent()` returns `100`. + +--- + +### 5. [x] Document that the integration test exercises only the timeout path [HIGH impact / LOW effort] + +**Problem**: `spawn_udp_sink()` silently discards UDP packets without ever sending a valid +`ConnectResponse`. Every probe in the integration test therefore times out. The test validates +JSON shape and exit code but not a successful probe event. This is non-obvious and could mask +regressions in the success path. + +**Files**: `console/tracker-client/tests/tracker_checker.rs` + +**Change**: Add a doc comment on the `monitor_udp` test module explaining that the UDP sink +intentionally produces timeouts, and note that a success-path integration test requires a proper +mock tracker responding to the UDP protocol (tracked as a follow-up). + +--- + +### 6. [x] Correct Task 6 file reference in the Implementation Plan [MEDIUM impact / TRIVIAL effort] + +**Problem**: Implementation Plan Task 6 says "Update +`console/tracker-client/src/bin/tracker_checker.rs`", but the actual dispatch was added to +`console/tracker-client/src/console/clients/checker/app.rs`. A future contributor tracing a +regression will look in the wrong file. + +**Files**: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: Correct the file path in Task 6 to reference `app.rs`. + +--- + +### 7. [x] Document `last_ms: null` on timeout in AC3 [MEDIUM impact / LOW effort] + +**Problem**: AC3 states that timed-out probes are "excluded from response-time averages" but +does not mention that `last_ms` also becomes `null` when a probe times out. This is a separate, +non-obvious contract detail buried only in the manual verification notes. + +**Files**: `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: Update the AC3 description to explicitly state that `last_ms` is set to `null` when +the most recent probe times out. + +--- + +### 8. [x] Document the double duration-check intent in `run_monitor` [MEDIUM impact / LOW effort] + +**Problem**: `run_monitor` contains two `if started_at.elapsed() >= config.duration { break; }` +guards — one before the probe and one before the sleep. This is intentional (avoids sleeping +after the last probe) but reads like an accidental duplication and will confuse reviewers. + +**Files**: `console/tracker-client/src/console/clients/checker/monitor/udp.rs` + +**Change**: Add inline comments on each guard explaining its distinct purpose: + +- First guard: "exit before starting a new probe if the budget is exhausted" +- Second guard: "exit before sleeping if duration elapsed during the probe itself" + +--- + +### 9. [x] Document `u64::MAX` fallback for `elapsed_ms` [MEDIUM impact / LOW effort] + +**Problem**: + +```rust +let elapsed_ms = u64::try_from(probe_started.elapsed().as_millis()).unwrap_or(u64::MAX); +``` + +`u64::MAX` as a fallback would make a conversion-overflow probe appear as ~584 million years of +latency. Since `as_millis()` returns `u128`, overflow could only occur if a single probe ran for +over 584 million years (impossible in practice), but the fallback is still an incorrect sentinel +in principle — no reader will understand it without a comment. + +**Files**: `console/tracker-client/src/console/clients/checker/monitor/udp.rs` + +**Change**: Add a comment explaining why overflow is unreachable in practice and that `u64::MAX` +is a placeholder that cannot realistically occur. + +--- + +### 10. [x] Document that `timeout_percent` denominator includes error probes [MEDIUM impact / LOW effort] + +**Problem**: `timeout_percent = timeouts × 100 / total`, where +`total = successes + timeouts + errors`. A probe that errors (not timeout) reduces the percentage +without being a success. The name `timeout_percent` implies "fraction of probes that timed out" +but errors silently dilute the denominator. This behaviour is not documented anywhere in the +spec or code. + +**Files**: + +- `console/tracker-client/src/console/clients/checker/monitor/udp.rs` +- `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Change**: + +- Add a doc comment on `timeout_percent()` explaining the denominator includes errors. +- Add a note in the spec's Risks and Trade-offs section. + +--- + +### 11. [x] Document that `elapsed_ms` includes DNS resolution time [MEDIUM impact / MEDIUM effort] + +**Problem**: The `probe_started` timer is captured before `resolve_socket_addr()`. For trackers +with non-trivial DNS lookup times, the reported latency includes DNS resolution, not just +network round-trip time. This deviates from what most users expect "announce response time" to +mean. + +**Files**: + +- `console/tracker-client/src/console/clients/checker/monitor/udp.rs` +- `docs/issues/open/1178-tracker-checker-udp-add-monitor-uptime-command.md` + +**Options** (choose one): + +- **Document only**: Add a comment in code and a note in the spec explaining what is measured. +- **Fix timing**: Move `probe_started` to after `resolve_socket_addr()` — DNS time is then + excluded from latency. Note that this changes the reported metric. + +--- + +### 12. [x] Extract `run_probe_loop` from `run_monitor` [LOW impact / MEDIUM effort] + +**Problem**: `run_monitor` is ~90 lines handling multiple concerns: the probe loop, signal +handling, sleep, outcome dispatch, stats recording, event emission, and final JSON output. This +makes each piece harder to read and impossible to test independently. + +**Files**: `console/tracker-client/src/console/clients/checker/monitor/udp.rs` + +**Change**: Extract a private `async fn run_probe_loop(config: &MonitorUdpConfig) -> (Stats, bool /* interrupted */)` that: + +1. Runs the loop. +2. Returns final stats and the interrupted flag. + +`run_monitor` then calls `run_probe_loop`, formats, and prints. This makes the loop logic unit- +testable without spawning a subprocess. + +--- + +### 13. [x] Implement `From<&Stats> for MonitorStats` [LOW impact / LOW effort] + +**Problem**: The conversion from `Stats` to `MonitorStats` is an inline struct literal embedded +inside the already-long `run_monitor` function. A `From` implementation would express the +intent clearly and clean up `run_monitor`. + +**Files**: `console/tracker-client/src/console/clients/checker/monitor/udp.rs` + +**Change**: Add `impl From<&Stats> for MonitorStats` and replace the inline literal with +`MonitorStats::from(&stats)`. + +--- + +### 14. [x] Add a success-path integration test using a mock UDP tracker [DEFERRED] + +**Problem**: The only integration test uses a UDP sink that never responds, so the success path +(probe receives a valid `AnnounceResponse`, `elapsed_ms` is Some, latency stats are populated) +is never exercised at the integration level. + +**Files**: `console/tracker-client/tests/tracker_checker.rs` + +**Change**: Implement a minimal mock UDP tracker in the test helper that: + +1. Binds a UDP socket. +2. Responds to a `ConnectRequest` with a valid `ConnectResponse`. +3. Responds to an `AnnounceRequest` with a valid `AnnounceResponse`. + +Then add a test asserting that `elapsed_ms` is non-null, `status` is `"ok"`, and `stats.total`, +`stats.successes`, `min_ms`, `max_ms`, `average_ms`, and `last_ms` are all populated. + +This is the highest-confidence validation of the happy path and closes the gap left by item 5. + +**Deferral decision (2026-05-12)**: Deferred on purpose. The tracker client is planned to move to +its own repository shortly; implementing this heavier integration harness in the current monorepo +would likely be duplicated effort. The success-path integration/e2e test will be implemented in +the future tracker-client repository once the move is completed. + +--- + +## Order of Execution + +| Order | Status | Item | Impact | Effort | +| ----- | ------ | ------------------------------------------------------------------------------------------- | ------ | ------- | +| 1 | [x] | Fix stale `timeout_percent` sample value | High | Trivial | +| 2 | [x] | Add `--info-hash` to Options table | High | Trivial | +| 3 | [x] | Tick completed Goals and Checkpoints | High | Trivial | +| 4 | [x] | Unit test: all-null latency on all-timeouts | High | Low | +| 5 | [x] | Document integration test exercises timeout path only | High | Low | +| 6 | [x] | Correct Task 6 file reference | Medium | Trivial | +| 7 | [x] | Document `last_ms: null` on timeout in AC3 | Medium | Low | +| 8 | [x] | Document double duration-check intent | Medium | Low | +| 9 | [x] | Document `u64::MAX` fallback | Medium | Low | +| 10 | [x] | Document `timeout_percent` denominator includes errors | Medium | Low | +| 11 | [x] | Document / fix `elapsed_ms` includes DNS time | Medium | Medium | +| 12 | [x] | Extract `run_probe_loop` from `run_monitor` | Low | Medium | +| 13 | [x] | `From<&Stats> for MonitorStats` | Low | Low | +| 14 | [x] | Success-path integration test with mock UDP tracker (deferred to tracker-client repo split) | Low | High | diff --git a/docs/refactor-plans/closed/README.md b/docs/refactor-plans/closed/README.md new file mode 100644 index 000000000..ec9366ab3 --- /dev/null +++ b/docs/refactor-plans/closed/README.md @@ -0,0 +1,15 @@ +# Closed Refactor Plans + +This folder holds refactor plans where all items have been completed. Plans are kept here +temporarily as a reference while adjacent work is still in progress. + +## Lifecycle + +1. **All items done** → plan moves from `docs/refactor-plans/open/` to here. +2. **Buffer period** → file lives here while it may still be referenced by active work. +3. **Cleanup** → once no longer referenced, the file is deleted. + +## Related Skills + +- Create a refactor plan: + [`.github/skills/dev/planning/create-refactor-plan/SKILL.md`](../../../.github/skills/dev/planning/create-refactor-plan/SKILL.md) diff --git a/docs/refactor-plans/agent-docs-refactor-plan.md b/docs/refactor-plans/closed/agent-docs-refactor-plan.md similarity index 100% rename from docs/refactor-plans/agent-docs-refactor-plan.md rename to docs/refactor-plans/closed/agent-docs-refactor-plan.md diff --git a/docs/refactor-plans/drafts/README.md b/docs/refactor-plans/drafts/README.md new file mode 100644 index 000000000..d8eeebdea --- /dev/null +++ b/docs/refactor-plans/drafts/README.md @@ -0,0 +1,16 @@ +# Draft Refactor Plans + +This folder contains refactor plan drafts that are being written or awaiting review before +implementation begins. + +## Lifecycle + +1. Create a new plan file here using the template at + [`docs/templates/REFACTOR-PLAN.md`](../../templates/REFACTOR-PLAN.md). +2. Review the plan. +3. When implementation is ready to start, move the plan to `docs/refactor-plans/open/`. + +## Related Skills + +- Create a refactor plan: + [`.github/skills/dev/planning/create-refactor-plan/SKILL.md`](../../../.github/skills/dev/planning/create-refactor-plan/SKILL.md) diff --git a/docs/refactor-plans/open/README.md b/docs/refactor-plans/open/README.md new file mode 100644 index 000000000..bf0eb1f09 --- /dev/null +++ b/docs/refactor-plans/open/README.md @@ -0,0 +1,15 @@ +# Open Refactor Plans + +This folder contains refactor plans that are actively being worked through. + +## Lifecycle + +1. Draft a plan in `docs/refactor-plans/drafts/`. +2. When implementation starts, move the plan here. +3. Tick checkboxes as each item is completed. +4. When all items are done, move the plan to `docs/refactor-plans/closed/`. + +## Related Skills + +- Create a refactor plan: + [`.github/skills/dev/planning/create-refactor-plan/SKILL.md`](../../../.github/skills/dev/planning/create-refactor-plan/SKILL.md) diff --git a/docs/templates/REFACTOR-PLAN.md b/docs/templates/REFACTOR-PLAN.md new file mode 100644 index 000000000..78c518aa6 --- /dev/null +++ b/docs/templates/REFACTOR-PLAN.md @@ -0,0 +1,64 @@ +--- +doc-type: refactor-plan +status: draft +related-issue: null +spec-path: docs/refactor-plans/drafts/{short-description}.md +last-updated-utc: YYYY-MM-DD HH:MM +semantic-links: + skill-links: + - create-refactor-plan + related-artifacts: + - .github/skills/dev/planning/create-refactor-plan/SKILL.md +--- + + + +# Refactor Plan — {Title} + +## Goal + +State in one or two sentences what the refactor achieves and why it is worthwhile. +Focus on the quality property improved (readability, testability, maintainability, etc.). + +Related artifact: `{path/to/related/file-or-issue-spec.md}` + +## Items + + + +### 1. [ ] {Short title} [{IMPACT} impact / {EFFORT} effort] + +**Problem**: Describe the current state and why it is a problem. Be specific — name +files, line numbers, or function names where relevant. + +**Files**: + +- `{path/to/file.rs}` + +**Change**: Describe exactly what needs to change. Prefer concrete before/after +examples over abstract descriptions. + +--- + +### 2. [ ] {Short title} [{IMPACT} impact / {EFFORT} effort] + +**Problem**: ... + +**Files**: + +- `{path/to/file.rs}` + +**Change**: ... + +--- + +## Order of Execution + +| Order | Status | Item | Impact | Effort | +| ----- | ------ | --------------------- | ------ | ------- | +| 1 | [ ] | {Short title of item} | High | Trivial | +| 2 | [ ] | {Short title of item} | Medium | Low | + + + diff --git a/project-words.txt b/project-words.txt index fbc6da1f3..1c8bd2307 100644 --- a/project-words.txt +++ b/project-words.txt @@ -157,6 +157,7 @@ matchmakes Mebibytes metainfo middlewares +millis misresolved mmap mmdb