v1.5.1: optimise network node status api by abhishek9686 · Pull Request #4076 · gravitl/netmaker

abhishek9686 · 2026-06-29T05:34:57Z

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide testing steps

Checklist before requesting a review

My changes affect only 10 files or less.
I have performed a self-review of my code and tested it.
If it is a new feature, I have added thorough tests, my code is <= 1450 lines.
If it is a bugfix, my code is <= 200 lines.
My functions are <= 80 lines.
I have had my code reviewed by a peer.
My unit tests pass locally.
Netmaker is awesome.

tenki-reviewer · 2026-06-29T05:35:02Z

Review Complete

Files Reviewed: 4
Findings: 2

By Severity:

🟠 High: 2

Two high-severity bugs: GetMetricsForNodeIDs silently drops cache misses without DB fallback, causing degraded node status computation; and peerSummaryStatus always returns OnlineSt when metrics are absent, defeating peer-connectivity filtering.

Files Reviewed (4 files)

logic/nodes.go
pro/initialize.go
pro/logic/metrics.go
pro/logic/status.go

tenki-reviewer

Risk: 🟠 High (78/100) — 2 high findings · 295 LOC across 4 files

Summary

PR #4076 modifies node status and metrics logic across 4 files in the Netmaker mesh VPN platform. Two high-severity bugs were identified that directly impact network node health assessment.

Finding-001: Cache miss black hole in `pro/logic/metrics.go`

The batch function GetMetricsForNodeIDs (lines 92-98) returns immediately after the cache-read loop without a DB fallback for cache misses. This diverges from the single-node GetMetrics function (lines 125-130) which correctly falls through to database.FetchRecord. Impact: nodes with valid DB metrics but absent from cache receive nil metrics, causing skipped peer-connectivity analysis and incorrect static-node status (UnKnown).

Finding-002: Dead return path in `pro/logic/status.go`

The function peerSummaryStatus (lines 188-193) has both branches of the if-else returning OnlineSt unconditionally, making the time.Since check dead code. Peers without metrics always appear healthy, defeating the filtering in checkPeerConnectivityWithContext (lines 243-246) that should exclude Error/Warning peers from the unconnected count — inflating peerNotConnectedCnt and producing false Warning/Error status.

Risk Assessment

Both bugs degrade the core node-status pipeline: one silently drops data, the other produces uniformly optimistic results. Together they can mask real connectivity problems in production meshes. Both are fixable with targeted one-line or small-block changes.

tenki-reviewer · 2026-06-29T05:44:42Z

+	if servercfg.CacheEnabled() {
+		for id := range want {
+			if m, ok := getMetricsFromCache(id); ok {
+				result[id] = metricsReadCopy(&m)
+			}
+		}
+		return result


🟠 GetMetricsForNodeIDs cache path silently drops cache misses without DB fallback — diverges from GetMetrics and causes degraded status computation (bug)

In pro/logic/metrics.go, the batch function GetMetricsForNodeIDs (lines 82-120) when cache is enabled (lines 92-98) iterates the wanted IDs, looks them up in the in-memory cache, and returns immediately — with no fallback to the database for IDs not found in cache. By contrast, the single-node GetMetrics function (lines 122-146, specifically lines 125-130) correctly falls through to database.FetchRecord on cache miss. This inconsistency has two concrete impacts: (1) Nodes whose metrics exist in the DB but are absent from cache (e.g., startup races, cache eviction, newly created nodes) receive nil metrics, causing computeNodeStatusWithContext to skip peer-connectivity analysis and fall back to simple check-in heuristics. (2) Static nodes whose ingress gateway metrics are missing from cache get UnKnown status instead of the correct online/offline determination. Additionally, the bulk status API path and individual GetNodeStatus calls may return different status values for the same node when cache is enabled but has missing entries.

💡 Suggestion: After the cache-read loop (line 97), collect IDs not found in cache and query them via GetMetrics (which itself does cache→DB fallback) or batch-fetch from DB, then merge into the result map before returning.

📋 Prompt for AI Agents

In pro/logic/metrics.go at lines 92-98, after the cache-read loop, collect any wanted IDs that were not found in the cache. For those missed IDs, call GetMetrics(id) (which has its own cache→DB fallback) or perform a batch DB query. Merge the results into the result map before returning, ensuring the batch function is semantically equivalent to calling GetMetrics for each ID individually. This aligns the behavior with the single-node GetMetrics function at lines 122-146.

tenki-reviewer · 2026-06-29T05:44:42Z

+func peerSummaryStatus(node models.Node, metrics *models.Metrics, ctx *networkStatusContext) models.NodeStatus {
+	if metrics == nil || metrics.Connectivity == nil {
+		if time.Since(node.LastCheckIn) < models.LastCheckInThreshold {
+			return models.OnlineSt
+		}
+		return models.OnlineSt


🟠 peerSummaryStatus always returns OnlineSt when metrics are absent; both branches identical (bug)

In pro/logic/status.go, the peerSummaryStatus function (lines 188-222) contains a logic error at lines 189-193. When metrics == nil || metrics.Connectivity == nil, both the if-branch and the fallback return models.OnlineSt. This means peers without metrics always appear healthy, defeating the filtering logic in checkPeerConnectivityWithContext (line 243-246) that is meant to skip peers with Error/Warning status when counting unconnected peers.

💡 Suggestion: Change line 193 from return models.OnlineSt to return models.OfflineSt so that peers without metrics who have exceeded the check-in threshold are correctly reported as offline rather than online.

📋 Prompt for AI Agents

In pro/logic/status.go at line 193, change return models.OnlineSt to return models.OfflineSt. This fixes a logic bug in peerSummaryStatus where the fallback branch incorrectly returns OnlineSt instead of OfflineSt when a node's metrics are absent and the check-in threshold has been exceeded.

v1.5.1: optimise network node status api

bc04864

tenki-reviewer Bot reviewed Jun 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v1.5.1: optimise network node status api#4076

v1.5.1: optimise network node status api#4076
abhishek9686 wants to merge 1 commit into
release-v1.5.1from
v1.5.1-optimise-node-status

abhishek9686 commented Jun 29, 2026

Uh oh!

tenki-reviewer Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

tenki-reviewer Bot left a comment

Uh oh!

tenki-reviewer Bot Jun 29, 2026

Uh oh!

tenki-reviewer Bot Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

abhishek9686 commented Jun 29, 2026

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide testing steps

Checklist before requesting a review

Uh oh!

tenki-reviewer Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tenki-reviewer Bot left a comment

Choose a reason for hiding this comment

Summary

Finding-001: Cache miss black hole in pro/logic/metrics.go

Finding-002: Dead return path in pro/logic/status.go

Risk Assessment

Uh oh!

tenki-reviewer Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

tenki-reviewer Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tenki-reviewer Bot commented Jun 29, 2026 •

edited

Loading

Finding-001: Cache miss black hole in `pro/logic/metrics.go`

Finding-002: Dead return path in `pro/logic/status.go`