Commit 2f46e53

docs: add a infra reference
1 parent f90fcf1 commit 2f46e53

1 file changed

+388
-0
lines changed

k3s/docs/infrastructure.md

Lines changed: 388 additions & 0 deletions
@@ -0,0 +1,388 @@
# K3s Infrastructure Documentation

> Last updated: 2025-11-27

## Overview

Self-hosted k3s cluster on DigitalOcean for internal tools (Appsmith, Outline).

| Component | Value |
|-----------|-------|
| Region | NYC3 |
| Nodes | 3 (HA control plane) |
| K3s Version | v1.33.6+k3s1 |
| Container Runtime | containerd 2.1.5 |
| OS | Ubuntu 24.04.3 LTS |

---

## DigitalOcean Resources

### VPC

| Property | Value |
|----------|-------|
| Name | `ops-vpc-tools-k3s-nyc3` |
| ID | Get via: `doctl vpcs list \| grep k3s` |
| IP Range | `10.108.0.0/20` |
| Region | nyc3 |

### Droplets

| Name | Private IP | Specs |
|------|-----------:|-------|
| ops-vm-tools-k3s-nyc3-01 | 10.108.0.4 | 4 vCPU, 8GB RAM, 160GB disk |
| ops-vm-tools-k3s-nyc3-02 | 10.108.0.5 | 4 vCPU, 8GB RAM, 160GB disk |
| ops-vm-tools-k3s-nyc3-03 | 10.108.0.6 | 4 vCPU, 8GB RAM, 160GB disk |

All tagged: `tools-k3s`

### Load Balancer

| Property | Value |
|----------|-------|
| Name | `ops-lb-tools-k3s-nyc3-01` |
| IP | Get via: `doctl compute load-balancer list \| grep k3s` |
| VPC | `ops-vpc-tools-k3s-nyc3` |
| Target Droplets | All 3 k3s nodes |

**Forwarding Rules:**

| Entry Protocol | Entry Port | Target Protocol | Target Port | TLS |
|----------------|------------|-----------------|-------------|-----|
| HTTP | 80 | HTTP | 30080 | - |
| HTTPS | 443 | HTTPS | 30443 | Passthrough |

**Health Check:**
- Protocol: TCP
- Port: 30443
- Interval: 10s
- Timeout: 5s
- Healthy threshold: 5
- Unhealthy threshold: 3

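
As a rough sketch of what the health check thresholds above imply: assuming the load balancer needs that many consecutive results spaced one interval apart (this timing model is an assumption, not something stated in this document), the time to change a node's state is about interval × threshold. The function name `lb_state_change_seconds` is hypothetical.

```bash
# Estimate how long the load balancer takes to flip a node's state:
# consecutive checks required x seconds between checks.
lb_state_change_seconds() {
  local interval_s="$1" threshold="$2"
  echo $(( interval_s * threshold ))
}

lb_state_change_seconds 10 3   # unhealthy threshold: prints 30
lb_state_change_seconds 10 5   # healthy threshold: prints 50
```

Under that assumption, a failing node would stop receiving traffic after roughly 30s, and a recovered node would rejoin after roughly 50s of passing checks.
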
### Firewall

| Property | Value |
|----------|-------|
| Name | `tools-fw-nyc3` |
| ID | Get via: `doctl compute firewall list \| grep tools` |

**Inbound Rules:**

| Protocol | Ports | Source |
|----------|-------|--------|
| ICMP | - | VPC (10.108.0.0/20) |
| TCP | All | VPC (10.108.0.0/20) |
| UDP | All | VPC (10.108.0.0/20) |
| TCP | 22 | 0.0.0.0/0 (SSH) |
| TCP | 30080 | Load Balancer only |
| TCP | 30443 | Load Balancer only |

**Outbound Rules:** All traffic allowed (TCP/UDP/ICMP to 0.0.0.0/0)

---

## Kubernetes Cluster

### Control Plane

All 3 nodes are control-plane/etcd/master (HA configuration):

```
┌─────────────────────────────────────────────────────────────┐
│                       K3s HA Cluster                        │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Node 01         │ Node 02         │ Node 03                 │
│ 10.108.0.4      │ 10.108.0.5      │ 10.108.0.6              │
├─────────────────┼─────────────────┼─────────────────────────┤
│ control-plane   │ control-plane   │ control-plane           │
│ etcd            │ etcd            │ etcd                    │
│ master          │ master          │ master                  │
├─────────────────┼─────────────────┼─────────────────────────┤
│ coredns         │ traefik         │ appsmith                │
│ metrics-server  │ longhorn        │ longhorn                │
│ longhorn        │                 │                         │
├─────────────────┴─────────────────┴─────────────────────────┤
│          Longhorn Replicated Storage (2 replicas)           │
└─────────────────────────────────────────────────────────────┘
```
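
To make the HA claim concrete: etcd needs a majority of members (quorum) to accept writes, so a 3-member cluster keeps working with one node down. The sketch below only does that arithmetic; the function names are hypothetical helpers, not part of k3s or etcd tooling.

```bash
# Quorum for an etcd cluster of N members is floor(N/2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster still has quorum.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

etcd_quorum 3            # prints 2
etcd_fault_tolerance 3   # prints 1
```
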

### API Server Access

```yaml
server: https://ops-vm-tools-k3s-nyc3-01:6443
```

Kubeconfig uses hostname resolution (likely via `/etc/hosts` or Tailscale).

### Resource Usage (as of inspection)

| Node | CPU | Memory |
|------|-----|--------|
| node-01 | 92m (2%) | 1497Mi (18%) |
| node-02 | 78m (1%) | 817Mi (10%) |
| node-03 | 58m (1%) | 1978Mi (24%) |

**Total Capacity per Node:** 4 vCPU, 8GB RAM

---

## Networking

### Traffic Flow

```
              Internet
                 │
                 ▼
┌──────────────────────────────────┐
│  DO Load Balancer                │
│  - HTTP:80 → NodePort:30080      │
│  - HTTPS:443 → NodePort:30443    │
└──────────────────────────────────┘
                 │
                 ▼ (VPC: 10.108.0.0/20)
┌──────────────────────────────────┐
│  Firewall (tools-fw-nyc3)        │
│  - Only LB can reach 30080/30443 │
│  - SSH open (consider limiting)  │
└──────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Traefik (NodePort Service)      │
│  - 30080 → web (HTTP)            │
│  - 30443 → websecure (HTTPS)     │
└──────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Gateway API                     │
│  - GatewayClass: traefik         │
│  - Gateway per namespace         │
│  - HTTPRoutes for routing        │
└──────────────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Application Services (ClusterIP)│
│  - appsmith:80                   │
└──────────────────────────────────┘
```
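
The last two hops in the flow above (Gateway to HTTPRoute to ClusterIP Service) can be sketched as a manifest. This is a minimal illustration assembled from the names listed in this document (`appsmith-gateway`, `appsmith-route`, `appsmith:80`); the manifests actually deployed may differ, and `render_httproute` is a hypothetical helper.

```bash
# Print a minimal Gateway API HTTPRoute that attaches to a Gateway
# and forwards one hostname to a ClusterIP Service.
# Args: namespace, route name, gateway name, hostname, service, port
render_httproute() {
  local ns="$1" route="$2" gateway="$3" host="$4" svc="$5" port="$6"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${route}
  namespace: ${ns}
spec:
  parentRefs:
    - name: ${gateway}
  hostnames:
    - ${host}
  rules:
    - backendRefs:
        - name: ${svc}
          port: ${port}
EOF
}

render_httproute appsmith appsmith-route appsmith-gateway \
  appsmith.freecodecamp.net appsmith 80
```

Piping the output to `kubectl apply --dry-run=client -f -` checks the manifest without changing the cluster.
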

### Traefik Configuration

Located at: `cluster/charts/traefik/values.yaml`

Key settings:
- **Service Type:** NodePort (for DO LB compatibility)
- **NodePorts:** 30080 (HTTP), 30443 (HTTPS)
- **Gateway API:** Enabled
- **TLS Passthrough:** Yes (the load balancer forwards TLS untouched; it terminates at each app's Gateway)
- **Access Logs:** Enabled

### Pod Network

- CIDR: `10.42.0.0/16` (default k3s)
- Service CIDR: `10.43.0.0/16`
- DNS: CoreDNS at `10.43.0.10`

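
A small sanity check of the ranges above, as a sketch: it confirms that the CoreDNS address sits inside the service CIDR and that a node IP sits inside the VPC range, and prints how many addresses each block holds. The helper names (`ip_to_int`, `cidr_contains`, `cidr_size`) are hypothetical and IPv4-only.

```bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if the IP ($2) falls inside the CIDR block ($1).
cidr_contains() {
  local net="${1%/*}" prefix="${1#*/}" ip="$2"
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$net") & mask )) -eq $(( $(ip_to_int "$ip") & mask )) ]
}

# Number of addresses in a CIDR block: 2^(32 - prefix length).
cidr_size() {
  echo $(( 1 << (32 - ${1#*/}) ))
}

cidr_contains 10.43.0.0/16 10.43.0.10 && echo "CoreDNS is inside the service CIDR"
cidr_contains 10.108.0.0/20 10.108.0.6 && echo "node-03 is inside the VPC range"
cidr_size 10.42.0.0/16    # pod CIDR: prints 65536
cidr_size 10.108.0.0/20   # VPC range: prints 4096
```
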
---

## Storage

### Longhorn (Primary)

Distributed block storage with cross-node replication.

| Property | Value |
|----------|-------|
| Version | v1.10.1 |
| Provisioner | `driver.longhorn.io` |
| Default Replicas | 2 (survives 1 node failure) |
| Data Path | `/var/lib/longhorn/` |
| Config | `cluster/charts/longhorn/values.yaml` |

**Failover Tested:** After a node failure, the pod reschedules to a healthy node, mounts the surviving replica, and continues working.

```bash
# Longhorn UI (port-forward)
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80

# Check volumes
kubectl get volumes.longhorn.io -n longhorn-system
kubectl get replicas.longhorn.io -n longhorn-system -o wide
```
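
As a sketch of how a workload would request one of these replicated volumes: the claim has to name the `longhorn` storage class explicitly, because `local-path` is the cluster default. The helper `render_pvc` and the claim name `appsmith-data` are hypothetical; only the class name and the 10Gi size come from this document.

```bash
# Print a PersistentVolumeClaim bound to the longhorn StorageClass.
# Args: namespace, claim name, requested size
render_pvc() {
  local ns="$1" name="$2" size="$3"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
EOF
}

# "appsmith-data" is an illustrative name, not the real claim.
render_pvc appsmith appsmith-data 10Gi
```

With the default of 2 replicas, a 10Gi claim like this would occupy about 20Gi of raw disk across two nodes.
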

### Local Path (Legacy)

Still available for non-critical workloads. Single-node, no replication.

| Property | Value |
|----------|-------|
| Provisioner | `rancher.io/local-path` |
| Storage Path | `/var/lib/rancher/k3s/storage/` |

### Storage Classes

| Name | Provisioner | Replicas | Use For |
|------|-------------|----------|---------|
| `longhorn` | driver.longhorn.io | 2 | Databases, stateful apps |
| `local-path` (default) | rancher.io/local-path | 1 | Ephemeral, non-critical |

---

## Installed Components

### System (kube-system)

| Component | Purpose |
|-----------|---------|
| CoreDNS | Cluster DNS |
| Traefik | Ingress/Gateway controller |
| Local Path Provisioner | Legacy storage |
| Metrics Server | Resource metrics |

### Longhorn (longhorn-system)

| Component | Replicas |
|-----------|----------|
| longhorn-manager | 3 (DaemonSet) |
| longhorn-driver-deployer | 1 |
| longhorn-csi-plugin | 3 (DaemonSet) |
| longhorn-ui | 1 |
| csi-attacher/provisioner/resizer/snapshotter | 2 each |

### Gateway API CRDs

- `gatewayclasses.gateway.networking.k8s.io`
- `gateways.gateway.networking.k8s.io`
- `httproutes.gateway.networking.k8s.io`
- `grpcroutes.gateway.networking.k8s.io`
- `referencegrants.gateway.networking.k8s.io`

### Traefik CRDs

- `middlewares.traefik.io`
- `ingressroutes.traefik.io`
- `serverstransports.traefik.io`
- `tlsoptions.traefik.io`

---

## Applications

### Appsmith

| Property | Value |
|----------|-------|
| Namespace | `appsmith` |
| Domain | `appsmith.freecodecamp.net` |
| Gateway | `appsmith-gateway` |
| HTTPRoutes | `appsmith-route`, `http-redirect` |
| Storage | 10Gi PVC (longhorn, 2 replicas) |

### Outline

| Property | Value |
|----------|-------|
| Namespace | `outline` |
| Domain | `outline.freecodecamp.net` |
| Gateway | `outline-gateway` |
| HTTPRoutes | `outline-route`, `http-redirect` |
| Storage | 10Gi PostgreSQL + 10Gi data (longhorn) |
| Auth | Google OAuth |
| Components | Outline + PostgreSQL + Redis (single pod) |

---

## Security Considerations

1. **SSH Access:** Currently open to 0.0.0.0/0 - consider restricting to known IPs or Tailscale
2. **Firewall:** NodePorts only accessible via Load Balancer (good)
3. **TLS:** Passthrough to application Gateways (Cloudflare origin certs)
4. **API Server:** Accessible via hostname (requires VPN/hosts entry)

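
One way to act on item 1, as a hedged sketch: add an SSH rule for a known source first, then remove the open rule, so access is never lost in between. The helpers `ssh_rule` and `restrict_ssh`, the `<FIREWALL_ID>` placeholder, and the documentation-range address `203.0.113.10/32` are all illustrative assumptions. By default the function only prints the `doctl` commands; pass `env` as the third argument to execute them.

```bash
# Build a doctl inbound-rule string allowing SSH from one source CIDR.
ssh_rule() {
  printf 'protocol:tcp,ports:22,address:%s\n' "$1"
}

# Replace the open SSH rule with a restricted one.
# Args: firewall ID, allowed CIDR, runner ("echo" = print only, "env" = execute)
restrict_ssh() {
  local fw_id="$1" allowed_cidr="$2" run="${3:-echo}"
  # Add the narrow rule before removing the open one to avoid a lockout.
  "$run" doctl compute firewall add-rules "$fw_id" \
    --inbound-rules "$(ssh_rule "$allowed_cidr")"
  "$run" doctl compute firewall remove-rules "$fw_id" \
    --inbound-rules "$(ssh_rule 0.0.0.0/0)"
}

# Dry run: prints the two commands without calling doctl.
restrict_ssh "<FIREWALL_ID>" 203.0.113.10/32
```
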
---

## DNS Configuration

| Domain | Type | Target |
|--------|------|--------|
| appsmith.freecodecamp.net | A | `<LB_IP>` |
| outline.freecodecamp.net | A | `<LB_IP>` |

DNS managed in Cloudflare (proxied or DNS-only based on requirements).

---

## Maintenance Commands

```bash
# Set kubeconfig (or use direnv)
cd k3s
export KUBECONFIG=./.kubeconfig.yaml

# Check cluster health
kubectl get nodes -o wide
kubectl top nodes

# Check all pods
kubectl get pods -A

# Check storage
kubectl get pv,pvc -A
kubectl get storageclass

# Check gateways
kubectl get gateway,httproute -A

# DO resources
doctl compute droplet list | grep k3s
doctl compute load-balancer list | grep k3s
doctl compute firewall list | grep tools
```

---

## Architecture Diagram

```
                      ┌─────────────────────────────────┐
                      │           Cloudflare            │
                      │    appsmith.freecodecamp.net    │
                      └────────────────┬────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              DigitalOcean NYC3                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                             Load Balancer                             │  │
│  │                  HTTP:80 → 30080, HTTPS:443 → 30443                   │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       Firewall (tools-fw-nyc3)                        │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                          VPC (10.108.0.0/20)                          │  │
│  │        ┌─────────────────┬─────────────────┬─────────────────┐        │  │
│  │        │   Node 01       │   Node 02       │   Node 03       │        │  │
│  │        │   10.108.0.4    │   10.108.0.5    │   10.108.0.6    │        │  │
│  │        │                 │                 │                 │        │  │
│  │        │  ┌───────────┐  │  ┌───────────┐  │  ┌───────────┐  │        │  │
│  │        │  │ coredns   │  │  │ traefik   │  │  │ appsmith  │  │        │  │
│  │        │  │ metrics   │  │  │ longhorn  │  │  │ longhorn  │  │        │  │
│  │        │  │ longhorn  │  │  │           │  │  │           │  │        │  │
│  │        │  └───────────┘  │  └───────────┘  │  └───────────┘  │        │  │
│  │        │                 │                 │                 │        │  │
│  │        │ ═══════════ Longhorn Replicated Storage ═══════════ │        │  │
│  │        │                 │                 │                 │        │  │
│  │        │   [etcd]        │   [etcd]        │   [etcd]        │        │  │
│  │        │   [api-server]  │   [api-server]  │   [api-server]  │        │  │
│  │        └─────────────────┴─────────────────┴─────────────────┘        │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
```
