Performance
FastSense achieves dramatic performance improvements over MATLAB's built-in plot() function through intelligent downsampling, multi-level caching, and optimized MEX kernels. Here's what you can expect and how to measure it yourself.
Based on benchmarks with 10M data points on Apple M4 with GNU Octave 11:
| Metric | Value | Description |
|---|---|---|
| Zoom cycle time | 4.7 ms | Time to re-downsample and redraw on zoom/pan |
| Effective zoom FPS | 212 FPS | Interactive frames per second during zoom |
| Point reduction | 99.96% | 10M points → ~4K rendered points |
| GPU memory usage | 0.06 MB | vs 153 MB for equivalent plot() |
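The derived figures in the table follow from simple arithmetic. The sketch below reproduces them under two assumptions that the benchmark does not confirm: each point costs 16 bytes (one double each for x and y), and 1 MB means 2^20 bytes.

```matlab
% Back-of-the-envelope check of the benchmark table.
% Assumptions (not confirmed by the benchmark): 16 bytes per point
% (x and y stored as doubles) and 1 MB = 2^20 bytes.
nRaw          = 10e6;   % raw data points
nRendered     = 4000;   % approximate number of points actually drawn
bytesPerPoint = 16;
zoomCycleMs   = 4.7;    % zoom cycle time taken from the table

reductionPct = 100 * (1 - nRendered / nRaw);
rawMB        = nRaw      * bytesPerPoint / 2^20;
renderedMB   = nRendered * bytesPerPoint / 2^20;
zoomFps      = 1000 / zoomCycleMs;

fprintf('Point reduction: %.2f%%\n', reductionPct);
fprintf('Raw data:        %.1f MB\n', rawMB);
fprintf('Rendered data:   %.3f MB\n', renderedMB);
fprintf('Zoom rate:       %.1f FPS\n', zoomFps);
```

Under these assumptions the results land close to the values reported in the table.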
The key advantage isn't just initial render time — it's maintaining fluid interactivity. With plot(), 10M points make zoom/pan unusable, while FastSense maintains sub-5ms response times.
| Points | plot() render | FastSense render | Speedup |
|---|---|---|---|
| 10K | instant | instant | ~1x |
| 100K | moderate lag | instant | ~5x |
| 1M | slow | fast | ~10x |
| 10M | very slow | 0.19 s | ~50x |
| 100M | often fails | works | n/a (plot() often fails) |
At 100M+ points, plot() frequently runs out of memory or becomes completely unresponsive, while FastSense handles it gracefully.
Multi-tile dashboards show increasing advantage as tile count grows:
| Layout | subplot() | FastSenseGrid | Speedup |
|---|---|---|---|
| 1x1 | 0.195 s | 0.187 s | 1.0x |
| 2x2 | 0.451 s | 0.377 s | 1.2x |
| 3x3 | 0.964 s | 0.709 s | 1.4x |
Each FastSenseGrid tile downsamples independently to ~4K points regardless of raw data size, so rendering cost stays nearly flat. Traditional approaches scale linearly with total point count.
Compiled MEX kernels provide substantial acceleration for core operations:
| Operation (10M points) | MATLAB | MEX | Speedup |
|---|---|---|---|
| Binary search | ~1 ms | ~0.05 ms | 20x |
| MinMax downsample | ~25 ms | ~7 ms | 3.5x |
| LTTB downsample | ~200 ms | ~4 ms | 50x |
| Violation detection | ~50 ms | ~2 ms | 25x |
MEX kernels use SIMD instructions (AVX2 on x86_64, NEON on ARM64) to process several doubles per instruction where possible: up to 4 at a time with AVX2.
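For readers unfamiliar with MinMax downsampling, here is a minimal vectorized sketch of the idea: split the signal into buckets and keep each bucket's minimum and maximum in their original order. It is an illustration written for this page, not the FastSense MEX kernel, and all variable names are hypothetical.

```matlab
% Illustrative MinMax downsampling (not the FastSense implementation).
n = 1e6;
x = (1:n)';
y = sin(2*pi*x/5e4) + 0.1*randn(n, 1);
y(123456) = 10;                        % inject a single spike

nBuckets   = 2000;                     % yields 2*nBuckets output points
bucketSize = floor(n / nBuckets);
nUsed      = bucketSize * nBuckets;    % drop the tail remainder for simplicity

% One column per bucket, then locate min and max inside each column.
Y = reshape(y(1:nUsed), bucketSize, nBuckets);
[~, iMin] = min(Y, [], 1);
[~, iMax] = max(Y, [], 1);

% Convert in-bucket positions to global indices, keeping time order
% within each bucket (earlier extreme first).
offsets = (0:nBuckets-1) * bucketSize;
idx = [min(iMin, iMax); max(iMin, iMax)] + offsets;
idx = idx(:);

xd = x(idx);
yd = y(idx);
fprintf('%d points reduced to %d; spike kept: %d\n', n, numel(yd), any(yd == 10));
```

Because every bucket contributes its extremes, a one-sample spike survives the reduction, which is the main reason to prefer MinMax for sensor data with short transients.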
FastSense includes benchmark scripts to measure performance on your system. From the examples/ directory:
% Stress test with 100M points
example_100M;
% Compare LTTB vs MinMax downsampling algorithms
example_lttb_vs_minmax;
% Multi-dashboard stress test: 5 tabs, 26 sensors, 104 thresholds
example_stress_test;
The stress test creates a realistic large-scale scenario with 5 tabbed dashboards, 26 sensors, ~86M total points, and 104 dynamic thresholds that change based on machine state.
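To time a single step rather than a whole example script, a small tic/toc harness is usually enough. The sketch below is generic: the workload is a placeholder (an assumption, not part of FastSense), to be replaced with whatever call you want to measure.

```matlab
% Generic timing harness. The workload is a placeholder (an assumption);
% swap in the call you want to measure.
workload = @() sort(rand(1e5, 1));

nRuns = 7;
t = zeros(1, nRuns);
for k = 1:nRuns
    tic;
    workload();
    t(k) = toc;
end

% Report the median, which is less sensitive to one-off stalls than the mean.
fprintf('median %.2f ms (min %.2f, max %.2f) over %d runs\n', ...
        1e3 * median(t), 1e3 * min(t), 1e3 * max(t), nRuns);
```

The first run often includes one-time costs such as JIT warm-up or file loading, so looking at the spread between min and max is worthwhile.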
Only renders ~4,000 points regardless of dataset size. A 100M point dataset uses the same GPU memory as a 4K dataset once downsampled.
Uses O(log N) binary search instead of O(N) linear scanning to find visible data ranges on zoom/pan:
% The MEX binary search is ~20x faster than the MATLAB fallback
idx = binary_search(x, xValue, 'left'); % First index where x >= xValue
idx = binary_search(x, xValue, 'right'); % Last index where x <= xValue
Pre-computes downsampled levels (100:1, 10000:1, etc.) so zooming out never touches raw data. Cache is built incrementally as needed.
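The two ideas above, binary-search range lookup and pre-computed levels, can be sketched together in plain MATLAB. This is an illustration only: the struct layout, the variable names, and the level-selection rule are assumptions and are not taken from the FastSense source.

```matlab
% Illustrative sketch only; names and the level-selection rule are assumptions.
n = 1e6;
x = linspace(0, 1000, n)';          % sorted, as binary search requires
y = cumsum(randn(n, 1));

% 1) Pre-compute pyramid levels by repeated 100:1 min/max reduction.
reduction = 100;
lvl = struct('x', x, 'lo', y, 'hi', y);   % level 1 is the raw data
pyramid = lvl;
while numel(lvl.x) >= 2 * reduction
    nB   = floor(numel(lvl.x) / reduction);
    keep = nB * reduction;                 % drop the tail remainder for simplicity
    lvl  = struct( ...
        'x',  lvl.x(1:reduction:keep), ...
        'lo', min(reshape(lvl.lo(1:keep), reduction, nB), [], 1)', ...
        'hi', max(reshape(lvl.hi(1:keep), reduction, nB), [], 1)');
    pyramid(end+1) = lvl;
end

% 2) Find the visible index range with two O(log N) binary searches.
xMin = 100;  xMax = 900;
bounds = zeros(1, 2);
for side = 1:2
    a = 1;  b = n + 1;                     % search the half-open range [a, b)
    while a < b
        mid = floor((a + b) / 2);
        if (side == 1 && x(mid) < xMin) || (side == 2 && x(mid) <= xMax)
            a = mid + 1;
        else
            b = mid;
        end
    end
    bounds(side) = a;
end
iFirst = bounds(1);        % first index with x >= xMin
iLast  = bounds(2) - 1;    % last index with x <= xMax

% 3) Pick the coarsest level that still leaves about targetPoints in view.
targetPoints = 4000;
nVisible = iLast - iFirst + 1;
k = 1 + floor(log(max(nVisible, 1) / targetPoints) / log(reduction));
k = min(max(k, 1), numel(pyramid));
fprintf('%d raw points visible, drawing from level %d (%d points total)\n', ...
        nVisible, k, numel(pyramid(k).x));
```

Each level stores a lower and an upper envelope per bucket, so a zoomed-out view can be drawn from a coarse level without reading the raw arrays.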
C implementations use vectorized instructions to process multiple data points per CPU cycle:
- AVX2 on x86_64: processes 4 doubles simultaneously
- NEON on ARM64: processes 2-4 elements per cycle
Build the MEX kernels for maximum performance:
build_mex(); % Compile with platform-specific SIMD optimization
Combines multiple operations in single passes:
- Violation detection + pixel coordinate culling
- Downsampling + threshold line intersection
- Range lookup + metadata forwarding
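As a rough illustration of what fusing operations into one pass means, the loop below computes per-bucket min/max and flags threshold violations while touching each sample exactly once. It is a hypothetical sketch written in plain MATLAB for readability; the compiled kernels are not shown in this page and may be structured differently.

```matlab
% Hypothetical fused pass: downsampling and violation detection together.
n          = 10000;
nBuckets   = 100;
bucketSize = n / nBuckets;
y          = sin(linspace(0, 20*pi, n));
y(4321)    = 3;                    % one sample above the threshold
threshold  = 2;

bMin  = inf(1, nBuckets);
bMax  = -inf(1, nBuckets);
bViol = false(1, nBuckets);

for i = 1:n
    b = ceil(i / bucketSize);      % bucket that sample i belongs to
    v = y(i);                      % each sample is read once
    if v < bMin(b), bMin(b) = v; end
    if v > bMax(b), bMax(b) = v; end
    if v > threshold, bViol(b) = true; end
end

fprintf('Buckets with violations: %s\n', mat2str(find(bViol)));
```

Running the two operations separately would read the data twice; for arrays much larger than the CPU cache, that second read is typically what a fused pass saves.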
Updates line data via direct XData/YData assignment — the fastest path through MATLAB's graphics system. Avoids object recreation or property listeners.
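The in-place update pattern looks roughly like this with standard graphics calls. The data and loop are placeholders for illustration; only `plot`, `set`, `drawnow`, and `findobj` are used, none of which are FastSense-specific.

```matlab
% Update an existing line object instead of re-plotting.
fig = figure('Visible', 'off');
x = linspace(0, 1, 1000);
h = plot(x, sin(2*pi*x));                  % create the line object once

for k = 1:5
    yNew = sin(2*pi*(x + 0.1*k));          % stand-in for freshly downsampled data
    set(h, 'XData', x, 'YData', yNew);     % update in place, no new objects
    drawnow;
end

nLines = numel(findobj(gca, 'Type', 'line'));
fprintf('Line objects in axes after 5 updates: %d\n', nLines);
close(fig);
```

Calling `plot` again inside the loop would instead destroy and recreate the line on every frame, along with any axes state tied to it.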
Uses drawnow limitrate to cap display refresh at 20 FPS, preventing GPU thrashing during rapid zoom/pan sequences.
Several properties control the performance vs. quality trade-off:
fp = FastSense();
% Increase points per pixel for denser traces (default: 2)
fp.DownsampleFactor = 4;
% Adjust pyramid compression (default: 100)
fp.PyramidReduction = 50; % more levels, finer granularity
% Switch algorithms for different data characteristics
fp.DefaultDownsampleMethod = 'lttb'; % vs 'minmax'
% Control when downsampling kicks in (default: 5000)
fp.MinPointsForDownsample = 10000;
FastSense automatically switches between in-memory and disk-backed storage:
fp = FastSense();
% Force storage mode (default: 'auto')
fp.StorageMode = 'memory'; % always RAM
fp.StorageMode = 'disk'; % always SQLite
% Adjust memory threshold (default: 500 MB)
fp.MemoryLimit = 1e9; % 1 GB threshold
The 'auto' mode uses FastSenseDataStore for lines exceeding the memory limit, seamlessly providing disk-based storage without performance degradation.
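To get a feel for when a line would cross the default threshold, the following sketch estimates each line's footprint and compares it against the limit. The decision rule and the 16 bytes per point are assumptions made for illustration; the actual 'auto' logic may account for memory differently.

```matlab
% Hypothetical decision rule; the real 'auto' logic may differ.
memoryLimit   = 500e6;               % default from the text, taking 1 MB = 1e6 bytes
bytesPerPoint = 16;                  % assumption: x and y stored as doubles
nPoints       = [1e6, 10e6, 100e6];  % three example line sizes

lineBytes = nPoints * bytesPerPoint;
useDisk   = lineBytes > memoryLimit;

for i = 1:numel(nPoints)
    if useDisk(i)
        backend = 'disk';
    else
        backend = 'memory';
    end
    fprintf('%9d points = %6.0f MB -> %s\n', nPoints(i), lineBytes(i) / 1e6, backend);
end
```

Under these assumptions only the 100M-point line exceeds the 500 MB default, which matches the scale at which the text recommends disk-backed storage.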
Enable verbose output to see detailed timing information:
fp = FastSense('Verbose', true);
fp.addLine(x, y);
fp.render();
% Output:
% [FastSense] Line 1: 10000000 points → 3847 (MinMax, 23.4 ms)
% [FastSense] Pyramid L1: 100000 points (7.8 ms)
% [FastSense] Pyramid L2: 1000 points (0.3 ms)
% [FastSense] Total render: 187.2 ms
The ConsoleProgressBar class (used internally) is also available for your own batch operations:
pb = ConsoleProgressBar();
pb.start();
for k = 1:1000
% your processing
pb.update(k, 1000, 'Processing');
end
pb.finish();
For headless or batch workflows, use DeferDraw to skip intermediate display updates:
fp = FastSense();
fp.DeferDraw = true; % Skip drawnow during render
fp.ShowProgress = false; % Hide console progress bar
fp.addLine(x, y);
fp.render();
drawnow; % Manual drawnow when ready to display
This is demonstrated in the 100M point stress test example, where it provides measurable performance gains for very large datasets.