2026-05-06
Stack Overflow question
Tags: verilog, system-verilog, fpga, register-transfer-level
Score: 1 | Views: 110
The asker built a 100-tap low-pass FIR filter in Verilog with coefficients from MATLAB's fir1(99, 0.2, 'low'), fed it a synthesized sine wave at 950/1100/2000 Hz sampled at 10 kHz, and quantized everything to Q2.14. MATLAB's reference output and the RTL output agree from y[1] onward, but the very first sample y[0] = 0000 never gets written to the output file. Subsequent values look correct.
Why this is more subtle than it looks. A missing first sample is almost always a write-timing bug, not a math bug. The filter's output register is presumably reset to zero, so there is always a value available to log. The problem is that the testbench's $fwrite (or $fdisplay) is firing on the wrong edge, or under a guard condition that excludes the first cycle after reset. Common culprits:
- A valid/ready handshake that doesn't assert until the first input has been clocked through, so y[0] is produced before the pipeline declares itself "running."
- An always @(posedge clk) block with a reset-deassert check, where the write condition skips the first cycle after reset or the pipeline latency shifts the sample index by one.
- $fopen called after time 0, so the first sample is emitted before the file descriptor is valid.
- An initial block that does #1 $fwrite, missing the t=0 transition.
Approach. First, prove the value really is computed: add a $display("%t y=%h", $time, y) right next to the file write, outside any guard condition. If the console shows y[0] but the file doesn't, the bug is in the write predicate. If neither shows it, the output register is being sampled one cycle too late relative to when MATLAB considers y[0] defined.
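The following minimal Python cycle-level model (not RTL) shows how the write predicate alone can drop y[0] while an unguarded console trace still shows it. It makes three assumptions:

- The output is registered with one cycle of latency.
- An output-valid flag rises once the register holds y[0].
- In the buggy case, the logger gates its writes on a registered (one-cycle-late) copy of that flag.

The names run_testbench, valid and valid_d are hypothetical illustrations and do not come from the asker's code.

```python
# Cycle-level sketch (hypothetical, not the asker's testbench): a registered FIR
# output, an unguarded console trace, and a guarded file log.

def run_testbench(y_ref, guard_lags_by_one):
    """Return (console, logfile).

    y_ref: golden outputs y[0..N-1]; the output register shows y[n] one cycle
    after x[n] is presented (assumed single-cycle MAC latency).
    guard_lags_by_one: if True, the file write is gated by a registered copy
    of 'valid', a typical off-by-one bug.
    """
    y_reg = 0        # output register, reset value 0
    valid = False    # True while y_reg holds a real filter output
    valid_d = False  # 'valid' delayed by one clock
    console, logfile = [], []
    for cycle in range(len(y_ref) + 1):
        # The testbench samples at the clock edge, before the registers update.
        console.append((cycle, y_reg))        # like an unguarded $display
        write_en = valid_d if guard_lags_by_one else valid
        if write_en:
            logfile.append(y_reg)             # like a guarded $fwrite
        # Nonblocking-style updates: valid_d takes the OLD value of valid.
        valid_d = valid
        if cycle < len(y_ref):
            y_reg = y_ref[cycle]
            valid = True
        else:
            valid = False
    return console, logfile


y_ref = [0.5, 1.25, 2.0]
print(run_testbench(y_ref, guard_lags_by_one=False)[1])  # [0.5, 1.25, 2.0]
print(run_testbench(y_ref, guard_lags_by_one=True)[1])   # [1.25, 2.0]
```

Both runs record (1, 0.5) in the console trace, so y[0] was computed in both. Only the file log differs, which is the "console shows it, file doesn't" case.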
Second, align the index convention. MATLAB's filter() indexes from 1 and defines y(1) as the response to x(1) with zero history. In 0-based notation that is y[0] = b[0]*x[0]. In RTL, if the multiply-accumulate is registered, that value emerges one clock after x[0] is latched. The testbench must therefore log on the cycle the output register updates, not the cycle the input is presented. An off-by-one here will silently drop sample 0.
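A Python golden model makes the index convention explicit. The sketch below mirrors MATLAB's filter(b, 1, x) with zero initial state, so MATLAB's y(1) becomes y[0] = b[0]*x[0] here. It then searches for the pipeline latency that lines an RTL capture up with that reference.

It assumes the RTL trace is dumped every cycle, reset values included, as integers (for example raw Q2.14 words), so that exact comparison is meaningful. fir_reference and find_latency are hypothetical helper names.

```python
def fir_reference(b, x):
    """Direct-form FIR equivalent to MATLAB filter(b, 1, x) with zero history.

    0-based: y[n] = sum_k b[k] * x[n - k], with x[m] = 0 for m < 0,
    so y[0] = b[0] * x[0] (MATLAB's y(1)).
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(min(n + 1, len(b))):   # skip taps that reach before x[0]
            acc += b[k] * x[n - k]
        y.append(acc)
    return y


def find_latency(rtl_trace, y_ref, max_latency=4):
    """Smallest lag L with rtl_trace[L:L + len(y_ref)] == y_ref, else None.

    Matching the whole sequence (not just the first nonzero sample) matters
    when y[0] happens to equal the register's reset value of 0.
    """
    for lag in range(max_latency + 1):
        if rtl_trace[lag:lag + len(y_ref)] == y_ref:
            return lag
    return None


b = [2, 3]
x = [1, 4, 5]
y_ref = fir_reference(b, x)                 # [2, 11, 22]
print(find_latency([0, 2, 11, 22], y_ref))  # 1: one reset-value cycle first
print(find_latency([0, 11, 22, 0], y_ref))  # None: y[0] missing from capture
```

A None result, or a lag that differs from the designed pipeline depth, points at index alignment rather than at the arithmetic.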
Gotchas. With Q2.14 coefficients and a Q2.14 input, each full-precision product is Q4.28. Summing 100 of them needs up to 7 guard bits in the worst case (log₂100 ≈ 6.64, rounded up to 7), so the accumulator should be at least Q11.28 (39 bits) before truncation back to Q2.14. If the asker is also seeing slight numerical drift, that's a separate issue. Also, fir1(99, ...) returns 100 coefficients (order 99), so the RTL tap count must match. Finally, $fwrite output is buffered and is not guaranteed to reach the file if the simulator crashes or exits without closing it. Always pair it with $fclose in a final block, or call $fflush. Otherwise the last few samples can vanish, and if nothing was ever flushed, the first ones can too.
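The bit-growth arithmetic can be checked numerically with the Python sketch below. It rests on these assumptions:

- It uses the Q convention from the text, where the integer field counts the sign bit, so Q2.14 is a 16-bit word.
- It quantizes with round-half-away-from-zero plus saturation, mirroring MATLAB's round(). How the asker actually quantized is not stated.
- It truncates the accumulator back to Q2.14 with an arithmetic right shift.

All function names are hypothetical.

```python
import math


def product_format(a_int, a_frac, b_int, b_frac):
    """Q format of a full-precision signed product: integer and fraction bits add."""
    return a_int + b_int, a_frac + b_frac


def accumulator_format(p_int, p_frac, n_taps):
    """Worst-case growth when summing n_taps products: ceil(log2(n_taps)) guard bits."""
    return p_int + math.ceil(math.log2(n_taps)), p_frac


def to_fixed(value, frac_bits=14, total_bits=16):
    """Quantize a real value to a signed integer: round half away from zero, then saturate."""
    scaled = math.floor(abs(value) * (1 << frac_bits) + 0.5)
    scaled = -scaled if value < 0 else scaled
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fir_fixed(b_q, x_q, frac_bits=14, out_bits=16, acc_bits=39):
    """Integer MAC at full precision, then truncate and saturate back to Q2.14."""
    acc_lo, acc_hi = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    out_lo, out_hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    y = []
    for n in range(len(x_q)):
        acc = sum(b_q[k] * x_q[n - k] for k in range(min(n + 1, len(b_q))))
        if not acc_lo <= acc <= acc_hi:
            raise OverflowError("accumulator exceeds acc_bits")
        q = acc >> frac_bits          # arithmetic shift: floor toward -inf
        y.append(max(out_lo, min(out_hi, q)))
    return y


p = product_format(2, 14, 2, 14)      # (4, 28): 32-bit product
a = accumulator_format(*p, 100)       # (11, 28): 39-bit accumulator
print(p, a, sum(a))                   # (4, 28) (11, 28) 39
```

The ceil(log₂N) figure assumes every product can reach full scale. A tighter bound comes from the sum of |b[k]| over the actual coefficients.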
A missing y[0] is rarely a filter bug. It is almost always a testbench timing or index-alignment mismatch between when the RTL produces a value and when the logging predicate decides to write it.
