FIR Filter Implementation Using Octave GNU Tool and C Language
1. Abbreviation
FIR – Finite Impulse Response
IIR – Infinite Impulse Response
DFT – Discrete Fourier Transformation
IDFT – Inverse Discrete Fourier Transformation
FFT – Fast Fourier Transformation
ADC – Analog Digital Converter
2. Introduction
A FIR filter is a linear digital filter with a finite impulse response and whose ztransform has only nulls. Such a filter is unrealizable in analog technology, which makes it unique in the digital field. The advantage of the FIR filter is always stable and easy to implement with a linear phase response.
Referring to
[1]: L. Rabiner, B. Gold ” Theory and application of digital signal processing“
and
[2]: A. Oppenheim, R. Schafer ” DiscreteTime Signal Processing“,
here I will give the main properties of the FIR filter, which we will use later.

The ztransform of the FIR filter has the form:
where h(0), h(1), h(2), …, h(N1) – the pulse response of this filter with a N length. 
Conditions for the linearity of the phase response of the FIR filter:
h(k) = h(N1k) where 0 ≤ k ≤ N1 (2)
or
h(k) = – h(N1k) where 0 ≤ k ≤ N1 (3)
For example, (2) is the pulse response of a lowpass filter, and (3) is the pulse response of a highpass filter, bandpass filter, or differentiator. 
FIR filter delay with linear phase:

The nulls of the FIR filter with a linear phase response locate on the unit circle or must satisfy the inversion condition:
If
is a null of the FIR filter then
(5)
is the null of the FIR filter too! 
For the design of FIR filters used by Parks and McClellan algorithm using Chebyshev theory and Remez exchange method. On the internet you can easily find a calculator for the FIR filter coefficients by this method.
For example: http://tfilter.engineerjs.com
3. FIR Filter in recursive Form
This method is rather theoretical in nature and I have not met the use of this implementation in practice. The ztransform of the FIR filter can be represented as follows (see [1]):
where
– DFT (Discrete Fourier Transformation) from the impulse response of the FIR filter:
h (0), h(1),…,h (N1), so this implementation is called a frequencysampled structure.
The implementation of the FIR filter according to this scheme is as follows:
Figure 31: FIR Filter in recursive Form
Notes:

The poles of this filter
are compensated by nulls 
All poles location on the unit circle, which can lead to instability of the filter, since all calculations have finite accuracy

The filter is used the complex numbers, which requires double the computing power, since calculations must be done separately for the imaginary and real parts of complex values

This filter takes a simple form in the case of a narrowband filter, since in this case most of the values of H(k) are zero

I do not provide an code implementation of the filter realization in demo SW
4. FIR Filter in direct Form
The variant is used mainly in practical applications. This form is described in (1) and the implementation is showed in Figure 41:
Figure 41: FIR Filter in direct Form
Notes:

The realization has N1 delays and requires to save N1 input samples: x(n1), … , x(nN+1)

I present a demo SW with the method implementation. See the functions fir_coeff_float(…)/ fir_coeff_integer(…) from file: fir_filter.c/h

Using the symmetry of the impulse response (2) and (3), it is possible to reduce the number of multiplications:


Variant 1: h(k) = h(N1k). If N is an even number then we can write:
If N is an odd number then: 
Variant 2: h(k) = – h(N1k). If N is an even number then we can write:
If N is an odd number then:

5. FIR Filter in cascade Form
As the filter order increases (but the FIR filter always has a high order) there is a high sensitivity of the filter nulls from quantization or rounding of the filter coefficients. In this case, using a cascading implementation of the FIR filter can help.
To preserve the linearity of the phase characteristic of the filter, the inverse nulls (5) should be combined together. In this case, we get a 4order block:
For real inverse nulls, we get a 2order block:
For nulls on the unit circle, two options are possible. This is a 2order block for complex nulls and a firstorder block for real null 1 or 1:
Thus, the FIR filter can be implemented in a cascade form, using the 1, 2, 4 – order blocks.
Figure 51: FIR Filter in cascade Form
Notes:

See the [1]. The FIR filter in direct form gives a good result and usually the cascading form does not need in practice. This is due to the fact that the nulls of the FIR filter are located at a sufficient distance from each other and quantization of the coefficients does not lead to a deterioration in the characteristics of the filter. Therefore, I hope that you will not need the cascading form of your implementation in practice

You can implement a cascading form by repeatedly using the fir_coeff_float(…)/ fir_coeff_integer(…) functions from the file fir_filter.c/h
6. FIR Filter using Convolution with the OverlapAdd Method
Sometimes there is a long data sequences of input samples and we need to quickly pass the samples in the filter. The calculations must be performed in separate data blocks.
The filter response can be calculated using the convolution:
Let the input x (n) sequence have length M (x (0), x (1),…, x (M1)) and the impulse response of the filter h(n) has length N (h (0), h (1),…, h(N1)), then the convolution of these two sequences will have length M+N1, and the result can be calculated by the formula:
To calculate the output long data sequence, we divide the input data into blocks of length M. We will calculate the convolution for each individual block. To combine the results of these calculations, there are two methods: overlapadd method and overlapsave method (see [1], [2]). I will focus here only on the overlapadd method. In this case, the filter response to individual blocks is calculated, and the overlap between these reactions is summed up. See the Figure 61.
Figure 61: FIR Filter. Convolution with the OverlapAdd Method
The calculation of convolution for each data block takes quite a long time, so the fast convolution algorithm is often used. This method is based on the DFT and IDFT algorithms, which can be effectively implemented based on the FFT algorithm. It is known (see [1] or [2]) that the signal spectra y (n), x (n) and h (n) in (16) can be calculated by the formula:
where Y(k), X(k), H(k) are DFT from signals:
Next, I will describe a fast convolution algorithm that can be used with the OverlapAdd Method to calculate the filtered signal of the Long Data Sequences:

Add zeros to the impulse response: h (0), h(1),…, h(N1), 0,…, 0 so that the array size is N + M – 1, where M is the size of the input signal block. Then using FFT algorithm, the DFT is calculated: H(k)

Take a block, for example iblock of input data of length M. We add zeros to this data:
,
to get an array of length N+M1, where N is the size of the impulse response of the FIR filter. Using FFT algotithm, we calculate the DFT: Xi(k) 
Multiply the Yi(k)=Xi(k)H(k). Calculate the IDFT from the Yi(k) and get the yi(n)
To get the output signal, calculate the sum for all yi(n) (see the Figure 61):
The algorithm is implemented in my article “Fast Fourier Transform (FFT) and C Implementation Using the Octave GNU Tool“.
7. MSource Code FIR_Filter.m for “Octave“ Gnu Tool
For example I designed a Low Pass FIR filter on the internet site http://tfilter.engineerjs.com/ with the following parameters:
Sampling frequency Fs=12 kHz
Pass Band: 0 Hz …1 kHz with ripple 2 dB
Stop Band: 2 kHz…6 kHz with ripple 40 dB
Filter Order: 17
Next, to analyze the resulting filter, I wrote the fir_filter.m code for the Octave GNU Tool:

Normalized the filter coefficients so that H(0Hz)=1. See the transformation the fir_coeff_original to fir_coeff;

Coefficients quantization of the FIR filter from float format to integer format (signed int16) with the factor:
using the my function: FirFilterQuantization(…). See the transformation the fir_coeff to fir_coeff_q.
Note: the value 17 is chosen solely as an example, which does not give an integer overflow for this FIR filter with these coefficients. Also, of course, it is necessary to take into account the bit size of the input signal x(n), which is often determined by the ADC: 814 bits; 
Build the Magnitude and Phase Responses using my function FirFilterResponse(…) or standard Octave functions: freqz(…)/freqz_plot(…). See the Figure 71

Prepare the few test signals: step signal, sinusoidal signal 500Hz (pass band space) and sinusoidal signal 2kHz (stop band space). Check the FIR filter response on the signals using my function FirFilter_FftConvolutionOverlapAdd(…). The function is realized with the OverlapAdd Convolution Method. See the Figure 72

Generate the help files with data arrays for the CCode using my functions IntegerTestSignals2file(…), IntegerParamVector2file(…), FloatParamVector2file(…). The data can be copied to CCode: FIR filters coefficients, test signals and so on;
Figure 71: Low Pass FIR Filter. Magnitude and Phase Responses
Figure 71 notes:

Low Pass FIR Filter. Sampling Frequency is 12kHz. Pass band is 0…1kHz. Stop Band 2kHz…6kHz

The filter has linear phase response. Phase jumps of 180 degrees are explained by a bug in the function unwrap() of the Octave Gnu. Really the phase response is linear

Magnitude and Phase Responses was done after quantization coefficients of the FIR filter from float to integer format
Figure 72: FIR Filter. Signal Responses for Low Pass Filter.
The blue signal is filter input and the red signal is filter output.
Figure 72 notes:

Despite the linear phase characteristic, the filter response on the step signal has two spikes (about 7%). This interference occurs because the pulse response (FIR) is limited in time, as opposed to the pulse response of an ideal lowpass filter

The signal with a frequency of 500Hz (pass band space) passes through the filter only with a delay (N1)/2=(171)/2=8 samples

The signal with a frequency of 2 kHz (stop band space) is suppressed by 40 dB (in 100 times)
8. CSource Code: fir_filter.c/h
The file has the follow definitions:
1. Constants:
#define FIR_ORDER N – Order filter definitions
#define COEFF_SHIFT_FACTOR M – The definition is supported the FIR filter with the integer coefficients. The factor is
Support the fir_filter_integer(…) function
const float fir_coeff_float[FIR_ORDER] = {…} – FIR filter coefficients in float format. Support the fir_filter_float(…) function
const sint16 fir_coeff_integer[FIR_ORDER] = {…} – FIR filter coefficients in int16 format. Support the fir_filter_integer(…) function
2. Functions
void fir_filter_initialization(void) – initialize / clear the filter history (x(n1),…x(nN+1))
sint16 fir_filter_float(sint16 x_input) – calculate the output signal y(n) using the input signal x(n)=x_input. The function is used the float FIR coefficients
sint16 fir_filter_integer(sint16 x_input) – calculate the output signal y(n) using the input signal x(n)=x_input. The function is used the integer FIR coefficients
9. Download the FIR_Filter.m and fir_filter.c/h
You can download the files with the button:
10. Literature / References
[1] L.Rabiner, B.Gold “Theory and application of digital signal processing“, PrenticeHall, Ing Englewood Cliffs, New Jercy 1975
[2] A.Oppenheim, R.Schafer “DiscreteTime Signal Processing“, published by Pearson Education, Inc, publishing as Prentice Hall, 1999