Breathing rate estimator
Overview
This system estimates a person's breathing rate from a video source.
The pipeline works as follows: the system first detects the subject's chest area and extracts the corresponding region of interest (ROI). The chest motion is then magnified with Eulerian Video Magnification (EVM), and the result is analysed with an optical flow algorithm to extract 4 signals, one per chest sub-region. These 4 signals are then processed with Independent Component Analysis (ICA) and Principal Component Analysis (PCA) to isolate the respiratory signal. Finally, the respiratory rate is computed from the spectrograms of the two resulting signals by taking the dominant frequency in a sliding window of 20 seconds with a stride of 1 second.
Dataset used for testing
The dataset used was COHFACE, consisting of 40 subjects, each with 2 videos recorded in ideal lighting conditions and 2 in more natural conditions, all captured at 20 Hz with a resolution of 640x480 pixels and a duration of 1 minute.
The average age of the subjects was 35.6 years and, of the 40 subjects, 12 were female.
Each of them wore two sensors collecting ground-truth data for respiratory rate and heartbeat. These data (collected with a sampling rate of 32 Hz) are saved in HDF5 files containing the following datasets, all 1-D float64 arrays of the same length (a minimal reading example follows the list):
- pulse (the blood volume pulse (BVP) readings)
- respiration (the readings of the respiration belt, 32 samples/second)
- time (the relative time of each sample in seconds, 256 samples/second)
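A minimal sketch of how one of these files can be read with h5py, assuming only the dataset names listed above (the file path is illustrative):

```python
import h5py

# The path is illustrative; adapt it to where the dataset is stored.
with h5py.File("subject_01/0/data.hdf5", "r") as f:
    pulse = f["pulse"][:]              # BVP readings
    respiration = f["respiration"][:]  # respiration-belt readings
    time = f["time"][:]                # relative sample times in seconds
```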
Of these, the respiration signal was used as the ground truth against which the estimated respiratory rate is compared.
ROI Selection
The only area of the video we want to analyse is the chest, so we select a ROI around it. This has two advantages: it reduces the noise caused by movements unrelated to the chest, and it lowers the computation time.
The ROI has been selected using the Pose component of the mediapipe package, which detects human body landmarks in an image. Once the subject's shoulder landmarks are detected in the first frame of the video, the chest area is taken as the region between the shoulders and the bottom edge of the frame.
All frames are then cropped to those coordinates, and the ROI is divided into 4 parts, each of which is passed on to the EVM step, as sketched below.
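A minimal sketch of this step, assuming MediaPipe's Pose solution API; the split into four sub-ROIs is shown here as vertical strips purely for illustration, since the exact partition is an implementation detail:

```python
import cv2
import mediapipe as mp

def chest_roi(first_frame):
    """Find the shoulders with MediaPipe Pose and return the chest box
    (x0, y0, x1, y1) in pixels: between the shoulders and the bottom edge."""
    h, w = first_frame.shape[:2]
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        res = pose.process(cv2.cvtColor(first_frame, cv2.COLOR_BGR2RGB))
    lm = res.pose_landmarks.landmark
    ls = lm[mp.solutions.pose.PoseLandmark.LEFT_SHOULDER]
    rs = lm[mp.solutions.pose.PoseLandmark.RIGHT_SHOULDER]
    x0, x1 = sorted((int(rs.x * w), int(ls.x * w)))
    y0 = int(min(ls.y, rs.y) * h)   # top of the ROI: the shoulder line
    return x0, y0, x1, h            # bottom edge of the frame

# Crop every frame with the same box, then split the ROI into 4 parts
# (vertical strips here, purely as an illustration):
# x0, y0, x1, y1 = chest_roi(frames[0])
# roi = frame[y0:y1, x0:x1]
# quarter_w = (x1 - x0) // 4
# sub_rois = [roi[:, i * quarter_w:(i + 1) * quarter_w] for i in range(4)]
```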
Signal Extraction
Once the sub-ROIs are selected, the next step is signal extraction. First, the chest movement in the video is magnified so that the optical flow algorithm can extract a cleaner signal; this is where EVM comes in handy.
EVM is a video processing technique that amplifies colour and motion variations in a video, highlighting subtle changes that would otherwise be difficult to detect. For this operation the pyEVM module has been used, with slight changes so that it accepts a list of frames instead of reading a video from a path.
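To convey the core idea behind EVM (note: this is not the pyEVM API), here is a deliberately simplified sketch: temporally band-pass every pixel in the respiration band and add the filtered variation back, amplified. Real EVM performs this on a spatial (Laplacian/Gaussian) pyramid; the band limits and gain below are illustrative:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def magnify_motion(frames, fps=20.0, lo=0.1, hi=1.0, alpha=10.0):
    """Simplified Eulerian magnification: band-pass each pixel over time
    and add the amplified variation back to the original video."""
    video = np.stack(frames).astype(np.float32)        # (T, H, W[, C])
    sos = butter(2, [lo, hi], btype="band", fs=fps, output="sos")
    band = sosfiltfilt(sos, video, axis=0)             # per-pixel filtering
    return np.clip(video + alpha * band, 0, 255).astype(np.uint8)
```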
After that, the optical flow algorithm comes into play. Optical flow is the apparent motion of objects between two consecutive frames. The basic idea is to represent the image intensity as a function f(x, y, t), where x and y are the spatial coordinates in the frame and t is time. If the pixels of the image move by (δx, δy) over a time δt, the new image can be represented as f(x + δx, y + δy, t + δt).
The optical flow of each frame has been computed with the Farneback method, specifically the calcOpticalFlowFarneback function of the OpenCV module.
The vertical optical flow values within each of the 4 sub-ROIs have then been summed to obtain a single value per frame, representing the chest movement and thus a candidate respiratory signal; the horizontal optical flow values have been discarded, since the dataset's characteristics permitted it.
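A minimal sketch of this extraction for a single sub-ROI; the Farneback parameters are the values commonly used in OpenCV examples, not necessarily the ones of this project:

```python
import cv2
import numpy as np

def vertical_flow_signal(frames):
    """Per-frame respiratory proxy for one sub-ROI: the sum of the
    vertical component of dense Farneback optical flow between
    consecutive frames."""
    signal = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        signal.append(flow[..., 1].sum())  # keep only the vertical motion
        prev = gray
    return np.asarray(signal)

# One signal per sub-ROI:
# signals = np.column_stack([vertical_flow_signal(s) for s in sub_rois])
```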
For more theoretical background on EVM and the optical flow algorithm, please check the links at the end of this page.
Signal Analysis
The four signals produced by the optical flow phase are those that hide the respiratory signal within them. They do not yet represent the actual respiratory signal, however, because of noise sources that may be present in the video, such as motion unrelated to breathing. To remove this noise and obtain a cleaner signal, multiple approaches have been tested: first a band-pass filter was applied to isolate the respiration frequencies (0.1 Hz - 1.0 Hz), and then a demixing operation was performed on the signals, specifically with ICA and PCA.
However, the best results were obtained by feeding the unfiltered optical flow signals to ICA and PCA.
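For reference, the band-pass step that was tested could look like this (a sketch using SciPy; the filter order is an assumption):

```python
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, fs=20.0, lo=0.1, hi=1.0):
    """Isolate the respiration band (0.1-1.0 Hz) in one flow signal;
    this variant was ultimately outperformed by the raw signals."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)
```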
Independent Component Analysis (ICA) is a technique used in signal processing and data analysis to separate a multivariate signal into independent, non-Gaussian components, assuming that at most one sub-component is Gaussian and that the sub-components are statistically independent.
The four signals extracted through the optical flow phase serve as input for ICA, which yields four statistically independent components. The component with the highest signal-to-noise ratio is then chosen, where the SNR is measured in the frequency domain as amplitudeMax/amplitudeMin.
Principal Component Analysis (PCA), on the other hand, is a method used to reduce the dimensionality of high-dimensional data: it finds a set of new variables, called principal components, that capture most of the variance in the original data while preserving as much information as possible.
As with ICA, the 4 signals extracted from the optical flow phase are used as the 4 input variables of PCA, of which only the first principal component is kept.
For both ICA and PCA the scikit-learn module has been used.
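A minimal sketch of both demixing steps with scikit-learn, assuming the 4 flow signals are stacked as the columns of an (n_frames, 4) array; the SNR proxy follows the amplitudeMax/amplitudeMin criterion described above:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

def best_ica_component(signals):
    """Demix the 4 flow signals with FastICA and keep the component
    whose spectrum has the highest max/min amplitude ratio."""
    comps = FastICA(n_components=4, random_state=0).fit_transform(signals)
    def snr(x):
        amp = np.abs(np.fft.rfft(x - x.mean()))[1:]  # drop the DC bin
        return amp.max() / max(amp.min(), 1e-12)     # avoid division by 0
    return comps[:, int(np.argmax([snr(comps[:, i]) for i in range(4)]))]

def first_pca_component(signals):
    """First principal component of the same 4 signals."""
    return PCA(n_components=1).fit_transform(signals).ravel()
```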
Again, for more theoretical information about ICA and PCA please check the links at the end of this page.
Respiration rate extraction
Respiration rate is extracted from the spectrogram of the signal as the dominant frequency in a sliding window of 10, 15 or 20 seconds. If the signal quality is low, a longer window is preferable, whereas a shorter window allows more precise tracking of variations in the respiration rate. Here, a window of 20 seconds proved to be the most robust across most videos. The extracted frequency is then multiplied by 60 to obtain breaths per minute.
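A minimal sketch of this step with SciPy, assuming the 20 Hz frame rate of the dataset; restricting the peak search to the 0.1-1.0 Hz respiration band is an assumption carried over from the filtering section:

```python
import numpy as np
from scipy.signal import spectrogram

def breaths_per_minute(signal, fs=20.0, window_s=20, stride_s=1):
    """Dominant spectrogram frequency in a sliding window, converted
    to breaths per minute."""
    nperseg = int(window_s * fs)
    noverlap = nperseg - int(stride_s * fs)
    freqs, _, Sxx = spectrogram(signal, fs=fs,
                                nperseg=nperseg, noverlap=noverlap)
    band = (freqs >= 0.1) & (freqs <= 1.0)   # plausible respiration band
    dominant = freqs[band][np.argmax(Sxx[band], axis=0)]
    return dominant * 60.0                   # Hz -> breaths per minute
```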
Results
In standardized cases such as those presented in the dataset, this approach proves to be quite a solid solution for estimating the respiratory rate. In particular, PCA performs better overall, with a median error of 0.4 breaths per minute versus 0.69 breaths per minute for ICA, with both tending to overestimate.