Medical Image Processing of High-Speed Endoscopic Video

One challenge in the clinical routine for patients with voice disorders is that the human vocal folds, the source of the acoustic signal, oscillate rapidly (> 100 Hz) and on a small scale. A primary diagnostic tool is therefore endoscopic videostroboscopy, and more recently high-speed video endoscopy has gained popularity as well. The latter in particular suffers from low illumination of the vocal folds, and its videos are degraded by noise, reflections, shadows, and patient or endoscope movement.

My recent research in this area explores image enhancement methods to alleviate these issues. In particular, I work on enhancing low-light high-speed endoscopy videos using a U-Net-like architecture. This helps clinicians reach a better diagnosis and also aids research, where the enhanced videos can be segmented and analyzed. Without enhancement, many recorded videos are too degraded for extensive analysis.

Voice parameters

One aspect of my research focuses on determining physical properties of the vocal folds and of the process of phonation. Of particular interest are the mass and stiffness of the vocal folds as well as the air pressure below the vocal folds (subglottal pressure), which drives the vocal fold oscillation. In our work "Physical parameter estimation from porcine ex vivo vocal fold dynamics in an inverse problem framework", my coauthors and I demonstrated that the subglottal pressure can be estimated by solving an inverse problem. Using a numerical model, we optimize over the parameter space of the model's initial conditions so that the model dynamics match high-speed video recordings of experimental, ex vivo vocal fold oscillations.
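
The inverse-problem idea can be illustrated with a minimal Python sketch. It is not the model or the optimizer from the paper. It assumes a toy single-mass oscillator driven by a constant pressure force, a synthetic "recording" generated by that same toy model, and a golden-section search as a stand-in optimizer. All names and parameter values (simulate, misfit, mass, stiffness, damping, area, the 1200 Pa target) are hypothetical and chosen only for illustration.

```python
import math

def simulate(pressure, mass=1e-4, stiffness=40.0, damping=0.01,
             area=1e-4, dt=1e-4, steps=400):
    """Toy single-mass oscillator driven by a constant pressure force.

    Hypothetical stand-in for a vocal fold model; returns the
    displacement trajectory, integrated with semi-implicit Euler.
    """
    x, v = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        a = (pressure * area - damping * v - stiffness * x) / mass
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

def golden_section_minimize(f, lo, hi, tol=1e-3):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

# Synthetic "recording": in practice this would be a trajectory
# extracted from high-speed video, not generated by the model itself.
true_pressure = 1200.0  # Pa, assumed value for the demo
recorded = simulate(true_pressure)

def misfit(pressure):
    """Sum of squared differences between simulation and recording."""
    simulated = simulate(pressure)
    return sum((s - r) ** 2 for s, r in zip(simulated, recorded))

estimated = golden_section_minimize(misfit, 0.0, 5000.0)
print(f"estimated pressure: {estimated:.2f} Pa")
```

Because the toy model is linear in the pressure, the misfit has a single minimum and a one-dimensional search suffices. A realistic model with several unknown parameters would need a multidimensional optimizer and many more model evaluations.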

More recently, I have been using a long short-term memory (LSTM) neural network, trained on synthetic oscillations of the model with known initial conditions, to estimate parameters of real experimental oscillations. The advantage is that inference with a neural network is several orders of magnitude faster than repeatedly evaluating a numerical model.
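
To show why inference is cheap, here is a minimal Python sketch of the forward pass of a single-layer LSTM followed by a linear readout, which maps an oscillation sequence to one scalar estimate. It is only an illustration under stated assumptions: the weights are random and untrained, so the output is meaningless as a pressure value, and the hidden size, the 4000 fps frame rate, and the 100 Hz sine input are hypothetical choices, not the published network or data.

```python
import math
import random

HIDDEN = 4  # hypothetical hidden size, chosen small for readability

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(rng, hidden=HIDDEN):
    """Random (untrained) weights for the four LSTM gates and the readout."""
    def vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    def mat():
        return [vec() for _ in range(hidden)]
    # Per gate: input weights, recurrent weights, bias.
    gates = {name: (vec(), mat(), vec()) for name in ("i", "f", "o", "g")}
    readout = (vec(), rng.uniform(-0.5, 0.5))
    return gates, readout

def lstm_step(x, h, c, gates):
    """One LSTM time step for a scalar input x and state vectors h, c."""
    n = len(h)
    new_h, new_c = [], []
    for j in range(n):
        pre = {}
        for name, (w_x, w_h, b) in gates.items():
            pre[name] = (w_x[j] * x
                         + sum(w_h[j][k] * h[k] for k in range(n))
                         + b[j])
        i = sigmoid(pre["i"])    # input gate
        f = sigmoid(pre["f"])    # forget gate
        o = sigmoid(pre["o"])    # output gate
        g = math.tanh(pre["g"])  # candidate cell value
        cj = f * c[j] + i * g
        new_c.append(cj)
        new_h.append(o * math.tanh(cj))
    return new_h, new_c

def predict(sequence, gates, readout):
    """Run the LSTM over the sequence and read out one scalar."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in sequence:
        h, c = lstm_step(x, h, c, gates)
    weights, bias = readout
    return sum(w * hj for w, hj in zip(weights, h)) + bias, h

# Hypothetical input: a 100 Hz oscillation sampled at an assumed 4000 fps.
fps, freq = 4000, 100
sequence = [math.sin(2 * math.pi * freq * t / fps) for t in range(80)]

gates, readout = init_params(random.Random(0))
estimate, hidden = predict(sequence, gates, readout)
print(f"untrained network output: {estimate:.4f}")
```

A single estimate costs one pass over the sequence with a fixed number of multiplications per frame, whereas an optimization-based fit has to rerun the numerical model for every candidate parameter set.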


Gómez, P., Semmler, M., Schützenberger, A., Bohr, C., & Döllinger, M. (2019). Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network. Medical & Biological Engineering & Computing,

Gómez, P., Schützenberger, A., Semmler, M., & Döllinger, M. (2019). Laryngeal Pressure Estimation With a Recurrent Neural Network. IEEE Journal of Translational Engineering in Health and Medicine, 7, 1-11.

Gómez, P., Schützenberger, A., Kniesburges, S., Bohr, C., & Döllinger, M. (2018). Physical parameter estimation from porcine ex vivo vocal fold dynamics in an inverse problem framework. Biomechanics and Modeling in Mechanobiology, 1-16.

Döllinger, M., Gómez, P., Patel, R. R., Alexiou, C., Bohr, C., & Schützenberger, A. (2017). Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS ONE, 12(11), e0187486.

Gómez, P., Kniesburges, S., Schützenberger, A., Bohr, C., & Döllinger, M. (2017, April). Degrees of freedom in a vocal fold inverse problem. In International Conference on Bioinformatics and Biomedical Engineering (pp. 475-484). Springer, Cham.

Peherstorfer, B., Gómez, P., & Bungartz, H. J. (2015). Reduced models for sparse grid discretizations of the multi-asset Black-Scholes equation. Advances in Computational Mathematics, 41(5), 1365-1389.