All of my research work, captured in papers and ordered from most recent to oldest.
Master's Thesis

Temporal copying and local hallucination for video inpainting

David Álvarez
Video inpainting is the task of removing objects from videos. In particular, the goal is not only to fill every frame with plausible content but also to maintain temporal consistency so that no abrupt changes can be perceived. The current state of the art in video inpainting, which builds upon deep neural networks, struggles to handle large numbers of frames at reasonable resolutions. In our work, we propose to tackle the problem of video inpainting by dividing it into two independent sub-tasks. The first is handled by a Dense Flow Prediction Network (DFPN), which predicts the movement of the background by taking into account the movement of the object to remove. The second is handled by a Copy-and-Hallucinate Network (CHN), which uses the output of the previous network to copy the regions that are visible in reference frames while hallucinating those that are not. Both networks are trained independently and combined using one of our three proposed algorithms: the Frame-by-Frame (FF) algorithm, the Inpaint-and-Propagate (IP) algorithm, or the Copy-and-Propagate (CP) algorithm. We analyze our results by taking both an objective and a subjective approach on two different data sets. In both cases, we find that our models come close to the state of the art but do not surpass it.
Conference: Accepted as a poster at ICLR 2020

Input complexity and out-of-distribution detection with likelihood-based generative models

Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F Núñez and Jordi Luque
Likelihood-based generative models are a promising resource for detecting out-of-distribution (OOD) inputs that could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we posit that this problem is due to the excessive influence that input complexity has on generative models' likelihoods. We report a set of experiments supporting this hypothesis, and use an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood ratio, akin to Bayesian model comparison. We find that this score performs comparably to, or even better than, existing OOD detection approaches across a wide range of data sets, models, model sizes, and complexity estimates.
Conference: Accepted as an oral presentation at the ISCA Speech Synthesis Workshop 2019
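As a rough illustration of the idea in this abstract, the sketch below assumes the score takes the form "negative log-likelihood minus a complexity estimate", both in bits per dimension. Reading the compressor as a generic reference model is what makes the difference behave like a log likelihood ratio. The function names, the use of `zlib` as the compressor, and the convention that the model's negative log-likelihood is supplied by the caller are all assumptions made for this example, not details taken from the paper.

```python
import zlib


def complexity_bits_per_dim(x: bytes) -> float:
    """Estimate input complexity as compressed size in bits per input byte.

    zlib is an assumed stand-in compressor chosen for this sketch.
    """
    if not x:
        raise ValueError("input must be non-empty")
    return 8 * len(zlib.compress(x, 9)) / len(x)


def ood_score(nll_bits_per_dim: float, x: bytes) -> float:
    """Hypothetical OOD score: model NLL minus the complexity estimate.

    nll_bits_per_dim is the generative model's negative log-likelihood for x,
    expressed in bits per dimension and computed elsewhere.
    Under the assumed convention, a higher score suggests an OOD input.
    """
    return nll_bits_per_dim - complexity_bits_per_dim(x)
```

Under this convention, two inputs with the same model likelihood are ranked differently: the simpler input, which compresses well, receives the higher score, counteracting the tendency of likelihoods to favor low-complexity inputs.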

Problem-agnostic speech embeddings for multi-speaker text-to-speech with SampleRNN

David Álvarez, Santiago Pascual and Antonio Bonafonte
Text-to-speech (TTS) acoustic models map linguistic features into an acoustic representation out of which an audible waveform is generated. The latest and most natural TTS systems, like SampleRNN, build a direct mapping between the linguistic and waveform domains. This way, possible losses in signal naturalness are avoided, as intermediate acoustic representations are discarded. Beyond naturalness, another important dimension of study is the adaptability of these systems to generate voices for new speakers unseen during training. In this paper, we first propose the use of problem-agnostic speech embeddings in a multi-speaker acoustic model for TTS based on SampleRNN. This way, we feed the acoustic model with acoustically dependent speaker representations that enrich the waveform generation more than discrete embeddings unrelated to these factors. Our first results suggest that the proposed embeddings lead to better-quality voices than those obtained with discrete embeddings. Furthermore, as we can use any speech segment as an encoded representation during inference, the model is capable of generalizing to new speaker identities without retraining the network. We finally show that, with a small increase in speech duration in the embedding extractor, we dramatically reduce the spectral distortion, closing the gap towards the target identities.
Bachelor's Thesis: Accepted in UPCommons

Real-time stock predictions with deep learning and news scrapping

David Álvarez and José Adrián Rodríguez
Predicting the stock market has always been one of the most challenging problems in the world. The market is known to be influenced by countless factors; however, intuition suggests that one of the most correlated sources of information is in fact public and accessible: the news. The goal of this thesis is to use neural networks to check whether it is possible to predict the stock market using only news published before the opening of the session. To tackle the problem, we first acquire the data, pre-process it, and then test different models to check whether we can outperform a random predictor. Finally, the different models are compared, and conclusions about the correlation between news and stock market variations are presented.