Bidirectional Recurrent Neural Networks
I. INTRODUCTION
A. General
MANY classification and regression problems of engineering interest are currently solved with statistical approaches using the principle of “learning from examples.” For a certain model with a given structure inferred from the prior knowledge about the problem and characterized by a number of parameters, the aim is to estimate these parameters accurately and reliably using a finite amount of training data. In general, the parameters of the model are determined by a supervised training process, whereas the structure of the model is defined in advance. Choosing a proper structure for the model is often the only way for the designer of the system to put in prior knowledge about the solution of the problem.
Artificial neural networks (ANN's) (see [2] for an excellent introduction) are one group of models that take the principle "infer the knowledge from the data" to an extreme. In this paper, we are interested in studying ANN structures for one particular class of problems that are represented by temporal sequences of input–output data pairs. For these types of problems, which occur, for example, in speech recognition, time series prediction, dynamic control systems, etc., one of the challenges is to choose an appropriate network structure that, at least theoretically, is able to use all available input information to predict a point in the output space.
Many ANN structures have been proposed in the literature to deal with time varying patterns. Multilayer perceptrons (MLP's) have the limitation that they can only deal with static data patterns (i.e., input patterns of a predefined dimensionality), which requires definition of the size of the input window in advance. Waibel et al. [16] have pursued time delay neural networks (TDNN's), which have proven to be a useful improvement over regular MLP's in many applications. The basic idea of a TDNN is to tie certain parameters in a regular MLP structure without restricting the learning capability of the ANN too much. Recurrent neural networks (RNN's) provide another alternative for incorporating temporal dynamics and are discussed in more detail in a later section.
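The weight-tying idea behind a TDNN can be illustrated with a toy sketch (names and sizes are illustrative, not from the paper): a hidden unit looks at a short window of delayed inputs, and the same window weights are reused at every time step, rather than giving each (time, unit) pair its own free parameter as a plain MLP would.

```python
import numpy as np

# Toy sketch of the TDNN idea: one hidden unit sees a window of D delayed
# inputs, and the SAME weight vector w is applied at every window position
# (the "tied parameters" that distinguish a TDNN from an unconstrained MLP).
def tdnn_layer(x, w, b):
    """x: (T,) input sequence; w: (D,) shared delay weights; b: scalar bias."""
    T, D = len(x), len(w)
    # one activation per valid window position, shared weights throughout
    return np.array([np.tanh(x[t:t + D] @ w + b) for t in range(T - D + 1)])

x = np.sin(np.linspace(0.0, 3.0, 10))                 # toy input sequence
h = tdnn_layer(x, w=np.array([0.5, -0.2, 0.1]), b=0.0)
```

With a window of D = 3 over 10 frames, the layer produces 8 activations, each computed with the same three weights.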
In this paper, we investigate different ANN structures for incorporating temporal dynamics. We conduct a number of experiments using both artificial and real-world data. We show the superiority of RNN's over the other structures. We then point out some of the limitations of RNN's and propose a modified version of an RNN called a bidirectional recurrent neural network, which overcomes these limitations.
B. Technical
Consider a (time) sequence of input data vectors

X = (x_1, x_2, ..., x_T)

and a sequence of corresponding output data vectors

Y = (y_1, y_2, ..., y_T)

with neighboring data pairs (in time) being somehow statistically dependent. Given time sequences X and Y as training data, the aim is to learn the rules to predict the output data given the input data. Inputs and outputs can, in general, be continuous and/or categorical variables. When outputs are continuous, the problem is known as a regression problem, and when they are categorical (class labels), the problem is known as a classification problem. In this paper, the term prediction is used as a general term that includes regression and classification.
1) Unimodal Regression: For unimodal regression or function approximation, the components of the output vectors are continuous variables. The ANN parameters are estimated to maximize some predefined objective criterion (e.g., maximize the likelihood of the output data). When the distribution of the errors between the desired and the estimated output vectors is assumed to be Gaussian with zero mean and a fixed global data-dependent variance, the likelihood criterion reduces to the convenient Euclidean distance measure between the desired and the estimated output vectors, or the mean-squared-error criterion, which has to be minimized during training. It has been shown by a number of researchers that neural networks can estimate the conditional average of the desired output (or target) vectors at their network outputs, i.e., y_t = E[y_t | X], where E[·] is the expectation operator.
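The claim that minimizing mean-squared error drives the network output toward the conditional average of the targets can be seen in a stripped-down sketch (this is an illustration, not the paper's networks): for one fixed input, a single trainable scalar stands in for the network output, and gradient descent on the MSE converges to the sample mean of the targets.

```python
import numpy as np

# Minimal illustration: the MSE-optimal constant prediction for a set of
# noisy targets is their mean, which is why an MSE-trained network output
# approximates the conditional average E[y_t | X].
targets = np.array([1.0, 2.0, 3.0, 6.0])   # toy targets for one fixed input
y_hat = 0.0                                # the model's (scalar) output
for _ in range(200):
    grad = 2.0 * np.mean(y_hat - targets)  # d/dy_hat of mean((y_hat - t)^2)
    y_hat -= 0.1 * grad                    # plain gradient-descent step

# y_hat has converged to targets.mean() == 3.0
```

Each step contracts the error toward the mean by a constant factor, so 200 iterations bring y_hat within numerical tolerance of 3.0.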
2) Classification: In the case of a classification problem, one seeks the most probable class out of a given pool of K classes for every time frame t, given an input vector sequence X. To make this kind of problem suitable to be solved by an ANN, the categorical variables are usually coded as vectors as follows. Suppose c_t is the desired class label for the frame at time t. Then, construct an output vector y_t such that its c_t-th component is one and all other components are zero. The output vector sequence Y constructed in this manner, along with the input vector sequence X, can be used to train the network under some optimality criterion, usually the cross-entropy criterion [2], [9], which results from a maximum likelihood estimation assuming a multinomial output distribution. It has been shown that the i-th network output at each time point t can be interpreted as an estimate of the conditional posterior probability of class membership P(c_t = i | X) for class i, with the quality of the estimate depending on the size of the training data and the complexity of the network.
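The output coding described above can be sketched as follows (function names are illustrative): the class label at each frame becomes a one-hot vector, and a softmax output layer yields nonnegative values summing to one, which is what makes the outputs interpretable as posterior-probability estimates under the cross-entropy criterion.

```python
import numpy as np

K = 4                                   # number of classes in the pool

def one_hot(label, K):
    """Code a class label as a K-vector with a single 1 at the label index."""
    y = np.zeros(K)
    y[label] = 1.0
    return y

def softmax(z):
    """Map raw network outputs to values in (0, 1) that sum to one."""
    e = np.exp(z - z.max())             # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(p, y):
    """Cross-entropy between one-hot target y and predicted distribution p."""
    return -np.sum(y * np.log(p + 1e-12))

y_t = one_hot(2, K)                              # desired class at frame t is 2
p_t = softmax(np.array([0.1, 0.3, 2.0, -1.0]))   # hypothetical network outputs
loss = cross_entropy(p_t, y_t)                   # training criterion at frame t
```

Because the softmax outputs sum to one, each component p_t[i] can be read as an estimate of P(c_t = i | X).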
For some applications, it is not necessary to estimate the conditional posterior probability of a single class given the sequence of input vectors, but rather the conditional posterior probability of a sequence of classes given the sequence of input vectors.
C. Organization of the Paper
This paper is organized in two parts. Given a series of paired input/output vectors (x_1, y_1), (x_2, y_2), ..., (x_T, y_T), we want to train bidirectional recurrent neural networks to perform the following tasks.
• Unimodal regression [i.e., compute E[y_t | X]] or classification [i.e., compute P(c_t = k | X) for every output class k and decide the class using the maximum posterior probability].
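The bidirectional idea named in the title can be sketched as a minimal forward pass (weights, sizes, and the simple tanh update are illustrative assumptions, not the paper's exact architecture): one recurrent pass runs forward in time, a second runs backward, and the two hidden states are concatenated at each frame, so the output at time t can draw on the entire input sequence.

```python
import numpy as np

# Hedged sketch of a bidirectional recurrent forward pass.
rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4                         # sequence length, input, hidden sizes
X = rng.normal(size=(T, d_in))                 # toy input sequence
W_f, U_f = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
W_b, U_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

def run(X, W, U):
    """Simple recurrent pass: h_t = tanh(W x_t + U h_{t-1})."""
    h, out = np.zeros(d_h), []
    for x in X:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return np.array(out)

h_fwd = run(X, W_f, U_f)                       # forward pass over t = 1..T
h_bwd = run(X[::-1], W_b, U_b)[::-1]           # backward pass, realigned to t
H = np.concatenate([h_fwd, h_bwd], axis=1)     # (T, 2*d_h) combined states
```

Row t of H combines information from x_1..x_t (forward state) and x_t..x_T (backward state); a shared output layer on H would then produce the regression or classification estimates above.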