A very incomplete list:
Principal Component Analysis
Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine v. 2, p. 559–572, 1901.
Logistic Regression
Walker, S. H.; Duncan, D. B. Estimation of the probability of an event as a function of several independent variables. Biometrika v. 54, p. 167–178, 1967.
Cox, D. R. The regression analysis of binary sequences (with discussion). J Roy Stat Soc B, v. 20, p. 215–242, 1958.
Dynamic Programming and Reinforcement Learning
Barto, A. G., Sutton, R. S., and Anderson, C. W. Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, v. 13, p. 834-846, 1983.
Bellman, R. E. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
Sutton, R. S. Learning to predict by the methods of temporal differences. Machine Learning, v. 3, p. 9-44, 1988.
Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation v. 6, p. 215-219, 1994.
Tsitsiklis, J. N. and Van Roy, B. Average cost temporal-difference learning. Automatica, v. 35, p. 1799-1808, 1999.
Tsitsiklis, J. N. and Van Roy, B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, v. 42, p. 674-690, 1997.
Neural Networks and Backpropagation
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, MA. pp 318-362, 1986.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning representations by back-propagating errors. Nature, v. 323, p. 533-536, 1986.
Support Vector Machine
Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning v. 20, p. 273-297, 1995.
Deep Learning
Hinton, G. E. Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation v. 1, p. 143-150, 1989.
Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, v. 313, n. 5786, p. 504-507, 2006.
LeCun, Y., Bengio, Y. and Hinton, G. E. Deep Learning. Nature, v. 521, pp 436-444, 2015.
Hinton, G. E. A practical guide to training restricted Boltzmann machines. Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto, 2010.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), p. 1096-1103, ACM, 2008.