CS 6890: Deep Learning

Spring 2018

This course will introduce the

Logistic and Softmax Regression, Feed-Forward Neural Networks, Backpropagation, Vectorization, PCA and Whitening, Deep Networks, Convolution and Pooling, Recurrent Neural Networks, Long Short-Term Memory, Gated Recurrent Units, Neural Attention Models, Sequence-to-Sequence Models, Memory Networks, Distributional Representations, Generative Adversarial Networks, Deep Reinforcement Learning.

Previous exposure to basic concepts in machine learning, such as supervised vs. unsupervised learning, classification vs. regression, linear regression, logistic and softmax regression, cost functions, overfitting and regularization, and gradient-based optimization. Substantial experience with programming and familiarity with basic concepts in linear algebra and statistics.

- Syllabus & Introduction
- Hand notes Jan 16 one, Jan 16 two.

- Logistic Regression, Softmax Regression, and Gradient Descent
- Hand notes Jan 23.
- An overview of gradient descent optimization algorithms, Sebastian Ruder, CoRR 2016
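A minimal NumPy sketch of softmax regression trained with full-batch gradient descent on the mean cross-entropy loss, assuming integer class labels. The toy data (three 2-D Gaussian blobs), the learning rate, and the epoch count are illustrative assumptions, not values from the course.

```python
import numpy as np

def softmax(Z):
    # Subtract each row's max before exponentiating: same result, no overflow.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(W, b, X, y):
    # Mean negative log-probability assigned to the true labels.
    P = softmax(X @ W + b)
    return -np.mean(np.log(P[np.arange(len(y)), y]))

def train_softmax(X, y, num_classes, lr=0.1, epochs=300):
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]           # one-hot targets, shape (n, K)
    for _ in range(epochs):
        P = softmax(X @ W + b)           # predicted probabilities, shape (n, K)
        G = (P - Y) / n                  # gradient of the mean loss w.r.t. the logits
        W -= lr * (X.T @ G)              # gradient descent step on the weights
        b -= lr * G.sum(axis=0)          # and on the biases
    return W, b

# Toy data: three well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [3.0, -1.0], [-3.0, -1.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

W, b = train_softmax(X, y, num_classes=3)
accuracy = np.mean((X @ W + b).argmax(axis=1) == y)
print(f"loss {cross_entropy(W, b, X, y):.3f}, accuracy {accuracy:.2f}")
```

With K = 2 classes the model is equivalent to logistic regression, since only the difference between the two logit columns affects the predicted probabilities.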

- Linear algebra and optimization in Python
- Machine Learning with PyTorch
- PyTorch examples
- Linear regression with gradient descent in PyTorch
- PyTorch video lecture and slides by Soumith Chintala.
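The linked exercise uses PyTorch; as a hedged stand-in, here is a plain NumPy sketch of linear regression fitted by gradient descent on the mean squared error, checked against the closed-form least-squares solution. The synthetic data, learning rate, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 1.0
y = X @ true_w + true_b + 0.1 * rng.normal(size=n)   # noisy linear targets

# Gradient descent on the mean squared error (1/n) * ||Xw + b - y||^2.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    residual = X @ w + b - y
    w -= lr * (2.0 / n) * (X.T @ residual)
    b -= lr * (2.0 / n) * residual.sum()

# Closed-form least-squares solution for comparison (bias as a column of ones).
A = np.hstack([X, np.ones((n, 1))])
w_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print("gradient descent:", np.round(np.append(w, b), 4))
print("least squares:   ", np.round(w_ls, 4))
```

In PyTorch the two hand-written gradient lines would typically be replaced by `loss.backward()` followed by an optimizer step such as `torch.optim.SGD`.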

- Feed-Forward Neural Networks and Backpropagation
- Hand notes Feb 6 one, Feb 6 two, Feb 6 three.
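A small sketch of backpropagation for a network with one tanh hidden layer and a squared-error loss, with the analytic gradients verified by central finite differences. The layer sizes, activation, and loss are assumptions for illustration, not the exact setup in the hand notes.

```python
import numpy as np

def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)      # hidden layer with tanh activation
    y_hat = W2 @ h + b2           # linear output layer
    return h, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * np.sum((y_hat - y) ** 2)   # squared error

def backprop(params, x, y):
    W1, b1, W2, b2 = params
    h, y_hat = forward(params, x)
    delta2 = y_hat - y                         # dL/d(output pre-activation)
    delta1 = (W2.T @ delta2) * (1.0 - h ** 2)  # chain rule through tanh' = 1 - tanh^2
    return [np.outer(delta1, x), delta1, np.outer(delta2, h), delta2]

def numerical_grad(params, x, y, eps=1e-6):
    # Central finite differences, one parameter entry at a time.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss(params, x, y)
            p[idx] = old - eps
            minus = loss(params, x, y)
            p[idx] = old                       # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 3)), rng.normal(size=4),   # W1, b1
          rng.normal(size=(2, 4)), rng.normal(size=2)]   # W2, b2
x, y = rng.normal(size=3), rng.normal(size=2)
analytic = backprop(params, x, y)
numeric = numerical_grad(params, x, y)
max_err = max(np.max(np.abs(a - g)) for a, g in zip(analytic, numeric))
print(f"max |analytic - numeric| = {max_err:.2e}")
```

A gradient check like this is slow (two forward passes per parameter), so it is only meant for debugging small networks.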

- Unsupervised Feature Learning with Autoencoders
- PCA, PCA whitening, and ZCA whitening
- Variational Autoencoders
- Hand notes Feb 15 one, Feb 15 two, Feb 20, and Feb 22
- Auto-Encoding Variational Bayes, Kingma and Welling, ICLR 2014
- Tutorial on Variational Autoencoders, Carl Doersch, CMU 2016
- VAE implementation in PyTorch, Agustinus Kristiadi's Blog, 2017
- VAE lecture from U of Illinois
- Attribute2Image: Conditional Image Generation from Visual Attributes, Yan et al., ECCV 2016
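A minimal NumPy sketch of PCA whitening and ZCA whitening via an eigendecomposition of the covariance matrix. The small regularizer `eps` added to the eigenvalues follows the common practice of avoiding division by near-zero variances; its value and the toy mixing matrix are assumptions.

```python
import numpy as np

def whiten(X, eps=1e-5):
    # Returns (PCA-whitened, ZCA-whitened) versions of the rows of X.
    Xc = X - X.mean(axis=0)                    # zero-mean every feature
    cov = Xc.T @ Xc / Xc.shape[0]              # feature covariance matrix
    eigvals, U = np.linalg.eigh(cov)           # cov = U diag(eigvals) U^T
    X_pca = (Xc @ U) / np.sqrt(eigvals + eps)  # rotate, rescale to unit variance
    X_zca = X_pca @ U.T                        # rotate back into the input space
    return X_pca, X_zca

rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.3, 0.2]])
X = rng.normal(size=(1000, 3)) @ mix           # correlated 3-D data
X_pca, X_zca = whiten(X)
print(np.round(X_pca.T @ X_pca / len(X), 3))   # approximately the identity
print(np.round(X_zca.T @ X_zca / len(X), 3))   # also approximately the identity
```

Both outputs have (approximately) identity covariance; ZCA differs only by the final rotation back, which keeps the whitened data as close as possible to the original inputs.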

- Convolutional Neural Networks
- Andrej Karpathy's notes on CS231n: Convolutional Neural Networks for Visual Recognition.
- UFLDL Tutorial at Stanford.
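A naive sketch of the two core CNN operations: a "valid" 2-D convolution (implemented, as most deep learning libraries do, as cross-correlation without flipping the kernel) and non-overlapping max pooling. The loop-based convolution is for clarity only; the example image and kernel are arbitrary.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the image.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Non-overlapping max pooling; rows/columns that don't fill a window are dropped.
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    trimmed = fmap[:H2 * size, :W2 * size]
    return trimmed.reshape(H2, size, W2, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # diagonal difference detector
features = conv2d_valid(image, kernel)          # shape (4, 4)
pooled = max_pool(features)                     # shape (2, 2)
print(features)
print(pooled)
```

Because this image increases by 5 per row and 1 per column, every diagonal difference is -6, so both the feature map and the pooled map are constant.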

- Recurrent Neural Networks, LSTMs, GRUs
- Understanding LSTM Networks, Christopher Olah's Blog, 2015.
- Slides on Sequence-to-Sequence Architectures, Kapil Thadani, 2017.
- Supervised Sequence Labelling with Recurrent Neural Networks, Alex Graves, PhD Thesis 2012.
- Chapter 4.6: Forward and Backward Propagation equations.

- Recurrent Neural Networks for large scale Language Modeling, Jozefowicz et al., Google Brain 2016.
- Character-Aware Neural Language Models, Kim et al., AAAI 2016.
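A minimal NumPy sketch of the forward pass of a single-layer LSTM, using the standard forget, input, candidate, and output gate equations described in Olah's post. Stacking all four gates into one weight matrix `W` acting on the concatenation of the previous hidden state and the input is an implementation choice made here; the gate ordering inside `W` is arbitrary and differs between libraries.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W: (4H, H + D) maps [h_prev, x] to the stacked gate pre-activations,
    # ordered here as (forget, input, candidate, output).
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])             # forget gate
    i = sigmoid(z[H:2 * H])         # input gate
    g = np.tanh(z[2 * H:3 * H])     # candidate cell values
    o = sigmoid(z[3 * H:4 * H])     # output gate
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

def lstm_forward(xs, W, b, hidden_size):
    # Run the cell over a sequence, starting from zero hidden and cell states.
    h, c = np.zeros(hidden_size), np.zeros(hidden_size)
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
        hs.append(h)
    return np.stack(hs), c

rng = np.random.default_rng(0)
D, H, T = 3, 4, 5                   # input size, hidden size, sequence length
W = 0.1 * rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)
xs = rng.normal(size=(T, D))
hs, c_last = lstm_forward(xs, W, b, H)
print(hs.shape)
```

The additive update `c = f * c_prev + i * g` is what lets the cell state carry information across many steps when the forget gate stays close to 1.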

- Unsupervised Learning of Word Representations
- Slides on Language Representation and Modeling, Kapil Thadani, 2017.
- A Neural Probabilistic Language Model, Bengio, Ducharme, Vincent, and Jauvin, JMLR 2003.
- Natural Language Processing (Almost) from Scratch, Collobert, Weston, Bottou, Karlen, Kavukcuoglu, and Kuksa, JMLR 2011.
- Distributed Representations of Words and Phrases and their Compositionality, Mikolov, Sutskever, Chen, Corrado, and Dean, NIPS 2013.
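A minimal sketch of the skip-gram objective with negative sampling from Mikolov et al.: extracting (center, context) pairs from a window, and one SGD step on the loss -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c). The paper samples negatives from the unigram distribution raised to the 3/4 power; here the negatives are a fixed list purely for illustration, and the vocabulary size, dimension, and learning rate are assumptions.

```python
import numpy as np

def skipgram_pairs(tokens, window=2):
    # All (center, context) pairs whose positions differ by at most `window`.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_loss(V_in, V_out, center, context, negatives):
    # -log s(u_o . v_c) - sum_k log s(-u_k . v_c)
    v = V_in[center]
    loss = -np.log(sigmoid(V_out[context] @ v))
    for k in negatives:
        loss -= np.log(sigmoid(-V_out[k] @ v))
    return loss

def sgns_step(V_in, V_out, center, context, negatives, lr=0.05):
    # One SGD step on sgns_loss; the center vector is frozen during the loop
    # and updated once at the end.
    v = V_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(k, 0.0) for k in negatives]:
        u = V_out[word].copy()
        g = sigmoid(u @ v) - label       # derivative of the loss w.r.t. the score u . v
        grad_v += g * u
        V_out[word] -= lr * g * v
    V_in[center] -= lr * grad_v

print(skipgram_pairs([0, 1, 2, 3, 4], window=1))

rng = np.random.default_rng(0)
vocab_size, dim = 5, 8
V_in = 0.1 * rng.normal(size=(vocab_size, dim))    # "input" (center) vectors
V_out = 0.1 * rng.normal(size=(vocab_size, dim))   # "output" (context) vectors
center, context, negatives = 0, 1, [3, 4]
before = sgns_loss(V_in, V_out, center, context, negatives)
for _ in range(100):
    sgns_step(V_in, V_out, center, context, negatives)
after = sgns_loss(V_in, V_out, center, context, negatives)
print(f"loss {before:.3f} -> {after:.3f}")
```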

- Shallow vs. Deep Learning
- On the Number of Linear Regions of Deep Neural Networks, Montufar, Pascanu, Cho, and Bengio, NIPS 2014.
- The power of deeper networks for expressing natural functions, Rolnick and Tegmark, ICLR 2018.
- Why does deep and cheap learning work so well?, Lin and Tegmark, CoRR 2016.
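One way to see the depth-versus-width effect discussed in these papers is a standard one-dimensional "sawtooth" construction, sketched here in plain Python as an illustration rather than as the construction used in any specific paper. A ReLU layer with two units computes the tent map on [0, 1]; composing it `depth` times gives 2^depth linear pieces, while a single hidden layer of m ReLUs on a scalar input has at most m + 1 pieces. Piece counting on a dyadic grid is exact in floating point as long as `resolution_bits` exceeds `depth`.

```python
def relu(x):
    return max(0.0, x)

def tent(x):
    # One ReLU layer with two units: 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, depth):
    # Composing the tent map `depth` times = a ReLU network of width 2.
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(f, resolution_bits=12):
    # Evaluate f on the grid i / 2^bits over [0, 1] and count slope changes.
    n = 2 ** resolution_bits
    xs = [i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)

for depth in range(1, 6):
    print(depth, count_linear_pieces(lambda x: deep_sawtooth(x, depth)))
```

With width 2 and depth k the network has about 2k ReLUs yet 2^k linear pieces; matching that with one hidden layer would require at least 2^k - 1 units.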

- Limitations of Deep Learning?

- Assignment and code.
- Assignment and code.
- Assignment, code and data.
- Assignment, code and data.
- Assignment, code, word2vec Google News embeddings, and the Stanford Natural Language Inference (SNLI) dataset.
- Reasoning about entailment with neural attention, Rocktäschel et al., ICLR 2016.
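A hedged NumPy sketch of additive (tanh) attention, loosely following the sentence-level attention in Rocktäschel et al.: premise output vectors are scored against the final hypothesis state, the scores are normalized with a softmax, and the premise vectors are averaged with those weights. The parameter names `W_y`, `W_h`, and `w` are hypothetical, the sizes are arbitrary, and the full model in the paper has further components (e.g. word-by-word attention and a final combination layer) not shown here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()

def additive_attention(Y, h, W_y, W_h, w):
    # Y: (k, L) premise output vectors as columns; h: (k,) hypothesis state.
    M = np.tanh(W_y @ Y + (W_h @ h)[:, None])   # (k, L); h broadcast over positions
    alpha = softmax(w @ M)                      # (L,) attention weights
    r = Y @ alpha                               # (k,) weighted summary of the premise
    return r, alpha

rng = np.random.default_rng(0)
k, L = 4, 6                           # hidden size, premise length
Y = rng.normal(size=(k, L))
h = rng.normal(size=k)
W_y, W_h = rng.normal(size=(k, k)), rng.normal(size=(k, k))
w = rng.normal(size=k)
r, alpha = additive_attention(Y, h, W_y, W_h, w)
print(np.round(alpha, 3), alpha.sum())
```

Since the weights are non-negative and sum to 1, `r` is a convex combination of the premise vectors, so it always lies within their range coordinate by coordinate.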

- James H. Martin's Introduction to probabilities
- Jason Eisner's equestrian Introduction to probabilities
- Inderjit Dhillon's Linear Algebra Background
- MIT instructor's Introduction to Matrices and Linear Algebra Review
- Strang's Video Lectures on Linear Algebra
- Convex Optimization, Stephen Boyd and Lieven Vandenberghe, Cambridge University Press 2004
- Mike Brookes' Matrix Reference Manual
- Petersen et al.'s The Matrix Cookbook