Here’s a selection of impactful work that caught my eye, grouped into categories:
Language models as knowledge bases? Facebook and UCL.
This paper investigates whether pre-trained language models build up their own relational knowledge bases that can serve as question-answering systems. The authors find that “(i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches.”
Scaling Laws for Neural Language Models.
This is one of the few papers that provide both empirical evidence and theory on how neural network performance scales. The authors focus on transformer models and show that performance improves as a power law in the amount of data, the number of model parameters, and the amount of training compute.
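To make the power-law claim concrete, here is a minimal sketch that is not taken from the paper. It assumes a loss of the form L(N) = a * N^(-alpha), in which case log L is a straight line in log N and the exponent can be recovered with an ordinary least-squares fit in log-log space. The function name `fit_power_law`, the constants, and the synthetic data points are all illustrative assumptions.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha) via a line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    var = sum((u - mean_x) ** 2 for u in log_x)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -alpha; the intercept is log(a).
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from made-up constants (assumptions).
TRUE_A, TRUE_ALPHA = 10.0, 0.08
param_counts = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in param_counts]

a, alpha = fit_power_law(param_counts, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On real training runs the points would be noisy and the fit would only hold over the range where no other factor (data or compute) is the bottleneck.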
Towards a Human-like Open-Domain Chatbot, Google AI.
This paper presents “Meena”, an Evolved Transformer model with 2.6 billion parameters, trained on 341 GB of text filtered from public-domain social media conversations. Compared with OpenAI’s GPT-2, this neural conversational model has 1.7x the capacity and is trained on 8.5x more data. The authors describe a new evaluation metric called Sensibleness and Specificity Average (SSA), which captures basic but important attributes of natural conversation as judged by crowd workers. They show that perplexity, which is computed automatically from the model, correlates well with SSA and so provides a quicker evaluation method than polling crowd workers.
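The two metrics in this summary can be sketched in a few lines. This is an illustration rather than the authors' evaluation code: `perplexity` is the standard exponential of the average negative log-likelihood per token, and `ssa` averages the fraction of responses labelled sensible with the fraction labelled specific. Counting a response as specific only when it is also sensible is an assumption about the labelling protocol, and the labels and probabilities below are invented.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ssa(labels):
    """labels: one (sensible, specific) pair of booleans per response."""
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return (sensibleness + specificity) / 2


# A model that assigns probability 0.25 to every token has perplexity 4.
log_probs = [math.log(0.25)] * 4
ppl = perplexity(log_probs)

# Four invented crowd-worker judgements.
labels = [(True, True), (True, False), (False, False), (True, True)]
score = ssa(labels)
print(ppl, score)
```

Perplexity needs only the model and held-out text, which is why a strong correlation with SSA makes it attractive as a cheap proxy.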
Microsoft released Turing-NLG, a 17 billion parameter language model capable of generation, Q&A, and summarisation. They show improved performance over GPT-2 and Megatron.
Then, SambaNova published a blog post saying that they’d trained a 100 billion parameter language model on their Dataflow-optimised compute system. They suggest that it is conceivable to run a 1 trillion parameter model soon (!).
Transfusion: Understanding Transfer Learning for Medical Imaging, Google Research.
This paper studies the effect of transfer learning on medical imaging model performance. The authors find that pre-training on ImageNet doesn’t actually improve model performance on diagnosing diabetic retinopathy or classifying lung disease from chest X-rays.
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data, Uber.
The authors seek to speed up neural architecture search (NAS), an exciting approach to automatically designing neural networks with high predictive performance. They do so by using a generator (from a GAN) to synthesize artificial training data, and show that training a candidate architecture on this unbounded synthetic data for a few steps predicts whether it will perform well on real data. This means you can evaluate lots of neural architectures on synthetic data to find the right one and then move to real data to complete training.
On the relationship between self-attention and convolutional layers.
Attention-based neural networks, which have taken NLP by storm due to their ability to model sequences, have also been shown to match CNNs on computer vision tasks. As a result, this paper explores whether learned attention layers operate similarly to convolutional layers. They show that multi-head self-attention layers attend to pixel-grid patterns similarly to CNN layers.
Turning any 2D photo into 3D using CNNs, Facebook AI.
The authors used neural architecture search to design a CNN that fits within mobile phone resource constraints, and trained it on millions of pairs of 3D images and their accompanying depth maps.
EfficientDet: Scalable and Efficient Object Detection, Google Brain.
The authors present several optimizations to neural architecture search to develop the EfficientDet family of object detector models. On the COCO dataset, their best model is more accurate than the next best while using 4x fewer parameters and 13x fewer FLOPs.
Predicting the future.
This is a cool paper that addresses the biggest problem in self-driving: predicting how a given scene will evolve and planning accordingly. The approach involves learning a distribution over future events, trained from observed future sequences. A second distribution, which only has access to past data, is then learned to reflect the present. During inference, the model jointly predicts future scene representations (semantic segmentation, depth, and optical flow).
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction, Google Brain.
As the next step in their work on learning world models, the authors test whether RL agents can learn a world model through a messier and slower process of evolution instead of by minimizing a forward-predictive loss. To do so, they artificially limit the probability that an agent is allowed to observe its real environment at each step. As a result, the agent has to fill in its observation gaps, and in doing so it builds out its world model. Even though the agent has not been explicitly trained to predict the future, the resulting world model lets it display the key skills it needs to succeed in that environment.
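The observation-gating idea can be sketched as a rollout loop. This is a toy illustration under stated assumptions, not the authors' code: `rollout`, `peek_prob`, the one-dimensional environment, and the hand-written "world models" are all hypothetical stand-ins. In the paper the world model is learned, and only task performance, never a prediction loss, shapes it.

```python
import random


def rollout(env_step, world_model, policy, obs0, steps, peek_prob, rng):
    """Run an episode in which the real observation is revealed only with
    probability peek_prob; otherwise the agent sees its own model's guess."""
    real_obs = obs0
    seen_obs = obs0
    n_peeks = 0
    for _ in range(steps):
        action = policy(seen_obs)              # act on what the agent believes
        real_obs = env_step(real_obs, action)  # the true environment advances
        if rng.random() < peek_prob:
            seen_obs = real_obs                # allowed to look at reality
            n_peeks += 1
        else:
            seen_obs = world_model(seen_obs, action)  # fill the gap internally
    return real_obs, seen_obs, n_peeks


# Toy 1-D world: the state is a position and the action is a step of +1 or -1.
def env_step(pos, action):
    return pos + action


def perfect_model(pos, action):
    return pos + action


def broken_model(pos, action):
    return pos  # ignores the action entirely


def policy(pos):
    return 1 if pos < 5 else -1  # walk towards position 5


blind_good = rollout(env_step, perfect_model, policy, 0, 10, 0.0, random.Random(0))
blind_bad = rollout(env_step, broken_model, policy, 0, 10, 0.0, random.Random(0))
sighted = rollout(env_step, broken_model, policy, 0, 10, 1.0, random.Random(0))
print(blind_good, blind_bad, sighted)
```

With no peeks at all, the agent with the accurate model still hovers around the goal, while the agent with the broken model believes it never moved and walks straight past it. That gap in task performance is the only pressure that pushes the internal model towards being useful.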
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video, University of Freiburg.
Designing reward functions that enable agents to learn desired behaviors is challenging in the real world. Unsupervised learning can help. This paper presents an approach to learning a task-agnostic skill embedding space from unlabeled multiview videos. The authors show that the learned embedding can guide an RL agent to solve a wide range of tasks by composing previously unseen skills.
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference, Google Research.
The authors propose a training architecture that increases the number of frames per second an RL agent can learn from on a given amount of computing resources. Here, a central learner handles both training and neural network inference on GPUs, while actors spread across hundreds of machines only step their environments and send observations back. The result is a significant speed-up in wall-clock time and computational efficiency over existing methods like IMPALA that use CPUs for neural network inference.
Systems and methods
Causal Discovery from Incomplete Data: A Deep Learning Approach, Eindhoven University of Technology and MIT.
This paper addresses the problem of learning causal structure when data are missing. The authors propose a deep learning framework called Imputated Causal Learning (ICL) that iterates between missing data imputation and causal structure discovery, producing both imputed data and causal skeletons. The paper presents simulations on both synthetic and real data to show that ICL can outperform state-of-the-art methods under different missing data mechanisms.
Science (bio, health, etc.)
A Deep Learning Approach to Antibiotic Discovery, MIT and Harvard.
This paper is exciting because it demonstrates how deep learning on molecular graphs can be used to predict molecules with antibacterial activity. They use this approach to screen a large pool of chemical molecules and discover a molecule, Halicin, that is structurally divergent from conventional antibiotics. Halicin displays bactericidal activity against a wide phylogenetic spectrum of pathogens including the bacterium that causes tuberculosis.
Learning to grow: control of materials self-assembly using evolutionary reinforcement learning, Lawrence Berkeley National Lab and Vector Institute.
The authors study molecular self-assembly, a process by which molecules or nanoparticles naturally come together into ordered structures. Today, if we’re given a set of molecules, conditions and a time period, it is not possible to predict the structure, phase, and yield of the structures that will form as a result. This paper shows how neuroevolutionary RL can learn a network that can “enact a time-dependent protocol of temperature and chemical potential in order to promote the self-assembly of the desired structure or choose between two competing polymorphs. In both cases the network identifies strategies different from those informed by human intuition, but which can be analyzed and used to provide new insight.”
Detection of anaemia from retinal fundus images via deep learning, Google Health and Google Research.
Anaemia manifests as a reduction in red blood cell or haemoglobin count, and the condition affects an estimated 1.6B people worldwide. Testing for it requires a blood draw. This paper shows that anaemia can instead be screened for routinely using non-invasive retinal fundus images.
Machine learning on DNA-encoded libraries: A new paradigm for hit-finding, Google Applied Science, X-Chem, ZebiAI, and Cognitive Dataworks.
This paper is great. It focuses on DNA-encoded libraries (DELs), a powerful technique for large-scale screening of small molecules in drug discovery. The technique works by barcoding chemical molecules, mixing them all together with a drug target of interest (e.g. a protein), then deconvoluting which molecules bound to the target by using next-generation sequencing and barcode counting. The paper throws ML into the mix by training a graph CNN model on round 1 of a DEL experiment that identifies which chemicals bind to a target. The model is then used to virtually screen large libraries (approx. 88M compounds) to predict which molecules are worth testing empirically in the next round of the DEL experiment. The authors report hit rates of between 29% and 72%, compared to 1% hit rates in non-ML guided DEL experiments. Air Street Capital has made an investment in a related company called Anagenex.
Unified rational protein engineering with sequence-based deep representation learning, Harvard and MIT.
This paper applies techniques from representation learning in NLP to proteins. The authors train LSTMs to learn statistical representations of proteins from approx. 24 million unlabelled amino acid sequences. The model summarises arbitrary protein sequences into fixed-length vectors that approximate fundamental protein features (function, stability, secondary structure). They show that these representations can be used to predict the structural and functional properties of proteins. While having solved 3D protein structures is a gold standard for developing new proteins, this approach should help accelerate things! Code here
International evaluation of an AI system for breast cancer screening, Google Health et al.
This paper reports a large-scale screening mammography trial in the US and UK. It shows that deep learning models can predict biopsy-confirmed cancer cases and bring about an absolute reduction in false positives and false negatives. This could reduce the workload of a second reviewer (in the UK’s two-reviewer system) by 88%. After its release, the paper drew criticism from some physicians, who argue that predicting biopsy-confirmed cancer isn’t the point of screening, which is to find more curable cancers. More criticism came because the work offers neither a detailed methods section nor open-source code, which impedes reproducibility and transparency. Another paper, from an NYU group and published around the same time, evaluated CNNs on over 1M breast cancer images. The code and trained models are available here
Learning to Simulate Complex Physics with Graph Networks.
This paper shows how to learn a realistic simulator of complex physics. This is done by representing the state of a physical system with particles, expressed as nodes in a graph, and computing dynamics via learned message-passing. They show how this model accurately simulates fluids, rigid solids, and deformable materials interacting with one another.
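The particles-as-nodes idea can be sketched as a single message-passing step. This is a minimal illustration under stated assumptions, not the paper's model: in the paper the edge and node update functions are learned networks, whereas here `message` is a hand-written spring-like pull used as a stand-in, and `build_edges`, `step`, the connectivity radius, and the time step are all hypothetical choices.

```python
def build_edges(positions, radius):
    """Connect every ordered pair of distinct particles closer than radius."""
    edges = []
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 < radius ** 2:
                edges.append((j, i))  # message flows from sender j to receiver i
    return edges


def message(sender_pos, receiver_pos, stiffness=1.0):
    """Stand-in for a learned edge function: pull the receiver to the sender."""
    return (stiffness * (sender_pos[0] - receiver_pos[0]),
            stiffness * (sender_pos[1] - receiver_pos[1]))


def step(positions, velocities, radius=1.5, dt=0.1):
    """One round of message passing followed by an Euler update."""
    aggregated = [(0.0, 0.0) for _ in positions]
    for sender, receiver in build_edges(positions, radius):
        mx, my = message(positions[sender], positions[receiver])
        ax, ay = aggregated[receiver]
        aggregated[receiver] = (ax + mx, ay + my)  # sum incoming messages
    # Treat the aggregated message as an acceleration for each particle.
    new_velocities = [(vx + dt * ax, vy + dt * ay)
                      for (vx, vy), (ax, ay) in zip(velocities, aggregated)]
    new_positions = [(x + dt * vx, y + dt * vy)
                     for (x, y), (vx, vy) in zip(positions, new_velocities)]
    return new_positions, new_velocities


# Two nearby particles interact; the third is out of range and stays put.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
velocities = [(0.0, 0.0)] * 3
new_positions, new_velocities = step(positions, velocities)
print(new_positions)
```

Because edges are rebuilt from the current positions at every step, the same update rule applies regardless of how many particles there are or how they are arranged, which is part of what makes the graph formulation attractive for simulation.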
A Survey of Deep Learning for Scientific Discovery.
This is an overview of “many widely used deep learning models, spanning visual, sequential and graph-structured data, associated tasks and different training methods, along with techniques to use deep learning with fewer data and better interpret these complex models”, which the authors call two central considerations for many scientific use cases.
Other papers and posts
A review of several cool NeurIPS 2019 papers here
Deep learning learns to solve math problems here
Graphcore Research directions in 2020 here
Reliance on metrics is a fundamental challenge in ML here
On the Measure of Intelligence here. This work argues that task-based, metric-driven development in AI is not a rigorous path towards developing intelligent systems.