April 5 · Issue #40
Monthly analysis of AI technology, geopolitics, research, and startups.
I hope this reaches you safe and well at 🏡. Following on from last Sunday’s guide to AI in Q1 2020 (part 1 of 2), here is part 2! In this edition we’ll focus on AI research (NLP, vision, RL, science, systems) and startup activity (investments and M&A).
every 2-4 weeks! This past Thursday we moved our meetup online and hosted hundreds of viewers for talks from PolyAI, Graphcore, and ZOE/KCL (in fact, Tim went live on CNBC right after to spread the word about the COVID Symptom Tracker). Join us on the Facebook group here and please hit reply if you’re interested in discussing your research or applied AI work at a future event.
Here’s a selection of impactful work that caught my eye, grouped in categories:
Language models as knowledge bases? Facebook and UCL.
This paper investigates whether pre-trained language models build up their own relational knowledge bases that can serve as question/answer systems. They find that “(i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches.”
Scaling Laws for Neural Language Models.
This is one of the few papers that provides both empirical evidence and theory on how neural network performance scales. The authors focus on transformer models and show that performance improves as a power law in dataset size, number of model parameters, and amount of training compute.
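To make the power-law claim concrete, here is a minimal sketch of how such an exponent can be read off: a power law is a straight line in log-log space, so a least-squares line fit recovers it. The function name, the parameter counts, and the constants below are made up for illustration and are not taken from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) with a least-squares line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y)) / sum(
        (a - mean_x) ** 2 for a in log_x
    )
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic losses generated from an assumed law (illustrative constants only).
true_alpha, true_c = 0.08, 12.0
param_counts = [1e6, 1e7, 1e8, 1e9]
losses = [true_c * n ** (-true_alpha) for n in param_counts]

alpha_hat, c_hat = fit_power_law(param_counts, losses)
```

On real training runs the points would be noisy and the fit would only hold over a limited range, so the recovered exponent is best treated as an estimate.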
Towards a Human-like Open-Domain Chatbot, Google AI.
This paper presents “Meena”, an evolved transformer model that has 2.6 billion parameters and is trained on 341 GB of text filtered from public domain social media conversations. This neural conversational model has 1.7x the capacity of OpenAI’s GPT-2 and is trained on 8.5x as much data. The authors describe a new evaluation metric called Sensibleness and Specificity Average (SSA), which captures basic but important attributes of natural conversation as judged by crowd workers. They show that perplexity, which can be computed automatically from the model itself, correlates well with SSA, thus providing a quicker evaluation method than polling crowd workers.
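As a rough illustration of the two quantities being compared, the sketch below computes perplexity from per-token probabilities and SSA as the average of two crowd-worker label rates. The function names and the toy labels are hypothetical, and the paper’s exact labelling protocol is not reproduced here.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability the model gave each true token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

def ssa(sensible_labels, specific_labels):
    """Average of the fraction of responses labelled sensible and the fraction labelled specific."""
    sensibleness = sum(sensible_labels) / len(sensible_labels)
    specificity = sum(specific_labels) / len(specific_labels)
    return (sensibleness + specificity) / 2

# A model that assigns probability 0.25 to every true token is as uncertain
# as a uniform choice among four options.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

# Toy crowd-worker labels for four responses (made up).
toy_ssa = ssa([True, True, False, True], [True, False, False, True])
```

Perplexity needs only the model and held-out text, which is why a strong correlation with a human-rated metric like SSA is useful in practice.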
Microsoft released Turing-NLG, a 17 billion parameter language model capable of generation, Q&A, and summarisation. They show improved performance over GPT-2 and Megatron.
Then, SambaNova published a blog post saying that they’d trained a 100 billion parameter language model on their Dataflow-optimised compute system. They suggest that it is conceivable to run a 1 trillion parameter model soon (!).
Transfusion: Understanding Transfer Learning for Medical Imaging, Google Research.
This paper studies the effect of transfer learning on medical imaging model performance. They find that pre-training on ImageNet does little to improve model performance on diagnosing diabetic retinopathy or classifying lung disease from chest X-rays.
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data, Uber.
The authors seek to speed up neural architecture search (NAS), an exciting approach to automatically designing neural networks with high predictive performance. They do so by using a generator network (similar in form to the generator in a GAN, but trained without a discriminator) to synthesize artificial training data. They show that training a candidate architecture on this unbounded synthetic data for a few steps predicts whether that architecture will perform well on real data. This means you can evaluate lots of neural architectures on synthetic data to find the right one and then move to real data to complete training.
On the relationship between self-attention and convolutional layers.
Attention-based neural networks, which have taken NLP by storm due to their ability to model sequences, have also been shown to match CNNs on computer vision tasks. As a result, this paper explores whether learned attention layers operate similarly to convolutional layers. They show that multi-head self-attention layers attend to pixel-grid patterns similarly to CNN layers.
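For readers who have not seen self-attention applied to images, here is a minimal single-head sketch over a flattened 2x2 pixel grid. It uses the pixel features directly as queries, keys, and values, standing in for the learned projections a real layer would have; the names and the toy features are assumptions for illustration only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: every pixel attends to every pixel."""
    scale = math.sqrt(len(keys[0]))
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights

# A 2x2 image flattened to four pixels with 2-d features (made-up values).
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs, weights = self_attention(pixels, pixels, pixels)

# Reshape one row of weights back onto the grid to see where pixel 0 looks.
attention_map_for_pixel_0 = [weights[0][0:2], weights[0][2:4]]
```

Each row of `weights` is a distribution over all pixel positions. A convolution would instead restrict each pixel to a fixed local neighbourhood, which is the pattern the paper reports attention heads learning to approximate.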
Turning any 2D photo into 3D using CNNs, Facebook AI.
The authors used neural architecture search to design a CNN that fits within mobile phone resource constraints, and trained it on millions of pairs of 3D images and their accompanying depth maps.
EfficientDet: Scalable and Efficient Object Detection, Google Brain.
The authors present several optimizations to neural architecture search to develop the EfficientDet family of object detector models. Their best model is more accurate than the next best on the COCO dataset while using 4x fewer parameters and 13x fewer FLOPs.
Predicting the future.
This is a cool paper that addresses the biggest problem in self-driving: predicting how a given scene will evolve and planning accordingly. The approach involves learning a model for the probability of future events, trained from observed future sequences. Then, they learn a second distribution to reflect the present world, which only has access to past data. During inference, they jointly predict future scene representations (semantic segmentation, depth, and optical flow).
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction, Google Brain.
As the next step in their work on learning world models, the authors test whether RL agents can learn a world model through the messier and slower process of evolution instead of by minimizing a forward-predictive loss. To do so, they artificially constrain the probability that an agent is allowed to observe its real environment at each training step. As a result, the agent has to fill in its observation gaps to build out its world model. Even though the agent has not been explicitly trained to predict the future, the resulting world model allows the agent to display the key skills it needs to succeed in that environment.
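The constraint on observations can be pictured with a tiny sketch: at each step the agent receives the real observation only with some probability, and otherwise has to act on its own world model’s guess. The function name, the `peek_prob` parameter, and the string placeholders are hypothetical and are not taken from the paper’s code.

```python
import random

def choose_observation(real_obs, predicted_obs, peek_prob, rng):
    """Return the real observation with probability peek_prob, else the world model's guess."""
    return real_obs if rng.random() < peek_prob else predicted_obs

# With a low peek probability the agent mostly acts on its own predictions.
rng = random.Random(0)
seen = [
    choose_observation("real", "imagined", peek_prob=0.1, rng=rng)
    for _ in range(1000)
]
real_fraction = seen.count("real") / len(seen)
```

Because the policy is rewarded only for task performance, any predictive ability in the world model arises as a side effect of having to cope with the missing observations.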
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video, University of Freiburg.
Designing reward functions that enable agents to learn desired behaviors is challenging in the real world. Unsupervised learning can help. This paper presents an approach to learn a task-agnostic skill embedding space from unlabeled multiview videos. They show that the learned embedding can guide an RL agent to solve a wide range of tasks by composing previously unseen skills.
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference, Google Research.
The authors propose a training architecture that increases the number of frames per second an RL agent can learn from for a given amount of computing resources. Here, a central learner runs both training and neural network inference on accelerators, while actors distributed across hundreds of machines step their environments and send observations to it. The result is a significant speed-up in wall-clock time and computational efficiency over existing methods like IMPALA that use CPUs for neural network inference.
Systems and methods
Causal Discovery from Incomplete Data: A Deep Learning Approach, Eindhoven University of Technology and MIT.
This paper addresses the problem of learning causality in situations with missing data. They propose a deep learning framework called Imputated Causal Learning (ICL) for iterative missing data imputation and causal structure discovery, producing both imputed data and causal skeletons. The paper presents simulations on both synthetic and real data to show that ICL can outperform state-of-the-art methods under different missing data mechanisms.
Science (bio, health, etc.)
A Deep Learning Approach to Antibiotic Discovery, MIT and Harvard.
This paper is exciting because it demonstrates how deep learning on molecular graphs can be used to predict molecules with antibacterial activity. They use this approach to screen a large pool of chemical molecules and discover a molecule, Halicin, that is structurally divergent from conventional antibiotics. Halicin displays bactericidal activity against a wide phylogenetic spectrum of pathogens including the bacterium that causes tuberculosis.
Learning to grow: control of materials self-assembly using evolutionary reinforcement learning, Lawrence Berkeley National Lab and Vector Institute.
The authors study molecular self-assembly, a process by which molecules or nanoparticles naturally come together into ordered structures. Today, if we’re given a set of molecules, conditions and a time period, it is not possible to predict the structure, phase, and yield of the structures that will form as a result. This paper shows how neuroevolutionary RL can learn a network that can “enact a time-dependent protocol of temperature and chemical potential in order to promote the self-assembly of the desired structure or choose between two competing polymorphs. In both cases the network identifies strategies different from those informed by human intuition, but which can be analyzed and used to provide new insight.”
Detection of anaemia from retinal fundus images via deep learning, Google Health and Google Research.
Anaemia manifests as a reduction in the red blood cell or hemoglobin count, and the condition affects an estimated 1.6B people worldwide. Testing for it requires a blood draw. This paper shows that anaemia can instead be screened for regularly using non-invasive retinal fundus images.
Machine learning on DNA-encoded libraries: A new paradigm for hit-finding, Google Applied Science, X-Chem, ZebiAI, and Cognitive Dataworks.
This paper is great. It focuses on DNA-encoded libraries (DELs), a powerful technique for large-scale screening of small molecules in drug discovery. The technique works by barcoding chemical molecules, mixing them all together with a drug target of interest (e.g. a protein), then deconvoluting which molecules bound to the target by using next-generation sequencing and barcode counting. The paper throws ML into the mix by training a graph CNN model on round 1 of a DEL experiment, which identifies the chemicals that bind to a target. The model is then used to virtually screen large libraries (approx. 88M compounds) to predict which molecules are worth testing empirically in the next round of DEL experiments. The authors report hit rates between 29% and 72%, compared to 1% hit rates in non-ML guided DEL experiments. Air Street Capital has made an investment in a related company called Anagenex.
Unified rational protein engineering with sequence-based deep representation learning, Harvard and MIT.
This paper applies techniques from representation learning in NLP to proteins. The authors train LSTMs to learn statistical representations of proteins from approx. 24 million unlabelled amino acid sequences. The model summarises arbitrary protein sequences into fixed-length vectors that approximate fundamental protein features (function, stability, secondary structure). They show that these representations can be used to predict the structural and functional properties of proteins. While having solved 3D protein structures is a gold standard for developing new proteins, this approach should help accelerate things! Code here.
International evaluation of an AI system for breast cancer screening, Google Health et al.
This paper reports a large-scale screening mammography trial in the US and UK. It shows that deep learning models can predict biopsy-confirmed cancer cases and bring about an absolute reduction in false positives and false negatives. This could reduce the workload of a second reviewer (in the UK’s two-reviewer system) by 88%. After its release, the paper drew criticism from some physicians who argue that predicting biopsy-confirmed cancer isn’t the point of screening, which is to find more curable cancers. More criticism came because the work offers neither a detailed methods section nor open-source code, which is an impediment to reproducibility and transparency. Another paper, from an NYU group and published around the same time, evaluated CNNs on over 1M breast cancer images. The code and trained models are available here.
Learning to Simulate Complex Physics with Graph Networks.
This paper shows how to learn a realistic simulator of complex physics. This is done by representing the state of a physical system with particles, expressed as nodes in a graph, and computing dynamics via learned message-passing. They show how this model accurately simulates fluids, rigid solids, and deformable materials interacting with one another.
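To show what a round of message-passing over particles looks like, here is a deliberately tiny sketch with scalar particle states. The hand-written edge and node functions, and the single `weight` parameter, are stand-ins for the learned networks in the paper and are assumptions made purely for illustration.

```python
def message_passing_step(states, edges, weight):
    """One round of message passing.

    states: one float per particle (node).
    edges: list of (sender, receiver) index pairs between nearby particles.
    weight: stand-in for the parameters a learned edge function would have.
    """
    aggregated = [0.0] * len(states)
    for sender, receiver in edges:
        # Edge function: the message depends on both endpoints of the edge.
        aggregated[receiver] += weight * (states[sender] - states[receiver])
    # Node function: each particle updates its state from the summed messages.
    return [state + agg for state, agg in zip(states, aggregated)]

# Three particles in a chain, connected in both directions.
states = [0.0, 1.0, 4.0]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
new_states = message_passing_step(states, edges, weight=0.5)
# → [0.5, 2.0, 2.5]
```

Stacking several such rounds lets information travel further across the particle graph. In the learned setting both functions are neural networks trained on simulation trajectories.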
A Survey of Deep Learning for Scientific Discovery.
This is an overview of “many widely used deep learning models, spanning visual, sequential and graph-structured data, associated tasks and different training methods, along with techniques to use deep learning with fewer data and better interpret these complex models”, which the authors describe as two central considerations for many scientific use cases.
Other papers and posts
A review of several cool NeurIPS 2019 papers here
Deep learning learns to solve math problems here
Graphcore Research directions in 2020 here
Reliance on metrics is a fundamental challenge in ML here
On the Measure of Intelligence here. This work argues that task-based, metric-driven development in AI is not a rigorous path towards developing intelligent systems.
Here’s a highlight of the most intriguing financing rounds:
A US-based provider of enterprise robotic process automation software raised a $290M Series B round at a post-money valuation of $6.8B. Like others in the market, AA is pushing its marketplace ecosystem of RPA bots and third-party integrations to drive vendor lock-in.
A maker of warehouse robots and software focused on automating pick/place/parcel movement raised a $263M Series B led by SoftBank.
A San Diego-based provider of checkout-free stores raised a $30M Series A from SoftBank.
A producer of recycling sorting robots raised a $16M Series A led by Sequoia.
A London-based provider of cloud-native core banking software raised an $83M Series B led by Draper.
A US/Indian startup offering sales call analysis and coaching raised a $26M Series A.
A US-based provider of a call center agent voice training system raised a $75M round led by Goldman Sachs.
A China-based maker of AV robots for delivery in the style of Nuro.ai raised a 200M RMB Series A.
The Bristol-based developer of the Intelligence Processing Unit raised an additional $150M to add to its $200M Series D closed last year.
The authors of a leading open-source NLP library called Transformers raised a $15M Series A led by Lux Capital.
An AI-first climate-focused startup raised a $4.1M Seed round.
An AI-first therapeutics discovery company raised a $40M Series B.
A developer of a symptom checking app with telehealth services raised a $48M Series C.
The makers of a specialized AI chipset raised a $250M Series C led by BlackRock.
A developer of super realistic human avatars raised a $40M Series B.
The London-based self-driving software company raised a $41M Series B to go to market with B2B software products instead of offering a B2C self-driving ride-sharing service.
A London-based employee compliance monitoring software company raised a $100M round led by SoftBank Vision Fund 2.
The Israeli AI chipmaker focused on edge computing raised a $60M Series B.
AI chipmaker Habana was acquired for $2B and will remain an independent unit.
100-person data preparation startup Paxata was acquired by a buyer looking to eat earlier steps of the ML pipeline. The price was not disclosed. Paxata had raised some $90M in venture financing.
Latent Logic was acquired for an undisclosed sum by a buyer expanding its engineering effort into Oxford, UK. The startup was led by Shimon Whiteson, Professor at the University of Oxford, whose work spanned from multi-agent systems to inverse RL on video data to learn safe driving. This would help develop human-inspired driving behaviors (more on this from Lyft here).
Snap acquired AI Factory, a computer vision startup they’d been working with to create Snap’s Cameos feature, for $166M. AI Factory’s founder had previously built Looksery and sold it to Snap in 2015, which kick-started Snap’s facial filter features.
Apple acquired a few companies:
- The big deal was for Seattle-based Xnor.ai, which was acquired for a reported $200M. Xnor.ai was building low-power, edge-based AI chips. It spun out of the Allen Institute for AI and was led by Ali Farhadi, Associate Professor at the University of Washington.
- Dark Sky, a hyperlocal weather app, was acquired. The app will survive on iOS but is shut down on Android and elsewhere. Users aren’t happy. More on how the system works.
- Voysis, an Irish speech and NLP startup that had generated press attention around speech synthesis and making WaveNet work on a very small footprint, was acquired.
DocuSign acquired Seal Software for $188M in cash. Seal’s product was an AI-driven contract analysis tool that makes it simpler and faster to find, analyze, and extract data from contracts.
Square made an acquisition for an undisclosed sum. The team will expand Square’s efforts to infuse ML across its product lines.
Nathan Benaich, 5 April 2020
Air Street Capital is a venture capital firm that invests in AI-first technology and life science companies. We’re a team of experienced investors, engineering leaders, entrepreneurs and AI researchers from the world’s most innovative technology companies and research institutions.