Dense Passage Retrieval for Open-Domain Question Answering

November 16, 2020. By: Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih

Abstract: Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder […]


Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions

Abstract: A grammatical gender system divides a lexicon into a small number of relatively fixed grammatical categories. How similar are these gender systems across languages? To quantify the similarity, we define gender systems extensionally, thereby reducing the problem of comparisons between languages’ gender systems to cluster evaluation. We borrow a rich inventory of statistical tools for cluster evaluation from the field of community detection (Driver and Kroeber, 1932; Cattell, 1945) that enable us to craft novel information-theoretic metrics for measuring […]


An Imitation Game for Learning Semantic Parsers from User Interaction

November 16, 2020. By: Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su

Abstract: Despite the widely successful applications, building a semantic parser is still a tedious process in practice with challenges from costly data annotation and privacy risks. We suggest an alternative, human-in-the-loop methodology for learning semantic parsers directly from users. A semantic parser should be introspective of its uncertainties and prompt for user demonstrations when uncertain. In doing so it also gets to imitate the user behavior […]


Generating Fact Checking Briefs

Abstract: Fact checking at scale is difficult: while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem. However, despite good intentions, contributions from volunteers are often error-prone, and thus in practice restricted to claim detection. We investigate how to increase the accuracy and efficiency of fact checking by providing information about the claim before performing the check, in the form of natural language briefs. We investigate passage-based briefs, containing […]


Measuring Systematic Generalization in Neural Proof Generation with Transformers

November 27, 2020. By: Nicolas Gontier, Koustuv Sinha, Siva Reddy, Christopher Pal

Abstract: We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate systematic generalization abilities on an inductive logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate logical proofs represented in natural […]


Deep Transformers with Latent Depth

Abstract: The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks. However, how to leverage model capacity with large or variable depths is still an open challenge. We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection. As an extension of this framework, we propose a novel method to train one shared Transformer network for multilingual machine translation with different layer selection posteriors for each language pair. The […]


Resource Constrained Dialog Policy Learning via Differentiable Inductive Logic Programming

Abstract: Motivated by the needs of resource constrained dialog policy learning, we introduce dialog policy via differentiable inductive logic (DILOG). We explore the tasks of one-shot learning and zero-shot domain transfer with DILOG on SimDial and MultiWoZ. Using a single representative dialog from the restaurant domain, we train DILOG on the SimDial dataset and obtain 99+% in-domain test accuracy. We also show that the trained DILOG zero-shot transfers to all other domains with 99+% accuracy, proving the suitability of DILOG […]


A Review of 2020 and Trends in 2021 – A Technical Overview of Machine Learning and Deep Learning!

Introduction: Data science is not a choice anymore. It is a necessity. 2020 is almost in the books now. What a crazy year from whichever standpoint you look at it. A pandemic raged around the world and yet it failed to dim the light on data science. The thirst to learn more continued unabated in our community and we saw some incredible developments and breakthroughs this year. From OpenAI’s mind-boggling GPT-3 framework to Facebook’s DETR model, this was a year […]


Issue #112 - Translating markup tags in Neural Machine Translation

17 Dec 2020. Author: Dr. Patrik Lambert, Senior Machine Translation Scientist @ Iconic

Introduction: Text to be translated is often encapsulated in structured documents containing inline tags in different formats, such as XML, HTML, Microsoft Word, PDF, XLIFF, etc. Transferring these inline tags into the target language is not a trivial task. However, it is a crucial component of the MT system, because correct tag placement ensures good readability of […]


Research at Microsoft 2020: Addressing the present while looking to the future

Microsoft researchers pursue the big questions about what the world will be like in the future and the role technology will play. Not only do they take on the responsibility of exploring the long-term vision of their research, but they must also be ready to react to the immediate needs of the present. This year in particular, they were asked to use their roles as futurists to address pressing societal challenges. In early 2020, as countries began responding to COVID-19 […]
