Articles About Natural Language Processing

Syntactic Nuclei in Dependency Parsing – A Multilingual Exploration

In the previous sections, we have shown how syntactic nuclei can be identified in the UD annotation and how transition-based parsers can be made sensitive to these structures in their internal representations through the use of nucleus composition. We now proceed to a set of experiments investigating the impact of nucleus composition on a diverse selection of languages. 5.1 Experimental Settings: We use UUParser (de Lhoneux et al., 2017; Smith […]

Read more

Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Figure 1 shows a high-level outline of our experimental design, which aims to establish whether injecting structure derived from a variety of syntacto-semantic formalisms into neural language model representations can lead to better correspondence with human brain activation data. We utilize fMRI recordings of human subjects reading a set of texts. Representations of these texts are then derived from the activations of the language models. Following Gauthier and Levy ( […]

Read more

Transition-based Graph Decoder for Neural Machine Translation

Abstract: While a number of works have shown gains from incorporating source-side symbolic syntactic and semantic structure into neural machine translation (NMT), far fewer works have addressed the decoding of such structure. We propose a general Transformer-based approach for tree and graph decoding based on generating a sequence of transitions, inspired by a similar RNN-based approach by Dyer et al. (2016). Experiments using the proposed decoder with Universal Dependencies syntax on English-German, German-English and English-Russian show improved performance over […]

Read more

NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for Reliable Intelligence Identification on Social network

In our model, we generate representations of the post message in three ways: syllable-level tokenized text through Bert4News, word-level tokenized text through PhoBERT, and syllable-level tokenized text through XLM. We then concatenate these three representations with the corresponding post metadata features. This can be considered a naive model, but it has been shown to improve system performance (Tu et al. (2017), Thanh et al. ( […]
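As a rough illustration of the fusion step described in this excerpt, here is a minimal PyTorch sketch; the class name `ReliabilityClassifier`, the feature dimensions, and the assumption that each encoder yields a single pooled vector are illustrative, not the authors' actual code:

```python
import torch
import torch.nn as nn

class ReliabilityClassifier(nn.Module):
    """Hypothetical sketch of the described fusion model: three pooled
    text embeddings (e.g. from Bert4News, PhoBERT and XLM) are
    concatenated with post metadata features and classified."""

    def __init__(self, text_dim=768, meta_dim=16, n_classes=2):
        super().__init__()
        # three text encoders, each assumed to yield a text_dim vector
        self.head = nn.Linear(3 * text_dim + meta_dim, n_classes)

    def forward(self, emb_bert4news, emb_phobert, emb_xlm, metadata):
        # naive fusion: plain concatenation, as in the excerpt
        fused = torch.cat([emb_bert4news, emb_phobert, emb_xlm, metadata], dim=-1)
        return self.head(fused)

# Usage with dummy tensors (batch of 4 posts):
emb = torch.randn(4, 768)
meta = torch.randn(4, 16)
model = ReliabilityClassifier()
logits = model(emb, emb, emb, meta)  # shape: (4, 2)
```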

Read more

Speech Enhancement for Wake-Up-Word detection in Voice Assistants

With the aim of assessing the quality of the trained SE models, we use several trigger-word detection classifier models, reporting the impact of the SE module on WUW classification performance. The WUW classifiers used here are LeNet, a well-known standard classifier that is easy to optimize [13]; and Res15, Res15-narrow and Res8, based on a reimplementation by Tang and Lin [26] of Sainath and Parada's Convolutional Neural Networks (CNNs) for […]

Read more

Industrial Strength Natural Language Processing

Having spent a big part of my career as a graduate student researcher and now a Data Scientist in industry, I have come to realize that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship — they just don’t scale! And when I say scale, I mean handling real-world use cases, the ability to handle large amounts of data, and ease of deployment in a production […]

Read more

What are N-Grams?

N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams you typically move one word forward (although you can move X words forward in more advanced scenarios). For example, take the sentence “The cow jumps over the moon”. If N=2 (known as bigrams), then the n-grams would be: “the cow”, “cow jumps”, “jumps over”, “over the”, “the moon”. So […]
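A minimal Python sketch of this sliding-window computation (the `ngrams` helper and its `step` parameter are illustrative, not from the original article):

```python
def ngrams(text, n=2, step=1):
    """Return n-grams as tuples of co-occurring words.

    step=1 moves the window one word forward at a time; a larger
    step gives the 'X words forward' variant mentioned above.
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]

print(ngrams("The cow jumps over the moon", n=2))
# [('the', 'cow'), ('cow', 'jumps'), ('jumps', 'over'),
#  ('over', 'the'), ('the', 'moon')]
```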

Read more

What is Term-Frequency?

Term Frequency (TF): Term frequency, often used in text mining, NLP and information retrieval, tells you how frequently a term occurs in a document. In the context of natural language, terms correspond to words or phrases. Since every document differs in length, a term may appear more often in longer documents than in shorter ones. Thus, term frequency is often divided by the total number of terms in the document as a way of normalization. […]
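A short Python sketch of this normalized term frequency, assuming simple whitespace tokenization (the `term_frequency` helper is illustrative):

```python
from collections import Counter

def term_frequency(document):
    """Normalized term frequency: raw count / total terms in the document."""
    words = document.lower().split()
    counts = Counter(words)
    total = len(words)
    return {term: count / total for term, count in counts.items()}

print(term_frequency("The cow jumps over the moon"))
# {'the': 0.333..., 'cow': 0.166..., 'jumps': 0.166..., ...}
```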

Read more