
Full-Text Argumentation Mining on Scientific Publications

Scholarly Argumentation Mining (SAM) has recently gained attention due to its potential to help scholars cope with the rapid growth of published scientific literature. It comprises two subtasks: argumentative discourse unit recognition (ADUR) and …

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics. Prior work …
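The contrastive objective described above can be sketched as an InfoNCE-style loss over an anchor, a positive, and negative samples. This is a generic illustration, not the paper's exact formulation; the `temperature` value and the toy 2-d unit vectors are assumptions for demonstration only:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss on unit-normalized embeddings:
    pull the anchor toward its positive, push it away from negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Similarity logits: the positive pair first, then each negative pair.
    logits = [dot(anchor, positive) / temperature] + [
        dot(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)  # negative log-softmax of the positive

# Toy example: a near-duplicate positive and an orthogonal negative.
anchor, positive, negative = [1.0, 0.0], [0.9, 0.436], [0.0, 1.0]
easy = info_nce(anchor, positive, [negative])
hard = info_nce(anchor, negative, [positive])  # roles swapped: higher loss
```

The loss is small when the anchor is closest to its positive and large when a negative is closer, which is exactly the gradient signal that makes the choice of positive/negative sampling strategy so important.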

Anaphora Resolution in Dialogue: System Description (CODI-CRAC 2022 Shared Task)

We describe three models submitted for the CODI-CRAC 2022 shared task. To perform identity anaphora resolution, we test several combinations of the incremental clustering approach based on the Workspace Coreference System (WCS) with other coreference …

MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

Relation extraction in the biomedical domain is challenging due to the scarcity of labeled data and the high cost of annotation, which requires domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing …
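The distant-supervision heuristic mentioned above can be sketched in a few lines: any sentence that mentions both entities of a knowledge-base triple is noisily labeled with that triple's relation. The matching here is naive lowercase substring lookup, and the `may_treat` triple is an invented toy example, not taken from the benchmark:

```python
def distantly_label(sentences, kb_triples):
    """Pair sentences with KB triples: a sentence mentioning both the
    head and tail entity is (noisily) labeled with the triple's relation."""
    labeled = []
    for sent in sentences:
        for head, relation, tail in kb_triples:
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, relation))
    return labeled

# Toy KB triple and two candidate sentences (pre-lowercased).
kb = [("aspirin", "may_treat", "headache")]
sents = [
    "aspirin is often taken for a headache.",
    "aspirin was first synthesized in 1897.",
]
pairs = distantly_label(sents, kb)
```

Only the first sentence mentions both entities, so only it receives the `may_treat` label; the second illustrates why this labeling is noisy in the other direction too — co-occurrence is required but does not guarantee the relation is actually expressed.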

Exploiting Social Media Content for Self-Supervised Style Transfer

Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders. By contrast, …

A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition

Pre-trained language models (PLMs) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their …

Why only Micro-$F_1$? Class Weighting of Measures for Relation Classification

Relation classification models are conventionally evaluated using only a single measure, e.g., micro-F1, macro-F1 or AUC. In this work, we analyze weighting schemes, such as micro and macro, for imbalanced datasets. We introduce a framework for …
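The micro/macro distinction at the heart of this analysis can be made concrete with a small sketch (generic definitions of the measures, not the paper's proposed framework; the class names are invented for illustration). Micro-F1 pools counts over all classes, so frequent classes dominate; macro-F1 averages per-class F1, weighting every class equally:

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Compute micro- and macro-averaged F1 for multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts first, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute one F1 per class, then average.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Imbalanced toy set: the frequent "no_rel" class inflates micro-F1,
# while the missed "born_in" class drags macro-F1 down.
y_true = ["no_rel"] * 8 + ["born_in", "works_for"]
y_pred = ["no_rel"] * 8 + ["no_rel", "works_for"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

On this sample micro-F1 is 0.90 while macro-F1 is about 0.65, because one completely missed rare class costs a full third of the macro average but only one pooled error out of ten — the imbalance effect the weighting-scheme analysis targets.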

Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts

Despite the advances in digital healthcare systems offering curated structured knowledge, much of the critical information still lies in large volumes of unlabeled and unstructured clinical texts. These texts, which often contain protected health …

Temporal Knowledge Graph Reasoning with Low-rank and Model-agnostic Representations

Temporal knowledge graph completion (TKGC) has become a popular approach for reasoning over event and temporal knowledge graphs, aiming to infer accurate but missing facts. In this context, tensor decomposition …

Long-Tail Zero and Few-Shot Learning via Contrastive Pretraining on and for Small Data

For natural language processing 'text-to-text' tasks, the prevailing approaches rely heavily on pretraining large self-supervised models on ever-larger 'task-external' data. Transfer learning from high-resource pretraining works well, but …