Software & Resources
This page lists open-source frameworks, datasets, and benchmarks developed or co-authored by our group.
💻 Frameworks & Toolkits
- GAN-BERT – Few-shot adversarial learning on Transformers (ACL 2020).
- GAN-BERT-PyTorch – PyTorch/HF port of GAN-BERT (ACL 2020).
- GroundedSRL4HRI – Synthetic multimodal dataset and framework for Grounded Semantic Role Labeling in Human–Robot Interaction (EMNLP 2025).
- Sanskrit Voyager – Unified web platform for interactive reading and linguistic analysis of Sanskrit texts (EMNLP 2025 System Demonstrations).
- UniTor@BioASQ – Biomedical QA system and benchmark for the BioASQ@CLEF 2025 challenge (CLEF 2025).
- WikiGame-LLM-Eval – Reproducible pipeline for evaluating LLMs on Wikipedia graph navigation (CLiC-it 2025).
- MM-IGLU-Dialogues – Multimodal benchmark for dialogue planning in 3D grounded environments (ACL 2025).
- BacKGen – Background Knowledge Generator (Analogy-Angle@ACL 2025).
- dats – Data augmentation for NLP (NAACL 2022).
- MT-GANBERT – Multi-task learning combined with GAN-BERT for sustainable NLP (NL4AI 2021).
- KeLP – Kernel-based Learning Platform for scalable ML (JMLR 2017).
- EthicalNN – Ethics by Design framework in PyTorch (AIxIA 2022).
- GrUT – Semantic parsing for Human–Robot Interaction (AIxIA 2022).
- LU4R – Adaptive spoken language understanding for robots (IJCAI 2016).
- ACLPUB2 – ACL proceedings generation tool.
📚 Datasets & Benchmarks
- ExtremITA – Instruction-tuned LLM for Italian (EVALITA 2023).
- MM-IGLU – Multimodal grounded understanding benchmark (COLING 2024).
- MM-IGLU-IT – Italian benchmark for grounded instruction following (AIxIA 2024).
- U-DepPLLaMA – Universal dependency parsing with LLMs (IJCoL 2024).
- FEVER-it – Italian fact-checking dataset & pipeline (CLiC-it 2024).
- HuRIC – Human–Robot Interaction Corpus 2.0 (AI Journal 2020).
- GQA-it – 1M+ Visual Question Answering pairs in Italian (CLiC-it 2021).
- mscoco-it – 600K captions for Italian Image Captioning (IJCoL 2019).
- msr-vtt-it – 200K Italian video caption pairs (IJCoL 2019).
- SQuAD-it – 60K Q/A triples for reading comprehension (AIxIA 2018).
- ABSITA – Tourism opinion mining dataset (EVALITA 2018).
- SENTIPOLC – 10K annotated Italian tweets (EVALITA 2016).
🎓 Teaching Material
- CLiC-it 2023 Tutorial – LLMs and multitask learning.
- AILC Lectures 2021 Lab – PyTorch/HF lab on sentence classification.
- BISS 2024 – Bertinoro International Spring School course.
- Advances in AI 2024 – Lecture materials from the Lake Como Summer School.