Legal Natural Language Processing
Datasets
Legal Judgement Prediction (LJP)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| FSCS (Niklaus et al., 2021) | Swiss court judgments | German, French, Italian | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | EU court judgments | English | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | EU court judgments | English | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | Chinese court judgements | Chinese | 2.6M cases w/ 6 outcomes |
Legal Text Classification (LTC)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| GLC (Papaloukas et al., 2021) | Greek legislation | Greek | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | Contracts | English | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | EU legislation | 23 EU languages | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | Contracts | English | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | Contracts | English | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | EU legislation | English | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | Contracts | English | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | Contracts | English | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | Privacy policies | English | 115 policies w/ 23K labels |
Legal Information Retrieval (LIR)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| BSARD (Louis et al., 2022) | Belgian legislation | French | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | EU & UK legislation | English | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | EU & UK legislation | English | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | Canadian precedents | English | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | Japanese legislation | Japanese, English | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | Chinese court judgements | Chinese | 8.9K triplets of cases |
Legal Question Answering (LQA)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| CaseHOLD (Zheng et al., 2021) | US case holdings | English | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | Chinese law | Chinese | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | Chinese court judgements | Chinese | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | Privacy policies | English | 1.7K question-answers from 35 documents |
Legal Textual Entailment (LTE)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | Canadian precedents | English | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | Japanese legislation | Japanese, English | 808 questions w/ related statutory article |
Legal Text Summarization (LTS)
| Dataset | Domain | Language | Size |
| --- | --- | --- | --- |
| UK-Abs (Shukla et al., 2022) | UK court cases | English | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | Indian court cases | English | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | Indian court cases | English | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | Terms of service | English | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | US Congressional bills | English | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | Terms of service | English | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | Terms of service | English | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | US court cases | English | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | Australian court cases | English | 3.9K pairs of (case, catchphrases) |
Legal Language Modeling (LLM)
| Dataset | Language | Size |
| --- | --- | --- |
| Pile of Law (Henderson et al., 2022) | English | ~256GB of legal and administrative text |
Benchmarks
| Dataset | Language | Tasks |
| --- | --- | --- |
| FairLex (Chalkidis et al., 2022) | English, German, French, Italian, Chinese | Classification (x1), legal judgement prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | English | Classification (x6), multiple-choice QA (x1) |
Models
| Model | Language | Size |
| --- | --- | --- |
| Legal-HeBERT (Chriqui et al., 2022) | Hebrew | 110M |
| PoL-BERT-Large (Henderson et al., 2022) | English | 336M |
| Italian-LEGAL-BERT (Licari and Comande, 2022) | Italian | 110M |
| JuriBERT (Douka et al., 2021) | French | {6M, 15M, 42M, 110M} |
| Custom-LEGAL-BERT (Zheng et al., 2021) | English | 110M |
| LEGAL-BERT (Chalkidis et al., 2020) | English | {35M, 110M} |
| LEGAL-GPT-{1,2} (Borchmann et al., 2020) | English | {117M, 1.5B} |
Books
- [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]
Surveys
- [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
- [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley. [pdf]
- [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]
Talks
- [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
- [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]
Conferences & Workshops
- The Natural Legal Language Processing (NLLP) Workshop [website]
- The International Conference on Artificial Intelligence and Law (ICAIL) [website]
- The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
- The EXplainable AI in Law (XAILA) Workshop [website]
- The International Workshop on Juris-informatics (JURISIN) [website]
- The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
- The International Workshop on Legal Information Retrieval [website]