Awesome Resource-Efficient LLM Papers
A curated list of high-quality papers on resource-efficient LLMs.
This is the GitHub repo for our survey paper Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models.
Table of Contents
- Awesome Resource-Efficient LLM Papers
- Table of Contents
- LLM Architecture Design
  - Efficient Transformer Architecture
  - Non-transformer Architecture
- LLM Pre-Training
  - Memory Efficiency
    - Distributed Training
    - Mixed precision training
  - Data Efficiency
    - Importance Sampling
    - Data Augmentation
    - Training Objective
- LLM Fine-Tuning
  - Parameter-Efficient Fine-Tuning
  - Full-Parameter Fine-Tuning
- LLM Inference
  - Model Compression
  - Dynamic Acceleration
- System Design
  - Deployment optimization
  - Support Infrastructure
  - Other Systems
- Resource-Efficiency Evaluation Metrics & Benchmarks
  - Computation Metrics
  - Memory Metrics
  - ⚡️ Energy Metrics
  - Financial Cost Metric
  - Network Communication Metric
  - Other Metrics
  - Benchmarks
- Reference
LLM Architecture Design
Efficient Transformer Architecture
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Approximate attention | Simple linear attention language models balance the recall-throughput tradeoff | arXiv |
| 2024 | Hardware attention | MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | arXiv |
| 2024 | Approximate attention | LoMA: Lossless Compressed Memory Attention | arXiv |
| 2024 | Approximate attention | Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation | ICML |
| 2024 | Hardware optimization | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | ICLR |
| 2023 | Hardware optimization | Flashattention: Fast and memory-efficient exact attention with io-awareness | NeurIPS |
| 2023 | Approximate attention | KDEformer: Accelerating Transformers via Kernel Density Estimation | ICML |
| 2023 | Approximate attention | Mega: Moving Average Equipped Gated Attention | ICLR |
| 2022 | Hardware optimization | xFormers - Toolbox to Accelerate Research on Transformers | GitHub |
| 2021 | Approximate attention | Efficient attention: Attention with linear complexities | WACV |
| 2021 | Approximate attention | An Attention Free Transformer | arXiv |
| 2021 | Approximate attention | Self-attention Does Not Need O(n^2) Memory | arXiv |
| 2021 | Hardware optimization | LightSeq: A High Performance Inference Library for Transformers | NAACL |
| 2021 | Hardware optimization | FasterTransformer: A Faster Transformer Framework | GitHub |
| 2020 | Approximate attention | Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | ICML |
| 2019 | Approximate attention | Reformer: The efficient transformer | ICLR |
Non-transformer Architecture
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Decoder | You Only Cache Once: Decoder-Decoder Architectures for Language Models | arXiv |
| 2024 | BitLinear layer | Scalable MatMul-free Language Modeling | arXiv |
| 2023 | RNN LM | RWKV: Reinventing RNNs for the Transformer Era | EMNLP-Findings |
| 2023 | MLP | Auto-Regressive Next-Token Predictors are Universal Learners | arXiv |
| 2023 | Convolutional LM | Hyena Hierarchy: Towards Larger Convolutional Language Models | ICML |
| 2023 | Sub-quadratic Matrices based | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | NeurIPS |
| 2023 | Selective State Space Model | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | arXiv |
| 2022 | Mixture of Experts | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | JMLR |
| 2022 | Mixture of Experts | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | ICML |
| 2022 | Mixture of Experts | Mixture-of-Experts with Expert Choice Routing | NeurIPS |
| 2022 | Mixture of Experts | Efficient Large Scale Language Modeling with Mixtures of Experts | EMNLP |
| 2017 | Mixture of Experts | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | ICLR |
LLM Pre-Training
Memory Efficiency
Distributed Training
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Model Parallelism | ProTrain: Efficient LLM Training via Adaptive Memory Management | arXiv |
| 2024 | Model Parallelism | MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | arXiv |
| 2023 | Data Parallelism | Palm: Scaling language modeling with pathways | JMLR |
| 2023 | Model Parallelism | Bpipe: memory-balanced pipeline parallelism for training large language models | ICML |
| 2022 | Model Parallelism | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | OSDI |
| 2021 | Data Parallelism | FairScale: A general purpose modular PyTorch library for high performance and large scale training | GitHub |
| 2020 | Data Parallelism | Zero: Memory optimizations toward training trillion parameter models | IEEE SC20 |
| 2019 | Model Parallelism | GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | NeurIPS |
| 2019 | Model Parallelism | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arXiv |
| 2019 | Model Parallelism | PipeDream: generalized pipeline parallelism for DNN training | SOSP |
| 2018 | Model Parallelism | Mesh-tensorflow: Deep learning for supercomputers | NeurIPS |
Mixed precision training
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2022 | Mixed Precision Training | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | arXiv |
| 2018 | Mixed Precision Training | Bert: Pre-training of deep bidirectional transformers for language understanding | NAACL |
| 2017 | Mixed Precision Training | Mixed Precision Training | ICLR |
Data Efficiency
Importance Sampling
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Importance sampling | LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | arXiv |
| 2023 | Survey on importance sampling | A Survey on Efficient Training of Transformers | IJCAI |
| 2023 | Importance sampling | Data-Juicer: A One-Stop Data Processing System for Large Language Models | arXiv |
| 2023 | Importance sampling | INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models | EMNLP |
| 2023 | Importance sampling | Machine Learning Force Fields with Data Cost Aware Training | ICML |
| 2022 | Importance sampling | Beyond neural scaling laws: beating power law scaling via data pruning | NeurIPS |
| 2021 | Importance sampling | Deep Learning on a Data Diet: Finding Important Examples Early in Training | NeurIPS |
| 2018 | Importance sampling | Training Deep Models Faster with Robust, Approximate Importance Sampling | NeurIPS |
| 2018 | Importance sampling | Not All Samples Are Created Equal: Deep Learning with Importance Sampling | ICML |
Data Augmentation
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Data augmentation | LLMRec: Large Language Models with Graph Augmentation for Recommendation | WSDM |
| 2024 | Data augmentation | LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition | arXiv |
| 2023 | Data augmentation | MixGen: A New Multi-Modal Data Augmentation | WACV |
| 2023 | Data augmentation | Augmentation-Aware Self-Supervision for Data-Efficient GAN Training | NeurIPS |
| 2023 | Data augmentation | Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis | EMNLP |
| 2023 | Data augmentation | FaMeSumm: Investigating and Improving Faithfulness of Medical Summarization | EMNLP |
Training Objective
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2023 | Training objective | Challenges and Applications of Large Language Models | arXiv |
| 2023 | Training objective | Efficient Data Learning for Open Information Extraction with Pre-trained Language Models | EMNLP |
| 2023 | Masked language-image modeling | Scaling Language-Image Pre-training via Masking | CVPR |
| 2022 | Masked image modeling | Masked Autoencoders Are Scalable Vision Learners | CVPR |
| 2019 | Masked language modeling | MASS: Masked Sequence to Sequence Pre-training for Language Generation | ICML |
LLM Fine-Tuning
Parameter-Efficient Fine-Tuning
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | LoRA-based fine-tuning | Dlora: Distributed parameter-efficient fine-tuning solution for large language model | arXiv |
| 2024 | LoRA-based fine-tuning | SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | arXiv |
| 2024 | LoRA-based fine-tuning | Data-efficient Fine-tuning for LLM-based Recommendation | SIGIR |
| 2024 | LoRA-based fine-tuning | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | ACL |
| 2023 | LoRA-based fine-tuning | DyLoRA: Parameter-Efficient Tuning of Pretrained Models using Dynamic Search-Free Low Rank Adaptation | EACL |
| 2022 | Masking-based fine-tuning | Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively | NeurIPS |
| 2021 | Masking-based fine-tuning | BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models | ACL |
| 2021 | Masking-based fine-tuning | Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning | EMNLP |
| 2021 | Masking-based fine-tuning | Unlearning Bias in Language Models by Partitioning Gradients | ACL |
| 2019 | Masking-based fine-tuning | SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ACL |
Full-Parameter Fine-Tuning
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Full-parameter fine-tuning | Hift: A hierarchical full parameter fine-tuning strategy | arXiv |
| 2024 | Study of full-parameter fine-tuning optimizations | A Study of Optimizations for Fine-tuning Large Language Models | arXiv |
| 2023 | Comparative study between full-parameter and LoRA-based fine-tuning | A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model | arXiv |
| 2023 | Comparative study between full-parameter and parameter-efficient fine-tuning | Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification | arXiv |
| 2023 | Full-parameter fine-tuning with limited resources | Full Parameter Fine-tuning for Large Language Models with Limited Resources | arXiv |
| 2023 | Memory-efficient fine-tuning | Fine-Tuning Language Models with Just Forward Passes | NeurIPS |
| 2023 | Full-parameter fine-tuning for medicine applications | PMC-LLaMA: Towards Building Open-source Language Models for Medicine | arXiv |
| 2022 | Drawback of full-parameter fine-tuning | Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution | ICLR |
LLM Inference
Model Compression
Pruning
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Unstructured Pruning | SparseLLM: Towards Global Pruning for Pre-trained Language Models | NeurIPS |
| 2024 | Structured Pruning | Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | arXiv |
| 2024 | Structured Pruning | BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation | arXiv |
| 2024 | Structured Pruning | ShortGPT: Layers in Large Language Models are More Redundant Than You Expect | arXiv |
| 2024 | Structured Pruning | NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models | arXiv |
| 2024 | Structured Pruning | SliceGPT: Compress Large Language Models by Deleting Rows and Columns | ICLR |
| 2024 | Unstructured Pruning | Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs | ICLR |
| 2024 | Structured Pruning | Plug-and-Play: An Efficient Post-training Pruning Method for Large Language Models | ICLR |
| 2023 | Unstructured Pruning | One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models | arXiv |
| 2023 | Unstructured Pruning | SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot | ICML |
| 2023 | Unstructured Pruning | A Simple and Effective Pruning Approach for Large Language Models | ICLR |
| 2023 | Unstructured Pruning | AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference With Transformers | TCAD |
| 2023 | Structured Pruning | LLM-Pruner: On the Structural Pruning of Large Language Models | NeurIPS |
| 2023 | Structured Pruning | LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation | ICML |
| 2023 | Structured Pruning | Structured Pruning for Efficient Generative Pre-trained Language Models | ACL |
| 2023 | Structured Pruning | ZipLM: Inference-Aware Structured Pruning of Language Models | NeurIPS |
| 2023 | Contextual Pruning | Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | ICML |
Quantization
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Weight Quantization | Evaluating Quantized Large Language Models | arXiv |
| 2024 | Weight Quantization | I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models | arXiv |
| 2024 | Weight Quantization | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | arXiv |
| 2024 | Weight-Activation Co-Quantization | Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs | NeurIPS |
| 2024 | Weight Quantization | OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models | ICLR |
| 2023 | Weight Quantization | Flexround: Learnable rounding based on element-wise division for post-training quantization | ICML |
| 2023 | Weight Quantization | Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling | EMNLP |
| 2023 | Weight Quantization | OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models | AAAI |
| 2023 | Weight Quantization | Gptq: Accurate post-training quantization for generative pre-trained transformers | ICLR |
| 2023 | Weight Quantization | Dynamic Stashing Quantization for Efficient Transformer Training | EMNLP |
| 2023 | Weight Quantization | Quantization-aware and tensor-compressed training of transformers for natural language understanding | Interspeech |
| 2023 | Weight Quantization | QLoRA: Efficient Finetuning of Quantized LLMs | NeurIPS |
| 2023 | Weight Quantization | Stable and low-precision training for large-scale vision-language models | NeurIPS |
| 2023 | Weight Quantization | Prequant: A task-agnostic quantization approach for pre-trained language models | ACL |
| 2023 | Weight Quantization | Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization | ISCA |
| 2023 | Weight Quantization | Awq: Activation-aware weight quantization for LLM compression and acceleration | arXiv |
| 2023 | Weight Quantization | Spqr: A sparse-quantized representation for near-lossless LLM weight compression | arXiv |
| 2023 | Weight Quantization | SqueezeLLM: Dense-and-Sparse Quantization | arXiv |
| 2023 | Weight Quantization | LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | arXiv |
| 2022 | Activation Quantization | Gact: Activation compressed training for generic network architectures | ICML |
| 2022 | Fixed-point Quantization | Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | ACL |
| 2021 | Activation Quantization | Ac-gc: Lossy activation compression with guaranteed convergence | NeurIPS |
Dynamic Acceleration
Input Pruning
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Score-based Token Removal | Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | COLM |
| 2024 | Score-based Token Removal | LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | arXiv |
| 2024 | Learning-based Token Removal | LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | ACL |
| 2024 | Learning-based Token Removal | Compressed Context Memory For Online Language Model Interaction | ICLR |
| 2023 | Score-based Token Removal | Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference | KDD |
| 2023 | Learning-based Token Removal | PuMer: Pruning and Merging Tokens for Efficient Vision Language Models | ACL |
| 2023 | Learning-based Token Removal | Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model | arXiv |
| 2023 | Learning-based Token Removal | SmartTrim: Adaptive Tokens and Parameters Pruning for Efficient Vision-Language Models | arXiv |
| 2022 | Learning-based Token Removal | Transkimmer: Transformer Learns to Layer-wise Skim | ACL |
| 2022 | Score-based Token Removal | Learned Token Pruning for Transformers | KDD |
| 2021 | Learning-based Token Removal | TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference | NAACL |
| 2021 | Score-based Token Removal | Efficient sparse attention architecture with cascade token and head pruning | HPCA |
System Design
Deployment optimization
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Hardware Optimization | LUT TENSOR CORE: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration | arXiv |
| 2023 | Hardware offloading | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | PMLR |
| 2023 | Hardware offloading | Fast distributed inference serving for large language models | arXiv |
| 2022 | Collaborative inference | Petals: Collaborative Inference and Fine-tuning of Large Models | arXiv |
| 2022 | Hardware offloading | DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | IEEE SC22 |
Support Infrastructure
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2024 | Edge devices | MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | ICML |
| 2024 | Edge devices | EdgeShard: Efficient LLM Inference via Collaborative Edge Computing | arXiv |
| 2024 | Edge devices | Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs | ICML |
| 2024 | Edge devices | The breakthrough memory solutions for improved performance on LLM inference | IEEE Micro |
| 2024 | Edge devices | MELTing point: Mobile Evaluation of Language Transformers | MobiCom |
| 2024 | Edge devices | LLM as a System Service on Mobile Devices | arXiv |
| 2024 | Edge devices | LocMoE: A Low-overhead MoE for Large Language Model Training | arXiv |
| 2024 | Edge devices | Jetmoe: Reaching Llama2 performance with 0.1M dollars | arXiv |
| 2023 | Edge devices | Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices | ICASSP |
| 2023 | Edge devices | Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly | arXiv |
| 2023 | Libraries | Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | ICPP |
| 2023 | Libraries | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | ACL |
| 2023 | Edge devices | Large Language Models Empowered Autonomous Edge AI for Connected Intelligence | arXiv |
| 2022 | Libraries | DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | IEEE SC22 |
| 2022 | Libraries | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | OSDI |
| 2022 | Edge devices | EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation | arXiv |
| 2022 | Edge devices | ProFormer: Towards On-Device LSH Projection-Based Transformers | ACL |
| 2021 | Edge devices | Generate More Features with Cheap Operations for BERT | ACL |
| 2021 | Edge devices | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | SustaiNLP |
| 2020 | Edge devices | Lite Transformer with Long-Short Range Attention | arXiv |
| 2019 | Libraries | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arXiv |
| 2018 | Libraries | Mesh-TensorFlow: Deep Learning for Supercomputers | NeurIPS |
Other Systems
| Date | Keywords | Paper | Venue |
| :--- | :------- | :---- | :---- |
| 2023 | Other Systems | Tabi: An Efficient Multi-Level Inference System for Large Language Models | EuroSys |
| 2023 | Other Systems | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | PACMMOD |
Resource-Efficiency Evaluation Metrics & Benchmarks
Computation Metrics
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| FLOPs (Floating-point operations) | the number of arithmetic operations on floating-point numbers | [FLOPs] |
| Training Time | the total duration required for training, typically measured in wall-clock minutes, hours, or days | [minutes, days] [hours] |
| Inference Time/Latency | the average time required to generate an output after receiving an input, typically measured in wall-clock time or CPU/GPU/TPU clock time in milliseconds or seconds | [end-to-end latency in seconds] [next-token generation latency in milliseconds] |
| Throughput | the rate of output token generation or task completion, typically measured in tokens per second (TPS) or queries per second (QPS) | [tokens/s] [queries/s] |
| Speed-Up Ratio | the improvement in inference speed relative to a baseline model | [inference time speed-up] [throughput speed-up] |
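To make the timing metrics above concrete, here is a minimal, library-agnostic sketch of how latency, throughput, and speed-up ratio are typically measured; `generate_fn` is a placeholder for whatever inference stack you benchmark, not an API from the papers above.

```python
import time

def measure_generation(generate_fn, prompt, n_runs=5):
    """Time a text-generation callable and report average latency and throughput.

    `generate_fn(prompt)` stands in for any inference call that returns the
    generated output tokens (e.g., a list of token ids).
    """
    latencies, token_counts = [], []
    for _ in range(n_runs):
        start = time.perf_counter()
        output_tokens = generate_fn(prompt)
        latencies.append(time.perf_counter() - start)  # wall-clock seconds per request
        token_counts.append(len(output_tokens))

    avg_latency = sum(latencies) / n_runs              # inference latency (s)
    throughput = sum(token_counts) / sum(latencies)    # output tokens per second
    return avg_latency, throughput

# Speed-up ratio is simply the baseline latency divided by the optimized latency
# (equivalently, optimized throughput divided by baseline throughput).
```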
Memory Metrics
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| Number of Parameters | the number of adjustable variables in the LLM’s neural network | [number of parameters] |
| Model Size | the storage space required for storing the entire model | [peak memory usage in GB] |
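For a PyTorch model, the parameter count and the raw (uncompressed) weight storage can be read directly from the parameter tensors, as in the sketch below; note that peak memory at inference time is higher because of activations and the KV cache.

```python
import torch.nn as nn

def param_count_and_size(model: nn.Module):
    """Return the number of parameters and their raw storage footprint in GB."""
    n_params = sum(p.numel() for p in model.parameters())
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return n_params, n_bytes / 1e9

# Toy example; a real LLM would be loaded from a checkpoint instead.
model = nn.Linear(4096, 4096)
n_params, size_gb = param_count_and_size(model)
print(f"{n_params:,} parameters, {size_gb:.3f} GB")
```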
⚡️ Energy Metrics
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| Energy Consumption | the electrical energy consumed during the LLM’s lifecycle | [kWh] |
| Carbon Emission | the greenhouse gas emissions associated with the model’s energy usage | [kgCO2eq] |
The following are available software packages designed for real-time tracking of energy consumption and carbon emission.
- CodeCarbon
- Carbontracker
- experiment-impact-tracker
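As a quick illustration, CodeCarbon can wrap a run with a tracker object and report an estimate in kgCO2eq; this is a minimal sketch in which `train_model()` is a placeholder for your own training or inference workload.

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # samples power draw and estimates emissions
tracker.start()
try:
    train_model()  # placeholder: your training or inference loop
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kgCO2eq
    print(f"Estimated carbon emissions: {emissions_kg:.4f} kgCO2eq")
```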
You might also find it helpful to estimate energy usage and carbon footprint before the actual training or inference run.
Financial Cost Metric
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| Dollars per parameter | the total cost of training (or running) the LLM divided by the number of parameters | |
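As a back-of-the-envelope illustration (both numbers below are assumed for the example, not taken from the survey), the metric is just total cost divided by parameter count:

```python
# Hypothetical figures for illustration only.
total_training_cost_usd = 4.6e6   # assumed total training cost in dollars
num_parameters = 175e9            # assumed parameter count

dollars_per_parameter = total_training_cost_usd / num_parameters
print(f"{dollars_per_parameter:.2e} dollars per parameter")
```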
Network Communication Metric
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| Communication Volume | the total amount of data transmitted across the network during a specific LLM execution or training run | [communication volume in TB] |
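Communication volume can also be estimated analytically before a run. For instance, a ring all-reduce of gradients in data-parallel training moves roughly 2(N-1)/N times the gradient buffer per worker per step; the sketch below applies that standard formula with assumed model and cluster sizes.

```python
def ring_allreduce_volume_tb(num_params: int, bytes_per_grad: int,
                             num_workers: int, num_steps: int) -> float:
    """Estimated per-worker network traffic (TB) for gradient all-reduce.

    Assumes a ring all-reduce, which transfers about 2 * (N - 1) / N times
    the gradient buffer per worker on every training step.
    """
    grad_bytes = num_params * bytes_per_grad
    per_step_bytes = 2 * (num_workers - 1) / num_workers * grad_bytes
    return per_step_bytes * num_steps / 1e12

# Assumed setup: 7B parameters, fp16 gradients (2 bytes), 64 workers, 100k steps.
print(f"{ring_allreduce_volume_tb(7_000_000_000, 2, 64, 100_000):.0f} TB per worker")
```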
Other Metrics
| Metric | Description | Example Usage |
| :----- | :---------- | :------------ |
| Compression Ratio | the reduction in size of the compressed model compared to the original model | [compress rate] [percentage of weights remaining] |
| Loyalty/Fidelity | the resemblance between the teacher and student models in terms of prediction consistency and alignment of predicted probability distributions | [loyalty] [fidelity] |
| Robustness | the resistance to adversarial attacks, where slight input modifications can potentially manipulate the model's output | [after-attack accuracy, query number] |
| Pareto Optimality | the optimal trade-offs between various competing factors | [Pareto frontier (cost and accuracy)] [Pareto frontier (performance and FLOPs)] |
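Compression ratio and the percentage of weights remaining reduce to simple arithmetic over model sizes and parameter counts; a minimal sketch with assumed figures:

```python
def compression_stats(original_bytes: float, compressed_bytes: float,
                      original_params: int, remaining_params: int):
    """Compute compression ratio and the percentage of weights remaining."""
    compression_ratio = original_bytes / compressed_bytes
    weights_remaining_pct = 100.0 * remaining_params / original_params
    return compression_ratio, weights_remaining_pct

# Assumed example: a 14 GB fp16 model compressed to 3.5 GB with half the weights pruned.
ratio, remaining = compression_stats(14e9, 3.5e9, 7_000_000_000, 3_500_000_000)
print(f"compression ratio {ratio:.1f}x, {remaining:.0f}% of weights remaining")
```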
Benchmarks
| Benchmark | Description | Paper |
| :-------- | :---------- | :---- |
| General NLP Benchmarks | an extensive collection of general NLP benchmarks such as GLUE, SuperGLUE, WMT, and SQuAD | A Comprehensive Overview of Large Language Models |
| Dynaboard | an open-source platform for evaluating NLP models in the cloud, offering real-time interaction and a holistic assessment of model quality with customizable Dynascore | Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking |
| EfficientQA | an open-domain Question Answering (QA) challenge at NeurIPS 2020 that focuses on building accurate, memory-efficient QA systems | NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned |
| SustaiNLP 2020 Shared Task | a challenge for developing energy-efficient NLP models, assessing performance across eight NLU tasks with SuperGLUE metrics and measuring energy consumption during inference | Overview of the SustaiNLP 2020 Shared Task |
| ELUE (Efficient Language Understanding Evaluation) | a benchmark platform for evaluating NLP model efficiency across various tasks, offering online metrics and requiring only a Python model definition file for submission | Towards Efficient NLP: A Standard Evaluation and A Strong Baseline |
| VLUE (Vision-Language Understanding Evaluation) | a comprehensive benchmark for assessing vision-language models across multiple tasks, offering an online platform for evaluation and comparison | VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models |
| Long Range Arena (LRA) | a benchmark suite evaluating efficient Transformer models on long-context tasks, spanning diverse modalities and reasoning types while allowing evaluations under controlled resource constraints, highlighting real-world efficiency | Long Range Arena: A Benchmark for Efficient Transformers |
| Efficiency-aware MS MARCO | an enhanced MS MARCO information retrieval benchmark that integrates efficiency metrics like per-query latency and cost alongside accuracy, facilitating a comprehensive evaluation of IR systems | Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking |
Reference
If you find this paper list useful in your research, please consider citing:
@article{bai2024beyond,
title={Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models},
author={Bai, Guangji and Chai, Zheng and Ling, Chen and Wang, Shiyu and Lu, Jiaying and Zhang, Nan and Shi, Tingwei and Yu, Ziyang and Zhu, Mengdan and Zhang, Yifei and others},
journal={arXiv preprint arXiv:2401.00625},
year={2024}
}