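The power of compute: There is substantial evidence that progress in machine learning has been driven largely by compute rather than by research (see "The Bitter Lesson"), and that this is often accompanied by emergence and homogenization. Studies suggest that the compute used for AI doubles roughly every 3.4 months, whereas efficiency gains double only every 16 months. Compute usage is driven mainly by computing power, while efficiency is driven by research. This means that growth in compute has historically dominated progress in machine learning and its subfields, and the arrival of GPT-4 further supports this. Even so, we should still pay attention to whether a future architecture could displace the Transformer, for example S4. Most current NLP research hotspots build on more advanced LLMs (~100B,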
For papers on more LLM topics, see here and here.
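To make the gap between the two doubling times concrete, here is a minimal sketch that assumes both trends are clean exponentials, which is a simplification. The function name `growth_factor` and the chosen horizons are illustrative and not taken from any cited work.

```python
# Doubling times quoted in the text above (in months).
COMPUTE_DOUBLING_MONTHS = 3.4
EFFICIENCY_DOUBLING_MONTHS = 16.0


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical horizons, chosen only for illustration.
    for horizon in (16, 32, 48):
        compute = growth_factor(horizon, COMPUTE_DOUBLING_MONTHS)
        efficiency = growth_factor(horizon, EFFICIENCY_DOUBLING_MONTHS)
        print(f"{horizon} months: compute x{compute:,.0f}, efficiency x{efficiency:.0f}")
```

Under this assumption, efficiency doubles once over 16 months while compute usage grows by a factor of about 26, which is the sense in which compute growth dominates.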
Papers (rough categories)
Resources
【Tests of GPT-4 and its limitations】Sparks of Artificial General Intelligence: Early experiments with GPT-4
【InstructGPT paper, covering SFT, PPO, etc.; one of the most important papers】Training language models to follow instructions with human feedback
【Scalable oversight: how can humans keep improving a model once it surpasses them on a task?】Measuring Progress on Scalable Oversight for Large Language Models
【A definition of alignment, from DeepMind】Alignment of Language Agents
A General Language Assistant as a Laboratory for Alignment
【RETRO paper: a model using chunked cross-attention (CCA) plus retrieval】Improving language models by retrieving from trillions of tokens
Fine-Tuning Language Models from Human Preferences
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
【A bilingual Chinese-English large model that outperforms GPT-3】GLM-130B: An Open Bilingual Pre-trained Model
【Improved pretraining objective】UL2: Unifying Language Learning Paradigms
【A new benchmark, model library, and method for alignment】Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
【MLM without using the [MASK] token】Representation Deficiency in Masked Language Modeling
【Renders text as images for training, which alleviates the need for a vocabulary and resists certain attacks】Language Modelling with Pixels
LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
InCoder: A Generative Model for Code Infilling and Synthesis
【Retrieves images related to the text for language model pretraining】Visually-Augmented Language Modeling
A Non-monotonic Self-terminating Language Model
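【Fine-tunes on comparisons that include negative feedback, via prompt design】Chain of Hindsight Aligns Language Models with Feedback
【Sparrow model】Improving alignment of dialogue agents via targeted human judgements
【Uses a small model's parameters to speed up training of a larger model (not from scratch)】Learning to Grow Pretrained Models for Efficient Transformer Training
【A semi-parametric MoE model that fuses multiple knowledge sources】Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models
【A method for merging multiple models trained on different datasets】Dataless Knowledge Fusion by Merging Weights of Language Models
【Very insightful: a general architecture that replaces the FFN in the Transformer with a retrieval mechanism (×2.54 time) in order to decouple the knowledge stored in model parameters】Language model with Plug-in Knowldge Memory
【Automatically generates instruction-tuning data for training GPT-3】Self-Instruct: Aligning Language Model with Self Generated Instructions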
Towards Conditionally Dependent Masked Language Models
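【A separate corrector that iteratively fixes imperfect generations; follow-up work from Sean Welleck】Generating Sequences by Learning to Self-Correct
【Continual learning: add one prompt per new task while keeping the previous tasks' prompts and the large model unchanged】Progressive Prompts: Continual Learning for Language Models without Forgetting
【EMNLP 2022; continual updating of the model】MemPrompt: Memory-assisted Prompt Editing with User Feedback
【A new neural architecture (FOLNet) with a first-order-logic inductive bias】Learning Language Representations with Logical Inductive Bias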
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
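【A pretrained language model based on state-space models; outperforms BERT】Pretraining Without Attention
【Takes human feedback into account already during pretraining】Pretraining Language Models with Human Preferences
【Meta's open-source LLaMA models, 7B-65B: smaller models trained on more tokens than is typical, achieving the best performance across a range of inference budgets】LLaMA: Open and Efficient Foundation Language Models
【Teaches large language models, via a few examples, to self-debug and explain the code they generate; this is already common practice, though】Teaching Large Language Models to Self-Debug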
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
LIMA: Less Is More for Alignment
【Tree-of-thought; looking more and more like AlphaGo】Deliberate Problem Solving with Large Language Models
【A multi-step reasoning method built on ICL; very insightful】ReAct: Synergizing Reasoning and Acting in Language Models
【CoT directly generates program code, which a Python interpreter then executes】Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
【The large model directly generates the evidence context】Generate rather than Retrieve: Large Language Models are Strong Context Generators
【A writing model with four specific operations】PEER: A Collaborative Language Model
【Combines Python and SQL executors with a large model】Binding Language Models in Symbolic Languages
【Retrieves documentation to generate code】DocPrompting: Generating Code by Retrieving the Docs
【Many more papers on grounding + LLM are likely to follow】LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
【Iteratively self-generates training data (verified with Python)】Language Models Can Teach Themselves to Program Better
Related papers: Specializing Smaller Language Models towards Multi-Step Reasoning
STaR: Bootstrapping Reasoning With Reasoning, from NeurIPS 2022 (generates CoT data for model fine-tuning); it prompted a series of follow-up papers on teaching CoT to small models
Similar idea 【knowledge distillation】: Teaching Small Language Models to Reason and Learning by Distilling Context
Similar ideas from KAIST and Xiang Ren's group (【perturbs the CoT rationale during fine-tuning (teaching)】PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales, among others) and Large Language Models Are Reasoning Teachers
From ETH: 【uses CoT data to train separate problem-decomposition and problem-solving models】Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions
【Teaches small models CoT ability】In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
【A large model teaches a small model CoT】Large Language Models Are Reasoning Teachers
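【The large model generates evidence (recitation) and then does few-shot closed-book QA】Recitation-Augmented Language Models
【Inductive reasoning expressed in natural language】Language Models as Inductive Reasoners
【GPT-3 for data annotation (e.g. sentiment classification)】Is GPT-3 a Good Data Annotator?
【A multi-task-trained model for few-shot data augmentation】KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP
【Work on procedural planning; not of interest to me for now】Neuro-Symbolic Procedural Planning with Commonsense Prompting
【Goal: for a query supported by certain references in Wikipedia, generate a factually correct article】WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus
【Incorporates the results of an external physics simulator into the context】Mind's Eye: Grounded Language Model Reasoning through Simulation
【Retrieval-augmented CoT for knowledge-intensive tasks】Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
【Contrast-Consistent Search (CCS): unsupervised identification of latent (binary) knowledge in language models】Discovering Latent Knowledge in Language Models Without Supervision
【Percy Liang's group; verifiable search engines: only 51.5% of generated sentences are fully supported by their citations】Evaluating Verifiability in Generative Search Engines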
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
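【In my view one of the most important papers: scaling laws for language models under cross-entropy loss. The loss follows a power law in model size, dataset size, and training compute, while architectural details such as width and depth matter little】Scaling Laws for Neural Language Models
【Another of the most important papers, Chinchilla: under a fixed compute budget, the optimal model is not the largest one but a smaller model (60-70B) trained on more data】Training Compute-Optimal Large Language Models
【Which architecture and pretraining objective help zero-shot generalization】What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
【Grokking: the learning process goes memorization -> circuit formation -> cleanup】Progress measures for grokking via mechanistic interpretability
【Investigates retrieval-augmented models and finds that both the retriever and the language model are limited at reasoning】Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
【An evaluation framework for human-LM interaction】Evaluating Human-Language Model Interaction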
What learning algorithm is in-context learning? Investigations with linear models
【Model editing; this is a hot topic】Mass-Editing Memory in a Transformer
【Sensitivity of models to irrelevant context; adding irrelevant information to the prompt exemplars and adding an instruction to ignore irrelevant context partially solve the problem】Large Language Models Can Be Easily Distracted by Irrelevant Context
【Zero-shot CoT exhibits bias and toxicity on sensitive questions】On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
【CoT in large models works across languages】Language models are multilingual chain-of-thought reasoners
【Across prompts, the lower the perplexity of the prompt, the better the performance】Demystifying Prompts in Language Models via Perplexity Estimation
【Binary implicature resolution with large models; performance on this kind of implicature does not improve with scale】Large language models are not zero-shot communicators (https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/implicatures)
【More complex prompts improve CoT】Complexity-Based Prompting for Multi-step Reasoning
What Matters In The Structured Pruning of Generative Language Models?
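【AmbiBench dataset on task ambiguity: scaled-up RLHF models do best at disambiguating tasks, and fine-tuning helps more than few-shot prompting】Task Ambiguity in Humans and Language Models
【Tests of GPT-3, covering memorization, calibration, bias, etc.】Prompting GPT-3 To Be Reliable
【OSU study of which parts of CoT matter for performance】Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
【A study of how discrete prompts transfer across language models】Can discrete information extraction prompts generalize across language models?
【Memorization rate has a log-linear relationship with model size, prefix length, and duplication rate in the training data】Quantifying Memorization Across Neural Language Models
【Very insightful: iteratively decomposes a question into sub-questions with GPT and answers them】Measuring and Narrowing the Compositionality Gap in Language Models
【Tests GPT-3 on analogy problems similar to the IQ-style puzzles in civil service exams】Emergent Analogical Reasoning in Large Language Models
【Train on short texts, test on long texts: evaluates the model's ability to adapt to varying lengths】A Length-Extrapolatable Transformer
【When to retrieve, and when the large model alone is enough】When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories
【ICL is another form of gradient update】Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta Optimizers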
Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective
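【Studies the training process of OPT models of different sizes and finds that perplexity is an indicator of ICL performance】Training Trajectories of Language Models Across Scales
【EMNLP 2022; English-only pretraining corpora contain other languages, so the models' cross-lingual ability may come from data leakage】Language Contamination Helps Explains the Cross-lingual Capabilities of English Pretrained Models
【Overriding semantic priors in favor of the information in the prompt is an emergent ability】Larger language models do in-context learning differently
【EMNLP 2022 Findings】What Language Model to Train if You Have One Million GPU Hours?
【Introducing classifier-free guidance (CFG) at inference time greatly improves the instruction-following ability of small models】Stay on topic with Classifier-Free Guidance
【Uses OpenAI's GPT-4 to train their own LLaMA model to beat OpenAI's GPT-4; impressive, to say the least】Instruction Tuning with GPT-4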
Reflexion: an autonomous agent with dynamic memory and self-reflection
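【Prompt learning for personalized style; OPT】Extensible Prompts for Language Models
【Speeds up large-model decoding: exploits agreement between a small model and the large model so that one large-model call yields several tokens; long inputs are slow, after all】Accelerating Large Language Model Decoding with Speculative Sampling
【Uses soft prompts to mitigate the loss of ICL ability caused by fine-tuning: stage one tunes the prompt, stage two tunes the model】Preserving In-Context Learning ability in Large Language Model Fine-tuning
【Semantic parsing; an ICL exemplar selection method; Codex and T5-large】Diverse Demonstrations Improve In-context Compositional Generalization
【A new optimization objective for text generation】Tailoring Language Generation Models under Total Variation Distance
【Uncertainty estimation for conditional generation: multiple sampled outputs are merged into semantic clusters, and the entropy over the clusters is the estimate】Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation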
Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
【Very insightful: text generation under free-text constraints】Controllable Text Generation with Language Constraints
【At prediction time, selects phrases by similarity instead of predicting tokens with a softmax】Nonparametric Masked Language Modeling
【An ICL method for long contexts】Parallel Context Windows Improve In-Context Learning of Large Language Models
【InstructGPT generates its own ICL exemplars】Self-Prompting Large Language Models for Open-Domain QA
【Grouping plus an attention mechanism lets ICL take in more labeled examples】Structured Prompting: Scaling In-Context Learning to 1,000 Examples
Momentum Calibration for Text Generation
【Two ICL exemplar selection methods, with experiments on OPT and GPT-J】Careful Data Curation Stabilizes In-context Learning
【An analysis of the MAUVE generation evaluation metric (Pillutla et al.)】On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation
Promptagator: Few-shot Dense Retrieval From 8 Examples
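【Many heads are better than one】Self-Consistency Improves Chain of Thought Reasoning in Language Models
【Flipped: generates the instruction conditioned on the input and label】Guess the Instruction! Making Language Models Stronger Zero-Shot Learners
【Self-verification of LLMs via backward reasoning】Large Language Models are reasoners with Self-Verification
【A method for safety scenarios built on a retrieve-then-generate-evidence pipeline】Foveate, Attribute, and Rationalize: Towards Safe and Trustworthy AI
【Beam-search-based confidence estimation for extracted spans in generative information extraction】How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?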
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning
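【A discussion of gold (oracle) labels for extractive summarization】Text Summarization with Oracle Expectation
【Mahalanobis-distance-based OOD detection for conditional text generation】Out-of-Distribution Detection and Selective Generation for Conditional Language Models
【An attention module ensembles prompts for sample-level prediction】Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning
【Prompts from multiple tasks are decomposed and distilled into a single prompt】Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
【Evaluation metrics for step-by-step reasoning text; a possible topic for my next sharing session】ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
【Calibrating sequence likelihood improves conditional language generation】Calibrating Sequence likelihood Improves Conditional Language Generation
【A text attack method based on gradient optimization】TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization
【Models the ICL decision boundary with a GMM in order to calibrate】Prototypical Calibration for Few-shot Learning of Language Models
【Question reformulation, plus a graph-based method for aggregating ICL predictions】Ask Me Anything: A simple strategy for prompting language models
【Selects good candidates from a pool of unannotated examples to serve as the ICL example database】Selective Annotation Makes Language Models Better Few-Shot Learners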
PromptBoosting: Black-Box Text Classification with Ten Forward Passes
Attention-Guided Backdoor Attacks against Transformers
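【Automatically selects label words at the prompt's mask position】Pre-trained Language Models can be Fully Zero-Shot Learners
【Compresses the length of FiD's input vectors and reranks at output time to produce a document ranking】FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
【A large model teaches a small model to generate explanations】PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales
【Finds the influential subset of the pretraining data】ORCA: Interpreting Prompted Language Models via Locating Supporting Evidence in the Ocean of Pretraining Data
【Prompt engineering targeting instructions: generate in stage one, rank and filter in stage two】Large Language Models are Human-Level Prompt Engineers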
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Editing models with task arithmetic
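【No need to input the instruction and exemplars every time; they are converted into parameter-efficient modules】HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation
【An ICL demonstration construction method that needs no human-selected exemplars】Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
【The task instruction and the text are embedded together】One Embedder, Any Task: Instruction-Finetuned Text Embeddings
【A large model teaches a small model CoT】KNIFE: Knowledge Distillation with Free-Text Rationales
【Inconsistent tokenization between source and target in generative models for extractive tasks】Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks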
Parsel: A Unified Natural Language Framework for Algorithmic Reasoning
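【ICL exemplar selection: select in stage one, rank in stage two】Self-adaptive In-context Learning
【Worth a close read: an unsupervised method for selecting human-readable prompts; GPT-2】Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too
【Tests CoT reasoning on the PRONTOQA dataset and finds that planning ability is still limited】Language Models Can (kind of) Reason: A Systematic Formal Analysis of Chain-of-Thought
【Reasoning dataset】WikiWhy: Answering and Explaining Cause-and-Effect Questions
【Reasoning dataset】STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
【Reasoning dataset; compares pretrained and fine-tuned OPT, including CoT fine-tuned models】ALERT: Adapting Language Models to Reasoning Tasks
【A survey of recent reasoning work from Ningyu Zhang's group at Zhejiang University】Reasoning with Language Model Prompting: A Survey
【A survey of text generation techniques and directions from Yanghua Xiao's group at Fudan University】Harnessing Knowledge and Reasoning for Human-Like Natural Language Generation: A Brief Review
【A survey of recent reasoning papers, from Jie Huang at UIUC】Towards Reasoning in Large Language Models: A Survey
【Reviews tasks, datasets, and methods for mathematical reasoning with deep learning】A Survey of Deep Learning for Mathematical Reasoning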
A Survey on Natural Language Processing for Programming
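Several entries in the list above aggregate multiple sampled generations, for example Self-Consistency Improves Chain of Thought Reasoning in Language Models. The sketch below shows only the general idea of majority voting over sampled final answers; it is not the paper's implementation. `sample_fn` is a hypothetical stand-in for one stochastic model call that returns an already extracted final answer, and the canned outputs are made up for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional


def self_consistent_answer(
    sample_fn: Callable[[str], Optional[str]],
    question: str,
    n_samples: int = 5,
) -> Optional[str]:
    """Sample several answers and return the most frequent one.

    `sample_fn` is a hypothetical callable: one stochastic LLM call that
    returns the extracted final answer, or None if extraction failed.
    """
    answers: List[str] = []
    for _ in range(n_samples):
        answer = sample_fn(question)
        if answer is not None:
            answers.append(answer.strip())
    if not answers:
        return None
    # Ties are broken in favor of the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Canned outputs stand in for sampled model answers.
    canned = iter(["18", "17", "18", None, "18"])
    print(self_consistent_answer(lambda q: next(canned), "toy question"))
```

In practice the sampling temperature and the answer-extraction step matter a great deal; both are left outside this sketch.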
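Reward modeling datasets:
Red-teaming datasets, harmless vs. helpful; RLHF + scale makes models harder to attack (another effective technique is CoT fine-tuning):
【Knowledge】+【Reasoning】+【Generation】
If this helps you, please star the repo to show your support; pull requests are welcome~
This is a subjective collection, starting mainly from the ICLR 2023 rebuttal period, and includes preprints from ICLR, ACL, ICML, and other venues.
Please point out anything inappropriate, and suggestions are welcome! Dongfang Li, [email protected]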