# Murphy's Laws for Machine Learning & Neural Networks
In the spirit of "Anything that can go wrong will go wrong," these laws capture the quirks and challenges of working with ML and neural networks in the real world. They are drawn from the practical problems that surface when models are pushed to production.
## The Laws
- Law of Critical Application: The more critical the application, the more likely the neural network will fail to generalize.
- Law of Excessive Complexity: The complexity of a neural network will always exceed the available data.
- Law of Premature Deployment: A neural network model that takes weeks to train will have a bug discovered within minutes of deployment.
- Law of Interpretability's Inverse: The most accurate model will be the least interpretable.
- Law of Hyperparameter Inconsistency: Hyperparameters that worked best in your last project will be the worst for your current project.
- Law of Layered Confusion: The more layers you add, the less you understand.
- Law of Validation Oversight: A 99% accuracy on your validation set usually means you’ve forgotten to include a critical class of data.
- Law of Blind Architecture: If you don't understand the architecture, adding more layers will not help.
- Law of Model Obsolescence: The moment you deploy your state-of-the-art model, a new paper will come out rendering it obsolete.
- Law of Misplaced Confidence: A neural network's confidence in its prediction is inversely proportional to its accuracy at the most critical moments.
- Law of GPU's Last Gasp: The GPU will crash minutes before the end of a weeks-long training session. (The only known mitigation is sketched after the list.)
- Law of Random Tweaking: The more you tweak a neural network, the closer it gets to being a random number generator.
- Law of Training Duration's Deception: The model that took days to train will be outperformed by a simpler model that took minutes.
- Law of Documentation Lag: The documentation for the latest neural network framework will always be one version behind.
- Law of Model Complexity Irony: Your most complex model will perform about as well as a linear regression on the same data.
- Law of Hyperparameter Hindsight: The best hyperparameters are always found after you stop searching.
- Law of Reproduction Anxiety: The moment you can't replicate your results is when your boss asks for them.
- Law of Unexpected Inputs: Every neural network has a special set of inputs that will make it behave unexpectedly, and you will only discover them in production.
- Law of Simple Mistakes: No matter how advanced the model, its errors will always appear to be foolishly simple to humans.
- Law of Depth: The deeper the network, the more elusive the vanishing gradient problem, right up until deployment. (A toy demonstration follows the list.)
- Law of Recurrence: Your RNN will remember everything, except the one sequence pattern that's critical.
- Law of Gated Memory: The moment you decide LSTMs have solved your sequence problems, your data will evolve to prove you wrong.
- Law of Bidirectionality: When a BiLSTM starts to make sense, your sequences will demand attention elsewhere.
- Law of Convolution: The most critical feature will always be just outside your CNN's receptive field. (A receptive-field calculator follows the list.)
- Law of Local Reception: After painstakingly optimizing your CNN's kernel size, a change in input resolution will make it irrelevant.
- Law of Attention: Your model will attend to everything in a sequence except the most relevant part.
- Law of Self-Attention: The one time a Transformer fails, it will be on the input you least expected.
- Law of Transfer Learning: The more specific your task, the less transferable a pre-trained model will be.
- Law of Reinforcement: Your agent will master every strategy, except the one that maximizes reward in the real world.
- Law of Environment Dynamics: The one time your RL model seems perfect, the environment will suddenly turn non-stationary.
- Law of Large Models: The bigger the model, the more embarrassing its simplest mistake.
- Law of Over-parametrization: Your most overfitted model will generalize perfectly during testing but fail miserably in the real world.
- Law of Gradient Flow: The layer where you need the gradient the most is where it will vanish.
- Law of Modality Adaptation: The moment you fine-tune a CNN for non-image data, you'll find a dataset where a simple ANN outperforms it.
- Law of Dynamic Architecture: The more dynamic your network, the harder it will be to explain its sudden failures.
- Law of Adversarial Robustness: The adversarial attack you didn't prepare for will be the first one you encounter.
- Law of Multimodality: Whenever you combine data types, the network will excel in one and fail spectacularly in the other.
- Law of Sparsity: Your most pruned network will miss the one connection that's critical.
- Law of Neural Plasticity: The day after you repurpose a neural network is when it will yearn for its original task.
- Law of Supervised Illusion: In supervised learning, the more precisely your model fits the training data, the more it believes it understands the world, right up until it meets real-world data. (A toy overfitting demo follows the list.)
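## A Few Laws, Demonstrated

Some of these laws are concrete enough to show in a few lines of Python. The sketches below are illustrative toys, not production code; every model, number, and filename in them is made up.

For the Law of GPU's Last Gasp, the only defense is to checkpoint as if the crash were already scheduled. A minimal PyTorch sketch, where the linear model and random batches stand in for your real training loop:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                        # stand-in for your real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save everything needed to resume *before* the GPU's last gasp.
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, "ckpt.pt")

# After the inevitable crash, resume instead of retraining for weeks:
ckpt = torch.load("ckpt.pt")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1
```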
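The Law of Depth (and its sibling, the Law of Gradient Flow) can be seen with nothing but NumPy. In a chain of sigmoid layers, the backward pass multiplies one local derivative per layer, and since a sigmoid's derivative never exceeds 0.25, the gradient reaching the early layers collapses roughly geometrically. The depth, weights, and input below are an arbitrary toy setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A toy 20-layer chain of 1-unit sigmoid layers, so the chain rule
# reduces to a plain product of scalars.
depth = 20
weights = rng.normal(0.0, 1.0, size=depth)

# Forward pass, remembering each pre-activation.
pre_acts = []
h = 0.5
for w in weights:
    z = w * h
    pre_acts.append(z)
    h = sigmoid(z)

# Backward pass: multiply local derivatives layer by layer.
grad = 1.0
for layer in reversed(range(depth)):
    s = sigmoid(pre_acts[layer])
    grad *= s * (1.0 - s) * weights[layer]
    if layer % 5 == 0:
        print(f"gradient reaching layer {layer:2d}: {grad:.3e}")
```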
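The Law of Convolution is at least measurable. The receptive field of stacked convolution and pooling layers follows a standard recurrence: each layer adds (kernel - 1) times the product of all earlier strides. So you can check whether the feature you care about even fits. The example stack here is invented:

```python
def receptive_field(layers):
    """Receptive field of stacked conv/pool layers.

    Each layer is a (kernel_size, stride) pair. Uses the recurrence
    rf += (k - 1) * jump, then jump *= s.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Example: a VGG-ish stack of 3x3 convs with occasional stride-2 pooling.
stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_field(stack))  # -> 24 pixels; anything wider is invisible
```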
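Finally, the Law of Supervised Illusion (together with the Law of Over-parametrization) fits in a dozen lines: a degree-9 polynomial passes exactly through ten noisy training points, then meets the real world. All numbers here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

# Ten noisy samples from a simple underlying function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=10)

# A wildly over-parametrized fit: a degree-9 polynomial through 10 points.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# Fresh data from the same process plays the role of "the real world".
x_test = rng.uniform(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.1, size=100)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"train MSE: {train_err:.2e}")  # essentially zero
print(f"test  MSE: {test_err:.2e}")   # typically far worse
```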
## Contributions
Feel free to submit a PR if you've encountered another "law" in your experience or if you have any suggestions or improvements. Let's grow this list together and bring a little humor to our daily ML struggles.
## License
This repository is licensed under the MIT License.
## Acknowledgements
- Inspired by Murphy's Law and the collective wisdom (and pain) of Machine Learning practitioners everywhere.
- Special thanks to the ML community for the shared experiences and insights.
- Inspired by Murphy's laws collection at Angelo State University's blog.