‘tabular ML’ tag
- See Also
- Gwern
- Links
- “Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond”, Jeffares et al 2024
- “Questionable Practices in Machine Learning”, Leech et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, Zhao et al 2024
- “Attention As an RNN”, Feng et al 2024
- “The Harms of Class Imbalance Corrections for Machine Learning Based Prediction Models: a Simulation Study”, Carriero et al 2024
- “Many-Shot In-Context Learning”, Agarwal et al 2024
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “Chronos: Learning the Language of Time Series”, Ansari et al 2024
- “StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”, Zhuang et al 2024
- “Why Do Random Forests Work? Understanding Tree Ensembles As Self-Regularizing Adaptive Smoothers”, Curth et al 2024
- “Illusory Generalizability of Clinical Prediction Models”
- “Attention versus Contrastive Learning of Tabular Data—A Data-Centric Benchmarking”, Rabbani et al 2024
- “TabLib: A Dataset of 627M Tables With Context”, Eggert et al 2023
- “Unambiguous Discrimination of All 20 Proteinogenic Amino Acids and Their Modifications by Nanopore”, Wang et al 2023d
- “Generating and Imputing Tabular Data via Diffusion and Flow-Based Gradient-Boosted Trees”, Jolicoeur-Martineau et al 2023
- “Generating Tabular Datasets under Differential Privacy”, Truda 2023
- “TableGPT: Towards Unifying Tables, Natural Language and Commands into One GPT”, Zha et al 2023
- “Language Models Are Weak Learners”, Manikandan et al 2023
- “RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization”, Kumar et al 2023
- “Large Language Models Are Few-Shot Health Learners”, Liu et al 2023
- “Deep Learning Based Forecasting: a Case Study from the Online Fashion Industry”, Kunz et al 2023
- “Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes”, Arora et al 2023
- “TSMixer: An All-MLP Architecture for Time Series Forecasting”, Chen et al 2023
- “Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-Based Reasoning”, Ye et al 2023
- “Fast Semi-Supervised Self-Training Algorithm Based on Data Editing”, Li et al 2023
- “Table-To-Text Generation and Pre-Training With TabT5”, Andrejczuk et al 2022
- “Language Models Are Realistic Tabular Data Generators”, Borisov et al 2022
- “Forecasting With Trees”, Januschowski et al 2022
- “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”, Grinsztajn et al 2022
- “Revisiting Pretraining Objectives for Tabular Deep Learning”, Rubachev et al 2022
- “TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data”, Hollmann et al 2022
- “Transfer Learning With Deep Tabular Models”, Levin et al 2022
- “Hopular: Modern Hopfield Networks for Tabular Data”, Schäfl et al 2022
- “Predicting Romantic Interest during Early Relationship Development: A Preregistered Investigation Using Machine Learning”, Eastwick et al 2022
- “On Embeddings for Numerical Features in Tabular Deep Learning”, Gorishniy et al 2022
- “To SMOTE, or Not to SMOTE?”, Elor & Averbuch-Elor 2022
- “M5 Accuracy Competition: Results, Findings, and Conclusions”, Makridakis et al 2022
- “The GatedTabTransformer: An Enhanced Deep Learning Architecture for Tabular Modeling”, Cholakov & Kolev 2022
- “PFNs: Transformers Can Do Bayesian Inference”, Müller et al 2021
- “DANets: Deep Abstract Networks for Tabular Data Classification and Regression”, Chen et al 2021
- “Deep Neural Networks and Tabular Data: A Survey”, Borisov et al 2021
- “An Unsupervised Model for Identifying and Characterizing Dark Web Forums”, Nazah et al 2021
- “TAPEX: Table Pre-Training via Learning a Neural SQL Executor”, Liu et al 2021
- “ARM-Net: Adaptive Relation Modeling Network for Structured Data”, Cai et al 2021
- “Decision Tree Heuristics Can Fail, Even in the Smoothed Setting”, Blanc et al 2021
- “SCARF: Self-Supervised Contrastive Learning Using Random Feature Corruption”, Bahri et al 2021
- “Revisiting Deep Learning Models for Tabular Data”, Gorishniy et al 2021
- “The Epic Sepsis Model Falls Short—The Importance of External Validation”, An et al 2021
- “Well-Tuned Simple Nets Excel on Tabular Datasets”, Kadra et al 2021
- “Tabular Data: Deep Learning Is Not All You Need”, Shwartz-Ziv & Armon 2021
- “Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning”, Kossen et al 2021
- “SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training”, Somepalli et al 2021
- “Intelligence and General Psychopathology in the Vietnam Experience Study: A Closer Look”, Kirkegaard & Nyborg 2021
- “Converting Tabular Data into Images for Deep Learning With Convolutional Neural Networks”, Zhu et al 2021
- “External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients”, Wong et al 2021
- “Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting”, Zhou et al 2020
- “TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, Huang et al 2020
- “Engineering In-Place (Shared-Memory) Sorting Algorithms”, Axtmann et al 2020
- “Kaggle Forecasting Competitions: An Overlooked Learning Opportunity”, Bojer & Meldgaard 2020
- “TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data”, Yin et al 2020
- “Neural Additive Models: Interpretable Machine Learning With Neural Nets”, Agarwal et al 2020
- “TAPAS: Weakly Supervised Table Parsing via Pre-Training”, Herzig et al 2020
- “A Market in Dream: the Rapid Development of Anonymous Cybercrime”, Zhou et al 2020b
- “VIME: Extending the Success of Self-Supervised and Semi-Supervised Learning to Tabular Domain”, Yoon et al 2020
- “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods”, Slack et al 2019
- “The Bouncer Problem: Challenges to Remote Explainability”, Merrer & Tredan 2019
- “OHAC: Online Hierarchical Clustering Approximations”, Menon et al 2019
- “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”, Ke et al 2019
- “TabNet: Attentive Interpretable Tabular Learning”, Arik & Pfister 2019
- “3D Human Pose Estimation via Human Structure-Aware Fully Connected Network”, Zhang et al 2019d
- “ID3 Learns Juntas for Smoothed Product Distributions”, Brutzkus et al 2019
- “Behavioral Patterns in Smartphone Usage Predict Big Five Personality Traits”, Stachl et al 2019
- “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”, Spigler et al 2019
- “N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting”, Oreshkin et al 2019
- “SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data”, Sun et al 2019
- “Fairwashing: the Risk of Rationalization”, Aïvodji et al 2019
- “Tweedie Gradient Boosting for Extremely Unbalanced Zero-Inflated Data”, Zhou et al 2018
- “Neural Arithmetic Logic Units”, Trask et al 2018
- “Learning and Memorization”, Chatterjee 2018
- “Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL”, Mayr et al 2018
- “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, Simm et al 2018
- “Improving Palliative Care With Deep Learning”, An et al 2018
- “Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario”, Vie et al 2017
- “Neural Collaborative Filtering”, He et al 2017
- “OpenML Benchmarking Suites”, Bischl et al 2017
- “CatBoost: Unbiased Boosting With Categorical Features”, Prokhorenkova et al 2017
- “Resource-Efficient Machine Learning in 2 KB RAM for the Internet of Things”, Kumar et al 2017
- “XGBoost: A Scalable Tree Boosting System”, Chen & Guestrin 2016
- “"Why Should I Trust You?": Explaining the Predictions of Any Classifier”, Ribeiro et al 2016
- “The MovieLens Datasets: History and Context”, Harper & Konstan 2015
- “Planning As Satisfiability: Heuristics”, Rintanen 2012
- “Leakage in Data Mining: Formulation, Detection, and Avoidance”, Kaufman et al 2011
- “Random Survival Forests”, Ishwaran et al 2008
- “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, Perlich et al 2003
- “A Survey of Methods for Scaling Up Inductive Algorithms”, Provost & Kolluri 1999
- “On the Boosting Ability of Top-Down Decision Tree Learning Algorithms”, Kearns & Mansour 1999
- “On The Effect of Data Set Size on Bias And Variance in Classification Learning”, Brain & Webb 1999
- “The Effects of Training Set Size on Decision Tree Complexity”, Oates & Jensen 1997
- “Scaling up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, Kohavi 1996
- “Stupid Data Miner Tricks: Overfitting the S&P 500”, Leinweber 1995
- “The MONK’s Problems: A Performance Comparison of Different Learning Algorithms”, Thrun et al 1991
- “Symbolic and Neural Learning Algorithms: An Experimental Comparison”, Shavlik et al 1991
- “A Meta-Analysis of Overfitting in Machine Learning”
- “Statistical Modeling: The Two Cultures”, Breiman 2001
- “How Good Are LLMs at Doing ML on an Unknown Dataset?”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“Fully-Connected Neural Nets”, Gwern 2021
“Weather and My Productivity”, Gwern 2013
Links
“Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond”, Jeffares et al 2024
“Questionable Practices in Machine Learning”, Leech et al 2024
“Probing the Decision Boundaries of In-Context Learning in Large Language Models”, Zhao et al 2024
“Attention As an RNN”, Feng et al 2024
“The Harms of Class Imbalance Corrections for Machine Learning Based Prediction Models: a Simulation Study”, Carriero et al 2024
“Many-Shot In-Context Learning”, Agarwal et al 2024
“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
“Chronos: Learning the Language of Time Series”, Ansari et al 2024
“StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”, Zhuang et al 2024
“Why Do Random Forests Work? Understanding Tree Ensembles As Self-Regularizing Adaptive Smoothers”, Curth et al 2024
“Illusory Generalizability of Clinical Prediction Models”
“Attention versus Contrastive Learning of Tabular Data—A Data-Centric Benchmarking”, Rabbani et al 2024
“TabLib: A Dataset of 627M Tables With Context”, Eggert et al 2023
“Unambiguous Discrimination of All 20 Proteinogenic Amino Acids and Their Modifications by Nanopore”, Wang et al 2023d
“Generating and Imputing Tabular Data via Diffusion and Flow-Based Gradient-Boosted Trees”, Jolicoeur-Martineau et al 2023
“Generating Tabular Datasets under Differential Privacy”, Truda 2023
“TableGPT: Towards Unifying Tables, Natural Language and Commands into One GPT”, Zha et al 2023
“Language Models Are Weak Learners”, Manikandan et al 2023
“RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization”, Kumar et al 2023
“Large Language Models Are Few-Shot Health Learners”, Liu et al 2023
“Deep Learning Based Forecasting: a Case Study from the Online Fashion Industry”, Kunz et al 2023
“Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes”, Arora et al 2023
“TSMixer: An All-MLP Architecture for Time Series Forecasting”, Chen et al 2023
“Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-Based Reasoning”, Ye et al 2023
“Fast Semi-Supervised Self-Training Algorithm Based on Data Editing”, Li et al 2023
“Table-To-Text Generation and Pre-Training With TabT5”, Andrejczuk et al 2022
“Language Models Are Realistic Tabular Data Generators”, Borisov et al 2022
“Forecasting With Trees”, Januschowski et al 2022
“Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”, Grinsztajn et al 2022
“Revisiting Pretraining Objectives for Tabular Deep Learning”, Rubachev et al 2022
“TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data”, Hollmann et al 2022
“Transfer Learning With Deep Tabular Models”, Levin et al 2022
“Hopular: Modern Hopfield Networks for Tabular Data”, Schäfl et al 2022
“Predicting Romantic Interest during Early Relationship Development: A Preregistered Investigation Using Machine Learning”, Eastwick et al 2022
“On Embeddings for Numerical Features in Tabular Deep Learning”, Gorishniy et al 2022
“To SMOTE, or Not to SMOTE?”, Elor & Averbuch-Elor 2022
“M5 Accuracy Competition: Results, Findings, and Conclusions”, Makridakis et al 2022
“The GatedTabTransformer: An Enhanced Deep Learning Architecture for Tabular Modeling”, Cholakov & Kolev 2022
“PFNs: Transformers Can Do Bayesian Inference”, Müller et al 2021
“DANets: Deep Abstract Networks for Tabular Data Classification and Regression”, Chen et al 2021
“Deep Neural Networks and Tabular Data: A Survey”, Borisov et al 2021
“An Unsupervised Model for Identifying and Characterizing Dark Web Forums”, Nazah et al 2021
“TAPEX: Table Pre-Training via Learning a Neural SQL Executor”, Liu et al 2021
“ARM-Net: Adaptive Relation Modeling Network for Structured Data”, Cai et al 2021
“Decision Tree Heuristics Can Fail, Even in the Smoothed Setting”, Blanc et al 2021
“SCARF: Self-Supervised Contrastive Learning Using Random Feature Corruption”, Bahri et al 2021
“Revisiting Deep Learning Models for Tabular Data”, Gorishniy et al 2021
“The Epic Sepsis Model Falls Short—The Importance of External Validation”, An et al 2021
“Well-Tuned Simple Nets Excel on Tabular Datasets”, Kadra et al 2021
“Tabular Data: Deep Learning Is Not All You Need”, Shwartz-Ziv & Armon 2021
“Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning”, Kossen et al 2021
“SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training”, Somepalli et al 2021
“Intelligence and General Psychopathology in the Vietnam Experience Study: A Closer Look”, Kirkegaard & Nyborg 2021
“Converting Tabular Data into Images for Deep Learning With Convolutional Neural Networks”, Zhu et al 2021
“External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients”, Wong et al 2021
“Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting”, Zhou et al 2020
“TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, Huang et al 2020
“Engineering In-Place (Shared-Memory) Sorting Algorithms”, Axtmann et al 2020
“Kaggle Forecasting Competitions: An Overlooked Learning Opportunity”, Bojer & Meldgaard 2020
“TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data”, Yin et al 2020
“Neural Additive Models: Interpretable Machine Learning With Neural Nets”, Agarwal et al 2020
“TAPAS: Weakly Supervised Table Parsing via Pre-Training”, Herzig et al 2020
“A Market in Dream: the Rapid Development of Anonymous Cybercrime”, Zhou et al 2020b
“VIME: Extending the Success of Self-Supervised and Semi-Supervised Learning to Tabular Domain”, Yoon et al 2020
“Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods”, Slack et al 2019
“The Bouncer Problem: Challenges to Remote Explainability”, Merrer & Tredan 2019
“OHAC: Online Hierarchical Clustering Approximations”, Menon et al 2019
“LightGBM: A Highly Efficient Gradient Boosting Decision Tree”, Ke et al 2019
“TabNet: Attentive Interpretable Tabular Learning”, Arik & Pfister 2019
“3D Human Pose Estimation via Human Structure-Aware Fully Connected Network”, Zhang et al 2019d
“ID3 Learns Juntas for Smoothed Product Distributions”, Brutzkus et al 2019
“Behavioral Patterns in Smartphone Usage Predict Big Five Personality Traits”, Stachl et al 2019
“Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”, Spigler et al 2019
“N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting”, Oreshkin et al 2019
“SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data”, Sun et al 2019
“Fairwashing: the Risk of Rationalization”, Aïvodji et al 2019
“Tweedie Gradient Boosting for Extremely Unbalanced Zero-Inflated Data”, Zhou et al 2018
“Neural Arithmetic Logic Units”, Trask et al 2018
“Learning and Memorization”, Chatterjee 2018
“Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL”, Mayr et al 2018
“Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, Simm et al 2018
“Improving Palliative Care With Deep Learning”, An et al 2018
“Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario”, Vie et al 2017
“Neural Collaborative Filtering”, He et al 2017
“OpenML Benchmarking Suites”, Bischl et al 2017
“CatBoost: Unbiased Boosting With Categorical Features”, Prokhorenkova et al 2017
“Resource-Efficient Machine Learning in 2 KB RAM for the Internet of Things”, Kumar et al 2017
“XGBoost: A Scalable Tree Boosting System”, Chen & Guestrin 2016
“"Why Should I Trust You?": Explaining the Predictions of Any Classifier”, Ribeiro et al 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
“The MovieLens Datasets: History and Context”, Harper & Konstan 2015
“Planning As Satisfiability: Heuristics”, Rintanen 2012
“Leakage in Data Mining: Formulation, Detection, and Avoidance”, Kaufman et al 2011
“Random Survival Forests”, Ishwaran et al 2008
“Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, Perlich et al 2003
“A Survey of Methods for Scaling Up Inductive Algorithms”, Provost & Kolluri 1999
“On the Boosting Ability of Top-Down Decision Tree Learning Algorithms”, Kearns & Mansour 1999
“On The Effect of Data Set Size on Bias And Variance in Classification Learning”, Brain & Webb 1999
“The Effects of Training Set Size on Decision Tree Complexity”, Oates & Jensen 1997
“Scaling up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, Kohavi 1996
“Stupid Data Miner Tricks: Overfitting the S&P 500”, Leinweber 1995
“The MONK’s Problems: A Performance Comparison of Different Learning Algorithms”, Thrun et al 1991
“Symbolic and Neural Learning Algorithms: An Experimental Comparison”, Shavlik et al 1991
“A Meta-Analysis of Overfitting in Machine Learning”
“Statistical Modeling: The Two Cultures”, Breiman 2001
“How Good Are LLMs at Doing ML on an Unknown Dataset?”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses each annotation’s embedding to chain together nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
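As a concrete illustration, here is a minimal sketch of that greedy nearest-neighbor pass (hypothetical code, not the site’s actual implementation: it assumes the annotation embeddings are already computed as the rows of a matrix, newest annotation first, and the name `sort_by_magic` is invented):

```python
# Sketch of embedding-based "sort by magic": start from the newest annotation
# and greedily chain each annotation to its nearest not-yet-listed neighbor,
# yielding a progression of topics. Hypothetical illustration only.
import numpy as np

def sort_by_magic(embeddings: np.ndarray) -> list[int]:
    """Return a permutation of row indices of `embeddings` (shape (n, d)),
    assuming row 0 is the newest annotation."""
    # Normalize rows so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]
    remaining = set(range(1, len(unit)))
    while remaining:
        last = unit[order[-1]]
        # Next item: the unvisited annotation most similar to the current one.
        nearest = max(remaining, key=lambda i: float(unit[i] @ last))
        order.append(nearest)
        remaining.remove(nearest)
    return order
```

Splitting the resulting ordering into the labeled sections below would be a separate clustering step, e.g. cutting the chain wherever the similarity between consecutive annotations drops sharply.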
data-structure
generation-ml
time-series
analysis
modeling
summarization
decision-tree
tabular-learning
Wikipedia
Miscellaneous
- https://spectrum.ieee.org/its-too-easy-to-hide-bias-in-deeplearning-systems
- https://www.maskaravivek.com/post/gan-synthetic-data-generation/
- https://www.oneusefulthing.org/p/it-is-starting-to-get-strange
- https://www.reddit.com/r/Anki/comments/1c29775/fsrs_is_one_of_the_most_accurate_spaced/
- https://www.thelancet.com/journals/lanhl/article/PIIS2666-7568(23)00189-7/fulltext
Bibliography
- https://arxiv.org/abs/2406.11233: “Probing the Decision Boundaries of In-Context Learning in Large Language Models”
- https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”
- https://arxiv.org/abs/2402.16671: “StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”
- https://arxiv.org/abs/2306.09222#google: “RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization”
- https://arxiv.org/abs/2303.06053#google: “TSMixer: An All-MLP Architecture for Time Series Forecasting”
- https://www.sciencedirect.com/science/article/pii/S0169207021001679: “Forecasting With Trees”
- https://arxiv.org/abs/2207.01848: “TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data”
- 2022-eastwick.pdf: “Predicting Romantic Interest during Early Relationship Development: A Preregistered Investigation Using Machine Learning”
- https://www.sciencedirect.com/science/article/pii/S0169207021001874: “M5 Accuracy Competition: Results, Findings, and Conclusions”
- https://arxiv.org/abs/2112.10510: “PFNs: Transformers Can Do Bayesian Inference”
- https://arxiv.org/abs/1905.10843: “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”
- 2003-perlich.pdf: “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”
- https://www.sciencedirect.com/science/article/pii/S0022000097915439: “On the Boosting Ability of Top-Down Decision Tree Learning Algorithms”